Cogent network issues

May 4th, 2010 @ 03:57 AM

Update May 12th@08:36AM CDT: Customers still experiencing Cogent-related network issues are invited to log into Support Chat where backbone engineer ‘coma’ is keen to gather data to help track down any remaining issues between now and 12:00AM CDT.

Update May 6th@12:25PM CDT: We have continued working with Cogent in response to this issue and believe it to be fully resolved at this time. If you continue to experience any additional connectivity issues, please contact our support team at support@slicehost.com. In your email, please include a traceroute to your slice’s IP address, your source, or local, IP address, and a series of 100 pings to your slice.

Update May 6th @9:57AM CDT: We do not currently have any further information to provide; however, we are still investigating this issue and hope to come to a final resolution soon. As mentioned in the updates below, we have shifted the majority of traffic away from Cogent in our STL-B data center, which should alleviate packet loss issues for many European customers. If you are still encountering issues, please contact our support team at support@slicehost.com. In your email, please include a traceroute to your slice’s IP address, your source or local IP address, and a series of 100 pings to your slice. This information is necessary to assist in our continued investigation to resolve this issue.

Update May 5th @3:15PM CDT: We have once again shifted the majority of traffic away from Cogent in our STL-B data center. This should alleviate the packet loss and connectivity issues many of our European customers are experiencing. If you still continue to encounter any connectivity issues, please contact our support team at support@slicehost.com. In your email, please include a traceroute to your slice’s IP address, your source, or local, IP address, and a series of 100 pings to your slice. This information is necessary to assist in our continued investigation to resolve this issue.

Update May 5th @2:00PM CDT: We don’t have any further information to provide at this time. However, we are still investigating this issue and hope to come to a final resolution soon. If you are still experiencing issues connecting to your slice, please continue to email us at support@slicehost.com with the requested information specified in previous updates.

Update May 5th @12:40PM CDT: Thanks to all who sent in the requested information thus far. We are continuing to work with Cogent to resolve this issue and appreciate your assistance.

Update May 5th @12:00PM CDT: We are still in direct contact with Cogent in an effort to fully resolve this issue. If you are still experiencing connectivity issues to your slice, we ask that you please email us at support@slicehost.com with the output of a traceroute to your slice’s IP address. Additionally, please include your source, or local, IP address and a series of 100 pings to your slice. This will greatly aid us in resolving this issue as soon as possible.

Update May 5th @11:15 AM CDT: We are currently performing tests with Cogent to help diagnose and completely resolve this issue as soon as possible. During these tests, European customers may experience packet loss and connectivity issues to their slice. We apologize for this inconvenience, but feel it is necessary in order to fully resolve this matter.

Update May 5th @10:30 AM CDT: Our networking team has shifted the majority of traffic away from Cogent in our STL-B data center. This should hopefully resolve many of the issues our European customers are experiencing as we continue to work towards a final resolution to this issue.

Update May 5th @9:00 AM CDT: Routing issues continue. Although the routing problem is outside of Slicehost’s network and beyond our control, the data you have helped us gather gives our engineers solid evidence with which to urge backbone providers to address the cause of the issue.

Update: Cogent’s EU NOC are still investigating the issue that some of our European customers are seeing with packet loss.

Update: Rackspace Backbone engineers are working with Cogent to determine the cause of what seems to be approximately 30% packet loss on routes from Southern and parts of Western Europe. Note this is a third-party routing issue and the slices themselves are fine.

We’re seeing intermittent network issues on Cogent’s network that appear to affect European customers. We’ve informed Cogent and will report updates as we have them.
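For anyone gathering the data requested in the updates above (a traceroute to the slice, plus a series of 100 pings), the following is a minimal Python sketch of one way to collect it. It assumes a Linux or macOS machine where `ping -c` and `traceroute` are available; the function names are illustrative only and are not part of any Slicehost tooling. The loss calculation simply reads the summary line that `ping` prints at the end of a run.

```python
import re
import subprocess

# Matches both the Linux summary ("100 packets transmitted, 97 received")
# and the BSD/macOS one ("11 packets transmitted, 1 packets received").
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def build_commands(slice_ip, count=100):
    """Return the ping and traceroute command lines for a slice IP.

    Assumes Unix-style tools; Windows would need `ping -n` and `tracert`.
    """
    return [
        ["ping", "-c", str(count), slice_ip],
        ["traceroute", slice_ip],
    ]


def packet_loss_percent(ping_output):
    """Compute packet loss from ping's summary line, or None if it is missing."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent


def collect(slice_ip, count=100):
    """Run both commands and return their output as one text report."""
    parts = []
    for cmd in build_commands(slice_ip, count):
        result = subprocess.run(cmd, capture_output=True, text=True)
        parts.append("$ " + " ".join(cmd) + "\n" + result.stdout)
    return "\n".join(parts)
```

Calling `collect()` with your slice’s IP address returns text that can be pasted into a support email alongside your source IP address; `packet_loss_percent()` applied to the same text gives a quick loss figure.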

Posted by Lee

Comments:

Elliott Manley commented on Tue May 04 05:55:44 UTC 2010:

We and our clients are suffering intermittent connectivity problems that our hosts, slicehost.net, attribute to you. This has been outstanding for 9 hours and I am worried to note there is no mention of the issue on your status page.

Elliott Manley commented on Tue May 04 05:56:25 UTC 2010:

I’m an idiot. Got interrupted and thought I was on Cogent’s page!!

Patrick commented on Tue May 04 12:14:14 UTC 2010:

Has there been any official statement from Cogent Co yet, apart from a vague “we’re looking at it”? It’s a little unsettling to see persistent 50-100% packet loss for more than 12 hours without a proper response.

I’m in Stockholm, the packet loss seems to start at ORD (Chicago) in AMS->YMQ->ORD.

Nuno Zimas commented on Tue May 04 12:29:38 UTC 2010:

A customer of ours has reported severe connectivity issues from Poland and from the Netherlands*.

From Spain we have no trouble at all.

This seems to affect North and Central Europe, not the South. Or at least not NW Spain.

Dimitris commented on Tue May 04 13:26:31 UTC 2010:

We too see severe packet loss esp. from Greece’s largest ISP (OTE).

I wonder why cogent doesn’t show anything on their status page.

Jo Potts commented on Tue May 04 13:27:41 UTC 2010:

I phoned the Cogent helpline but was told that they weren’t able to give any information out to people who weren’t their ‘customers’ and refused to acknowledge that there was even a problem!

All of my Northern and Central Europe customers are experiencing problems, with some saying that it has been affecting them for ‘2 days’ now. I’m not impressed with Cogent at all!

Giovanni Intini commented on Tue May 04 13:31:55 UTC 2010:

Severe packetloss from Italy too.

Nuno Zimas commented on Tue May 04 13:49:33 UTC 2010:

From Coruña (Spain) everything is still working smoothly (I guess it’s because we are closer to the US hahahahaha). Shouldn’t laugh at a serious issue, but… oh well, pulling out my hair won’t make any difference either, so… None of our Spanish customers has complained either. Huge packet losses from Poland and The Netherlands as reported above.

Patrick commented on Tue May 04 13:54:33 UTC 2010:

@Nuno: I guess if you do a traceroute to your Slice from Spain, it doesn’t go through the AMS (130.117.50.33), YMQ (154.54.0.74) and ORD (154.54.28.5) hosts on the way?

Nuno Zimas commented on Tue May 04 14:04:02 UTC 2010:

@Patrick Don’t think so.

Apparently my ISP connects straight to a Cogent node. See below (origin and destination ip’s omitted)

5 129.216.106.212.static.jazztel.es (212.106.216.129) 59.527 ms 59.497 ms 60.327 ms
6 te2-2.mpd02.mad05.atlas.cogentco.com (149.6.80.29) 71.553 ms 42.008 ms 42.417 ms
7 te0-0-0-1.mpd22.par01.atlas.cogentco.com (130.117.0.226) 68.396 ms te0-2-0-1.mpd21.par01.atlas.cogentco.com (130.117.0.178) 70.622 ms te0-0-0-1.mpd22.par01.atlas.cogentco.com (130.117.0.226) 71.358 ms
8 te1-4.ccr01.par04.atlas.cogentco.com (130.117.1.62) 72.582 ms te1-1.ccr01.par04.atlas.cogentco.com (130.117.1.142) 73.930 ms te1-4.ccr01.par04.atlas.cogentco.com (130.117.1.62) 76.255 ms
9 te0-3-0-2.ccr22.jfk02.atlas.cogentco.com (154.54.0.209) 163.060 ms 162.981 ms 165.777 ms
10 te1-8.ccr02.ord01.atlas.cogentco.com (154.54.29.165) 194.530 ms te4-4.ccr02.ord01.atlas.cogentco.com (154.54.29.169) 195.746 ms 197.546 ms
11 te4-2.ccr01.stl03.atlas.cogentco.com (154.54.27.30) 206.503 ms 206.483 ms *
12 vl3802.na31.b016110-1.stl03.atlas.cogentco.com (66.28.5.246) 196.084 ms 191.218 ms vl3502.na31.b016110-1.stl03.atlas.cogentco.com (66.28.5.58) 191.101 ms
13 38.104.162.22 (38.104.162.22) 192.635 ms 194.861 ms 194.805 ms
14 209-20-79-17.slicehost.net (209.20.79.17) 195.954 ms 189.811 ms 191.645 ms

Patrick commented on Tue May 04 14:09:18 UTC 2010:

@Nuno: Thanks—yeah, it’s not the same route. Here’s mine:

HOST: singularity Loss% Snt Last Avg Best Wrst StDev
5. ti3097a211-xe9-2.ti.telenor. 2.0% 1000 35.7 106.9 22.3 4880. 485.3
6. ti3001d400-xe1-1-0.ti.teleno 2.6% 1000 53.3 108.8 22.3 5044. 487.5
7. ti3001c310-ae1-0.ti.telenor. 2.6% 1000 73.6 112.1 22.4 5475. 509.2
8. ti3001b300-ae0-0.ti.telenor. 0.9% 1000 40.4 104.4 22.2 5431. 479.0
9. te1-8.ccr02.sto01.atlas.coge 2.9% 1000 32.9 117.7 22.1 5374. 509.1
10. te0-0-0-2.ccr22.ham01.atlas. 0.5% 1000 55.5 131.3 46.7 5358. 501.8
11. te9-8.ccr02.ams03.atlas.coge 0.4% 1000 49.9 144.0 48.3 5304. 500.6
12. te3-2.ccr02.ymq02.atlas.coge 0.5% 1000 146.2 232.1 128.8 5335. 499.3
13. te9-8.ccr02.ord01.atlas.coge 99.1% 1000 183.4 170.3 152.6 275.7 40.8
14. te4-2.ccr01.stl03.atlas.coge 0.7% 1000 161.6 260.4 159.5 5266. 491.4
15. vl3502.na31.b016110-1.stl03. 1.0% 1000 188.0 241.1 158.2 5219. 492.4
16. 38.104.162.22 98.9% 1000 160.0 160.1 158.4 165.6 2.1
17. 209-20-79-21.slicehost.net 2.6% 1000 168.7 250.7 167.6 5125. 491.1

Patrick commented on Tue May 04 14:11:16 UTC 2010:

Oh wow, that turned out messy. Here it is: http://pastebin.com/11UkA0nr

Patrick commented on Tue May 04 14:17:07 UTC 2010:

PS: I’m not a networking guru, but that report, based on 1000 sequential pings per hop, seems to indicate that the level of packet loss is significantly more severe (90%+) than Cogent Co indicate.

Ben Smith commented on Tue May 04 16:35:27 UTC 2010:

So what’s the deal with Cogent’s status page:

http://status.cogentco.com/

ceb commented on Tue May 04 16:51:34 UTC 2010:

This is extremely frustrating.

Boo Radley commented on Tue May 04 18:18:32 UTC 2010:

It seems to me this issue is isolated to Slicehost and Cogent only.

Nuno Zimas commented on Tue May 04 18:54:20 UTC 2010:

Traceroute from Poland below: http://pastebin.com/CjurUHxW

My impression is that the connection dies in Washington. Doesn’t seem to make any sense, but that’s where the trace dies (i think).

ceb commented on Tue May 04 20:36:48 UTC 2010:

Is anyone doing anything about it or should we start moving our services as our only solution to this ?

Ivan commented on Wed May 05 01:28:42 UTC 2010:

Issues in Slovakia

Simon commented on Wed May 05 01:57:50 UTC 2010:

Still Issues In Germany

Patrick commented on Wed May 05 02:11:28 UTC 2010:

Love Slicehost, but this is becoming a dealbreaker. Fix it or give a reasonable explanation.

Dimitris commented on Wed May 05 02:35:08 UTC 2010:

24 hours later and the problem still persists.

18 hours ago I contacted CogentCo about this. They replied 2 hours ago asking if the problem still exists (I told them it does).

Awful situation :-(

Jo commented on Wed May 05 02:39:23 UTC 2010:

I agree with Patrick – but the deal is already broken I’m afraid.

Linode has a datacenter in London – http://www.linode.com/avail/

Arcanum commented on Wed May 05 02:49:16 UTC 2010:

The problem persists for me in Greece.

Peter commented on Wed May 05 03:15:43 UTC 2010:

Also in Hungary.

PING 38.104.162.22 (38.104.162.22): 56 data bytes
64 bytes from 38.104.162.22: icmp_seq=1 ttl=242 time=159.480 ms

--- 38.104.162.22 ping statistics ---
11 packets transmitted, 1 packets received, 90% packet loss
round-trip min/avg/max/stddev = 159.480/159.480/159.480/0.000 ms

Brian commented on Wed May 05 03:44:28 UTC 2010:

Still problems in Denmark – This is NOT good!!!

Paul commented on Wed May 05 03:48:28 UTC 2010:

Same here, clients reporting partially loading pages inside the UK. Making me and my sites look bad. :(

ceb commented on Wed May 05 05:36:54 UTC 2010:

Deleting my comments was classy, way to go guys. This is the right way to handle this. If you can’t fix it you can always silence the complaining customers suffering a 24h+ outage.

Brian commented on Wed May 05 05:43:22 UTC 2010:

Slicehost. Please provide us with a status!

Bjørn Axelsen commented on Wed May 05 05:53:32 UTC 2010:

Still severe problems in Denmark. This MUST be solved very quickly, otherwise I will have to move away from Slicehost (even though I am otherwise a happy customer).

Marco Frissen commented on Wed May 05 06:02:37 UTC 2010:

Still having issues from Netherlands, although it is intermittent indeed (now it works, 5 mins ago it didn’t, before that it worked sometimes – although very slow)

Erik commented on Wed May 05 07:29:54 UTC 2010:

Pingdom shows problems in Copenhagen, Amsterdam, Frankfurt and Montreal.

See http://pastebin.com/ED88xNXw for whole list.

Clients are getting angry, I’m getting more worried.

James commented on Wed May 05 09:16:35 UTC 2010:

Clients unable to use their websites in the UK for more than 24 hours.

Now forced into looking at other hosting providers.

Slicehost, please refund this month’s charges.

Nuno Zimas commented on Wed May 05 09:59:09 UTC 2010:

From the UK i haven’t got any complaints yet. Issues remain in Poland and The Netherlands. In the meantime I am forced to move at least one slice to another provider :( The complete lack of info from SH is outstanding, no matter how much Cogent is to blame. Cogent should refund SH and SH should refund its customers. If you guys are unable to deal with Cogent because they “own” you, then ditch them!

Jo commented on Wed May 05 10:03:38 UTC 2010:

From a friend who knows about these things: “Cogent is fairly notorious with its peering policy and it will either not peer or rely on links which it will let get congested as both sides have to pay to upgrade.”

I’m not sure what it means, but the word ‘notorious’ and the fact that slicehost relies on them doesn’t fill me with confidence.

Brian commented on Wed May 05 10:09:57 UTC 2010:

Angry? Yes. Quitting SliceHost? Of course!!!

Turning from happy customer to non-customer in 24 hours. Well Done SliceHost…..

Nuno Zimas commented on Wed May 05 10:17:43 UTC 2010:

@Jo Cogent should definitely be renamed to Congest. Although I’m far from being a network guru (very far), my reading of your friend’s comment is fairly straightforward: adding up traffic until it collapses here or there. Way to go.

Ben Smith commented on Wed May 05 10:53:11 UTC 2010:

Shifting the traffic away was an option (you have other lines to the EU)? Why didn’t you do that yesterday?

JR commented on Wed May 05 11:14:44 UTC 2010:

Some of the comments here show a remarkable ignorance of the way the internet works. For those who are shaking their fists in red-faced anger at Slicehost, consider this: would you toss your computer in the garbage and buy a new one if your home internet connection was on the fritz?

Slicehost is far from perfect, but I’ve dealt with quite a few hosting providers in my time, and they are by far the most transparent. Place the blame where it belongs. Cogent is the bad guy here.

Nuno Zimas commented on Wed May 05 11:55:11 UTC 2010:

@JR While i fully agree with you, I am not Cogent’s customer and my customers are not Slicehost’s. Guess you get the idea.

Ivan commented on Wed May 05 12:17:06 UTC 2010:

@JR fully agree

Albert commented on Wed May 05 12:20:02 UTC 2010:

Half of Spain couldn’t play World of Warcraft for a good week or two a couple of years ago because of 10+ second ping times, thanks to Cogent. And Blizzard is fucking big. Good luck with that!

Patrick commented on Wed May 05 13:07:07 UTC 2010:

@JR I don’t think anyone here, save maybe for a select few, was directly targeting and blaming Slicehost as much as they were airing frustrations (that I guess Slicehost themselves have been feeling) and wondering why Slicehost weren’t being more communicative about what steps they were actually taking to solve the problem, rather than seemingly just shrugging their shoulders and saying, “It’s not us!”. With that said, I think you are a little fast to excuse Slicehost completely. I love them just as much as the next guy, but while they, and their monolith hosting parent Rackspace, might not have had direct control over the problem itself, one would think they could have done the fail-over to another backbone much, much faster.

Personally, I was more let down by the lack of status updates than the service problem itself. I understand shit can hit the fan, but you can still be transparent about it (look at Amazon’s incident reports for a prime example of this). To my delight, transparency is what I got when I contacted the Slicehost support directly; a calm, reassuring and professional response to a legitimate concern, and a brief explanation about what was being done to rectify the problem.

So, well, to whoever’s reading this, I don’t know about you, but things seem to be clearing up on my end. It was a bummer, but Slicehost have acted pretty professionally, and it’s the first outage I’ve experienced in my two years of being a customer in STL-B. Not too shabby for one of the most affordable VPS providers around.

Cheers.

Luke Pearce commented on Wed May 05 13:40:37 UTC 2010:

A big thank you for the regular updates, much appreciated.

I’m sure you’ve been working on this a lot in the background but there is nothing more frustrating than not knowing if anything is happening or if it’s even being worked on.

Keep up the good work – hope this is fully resolved soon.

Cheers Luke

Gaimhreadhan commented on Wed May 05 14:34:35 UTC 2010:

We operate a small B&B in New Zealand and we have relied on Beds24.com these past years to operate a terrific service for us with our on-line bookings.

Today I have been trying to update our price matrices while travelling in Scotland and found it a frustratingly slow and hit and miss business.

Our guests try and book us from all over the world (with Brits, Yanks, Aussies and Germans predominating).

These network problems are costing us money – our customers do not know the reasons they can not book with us and few bother to e-mail us asking to make a manual booking.

After speaking with some expatriate Germans attempting to book us from Singapore just now I know that the problems are not confined to Western Europe.

(The Germans had booked elsewhere for Christmas by the time I called them – a direct loss of NZ£456 from their party of four…)

Jo commented on Wed May 05 14:52:57 UTC 2010:

“Cogent are the low cost leaders in the IP Transit world, they have rock bottom prices in the hope they get volume, it has worked well for them but their metrics to make that work mean their network is not the most robust. You would not rely on them to be your only provider for anything mission critical but they certainly have their place in the market.” – says my insider

Jeff Vyduna commented on Wed May 05 15:18:18 UTC 2010:

Thanks for the updates, SliceHost. This isn’t your fault, and we appreciate that. Also, you’re handling this outage really well with the frequent updates and communication (much better than some times in the past), so thanks are in order.

Garry Tan commented on Thu May 06 01:14:59 UTC 2010:

As of tonight we continue to see issues with 100% packet loss from Sweden and 70% packet loss from Spain.

http://just-ping.com/index.php?vh=posterous.com&c=&s=ping!

Netherlands and Germany have been resolved, thankfully.

Thanks for the updates. Please keep pressing on them to get connections back to smooth sailing.

Triak commented on Thu May 06 06:07:29 UTC 2010:

Our slice is still intermittently unreachable from Prague (Czech Republic).

Daniel Swan commented on Thu May 06 13:07:15 UTC 2010:

Thanks Slicehost, a couple of your UK customers were puzzling over slow access to the slice after upgrading to Lucid yesterday. We know now it was an OS unrelated issue. Upgraded my slice from a 256 today anyway, was painless and now it flies even faster. Some of us appreciate the pain of diagnosing stuff that isn’t necessarily your fault. Good work.

mjw commented on Fri May 07 03:05:10 UTC 2010:

When traffic was routed through Level3 things were going smoothly, but now that traffic once again is going through the cogentco network the connectivity issues in Europe are back again :{

Calhoun Smith commented on Fri May 07 07:11:59 UTC 2010:

mjw, there was a moment for me yesterday around 4 in the afternoon when everything seemed to be working again. I am in Amersfoort, the Netherlands, by the way. In the evening, however, things seemed to go south again and there is still no improvement with the connection as I write today. When I contacted Slicehost support last night around 10:30pm my local time, I was told I was the only one who still seemed to be having issues. Sounds like perhaps this is not the case. I have to say I appreciate Slicehost’s prompt response to all my support mails.

Iwein commented on Sat May 08 06:18:47 UTC 2010:

My blog (hosted at posterous) is still hardly reachable, so @Calhoun, there are at least two of us from NL that are not seeing any improvement.

joost commented on Tue May 11 07:34:33 UTC 2010:

@Calhoun, @Iwein, I am experiencing problems from Netherlands still as well. This issue is definitely not yet fully resolved.

Mark commented on Wed May 12 09:02:47 UTC 2010:

Hey guys,

If you are still having issues, can you email us at support@slicehost.com with the requested information in the status post? We need the output of a traceroute to your slice, your source IP, and a series of 100 pings.

Chris McCauley commented on Tue Jul 06 10:45:43 UTC 2010:

Problems with Cogent are back. Outage from Ireland on some network routes. Just got this from Slicehost

‘Thanks for letting me know. Our network team have escalated to Cogent, and Cogent is investigating as we speak.’

Which is great but I hope after this Slicehost might take a more active approach to monitoring problems

Jo Potts commented on Mon Oct 11 06:25:14 UTC 2010:

I have a customer in Edinburgh who is unable to access my site (hosted on Slicehost). This is the same customer who had a week’s downtime during the last outage.

Are there currently any reported network problems? A traceroute from me (where it’s working ok) is routing via level3’s network.
