DFW Interruption
November 3rd, 2009 @ 03:20 AM
UPDATE 3:25AM CST: We are still working through issues on a couple of host servers. Everything else should be back up, and we’re monitoring it all closely. The root cause was maintenance under way in DFW. More information can be found at www.rackspace.com/blog. We sincerely apologize for the downtime. You can be sure we’re looking into everything surrounding this incident.
UPDATE 2:18AM CST: Most slices should be restarted. We’re working to get manage.slicehost.com back up. We have a few servers that are giving us problems. We’ll try to notify the affected customers and get this resolved as soon as possible.
UPDATE 2:04AM CST: Working to restart everyone’s slices. Stand by.
UPDATE 1:16AM CST: Power has been restored; however, we’re still checking all our systems to make sure everything comes back up correctly. Slices have not yet been restarted. We’ll try to keep you updated as much as possible.
We are currently experiencing a service interruption in our Dallas data center. Our engineers are working to restore connectivity. We will send an update as soon as information becomes available.
Posted by john

Comments:
Suba commented on Tue Nov 03 00:41:39 UTC 2009:
Hi
Why is it that customers were not notified proactively via email? We had to figure it out ourselves :(
suba
Andrew P commented on Tue Nov 03 00:43:44 UTC 2009:
Thanks for the update. You’ve posted this info pretty quickly. Good luck getting everything operational!
Alex commented on Tue Nov 03 00:44:54 UTC 2009:
I’m sure they’ll resolve it asap. They are focusing on fixing the problem rather than notifying everyone, and that’s ok.
HK commented on Tue Nov 03 00:45:06 UTC 2009:
Do you guys have an ETA (other than ASAP!)? Just want to plan accordingly.
RALi commented on Tue Nov 03 00:45:13 UTC 2009:
Suba … chill. “Service Interruption” means it’s unplanned. How should SH be “proactive” about an outage? Hire a fortune teller?
From where I’m sitting, SH does a good job updating us and I’m happy to say they don’t have a lot of practice talking about outages. That’s a good thing; it means my slices are up!
R
Michael Hill commented on Tue Nov 03 00:46:11 UTC 2009:
I agree w/ Suba. Not a big deal and outages happen, but some notification should have been sent out instead of finding out from my devs who were trying to commit code.
dave commented on Tue Nov 03 00:48:19 UTC 2009:
http://twitter.com/slicehoststatus
???
Yuval commented on Tue Nov 03 00:49:52 UTC 2009:
yeah, that’s a shame. I was going to follow on twitter, but that’s not updated either. So I guess the best thing to do is subscribe to this RSS feed…
Bill Nordwall commented on Tue Nov 03 00:51:06 UTC 2009:
If you want to know when your site is down, use a monitoring solution with an SMS gateway – you’ll know as soon as Slicehost staff does.
Draka commented on Tue Nov 03 00:53:17 UTC 2009:
We are all awake, waiting for the connection to be reestablished so we can upload updates and make a BK. When was the last BK???
greg commented on Tue Nov 03 00:55:39 UTC 2009:
Thanks guys, you rule. :D
Fiona commented on Tue Nov 03 00:59:26 UTC 2009:
http://twitter.com/slicehost is always updated. Follow that one.
Rad commented on Tue Nov 03 01:02:49 UTC 2009:
Incident occurred 30 minutes ago, twitter status was updated 9 minutes ago… 21 minutes is a lifetime when your site is down.
Jarin Udom commented on Tue Nov 03 01:02:56 UTC 2009:
http://pingdom.com works wonders for notifications about this
Vitaly Burshteyn commented on Tue Nov 03 01:03:38 UTC 2009:
Is there any update on what kind of outage this is? Power, Network, act of GOD?
I simply would like to know if the servers are up or not.
Etay commented on Tue Nov 03 01:06:27 UTC 2009:
At least my site (http://www.moodspin.com) is down while http://www.techcrunch.com/ is down…
good company, bad status…
Good luck fixing it!
HK commented on Tue Nov 03 01:08:08 UTC 2009:
@Etay – techcrunch is back up!
zenalex commented on Tue Nov 03 01:08:54 UTC 2009:
Techcrunch down = Rackspace issue. http://search.twitter.com/search?q=rackspace
HK commented on Tue Nov 03 01:15:10 UTC 2009:
slicehoststatus on twitter mentions that it is a power outage.
Miles commented on Tue Nov 03 01:20:16 UTC 2009:
http://status.mosso.com/2009/11/cloud-sitesservers-dfwsat-degraded.html
As of 12:35AM CST Rackspace Cloud engineers are seeing intermittent connectivity to our WC2 cluster in our Dallas – Fort Worth (DFW) and data center. We are working to resolve the issue as quickly as possible and will update the status post accordingly.
If you have any questions or concerns please contact our support via live chat or at 1-877-934-0407 international +1.210.581.040.
UPDATE: As of 1:15am CST, Rackspace Cloud engineers are still working to address the current connectivity issues. We are making significant progress and we will post another update here shortly.
kevin commented on Tue Nov 03 01:30:57 UTC 2009:
I’ve got a complete Nagios setup for all the statuses under the sun, so I knew when it was down… I’d rather they fix it than email.
Kevin commented on Tue Nov 03 01:45:27 UTC 2009:
Do these DoS attacks seem to be getting more frequent? It makes me nervous. Or is this strictly a power issue?
Colin commented on Tue Nov 03 01:51:06 UTC 2009:
As if my clients weren’t paranoid enough about network issues. It’s gonna take a long time for me to recover from this.
If it was a power issue, I would like to think that they have backup power.
Miles commented on Tue Nov 03 01:51:57 UTC 2009:
http://status.mosso.com/2009/11/cloud-sitesservers-dfwsat-degraded.html
“UPDATE: As of 1:30am CST, service has been restored to our WC2 cluster. We are going to continue to monitor the situation closely. Additional updates to follow.”
It’s been almost 30 minutes since this update, but my Slicehost-hosted site is still down, along with my Rackspace-hosted email. Total downtime approx 1.5hrs at this point.
Miles commented on Tue Nov 03 01:55:10 UTC 2009:
posterous has apparently gotten their issues resolved by Rackspace:
http://twitter.com/posterous/status/5386718072
Miles commented on Tue Nov 03 01:58:39 UTC 2009:
Just now back up and running – thank you!
Anna commented on Tue Nov 03 02:04:39 UTC 2009:
Yay back up now. Thanks SliceHost!
androo commented on Tue Nov 03 02:05:15 UTC 2009:
Back up, now.
Selami commented on Tue Nov 03 02:09:33 UTC 2009:
It should be expected that Slicehost would warn/inform us about this. I have a news portal, and this now gives my site a very bad image :((
DL commented on Tue Nov 03 02:10:31 UTC 2009:
My sites are still ALL down. :(
It3ration commented on Tue Nov 03 02:19:56 UTC 2009:
Having an email notification or something would be good.
CrankyMonkey commented on Tue Nov 03 02:24:38 UTC 2009:
Uh… if you guys have such mission critical stuff then you should implement your own monitoring. Stop whining about how Slicehost didn’t notify you, they were busy working on the outage. Shit happens… sites go down… even the majors have this issue.
I generally get notifications 5 minutes before Slicehost sends them from my own monitoring.
Hey! commented on Tue Nov 03 02:32:05 UTC 2009:
Is there a way to get to your web console? Port 22 is closed on our boxes and we cannot restart our services without the web console.
Rowan commented on Tue Nov 03 02:44:06 UTC 2009:
If you have critical stuff, get two slices in different DCs and set up HA (check the articles).
Rob Gough commented on Tue Nov 03 02:57:04 UTC 2009:
I agree with the (aptly named) CrankyMonkey … downtime is always a possibility no matter who you’re with. If your stuff is that critical, you should at least have your own monitoring – and I would say co-lo in two different datacentres (and providers).
Marian André commented on Tue Nov 03 03:23:27 UTC 2009:
Website is still down :( Any info on when it is supposed to be back up? Guess it’s not possible to tell. Paranoid customers calling.
Shawn commented on Tue Nov 03 06:08:59 UTC 2009:
Site down for over 6 hours! This is completely unacceptable as we are losing money and readers. This could not have happened at a worse time for us as we are in the peak time of year for our site.
Customers should have been notified of the maintenance and, b/c of the issues, we should get a status update every 30 minutes.
Obviously no back-out plan was in place.
stretch commented on Tue Nov 03 07:10:55 UTC 2009:
@Shawn: I agree this could have been handled better, but if your site is still down you’re going to want to SSH in and see what’s going on. My VPS was back up after roughly an hour and twenty minutes.
Mark commented on Tue Nov 03 07:13:57 UTC 2009:
Thanks for the updates and perfect uptime so far.