I’m not sure whether this is the right forum for this, but my VPS recently suffered a 24-hour loss of service when a RAID controller failed, which took my website offline.
Leaving aside the time it took DreamHost to deal with the problem, and their lack of communication about how long the recovery would take, I am wondering what actions both customers and DreamHost can take to mitigate the impact if something like this happens again.
My site runs on two servers: a web PS and a MySQL PS, which together cost approximately $150 a month. The MySQL PS is locked down, and I assume DreamHost are running some kind of SQL cluster behind it that is not visible to customers.
However, when a web PS goes down it causes huge problems, because there is no failover, and because we are not told how long recovery might take, we can’t make informed decisions about what to do. If I had been told within an hour of the outage that the fix would take 24 hours, I might have arranged to change my DNS entries to point somewhere else. I didn’t, because I thought the server could come back at any moment, and DNS propagation delays would have made a switch not worthwhile.
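The DNS trade-off above comes down to a simple comparison. This is a minimal sketch, assuming resolvers honour the record's TTL (some cache for longer): once the record is changed, clients may keep using the old address for up to one TTL, so switching only helps if the remaining outage is longer than the time needed to set up the alternative host plus the TTL. The function name and the figures in the example are hypothetical.

```python
from datetime import timedelta


def dns_switch_worthwhile(remaining_outage: timedelta,
                          record_ttl: timedelta,
                          change_overhead: timedelta) -> bool:
    """Return True if repointing DNS should restore service sooner than waiting.

    Assumes resolvers honour the TTL, so the worst-case delay before every
    client sees the new record is roughly change_overhead + record_ttl.
    """
    worst_case_switch_delay = change_overhead + record_ttl
    return remaining_outage > worst_case_switch_delay


if __name__ == "__main__":
    # Hypothetical: told 1 hour in that 23 more hours are needed,
    # with a 4-hour TTL and 1 hour to set up the alternative host.
    print(dns_switch_worthwhile(timedelta(hours=23),
                                timedelta(hours=4),
                                timedelta(hours=1)))  # True
    # Hypothetical: recovery expected within 2 hours, same TTL and overhead.
    print(dns_switch_worthwhile(timedelta(hours=2),
                                timedelta(hours=4),
                                timedelta(hours=1)))  # False
```

The point is that the decision depends entirely on having a recovery estimate; without one, the remaining-outage input is a guess.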
During the outage, no 503 status code was served, so Google and the other search engines are now treating my site poorly, and traffic has fallen to 20% of its pre-outage level. It will take weeks for users to regain trust in the website and for the site to recover its search engine rankings, not to mention the revenue lost because of the drop in traffic.
The web PS I am on doesn’t just host Apache; it also runs a memcached daemon and a Sphinx search daemon. A hot standby would need at least the same spec as my current PS, and I’m not sure I could afford that when I’m already paying a lot for this one.
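Even without a full hot standby, it helps to know quickly which of the three services has stopped. This is a minimal sketch, assuming the services listen on their usual default TCP ports (80 for Apache, 11211 for memcached, 9312 for Sphinx's searchd); the port map and helper names are hypothetical. A successful TCP connection only shows that something is listening, not that the service is answering correctly.

```python
import socket

# Assumed default ports; a real server may be configured differently.
SERVICES = {
    "apache": 80,
    "memcached": 11211,
    "sphinx searchd": 9312,
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_services(host: str, services: dict[str, int]) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}
```

For example, `check_services("standby.example.com", SERVICES)` (the host name is hypothetical) returns a dict mapping each service name to `True` or `False`, which a cron job could use to raise an alert.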
I don’t know whether DreamHost currently offer a failover service (it has been suggested on multiple occasions), but I really think a sensible approach would be for a PS to fail over to a splash page that serves a 503 status code.
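As a rough illustration of the splash-page idea, here is a minimal sketch using only Python's standard library: a stand-in server that answers every request with 503 Service Unavailable and a Retry-After header, which is the standard HTTP way to signal that an outage is temporary. This is not something DreamHost provides; the page text, the one-hour Retry-After value, and port 8080 are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: how long clients and crawlers are asked to wait,
# and the page shown to human visitors.
RETRY_AFTER_SECONDS = 3600
MAINTENANCE_PAGE = (
    b"<!DOCTYPE html><html><head><title>Temporarily unavailable</title>"
    b"</head><body><h1>We'll be back soon</h1>"
    b"<p>The site is down for unplanned maintenance.</p></body></html>"
)


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every request with 503 Service Unavailable and a Retry-After hint."""

    def _send_503(self, include_body: bool) -> None:
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(MAINTENANCE_PAGE)))
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.end_headers()
        if include_body:
            self.wfile.write(MAINTENANCE_PAGE)

    def do_GET(self) -> None:
        self._send_503(include_body=True)

    def do_HEAD(self) -> None:
        # HEAD responses carry the headers but no body.
        self._send_503(include_body=False)

    def do_POST(self) -> None:
        self._send_503(include_body=True)


def run(port: int = 8080) -> None:
    """Serve the maintenance page on all interfaces until interrupted."""
    # Port 8080 is an assumption; binding port 80 normally needs root.
    HTTPServer(("", port), MaintenanceHandler).serve_forever()
```

Calling `run()` on the stand-in machine serves the page until the process is stopped. Because the standby only has to return one static response, it needs a fraction of the resources of the real web PS, which addresses the cost concern of mirroring Apache, memcached and Sphinx.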
I’m paying a lot of money for my PS, but outage handling needs to improve substantially for me to be happy to keep my site hosted here. If no contingency plans are made for these events (and it is bound to happen again at some point), I will have to take my money elsewhere.