Fail-over strategies on DreamHost Web PS


#1

I’m not sure if this is the right forum to discuss this, but my VPS recently suffered a 24-hour loss of service when a RAID controller failed, and my website was down as a result.

Leaving aside how long it took DreamHost to deal with the problem, and their lack of communication about how long the recovery would take, I am wondering what can be done, by both customers and DreamHost, to mitigate this kind of failure if it happens again.

My site runs on two servers: a web PS and a MySQL PS. I spend approximately $150 a month on both. The MySQL PS is locked down, and I assume DreamHost are operating some kind of SQL cluster that isn’t visible to us customers.

However, when a web PS goes down it causes huge issues because there is no failover, and because we aren’t told how long a recovery might take, we can’t make informed decisions about what to do. If I had been told within an hour of the outage that it would take 24 hours to fix, I might have arranged to change DNS entries to point somewhere else, etc. I didn’t do that because I thought the server could be back up at any moment, and DNS propagation delays would have made the switch not worthwhile.

During the outage, no 503 status code was served, so Google and the other search engines are now treating my site very badly and traffic has fallen to 20% of what it was before the outage. It’ll now take weeks for users to trust the website again and for the site to recover its position in the search engines, not to mention the lost revenue from the drop in traffic.

The web PS that I am on doesn’t just host Apache, I’ve also got a memcached daemon, and a Sphinx search daemon. To have a hot stand-by, it would have to be at least as good a spec as my current PS, and I’m not sure I could afford this as I’m already paying a lot for the PS.

I don’t know if a fail-over service is something that DreamHost currently offer (but it has been suggested on multiple occasions), but I really think a sensible thing would be for a PS to fail-over to a splash page which could serve a 503 error.

I’m paying a lot of money for PS, but outage handling needs to improve a lot before I’ll be happy to keep my site hosted here. If no contingency plans are made for these kinds of incidents (and it’s bound to happen again sometime), I will have to take my money elsewhere.

Jon


#2

Maybe I am misunderstanding the VPS setup here, but isn’t one of the points of a virtual private server that it can be quickly moved to other hardware?


#3

Apparently not, have you seen all the v-server status posts that pop up on http://www.dreamhoststatus.com/ and stay there for 24 to 72 hours?


#4

It’s ridiculous: my site was down for 24 hours and I’m paying over $150 a month for VPS. I expect better service when something goes wrong. I expect proper communication right from the start about what is happening and how long it will take to fix (and accidentally stumbling on the blog after something has gone wrong doesn’t count, nor does a status update every three hours, when every minute counts).

I don’t want this situation to happen again. However, without any change in how DreamHost approaches things, that risk remains. If DreamHost were to read the comments on the DreamHostStatus blog, they’d discover some great ideas on how to improve their handling of these situations, and they’d realise how seriously customers take any downtime (let alone a 24-hour loss of service). I’m still seething three days later.

Unfortunately, DreamHost don’t seem to want to engage with their customers and learn from past mistakes about how to handle these situations. If they did, something better would already be implemented. I just feel on the verge of having to move my site elsewhere if nothing is put in place very soon to improve this. I’m not looking forward to doing that because it is a big site and it will mean even more downtime, but it feels like the lesser of two evils. It’s just so frustrating…


#5

$150+ for VPS? Mate, I’d be outta here yesterday.


#6

$150 per month = 2 GB of RAM on web VPS and 800 MB on MySQL PS.

Yes, it’s expensive, and a dedicated server would be cheaper.

There is an old adage that you get what you pay for, but it doesn’t ring true in this case.


#7

I took a squizz at their prices for VPS & Dedi and can’t say that I’m seeing much value there at all. I’m totally unaware of how many VPS & Dedi customers they have (could very well number in the multiple thousands making the stats on status listing a very small percentage of problematic servers of course) but for the kinda money you’re spending I’d be expecting extremely prompt service with respect to any type of downtime. I’d likely be pounding the keyboard with the CAPS-LOCK SET FIRMLY IN THE ON POSITION after any extended period of downtime.

Your suggestion of serving a 503 for known down servers (as a minimum) is a really good idea. They should also email and SMS VPS+ clients (I was under the impression they did this) with at least a rough, no-guarantee guesstimate as soon as they become aware of a down server, so that clients can decide whether to wait it out or switch to any failover they may have at their disposal.


#8

Yes, well, I figure I could kick and scream really loudly, or try to get a constructive discussion going about how things could be improved.

The same sort of thing has happened to me in the past. Back then, customers posted on the DreamHostStatus blog asking when the server would be back up, others came up with great ideas about how communication could be improved and how to counteract downtime issues with the search engines, and DreamHost did nothing about it.

… and then the same thing happens to me all over again, the same customers suggest the same things all over again, and these issues must have happened to other people multiple times in the interim.

No host is ever going to have 100% uptime, so given that downtime will happen from time to time, there needs to be a better process for handling it. The trouble is, DreamHost don’t seem to want to engage with their customers about this, so it just feels futile to try to change anything.


#9

I agree with everything said in this entire thread, but I have to comment that I particularly agree with this. DreamHost seems to have turned off its ears. I think they do some good stuff, but there are gaping holes as well. When someone makes a well-formed suggestion and they refuse it, they are just helping their competitors grow. Sometimes it seems like they only react to problems rather than embrace growth and change. Sometimes they can plug the holes and move on, but then the dam bursts and it seems like they don’t address anything very well.

I’ve wondered about the stability of the private server product. Private servers never appear on DreamHostStatus, but it hardly seems necessary to post a failure status for one customer, so the notification system is probably different. I also wonder whether dedicated customers get a different level of service, or the same level that VPS customers appear to get.

With all the hardware failures affecting VPSes lately, I think I would have a very hard time deciding to buy that product, especially since DreamHost doesn’t seem to have a ready failover process. It seems like there should be spare server capacity, so downtime should amount to nothing as the VPS “guests” are moved and brought up on another machine.

I was working with a group at the end of last year that ended up with colo space at a much greater cost. I presented Dreamhost private servers as one option, but it was passed over due to the fact we were unable to locate any type of SLA, only the statement that annual subscriptions were non-refundable.


#10

Here’s my proposal as to what DreamHost could be doing to improve things. At the outset, there should be a ‘push’ communication to say that your server is down, and it will probably be back up again within one of the following time windows (this gives a get-out clause for DreamHost as I appreciate it is not an exact science):

0-1 hours
1-3 hours
3-6 hours
6-12 hours
12-24 hours
24+ hours

At the moment, a customer’s only option when a server goes down seems to be changing their DNS to point somewhere else, but we can’t do that unless we know we are in for serious downtime (because of propagation delays etc.), so it’s important for customers to know straight away what kind of delay to expect (we don’t need to know to the minute).
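A rough sketch of how the proposed windows could drive that decision on both sides. The function names, the half-open reading of the windows (so exactly 1 hour falls in “1-3 hours”), and the rule of switching DNS only when the window’s lower bound exceeds your DNS TTL are all assumptions for illustration, not anything DreamHost offers:

```python
# Hypothetical sketch: the proposed windows as half-open intervals
# [lower, upper) in hours; None means open-ended ("24+ hours").
WINDOWS = [(0, 1), (1, 3), (3, 6), (6, 12), (12, 24), (24, None)]


def classify(estimated_hours):
    """Host side: map a rough repair estimate onto one announced window."""
    if estimated_hours < 0:
        raise ValueError("estimate must be non-negative")
    for lower, upper in WINDOWS:
        if upper is None or estimated_hours < upper:
            return (lower, upper)


def format_window(window):
    """Render a window the way it might appear in the push notification."""
    lower, upper = window
    return f"{lower}+ hours" if upper is None else f"{lower}-{upper} hours"


def should_switch_dns(window, dns_ttl_hours):
    """Customer side: switch only if even the best case outlasts the DNS TTL.

    Assumption: resolvers pick up a changed record within roughly one TTL.
    Some resolvers ignore TTLs, so this rule is on the optimistic side.
    """
    lower, _upper = window
    return lower > dns_ttl_hours
```

For example, a 30-hour estimate is announced as “24+ hours”, and with a 4-hour TTL `should_switch_dns(classify(30), 4)` says switch, while a “0-1 hours” notice says wait it out. Using the lower bound is deliberately conservative: you only move when the host’s own best case is already worse than waiting for DNS.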

In the meantime, any request coming into DreamHost for a known ‘bad’ server could be redirected (with some network trickery) to a splash page containing a ‘sorry’ message (for the end user) and a 503 error (for the search engines).
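The splash page itself is the easy part. A minimal sketch as a Python WSGI app, assuming traffic for the down server can somehow be routed to it; the Retry-After value and page text are placeholders. HTTP defines Retry-After on a 503 as a hint about how long the service is expected to be unavailable, which is the signal search engines need:

```python
from wsgiref.simple_server import make_server

# Placeholder values: pick a Retry-After that matches the announced window.
RETRY_AFTER_SECONDS = 3600
SPLASH_HTML = (
    b"<!doctype html>\n"
    b"<html><head><title>Temporarily unavailable</title></head>\n"
    b"<body><h1>Sorry, we will be back soon</h1>\n"
    b"<p>This site is temporarily unavailable. Please try again later.</p>\n"
    b"</body></html>\n"
)


def splash_app(environ, start_response):
    """Answer every request with 503 so crawlers treat the outage as temporary."""
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(SPLASH_HTML))),
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
        # Keep CDNs and browsers from caching the error page as the real site.
        ("Cache-Control", "no-store"),
    ])
    return [SPLASH_HTML]


def serve(port=8080):
    """Run the splash page on the stdlib reference server (for testing only)."""
    with make_server("", port, splash_app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` answers every URL with the sorry page and a 503; in production a front-end web server or load balancer would normally do the same job rather than wsgiref.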

This is an interesting read about how webmasters should deal with planned downtime (from a Google point of view). http://googlewebmastercentral.blogspot.com/2011/01/how-to-deal-with-planned-site-downtime.html.

To quote from that post: “Outages that are not clearly marked as such can negatively affect a site’s reputation.” I know this is true, because it has happened to me.


#11

Not that anybody here gives a monkey’s, but I’ve just signed up with another hosting company. I’ve had enough here now. :frowning:


#12

Sorry to see you go. I, for one, will be interested in hearing about how things go at your next host.

Personally I only have a shared account and have only used Dreamhost, but I’m curious about how other hosts perform and what they offer. If I ever do need to expand beyond shared hosting though, I think I would look for a host which specialises in that. Dreamhost started as a shared server provider and I guess that provides the bulk of their income and probably also affects their mindset. It certainly affects the hardware - VPS is just a guaranteed slice of the shared infrastructure, so in reality, it’s just a glorified shared server. I’ve heard many good things about Linode, Rackspace, and even AWS. I think they probably have many satisfied customers because they don’t offer shared hosting (unless things have changed). They focus on VPS/dedicated servers so the support and such is more in line with what you need.


#13

Thanks, I won’t inflame anyone here by writing the name of my new host, but they are a renowned VPS specialist and I’ll be paying about the same as I am now for a better VPS specification (DreamHost is not cheap for VPS if you use 2 GB of memory). After I’ve gone, I’ll keep a small account on DreamHost to play around with, but it certainly won’t be worth anywhere near the $$$ that I’m currently paying.

TBH, I was quite happy here up until a month ago and since then I’ve suffered a 24 hour outage (which was bad enough) followed by weeks of unresolved performance issues (and right now my website is unusable). And there’s simply no end in sight.

The main issue that I have is that MySQL VPS is simply not flexible enough. It’s nonsensical that we can barely touch the configuration to suit our needs, but like you said, MySQL VPS (in particular) is just like glorified shared hosting, except in price, so we aren’t allowed to change anything and DreamHost gets our money anyway.

On the positive side, it’s been interesting here, and my shared hosting experience was very positive (before I moved to VPS). I think you’re right that maybe DreamHost should stick to providing shared hosting, as they do that very well.

Sure, I’ll report back once my site has settled on the new host and provide a comparison of my VPS experience.

Jon


#14

For the money you’re spending you could have got a fully dedicated server mate.


#15

Yes, but my site is spread over two servers and I want to keep the database and web server separate. I can’t get two dedicated servers for that money (not of a decent spec anyway). I’m happy to stick on VPS until the site outgrows it.


#16

Ahh! That makes sense to me now :smiley:


#17

I have had the same experience with my $80 VPS; down 4 times in 30 days, with no notification, and a 4 day response lag to my tickets. I suspect they have tons of clients, tons of hardware, and 2 IT guys.

I’ve had a personal shared account here for years, and to my knowledge it has never gone down. The personal plan offers “unlimited bandwidth”, so I don’t know why I bothered with a VPS. (My application is special; speed of delivery is not an issue.)

I signed up with WebHostingHub and have mirrored my sites there. Their interface is much more primitive, and they also don’t allow SSH access(!), so any changes, like adding an SSL certificate, take forever. They’re also more expensive than DH. However, they have an excellent uptime rating, so I’ll see what happens. Yesterday I set up a cron job there to check my DH site every 5 minutes; if DH fails before WHH does, I’m cancelling my DH VPS.
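A check like that can be very small. A minimal sketch using only the Python standard library; the script name, log path and schedule in the crontab line after the code are placeholders, and counting any HTTP error status (including 503) as “down” is a choice you may want to adjust:

```python
import datetime
import sys
import urllib.error
import urllib.request


def check_site(url, timeout=10.0):
    """Fetch url once and return (is_up, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; count them as down.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, etc.
        return False, f"unreachable: {err}"


def log_line(url, timeout=10.0):
    """One tab-separated, UTC-timestamped line suitable for appending to a log."""
    is_up, detail = check_site(url, timeout)
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    status = "UP" if is_up else "DOWN"
    return f"{stamp}\t{status}\t{url}\t{detail}"


if __name__ == "__main__":
    for site in sys.argv[1:]:
        print(log_line(site))
```

A crontab entry such as `*/5 * * * * python3 /path/to/check_site.py https://example.com >> /path/to/uptime.log 2>&1` runs it every 5 minutes; running one copy on each host against the other makes it easy to see which one failed first.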

I don’t think DH prices are high, given what they offer. I did a lot of comparison shopping. And downtime happens everywhere, including Amazon, but everyone else seems to handle their customers better.

—Mike


#18

As much as it pains me to have to say so, I’ve become utterly disillusioned by the recent performance and support from DreamHost.

I’m a VPS customer, previously a shared hosting customer, and I’ve enjoyed telling my friends and clients about how nice DreamHost makes my life. Except that recently, that’s not been the case. At all. Normally, I would write this post in my blog, except that for the reasons you’ll read about below, I can’t actually access my blog right now.

Among those many threads on the Status blog is one regarding the transfer of ‘homie-vserver173’, on which my VPS resides, to new hardware. Now I’m not an expert in server administration by any stretch of the imagination, but I like to think that if I were running a business where my primary responsibility was to keep websites online, I would want to ensure that I was able to, you know, keep websites online, regardless of what maintenance or emergency replacement tasks I had to perform. I know it’s difficult, complicated stuff, but can it really be possible that no one at DreamHost considered the possibility of hard drive failures on their hardware? That there’s no backup plan or redundancy of any kind in place that would allow this hardware change to remain invisible to the users (as it should be, as far as I’m concerned)? That, if not invisible, the switch would at least occur within a 24-hour window?

I’d like to draw your attention to Exhibit A, the blog post. It gently explains that due to drive issues, this procedure had to occur and was unplanned, hence the lack of warning. Fair enough. But note the date stamp: February 25th. As of this writing, that means that this restoration process has been going on for 18 days. Besides being utterly appalling, it makes one wonder just what kind of hardware they’re running over there that needs more than two weeks to restore. We’re talking about one server. Restoring for two weeks.

While that percolates, why not consider also that the status update messages are — at the very least — unhelpful: “The restore process continues and all VPS guests are reporting online at this time.” Really? Are they? For what definition of “online”? Because my sites have been either offline entirely or so painfully slow to load as to make no difference (>30 seconds for a basic WordPress homepage). Contact support is a great suggestion, except that DreamHost support seems to have vanished. What used to be a blisteringly quick support response from helpful engineers has turned into a hopeless waiting game where days roll by like tumbleweeds while those of us stuck with a handful of dead sites fling hopeful coins into the well and wait for some placating nod from the powers that be to grant our wishes.

But I’m a patient person, so rather than get too worked up over the first few days, I simply sent in a dutiful heads-up to support informing them that, in fact, my sites were all very much down rather than up, and that it would be terrific if the situation could be addressed. Besides, CloudFlare saved my butt for the first few days by serving a cached copy of the site. I have the luxury of being able to do this, because I’m not running a huge e-commerce website or web-based operation where the site is integral to daily operations. I do run a web business, but my clients have been patient and understanding. I am very fortunate in this regard. Maybe I should have been more dramatic. Maybe then I wouldn’t have had to wait half a week for a response to my first ticket.

But what are my fellow server buddies doing, I wonder? Are their sites, in fact, running just fine now? Have they discovered a means of appealing to the gods of uptime to bless their sites with life amid this troubling hardware migration? I’m very curious.

In between all this fun, the web panel itself has been down at least twice in the same period of time. The sum total of these experiences has crucially compromised my ability to trust DreamHost with my web presence, and I say that with the utmost sadness as a loyal customer. I thought we were friends, D.

Unfortunately, brand loyalty only gets you so far when there’s no follow-through to the promises though, and so I have made arrangements with another host and am slowly (and I mean VERY slowly — FTP access is a deathly crawl) trying to get my sites migrated somewhere safer where I can trust that my data and uptime are secure, and that my time and energy will be better respected as a loyal customer.

What a pity.


#19

[quote=“Mathazzar, post:18, topic:57090”]
Normally, I would write this post in my blog, except that for the reasons you’ll read about below, I can’t actually access my blog right now. [/quote]

Very sardonic observation.

I think you are being too patient. According to the TOS, downtime credit is capped at the equivalent of one month, at one day of credit per hour of downtime. So once a single or cumulative outage passes 30 hours, you are screwed. I would be making a very loud ruckus if I were you. When you open a ticket with Support, do you mark the severity as “people are dying”, or whatever the top level is called?
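Taking that reading of the TOS at face value (one day of credit per hour of downtime, capped at one month, and assuming a 30-day month), the arithmetic looks like this. The function names are illustrative; the $150 fee is Jon’s figure and the 18-day restore is from post #18:

```python
def credit_days(outage_hours, cap_days=30):
    """Days of credit under the rule as described: 1 day per hour, capped."""
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    return min(outage_hours, cap_days)


def credit_value(monthly_fee, outage_hours, days_per_month=30):
    """Dollar value of that credit, pro-rated on an assumed 30-day month."""
    return monthly_fee * credit_days(outage_hours, days_per_month) / days_per_month


# A 24-hour outage and an 18-day (432-hour) restore on a $150/month plan.
for hours in (24, 432):
    print(hours, credit_days(hours), credit_value(150.0, hours))
```

The 24-hour outage comes to 24 days ($120), while the 432-hour restore hits the cap at 30 days ($150): every hour past the 30th earns nothing extra.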


#20

If you’re spending anything more than $10/month then you should expect to be prioritised.

I <3 U Dreamhost - but you’re shafting these guys.