MySQL redundancy

I love Dreamhost. Usually they are fantastic; I recommend them to everyone. However, today I am plagued by MySQL outages that have crippled a client’s site. When a site runs on PHP/MySQL and MySQL is not working, you get pages of errors and ugliness and things that just don’t work. All the e-commerce ground to a halt. I’ve got duplicate transactions and refunds to sift through. A big mess.

So, I’m wondering, is it really difficult to set up some kind of MySQL redundancy? I mean, could Dreamhost easily just move the databases off a troubled machine and plunk them onto a working one while they fix the broken one? I’m assuming they would do something like that if a shared webserver flaked out, so why not on a shared MySQL server? Honestly, it would be better for me if Apache and MySQL went down together. Having a half-functioning site just confuses people.

There are a lot of technical reasons why this is harder to do with a database server than with a webserver. In general, fixing the problem or restarting the MySQL instance is going to be quicker and easier than syncing the data to a new machine. The web machines are a different situation: there are a ton of them, so a failure on some single machine is more likely, and all user data is mounted over NFS, so quickly failing over to a new machine is possible if some sort of problem affects the whole machine.

MySQL does have some support for replication; we’ve used it a bit internally, but so far the results haven’t been that promising: lots of problems. We have also looked into some of the commercial MySQL replication systems, but it’s unclear whether we could offer something like that to shared hosting customers at a reasonable cost.
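
For readers wondering what MySQL's built-in replication involves, here is a rough sketch of a classic one-primary, one-replica setup. It is only an illustration, not a description of how Dreamhost runs its servers: the host names, the `repl` account, the password, and the binary log file name and position are all made-up placeholders, and the replica is assumed to start from a copy of the primary's data (the slow "syncing" step mentioned above).

```sql
-- Assumed my.cnf settings (under [mysqld]); the values are placeholders.
--   primary:  server-id = 1
--             log-bin   = mysql-bin
--   replica:  server-id = 2

-- On the primary: create an account for the replica to connect as.
-- 'repl', the host names, and the password are hypothetical.
CREATE USER 'repl'@'replica.example.com' IDENTIFIED BY 'change-me';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica.example.com';

-- On the primary: note the current binary log file and position.
SHOW MASTER STATUS;

-- On the replica: point it at the primary, using the file and
-- position reported by SHOW MASTER STATUS (placeholders here).
CHANGE MASTER TO
  MASTER_HOST     = 'primary.example.com',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'change-me',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;
START SLAVE;

-- On the replica: check that both replication threads are running.
SHOW SLAVE STATUS;
```

This kind of replication is asynchronous, so the replica can lag behind the primary, and nothing here fails over automatically: the site would still have to be pointed at the replica by hand. Recent MySQL releases rename these statements (for example `CHANGE REPLICATION SOURCE TO` and `START REPLICA`), so check the manual for the version in use.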