MySQL down again! grrrrrr


#1

Warning: mysql_connect(): Can’t connect to MySQL server on ‘db.new-ipod.com’ (4) in /home/.natasha/archon810/new-ipod.com/header.php on line 5
Unable to select database

This is getting frustrating. This is the 3rd time I’ve caught it, and I don’t know how many times I haven’t. This is not 99.97% uptime or whatever it is; this is more like 97-98%!
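To put those percentages in perspective, here is a rough sketch of the downtime each figure would allow. It assumes a 30-day month, and `downtime_minutes` is a hypothetical helper name; the 99.97% figure is only the poster's recollection, not a confirmed guarantee.

```python
# Assumption: a 30-day month, purely for illustration.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per 30-day month implied by an uptime percentage."""
    return MINUTES_PER_MONTH * (100.0 - uptime_percent) / 100.0


for pct in (99.97, 98.0, 97.0):
    minutes = downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.0f} minutes of downtime per month")
```

Under these assumptions, 99.97% works out to roughly 13 minutes a month, while 97-98% would mean somewhere between about 14 and 22 hours.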


#2

Sure enough, mine is down too. :frowning:

I’m on grizabella.

Edit: Working again after 10 min or so! Phew.

http://www.stoffersphoto.com


#3

Same here, all on Murdock.


#4

We’ve been having a bunch of problems with one or two of our MySQL machines. We’ve been in communication with the MySQL people about it and are trying out some changes to improve stability. We’ve also been isolating heavy resource users and moving databases around to stabilize things. The problems are mostly isolated to a single MySQL server now, though that’s not much solace for those of you with databases on that machine.

  • Dallas
  • DreamHost Head Honcho/Founder

#5

Does this sort of thing happen often? Given that MySQL usage is monitored, it surprises me that one user can cause so many problems with a database server. Is it a question of heavy use, or is something else causing these problems?

Incidentally, I am just curious about it - I have no problems with the databases on my own sites.


Simon Jessey
Keystone Websites | si-blog


#6

MySQL usage is monitored, but the monitoring happens offline and only periodically, so it’s still possible for a particularly busy database to slip through the cracks between analysis runs. Also, the monitoring is not perfect and sometimes does not show us all of the data we need. Our main MySQL guy has been working on some much better tools to help in this regard. Unfortunately, they’re not quite ready for use yet.
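As a rough illustration of how a busy spell can fall between periodic checks, here is a minimal sketch. The check interval and burst times are made-up numbers, and both function names are hypothetical; none of this reflects the actual monitoring tools.

```python
import math


def first_check_at_or_after(t: int, interval: int) -> int:
    """Time of the first periodic check at or after t (checks run at 0, interval, 2*interval, ...)."""
    return math.ceil(t / interval) * interval


def burst_is_seen(start: int, end: int, interval: int) -> bool:
    """True if at least one periodic check falls inside the burst [start, end]."""
    return first_check_at_or_after(start, interval) <= end


# Assumption: checks every 60 minutes (hypothetical value).
print(burst_is_seen(start=5, end=50, interval=60))   # burst sits entirely between two checks
print(burst_is_seen(start=50, end=70, interval=60))  # burst overlaps the check at minute 60
```

A burst that starts and ends between two checks is never observed, no matter how heavy it is, which is why shortening the interval or monitoring continuously narrows the gap.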

I’m not completely up on all the specifics of what’s going on. We do move heavy-use databases off shared MySQL servers when we notice them, but sometimes they still manage to spin the whole server out of control before we notice. That’s clearly not ideal, and we are taking steps to reduce the impact one user can have on the whole server, as well as improving our response time to problems.

Also, some of the problems we’ve had have been problems with MySQL itself and we have been working with MySQL support and changing configurations to improve things.

  • Dallas
  • DreamHost Head Honcho/Founder

#7

Thank you for the response. I’m glad to hear that your chaps are working on resolving the problems. Cool beans.


Simon Jessey
Keystone Websites | si-blog


#8

Note that this morning we had two more MySQL servers crash and we now believe we have isolated the problem down to a bug in the particular kernel we’re using on those two servers. The kernel is going to be swapped out today with one that does not have the bug. It seems likely this will solve the stability problems.

  • Dallas
  • DreamHost Head Honcho/Founder

#9

I’m not a perfectionist, but I don’t think it’s acceptable. You don’t think it’s acceptable either, so what now?

I wish I could help and thanks for sorting it out.

For the future, can you not have an extra server that you use in such cases? I’d be happy to pay a bit extra if that would help. Or would a dedicated server be more reliable?


#10

[quote]For the future, can you not have an extra server that you use in such cases? I’d be happy to pay a bit extra if that would help.[/quote]

Well, the problem isn’t really so much an issue of not having enough servers. We have servers like you wouldn’t believe.

A lot of the time, issues can simply be difficult to track down, and if you switch people to a different server (which in itself can result in some downtime), the problem may very well follow them. For example, these kinds of problems can often be tracked down to a specific user whose usage is “problematic” (e.g., regularly and repeatedly running a SELECT on a table with a few hundred thousand rows but no indexes).
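To show why a SELECT on an unindexed table is so costly, here is a small sketch that compares the query plan before and after adding an index. It uses Python's built-in `sqlite3` module as a stand-in rather than MySQL, and the `orders` table, its columns, and the index name are all hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, i * 0.5) for i in range(200_000)),
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's human-readable plan for a query (last column of each plan row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index, every row must be examined to find the matches.
plan_without_index = query_plan(conn, QUERY, (42,))

# With an index on the filtered column, the engine can jump to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn, QUERY, (42,))

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

The first plan reports a full scan of the table and the second reports a search using the index. In MySQL, the analogous check would be running `EXPLAIN` on the query.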

Rest assured, whenever there is a problem like this, we’re usually looking into it. The problem is that after a server crashes, it may take time and/or actually watching it crash (which is what did the trick this time around, if I recall) before the root cause can be found.

[quote]Or would a dedicated server be more reliable?[/quote]

Generally, dedicated servers are more reliable for one reason: Your site/database’s availability is not impacted by other people. If you’re the only person using the database, nobody else’s poorly coded web application will be accessing it and causing trouble. Obviously this isn’t going to help if you’re the guy with the poorly coded web application, but in that case it’s still much easier for us to diagnose and fix as there aren’t nearly as many variables to consider.

  • Jeff @ DreamHost
  • DH Discussion Forum Admin

#11

The felix server still seems to be down. :frowning:

snowball & stimpy are ok.


#12

Felix has been up for 123 days now. Are you sure the server itself was down? Are you still having problems now? You should contact support if you are because I don’t see any problems on it.

  • Dallas
  • DreamHost Head Honcho/Founder

#13

My site doesn’t load again. Am I so unfortunate that, after relaunching my website on this new host 2 weeks ago, I have had at least 3 MySQL server malfunctions?? Even Brinkster, which I left for DreamHost, had only 1-2 downtimes a YEAR! Thank god I’m not earning my bread with this site. :frowning:


#14

Grizabella had a multi-drive failure! :frowning:

http://www.stoffersphoto.com


#15

Yeah, it’s cursed or something. Oh, and I’m using the technical meaning of ‘cursed’ there.

We’re going to move everything off of grizabella once we get it back up, even if this RAID drive rebuild is successful. Something is weird with it.

  • Dallas
  • DreamHost Head Honcho/Founder

#16

arg!! down again :o(


#17

3 domains, all MySQL-driven, all down

:o(


#18

I hope the rebuild will be successful. Thanks for moving the stuff from the cursed one. :slight_smile:


#19

… you guys have backups… right? I hope the RAID saved the databases, or that you ran backup jobs.


#20

perhaps the better question is…

WHEN did you do the last backup?