My website is SLOW


#1

http://phpdiplomacy.net/

This site sometimes takes less than a second, but sometimes takes 2 minutes to load a page. Often I’ll get Internal Server Errors because of it.

  • It’s not client side, because it gives Internal Server Error, the script counts time on the server side, and other users have complained.
  • It’s not bandwidth related, because I can download from the site with static pages at full speed.
  • It’s not a CPU problem, because the CPU time for an average script is a fraction of a second.
  • It’s not a problem with my script, because the speed varies from time to time and doesn’t depend on the number of users or the time of the day.

I’ve brought it up with support several times now.

The first time I found a spam bot, which was shut down (but then I became unable to see the processes of other users, so I don’t know if this is still happening).

The second time they told me there was an error with the monitoring software that was causing the connection to the MySQL server to be slow.

The third time they said the load was too high, but that they were working on it.

The fourth time they offered to move me to a new MySQL server.

The fifth time (they still hadn’t moved me by then) they got around to moving me.

The sixth time they said the new server they added hadn’t been enough to reduce the load, but that new servers were being installed.

The seventh time they looked at my site once, presumably when it was going through a period of not being slow, and told me there was no problem. (Guess I was just imagining it.)

The eighth time they helpfully suggested that the internal server errors might be a client-side problem. Eventually I managed to convince him that there really was a problem and I wasn’t just dreaming, and he took the time to check the MySQL server load and found it was too high.
So now it has been 3 days and I’m waiting for him to write back about the recurring server load problem.

I’m a pretty patient guy but this is getting ####ing ridiculous.

My site is just for fun, I’ve put a lot of work into it, and my money goes towards my uni fees and can’t go towards trying every host under the sun (Dreamhost is the third I’ve tried, the fourth if you count donated hosting).

My site doesn’t even get that many users; around 30 in peak times.

Is anyone else experiencing a similar problem? What can I do about this? It seems to be a problem with MySQL server load, but it’s making the site I’ve put all this effort into unusable.


#2

My first guess would be that your site’s broken. Have you started looking through all of your HTTP and PHP logs? When I’ve had slowness, it’s been because there was a bug in my software that had to time out before continuing.

-Scott


#3

My site is broken, but my software isn’t. If it were my software, why would my software’s error logging code not catch any errors? Why would it only affect Dreamhost’s servers? Why would the error appear and disappear regardless of how many users are on my site, etc.?

The error logged in the http logs is this:
[Sat May 12 20:51:29 2007] [error] [client 1.2.3.4] Premature end of script headers: php5.cgi, referer: http://phpdiplomacy.net/board.php?gid=902&orders=on

The internal server error message only happens when it takes an even longer time than usual to load, so presumably it’s the same error.
Also any error that causes the script to end but which is handled within PHP is caught and logged by my software, so whatever the problem is it’s not coming from within PHP.
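
One way to check whether these errors cluster at particular times is to tally them by hour of day. This is only a minimal sketch in Python, assuming every line starts with a bracketed timestamp in the same format as the line quoted above; the names `errors_by_hour` and `sample` are made up for illustration, and the only input is that one quoted log line.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the bracketed timestamp that opens each error-log line,
# e.g. "[Sat May 12 20:51:29 2007]".
TIMESTAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")
NEEDLE = "Premature end of script headers"


def errors_by_hour(lines):
    """Count matching error lines per hour of day (0-23, server local time)."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines without a parseable timestamp
        stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        counts[stamp.hour] += 1
    return counts


# The single log line quoted in this thread, used as sample input.
sample = [
    "[Sat May 12 20:51:29 2007] [error] [client 1.2.3.4] "
    "Premature end of script headers: php5.cgi, "
    "referer: http://phpdiplomacy.net/board.php?gid=902&orders=on",
]
print(errors_by_hour(sample))
```

To run it over a real log, pass an open file object instead of `sample`; the location of the log file depends on the host and is not given in this thread.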


#4

Reading back over my post, it gives the impression that the problem has been constant. It hasn’t been: it comes and goes. When I was first moved it was fast for a while, and I usually only report errors when the slowness hits a crescendo. Usually it gets slower and slower, and that’s when you start to see internal server errors. Right now it’s at a peak of slowness again.


#5

From a search on that error, it looks to be a load issue. Some people reported that when they ran the ‘top’ command from the server’s command line, they saw a handful of PHP processes. Once they killed those, things got back to normal.

How about SSHing into the server to see which processes are running, and the load on the server?

-Scott


#6

It’s worth checking the server load to see if that’s related to your problem, but if the support techs have been right and it’s the MySQL server that you’re on, there’s no way that I know of to check the load on that machine.

Have you noticed whether you tend to get the errors at the same time(s) every day? Also, do the errors tend to pop up when processing a request that requires more work from your database, i.e. pulling data from a very full table or anything like that?

And one final question, have all your 8+ replies from support been from the same person or different people?

A note: it is still possible that your code is at least partially to blame here. If you’re running inefficient code against a database that’s already under a bit of load, that could also be at fault. I would suggest that the next time you contact support you provide as many details as possible: give them all the information you’ve gathered, regardless of whether you think it points towards the problem, and include highlights from your error logs with dates and everything. My experience with support has been that they are extremely knowledgeable and know what they are doing, so give them the benefit of the doubt and at least be willing to try what they suggest (even if you’re sure they’re wrong).

Hope this helps…

–Matttail
art.googlies.net - personal website


#7

The load on the server is “load average: 7.53, 7.10, 6.80”. It’s often higher (I’ve seen it up at 11), but the support staff don’t seem to respond to the load averages I give them.

Also, the only PHP processes I see when I use top are new ones that come and go. Often processes will stay around for several seconds despite using only 0.01-0.10 seconds of CPU time, which does point to something external. I think all signs point towards the database server.
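
The gap between how long a process stays alive and how much CPU time it uses can be measured directly. This is a rough sketch in Python rather than the site's PHP, and the function names are hypothetical; `time.sleep` stands in for a slow database round trip, since sleeping consumes wall-clock time but almost no CPU time.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waits_on_something_external():
    # Stand-in for a slow database round trip: sleeping uses wall-clock
    # time but almost no CPU time.
    time.sleep(0.2)


wall, cpu = measure(waits_on_something_external)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s waiting={wall - cpu:.3f}s")
```

If the waiting figure dwarfs the CPU figure, the time is going to something outside the script itself, which is consistent with the database theory but does not prove it.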

They say they’ve escalated my support ticket to level 2; all I know is that this means it’ll take over 24 (more) hours to get a response.


#8

There is a tendency for the problems to occur at certain times of the day, usually in the middle of the night here (Australia), which is the middle of the day in America, so that might be related to load. However, it can occur at any time, so it’s not that predictable.

Different people; some have been very helpful and understanding, others haven’t.

Yes, there’s always room for improvement, and I spend a lot of development time looking at xdebug profiling output and running EXPLAIN on queries to look for places where things can be optimized or where indexes are needed.
However when a script takes 0.1 seconds to execute on a 1GHz machine with 512MB RAM but takes over two minutes to execute on a 2 CPU machine with 4GB of RAM I’d be more reluctant to blame the script than the server.

I’m sure they are, but that’s what can get me more annoyed, because you know they’re not trying. When a support staff member tells you he’s trying to eliminate the client as the cause of "500 Internal Server Error"s, you get the feeling he’s just trying to get rid of you, or isn’t spending any time on your problem.


#9

As far as the load averages go, I don’t think that’s bad. As I understand it, the newer servers have quad-core processors, and even a load of 11 divides out to 2.75 per core, which is just fine.
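
The division can be spelled out as a small Python sketch. The core count of 4 is an assumption taken from the "quad core" guess, not a confirmed figure for this server, and `per_core_load` is a made-up helper name. A common rule of thumb reads anything much above 1.0 per core as work queuing up, so how to interpret the result is a judgment call.

```python
import re


def per_core_load(load_line, cores):
    """Parse a 'load average: a, b, c' string and divide each value by cores."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", load_line
    )
    if match is None:
        raise ValueError("no load averages found in: " + load_line)
    return [float(value) / cores for value in match.groups()]


# The figures reported in this thread, on an ASSUMED 4-core machine.
print(per_core_load("load average: 7.53, 7.10, 6.80", cores=4))

# The peak of 11 mentioned in the thread.
print(11 / 4)  # 2.75
```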

Seeing as it’s Sunday you’ll probably have to wait till the level 2 folks come in Monday morning and have a chance to look into your support request.

Let us know how things go.

–Matttail
art.googlies.net - personal website


#10

Here’s a good example of what happens when things start to crash:

In this shot processes suddenly rack up and don’t get finished as quickly as usual. Note that each process is using very little CPU time, and if they could quickly be started up and finished there wouldn’t be a problem.
http://kestas.kuliukas.com/load.png

Then here it kills a whole load of processes, probably because all of them together are using too much memory.
http://kestas.kuliukas.com/load2.png

In this one you can see that the load has suddenly shot up. Whether this is cause or effect I don’t know. Shortly after this, top was killed, presumably because my user was exceeding its memory quota.
http://kestas.kuliukas.com/load3.png

The thing is that all of these processes are using very little CPU time; each should be able to be processed and finished in less than half a second, which would mean that they wouldn’t stack up and exceed the memory quota.
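
The stacking-up argument can be put into rough numbers with Little's law (average number in flight = arrival rate × time each one takes). This is only a back-of-the-envelope sketch in Python: the arrival rate of 2 requests per second is hypothetical, since the thread gives no such figure, while the 0.5 second and 2 minute durations come from the posts above. It also assumes a steady state, which a stalling server is not.

```python
def average_in_flight(arrivals_per_second, seconds_per_request):
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return arrivals_per_second * seconds_per_request


# HYPOTHETICAL arrival rate; the thread does not give one.
ARRIVALS_PER_SECOND = 2.0

for label, seconds in [
    ("healthy (0.5 s each)", 0.5),
    ("stalled (120 s each)", 120.0),
]:
    in_flight = average_in_flight(ARRIVALS_PER_SECOND, seconds)
    print(f"{label}: about {in_flight:.0f} processes alive at once")
```

With the same traffic, the number of processes alive at once scales with how long each one is stuck waiting, which is why cheap scripts can still pile up against a memory quota.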