High load on dedicated server


#1

Hello,

For several months now we have been having ongoing issues with the load on our dedicated server. The problem is random, and it is always an Apache or PHP process that starts to consume 100% of the CPU. Websites become less and less responsive when this starts to happen, and eventually they do not respond at all. Sometimes we can wait and the process will release and the load will go back down on its own. Other times we can kill all the Apache and PHP processes, restart Apache, and everything is OK. Occasionally we even have to reboot the server to fix the issue, if a process cannot be killed. The server this is happening on hosts approximately 140 websites, all Joomla!-based, on various versions.

This is causing a lot of service disruptions to our customers and their websites and we would like to find out what we can do to fix it long term.

We have tried contacting live support, but they simply kill the process, or reboot the server, and say it is fixed. That is not really a good long-term solution for us. It is exactly what we are already doing ourselves, and it is temporary at best. Because the problem is random, and we want to return service to our customers as quickly as possible, putting in a ticket does not work for us: the response is not fast enough, and the issue gets resolved before a higher-level admin has time to investigate. Our last contact with support suggested a cron job to restart Apache every hour. That is too disruptive. They also suggested increasing the memory allocation to Apache, but refused to suggest which options to change. I just don’t see how this could be a memory issue at this point. Swap is never even touched on that server.

We are running Debian Squeeze (6.0.7) on this server, so not the latest available, but still fairly recent. However, Apache seems to be a custom-compiled version that DH packages on their servers. I know making config changes can be hairy at best, and they can be overwritten by the panel. So I’m decidedly cautious about this route.

Is there anything we can do? Has anyone seen this before? Any suggestions on where we might be having problems?

Thank you for any help,
Jay


#2

Hi Jay,

I’m not an expert at all, but thought I’d share my experiences in case they were of relevance.

Our DH dedicated server occasionally hits 20%+ CPU usage leading to slow response times on the website. I was told by DH support to try using these commands from the shell next time it happened to aid diagnosing the cause:

uptime - shows cpu usage for last 1 minute, 5 minutes and 15 minutes
free - shows memory usage
top (or ps aux) - shows process usage

So when it happened again and uptime showed cpu usage was at 20+% for the last minute I then used top to see what processes were using all that cpu. Trouble is, none of them were using over 0.5% and there were only 10 or so processes. And yet a minute later, uptime was still showing 20+% usage for the previous minute.

So we are none the wiser yet. Maybe someone more informed can explain why top doesn’t seem to agree with uptime on cpu usage for the last minute. Or else how best to identify the problem process from the shell when cpu usage is currently too high.

Apologies if this is not the same issue as you, Jay, but it might be related.


James


#3

Hi James,

Thanks for the response. This is related to what we are STILL experiencing, but I use those tools daily and they aren’t really that helpful in diagnosing this issue. Our issue is always either Apache or PHP causing high CPU load, and I need more visibility into those processes and what might be the culprit. No response from DreamHost support was ever helpful. We finally gave up and just started bouncing Apache when it happens.

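In your case, I would suggest running “vmstat 1” or “iostat 1”. Keep in mind that uptime reports the load average, not a CPU percentage, and on Linux the load average also counts processes stuck in uninterruptible I/O wait. That is why top can look idle while uptime still shows a high number. So if you don’t see a particular process that is consuming a lot of CPU, then I bet it is I/O related. You might have a process blocked waiting to write to disk. When running vmstat, you want to look at the first two columns: “r” is the number of processes running or waiting for CPU time, and “b” is the number of processes blocked in uninterruptible sleep, usually waiting on I/O. If “b” stays above 0 or 1, or the “wa” (I/O wait) column is high, you could be disk bound. The bi and bo columns can also be helpful. Those are the number of blocks read in from and written out to block devices. See if you have a lot of disk activity during the high load. You can use iostat to narrow down your disk activity even more.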
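
If it helps, here is a rough sketch of how I would filter vmstat output so only the suspicious samples stand out. The function name flag_io_wait and the default threshold of 20 are just something I made up for illustration. It assumes the usual procps column layout, where “b” (processes blocked in uninterruptible sleep) is column 2, bi and bo are columns 9 and 10, and “wa” (I/O wait percentage) is column 16, so check the header on your own box first.

```shell
# Hypothetical helper: reads vmstat output on stdin and prints only the samples
# where processes are blocked on I/O (b > 0) or I/O wait (wa) reaches a threshold.
# Assumes the standard procps column order: b = col 2, bi = 9, bo = 10, wa = 16.
flag_io_wait() {
    threshold="${1:-20}"   # wa percentage to flag; 20 is an arbitrary default
    awk -v t="$threshold" '
        # Header lines do not start with a number, so they are skipped.
        $1 ~ /^[0-9]+$/ && ($2 > 0 || $16 >= t) {
            printf "r=%s b=%s bi=%s bo=%s wa=%s\n", $1, $2, $9, $10, $16
        }'
}

# Usage on a live system: take 30 one-second samples and show only the flagged ones.
#   vmstat 1 30 | flag_io_wait 20
```

Remember that the first line vmstat prints is an average since boot, so only the later samples reflect what is happening right now.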

If it is disk bound, the question is then whether you are writing to shared storage somewhere or to dedicated disks, and what process might be doing it. I have seen bad disks cause this issue.
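
To find out which process is stuck, one thing you could try is listing everything in the “D” (uninterruptible sleep) state, since those are normally the processes waiting on I/O. This is only a sketch: blocked_procs is a name I made up, and it simply filters “state pid command” lines like the ones ps can produce.

```shell
# Hypothetical helper: reads "STATE PID COMMAND" lines on stdin and prints the
# PID and command of every process in uninterruptible sleep (state D).
blocked_procs() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage on a live system (the trailing "=" on each field suppresses the header):
#   ps -e -o state= -o pid= -o comm= | blocked_procs
```

If the same Apache or PHP PIDs keep showing up there during a load spike, that points at whatever files or storage those particular processes are touching.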

Hope that helps your situation. Make sure you run those commands using sudo, including top.

Jay


#4

Hi Jay,

It’s extremely kind of you to give those suggestions. That’s terrific! I will try them when the problem next occurs. Our problem could well be disk issues for all I know.

I really hope you receive some good ideas on the forum here for your situation, Jay. It’s a big shame you’re having to bounce apache so often.


James