Sporadic but repeatable php slowness


#1

This is on a shared server. Out of about 50 domains and subdomains hosted in my Dreamhost account, about 48 behave normally, but 2 show a repeatably strange phenomenon.

The phenomenon is an unexplained delay of between 30 seconds and 2 minutes when a browser or curl requests a php file for the first time.

The php file is served normally, it’s just terribly slow to arrive.

If there is a request for the same php file again within the next few minutes, then it is served without delay. But if there is inactivity for more than a few minutes before requesting the same php file again, then there is again a delay of approximately one minute, like before.

NB: it only happens on two of my websites out of more than 50.

One of the problem sites is on a shell user; the other is on an ftp-only user.

One of the problem sites is a domain; the other is a subdomain. The parent domain of the problem subdomain behaves normally.

One of the problem sites is the only site on its user; the other is on a user that contains several other sites all of which behave normally.

The delay applies only to php; one browser can be happily browsing html on a problem site while another browser, or curl, is being delayed while asking for php.

The strange behaviour has been consistent over the 5 days or so that I’ve noticed it, and I’ve been testing it a few times every day; it is always the same two websites which show the problem.

The php file on which I’ve been testing this is about as simple as it could possibly be; it is just

The delay, insofar as I’ve observed it (which is a few dozen times by now), is always at least 30 secs and less than about 2 minutes.

The window of goodness within which a subsequent request is served normally appears to vary but is more than 5 minutes and less than 10 minutes. i.e. if I wait more than 10 minutes before trying again, then the problem will certainly return.

I’ve tried several alternatives, such as running a browser on my local machine while running curl on the server, and all combinations give the same result in terms of when there is or isn’t a delay.

Just as an example, to show what I mean: if I request the php file in a browser on my local machine, and at the same time request it from curl on the server, then both requests are delayed, and both are satisfied within a few seconds of each other after a delay of one or two minutes;

but if I first request it from curl on the server, and wait for the delay to finish, and then request it from a local browser, the local request is satisfied immediately. And vice versa.

The delay happens even when I run curl on the same user as the problem site is on (in the case of the site which is on a shell user).

Checking the DNS with http://www.intodns.com/ gives a clean bill of health for the problem domain and for the parent domain of the problem subdomain (except for the “Recursive Queries” caution, which I believe applies to all domains at Dreamhost).

This seems strange. Can anyone suggest what might be the problem?
[hr]
P.S. Another thing that might be worth mentioning: one of the problem sites is on ‘PHP 5.2.x FastCGI’ and the other is on ‘PHP 5.3.x FastCGI’.


#2

If suexec is lagging in firing up the cgi, then that’d explain any “first run lag”, with subsequent requests behaving quickly.

Just a stab in the dark - but I’ll blame suexec and then deny everything when proved wrong :smiley:


#3

Turn off FastCGI.

Use webpagetest.org to test from various locations to ensure it’s not your machine/connection.


#4

Well this is strange. This morning (I’m on Europe time, by the way) I saw your replies. I was about to turn off FastCGI as you suggest, but before doing so I decided to run the tests again, and found that the problem had disappeared!

So I came back to the forum to post a note to that effect, but found it saying “Bad Gateway”. Later in the morning it was still saying “Bad Gateway”, so I drafted the following note in order to post whenever the forum came back up:

[quote]This morning the problem has disappeared, and all my websites are serving php without delay.

While waiting for the “502 Bad Gateway nginx/0.8.53” message on this forum to clear today, I’ve had time to run the test three times at hourly intervals, and all the tests have worked perfectly.

It seems very likely that Dreamhost fixed something in the last 12 hours; I wish they’d tell us what they did!

I’m sure it wasn’t a problem with my local machine or location, because the delays occurred identically when I ran scripts on the Dreamhost server in LA which (as I mentioned) loop through my websites doing ‘curl mysite/hello.php’.[/quote]

By afternoon the forum was still saying “Bad Gateway”, and then I went out. Now it’s evening and I’ve returned, and find the forum working again. So I decided to run the tests one more time before posting … and the problem has come back again!

Now I’m not sure what to do about turning off FastCGI. My instinct is it might be advisable to make no change until the problem is better understood.

webpagetest.org (when run outside the window of goodness on the two problem websites) is reporting typically First View = about 35 secs and Repeat View = about 0.3 secs, but sometimes it times out on First View and refuses to try Repeat View.


#5

Bad Gateway errors here at the board have been common for several weeks and have nothing to do with your connection or websites.

If you want to try dropping fcgi mode temporarily as a test, you can always turn it back on afterwards (recommended).

Set up something to ping a blank.php file in the PHP5.3 FastCGI domain every 5 minutes and compare results against the non-pinged PHP5.2 FastCGI domain.
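sXi’s ping-and-compare test could be scripted roughly as below. This is a minimal sketch, assuming Python 3 is available on the shell account; the script name `ping_time.py`, the example URLs and the log format are all hypothetical. The idea is to run it from cron every 5 minutes against the pinged FastCGI domain, and much less often (say hourly) against the control domain, so that the control stays outside any window of goodness.

```python
import sys
import time
import urllib.request


def timed_get(url: str, timeout: float = 300.0) -> float:
    """Fetch url once and return the wall-clock seconds until the whole body has been read.

    timeout is urllib's per-socket-operation timeout, not a limit on the total request time.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def log_line(label: str, url: str) -> str:
    """Build one log line: local timestamp, label, and elapsed seconds (or the error)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        return f"{stamp} {label} {timed_get(url):.2f}s"
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return f"{stamp} {label} ERROR {exc}"


if __name__ == "__main__":
    # Hypothetical crontab entries (URLs are placeholders):
    #   */5 * * * * python3 ping_time.py http://site-a.example.com/blank.php pinged >> ping.log
    #   0 * * * *   python3 ping_time.py http://site-b.example.com/blank.php control >> ping.log
    if len(sys.argv) != 3:
        sys.exit("usage: ping_time.py URL LABEL")
    print(log_line(sys.argv[2], sys.argv[1]))
```

Grepping `ping.log` for the `control` lines should then show whether the unpinged domain still takes 30 seconds or more while the pinged one stays fast.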

If you happen to catch the issue happening you could log in to shell and poll some stuff with commands like free -ms 1 and sar -u 2, etc. to see if maybe you can gain any insight on what might be occurring.


#6

“Bad Gateway errors here at the board have been common for several weeks and have nothing to do with your connection or websites.”

Yes I was aware of that. I simply brought that up to explain why I drafted a note but did not post it immediately.

I’ve now changed the site which was previously on ‘PHP 5.2.x FastCGI’ to 5.3 while turning off FastCGI. This has had the interesting effect of removing the window of goodness, i.e. it now gives a delay every time. The other problem site is unchanged and still has a window of goodness.

I’ll next look into your further suggestions. Thanks.


#7

Yesterday evening (European time) the problem had gone away, and all my websites were serving php without delay; I ran the tests several times at hourly or longer intervals, and all the tests ran perfectly.

This morning the problem has come back again, exactly like before: the tests are consistently showing delays.

The site which is on ‘PHP 5.3.x CGI’ gives a delay of at least 30 secs every time, in response to ‘curl mysite.com/filename.php’, where filename.php contains simply

The site which is on ‘PHP 5.3.x FastCGI’ likewise gives a delay of at least 30 secs the first time, but then serves the file immediately upon request, unless there is a period of inactivity for more than 10 minutes or so … then the next time it gives a delay again.

Maybe this difference is due to the way in which FastCGI keeps some processes running and ready to handle requests; maybe such processes either die or become somehow less ready after 10 minutes of inactivity.

I can now add one further bit of interesting information: the length of the delay tends to be a multiple of 30 seconds plus 1 second.

Usually the delay is approx 31 secs; quite often it is approx 61 secs; occasionally it is 91 secs; and so on, up to the longest delay I have observed which was 3 minutes and 1 sec.

This rule is not 100%. Occasionally there is a delay of say 40 secs. But that is the exception.
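To check a list of logged delays against the “multiple of 30 seconds plus 1 second” rule, a small helper like the one below could be used. It is a sketch only; the 30 s step, the 1 s offset and the 2 s tolerance are assumptions taken from the figures above, and the function names are hypothetical. A quantised pattern like this would be consistent with some fixed 30-second timeout being hit one or more times before the request succeeds, but nothing in this thread confirms that.

```python
def nearest_step(delay_s: float, step_s: float = 30.0, offset_s: float = 1.0) -> tuple[int, float]:
    """Return (n, residual) for the value n*step_s + offset_s closest to delay_s (n >= 1)."""
    n = max(1, round((delay_s - offset_s) / step_s))
    return n, delay_s - (n * step_s + offset_s)


def fits_pattern(delay_s: float, tolerance_s: float = 2.0) -> bool:
    """True if delay_s is within tolerance_s of 31 s, 61 s, 91 s, ..."""
    _, residual = nearest_step(delay_s)
    return abs(residual) <= tolerance_s


if __name__ == "__main__":
    # Delays quoted in this thread, in seconds.
    for d in [31, 61, 91, 181, 40]:
        print(d, fits_pattern(d))
```

With the default tolerance, 31, 61, 91 and 181 seconds fit the rule, while the 40-second exception does not.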

I am running all these tests in scripts on the DH server in LA.

I have tried out sXi’s suggestions of free -ms 1 and sar -u 2.

Running these on a user that has a problem website, there doesn’t seem to be any appreciable change in the numbers shown while the delay is being incurred. Also, the numbers shown for this user don’t appear to be appreciably different from the numbers shown for others of my users which don’t have problem websites.

But am I right in thinking that when php is served in response to an incoming HTTP request, it all runs under a different user from mine, so my own users would not be expected to show anything unusual?


#8

^ Try that.


#9

I have a cron job doing ‘curl’ on a minimal php file on the FastCGI site.

Varying the frequency of the cron job gives some insight into the length of the ‘window of goodness’. When the cron job runs every 6 minutes or more frequently, it usually keeps the window open and there is generally no delay; when it runs every 7 minutes or less frequently, the window usually has become closed and there is generally a delay.
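Reading those cron experiments amounts to bracketing an idle timeout. The sketch below assumes a single fixed idle timeout (fast intervals lie at or below it, delayed intervals above it), which the word “usually” in the results suggests is only approximately true; the function name and the input format are hypothetical.

```python
from typing import Iterable, Tuple


def bracket_window(observations: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Bound the 'window of goodness' from (cron_interval_minutes, delayed) observations.

    Returns (lower, upper): the window is at least the longest interval that stayed fast
    and shorter than the shortest interval that produced a delay.
    """
    obs = list(observations)
    fast = [interval for interval, delayed in obs if not delayed]
    slow = [interval for interval, delayed in obs if delayed]
    lower = max(fast, default=0.0)
    upper = min(slow, default=float("inf"))
    if lower >= upper:
        # Noisy results ("usually") can contradict the single-fixed-timeout assumption.
        raise ValueError(f"inconsistent observations: fast at {lower} min but slow at {upper} min")
    return lower, upper
```

For the intervals reported above, `bracket_window([(5, False), (6, False), (7, True), (10, True)])` gives a window between 6 and 7 minutes.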

Here is some timing information that might be useful. For a typical run without delay,

[quote]User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 12784
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 890
Voluntary context switches: 5
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0[/quote]

and for a typical run with delay,

[quote]User time (seconds): 0.02
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:10.41
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 12800
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 891
Voluntary context switches: 132
Involuntary context switches: 5
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0[/quote]

The main difference that I can see (apart from the wall clock time) is in the context switches: voluntary/involuntary = 5/1 when there is no delay, versus 132/5 when there is a delay.
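Comparing two such dumps by eye is error-prone; a short parser can list exactly which fields changed. This is a sketch only, assuming the output was saved as plain “Label: value” lines in the format GNU time’s -v option prints; the function names are hypothetical. In the two dumps above, CPU time is almost identical (0.01 s versus 0.02 s), so the extra voluntary context switches are consistent with curl simply sleeping while it waits on the server rather than doing more work.

```python
def parse_time_v(text: str) -> dict[str, str]:
    """Parse 'Label: value' lines (GNU time -v style) into a {label: value} dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the LAST ": " so labels such as "(h:mm:ss or m:ss)" stay intact.
        label, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[label] = value
    return fields


def differing_fields(fast: dict[str, str], slow: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {label: (fast_value, slow_value)} for labels present in both whose values differ."""
    return {k: (fast[k], slow[k]) for k in fast.keys() & slow.keys() if fast[k] != slow[k]}
```

Feeding it the two dumps above should single out the user time, CPU percentage, elapsed time, maximum resident set size, minor page faults and both context-switch counts.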


#10

^ this


#11

I’m not sure what you mean; the other bad site already has FastCGI turned off, and it consistently gives a delay every time.


#12

Experiencing the same issue here on a new domain. PHP files usually take 30 seconds plus a few milliseconds. Subsequent requests happen without delay… until I stop making requests for a while, and then the delay re-emerges.

I opened a ticket, but the extent of the support was restarting the server, which didn’t help. I’ll try disabling FastCGI and report back, but why would that suddenly be an issue when I’ve had several other domains using it just fine for years?


#13

After persisting all day today (European time) the problem has (according to my cron job) suddenly cleared itself, within the last hour, apparently spontaneously.

If Dreamhost have done something that would make the problem clear, I hope we will hear about it.

If the problem has cleared all by itself without Dreamhost doing anything, then there is still reason to be concerned.


#14

It is concerning, because I’m still having the problem. I just requested my site and it takes 30 seconds.


#15

Yeah, don’t turn off FastCGI. It’s likely the starting mechanism that is the point of failure.

[quote=“tomtavoy, post:13, topic:59014”]If Dreamhost have done something that would make the problem clear, I hope we will hear about it.

If the problem has cleared all by itself without Dreamhost doing anything, then there is still reason to be concerned.[/quote]

Until someone from DreamHost responds then you should assume that things are broken.

Turn FastCGI ON for all troubled domains and use the 5-minute ping hack to keep PHP loaded until DreamHost officially inform you that they have indeed fixed the issue. If they do, please share with us what it was.


#16

In this neck of the woods the problem returned Jan 14th 20:30 PST (according to cron job) and is persistent thereafter (time now is Jan 15th 03:45 PST) so I’ve sent in a support ticket. Will post any further developments.


#17

Seems like this is also related to the 100% Uptime post https://discussion.dreamhost.com/thread-135204.html; they seem to be the same issue: sporadic site slow-down, which from a monitoring perspective is as bad as a site being down. I see this almost every other day from random domains on my account. Static html pages load fine, but even the simplest php or cgi script crawls, with a 30 sec to 2 min delay in loading.

I am interested to see if you get any useful response from DH regarding this issue. I have submitted many tickets on this very same issue, and outside of moving me to another server and restarting Apache instances, they have not solved the problem or identified the underlying issue.


#18

Dreamhost have responded,

[quote]Hi there,

This is just a note to let you know that although it’s almost been
24 hours since you sent in your message, we haven’t forgotten about
you!

We strive to answer all questions within 24 hours, but due to the
large number of questions we have right now we’re afraid we may have
to go over a bit this time. :frowning:

Please hang in there and we’ll be getting back to you as quickly
as we possibly can!

Our apologies again, and we appreciate your understanding.[/quote]

Meanwhile … I have experimented a little to find the minimal php action that triggers the ‘php is alive’ state (such that a request to another php file soon afterwards is served without the 30 secs delay).

It turns out that

  • curl to a single-byte file (containing hex 0a) with extension php or phtml does trigger php alive
  • curl to a zero-length file with extension php or phtml does trigger php alive
  • curl to a non-existent file with extension php or phtml does not trigger php alive
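The three probe cases can be recreated with a few lines of Python, for example to repeat the experiment on another domain. This is a sketch; the file names are hypothetical, and swapping `.php` for `.phtml` covers the other extension. Each file would then be requested with curl from outside the window of goodness.

```python
from pathlib import Path


def make_probe_files(docroot: str) -> dict[str, Path]:
    """Create hypothetical probe files mirroring the three cases listed above."""
    root = Path(docroot)
    probes = {
        "one_byte": root / "probe-onebyte.php",  # a single 0x0a byte
        "empty": root / "probe-empty.php",       # zero-length file
        "missing": root / "probe-missing.php",   # must NOT exist
    }
    probes["one_byte"].write_bytes(b"\x0a")
    probes["empty"].write_bytes(b"")
    probes["missing"].unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    return probes
```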

Also, it is interesting to note that although most of the delays are around 30 secs (or actually 31 secs), as I have noted, there are very occasional delays that are much longer; my cron job has logged one instance of a delay of 11 minutes that eventually loaded successfully.
[hr]
P.S. Looking again at that note from Dreamhost, it’s unclear why they say “Our apologies again”. I think that the apology, which they claim is a repeated apology, is in fact the only apology.


#19

tomtavoy, I wouldn’t try to analyse that message too much. It appears to be an automated message you get when they haven’t closed the ticket within the first 24 hours. I received the same one.

I’m hoping that’s actually a good sign: it probably means they’re seriously investigating the issue.


#20

Yes, there is reason to be hopeful. According to my cron job, the problem vanished from both my bad websites at 12:30 PST.