First php call is very slow (~ 2 mins)


#1

Hello,

I have a problem with the first php in my web site. The first php call takes about 2 minutes to get response. The following calls takes about 100 ms. Then after the first call there is no problem.

However if I wait some time when I do a php call it takes 2 minutes again.

I am testing a “Hello world” php (http://www.nalo.cat/hello.php) that do not connect to database. It only prints a message.

I have the FastCGI enabled for this domain.
I have only one domain in my user context and it is running on “brasco” machine.

I have been monitoring the processes executed by my user. When I have a single “php53.cgi” process the php call is slow (~2mins) then a second “php53.cgi” process appear and keeps running until some timeout and it is stopped. After this timeout the php calls are slow. Before this timeout the php calls are fast.

Do you have any suggestion to solve my problem?

Thanks!


#2

That’s a problem we can’t fix ourselves, so let Support know about it.

What you’re witnessing is the first process failing to serve and a second one taking it’s place. To keep that second instance live you can try pinging a PHP file on the site every 5 minutes to keep it from timing out. You might use a free Pingdom account for this purpose.


#3

Thanks sXi.
I have set a monitor in “uptimerobot.com” (It is similar to Pingdom) and opened a Support ticket.
However, do you have any idea which can be the reason first process is failing to serve the request?


#4

The glitch is systemic; it’s most likely not your script causing it.


#5

Of course, the problem was solved by Support. They told me that there was “an issue with the apache instance”.
Thanks!


#6

The ‘multiple of 30 secs lag on php’ bug appeared (apparently spontaneously) on another of my websites around 14:30 (dreamhost server time) on April 16th and disappeared (apparently spontaneously) several hours later.

Is dreamhost still actively trying to chase down this bug? Is customer feedback useful?


#7

We are. Did you raise a ticket about this?


#8

The bug, which hit one of my websites for several hours on 16th April as I mentioned in post #6, has reappeared on the same website and is still there at this time of writing.

This time the bug sputtered into being gradually (I haven’t noticed that before … on previous occasions it seems to have begun suddenly)

I have scripts on dreamhost and on another host probing my sites every 10 minutes. They vary between probing static html (hello world), static php (hello world) and minimal php+sql (mysql_select_db)

Defining a ‘bad’ probe as a probe which takes 30 secs or more, there were some isolated bad probes around 2013-04-19-0340, then good probes for a few hours until 2013-04-19-0610 since when all probes have been bad.

The site is on 69.163.186.123 and is the only site which I have on that IP number.

Just now while writing this note I’ve used a “find my ip neighbours” tool to find some other sites on that IP number, chosen one at random hoping it would be a php site, and bingo … it takes >30 secs to load.

I’ve put in a support ticket referencing this post.

** edit ** P.S. sorry I should have clarified, it’s only the php probes which are bad. All the static html probes are good.


#9

I’ve received a reply from the support team,

[quote]I apologize for the delayed response on this issue.

I’ve checked your site and it’s loading quite fast for me.

And it looks like the apache instance for your site was recently
restarted:

Current Time: Sunday, 21-Apr-2013 15:17:54 PDT
Restart Time: Sunday, 21-Apr-2013 03:43:53 PDT
Parent Server Generation: 3
Server uptime: 11 hours 34 minutes 1 second

I suspect any slowness may have been due to the apache instance.

If you’d like, I can move your site to another one - just let me know.[/quote]

My php probe logs confirm that the site was indeed loading fast at the time when the support team checked the site.

The 30-secs-delay bug disappeared around 2013-04-21-0350 and reappeared around 2013-04-21-1740.

This is a bug that knows how to hide from the support team.

The bug is still there at this time of writing.

It would be interesting to know why the apache instance was re-started at 21-Apr-2013 03:43 PDT. And if any support staff are trying to find out why the bug returned, I suggest looking at anything that might have happened shortly before 21-Apr-2013 17:40 PDT

In summary,

bug present: 2013-04-19-0610 to 2013-04-21-0310
site down: 2013-04-21-0320 to 2013-04-21-0340 (corresponding to apache restart)
bug absent: 2013-04-21-0350 to 2013-04-21-1730
bug present: 2013-04-21-1740 to now

While the bug is present, min. response time to php probe is 30 secs and max. response time (observed so far) is 1496 secs which is nearly 25 minutes.

My probe logs show a lot more interesting information than I can include here and I’m not sure if I can be of any help and/or how.

I suspect that Dreamhost won’t get anywhere with tracking down this bug until they install their own probes.

I’ll just mention that, very occasionally, some of the pure static html (‘hello world’) probes also suffer unaccountable delays. Oddly, a pure html probe suffered a delay of 103 secs at 2013-04-21-1740 which is exactly the time when the php probes started going consistently bad again.


#10

** update at 2013-04-22-0650 **

the most recent bad probe was at 2013-04-22-0010 (290 seconds)

At 2013-04-22-0020 the site was down (curl error code 7) … another apache restart?

Since 2013-04-22-0040 all probes have been good (<1 second)


#11

** further update, April 23rd 02:40 **

Except for a brief 20 minute spike of badness around April 22nd 09:00, probes remained consistently good until April 22nd 12:10 at which point they became consistently bad again and are still bad now.


#12

The bug is still ongoing (as of April 24th 04:50)

The record for longest successful probe (which I mentioned previously as 1496 secs) has been broken a few times … there has been a probe which returned successfully after 2408 seconds (which is 40 minutes)

i.e. a ‘curl’ command for a simple php ‘hello world’ successfully came back after 40 minutes with the correct result (i.e. it didn’t just time out or get lost)

Interestingly, while these 40 minutes were passing, some of the subsequent ‘curl’ requests on the same file just got lost (i.e. the cron job produced no output) and some returned correctly immediately (<1 sec)

Just to mention, for over an hour on April 23rd (12:30 to 14:10) my database server was unreachable and all my sql sites were down but I have no reason to think that would have been connected to the php bug.

Anyway, I’ve replied on the support ticket as follows:


#13

** further update (April 25th, 05:30) **

It’s enough to make one think that the bug is psychic, because within half an hour of my note yesterday, there was (what appeared to be) another apache restart (suggested in my probe logs by curl error code 7).

The problem site has now been running for 24 hours without any appearance of the bug.


#14

Still looking good … no >30-sec delays logged since April 24th. Meanwhile, I’ve received a reply on the support ticket to my question (“Will there be an announcement when Dreamhost finally cracks this bug?”):

[quote]Thanks for the follow up.

Unfortunately, there really isn’t any bug - sometimes there may be a hung process that can cause a apache instance to go unresponsive or become slow.

Usually a restart resolves that, but we also don’t really log why an apache instance is restarted.

Since you’ve stated that you’re not too concerned with the apache instance, I’ll leave things as is for now.

However, if you do notice any further performance issues please let me know.[/quote]

That is interesting. I think it’s somewhat outrageous to call this situation “not a bug” but I suspect that arguing about it won’t help. The important question is, how frequent are these “hung processes”? If they are very rare, then one might reasonably regard them as among the background hazards (like cosmic rays) that one can’t really do anything about.

The problem (for me, anyway) is that my websites have been hit by this ‘non-bug’ on three separate occasions since January, on 3 different IP numbers. My websites are spread across 19 different IP numbers (which I assume means 19 different apache instances) so in the absence of further information it seems as though there is a nearly one-in-six chance that the bug could hit any particular apache instance over the course of a few months.

If customers could pool information about the prevalence (or non-prevalence) of this problem, then we could get a clearer idea of how much of a problem it is.

If Dreamhost is inviting its customers to do their own self-service monitoring of the health of their webites (“if you do notice any further performance issues please let me know”) perhaps we might prepare some generic scripts that should be useful for a number of customers to monitor their websites.


#15

[quote=“tomtavoy, post:14, topic:59766”]I’ve received a reply on the support ticket to my question (“Will there be an announcement when Dreamhost finally cracks this bug?”):

“Unfortunately, there really isn’t any bug - sometimes there may be a hung process that can cause a apache instance to go unresponsive or become slow.”

That is interesting. I think it’s somewhat outrageous to call this situation “not a bug” but I suspect that arguing about it won’t help.[/quote]

Unfortunately, the response you received is not only ridiculous; it’s expected.


#16

~Dreamhost support representative

What I have in mind is a couple of one-line scripts to take the pulse of the websites on a customer’s account and automatically send an email to Dreamhost support if any performance concerns arise.

One script places a “hello world” php file in the root directory of every website on an account. This is done just once, or it can be done again whenever the customer adds more websites to the account. The other script goes in a cron job, and sends a ‘curl’ request to each of the placed php files, and sends an email if any of the responses take more than say 5 seconds to come back.

I’ve set a first version running in my account and if it seems OK in practice I’ll post the scripts to ask for feedback. Then the next step would be to ask Dreamhost for an email address that the exception-reports can be emailed to.


#17

A minimal non-null php file would be something like <?php echo "hello world" ?> but we might just as well have it do something slightly more interesting, so I suggest the following as the php file to be distributed onto every website in the account

The filename can be anything, so long as it is not the name of any existing file on the website, and in the following sample script I’m naming it ‘jtime.php’ (the initial of the Dreamhost support staff member who started this off is J).

So, to distribute this file onto every website on an account:

This script takes one parameter, which is a dreamhost api key that can do ‘domain-list_domains’.

If anyone is interested I’ll explain the details but basically it is simple apart from the escape characters needed to put everything on one line. It gets a list of domain paths and associated user names (captured respectively into ‘$2’ and ‘$1’ in what follows) and for each item in this list it does (disregarding the escape characters for clarity)

(where the final “> $2/jtime.php” is escaped so that it is part of what is executed remotely)

On my account I’ve set up one of the users to have passwordless login to all the other users and I’ve found this very useful. You might object that this is a security lapse but, provided this user does not itself host any websites, then I think it is secure … but I’d be grateful if anyone could point out if I’ve gone wrong already.


#18

Dreamhost support have invited us to let them know if we notice any performance issues, so here is a script that could do that.

It takes the pulse of each website into which the script in my previous post has loaded the file ‘jtime.php’.

This gets a list of domain names and for each item on the list (where the domain name is $2) calls

and sends an email report if any of the calls takes longer than a given number of seconds.

The script takes three parameters: the first is a dreamhost api key that can do ‘domain-list_domains’; the second parameter is an email address for sending reports of slow sites; the third parameter is the number of seconds that counts as slow.

In this version we are (for simplicity) discarding the output of the php call and looking only at StdErr which, if the ‘curl’ command succeeds, will be the elapsed time in seconds reported by ‘time’ or, if the ‘curl’ command fails, will be (if anything) an error message produced by the system.

It is tempting to lump any error results in with the slow results. However, as we are specifically targetting slow sites, Dreamhost support will surely jump on any excuse to ignore the reports, and if they see anything that says (for example) “Command terminated by signal 9” they will say “Ah! Your command was killed. Obviously it was using too many resources” and they will refuse to look further.

So, for now, we had better weed out the error results and send only the slow results. But how to tell the difference?

Well, in the course of logging nearly a million php calls, my logs have collected a surprising variety of error messages but, so far, none of the error messages begin with a number. So this script tests the first word of StdErr for numeric (“if $2 == $2+0”), and if numeric, assumes that it must be the time in seconds reported by ‘time’. We then compare this with the third parameter supplied to the script (“if $2 > $3”) and if greater, we pass the line through to ‘mail’.

Finally, we should send email only if there are any lines to send. The ‘-E’ option to ‘mail’ sends mail only if the message body is non-empty.

So this script sends an email if (and only if) any of the php calls take longer than a specified number of seconds, where the email message is a list of the slow domains and their times: for example,

domain1.com 35.32
domain7.org 49.55

or even (there has been another apache slowdown leading apparently to an apache reset at approx 17:20 on April 26th on 69.163.192.103, which is the same IP number that had a slowdown on March 4th/5th)

mydomain.com 5981.95 (that is, nearly an hour and 40 minutes, the current record)

Next I will ask Dreamhost support if this approach could help track down performance issues, and if so, for an email address for sending the results to.

Comments welcome, especially about any mistakes or wrong assumptions I’ve made.

P.S. … just to mention … these scripts are generic. Anyone could run them. They work equally well at my other host, which gives an interesting perspective on comparative performance.


#19

There was another slowdown event on my server, starting May 2nd 02:50 PDT and ending at 20:35 on the same day. Over this period, several of my websites consistently showed delayed response, and very frequently the delay was either slightly more than 30 secs or slightly more than 60 secs.

So superficially it seemed like the same kind of slowdown that other customers and I have noted here several times over the past several months, specifically php-calls lagging (on the first call only, if FastCGI) by multiples of 30 secs.

But when I actually looked at the logs it was clear that something different happened this time. Here are 3 points which make it different (point 1 is enough to make this a completely different type of incident; points 2 and 3 just add more detail):

(1) on the affected websites, static html (‘hello world’) files showed exactly the same kinds of delay as php files

(2) there was no discernible correlation between IP numbers and affected websites, i.e. the same IP number could contain both affected and unaffected websites

(3) on closer examination, the delays showed a curious consistency that did not appear with the php-related slowdowns discussed previously:

specifically, the delays which were slightly over 30 seconds were tightly bunched between 34.5 and 35.5 secs, and the delays which were slightly over 60 seconds were even more tightly bunched between 63.2 and 63.5 seconds. This bunching did not occur in previous incidents, where for example the delays which I described as being slightly over 60 seconds were actually scattered between 60 and up to about 68 seconds.

Here are some additional details to provide context:

I have been logging 52 websites, of which 19 were affected in this incident, and the other 33 continued without performance problems; the 19 affected websites are spread across 10 different IP numbers (some of which also host unaffected websites, as I have mentioned)

There also seems to be no correlation between affected websites and users (i.e. some of my users host both affected and unaffected websites)

Here is the response on the support ticket which I’ve sent to Dreamhost:

[quote]Thank you for your reply (April 25th, this ticket number).

Please could you have another look at the forum thread, https://discussion.dreamhost.com/thread-137826.html, where I’ve been looking for ways to take up your invitation to let you know if we see any further performance issues.

There has now been another slowdown incident, as described in my post today on the thread.

Is this kind of information useful to Dreamhost? Do you regard these slowdown incidents as bugs? If so, how can we get this information to you more expeditiously? Would it help if our scripts automatically send alerts to a Dreamhost email address?

Thanks.[/quote]


#20

Reply has come back from DH Support,

[quote]Thanks for the feedback.

I double checked with a manager and it may be related to a fcgi issue
where get commands just hang forever and clear up on an apache restart.

Our dev team has been trying to find a cause, but it’s very difficult to
pin down since it isn’t isolated to any specific server or instance.

I appreciate your willingness to help, but it looks like our dev team is
indeed aware and performing some diagnostics and looking into things
further.[/quote]

Forza!