CPU resources / .htaccess?

wordpress

#1

beerorkid.com

So I got contacted by support about racking up a lot of CPU minutes. I used to host some Linux repos and do not host them anymore. Thing is, there are tons of dead links out there that still hit me. People posted links to specific packages and did not realize that they changed often, so now those links just land on my 404 page.

So I am running WordPress, and the regular 404 is just another WordPress page with all the crap on it, which is probably to blame for the high CPU usage because of the PHP. I have replaced the 404 with a simple HTML 404 (although it has a .php extension) to hopefully reduce the strain on the shared server.

So I was wondering: will changing my 404 help that much?
Should I do something in .htaccess to completely block incoming links to the now-gone dirs?
Am I thinking correctly about this, as in, was the PHP on the 404 using a bunch of CPU?

CPU monitoring has been turned on for a long time so by tomorrow I should be able to tell if switching the 404 made a difference by checking the logs.

EDIT: also, I am sure many folks still have the repos in their sources.list. Would the daily automatic check for updates have any significant effect on the CPU usage?

beerorkid.com


#2

WordPress has a cache option. Go to the admin screen and click on Options. One of the subitems is WP-Cache. Enable it. I don’t think it comes enabled by default. You can also adjust the cache expire time which may help some.

Good luck!

-Scott


#3

[quote]
WordPress has a cache option. Go to the admin screen and click on Options. One of the subitems is WP-Cache. Enable it. I don’t think it comes enabled by default. You can also adjust the cache expire time which may help some.

Good luck!

-Scott [/quote]
Actually, this is a third-party plug-in, which is provided in the DreamHost installation. It’s otherwise not there.

I recommend caution, however, as the latest WordPress does its own caching, and dynamic plug-ins aren’t friendly to WP-Cache. I’ve tried it several times and I now keep it off.

Peace,
Gene Steinberg
Co-Host, The Paracast
http://www.theparacast.com
[b]My DreamHost Promo Plan—Use the code: [color=#CC0000]ROCKS[/color][/b]


#4

hmmmmm…

I have had cache enabled. I did bump it up to 9600 seconds to see if it makes a difference.

Also I need to update wordpress. Can’t do it with the one click cuz I have changed its location since install.

Thanks for the replies.

beerorkid.com


#5

[quote]hmmmmm…

I have had cache enabled. I did bump it up to 9600 seconds to see if it makes a difference.

Also I need to update wordpress. Can’t do it with the one click cuz I have changed its location since install.

Thanks for the replies.

beerorkid.com [/quote]
There’s a plug-in known as Instant Upgrade that can do rapid-fire upgrades of your WordPress software. I’ve used it with both new releases and even “Release Candidate” (RC) versions, and it runs fine. Imagine upgrading your software in less than 30 seconds, perfectly. Take a look for it, but remember that it’ll probably have to be updated from time to time as WordPress is updated.

Peace,
Gene Steinberg
Co-Host, The Paracast
http://www.theparacast.com
[b]My DreamHost Promo Plan—Use the code: [color=#CC0000]ROCKS[/color][/b]


#6

Could you please tell me how much CPU you are using? I’m asking because my site is using somewhere between 1200 and 1800 seconds per day and I have no idea how far I am from the limit. Should I start to worry, or do I still have room?


#7

That upgrade tool looks pretty cool.

So I looked at my /stats/resources for today after switching to a generic 404 and it was even higher.

Process CPU seconds user machine count average
php.cgi 27906.9300 96.054% 116.279% 59043 0.473
php5.cgi 1022.2100 3.518% 4.259% 4854 0.211
pnmscale 60.8500 0.209% 0.254% 72 0.845
jpegtopnm 47.1500 0.162% 0.196% 72 0.655
pnmtojpeg 5.3000 0.018% 0.022% 72 0.074
top 3.5700 0.012% 0.015% 2 1.785
pnmcut 2.4100 0.008% 0.010% 12 0.201
sh 1.8100 0.006% 0.008% 162 0.011
convert 1.6400 0.006% 0.007% 3 0.547
jhead 0.7500 0.003% 0.003% 137 0.005
sh 0.1800 0.001% 0.001% 72 0.003
sendmail 0.1500 0.001% 0.001% 15 0.010
which 0.1300 0.000% 0.001% 30 0.004
postdrop 0.0900 0.000% 0.000% 15 0.006
curl 0.0800 0.000% 0.000% 5 0.016
bash 0.0500 0.000% 0.000% 3 0.017
uptime 0.0000 0.000% 0.000% 1 0.000
ls 0.0000 0.000% 0.000% 1 0.000

Total: 29053.3000 100.000% 121.055% 64571
Average per day: 29053.3000 1 days
CPU percentage assumes 24000 cpu seconds per day total.

So I have another idea, or it was mentioned to me.

I have two dirs that are getting tons of hits pointing to packages that do not exist.


Would a robots.txt be a more effective way to completely block the dead links?

Wow mabiweb, I was working on this reply and saw yours. 29053 would be the answer to your question.

beerorkid.com


#8

Thanks, I can see how 29000 sec can be considered too much!


#9

Are you sure it’s the traffic for those particular requests that’s doing the damage?

If so, maybe try this and see if it helps.

Put this above whatever else is in your .htaccess file and it will just 403 any requests for the directories:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/(automatix|compiz)/(.*)$ [NC]
RewriteRule .* - [F]

As for robots.txt, that just stops bots from crawling (assuming they pay attention to it), but it wouldn’t affect humans.
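If you do want to try it anyway, something like this in a robots.txt at your site root should cover those two directories (just a sketch; I’m assuming the dirs are /automatix/ and /compiz/, same as in the rewrite rule above):

User-agent: *
Disallow: /automatix/
Disallow: /compiz/

Only well-behaved crawlers respect it, though; apt and regular browsers never look at robots.txt at all.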

If you want to do a little manual work, you could contact some of the site owners and ask them to remove the links. I wouldn’t bother with all of them, but if you have a few that are sending the most… go for it.

Instead of dropping the traffic, is there any way you can make money from it? If 404’ing isn’t helping the load, you could always just 301 it to an affiliate offer or something else. Even if it’s something irrelevant like a ringtone offer, tons of traffic will probably lead to a few sign-ups.

Just something to consider, since there are people out there that literally pay for more traffic. If it’s anything you can use to your advantage, I’d go that route.


:stuck_out_tongue: Save up to $96 at Dreamhost with ALMOST97 promo code (I get $1).
Or save $97 with THEFULL97.


#10

About 1.5 million hits a month just on those two dirs. Sure, most of it is just people’s automatic updates checking for new stuff. But according to my StatCounter, about 2000 a week come in from a broken link and search around on my site for what they are looking for.

I contacted a bunch of them months back, but there are tons of links out there in different languages and on forums where I would have to sign up, yadda, yadda… Them linux folk are crazy :wink:

I will try your .htaccess thing just to see what happens and whether it calms down the CPU usage, then consider milking the profit angle after a day. Would be pretty funny to fill it up with ads.

Thanks for your assistance.

beerorkid.com


#11

That is a bunch of traffic.

If you have a CJ account, or any other affiliate account, you could surely convert some of that into income.

You wouldn’t even need to fill up your own page with ads, you could just redirect directly to an advertiser’s landing page, taking the load off your account.

If you wanted, you could probably even sell that traffic to someone that has something relevant to what those visitors are looking for. Then just changing that 403 to a 301 would send it all their way.
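To make that concrete, the only change to the earlier rule would be swapping the [F] for a redirect (a sketch; the destination URL is just a placeholder for whatever advertiser or buyer page you’d use):

RewriteEngine On
# 301 the old repo dirs to a landing page instead of blocking them
RewriteRule ^(automatix|compiz) http://www.example.com/landing-page [R=301,L]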

Also, I’m not familiar with what you had, or about what would be auto-updating, but is it something where you can tell which requests are from people and which aren’t? Maybe by user agent? Specific file name request?

I guess HTTP_REFERER could be a tip off to a certain extent, but it would be nice to be able to drill it down more.

Maybe even a blank user-agent, or something similar that is rare with browsers.

If so, then I’d expand the .htaccess rules to continue 403’ing anything that’s not a human, but send humans to where you want them. That way, you can still make money from people while blocking bots, programs, etc…
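Roughly like this, maybe (only a sketch; the user-agent patterns below are guesses on my part, so check your access logs for what apt and the other update tools actually send):

RewriteEngine On
# blank user agents, or ones that look like apt/Debian tools, hitting the old
# repo dirs just get a 403 (the patterns are guesses; verify them in your logs)
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} (apt|debian) [NC]
RewriteRule ^(automatix|compiz) - [F]
# anything else hitting those dirs is probably a person on a dead link,
# so redirect them wherever you want them to land
RewriteRule ^(automatix|compiz) http://www.example.com/landing-page [R,L]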

You could even poke around the affiliate forums at places like Digital Point for more ideas.

If you end up making too much money from the traffic and can’t figure out how to spend all of it, I wouldn’t mind having a new Mercedes SL65 AMG. It’s only around $180,000 though, so if that’s not enough spending, I’ll take a few of them. :wink:


:stuck_out_tongue: Save up to $96 at Dreamhost with ALMOST97 promo code (I get $1).
Or save $97 with THEFULL97.


#12

“I have replaced the 404 with a simple HTML 404 (although it has a php extension)”

So does that mean you are actually still invoking PHP every single time a 404 gets generated? It doesn’t really matter what’s in there then (it could be blank); it still has to go through PHP, which is slow (especially the CGI version thereof).

seiler’s solution should conserve a lot more resources, though I fail to see why he uses RewriteConds and greedy RegExes. Also, it only works if you don’t also redirect 403 errors. You could probably condense it to:

RewriteEngine On
RewriteRule ^(automatix|compiz) - [F]

That avoids the useless greedy .* regex and only uses one regex instead of two: seiler’s version would first match the URI against .* (which always matches), then match it again in the RewriteCond against the actual thing we want to test against. The version above fires up the RegEx engine only once, and it can ideally stop after just one character.

As for “converting” these into money, first of all, understand the traffic. Repository checks by apt-get update cannot be converted into money. And even if they could, dammit, not every single link on the web needs to lead to useless ads. It’s bad enough that you get tons of crap on domains people failed to renew and vultures gobbled up.

I don’t believe apt checks robots.txt, since it isn’t actually crawling anything.

Once the links are out there it’s usually freaking hard to get them removed, I know the feeling :wink:


#13

Changing to a generic 404 has not seemed to fix much, and the biggest thing using the CPU is php.cgi. So maybe it is just that my main site uses too much PHP power, even with only about 2000 page loads a day on average.

I do host 4 other sites under my user account, but they get very little traffic.

I would feel a bit dirty trying to make $ off of now-defunct Linux repos. Kinda bad karma or something.

Thanks for the assistance though, it seems DH is not mad at me anymore since I acknowledged the problem and tried to work on it.

beerorkid.com


#14

My question about it is whether that 404 is a PHP page (even if it is an empty one). If it is, that will be extremely slow with DH’s PHP-CGI setup (contrary to what the wiki says :wink: ).


#15

ahhhh…

I replaced the regular WordPress 404 with an HTML one that has a .php extension (so it would play nicely with WordPress; it did not work as plain .html).

http://www.beerorkid.com/404.php

but you say that would mess with it even more? Will look into changing it somehow.

beerorkid.com


#16

It won’t mess it up “more”, but it will not help, either. The .php extension usually means that PHP gets invoked when the page gets served. Therefore, while you may not be doing anything dynamic in there, it is still a dynamically generated page and will incur the PHP start-up and tear-down costs (which are non-trivial).

What you can do is change the settings in your .htaccess file; I posted what you should add to it a few replies ago, as did seiler (his does the same thing, a little less obviously). I.e., place the following:

RewriteEngine On
RewriteRule ^(automatix|compiz) - [F,L]

at the top of the .htaccess file in the main directory of your site. That should avoid the PHP error pages. However, you might want to use

RewriteRule ^(automatix|compiz) http://somedomain/error.html [R,L]

instead, where the URL is just a static HTML page (that way you can tell the user what is wrong instead of serving a generic error page).
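And if you want everything else that 404s to skip PHP as well, an ErrorDocument line pointing at a plain static file should take care of it (a sketch; it assumes you save your stripped-down error page as a real 404.html in your site root):

ErrorDocument 404 /404.html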


#17

I think wordpress is messing with me.

I am trying to point it to a html file and nothing seems to work other than calling it right out www.beerorkid.com/404.html

here is my .htaccess:

#ErrorDocument 404 /404.html

RewriteEngine On
RewriteRule #(automatix|compiz) http://www.beerorkid.com/404.html [R,L]


#ErrorDocument 404 /404.html
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth.html)/?(.*)$ [NC]
RewriteRule ^.*$ - [L]

# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
#ErrorDocument 404 /404.html
# END WordPress

Thanks for the help.

beerorkid.com


#18

I wonder where you get the # from.

Replace
RewriteRule #(automatix|compiz) http://www.beerorkid.com/404.html [R,L]

with

RewriteRule ^(automatix|compiz) http://www.beerorkid.com/404.html [R,L]

That should work.


#19

hehe… You got the poor guy so worked up about speed that he’s commenting everything out as he writes it. :stuck_out_tongue:


:stuck_out_tongue: Save up to $96 at Dreamhost with ALMOST97 promo code (I get $1).
Or save $97 with THEFULL97.


#20

Whoops. But it still seems to not hit the HTML 404. I had added the word “html” up at the top so I’d know when I hit 404.html and not the PHP version.

Fixed the ^ / # thing, but I’m still having a problem with it not hitting the HTML page.

Oh well, gonna search around on WordPress for some info, and prob go back to the WP 404.php for a day to see what happens.
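For reference, here is roughly what I think the top of the file should end up looking like once I sort it out (just my own guess, pieced together from the snippets above, and assuming 404.html stays a plain static file in the site root):

# send the dead repo dirs straight to the static page so WordPress never sees them
RewriteEngine On
RewriteRule ^(automatix|compiz) http://www.beerorkid.com/404.html [R,L]

# serve the static page for any other 404 as well, so PHP is not invoked
ErrorDocument 404 /404.html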

Yesterday I used 46000 CPU seconds :frowning:

Thanks again for your help.

beerorkid.com