.htaccess processing time and whack-a-mole


I’ve been collecting a database of IP addresses which are doing all forms of badness to my WordPress installs. I’ve written a Perl script which post-processes logs from a few of my sites and looks for people requesting stuff that I don’t have (and have never had). I’ve also set a few traps, like telling robots (via robots.txt) to ignore /blackhole, and I’ve watched lots of bots fall in. I’ve had lots of bots try to hack WordPress or search for compromised plugins. The script writes the bad players to a MySQL database.

The database has just shy of 1,600 entries. My question is: if I created a 1,600-line .htaccess file denying each address individually, how much overhead would this cause Apache? Would I notice it?
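
For a sense of what such a file would contain, here is a minimal Python sketch that turns a list of addresses into one deny rule per address. It assumes the Apache 2.2-style `Order`/`Allow`/`Deny` directives, and the function name `build_deny_block` and the sample addresses are made up for illustration; the real list would come from the MySQL database.

```python
import ipaddress

def build_deny_block(addresses):
    """Return Apache 2.2-style access rules denying each address once."""
    # Parsing validates each entry and lets the set drop duplicates.
    unique = {ipaddress.ip_address(a.strip()) for a in addresses if a.strip()}
    lines = ["Order Allow,Deny", "Allow from all"]
    # Sort by IP version, then numerically, so the file is stable between runs.
    for ip in sorted(unique, key=lambda ip: (ip.version, int(ip))):
        lines.append(f"Deny from {ip}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical sample data from the documentation address range.
    sample = ["192.0.2.10", "192.0.2.9", "192.0.2.10"]
    print(build_deny_block(sample), end="")
```

With 1,600 entries this produces a file of roughly 1,600 lines, which is the size being asked about.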

The reason this process is futile is that it’s whack-a-mole: I’m identifying lots of potentially compromised PCs which may be cleaned up in the future and pose no risk after cleanup, but which are dangerous while they’re controlled by the bad people behind them.

Yeah, you’d notice it. It’s hard to say how much overhead it’d cause, since it depends on (a) that list, and (b) how often they’re banging on your door. The general rule is that a bigger .htaccess is slower, since Apache reads and checks it on every request.

Have you considered trying this? http://perishablepress.com/5g-blacklist-2012/

While I’m not a super huge fan of having to use htaccess for this, this bad-behavior type file (and the ultimate blacklist) are a lot more dynamic than blocking by IP.

I was actually looking into this recently. I have an app hosted that isn’t very useful to most people, but it’s huge, and gets hit with tons of traffic that is causing a bloated cache folder.

The blacklist looks cool. Have you considered blocking entire countries? Unfortunately some, not to name any names, produce a lot more spam/malicious traffic, and blocking whole IP ranges works well for those.

You can check your logs for the most common patterns. I found many hits came from 180.76.5.x (actually that’s a well-known crawler, but so what), so just block the whole range like this:
Deny from
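
To spot ranges like that in your own data, a rough Python sketch can group the offending addresses by /24 network and count them. It assumes IPv4 addresses only, and the name `busiest_ranges` and the sample addresses are invented for the example.

```python
import ipaddress
from collections import Counter

def busiest_ranges(addresses, top=5):
    """Count addresses per /24 network and return the most common ones."""
    counts = Counter(
        # strict=False masks off the host bits, e.g. x.y.z.77/24 -> x.y.z.0/24
        ipaddress.ip_network(f"{a}/24", strict=False)
        for a in addresses
    )
    return [(str(net), n) for net, n in counts.most_common(top)]

if __name__ == "__main__":
    # Hypothetical sample data from the documentation address ranges.
    hits = ["198.51.100.1", "198.51.100.77", "203.0.113.5"]
    for net, n in busiest_ranges(hits):
        print(net, n)
```

A /24 that shows up far more often than the rest is a candidate for a single range rule instead of many per-address lines.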

You can also look for common user agents and block requests based on the user agent instead of checking IP ranges. No legitimate user is going to use a funny-looking user agent, but I bet you can find common malicious ones if you record the user agent string and look through the logs:
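
One way to do that log review, sketched in Python: count how often each user agent string appears. This assumes the logs are in Apache's combined format, where the user agent is the last quoted field on each line; the name `count_user_agents` and the sample log lines are made up, and user agents containing escaped quotes are not handled.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(log_lines):
    """Return a Counter mapping each user agent string to its number of hits."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical log lines; real ones would be read from the access log.
    sample = [
        '192.0.2.1 - - [10/Oct/2012:13:55:36 -0700] "GET /blackhole HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
        '192.0.2.2 - - [10/Oct/2012:13:55:40 -0700] "GET /wp-login.php HTTP/1.1" 200 1024 "-" "ExampleBot/1.0"',
        '192.0.2.3 - - [10/Oct/2012:13:56:01 -0700] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    ]
    for agent, hits in count_user_agents(sample).most_common():
        print(hits, agent)
```

Agents that appear mostly on requests for traps like /blackhole are the ones worth a closer look before blocking.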

It’d be nice if DH offered a facility to block requests higher up than .htaccess since it is so slow.

Yeah, sadly iptables doesn’t like our virtualization.

I actually have a number of legit users from Russian IPs that are normally bad news, so I’ve never been much for trying to block by IP. I prefer to base it on behavior vs location.