Spambots & htaccess


#1

Hope it’s OK to post this here; not sure where it should go.

I used my .htaccess file to redirect a list of known spambots and graphics harvesters (especially graphics harvesters!) away from my site. The list of user agents is quite long, but it’s effective. Is this too much of a drain (in other words, a no-no) on DreamHost’s servers?

If anyone from DH happens to read this, are there any plans in the future to set the servers up to keep these bots out at a more basic, across-the-board level? This would be less of a drain on the system.


#2

I think it would be risky (and possibly resource-intensive as well) to implement a block of this sort globally.

Are you using mod_rewrite? mod_rewrite does have the potential to cause resource-consumption problems; a poorly written regexp or an error in the file can be especially costly.
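
For what it’s worth, a mod_rewrite version of this kind of block usually looks something like the sketch below (the agent names are just placeholders, not a recommended list). Every RewriteCond is another regexp Apache evaluates on each request, which is exactly where a sloppy pattern starts to hurt:

RewriteEngine On
# Match the User-Agent header case-insensitively; [OR] chains the conditions.
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SomeOtherGrabber [NC]
# Any matching request gets a 403 Forbidden and rule processing stops.
RewriteRule .* - [F,L]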


#3

Thank you for your reply. Using .htaccess is something I’m not very used to, at least not for this; I just didn’t want to have to change all of my pages to PHP. I’m using something like this:

SetEnvIfNoCase User-Agent "^[Ww]eb[Bb]andit" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

ErrorDocument 403 http://www.errorpage.com

Only the list of user agents is, of course, longer. Any thoughts?
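
(In case it helps anyone later: a longer list is just more SetEnvIfNoCase lines, all setting the same bad_bot variable. The extra agent strings below are only examples of the kind of thing that ends up on such lists, not a vetted blocklist.)

SetEnvIfNoCase User-Agent "WebBandit" bad_bot
SetEnvIfNoCase User-Agent "WebZIP" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
# ...one line per agent; the Order/Allow/Deny and ErrorDocument lines above stay the same.

Since SetEnvIfNoCase already matches without regard to case, the [Ww]/[Bb] character classes in the pattern aren’t strictly needed.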


#4

LOL! Yah, banner ads stink, but I don’t feel bad for the unwanted bots. Sorry you had to see the ads tho. :-) Actually, it should just be pointing to nowhere.com (which doesn’t exist) right now, since I don’t know that it’s necessarily good to send 'em to Excite.

Wow! You really pointed out a couple of helpful tips. It would be a good idea to redirect to the shtml pages. The thing with robots.txt is that it’s necessary for the “good” spiders. Hmm…
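
(Side note for later readers: robots.txt only works for crawlers that choose to honor it, which is why it can’t replace the .htaccess block. A minimal one might look like the sketch below, with /images/ standing in for whatever directories the polite spiders should skip.)

User-agent: *
Disallow: /images/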

The encoded pages are done with a program called HTML Protector. I still have to get to my subdirectories, but the important ones are covered. Disable-right-click scripts don’t work very well, but this program encodes the entire page if you want it to. You can get it at http://www.regnow.com/softsell/nph-softsell.cgi?item=8527-2&affiliate=25698 (yes, I’m a shameless affiliate, lol).

Thanks for your help!


#5

According to RFC 2606, you should use example.com, example.net, example.org, or a domain in the “dummy” TLDs .test, .example, or .invalid, if you want to have a link to a known-to-be-fake address.

http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2606.html
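
In this thread’s setup, that would just mean pointing the error page at one of those reserved names, for example:

ErrorDocument 403 http://www.example.com/

(One thing to keep in mind: when ErrorDocument is given a full URL like this, Apache answers with a redirect to that address rather than serving the 403 status directly.)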

– Dan


#6

Thanks, Dan! Didn’t even know those existed, and didn’t want to send the evil bots to eat up someone else’s bandwidth. Have learned so much from the replies here. Thank you.

~Michelle