Any size restrictions for the .htaccess file?


#1

Hi, I am a potential new customer. The host I am currently with (InMotion) gave me an ultimatum: reduce the size of my .htaccess file within three days or my account gets disabled. This came after about a year of using it as described below:

I had been using a large all-encompassing .htaccess Deny list from this website:

https://www.countryipblocks.net/country_selection.php

I am doing that because I want to prevent other countries from accessing my information; there is just no reason for it, since the site is local to my town. The list they generate to allow only the US and Canada (Canada included due to shared ISP networks) turned out to have some holes in it, which caused some unintended blocking of the target audience. My best choice, therefore, is to block every country EXCEPT the US and Canada. At least any holes in that list have not resulted in the target audience getting blocked (so far).

My big question for DreamHost is this: with SHARED HOSTING, is there ANY limit on the size of the .htaccess file? In particular, the file in my root directory ended up with about 80,000 entries after copying and pasting the list generated by that website; it was about a 2.1 megabyte file with everything included. I’d prefer to continue using this method of blocking because it really works, and I find it super easy to download and update my site with.

Also, if that is not feasible, I would be willing to use the free services from CloudFlare in conjunction with a DreamHost account. In particular, I would like to know whether I could use mod_cloudflare on a shared hosting account, so that I could still use .htaccess to block specific IP address ranges if need be. (InMotion requires a dedicated VPS or other non-basic service to use mod_cloudflare.)

Thanks to all for any info
Chris


#2

Using an .htaccess file that large will slow your site down significantly, and may have some negative effects on the server as a whole. Without testing it ourselves, we can’t say whether we’d be able to support that on DreamHost, but, in general, an .htaccess file that large is inadvisable.

If your site uses PHP, you would get much better results by implementing the IP country lookup in your PHP scripts, rather than in the .htaccess file.
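For example, here is a minimal sketch of that approach using the PECL geoip extension. The extension's availability varies by server (hence the function_exists guard), and the US/CA allow list matches the setup described above; treat this as a starting point, not a drop-in.

```php
<?php
// Minimal sketch: allow only US/CA visitors at the top of a PHP page.
// geoip_country_code_by_name() is from the PECL geoip extension, which
// may not be installed on shared servers, hence the function_exists check.

function is_allowed_country(?string $code): bool
{
    // Allow list per the requirement above: US and Canada only.
    return in_array($code, ['US', 'CA'], true);
}

$code = function_exists('geoip_country_code_by_name')
    ? geoip_country_code_by_name($_SERVER['REMOTE_ADDR'])
    : null; // extension unavailable: fail open here (choose your own policy)

if ($code !== null && !is_allowed_country($code)) {
    header('HTTP/1.1 403 Forbidden');
    exit('This site serves US and Canadian visitors only.');
}
```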

As far as CloudFlare goes, that is available on all DreamHost accounts.


#3

I’d investigate a more appropriate method.

ZB Block has a good one: www.spambotsecurity.com/zbblock.php


#4

I have one for you: Say I live in your town and go to Japan on vacation. I want to show my father your site. Can’t. Sad panda :frowning:

Honestly, it’s something I wouldn’t even bother with. For blocking by region, I might use the PHP GeoIP extension - http://php.net/manual/en/book.geoip.php - and toss up a warning flag (“Hi, you’re in Senegal!”), but again, since I can’t possibly predict where people are located, it’s a wasted effort on my part unless I’m totally getting spam-hammered by Russia. Again. And then I’d use http://perishablepress.com/5g-blacklist-2012/ instead.


#5

Unless said potential customer is going for a VPS or PS, PEAR’s implementation would be the way to go, no? The GeoIP extension is not installed on shared servers, at least not on mine.


#6

Yeah, PEAR it in with maxmind’s free country tables (tested it on a stats app here years ago). GeoIP might be more work than necessary tho.
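A sketch of that PEAR route, assuming the Net_GeoIP package and a downloaded copy of MaxMind’s free country database (the database path is a placeholder you would adjust):

```php
<?php
// Sketch: country lookup via PEAR's Net_GeoIP against MaxMind's free
// country database. The database path below is a placeholder.
@include_once 'Net/GeoIP.php'; // PEAR package; the @ tolerates its absence

function is_allowed_country(?string $code): bool
{
    return in_array($code, ['US', 'CA'], true);
}

$country = null;
if (class_exists('Net_GeoIP')) {
    try {
        $geoip   = Net_GeoIP::getInstance('/home/username/geoip/GeoIP.dat');
        $country = $geoip->lookupCountryCode($_SERVER['REMOTE_ADDR']);
    } catch (Exception $e) {
        // lookup failed; decide whether to fail open or closed
    }
}

if ($country !== null && !is_allowed_country($country)) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```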


#7

I geolocate all visitors with PEAR’s GeoIP module as part of a framework I use. You can bet I benchmarked it and cache the results. Lookup takes only a few milliseconds and, if only interested in country-level resolution, updating the database once a month via a cronjob takes < 1 second.
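That monthly refresh can be a single crontab entry. The download URL and local paths below are assumptions based on MaxMind’s free GeoLite distribution, so verify both before relying on this:

```
# crontab entry: refresh the free MaxMind country database on the 1st
# of each month at 04:00. URL and paths are assumptions; adjust as needed.
0 4 1 * * cd $HOME/geoip && wget -q http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz && gunzip -f GeoIP.dat.gz
```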


#8

I didn’t say GeoIP was slow, I said it might be more work than necessary (i.e. for OP to accomplish his intended goal). There’s stuff available online that can be used practically out of the box without needing to muck around too much.

Country code lookups should work on DreamHost.

[php]<?php
// system() echoes the command's output itself (and returns only the
// last line), so a separate echo just duplicates it. The binary is
// typically named geoiplookup; verify the name and path on your server.
system('geoiplookup ' . escapeshellarg($_SERVER['REMOTE_ADDR']));
?>[/php]

EDIT:

Another DreamHoster just posted a snippet in this thread using javascript from http://j.maxmind.com/app/geoip.js that could be utilised on page load by reading the data from the geoip_country_name() function.


#9

Thanks to all so far for the ideas.

Obviously, the more I keep “under the hood” on the Apache server, the less chance outsiders have to “whimsically” redefine my security options for me (e.g. Cloudflare). Since I am not using it for speed or locale caching, it wouldn’t make much difference to me to go back to a regular Apache server model. With Google and other bots (Baidu, Yandex) continuously devising new ways to get into and cache the information on our sites, I feel confident that the Apache server is the best model, and it looks as if the PHP engine can be used as an extension of that utility.

The GeoIP option looks like it could work. Since it uses the PHP engine, which can protect the data served up by individual PHP pages, it can withhold serving a page until the visitor’s country is confirmed.

It would seem that it might work the best under these conditions being met:

  1. these two lines exist in .htaccess:
    Options -Indexes
    IndexIgnore *

  2. All resources OTHER than the index.php (and up to this point for me index.html) are located in subdirectories.

  3. Exact names of files and subdirectories remain undiscovered by the bots, thanks to condition 1 being met. The transition of the index page from HTML to PHP includes renaming all resources it references (to break the existing bot mapping).

I am going to this trouble because I don’t want password-protected pages. Additionally, it looks as if Baidu and possibly Yandex have servers located in the US, so the simple solution is to cut those off at the .htaccess level.

Since this is a “conversion” of sorts, I am hoping I can initially get away with as little conversion work as possible. The home page now is just a static HTML page with a couple of iframes for dynamic content, and from it there are PHP modules and other resources that launch in new tabs or windows. Of course, any bot that can see my index HTML page will be able to map out other unprotected pages from its unprotected references to other files and resources.

I am hoping I could get away with copying much of the existing code on my HTML pages into a PHP page, or rather just pasting the aforementioned GeoIP code at the top of my index HTML document (new PHP tags outside the outermost HTML tags) and then renaming it to a .php page. I am wondering if that could work, considering that my existing HTML documents should still be valid as static HTML content within a PHP page. My HTML pages are hand-coded and rather simple, with some javascript popup functions at most. Sound feasible?
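For what it’s worth, that is exactly how it works: a valid HTML document is also valid PHP, so the conversion can be as small as renaming index.html to index.php and adding one PHP block above the doctype. A sketch, assuming the PECL geoip extension (confirm it exists on your server first; the function_exists guard covers the case where it doesn’t):

```php
<?php
// Top of index.php (formerly index.html). Everything after the closing
// ?> tag is the original hand-coded HTML, pasted unchanged.
// geoip_country_code_by_name() is from the PECL geoip extension.

function is_allowed_country(?string $code): bool
{
    return in_array($code, ['US', 'CA'], true);
}

$code = function_exists('geoip_country_code_by_name')
    ? geoip_country_code_by_name($_SERVER['REMOTE_ADDR'])
    : null;

if ($code !== null && !is_allowed_country($code)) {
    http_response_code(403);
    exit;
}
?>
<!DOCTYPE html>
<html>
  <!-- ...existing markup, including the iframes and javascript popups... -->
</html>
```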

Also, I just wanted somebody to confirm that the GeoIP resources for PHP are indeed available on DreamHost’s SHARED servers.

Thanks again!
Chris


#10

you know about [font=courier]robots.txt[/font], right?


#11

See edit above. A javascript method will probably be super easy for you.


#12

Hi, yes. It is hardly relevant to bots, though; robots.txt is only advisory, and the bots I am worried about simply ignore it.
[hr]

Okay, I will have to take a look at that. Wouldn’t anything in javascript run on the client-side machine and be easy to defeat just by disabling javascript for that website?


#13

If the bots you’re trying to hinder are of the evil scraper variety, they use javascript. I only “re-mentioned” it as you posted about JS and I thought it might be more up your alley - and you mightn’t need to recode your HTML to PHP. The next quickest alternative, I guess, would be to convert to PHP and use the already-present GeoIP (at least on my server; you’ll have to check yours) via a PHP call like the one mentioned above, and whack in an “if country is not USA then die()” or somesuch approach. But that won’t stop proxies, and many bots use them. ZB Block is really good and is practically a drop-in if you do go with PHP. It sniffs lots of stuff (botnets, hacky requests, etc.) really quickly, and you can add any IP blocks you like to your ban list, keeping your .htaccess file squeaky clean and server friendly.
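For reference, the “if country is not USA then die()” idea, built on the geoiplookup snippet from post #8, might look like this. The binary name and its output format are assumptions based on the stock geoiplookup tool; check both on your server:

```php
<?php
// Sketch: deny unless geoiplookup reports US or CA. The output format
// "GeoIP Country Edition: US, United States" is an assumption; verify it.

function is_allowed_lookup(string $lookupOutput): bool
{
    // Match the two-letter country code in geoiplookup's output line.
    return (bool) preg_match('/:\s(?:US|CA),/', $lookupOutput);
}

$out = shell_exec('geoiplookup ' . escapeshellarg($_SERVER['REMOTE_ADDR'] ?? ''));

if ($out !== null && !is_allowed_lookup($out)) {
    die('Access restricted.');
}
```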


#14

Hi, thanks for the suggestions. As far as different web development tools are concerned, I am basically a “jack of all trades and master of none.” I have no real preference of one tool over another, as long as it works and can be customized or implemented with a reasonable amount of effort and time. I have edited and customized some PHP scripts to my liking, so I have a general idea of what I’d be getting myself into. I am confident with what PHP does (server-side) and how well it works once it is up and running.

I guess the biggest difference between configuring .htaccess and PHP is that the PHP method needs to be implemented on a per-page basis, whereas .htaccess covers the entire root or subdirectories. At least I don’t have a lot of pages to do.

I will have a look at ZB Block. Based on your description, it sounds a lot like what I might need.

Thanks
Chris