Banning a domain


#1

Anyone know how to ban a specific domain from your
site using the htaccess file?

I’ve tried some combinations using the variable
but my syntax must be wrong because each time I upload
the htaccess file it spits out a 500 internal server error for
anyone accessing the site.

Basically all I want to do is ban one specific domain as my
images are being linked to and stolen. Any ideas?

TIA.


#2

The Apache docs are a great source of information for this type of situation. In this case, an Allow/Deny combination won’t work: it’s not the domain pulling the images, but rather the visitors of that domain (if I’m correct in assuming they are simply linking to your images).

What you want is something that will block by the REFERER environment variable. <Location>, first of all, is intended to match a URL on your own site, not a remote domain. Additionally, <Location> is not valid in an .htaccess file – only in httpd.conf and within a <VirtualHost> container.

Now, on to the problem at hand, I know of two ways. One is to use mod_rewrite.

Unfortunately mod_rewrite is not for the faint of heart; it can cause serious problems if used incorrectly, and it takes some time and practice to get right. Plus, I’m not sure if mod_rewrite is enabled on DH (some hosts allow it, others don’t, and for good reason).
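
For what it’s worth, here’s a minimal, untested sketch of what that looks like in .htaccess – badsite.example is a placeholder for the offending domain, and the extension list is only an example:

# sketch, assuming mod_rewrite is enabled; badsite.example
# is a placeholder for the offending domain
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite\.example/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]

The [F] flag returns a 403 Forbidden for any image request whose referer matches that domain; everyone else gets the images as usual.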

The other way is a bit more complex, and is what I am using on my newest site. It involves serving images via a script that first validates the REFERER (as well as their site login in my case), then serves the image data. This, also, isn’t for inexperienced users (I’m not sure what your level of experience is…)
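
One piece of the Apache side of a setup like that can be sketched simply (the validating script itself is omitted, and serve.php is a hypothetical name): deny direct requests to the image files, so the only route to them is through the script, which checks the referer/login and reads the files from disk itself – and so isn’t affected by the Deny.

# sketch: block direct HTTP requests for the image files; pages link
# to a validating script instead (e.g. a hypothetical
# /serve.php?img=foo.jpg), which reads the files from disk
<FilesMatch "\.(gif|jpe?g|png)$">
Order allow,deny
Deny from all
</FilesMatch>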

Now, after all of that – my personal recommendation would simply be to contact that site. What they are doing can be considered illegal (IANAL) and most certainly is wrong IMO. If they refuse to stop, then take the next step and contact their hosting provider. In most cases, one of these two steps should stop the abuse.

I do admit it’s more fun to use mod_rewrite or a wrapper script, and to give them an image with “This image is stolen…” on it :wink: See how long it takes them to remove the references…
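
That trick is only a small change to the first sketch above – instead of forbidding the request, hand them a substitute image (stolen.gif being a hypothetical placeholder file):

# sketch, reusing the RewriteEngine On from above; the second condition
# keeps the substitute image itself from being rewritten in a loop
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite\.example/ [NC]
RewriteCond %{REQUEST_URI} !/stolen\.gif$
RewriteRule \.(gif|jpe?g|png)$ /stolen.gif [L]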

Hope this helps somehow…

  • Jm4n

#3

There’s a difference between “fair use” and bandwidth theft. If they made a copy of the image on their own server, they might be able to argue fair use. But they are placing an image in their page that is pulled from the original poster’s domain, thus using his bandwidth to serve the image. Regardless of the copyright status of the image, they are utilizing someone else’s resources to serve their website, and this is illegal without permission.

Back to fair use, if you link to a web page, rather than including their content on your page, this shouldn’t be a problem, but personally I would recommend always obtaining permission.

Quoting in a book often doesn’t require permission, but the laws on that have a lot of grey areas. There is no defined limit on the number of words that can be copied before you cross from “fair use” into “plagiarism”… it’s such a fine line, and I’ll always take the safe side and ask first.

On the web, it’s far too common for one to use other people’s content. Most of the time it’s due to ignorance; the offender doesn’t realize they are doing something wrong. Ignorance, however, is no excuse (for any law).

Now, OTOH, if they outright refuse to remove the image, that calls for more drastic actions. I’ve found that contacting the host usually helps, but it depends on the host. If they are in the US, you can usually count on cooperation; if not, good luck…

By the way, you have to be careful with the REFERER variable. It is sent by the web browser, and may not always be implemented as you expect: browsers behave differently, and some proxies and caches strip these headers out. The result: your “stolen image” placeholder showing up for legitimate visitors on your own site… My rule is to never trust outside input from any source, but I’m paranoid like that (I always use the -T flag in Perl ;).
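
To put that caveat concretely, the safer pattern is to block only when a referer is present and belongs to someone else, letting blank referers through. A sketch, with yoursite.example standing in for your own domain:

# sketch: the first condition lets referer-less requests through
# (proxies may strip the header), so only requests that explicitly
# claim to come from another site are forbidden
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yoursite\.example/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]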

  • Jm4n

#4

Just FYI, I’m not necessarily disagreeing or arguing with you; I’m rather enjoying this conversation, and of course my opinions are based on what I’ve observed and seen over the years. I am not a lawyer, etc :slight_smile:

About.com (as well as Ask Jeeves, and a number of others) does something entirely different. What they are doing is wrapping your site within a frameset of their own site. This is wrong in my opinion, and a lot of people agree; their answer to all of the complaints was to add a “Remove Frames” or similar link. This seemed to satisfy most of the complaints, but personally I still don’t like it (they’re getting ad money, attracted by my content)…

As for Google, I am torn on what I think. I know that there are special rules for web caches, be they browser caches or ISP caches – and, for the most part, what Google is doing is pretty much just that, only they cache pages before you visit a site.

So on the one hand, it seems slightly wrong, but on the other, I really, really love that feature. I use it so often it’s pathetic. They have a really unique feature there (and I can only imagine what their file server looks like!), and in those frequent cases where the page description is exactly what you were looking for, and you click only to find a 404 - File Not Found… that ‘cached’ button is a life saver.

Actually I like many things about Google: the text version of PDF files (very handy on a WAP-enabled phone), not to mention they will convert any site to be readable on a WAP device (I’ve been known to read Slashdot on the bus :wink: ).

Back on topic, though, Google isn’t stealing bandwidth – they only hit the site once. And I do believe you can disallow this with a simple robots.txt file, just as you’d stop any other compliant search engine from spidering (and using the same amount of bandwidth mind you).
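
For example, a robots.txt along these lines should keep Googlebot (or any other compliant spider) out of an image directory – the path is only an illustration:

# example robots.txt; /images/ is a placeholder path
User-agent: Googlebot
Disallow: /images/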

Just to show you why I personally think referencing an image from another site is more wrong than the above cases, here’s a scenario:

Site A has an image gallery. One of the images is, say, 100k in size.

Site B links to this image and has 1000 viewers in a given day.
Google indexes and caches this image, and subsequently gets 1000 viewers looking at the cached version.
About.com shows 1000 people Site A’s page in their frameset.

Given the above, here’s what happened:

  • Site B used 100k * 1000, or 0.1 gigs of Site A’s resources.
  • Google used 100k of bandwidth (1 hit), and they provided the bandwidth for subsequent views.
  • About.com used none of Site A’s bandwidth; they simply sent you to Site A, but kept their nav bar etc at the top.

Site B is the only one in this situation who actually used Site A’s resources for their own serving needs. Using Site B’s logic, I could set up an image gallery with many thousands of images, yet use (and pay for) very little bandwidth on my hosting account – I’d just use everyone else’s bandwidth.

And again, none of this takes intellectual property into consideration; we’re only thinking in terms of resource abuse.

The reason I think this way is that I run some high-bandwidth sites. One is a shareware site, and in that case I actually don’t mind people linking directly to a file (and I note this on the download page). But only because this promotes sales…

However, on my newest site (let’s call it an “image gallery” ;), I certainly don’t want anyone linking to my images. I don’t even want them saving a copy of them… And thus, I’m using a ton of mod_rewrite, wrapper scripts, and all sorts of other validation. I’m also providing “no-cache” headers, to try and prevent browsers from caching the images.
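
The header part of that is simple enough to sketch, assuming mod_headers is available (the extension list is illustrative, not my actual setup):

# sketch: ask browsers and caches not to store the images
<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
</FilesMatch>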

This is mostly because viewing the images requires a subscription, but if that weren’t the case I’d still do all of that – to avoid another site making $$$ by using my bandwidth.

In any case, I don’t know if my opinions have any legal basis, but I do know a lot of other webmasters agree with me on this. When you run a small site, with a couple of small images, and another small site uses your image, it’s negligible. But once you get into the many-gigabytes-per-month range, where one site using your image can mean actual $$$ added to your hosting bill, I believe you’d change your mind and agree that permission should be obtained :slight_smile:

Also, briefly, this is the very reason SPAM is illegal: it costs the recipient money (bandwidth). Maybe only indirectly, but it does cost. And if you’ve ever seen the statistics on how much spam costs people/ISPs per year…

  • Jm4n

#5

mod_rewrite? I suspect this poor individual would prefer to remain ulcer-free. I hate mod_rewrite. :>

The simplest way to fix the problem, assuming they are just linking to images stored on your server, is to rename the image files and all references to them. Then create images with the words “This site is stealing graphics from my site at http://www.blah.com/”. You could even make some of the other graphics as humorous as you’d like. I’ve done this in the past, and it’s much fun to change other people’s site content.

As for ‘fair use’, my un-lawyerly understanding is that only minuscule portions can be ‘quoted’ or used, with attribution, if you are critiquing or reviewing the work, etc. You usually can’t put the whole thing up wholesale, and you definitely can’t give the impression that it’s your own content.

As for a webcam site, it’s possible that the operator got permission from those cam sites. It’s also possible that he/she just linked their images without permission. In the latter case, either they don’t know or don’t care that someone put their cam on the site.

  • Jeff @ DreamHost
  • DH Discussion Forum Admin

#6

I hate mod_rewrite. :>

For a first time user, it’s not a good choice, which is why I recommended against using it if you don’t know what you’re doing. However, if you’re any kind of web programmer, mod_rewrite can be your best friend. I love it – with it, I can make a dynamic, database-driven website look like thousands of static HTML pages.

In fact this is what I’m doing with my site. It will appear to have over 4500 HTML pages, when in reality, only about 10 PHP files make up the whole site. And a lot of images, and data :slight_smile:

Why? Well, search engines don’t like to index URLs that look dynamic, and dynamic pages generally aren’t cacheable. mod_rewrite can turn this:

/images/400x300/000442.html

into this:

/pic.php?Size=400x300&ImageID=442

:wink:
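
The rule behind a mapping like that might look roughly like this (a sketch – it assumes the zero-padded number in the file name is the image ID):

# sketch: internally rewrite the static-looking URL to the PHP script
RewriteEngine On
RewriteRule ^images/(\d+x\d+)/0*(\d+)\.html$ /pic.php?Size=$1&ImageID=$2 [L]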

  • Jm4n

#7

Why? Well, search engines don’t like to index URLs that look dynamic, and dynamic pages generally aren’t cacheable. mod_rewrite can turn this:

This is a good point – one I’d never thought of. Generally, when that has been an issue, I just wrote a Perl script that iterated through a list of data and generated the pages, usually every 6 or 12 hours. The sites in question didn’t need to be updated any more often than that, which made that approach pretty feasible.

  • Jeff @ DreamHost
  • DH Discussion Forum Admin