Just FYI, I’m not necessarily disagreeing or arguing with you; I’m rather enjoying this conversation, and of course my opinions are based on what I’ve observed over the years. I am not a lawyer, etc.
About.com (as well as Ask Jeeves and a number of others) does something different entirely: they wrap your site within a frameset of their own. This is wrong in my opinion, and a lot of people agree; their answer to all of the complaints was to add a “Remove Frames” (or similar) link. This seemed to satisfy most of the complainers, but personally I still don’t like it (they’re getting ad money, attracted by my content)…
As for Google, I am torn. I know that there are special rules for web caches, be it a browser’s cache or an ISP’s – and, for the most part, what Google is doing is pretty much just that, only they cache the page before you visit the site.
So on the one hand, it seems slightly wrong, but on the other, I really, really love that feature. I use it so often it’s pathetic. They have a genuinely unique feature there (and I can only imagine what their file server looks like!), and in those frequent cases where the page description is exactly what you were looking for, and you click only to find a 404 - File Not Found… that ‘cached’ button is a lifesaver.
Actually, I like many things about Google: text versions of PDF files (very handy on a WAP-enabled phone), not to mention they will convert any site to be readable on a WAP device (I’ve been known to read Slashdot on the bus).
Back on topic, though: Google isn’t stealing bandwidth – they only hit the site once. And I do believe you can disallow this with a simple robots.txt file, just as you’d stop any other compliant search engine from spidering (which, mind you, uses the same amount of bandwidth).
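For anyone who hasn’t written one, here is a minimal sketch of such a robots.txt, assuming you want to keep Google’s crawler (it identifies itself as `Googlebot`) out entirely while leaving other crawlers alone. The `example.com` URL and image path are placeholders. Python’s standard `urllib.robotparser` is used to check how a compliant crawler would read the file. Keep in mind that this stops a compliant crawler from fetching the pages at all, so there is nothing left for it to cache.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block Googlebot from the whole site,
# but leave every other compliant crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/gallery/image1.jpg"  # placeholder URL
print(parser.can_fetch("Googlebot", url))     # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```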
Just to show you why I personally think referencing an image from another site is more wrong than the above cases, here’s a scenario:
Site A has an image gallery. One of the images is, say, 100k in size.
Site B links to this image and has 1000 viewers in a given day.
Google indexes and caches this image, and subsequently gets 1000 viewers looking at the cached version.
About.com shows 1000 people Site A’s page in their frameset.
Given the above, here’s what happened:
- Site B used 100k * 1000, or 0.1 gigs of Site A’s resources.
- Google used 100k of bandwidth (1 hit), and they provided the bandwidth for subsequent views.
- About.com used none of Site A’s bandwidth; they simply sent you to Site A, but kept their nav bar, etc. at the top.
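The bandwidth arithmetic for Site B and Google, as a quick Python sketch. The constants come straight from the scenario; the function name is just for illustration, and decimal units are assumed (1 GB = 1,000,000 KB), which is how 100k × 1000 comes out to 0.1 gigs.

```python
IMAGE_SIZE_KB = 100      # the 100k image from the scenario
VIEWERS_PER_DAY = 1000   # viewers per day
KB_PER_GB = 1_000_000    # decimal units, matching "0.1 gigs"


def origin_transfer_kb(requests_to_site_a: int) -> int:
    """Kilobytes Site A's server sends for a given number of image requests."""
    return requests_to_site_a * IMAGE_SIZE_KB


# Site B hotlinks: every one of its viewers pulls the image from Site A.
site_b_kb = origin_transfer_kb(VIEWERS_PER_DAY)
# Google fetches the image once, then serves its cached copy itself.
google_kb = origin_transfer_kb(1)

print(f"Site B: {site_b_kb} KB = {site_b_kb / KB_PER_GB} GB")  # Site B: 100000 KB = 0.1 GB
print(f"Google: {google_kb} KB")                               # Google: 100 KB
```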
Site B is the only one in this situation that actually used Site A’s resources for its own serving needs. Using Site B’s logic, I could set up an image gallery with many thousands of images, yet use (and pay for) very little bandwidth on my hosting account; I’d just use everyone else’s bandwidth.
And again, none of this takes intellectual property into consideration; we’re only thinking in terms of resource abuse.
The reason I think this way is that I run some high-bandwidth sites. One is a shareware site, and in that case I actually don’t mind people linking directly to a file (and I note this on the download page). But only because this promotes sales…
However, on my newest site (let’s call it an “image gallery” ;), I certainly don’t want anyone linking to my images. I don’t even want them saving a copy of them… And thus, I’m using a ton of mod_rewrite, wrapper scripts, and all sorts of other validation. I’m also providing “no-cache” headers, to try and prevent browsers from caching the images.
This is mostly because viewing the images requires a subscription, but if that weren’t the case I’d still do all of that – to avoid another site making $$$ by using my bandwidth.
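To give a rough idea of what one of those wrapper scripts does, here is a sketch as a Python WSGI app. It is not the actual setup described here. It only hands out the image when the Referer header points at one of the site’s own pages, and it sends no-cache headers with the response. `ALLOWED_HOSTS`, `IMAGES`, and `image_wrapper` are hypothetical names, and the in-memory `IMAGES` dict stands in for files on disk; in practice a mod_rewrite rule could route image requests to a script like this.

```python
from urllib.parse import urlparse

# Hypothetical: hosts whose pages are allowed to display the images.
ALLOWED_HOSTS = {"gallery.example.com"}

# Hypothetical stand-in for image files on disk, keyed by request path.
IMAGES = {"/images/photo1.jpg": b"\xff\xd8\xff\xe0 placeholder JPEG bytes"}

# Ask browsers and proxies not to keep a copy of the response.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache, no-store, must-revalidate"),
    ("Pragma", "no-cache"),  # for old HTTP/1.0 caches
]


def image_wrapper(environ, start_response):
    """WSGI app: serve an image only when the Referer is one of our own pages."""
    referer_host = urlparse(environ.get("HTTP_REFERER", "")).hostname
    path = environ.get("PATH_INFO", "")

    # Missing or foreign Referer: treat it as a hotlink and refuse.
    if referer_host not in ALLOWED_HOSTS:
        start_response("403 Forbidden",
                       [("Content-Type", "text/plain")] + NO_CACHE_HEADERS)
        return [b"Hotlinking not permitted\n"]

    body = IMAGES.get(path)
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]

    start_response("200 OK",
                   [("Content-Type", "image/jpeg"),
                    ("Content-Length", str(len(body)))] + NO_CACHE_HEADERS)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, image_wrapper).serve_forever()` from the standard library will serve it. Refusing requests with no Referer at all is a strict choice. Some browsers and proxies strip the header, so legitimate visitors can get blocked. The header is also trivial to forge, so on its own this only stops casual hotlinking.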
In any case, I don’t know if my opinions have any legal basis, but I do know a lot of other webmasters agree with me on this. When you run a small site with a couple of small images and another small site uses one of them, the cost is negligible. But once you get into the many-gigabytes-per-month range, where one site using your image could mean actual $$$ added to your hosting bill, I believe you’d change your mind and agree that permission should be obtained.
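To put a number on it, here is the earlier scenario stretched over a month as a Python sketch. The 30-day month and the $2-per-GB overage rate are made-up figures for illustration, not anyone’s actual pricing. `monthly_hotlink_cost` is a hypothetical helper.

```python
KB_PER_GB = 1_000_000  # decimal units


def monthly_hotlink_cost(image_kb, views_per_day, price_per_gb, days=30):
    """Return (GB transferred, cost) for one hotlinked image over `days` days."""
    gb = image_kb * views_per_day * days / KB_PER_GB
    return gb, gb * price_per_gb


# Scenario numbers: a 100k image, 1000 hotlinked views a day.
# The $2.00/GB rate is hypothetical.
gb, cost = monthly_hotlink_cost(image_kb=100, views_per_day=1000, price_per_gb=2.00)
print(f"{gb} GB/month, about ${cost:.2f}")  # 3.0 GB/month, about $6.00
```

Scale the image size or the daily views up by an order of magnitude and the total grows by the same factor, which is where it starts to show up on a hosting bill.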
Also, briefly: cost to the receiving end is the very reason spam is illegal. It costs the recipient money (bandwidth); maybe indirectly, but it does cost. And if you’ve ever seen the statistics on how much spam costs people/ISPs per year…