
Googlebot
03-22-2004, 03:46 PM
Post: #1
Please excuse my ignorance, but would disabling directory listing prevent Googlebot from crawling one's site?
- marsbar
03-22-2004, 03:59 PM
Post: #2
Using a robots.txt file would probably be better for that. Disabling directory listing might make it unlikely for sites not linked from *anywhere* to get harvested...
03-22-2004, 04:32 PM
Post: #3
Quote: Using a robots.txt file would probably be better for that.

I think he meant that he wants to turn off directory indexing, but still have Google index the site.

- Jeff @ DreamHost
- DH Discussion Forum Admin
03-22-2004, 04:36 PM
Post: #4
Quote: Please excuse my ignorance, but would disabling directory listing prevent Googlebot from crawling one's site?

Despite the name, turning off directory indexing is almost completely unrelated to the ability of search engines to index the site. Generally speaking, as long as a page:

A) ...is viewable from the web.
B) ...is linked to from an already indexed site, or you manually tell Google about it.

...you should be okay. The only thing turning off directory indexing would do is prevent Google from finding out about files in that directory by crawling the index listing itself. If they're linked to from other pages or you manually tell Google to look at those specific files, they'll still be indexed.
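To make that concrete, here is a rough Python model of link-based discovery. It is a sketch, not how Googlebot works internally, and the site, its URLs (example.com, /docs/, guide.html, notes.html) and the function names are all hypothetical. notes.html is linked only from the directory listing, so the crawl finds it while listing is on, loses it once /docs/ answers 403, and finds it again if it is submitted directly as a seed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def discover(site, seeds):
    """Breadth-first crawl of a hypothetical in-memory site.

    `site` maps URL -> (status, html). Returns the URLs that answered 200,
    i.e. the ones a crawler could index. URLs are reached only from the
    seeds (manual submissions) or from links found on 200 pages.
    """
    seen, indexed = set(), set()
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        status, html = site.get(url, (404, ""))
        if status != 200:
            continue  # a 403/404 response contributes no links
        indexed.add(url)
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        queue.extend(urljoin(url, href) for href in parser.hrefs)
    return indexed


def build_site(listing_enabled):
    """Hypothetical site: notes.html is linked only from the /docs/ listing."""
    site = {
        "http://example.com/": (
            200, '<a href="/docs/">Docs</a> <a href="/docs/guide.html">Guide</a>'),
        "http://example.com/docs/guide.html": (200, "<p>Guide</p>"),
        "http://example.com/docs/notes.html": (200, "<p>Notes</p>"),
    }
    if listing_enabled:
        site["http://example.com/docs/"] = (
            200, '<a href="guide.html">guide.html</a> <a href="notes.html">notes.html</a>')
    else:
        site["http://example.com/docs/"] = (403, "Forbidden")
    return site


if __name__ == "__main__":
    seeds = ["http://example.com/"]
    print(sorted(discover(build_site(True), seeds)))   # includes notes.html
    print(sorted(discover(build_site(False), seeds)))  # notes.html is missing
```

guide.html survives either way because the home page links to it directly, which is the "linked from an already indexed page" case.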

As Will mentioned, if you really do want to turn off search engine spidering of certain content, you can use robots.txt:

http://www.robotstxt.org/wc/robots.html

- Jeff @ DreamHost
- DH Discussion Forum Admin
03-22-2004, 04:40 PM
Post: #5
Many thanks, Will, for replying.
I have directory listing turned off using .htaccess, and I have a robots.txt that encourages Google to crawl my site. Would the directives in one file affect the other and cause Google to stop crawling my site?
- marsbar
03-22-2004, 04:50 PM
Post: #6
Thank you, Jeff, for 'getting' my question and for providing an answer. Sorry, folks, for not putting my original question very clearly.

What is the most efficient/simplest way to turn off directory listing for only certain directories? Place a .htaccess file in the directories that I do not wish the public to browse?

- marsbar
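One common approach, assuming the host runs Apache and its AllowOverride setting permits the Options directive in .htaccess, is a per-directory .htaccess containing the line `Options -Indexes`, which applies to that directory and its subdirectories. The Python helper below is a sketch only (the name disable_listing is hypothetical): it appends that line to the .htaccess in each chosen directory, creating the file if needed and leaving any existing rules alone.

```python
from pathlib import Path

DIRECTIVE = "Options -Indexes"  # Apache: turn off automatic directory listings


def disable_listing(directories):
    """Append `Options -Indexes` to <dir>/.htaccess for each directory.

    Creates the file if it does not exist, keeps any existing rules, and
    skips files that already contain the directive. Returns the paths changed.
    """
    changed = []
    for directory in directories:
        htaccess = Path(directory) / ".htaccess"
        existing = htaccess.read_text() if htaccess.exists() else ""
        if DIRECTIVE in (line.strip() for line in existing.splitlines()):
            continue  # listing already disabled here
        if existing and not existing.endswith("\n"):
            existing += "\n"
        htaccess.write_text(existing + DIRECTIVE + "\n")
        changed.append(htaccess)
    return changed
```

For example, calling disable_listing(["private", "images"]) from the web root would touch only those two (hypothetical) directories; everything else keeps whatever listing behavior the server default gives it.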
03-22-2004, 07:41 PM
Post: #7
This has been kind of answered already, but here is my 2 cents (with inflation):

Googlebot does not guess at URLs; again, it gets URLs only from links in web pages or from URLs you submit to Google.

Now, with directory listing turned on, Apache automatically generates a web page with links to the contents of the directory. So if Google gets a link to the directory itself, it will then try to index the contents.

However, with directory listing turned off, Googlebot gets the verboten (403 Forbidden) message just like any other web client would, which means that if the contents of the directory are to be indexed, they must be linked to from a web page or submitted manually.
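The two cases above can be modeled roughly as follows. This is a simplified Python sketch, not Apache's code: the function name and parameters are hypothetical, the generated page is far plainer than real mod_autoindex output, and the index-file step reflects Apache's usual DirectoryIndex behavior (a directory containing index.html serves that page whether or not listing is on).

```python
from html import escape
from urllib.parse import quote


def respond_to_directory(files, listing_enabled, index_names=("index.html",)):
    """Rough model of how Apache answers a request for a directory URL.

    1. If an index file (DirectoryIndex) exists, serve it.
    2. Otherwise, with listing on, generate a page linking to every entry.
    3. Otherwise, refuse with 403 Forbidden.
    Only the links matter here; real listing pages carry much more markup.
    """
    for name in index_names:
        if name in files:
            return 200, f"(contents of {name})"
    if listing_enabled:
        links = "\n".join(
            f'<a href="{quote(name)}">{escape(name)}</a>' for name in sorted(files))
        return 200, f"<html><body>\n{links}\n</body></html>"
    return 403, "Forbidden"


if __name__ == "__main__":
    print(respond_to_directory(["report.pdf", "notes.html"], listing_enabled=True))
    print(respond_to_directory(["report.pdf", "notes.html"], listing_enabled=False))
```

Every link in the generated page is something a crawler can follow; the 403 body has none, which is why the directory's files then depend on links from elsewhere.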

So the real question you want to ask yourself is: Do I have any links to this directory itself in other web pages? That is to say, do you have:
Code:
<a href="http://yourdomain.com/directory/">Link to Directory</a>
appearing on any web sites that Googlebot might grab?

If the answer is "No", then there is nothing to worry about. If the answer is "Yes", well, I am not sure how getting the verboten message affects the rank of your site, so you probably want to remove such links from as many pages as you can, especially since humans who follow the link will get the forbidden message too.
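One way to answer that question for your own pages is to scan their HTML for anchors that resolve to the directory URL. Below is a minimal Python sketch using the standard-library HTML parser; the function names are hypothetical, the pages dict stands in for however you load your files, and yourdomain.com/directory/ is the placeholder from the example above. It treats /directory and /directory/ as the same target, since Apache normally redirects the former to the latter.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorHrefs(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _normalize(url):
    # Drop any #fragment and ignore a trailing slash when comparing.
    return urldefrag(url).url.rstrip("/")


def pages_linking_to(pages, directory_url):
    """Return the URLs of pages with an <a href> resolving to directory_url.

    `pages` maps each page's public URL to its HTML source; relative links
    are resolved against that page's own URL.
    """
    target = _normalize(directory_url)
    hits = []
    for page_url, html in pages.items():
        parser = AnchorHrefs()
        parser.feed(html)
        parser.close()
        if any(_normalize(urljoin(page_url, h)) == target for h in parser.hrefs):
            hits.append(page_url)
    return hits
```

Links to individual files inside the directory (such as directory/file.html) are deliberately not counted; those never hit the forbidden listing.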

robots.txt only comes into play if you use it to Disallow access to that directory. Googlebot would then never try to visit the URL and thus would never see a forbidden message, but of course with such a rule in place it wouldn't index any files in that directory either.
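To check what a given robots.txt rule does before relying on it, Python's standard library includes urllib.robotparser. A small sketch follows; the robots.txt content and the allowed helper are hypothetical, and this parser implements the basic prefix-matching rules, so it may not agree with Googlebot's own parser on every extension.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for yourdomain.com that disallows /directory/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def allowed(url, agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://yourdomain.com/index.html",
                "http://yourdomain.com/directory/",
                "http://yourdomain.com/directory/file.html"):
        print(url, allowed(url))
```

Both the directory URL and the file inside it are reported as not allowed, which is exactly the trade-off described above: no forbidden message, but no indexing of that directory's files either.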

Cool Perl / MySQL / HTML+CSS