Googlebot cannot see any of my sites


#1

Hi all,
I have a few domain names hosted with DH and have been on here using DH for some time.

However, a month or two ago I got the old “your site is using too much CPU” email from support, so I trimmed back my PHP code and shuffled my MySQL databases around to make things run better. Ever since then, Googlebot has not been able to see my sites: I use Google Sitemaps, and it tells me that it cannot access my site. I then googled many of my domain names that are hosted with DH, and none of them are in Google, or if they are, they are just clinging on with one or two links.

So my question is: where would DH disable Google seeing my website? (Where do they block Googlebot from accessing my site?)
KEEP IN MIND that I have looked in the .htaccess file and the robots.txt file, and nothing in there indicates a block of Googlebot’s IP address. So where ELSE could they block Googlebot’s access?

Your help is greatly appreciated,
James

Remember, I have already checked .htaccess and robots.txt


#2

Do you have your domains added to Google’s Webmaster Tools? It should give you specific error messages in there if Google is having a problem accessing the site.

I guess if they have disabled it you may need to contact customer service to let them know you’ve modified the code to use up fewer resources.

That may not help if Googlebot is repeatedly running the same code, so maybe put something in robots.txt to limit the bot’s behavior and keep it away from the offending functions.

http://wiki.dreamhost.com/Googlebot
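
As a rough illustration of the robots.txt idea above, here is a minimal Python sketch that checks what a set of rules would allow Googlebot to fetch. The robots.txt content, the paths, and the example.com URLs are made up for the example and are not taken from any real site. Also note that, as far as I know, Googlebot does not honor `Crawl-delay`, so the delay line only affects other crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the delay are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search.php
Disallow: /cgi-bin/

User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ordinary pages stay crawlable; the expensive script is kept off limits.
print(rules.can_fetch("Googlebot", "http://example.com/index.php"))
print(rules.can_fetch("Googlebot", "http://example.com/search.php?q=x"))

# Crawlers without their own section fall back to the "*" rules.
print(rules.crawl_delay("SomeOtherBot"))
```

Pasting your real robots.txt into `ROBOTS_TXT` is a quick way to confirm that it is not the file refusing Googlebot.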



#3

Hi, thanks for your fast reply.

The errors in Google sitemaps are:

General HTTP error: HTTP 403 error (Forbidden)
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit… it then goes on to say:

The server is refusing the request. If you see that Googlebot received this status code when trying to crawl valid pages of your site (you can see this on the Web crawl page under Diagnostics in Google Webmaster Tools), it’s possible that your server or host is blocking Googlebot’s access.

HTTP errors /4xx error:
“Likely reasons for this error are that the webserver didn’t understand or couldn’t process the request, the request was forbidden, or the request timed out. If the page has moved and requests for it return a status code of 410, you might consider changing the response to return a status code of 301 and permanently redirect the request.”

It’s very suspicious that they complained about my CPU usage and then, shortly after, my hits from Google died off and I got this reported in Sitemaps :frowning:

Yeah, I have told Google to behave and wait a few seconds between hits, but Google isn’t coming anymore :frowning:
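
One way to narrow down a 403 like the one above is to request the same URL twice, once with a browser-like User-Agent and once with Googlebot's, and compare the status codes. The sketch below assumes the block is keyed on the User-Agent header; it cannot reproduce a block keyed on Googlebot's IP addresses. The function names are made up for this example.

```python
import urllib.error
import urllib.request

# Googlebot's published User-Agent string, plus a generic browser-like one.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0"


def build_request(url, user_agent):
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is what we want.
        return err.code


def diagnose(browser_status, bot_status):
    """Compare the two status codes and describe what they suggest."""
    if bot_status == 403 and browser_status != 403:
        return "blocked by user-agent"
    if bot_status == 403:
        return "forbidden for everyone"
    return "no user-agent block seen"
```

Usage would look like `diagnose(fetch_status(url, BROWSER_UA), fetch_status(url, GOOGLEBOT_UA))`, with `url` set to your own sitemap address.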

Thanks,
James


#4

Your .htaccess and robots.txt are where you should be looking, and if you are not seeing anything there that is denying the bot access, then you might be overlooking an .htaccess file ABOVE your site directory.

Have you tried to prevent hotlinking? Is it possible you inadvertently blocked Google in doing so?
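
Apache applies .htaccess files from every directory on the path down to the one being served, so a rule in a parent directory affects the site below it. The Python sketch below walks upward from a site directory, collects each .htaccess it finds, and flags lines that commonly appear in bot or hotlink blocks. The directory layout, the pattern list, and the function names are assumptions for illustration; a flagged line is only a hint to read that file more closely.

```python
import re
import tempfile
from pathlib import Path

# Directives that often show up in bot or hotlink blocking rules (not exhaustive).
SUSPECT = re.compile(
    r"deny\s+from|require\s+not|user[-_]agent|http_referer|googlebot",
    re.IGNORECASE,
)


def htaccess_chain(site_dir, stop_at):
    """List .htaccess files from site_dir upward, stopping at stop_at."""
    current = Path(site_dir).resolve()
    stop_at = Path(stop_at).resolve()
    found = []
    while True:
        candidate = current / ".htaccess"
        if candidate.is_file():
            found.append(candidate)
        if current == stop_at or current == current.parent:
            break
        current = current.parent
    return found


def suspect_lines(path):
    """Return (line number, text) for non-comment lines that look like blocks."""
    hits = []
    lines = Path(path).read_text(errors="replace").splitlines()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and SUSPECT.search(stripped):
            hits.append((number, stripped))
    return hits


# Demo on a throwaway tree: a clean site directory under a home directory
# whose own .htaccess blocks Googlebot by User-Agent.
with tempfile.TemporaryDirectory() as home:
    site = Path(home) / "example.com"
    site.mkdir()
    (site / ".htaccess").write_text("Options -Indexes\n")
    (Path(home) / ".htaccess").write_text(
        "SetEnvIfNoCase User-Agent Googlebot bad_bot\nDeny from env=bad_bot\n"
    )
    base = Path(home).resolve()
    results = [
        (p.relative_to(base).as_posix(), suspect_lines(p))
        for p in htaccess_chain(site, home)
    ]

for name, hits in results:
    print(name, hits)
```

On a real account you would call `htaccess_chain` with your domain's directory and your home directory in place of the temporary ones.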

It would help us help you if you let us know the site(s) involved and the link(s) to your sitemap(s). :wink:

–rlparker
–DreamHost Tech Support


#5

Yes, I think that is it: it was the .htaccess file in the parent …/ directory, the one above the domain name.

Many thanks,
James


#6

You are welcome, and I’m glad you were able to find it. That said, if your site was excessively resource intensive in the past, and Googlebot had anything to do with it, make sure to monitor your logs closely to see whether Googlebot goes crazy again once you modify that .htaccess file to let it back in…

Otherwise, Googlebot might cripple your server again, and if that happens, you may find it disabled in a way that you cannot undo! :wink:
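
For the log monitoring mentioned above, a small Python sketch along these lines can count Googlebot requests per minute. It assumes the access log uses Apache's combined log format and identifies Googlebot by its User-Agent string only; the sample lines, addresses, and function name are invented for the example.

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_minute(lines):
    """Count requests per minute whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            # "10/Oct/2008:13:55:36 -0700" -> "10/Oct/2008:13:55"
            counts[match.group("time")[:17]] += 1
    return counts


# Made-up sample lines using documentation-only IP addresses.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /index.php HTTP/1.1" 200 2326 "-" "{BOT}"',
    f'192.0.2.10 - - [10/Oct/2008:13:55:41 -0700] "GET /search.php?q=a HTTP/1.1" 200 5120 "-" "{BOT}"',
    '192.0.2.77 - - [10/Oct/2008:13:55:50 -0700] "GET / HTTP/1.1" 200 1043 "-" "Mozilla/5.0"',
    f'192.0.2.10 - - [10/Oct/2008:13:56:02 -0700] "GET /search.php?q=b HTTP/1.1" 403 199 "-" "{BOT}"',
]

hits = googlebot_hits_per_minute(SAMPLE)
for minute, count in sorted(hits.items()):
    print(minute, count)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `googlebot_hits_per_minute(open(path))` with `path` pointing at your own access log.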

–rlparker
–DreamHost Tech Support


#7

Yeah, well, before I didn’t use the DH / Goodies / Spider blocker, so I have that going this time. If it plays up and I get hit hard, I will increase the sleep time between hits to calm it down. Hopefully all will run sweet :).

Thanks again,
James


#8

Good deal and good luck! It seems like you have it under control so here is hoping you have no more trouble with it. :slight_smile:

–rlparker
–DreamHost Tech Support