I've got a site here that's been moved to an evaluation server because it was using too many resources on its shared server. It's there so we can look it over and see what we can trim down to make it more neighborly, if at all possible. On an evaluation server you get access to some resource logs that I'm having trouble reading. I've been through every knowledge base (kbase) article I can find, and none of them explain how to interpret these logs. Can anyone help me out here?
It generates two different files for review per day. The first is raw data in this format:
user 0.04 cpu 1573k mem 0 io index.php
There are about a bazillion lines like that. The second is an overview/analyzed file that has column headers and looks a bit different:
Process CPU seconds user machine count average
index.php 26477.5600 87.641% 110.323% 15485 1.710
That overview file appears to take each process (file) in the big raw data file and tally up its numbers.
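A rough Python sketch of what the overview file appears to compute: parse each raw line, then tally CPU seconds and a count per process. The field layout is an assumption inferred from the single sample line above (the first field is taken to be the account name), and `parse_raw_line` and `summarize` are hypothetical helper names. The "average" column here is simply total CPU seconds divided by count, which is consistent with the index.php row: 26477.56 / 15485 ≈ 1.710. What the user and machine percentages are measured against isn't clear from the logs alone, so the sketch leaves them out.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# ASSUMED raw layout, inferred from the one sample line
# "user 0.04 cpu 1573k mem 0 io index.php":
#   <account> <cpu-seconds> cpu <memory>k mem <io> io <command>
RAW_LINE = re.compile(
    r"^(?P<account>\S+)\s+(?P<cpu>\d+(?:\.\d+)?)\s+cpu\s+"
    r"(?P<mem_k>\d+)k\s+mem\s+(?P<io>\d+)\s+io\s+(?P<command>.+?)\s*$"
)


def parse_raw_line(line: str) -> Optional[Tuple[str, float]]:
    """Return (command, cpu_seconds) for one raw line, or None if it doesn't match."""
    m = RAW_LINE.match(line)
    if m is None:
        return None
    return m.group("command"), float(m.group("cpu"))


def summarize(lines: Iterable[str]) -> List[Tuple[str, float, int, float]]:
    """Tally per-command CPU seconds and counts, largest CPU total first.

    Each row is (command, total_cpu_seconds, count, average),
    where average = total_cpu_seconds / count.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        parsed = parse_raw_line(line)
        if parsed is None:
            continue  # skip blank or unexpected lines instead of crashing
        command, cpu = parsed
        totals[command] += cpu
        counts[command] += 1
    rows = [(cmd, totals[cmd], counts[cmd], totals[cmd] / counts[cmd]) for cmd in totals]
    rows.sort(key=lambda row: row[1], reverse=True)  # biggest CPU consumers first
    return rows


if __name__ == "__main__":
    # Made-up lines in the assumed raw format (not real log data).
    sample = [
        "user 0.25 cpu 1573k mem 0 io index.php",
        "user 0.75 cpu 1600k mem 0 io index.php",
        "user 0.50 cpu 900k mem 2 io cron.php",
    ]
    for command, total, count, avg in summarize(sample):
        print(f"{command:<12} {total:10.4f} {count:8d} {avg:8.3f}")
```

Sorting by total CPU seconds puts the "big things to attack" at the top, which is how the index.php row stands out in the real overview.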
It's hard for me to fully understand these logs without any reference. I think I have a pretty good idea, or at least an overview of the big things to attack, but without a definitive reference I'm a little leery that I'm missing something.
There is a Perl script that analyzes the huge data files (which I assume are generated by sa(8), "print system accounting statistics"), but I'm having trouble running down anything that helps me (the completely ignorant *nix admin) make sense of the output.
Right now it looks as though the major problem is search engine spiders: at any given time they account for anywhere from 100 to 400 times the traffic of normal users on the site. The largest offender is Yahoo Slurp, and we are clamping down on that robot. MSN also supports a robots.txt directive we can use (Crawl-delay). Google, however, appears to have nothing in place that would let us restrict its crawling of our site in any way (short of a complete ban), and it is among the very top offenders.
Does anyone know of a way to tell Googlebot to slow down, the way you can with MSNbot and Yahoo Slurp using Crawl-delay? The only thing I've been able to find for Google is that you have to contact them personally and give them a crawl delay to feed to the bot manually...
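For the robots that honor it, Crawl-delay goes in robots.txt under the matching User-agent group. The Python sketch below uses the standard library's `urllib.robotparser` to check what delay a given robots.txt asks of each bot. The 10-second values are placeholders, not a recommendation, and `delays_for` and `max_fetches_per_day` are hypothetical helper names. The parser only reports what the file says, not whether a crawler obeys it; with no Googlebot group and no delay in the `*` group, this file asks nothing of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the 10-second delays are placeholder values.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def delays_for(robots_txt, agents):
    """Map each user-agent name to the Crawl-delay the file requests (None if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


def max_fetches_per_day(delay_seconds):
    """Upper bound on fetches per day for one bot that waits delay_seconds between fetches."""
    return 86400 // delay_seconds


if __name__ == "__main__":
    for agent, delay in delays_for(ROBOTS_TXT, ["Slurp", "msnbot", "Googlebot"]).items():
        print(agent, delay)
    print(max_fetches_per_day(10))
```

As rough arithmetic, a bot that waits 10 seconds between requests can make at most 86400 / 10 = 8640 fetches per day, which gives a sense of how much a given delay caps a single compliant crawler.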
All help is appreciated!