How long can a cron job run?


#1

I’m making a website that tracks keyword positions in Google. It uses the custom search API, so it doesn’t violate Google’s TOS. Anyway, I need a script to run twice a week (I wrote it in PHP) that loops through all of the keywords being tracked, querying Google and then inserting the new positions in a MySQL database. From what I’ve read, it would be best to schedule this as a ‘cron job,’ as opposed to running the file from my browser, but I don’t know much about cron jobs and want to make sure that I’m doing this the right way.

Right now the script takes about 2 minutes to run (I’ve been testing on my localhost), but as more keywords are added it will take longer. Is what I’m wanting to do OK?


#2

Cron is just a scheduling mechanism that starts a script or process. Generally, a script will run to completion under cron if it can run to completion from a manual start.

The wiki entry contains everything you should need: http://wiki.dreamhost.com/Cron
Note that there are a number of examples for starting PHP in that wiki entry. There is also a separate wiki entry with more information here: http://wiki.dreamhost.com/Run_php_from_cron
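
As a rough illustration of the scheduling side, a crontab entry for a script that runs twice a week might look like the one below. The script path, the file name keyword_check.php, and the choice of days and time are assumptions made up for this example; the PHP binary path on your server may differ, so check the wiki pages above.

```
# min hour day-of-month month day-of-week  command
# Runs at 03:00 every Monday (1) and Thursday (4).
0 3 * * 1,4 /usr/local/bin/php -q /home/username/keyword_check.php
```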


#3

This is incorrect. You may want to reread the Custom Search Terms of Service:


#4

So there’s no timeout value for a script if it’s started as a cron job?

That confused me at first, too, but I asked on a webmaster forum about that specific sentence, and the consensus was that what I’m doing doesn’t violate the terms of service. There are also some websites, DigitalPoint for example, that have search engine ranking tools, and at least some of them use the custom search API for position checking. Google could easily shut down these tools, which have tens of thousands of users, if they wanted to.


#5

If you read the first 4 lines of the first wiki link, you will learn a lot about cron.

I did misword that slightly. It would have been clearer to say "Generally, if the script will run to completion from a manual start, it will also run to completion from a cron start."

If you configure it to do so, cron can email you, either every time the job runs or only when it produces error output.
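
A minimal sketch of what that configuration could look like, assuming your host's cron supports the MAILTO variable (common implementations do, but check the wiki). The address, script path, and schedule are placeholders, not values from this thread.

```
# Hypothetical address; cron mails any output the job produces to it.
MAILTO=you@example.com
# Normal output is discarded, so mail is only sent when the script writes to stderr.
0 3 * * 1,4 /usr/local/bin/php -q /home/username/keyword_check.php > /dev/null
```

Dropping the `> /dev/null` redirect would instead send mail on every run that prints anything.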

Please review the previously linked pages; most likely many of your questions will be answered there.

The key words in the TOS portion of the discussion seem to be

So find the API documentation and see if what you are doing is expressly permitted. Otherwise, this is something andrewf deals with every day, so I would recognize that his words are pretty darn credible.


#6

So, after hours of messing around, I finally got a cron job running. My test script contained a while loop that printed out a line of text every minute and exited after 60 minutes. I tried it several times and it worked fine, but I want to make sure that I’m not tying up the server by doing this! I read about the “top” command and tried it, with this being the output:

http://img834.imageshack.us/img834/1166/capturedae.png

Does the %CPU mean the percent of the server’s CPU, or the percent of CPU usage for my account?

In other words, was my simple little script tying up 100% of the server’s CPU? Because that would be really bad, right?


#7

That’s 100% of one CPU core, which is bad. There is probably something wrong with your script.


#8

Yikes! I hope I don’t get in trouble :( I ran the script several times before using the “top” command, seeing the CPU usage, getting worried, and killing the process.

This is what is in the php file:

#!/usr/local/bin/php -q
<?php

$firstTime = time();
$oldTime = time();
$i = 0;
while(1)
{
	if(time() - $oldTime >= 60) // if one minute has passed
	{
		echo "testjob " . $i . "\r\n";
		$oldTime = time();
		$i++;
	}
	
	if(time() - $firstTime >= 60*60) // if an hour has passed
		die(); // stop the script
}

?>

I guess I should have known that a while loop that never pauses would take 100% of the CPU. In retrospect, that was really dumb. The “real” script I would like to run takes a long time to run, but I think that’s because cURL takes several seconds when fetching pages. Will that tie up the CPU at 100%, too?
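
For comparison, here is a rough sketch of the same test written so that it sleeps between lines instead of polling time() in a tight loop. The function name runTicks() and its parameters are made up for this example; the idea is only that sleep() hands the CPU back to the system while the script waits.

```php
<?php
// Sketch: print one line per interval, sleeping in between instead of
// checking time() over and over. runTicks() is a hypothetical helper.
function runTicks(int $ticks, int $intervalSeconds): array
{
    $lines = [];
    for ($i = 0; $i < $ticks; $i++) {
        // sleep() blocks the script until the interval has passed,
        // so the process is idle rather than spinning.
        sleep($intervalSeconds);
        $line = "testjob " . $i;
        echo $line . "\r\n";
        $lines[] = $line;
    }
    return $lines;
}

// Example call matching the original test (one line a minute for an hour):
// runTicks(60, 60);
```

Called as runTicks(60, 60), this should take about an hour of wall-clock time like the original, but spend almost all of it asleep.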


#9

I just wanted to update and say that, according to the “top” window, the “real” script uses “0:06.85” CPU time. So I guess it’s not going to be a problem! Apparently cURL doesn’t use 100% CPU while waiting for a page (which is what I was afraid of).
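
If you want to check this from inside a script rather than from “top”, something like the sketch below compares wall-clock time with CPU time. It uses sleep() as a stand-in for waiting on a slow page fetch, which is an assumption for the sake of a self-contained example; cpuSeconds() and measureWait() are hypothetical helper names, and getrusage() may not be available on every platform.

```php
<?php
// Sketch: compare wall-clock time with CPU time across a blocking wait.
// cpuSeconds() and measureWait() are hypothetical helpers for illustration.
function cpuSeconds(): float
{
    $u = getrusage();
    // User time plus system time, converted to seconds.
    return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
         + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
}

function measureWait(int $seconds): array
{
    $wallStart = microtime(true);
    $cpuStart = cpuSeconds();
    sleep($seconds); // stands in for waiting on a slow network response
    return [
        'wall' => microtime(true) - $wallStart,
        'cpu'  => cpuSeconds() - $cpuStart,
    ];
}
```

With a one-second wait, the 'wall' value should come out at roughly a second while 'cpu' stays far smaller, which is the same pattern as a long-running script showing only a few seconds of CPU time in “top”.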