I have some PHP programs that screen-scrape flow and temperature data from a Bureau of Reclamation site. The URLs I use look like http://www.usbr.gov/pn-bin/yak/arc3.pl?station=YRPW&year=1980&month=10&day=1&year=2010&month=3&day=20&pcode=QD, with the beginning and ending dates in the URL. I run these scripts with lynx -dump. The PHP script looks for a pair of strings in the output that mark the start and end of the useful data:
$cStartStr = "BEGIN DATA";
$cEndStr = "END DATA";
$cPageTail = stristr($contents, $cStartStr);
$nUsefulDataEndPos = strpos($cPageTail, $cEndStr);
$cUsefulData = substr($cPageTail, 0, $nUsefulDataEndPos);
A difficulty has just appeared. If I set the end date too far in the future, "END DATA" doesn't appear at the end of the useful data. Instead, "Error: file access opening fab" appears there. I suppose I could set the end date far enough in the future that I would consistently see "Error: file access opening fab" and use that as my end string, but it would be better to accept either one, depending on which string is encountered. How would you handle that? Do I have to retrieve the URL once to see what the end string is going to be, and again to get the data?
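A single fetch may be enough: both candidate end strings can be searched for in the same $contents, cutting at whichever one shows up first. Below is a minimal sketch of that idea, not a tested drop-in. The function name extractUsefulData and the sample text are hypothetical (the real page layout may differ), and the nullable return type assumes PHP 7.1 or later. As in the snippet above, the result still begins with the start marker.

```php
<?php
// Hypothetical helper (not part of the original script): return the text
// from $startStr up to whichever of $endStrs occurs first after it.
function extractUsefulData(string $contents, string $startStr, array $endStrs): ?string
{
    // Case-insensitive search for the start marker, like the stristr() call above.
    $pageTail = stristr($contents, $startStr);
    if ($pageTail === false) {
        return null; // start marker not found
    }

    // Find the earliest end marker that is actually present.
    $endPos = null;
    foreach ($endStrs as $endStr) {
        $pos = stripos($pageTail, $endStr);
        if ($pos !== false && ($endPos === null || $pos < $endPos)) {
            $endPos = $pos;
        }
    }

    // If no end marker is found, keep everything after the start marker.
    return $endPos === null ? $pageTail : substr($pageTail, 0, $endPos);
}

// Made-up sample standing in for the lynx -dump output.
$sample = "header\nBEGIN DATA\n10/01/1980, 123\nError: file access opening fab\n";
$cEndStrs = ["END DATA", "Error: file access opening fab"];
$cUsefulData = extractUsefulData($sample, "BEGIN DATA", $cEndStrs);
```

Checking each strpos-style result against false with !== matters here, because a position of 0 is a valid match but is falsy in PHP.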