Php program, insert into MySQL not working

software development

#1

I have a php script that has been in production for a few years that screen scrapes water flow data from a US. Bureau of Reclamation web site and sticks it in a MySQL table for the use of our scientists and for my charting routines. I run it in a cron several times a day so I can catch any updates from the BOR. The cron runs a shell program that calls this script 10 different times to get different sites and parameters based on url arguments.

The script now looks like:
[php]<?php
//getFlowAll.php
//get Prosser and Roza flow (cfs) from BOR website, put into MySQL table on my webhost
// Paul Huffman, 2-5-2007
// Modified from getFlow5.php to add in Flows QJ from Chandler Cannal CHCW
// Paul Huffman, 3-28-2007
// Modified from getFlow6.php to add flows from Umtanum, Parker, Nachess 4/4/2011
//by Paul Huffman
// Copied back to getFlow6.php
// Modified to add flows for New Reservation and Sunnyside canal by PH 4/18/2012
// previous version saved as getFlow6d.php
//Modified from getFlow6.php by PH 1/22/2013 to get flow from just one site Chandler CHCW
//Modified from getFlowChandler by PH 123/2013 to get flow from any site using url parameters
//Modified by PH 8/29/2014 to change arc3.pl in BOR url to webarccsv.pl

require_once("…/ChartDirector/lib/phpchartdir.php");
//$site = “CHCW”;
$site = $_GET[‘site’];
$pcode = $_GET[‘pcode’];

set_time_limit(1090000);

//retrieve flow data from BOR database for site, pcode
//$theurl=“http://www.usbr.gov/pn-bin/yak/webarccsv.pl
//."?station=".$site."&year=2000&month=10&day=1&year=2017&month=5&day=20&pcode=$pcode";
date_default_timezone_set(“America/Los_Angeles”) ;
$yr = date(“Y”);
$month = date(“m”);
$day = date(“d”);
$theurl=“http://www.usbr.gov/pn-bin/yak/webarccsv.pl
."?station=".$site."&year=2000&month=10&day=1&year=$yr&month=$month&day=$day&pcode=$pcode";

echo $theurl, “
”;

$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, $theurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$contents = curl_exec($ch);
curl_close($ch);

// display file
//echo $contents;

// The feed appears to have useful parsing point identifiers.
// I just used them to get everything between them (including flags)
$cStartStr = “BEGIN DATA”;
$cEndStr = “END DATA”;
//$cEndStr = “Error: file access opening fab”;
$cPageTail = stristr($contents, $cStartStr);
$nUsefulDataEndPos = strpos($cPageTail, $cEndStr);
$cUsefulData = substr($cPageTail, 0, $nUsefulDataEndPos);

// explode the content using newlines as delimeters
$aContents = explode(chr(10), $cUsefulData);
//print_r($aContents);
//echo ‘line 34’;
// Two new arrays initialized to empty
//$myDates = array();
//$myTemps = array();
//echo ‘line 38’;
// Set up and test database connection
@ $db = new mysqli(‘bor.ykfp.org’, ‘boruser’, ‘steelhead2’, ‘bor’);
//echo ‘Line 41’;
if (mysqli_connect_errno())
{
echo ‘Error: Could not connect to database. Please try again later.’;
exit;
}
//Echo ‘line 47’;
// Empty the flow table in preparation for filling it up with revised data
echo “DELETE from Flow where SITE = “,$site,”
”;
$query = “DELETE from Flow where SITE = $site”;
$result = $db->query($query);
// skip the leading and trailing junk
// Prolly don’t want to do all these assignments in the loop. They’re just used for readability
for ($i=3; $i<count($aContents)-1; $i++) {

// Dates are formatted as 10 characters
$cDateStr = substr($aContents[$i],0,10);
#convert date string (a date in human language) to a date in programming language
$myDates = strtotime($cDateStr);
//echo $cDateStr, $myDates;

// QD is everything in the trimmed value after the last space
$nQDVal = substr($aContents[$i], strrpos(trim($aContents[$i]), chr(32))+1);
//$query = "insert into Flow values ('".$myDates."', '".$nQDVal."', '".YRPW."')";	
$nQDVal == "MISSING " ? 
	$query = "insert into Flow values ('".$myDates."', null, '".$site."')" :
	$query = "insert into Flow values ('".$myDates."', '".$nQDVal."', '".$site."')";
echo $cDateStr, "\t", $nQDVal,"<br />";	
$result = $db->query($query);
}
//peep scene

//echo(’

’);
//print_r($nQDVal);
//print_r($myDates);
//echo(’
’);

$db->close();
?>
[/php]

Bobcat helped me on this forum several years ago https://discussion.dreamhost.com/thread-136971.html and it’s been running fine since until BOR helped me change to their new url and to limit my query for dates from 2000 to the current date.

What has happened recently is that the flow value that gets inserted after 1-19-2015 is 0. When I run the shell that calls the script in a terminal, the echo in line 93 shows each date and proper flow value clear through to the current date. So how is that flow value not getting inserted into the table? However, when I manually run a delete query on the table, then run the script, the table gets filled correctly from 1-19 to the current date. The script is supposed to delete the table in lines 74 - 76. Is the delete not working?

I tried adding more processing time with line 22, but I’m not sure that overrides time limit set by the system php.ini.


#2

CHANGE YOUR DATABASE PASSWORD NOW!

your script includes your database password. you should keep that in a separate file and include it so you don’t accidentally leak it like this.
[hr]
Are you sure the problem is on your end? http://imgur.com/qnFiRzp


#3

Oh, crap, thanks for pointing that out. I thought I was using a connection file in these scripts.


#4

Here’s a screen shot that shows the code, a browser window that calls the code with the results of the echo in line 78 right before the insert, then a query of the database after the run showing the 0 temperature values inserted after 1/19. It doesn’t make since that the script hits a time limit because the dates get inserted.

https://dl.dropboxusercontent.com/u/9369478/BORdata2.JPG


#5

how are you telling the difference between missing data and some other error if they both show up as 0 in the database? shouldn’t missing data show up in your database as a date + missing value or no row at all? if nothing changed on your end on 19 Jan, then the problem is unlikely to be in your code. i would account for your treatment of missing data first as shown in my screenshot above


#6

Exactly! How can I be getting 0s when the echo of the insert shows the date and the data? The BOR insists nothing has changed on their site, so it must be my code. But I hadn’t changed my code.

If I run the controlling shell program from terminal, sometimes its intermittent, sometimes not!

It always seems to work if I manually submit a Delete query to dump the whole table before running the shell. I’m adding some time to the max_execution_time in my phprc and see what happens tomorrow. But if I was hitting a ex time limit, I’d get an error, not wrong data in the table!


#7

Is it possible that the values you’re trying to insert for those rows are out of range or otherwise invalid for your MySQL table?

Can you give an example of a URL for the raw data you’re pulling from the BOR? It’s possible there’s something weird about the data around that day that’s tripping you up.


#8

http://www.usbr.gov/pn-bin/yak/webarccsv.pl?station=YRPW&year=2000&month=10&day=1&year=2015&month=02&day=06&pcode=QD

This morning, the cron run of this script gave me 0 in the QD for 2-5-2015, not 0 back to 1-19, like it did several times yesterday, so current date - 1. Maybe the additional execution time I added in the phprc helped. Maybe I need to add more. Today, when I ran a delete query to dump the whole table, then ran the shell that usually runs in cron, all the data got properly inserted.

The script first runs a delete query on the table where site = the current site, then constructs a url to get a web page of flow for the current site from 10-1-2000 to current date. The script looks for END DATA to stop the data collection. The current day’s QD (flow) is shown in the retrieved webpage as “MISSING” until late in the day, but that’s expected and it’s all right that QD for current date is 0.

I could change the cron to run a new script that just does the DELETE query on the whole table before running the script to get the data for all the sites, but that seems inelegant.


#9

Today the script decided to insert 0 for flows starting on 2-6, rather than 1-19 when it first started, and rather than current day -2 late last week. This inconsistency makes no sense.

I do know that if I dump the whole database table for all sites before running the shell that runs the script in a cron for all sites, the table gets good data right up to the current day. Maybe I’ll have to run another cron procedure right before this data grab is scheduled to get this to work again. The php script runs a delete query just for the current site before getting the new data. I tried adding more execution time in the phprc, but it didn’t seem to help. If the delete query were delaying the execution enough that the script was hitting a time limit, I think I would be getting error messages, not zeros in the table.


#10

I think you should find a way to make MISSING values show up as such in the database first because there’s no way to distinguish between them and the errors you are seeing. There is consistency but you are just not seeing it and that’s probably the reason why. Programmes are going to do the same thing every time, so you have to find the way of disentangling two phenomenon which producing the same results in the database. It’s going to be hard to troubleshoot until you do that.


#11

I thought about handling MISSING values differently before, but 0 as missing value has worked well for us. I’m working with river flows here so the flow never gets to 0 naturally in our situation. ) gets treated in our fish passage models and my charting as missing values. Same with Temperature, our water temperatures are not going to get to 0 F. But I could add a few lines so that -1 is inserted into flow WZ when MISSING is seen.

WZ, the receiving item in the table, is typed float, 0 length, 0 decimals, nulls allowed. Hey, doesn’t that seem wrong that the length is 0? Is that causing strange results?

Someone else suggested that I modify my cron to email me any errors. A good idea, but when I run the cron job in terminal, I don’t see any errors. Just wrong data inserted into the table even though I see the right data on the screen as the result of the echo statement right before the INSERT.


#12

Hey, I have an idea.

When I went to try Bobocat’s idea to insert something besides 0 in the flow parameter for values that are returned as “MISSING” I saw that it already was set up that way:

[php]$nQDVal = substr($aContents[$i], strrpos(trim($aContents[$i]), chr(32))+1);
//$query = “insert into Flow values (’”.$myDates."’, ‘".$nQDVal."’, ‘".YRPW."’)";
$nQDVal == “MISSING " ?
$query = “insert into Flow values (’”.$myDates.”’, null, ‘".$site."’)" :
$query = “insert into Flow values (’”.$myDates."’, ‘".$nQDVal."’, ‘".$site."’)";
echo $cDateStr, “\t”, $nQDVal,"
";
$result = $db->query($query);[/php]

Therefore, when the script finds "MISSING ", the insert should be null. It should never be 0. I wonder if the following blank is now gone from the return from BOR? Because it looks like the condition is never evaluated as true, and the first option in the ternary is never executed.

But, oh crap, that doesn’t explain why I’m getting 0 when there actually is a number there, and $nQDVal shows a number in the echo.


#13

[php]
$nQDVal == “MISSING " ?
$query = “insert into Flow values (’”.$myDates.”’, null, ‘".$site."’)" :
$query = “insert into Flow values (’”.$myDates."’, ‘".$nQDVal."’, ‘".$site."’)";
[/php]

I have a terrible, sinking feeling that operator precedence is biting you in the ass here. I’m not exactly sure how this would end up parsing, but I suspect it’s not right.

Try rewriting this as an “if” statement and see if that makes any difference:

[php]
if ($nQDVal == “MISSING”) {
$query = …
} else {
$query = …
}[/php]

Also, is “MISSING” supposed to have a space after it in your code?


#14

Yeah, Andrew, there was an extra space needed because at some point, there was a change in the string I was getting pack from the url had that blank in it. To get around blank counting, I just tried this:

[php]// QD is everything in the trimmed value after the last space
$nQDVal = substr($aContents[$i], strrpos(trim($aContents[$i]), chr(32))+1);
//$query = “insert into Flow values (’”.$myDates."’, ‘".$nQDVal."’, ‘".YRPW."’)";
#$nQDVal == “MISSING " ?
fnmatch(“SS”,$nQDVal) ?
$query = “insert into Flow values (’”.$myDates.”’, null, ‘".$site."’)" :
$query = “insert into Flow values (’”.$myDates."’, ‘".$nQDVal."’, ‘".$site."’)";
echo $cDateStr, “\t”, $nQDVal,"
";
$result = $db->query($query);[/php]

So this ran, matched “SS”, but this time nulls were inserted into the database for the lines that matched. However the echo showed “MISSING” rather than null. Still that’s better than 0.

However, I had to delete the whole table before running or I would still get 0s in current date and current date -1. I wonder how I can reset the database so it is the same as the first time the procedure runs in the morning so I can test against that condition.


#15

Ok, this is it! It was the Delete query all the time.

I dumped all the lines in the script after the Delete query, so I could echo the query and to see if it emptied the table for the site that was being updated. This new version of the script didn’t empty the table. I looked at the echo. It said

Got me wondering if it was supposed to have a semicolon at the end or what about the quotes around YRPW. I copied and pasted the echo into a SQL tool I have and it wouldn’t run until I put single quotes around YRPW. Back to editing the script and changed
[php]$query = “DELETE from Flow where SITE = $site”;[/php]
to
[php]$query = “DELETE from Flow where SITE = ‘$site’”;[/php]
and the records were out of the table.

So I think something happened back on 1-19 that inserted 0s in the QD, then they never got overwritten for a week until noticed. With this change, the tables will be properly cleared then renewed each morning, and those changes to better handle missing data will also be an improvement.