Script for page/image retrieval?



I’d use wget. Assuming the image doesn’t change location, that should work. As long as stderr is sent to your email, you’ll get an email if the cron job fails.

basic syntax would be:

Hope that is what you were looking for.


Hi Bob

This is the other Wil here …

To grab a web page off a remote server using Perl, it’s best to go for the LWP set of modules. They should already be installed on DH servers, although I haven’t checked. Then scroll down to see the other module I have used too.

This code is unchecked, and if it doesn’t work, I will write you a regex when I’m back at my decent machine in the morning.


[code]
use strict;
use warnings;
use CGI;
use LWP::Simple;
use HTML::TokeParser;

my $myremoteurl  = "";   # URL of the remote page goes here
my $myremotepage = get($myremoteurl);
die("Could not fetch $myremoteurl") unless defined $myremotepage;

# Now you could use a regex to find the specific tag
# you are looking for. Me? Well, I don't particularly like
# to write regexes so late into the night, so I'll
# cheat and rely upon another module to do at
# least some of the hard work ...
my $parser = HTML::TokeParser->new(\$myremotepage);
my $myimageurl;

while (my $token = $parser->get_tag("img")) {
    if ($token->[1]{src}=~/test0(.*).jpg/ig) {
        $myimageurl = $token->[1]{src};
        last;
    }
}
die("No such IMG tag on page") unless defined $myimageurl;

# You can then access your IMG url using the variable
# $myimageurl, like so:
print qq|$myimageurl\n|;
[/code]

Now this is untested code so I’m not sure if it will work. In theory it should, but with the number of hours spent in front of this monitor today, I won’t be surprised if it doesn’t.

Some notes on extending the above. If the image is rather large, it would be beneficial for you to pull the WIDTH and the HEIGHT attributes of the image too. This can easily be done by adding the following to your script.

[code]
$myimagewidth  = $token->[1]{width};
$myimageheight = $token->[1]{height};
[/code]
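For anyone who wants to try the attribute lookup without hitting a live server, here is a minimal sketch that runs HTML::TokeParser over an inline string. The `find_image` helper, the page fragment and the file names in it are made up for illustration; only `HTML::TokeParser->new` and `get_tag` come from the module itself.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the src, width and height of the first IMG tag
# whose src matches $pattern, or nothing if no such tag exists.
sub find_image {
    my ($html, $pattern) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser";
    while (my $token = $parser->get_tag("img")) {
        my $attr = $token->[1];    # hash ref of the tag's attributes
        next unless defined $attr->{src} && $attr->{src} =~ $pattern;
        return {
            src    => $attr->{src},
            width  => $attr->{width},
            height => $attr->{height},
        };
    }
    return;
}

# Made-up page fragment, standing in for the fetched page.
my $html = <<'HTML';
<p><img src="logo.gif" width="10" height="10">
<img src="cams/test01.jpg" width="640" height="480"></p>
HTML

my $image = find_image($html, qr/test0(.*)\.jpg/i);
print "$image->{src} is $image->{width}x$image->{height}\n" if $image;
```

The width and height come back undefined if the tag does not carry those attributes, so check them before using them.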

I hope this helps!



After re-reading your post, I realise that my post does not specifically answer your problem. However, it shouldn’t be too much trouble for you to add a snippet at the end to save the IMG.

Again, I would just make another call using the LWP module, as shown in the first part of my example.
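A rough sketch of that saving step, assuming the image URL is absolute. `getstore` and `is_success` are part of LWP::Simple; the two helper subs and the fallback file name are just names picked for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: use the last path segment of the URL as the local file name.
sub local_filename_for {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;              # drop query string and fragment
    my ($name) = $path =~ m{([^/]+)$};
    return defined $name ? $name : 'image.jpg';    # fallback name is an assumption
}

# Hypothetical helper: fetch $url into $file, dying on any unsuccessful HTTP status.
sub save_image {
    my ($url, $file) = @_;
    require LWP::Simple;                           # loaded only when a download is requested
    my $status = LWP::Simple::getstore($url, $file);
    die "Download of $url failed with HTTP status $status\n"
        unless LWP::Simple::is_success($status);
    return $file;
}
```

You would call it as `save_image($myimageurl, local_filename_for($myimageurl));`. Because a failure dies with a message on stderr, a cron job running the script should still mail you when the download breaks. If the src on the page is relative, it has to be turned into a full URL first.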




Hi Bob

Can you direct me to the website and the location of the image you’re trying to pull?

What is the format of the filename? How does it change?





In the code I provided earlier, change the line:

if ($token->[1]{src}=~/test0(.*).jpg/ig)

to …

if ($token->[1]{src}=~/smokies(.*).jpg/ig)

Cheers



Hey Bob

Glad you got it working in the end. I noticed that you just made the regex simpler, which will hopefully give you guaranteed results every time, but may give other results if any other IMG tag on the page includes the word “smokies”.

Try the following regex instead:

=~ /smokies(.+)\.jpg$/i;

I think the problem was that I forgot to escape the . (period) before jpg :blush: .
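To see what the escaped period and the `$` anchor change, here is a small comparison of the two patterns. The src values are invented for the test, and `matches` is only a helper for this example.

```perl
use strict;
use warnings;

# Invented src values, for illustration only.
my @srcs = (
    'images/smokies20050101.jpg',      # the kind of name we want
    'images/smokies_thumb.jpg.bak',    # does not end in .jpg
    'images/smokiesXjpg',              # no literal period before "jpg"
);

my $loose = qr/smokies(.*).jpg/i;      # unescaped . matches any character
my $tight = qr/smokies(.+)\.jpg$/i;    # literal period, anchored at the end

# Return 1 if $string matches $regex, 0 otherwise.
sub matches {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 1 : 0;
}

for my $src (@srcs) {
    printf "%-30s loose=%d tight=%d\n",
        $src, matches($loose, $src), matches($tight, $src);
}
```

The loose pattern accepts all three strings, while the tighter one accepts only the first.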