Wget


#1

I'm a newbie to wget. I thought --mirror would fetch the entire site recursively through all of its directories, but when I run wget --mirror www.sitename (my own site), it stops after downloading six files in a directory called docs that sits right under www.sitename, as in www.sitename/docs.

Any thoughts would be appreciated.


#2

Uhhhh, yeah, that makes sense.
Wget sees your site the way a web browser does, so it doesn’t automatically know where everything in your site’s directory tree is located.
Assuming your main index page only links to six other files, wget will grab those files and nothing more.
Adding --mirror won’t help in this case.
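If you want to see exactly what wget can reach before committing to a full download, spider mode crawls the links without keeping the files (www.example.com below is just a stand-in for your own domain):

# Crawl recursively but save nothing; the log lists every URL wget
# discovers by following links from the index page onward.
wget --spider -r http://www.example.com/ -o spider.log
grep '^--' spider.log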

What I would suggest is to either use wget with an FTP link, like: wget --mirror ftp://username@password:127.0.0.1:21/mysite.com
That way it logs into your FTP server and grabs everything under the ‘mysite.com’ folder.
Either that, or if you have SSH access on the other server, you can just tar everything up and send it over.
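If you go the SSH route, something along these lines would do it (the user, host, and public_html path are only guesses at your setup):

# Tar the site on the remote box and stream it straight over SSH,
# unpacking into a local backup directory.
mkdir -p ./site-backup
ssh user@oldserver 'tar czf - public_html/mysite.com' | tar xzf - -C ./site-backup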


#3

Thanks. So I tried:
wget --mirror ftp://username@password:208.113.161.4:21/mysite.com

(208.113.161.4 is the IP address returned when I log in to the FTP server for my site from the command line.)

But it returns “Bad port number.”

Obviously I’m doing something wrong.
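In case anyone else hits the same message: wget expects FTP logins in the form ftp://user:password@host/path, so in the URL above it most likely reads “password” as the hostname and everything after the next colon as a port number - hence “Bad port number.” Swapping the pieces around (and dropping :21, which is the default FTP port anyway) should get past it, e.g.:

# username and password here are placeholders for the real FTP credentials.
wget --mirror ftp://username:password@208.113.161.4/mysite.com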


#4

See the example here of using wget to back up a website.

As stated before, you must link to pages for them to be backed up. Using the mirror command here tells wget to look at the index file for the domain (example.com) and go from there. It follows all the links on that page and branches out as many times as there are links. Also, wget will only follow links within the same domain - it won’t go and try to back up anything else you may link to.
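For reference, --mirror is shorthand for -r -N -l inf --no-remove-listing, and a typical backup run adds a couple of flags so the local copy is actually browsable offline (example.com again being a placeholder):

# Mirror one domain, rewrite links for offline viewing, and pull in the
# images/CSS each page needs.
wget --mirror --convert-links --page-requisites http://example.com/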

–Matttail
art.googlies.net - personal website


#5

Oh, and the reason it only grabbed the six files comes down to wget’s default behavior. The command you originally gave could, in principle, have tried to back up the entire internet. To avoid that, wget is set by default (unless you specify otherwise) to follow only so many links.
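For what it’s worth, plain wget -r stops five levels of links deep by default, while --mirror lifts that cap (it implies -l inf):

wget -r http://example.com/          # recursive, default depth of five levels
wget -r -l inf http://example.com/   # no depth limit (what --mirror implies)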

You may find ‘man wget’ helpful - that pulls up the manual page for wget. It may also be very confusing, depending on how familiar you are with command-line stuff.

Hope that helps.

–Matttail
art.googlies.net - personal website