WGET does not work on larger files


#1

The file is 4 GB.

It seems that because the file is so large, Dreamhost automatically kills the process for being too CPU intensive. Support suggested that I use the “nice” command, so I tried this:

nice -n 19 wget http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-pages-articles.xml.bz2

But I still get what appears to be the same result: it just stops, with no specific errors, so I guess Dreamhost is still killing it. Note that I am using the option -n 19, which gives the process the lowest possible priority. Here is the output from wget:

--07:13:54--  http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-pages-articles.xml.bz2
=> `enwiki-20080312-pages-articles.xml.bz2.5'
Resolving download.wikimedia.org... 208.80.152.183
Connecting to download.wikimedia.org[208.80.152.183]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: -585,685,143 [application/octet-stream]

[ <=> ] 2,701        --.--K/s


#2

With such a large file, you should probably use a restartable transfer command. Even if you had no problem at the Dreamhost end, you can easily run into network timeouts, etc.

I think the following command would work. The “-C -” tells curl to resume the transfer:

curl -C - -O http://…articles.xml.bz2
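
Since the process seems to be getting killed partway through, you could also wrap the transfer in a loop that keeps resuming until it finishes. This is just a rough sketch, assuming the full URL from the first post; curl exits non-zero when the transfer is interrupted, so the loop simply retries:

until curl -C - -O http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-pages-articles.xml.bz2; do
    sleep 10   # wait a little before resuming the partial download
done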

See the curl man page for full details.
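
If you would rather stick with wget, it also supports resuming a partial download with -c, and --limit-rate keeps the bandwidth down, which might make Dreamhost less likely to kill the process. A minimal sketch along the lines of the original command, with an arbitrarily chosen 200 KB/s cap:

nice -n 19 wget -c --limit-rate=200k http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-pages-articles.xml.bz2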