17 July 2010

Downloading a huge file over a slow dial-up modem

Today my Wikipedia download has reached 70%, and I've found a good setting for downloading this 6 GB monster of science: http://download.wikimedia.org/enwiki/20100622/enwiki-20100622-pages-articles.xml.bz2

When a download stays stuck for a long time, it seems to be a sign that data corruption has taken place. At least that is what happened during the first 800 MB of the download with Free Download Manager (FDM). When the prolonged glitches didn't stop, I switched to curl and continued the unfinished download. After patching 10 corrupted areas, I later found that curl wasn't hampered by the same issue. Here is how...



Dial-up modems, especially wireless mobile ones, are prone to over-the-air interference. At least I can say my connection never stayed stable for more than an hour at a stretch. So rather than leaving the download stuck forever without even realizing what happened, it's wiser to have curl disconnect itself periodically, every 5-10 minutes, and reconnect to resume the download. Don't worry about the timing threshold: reconnecting takes only a few seconds, so it really doesn't hurt. So I wrote a simple batch file as follows:

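@echo off
setlocal enableextensions

REM file to download
set url=http://download.wikimedia.org/enwiki/20100622/enwiki-20100622-pages-articles.xml.bz2

REM proxy duration in seconds: maximum time of each proxy session (-m)
set dprx=30
REM minimum speed rate in bytes/second (-Y)
set mspd=5192

REM timeout in seconds: abort once the speed stays below mspd this long (-y)
set tout=15

REM Toonel proxy listening locally
set prxy=127.0.0.1:8080

:recurl
echo %url%

REM direct mode: resume the partial file (-C -) and give up when it stalls
curl -y %tout% -Y %mspd% -# -L -k -C - -O "%url%"
REM curl exits with code 0 only when the transfer completed
if not errorlevel 1 goto done

REM proxy mode: push the stuck bytes through for at most dprx seconds
curl -x %prxy% -# -L -k -C - -O -m %dprx% "%url%"
if not errorlevel 1 goto done
goto recurl

:done
echo Download finished.
endlocal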


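In other words: when the direct download stalls (slower than mspd bytes/second for tout seconds), curl aborts; the script then switches to proxy mode for at most dprx seconds to get those stuck bytes through, and goes back to direct mode.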

Since switching to curl with this setup, I haven't encountered a single corrupted area so far.
