
downloader: resume interrupted downloads #99

Open
tiborsimko opened this issue Nov 19, 2020 · 3 comments

@tiborsimko
Member
When one downloads a big 20 GB file, something can go wrong along the way, or the user may put their machine to sleep or restart it. When this happens and the user re-issues the download-file command, the download starts again from zero, even though the local directory already contains part of the file.

The goal of this issue is to resume interrupted downloads from the last good state.

How to reproduce:

  1. Run local CERN Open Data instance as follows:
$ cd opendata.cern.ch
$ docker-compose build
$ docker-compose up
$ docker exec -i -t opendatacernch_web_1 ./scripts/populate-instance.sh --skip-records
$ docker exec -i -t opendatacernch_web_1 cernopendata fixtures records --mode insert-or-replace -f cernopendata/modules/fixtures/data/records/atlas-2020-exactly2lep.json
$ firefox http://localhost/record/15007

The test record contains a big 20 GB file.

Now start a download and interrupt it after a while:

$ cernopendata-client download-files --recid 15007 --server http://localhost
==> Downloading file 1 of 1
  -> File: ./15007/exactly2lep.zip
 
^C
$ ls -lh 15007/exactly2lep.zip
-rw-r--r-- 1 simko simko 15M Nov 19 11:40 15007/exactly2lep.zip

and then re-issue the download command:

$ cernopendata-client download-files --recid 15007 --server http://localhost

The downloader should recognise the already available 15007/exactly2lep.zip file and continue downloading from there.

Note that the user may interrupt the download any number of times before successful completion.

The behaviour could be configurable by a new --resume option, for example:

  • when the downloader sees that there is no target file yet, it proceeds to download as usual;
  • when the downloader sees the file, it checks its size and checksum, and
    • if the file is complete, it says that there is nothing to download, that the file is already here and verified;
    • if the file is partial, it checks whether the user passed the --resume option; if yes, it continues from that point, and if no, it asks the user whether to resume or redownload.
  • Since we mostly have big files, I guess the resume behaviour could be the default, asking the user to do rm ... on the given file if they would rather redownload than resume.
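The resume step described above could be sketched with an HTTP Range request; a minimal illustration, assuming the server supports byte ranges (the function names `resume_request` and `download_with_resume` are hypothetical, not part of cernopendata-client):

```python
import os
import urllib.request


def resume_request(url, offset):
    """Build a GET request asking the server to start at byte `offset`."""
    req = urllib.request.Request(url)
    if offset:
        req.add_header("Range", f"bytes={offset}-")
    return req


def download_with_resume(url, path, chunk_size=1024 * 1024):
    """Download `url` to `path`, appending to any existing partial file."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = resume_request(url, offset)
    with urllib.request.urlopen(req) as response:
        # 206 Partial Content means the server honoured the Range header;
        # anything else means we must restart from scratch.
        if offset and response.status != 206:
            offset = 0
        with open(path, "ab" if offset else "wb") as fh:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fh.write(chunk)
```

After the Ctrl-C above, a re-run would send a Range header starting at the size of the existing partial file and append only the remaining bytes.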

CC @katilp

ParthS007 self-assigned this Nov 25, 2020

@ParthS007
Member

@tiborsimko

if the file is partial, it would check whether the user used the --resume option, and if yes, it would continue from that point, and if no, it would ask the user whether resume or redownload is wanted.

I have a couple of musings.

  1. We will know that a file is partially downloaded when its size and checksum do not match those of the remote file.

  2. How do we plan to handle resuming a download?
    Will we go requests -> pycurl -> xrootd and use the built-in functionality of the respective library, or do you have a different approach in mind?
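The size-and-checksum check from point 1 could look roughly like this; a sketch only, assuming the record metadata exposes the remote file size and an Adler-32 checksum (the helper names `local_adler32` and `file_state` are illustrative):

```python
import os
import zlib


def local_adler32(path, chunk_size=1024 * 1024):
    """Stream a local file through zlib.adler32 and return the hex digest."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def file_state(path, remote_size, remote_checksum):
    """Classify a local file as missing, partial, complete, or corrupted."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) != remote_size:
        return "partial"
    if local_adler32(path) == remote_checksum:
        return "complete"
    return "corrupted"  # same size but different content
```

A "partial" result would trigger the resume path, while "corrupted" would require a full redownload.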


@ParthS007 (Member) commented Dec 17, 2020

Resuming of downloads

  1. Requests - downloader: catch more error situations #109
  2. Pycurl - downloader: resume interrupted downloads #117
  3. Xrootd -
