downloader: catch more error situations #109

Merged: 2 commits merged on Dec 17, 2020
Conversation

@ParthS007 (Member) commented Nov 24, 2020

Addresses #98 and #99.

@ParthS007 ParthS007 self-assigned this Nov 25, 2020
@ParthS007 ParthS007 requested a review from tiborsimko November 25, 2020 10:00
@@ -96,7 +103,7 @@ def get_recid(server=None, title=None, doi=None):
     response_json = response.json()
     try:
         response.raise_for_status()
-    except requests.HTTPError as e:
+    except Exception as e:
Member commented:
It would be nice to list all the exceptions instead of one single catch all.
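
For illustration, a minimal sketch (not code from this PR) of what listing the exceptions explicitly could look like; record_url is a hypothetical placeholder, not the portal's real API endpoint:

import requests

record_url = "https://example.org/api/records/5500"  # placeholder URL

try:
    response = requests.get(record_url)
    response.raise_for_status()
except requests.HTTPError as e:
    print(f"Server returned an error status: {e}")
except requests.ConnectionError as e:
    print(f"Could not connect to the server: {e}")
except requests.Timeout as e:
    print(f"The request timed out: {e}")
except requests.RequestException as e:
    print(f"Unexpected error while talking to the server: {e}")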

BTW, the scenario described in #98 does not fully work for me with this PR. I get:

ConnectionResetError: [Errno 104] Connection reset by peer
...
During handling of the above exception, another exception occurred:
...
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
...
During handling of the above exception, another exception occurred:
...
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Is this improved in the follow-up PRs? Shall we address them in one go?

Member Author replied:

Yes, this PR addresses #98 and will be completed in follow-up PRs, as we discussed.

Member replied:

So let's perhaps squash them together; it'll be easier to test that way!

Member commented:

A general catch-all except is not the best paradigm, as it can mask some error situations, but it is OK for now as a partial fix...

@tiborsimko (Member) commented Dec 17, 2020

Repeated downloads lead to an issue. Here's a test case. The current behaviour is:

$ git clean -d -ff -x
$ cernopendata-client download-files --recid 5500 --verify
==> Downloading file 1 of 11
  -> File: ./5500/BuildFile.xml
  -> Progress: 0/0 KiB (100%)
==> Verifying file BuildFile.xml...
  -> Expected size 305, found 305
  -> Expected checksum adler32:ff63668a, found adler32:ff63668a
...
==> Downloading file 11 of 11
  -> File: ./5500/mass4l_combine.png
  -> Progress: 90/90 KiB (100%)
==> Verifying file mass4l_combine.png...
  -> Expected size 93152, found 93152
  -> Expected checksum adler32:62e0c299, found adler32:62e0c299
==> Success!
$ cernopendata-client download-files --recid 5500 --verify

==> Downloading file 1 of 11
  -> File {} is complete. Skip download.

Expected behaviour:

  • do not stop after the 1st file
  • say properly which file is OK (f-string), something like:
    "File ./5500/foo.py is up to date; skipping download." (see the sketch after this list)
  • note that for the XRootD protocol, the files are always re-downloaded; there is no check. It would be good to harmonise the behaviour, e.g. always re-download or always skip for both the HTTP and XRootD protocols.
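
A minimal sketch of the suggested message; file_path is a hypothetical variable name for illustration, not the downloader's actual code:

# The current output prints a literal "{}", presumably because the
# placeholder is never filled in; an f-string puts the file name into
# the message:
file_path = "./5500/foo.py"
print(f"  -> File {file_path} is up to date; skipping download.")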

@ParthS007 (Member Author) commented:

  • Pushed more changes; you can now check resuming downloads with the requests library (see the sketch after this list).

  • Added a couple of functions separating the functionality; the names can be changed.

  • I have harmonized the re-download behaviour: for now, files are always re-downloaded even if they are already present. We can change this once resumable downloads are added for all libraries.
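
For context, a minimal sketch of how resuming an interrupted download with the requests library can work, using an HTTP Range request. The function and variable names here are assumptions for illustration, not the PR's actual code:

import os
import requests

def download_with_resume(url, path):
    """Download url to path, resuming from a partial file if one exists."""
    # Resume from the size of any partially downloaded file.
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True) as response:
        response.raise_for_status()
        # 206 Partial Content means the server honoured the Range header;
        # otherwise the server sent the whole file and we start over.
        mode = "ab" if response.status_code == 206 else "wb"
        with open(path, mode) as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

If the server ignores the Range header, the whole file is rewritten, which matches the always-re-download behaviour described above.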

@@ -43,7 +43,7 @@ def verify_recid(server=None, recid=None):
     else:
         try:
             input_record_url_check.raise_for_status()
-        except requests.HTTPError:
+        except Exception:
             display_message(
                 msg_type="error",
                 msg="The record id number you supplied is not valid.",
Member commented:

'record id' -> 'record ID'

Can you please update this part, even if it is not part of the PR? Also in other places:

$ rg 'record id'
cernopendata_client/searcher.py
34:    :return: Boolean after verifying the record id
49:                msg="The record id number you supplied is not valid.",
73:            msg="The record id number you supplied is not valid.",
80:    """Return record id by either title or doi.
89:    :return: record id

cernopendata_client/validator.py
29:            msg="You must supply a record id number as an " "input using -r flag.",

Could be a separate commit.

@tiborsimko (Member) commented:

Tests of the "resume interrupted download" feature passed locally with the requests downloader, and the test suite passes in general:

  py27: commands succeeded
  py36: commands succeeded
  py37: commands succeeded
  py38: commands succeeded
  py39: commands succeeded
  congratulations :)

@tiborsimko tiborsimko merged commit 205da11 into cernopendata:master Dec 17, 2020
@ParthS007 ParthS007 deleted the 98 branch December 17, 2020 16:20