downloader: catch more error situations #109

Merged: 2 commits merged on Dec 17, 2020
Conversation

@ParthS007 (Member) commented Nov 24, 2020

Addresses #98 and #99.

@ParthS007 ParthS007 self-assigned this Nov 25, 2020
@ParthS007 ParthS007 requested a review from tiborsimko November 25, 2020 10:00
@@ -96,7 +103,7 @@ def get_recid(server=None, title=None, doi=None):
     response_json = response.json()
     try:
         response.raise_for_status()
-    except requests.HTTPError as e:
+    except Exception as e:
Member commented:
It would be nice to list all the exceptions instead of one single catch all.
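
For illustration, a minimal sketch (not code from this PR) of what listing the exceptions explicitly could look like; record_url is a hypothetical placeholder, not the portal's real API endpoint:

import requests

record_url = "https://example.org/api/records/5500"  # placeholder URL

try:
    response = requests.get(record_url)
    response.raise_for_status()
except requests.HTTPError as e:
    print(f"Server returned an error status: {e}")
except requests.ConnectionError as e:
    print(f"Could not connect to the server: {e}")
except requests.Timeout as e:
    print(f"The request timed out: {e}")
except requests.RequestException as e:
    print(f"Unexpected error while talking to the server: {e}")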

BTW, the scenario described in #98 does not fully work for me with this PR. I get:

ConnectionResetError: [Errno 104] Connection reset by peer
...
During handling of the above exception, another exception occurred:
...
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
...
During handling of the above exception, another exception occurred:
...
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Is this improved in the follow-up PRs? Shall we address them in one go?

Member Author replied:

Yes, this PR addresses #98 and will be completed in follow-up PRs, as we discussed.

Member replied:

So let's perhaps squash them together; it'll be easier to test that way!

Member commented:

A general catch-all except is not the best paradigm, as it can mask some error situations, but it is OK for now as a partial fix...

@tiborsimko (Member) commented Dec 17, 2020

Repeated downloads lead to an issue. Here's a test case. The current behaviour is:

$ git clean -d -ff -x
$ cernopendata-client download-files --recid 5500 --verify
==> Downloading file 1 of 11
  -> File: ./5500/BuildFile.xml
  -> Progress: 0/0 KiB (100%)
==> Verifying file BuildFile.xml...
  -> Expected size 305, found 305
  -> Expected checksum adler32:ff63668a, found adler32:ff63668a
...
==> Downloading file 11 of 11
  -> File: ./5500/mass4l_combine.png
  -> Progress: 90/90 KiB (100%)
==> Verifying file mass4l_combine.png...
  -> Expected size 93152, found 93152
  -> Expected checksum adler32:62e0c299, found adler32:62e0c299
==> Success!
$ cernopendata-client download-files --recid 5500 --verify

==> Downloading file 1 of 11
  -> File {} is complete. Skip download.

Expected behaviour:

  • do not stop after the 1st file
  • say properly which file is OK (f-string), something like:
    "File ./5500/foo.py is up to date; skipping download." (see the sketch after this list)
  • note that for the XRootD protocol, the files are always re-downloaded; there is no check. It would be good to harmonise the behaviour, e.g. always re-download or always skip for both the HTTP and XRootD protocols.
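
A minimal sketch of the suggested message; file_path is a hypothetical variable name for illustration, not the downloader's actual code:

# The current output prints a literal "{}", presumably because the
# placeholder is never filled in; an f-string puts the file name into
# the message:
file_path = "./5500/foo.py"
print(f"  -> File {file_path} is up to date; skipping download.")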

@ParthS007 (Member Author) commented:

  • Pushed more changes; you can now check resuming downloads with the requests library (see the sketch after this list).

  • Added a couple of functions separating the functionality; the names can be changed.

  • I have harmonized the re-download behaviour: for now, files are always re-downloaded even if they are already present. We can change this once resumable downloads are added for all libraries.
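
For context, a minimal sketch of how resuming an interrupted download with the requests library can work, using an HTTP Range request. The function and variable names here are assumptions for illustration, not the PR's actual code:

import os
import requests

def download_with_resume(url, path):
    """Download url to path, resuming from a partial file if one exists."""
    # Resume from the size of any partially downloaded file.
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True) as response:
        response.raise_for_status()
        # 206 Partial Content means the server honoured the Range header;
        # otherwise the server sent the whole file and we start over.
        mode = "ab" if response.status_code == 206 else "wb"
        with open(path, mode) as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

If the server ignores the Range header, the whole file is rewritten, which matches the always-re-download behaviour described above.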

@@ -43,7 +43,7 @@ def verify_recid(server=None, recid=None):
     else:
         try:
             input_record_url_check.raise_for_status()
-        except requests.HTTPError:
+        except Exception:
             display_message(
                 msg_type="error",
                 msg="The record id number you supplied is not valid.",
Member commented:

'record id' -> 'record ID'

Can you please update this part, even if it is not part of the PR? Also in other places:

$ rg 'record id'
cernopendata_client/searcher.py
34:    :return: Boolean after verifying the record id
49:                msg="The record id number you supplied is not valid.",
73:            msg="The record id number you supplied is not valid.",
80:    """Return record id by either title or doi.
89:    :return: record id

cernopendata_client/validator.py
29:            msg="You must supply a record id number as an " "input using -r flag.",

Could be a separate commit.

@tiborsimko (Member) commented:

Tests of the "resume interrupted download" feature passed locally with the requests downloader, and the test suite passes in general:

  py27: commands succeeded
  py36: commands succeeded
  py37: commands succeeded
  py38: commands succeeded
  py39: commands succeeded
  congratulations :)

@tiborsimko tiborsimko merged commit 205da11 into cernopendata:master Dec 17, 2020
@ParthS007 ParthS007 deleted the 98 branch December 17, 2020 16:20