Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make improvements to parallel tile fetching #108

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

adl1995
Copy link
Member

@adl1995 adl1995 commented Aug 28, 2017

This is a follow-up for issue #107. I have restructured the fetch_tiles function for now, but error handling if yet to be done.

  • Restructure high level function fetch_tiles
  • Implement error handling for timeout and missing tiles

@cdeil cdeil self-assigned this Aug 28, 2017
@cdeil cdeil added this to the 0.2 milestone Aug 28, 2017
@adl1995
Copy link
Member Author

adl1995 commented Aug 31, 2017

@cdeil I have successfully used ThreadPoolExecutor.map function, this simplifies the code quite a bit. But, I haven't been able to find an equivalent function for aiohttp.

@cdeil
Copy link
Contributor

cdeil commented Aug 31, 2017

@adl1995 - OK to merge?

Or did you already continue doing edits in this branch?

@adl1995
Copy link
Member Author

adl1995 commented Aug 31, 2017

@cdeil No, I haven't made any further edits. But, tile fetching only works correctly with urllib, not with aiohttp. The second test case in test_fetch.py is failing because the tiles aren't sorted correctly: https://travis-ci.org/hipspy/hips/jobs/270430193#L1557

Should I just manually sort the list of tiles in case of aiohttp?

@cdeil cdeil modified the milestones: 0.2, 0.3 Oct 28, 2017
@cdeil cdeil removed their assignment Jun 28, 2018
@adl1995 adl1995 mentioned this pull request Jul 5, 2018
@adl1995 adl1995 force-pushed the improve_async branch 5 times, most recently from 7f06939 to 03556ed Compare January 18, 2019 13:50
Signed-off-by: Adeel Ahmad <adeelahmad14@hotmail.com>
@adl1995
Copy link
Member Author

adl1995 commented Jan 18, 2019

@cdeil - Sorry for the late response.

I have added a some error handling for missing and timed out tiles.

For the moment, if a tile is missing, we simply raise a urllib.error.HTTPError exception. However, should we provide the functionality to draw tiles even if some of them are missing?

I think one option is to introduce an attribute (boolean?) in the HipsTile class which identifies if it's a dummy tile. Based on this we can return an np.zeros array when the drawing module queries the tile data.

@cdeil
Copy link
Contributor

cdeil commented Jan 18, 2019

For the moment, if a tile is missing, we simply raise a urllib.error.HTTPError exception. However, should we provide the functionality to draw tiles even if some of them are missing?

I think it's very common that some times don't arrive. There's of course the case of network issues, but also there are HiPS that aren't all-sky, so some tiles just aren't present on the server. We should be able to draw those, yes. This is also the case for all HiPS we have as example datasets in the hips-extra repo, so maybe some tests will fail already if you raise HTTPError.

I don't remember what we have. But basically I think we need to return a result object, working just with a stream of tiles can't work?

@adl1995
Copy link
Member Author

adl1995 commented Jan 18, 2019

This is also the case for all HiPS we have as example datasets in the hips-extra repo, so maybe some tests will fail already if you raise HTTPError.

I don't believe we currently handle the case of missing tiles. The tests seems to pass even if I raise the HTTPError for missing tiles. I also tried to remove one of the tiles from hips-extra that was used in a test case and the HipsTile.read() method raised an exception.

I'm starting to add this functionality in (0a17fe1). Currently, I have introduced a variable is_missing in HipsTile based on which the data property returns an array filled with zeros.

@adl1995
Copy link
Member Author

adl1995 commented Jan 21, 2019

We now have the functionality to create an all sky image when one (or more) tiles are missing. An example plot for AKARI-FIS where some tiles (URLs) don't exist:

akari-fis-missing-tiles

@cdeil - Can you please give this a review? We still need to add a few test cases for this. I'm not sure how to construct the test case, but we can probably pass a bogus URL and try to plot the result at the end.

Signed-off-by: Adeel Ahmad <adeelahmad14@hotmail.com>
@adl1995
Copy link
Member Author

adl1995 commented Jul 6, 2019

@cdeil - As discussed, I have now removed the aiohttp package and added a test case for missing HiPS tiles.

The test case is failing because of an issue with Astropy:

hips/conftest.py:5: in <module>
    from astropy.tests.pytest_plugins import *
E   ModuleNotFoundError: No module named 'astropy.tests.pytest_plugins'

@cdeil
Copy link
Contributor

cdeil commented Aug 14, 2019

@adl1995 - Looks like I dropped the ball here and didn't respond or merge this in, sorry!

I've now made you admin for hipspy, so you can also go ahead and merge if no-one responds from now on.

Note that in this case it would be best to first fix the CI test fails in GH master, possibly via #143, and to only then continue with functional changes and PRs like this one, always only merging in ones that have all test passing.

How would you feel about getting rid of both our homegrown multi-threading and aiohttp code, and to use parfive instead?
https://docs.sunpy.org/en/stable/whatsnew/1.0.html#improved-file-downloading-capability
It's a higher-level interface, so should mean less code on our side and same functionality.
I don't know how this would play with progress bars, don't remember if we still have them or what options we expose.
But presumably we could do whatever we now also with parfive, just by passing along a few config options.
The whole premise of parfive is that we get a synchronous and simple API and just a few lines, and all the async is hidden in the download call.
In principle we should structure our API so that it's usable also for parallel download and drawing or saving to file, but we never had that, and I think most user for hipspy will be able to fit all tiles in memory, i.e. it's OK to just not support the "tons of tile / out of core processing" in hipspy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants