Use range properly in sda-download and fix coordinate handling when using encryption to comply with documented behaviour #1252
A note, unrelated to this change: this is not tested with htsget as download currently does not serve unencrypted .bai files, which htsget requires.
…and some other improvements
…rdinates rather than work on length of requested data and manage start as well
…l sending of entire file after move
…should receive a full block
1887c81
to
6aca7eb
Compare
Just a heads up! @MalinAhlberg I've changed the behaviour of But these changes also make some other minor adjustments that I think make more sense and are in line with what it says in
I don't think CSC (or anyone else) uses coordinates with encryption, but I'll try to make sure to ping them about this just to let them know.
After discussions with @jbygdell, here are some assumptions that we believe should be true. Note that not all of them hold for the current version, but they seem sound for the future.
This makes sense to me, but it seemed the world didn't want to be sensible :). For clarity, I also think there is a good use case for supporting "raw" access, e.g. where the consumer is aware of what it's getting (e.g. for https://github.com/NBISweden/sdafs, I really just want to access the encrypted data stream).
If coordinates are for the decrypted file, the header doesn't seem relevant for the coordinates, since the header "doesn't exist" then. Also, if coordinates are for the decrypted file, I think a reasonable assumption is that one should get the requested region if one decodes the result. To me, that means each request should come with a crypt4gh header, and if either start or end doesn't align with block boundaries (natural or through file length), there should be a data edit list set up to adjust for that. (By coincidence I did most of that earlier, I think, but I don't know if that code remains anywhere anymore.)
Not sure what this means. Maybe the same as above (if coordinates are for the decrypted file space, each request should serve a header, yes). If coordinates are for encrypted space, one shouldn't get anything other than what one requests.
If things are in the decrypted coordinate space, I agree, with the caveat above (the header should include a data edit list to account for the user maybe not wanting the data at the beginning of the block). If in encrypted coordinate space, we should just serve whatever is requested. Don't know where this leaves us for now.
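To make the decrypted-coordinate case above concrete, here is a minimal sketch of the block arithmetic. It assumes the standard Crypt4GH layout (65536-byte plaintext segments, each encrypted to 65564 bytes with a 12-byte nonce and 16-byte MAC) and a single discard/keep pair in the data edit list. `planBlocks` and `blockPlan` are hypothetical names for illustration, not code from sda-download.

```go
package main

const (
	plainBlock  = 65536           // Crypt4GH plaintext segment size
	cipherBlock = plainBlock + 28 // plus 12-byte nonce and 16-byte MAC
)

// blockPlan is a hypothetical type describing what to send for a request
// expressed in decrypted coordinates.
type blockPlan struct {
	FirstBlock, LastBlock int64 // inclusive indices of the data blocks to send
	EncStart, EncEnd      int64 // half-open byte range in the encrypted body (header excluded)
	Discard, Keep         int64 // edit list: skip Discard plaintext bytes, then keep Keep bytes
}

// planBlocks maps the half-open decrypted range [start, end) onto whole
// encrypted blocks. It assumes 0 <= start < end <= plainSize.
func planBlocks(start, end, plainSize int64) blockPlan {
	first := start / plainBlock
	last := (end - 1) / plainBlock

	// Size of the encrypted body; the final block may be shorter than the rest.
	encSize := (plainSize / plainBlock) * cipherBlock
	if rem := plainSize % plainBlock; rem > 0 {
		encSize += rem + 28
	}

	encEnd := (last + 1) * cipherBlock
	if encEnd > encSize {
		encEnd = encSize
	}

	return blockPlan{
		FirstBlock: first,
		LastBlock:  last,
		EncStart:   first * cipherBlock,
		EncEnd:     encEnd,
		Discard:    start - first*plainBlock,
		Keep:       end - start,
	}
}
```

For a 200000-byte file, a request for decrypted bytes 70000 up to (not including) 140000 would send blocks 1 and 2, with an edit list that discards the first 4464 decrypted bytes and keeps the next 70000.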
sda-download accepts and parses a `Range` header, if any. Unfortunately, it decides whether it's serving an entire file before doing so, and so ends up serving from the start of the object. This PR moves that decision until after the `Range` header has been considered. It also adds a test to check that getting from something other than the start of the object works when using `Range`, and extends some tests to check additional things.
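The ordering described above can be sketched as follows: parse the `Range` header first, and only then decide whether the whole file is being served. This is an illustrative sketch, not the code from this PR. `parseRange` and `servesWholeFile` are hypothetical helpers, and only the single-range forms `bytes=start-end` and `bytes=start-` are handled.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

var errBadRange = errors.New("unsupported or invalid Range header")

// parseRange (hypothetical) returns the half-open byte range [start, end)
// requested by header, clamped to size. An empty header means the whole object.
func parseRange(header string, size int64) (int64, int64, error) {
	if header == "" {
		return 0, size, nil
	}
	if !strings.HasPrefix(header, "bytes=") {
		return 0, 0, errBadRange
	}
	spec := strings.TrimPrefix(header, "bytes=")

	first, last, found := strings.Cut(spec, "-")
	// Suffix ranges ("bytes=-N") and multiple ranges are out of scope here.
	if !found || first == "" || strings.Contains(last, ",") {
		return 0, 0, errBadRange
	}

	start, err := strconv.ParseInt(first, 10, 64)
	if err != nil || start < 0 || start >= size {
		return 0, 0, errBadRange
	}

	end := size
	if last != "" {
		l, err := strconv.ParseInt(last, 10, 64)
		if err != nil || l < start {
			return 0, 0, errBadRange
		}
		// The HTTP last-byte position is inclusive; convert to half-open.
		if l+1 < size {
			end = l + 1
		}
	}
	return start, end, nil
}

// servesWholeFile must be evaluated after parseRange, otherwise a request
// with a non-zero start would wrongly be treated as a full download.
func servesWholeFile(start, end, size int64) bool {
	return start == 0 && end == size
}
```

With a 1000-byte object, `bytes=100-199` yields the range 100 to 200 and is not treated as a whole-file request, while a missing header yields 0 to 1000 and is.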