Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use range properly in sda-download and fix coordinate handling when using encryption to comply with documented behaviour #1252

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

pontus
Copy link
Contributor

@pontus pontus commented Jan 10, 2025

sda-download accepts and parses a Range header if any. Unfortunately, it decides whatever it's serving an entire file before doing so, ending up in serving from the start of the object request.

This PR moves that decision until after the Range header has been considered. Also adds a test to check that getting from something other than the start of the object works when using Range and extends some tests to check additional things.

@pontus pontus requested a review from MalinAhlberg January 10, 2025 14:04
jbygdell
jbygdell previously approved these changes Jan 13, 2025
MalinAhlberg
MalinAhlberg previously approved these changes Jan 13, 2025
Copy link
Contributor

@MalinAhlberg MalinAhlberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
A note, unrelated to this change: this is not tested with htsget as download currently does not serve unencrypted .bai files, which htsget requires.

@pontus pontus dismissed stale reviews from MalinAhlberg and jbygdell via 6aca7eb January 14, 2025 12:01
@pontus pontus force-pushed the use_range_properly branch from 1887c81 to 6aca7eb Compare January 14, 2025 12:01
jbygdell
jbygdell previously approved these changes Jan 14, 2025
@jbygdell jbygdell enabled auto-merge January 14, 2025 12:14
@pontus
Copy link
Contributor Author

pontus commented Jan 14, 2025

Just a heads up!

@MalinAhlberg I've changed the behaviour of calculateCoords here as I think the current implementation is broken (doesn't adjust start as it says it should in api/api.md, confuses the offset of the end of the requested data and the length of the requested data).

But these changes also make some other minor adjustments that I think make more sense and are in line with what it says in api.md:

  • lack of specification now returns 0 as start and end (and thus makes sure "send whole file" is still triggered) rather than returning the full size as end. This really shouldn't be visible from the outside.
  • it's possible (for whenever one would want to do that) to request part of the header, e.g. getting the first 8 bytes rather than getting the full header if that is requested.

I don't think CSC (or anyone else) use coordinates with encryption, but I'll try to make sure to ping them about this just to let them know.

@pontus pontus changed the title Use range properly in sda-download Use range properly in sda-download and fix coordinate handling when using encryption to comply with documented behaviour Jan 14, 2025
@MalinAhlberg
Copy link
Contributor

MalinAhlberg commented Jan 15, 2025

After discussions with @jbygdell, here are some assumptions that we believe should be true. Note that all of them are not true for the current version, but for the future they seem to be sound.
Would you agree with these, @pontus ? If not, please elaborate :)

  • the coordinates are for the decrypted file
  • the start/end position should be relative to the actual file or the crypt4gh header+file (note, a change in htsget would be needed here)
  • the crypt4gh header should always be intact
  • the start position, if inside of the file, should always adhere to block boundaries
  • the end position, should always adhere to block boundaries

@pontus
Copy link
Contributor Author

pontus commented Jan 15, 2025

  • the coordinates are for the decrypted file

This makes sense to me, but it seemed the world didn't want to be sensible :).

For clarity, I also think there is a good use case for supporting "raw" access, e.g. where the consumer is aware of what it's getting (e.g. for https://github.com/NBISweden/sdafs, I really just want to access the encrypted data stream).

htsget is a good example of when I think it would be easier not trying to deal with everything (e.g. having a bit from the end of a file that isn't a full block before things that come earlier gets needlessly complicated).

  • the start/end position be relative to the actual file or the crypt4gh header+file (note, a change in htsget would be needed here)

If coordinates are for the decrypted file, the header doesn't seem relevant for the coordinates since the header "doesn't exist" then.

Also, if coordinates are for the decrypted file. I think a reasonable assumption is that one should get the requested region if one decodes the result. To me, that means each request should come with a crypt4gh header, and if either start or end doesn't align with block boundaries (natural or through file length), there should be a data-edit-list set up to adjust for that.

(By coincidence I did most of that earlier I think, but I don't know if that code remains anywhere anymore.)

  • the crypt4gh header should always be intact

Not sure what this means. Maybe the same as above (if coordinates are for the decrypted file space, each request should serve a header, yes).

If coordinates are for encrypted space, one shouldn't get anything other than what one requests.

  • the start position, if inside of the file, should always adhere to block boundaries
  • the end position, should always adhere to block boundaries

If things are in the decrypted coordinate space, I agree with the caveat above (the header should put in a data edit list to adjust for that the user maybe doesn't want the data that is at the beginning of the block).

If in encrypted coordinate space, we should just serve whatever is requested.

Don't know where this leaves us for now.

@jbygdell jbygdell disabled auto-merge January 15, 2025 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants