Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Extract Backend Methods called from Python into Py4JBackendExtensions #14767

Merged
merged 1 commit into from
Jan 21, 2025

Conversation

ehigham
Copy link
Member

@ehigham ehigham commented Dec 16, 2024

Refactors backend code to improve separation of concerns by extracting Python-specific functionality into a new Py4JBackendExtensions trait. This change consolidates Python interface methods that were previously scattered across different backend implementations into a single shared trait.

The change also improves reference genome handling by making the references map mutable and passing it explicitly through constructors rather than modifying it after creation.

Security Assessment

This change has no security impact

Impact Description

Low-level refactoring of internal backend code that moves functionality between classes without changing behavior. No changes to external interfaces or security boundaries.

@ehigham ehigham marked this pull request as ready for review December 16, 2024 18:50
@ehigham ehigham force-pushed the ehigham/py4j-backend-extensions branch 2 times, most recently from 2d4edc1 to 79ecea8 Compare December 17, 2024 20:00
@ehigham ehigham force-pushed the ehigham/py4j-backend-extensions branch from 79ecea8 to 1d36659 Compare January 13, 2025 16:36
Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I like how clear it makes it which methods are python endpoints. I just have a few small questions.

Comment on lines +40 to +50
// From https://github.com/hail-is/hail/issues/14580 :
// IR can get quite big, especially as it can contain an arbitrary
// amount of encoded literals from the user's python session. This
// was a (controversial) restriction imposed by Jackson and should be lifted.
//
// We remove this restriction at the earliest point possible for each backend/
// This can't be unified since each backend has its own entry-point from python
// and its own specific initialisation code.
StreamReadConstraints.overrideDefaultStreamReadConstraints(
StreamReadConstraints.builder().maxStringLength(Integer.MAX_VALUE).build()
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for moving this from the Backend constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was tricky to debug. We need to set the defaults before com.fasterxml.jackson.core.JsonFactory is initialised (which is initialised by spark).

That factory holds onto a copy of these constants and so subsequent calls to overrideDefaultStreamReadConstraints will have no effect after the factory is initialised. It's that instance that's used to determine max string length.

Moving it to each backend entry point makes it more likely we'll initialise the defaults first but there are no guarantees. Say someone provides their own spark context before the spark backend is initialised. They're likely to encounter this problem.

@ehigham ehigham force-pushed the ehigham/py4j-backend-extensions branch from 1d36659 to 0f02bdc Compare January 17, 2025 21:15
Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, on to the next!

Copy link
Collaborator

@chrisvittal chrisvittal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like this change. Looks great!

@hail-ci-robot hail-ci-robot merged commit 61dfa8c into main Jan 21, 2025
2 of 3 checks passed
@hail-ci-robot hail-ci-robot deleted the ehigham/py4j-backend-extensions branch January 21, 2025 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants