-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Extract Backend Methods called from Python into Py4JBackendExtensions
#14767
Conversation
2d4edc1
to
79ecea8
Compare
79ecea8
to
1d36659
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. I like how clear it makes it which methods are python endpoints. I just have a few small questions.
// From https://github.com/hail-is/hail/issues/14580 : | ||
// IR can get quite big, especially as it can contain an arbitrary | ||
// amount of encoded literals from the user's python session. This | ||
// was a (controversial) restriction imposed by Jackson and should be lifted. | ||
// | ||
// We remove this restriction at the earliest point possible for each backend/ | ||
// This can't be unified since each backend has its own entry-point from python | ||
// and its own specific initialisation code. | ||
StreamReadConstraints.overrideDefaultStreamReadConstraints( | ||
StreamReadConstraints.builder().maxStringLength(Integer.MAX_VALUE).build() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning for moving this from the Backend
constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was tricky to debug. We need to set the defaults before com.fasterxml.jackson.core.JsonFactory
is initialised (which is initialised by spark).
That factory holds onto a copy of these constants and so subsequent calls to overrideDefaultStreamReadConstraints
will have no effect after the factory is initialised. It's that instance that's used to determine max string length.
Moving it to each backend entry point makes it more likely we'll initialise the defaults first but there are no guarantees. Say someone provides their own spark context before the spark backend is initialised. They're likely to encounter this problem.
hail/src/main/scala/is/hail/backend/py4j/Py4JBackendExtensions.scala
Outdated
Show resolved
Hide resolved
1d36659
to
0f02bdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, on to the next!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like this change. Looks great!
Refactors backend code to improve separation of concerns by extracting Python-specific functionality into a new
Py4JBackendExtensions
trait. This change consolidates Python interface methods that were previously scattered across different backend implementations into a single shared trait.The change also improves reference genome handling by making the references map mutable and passing it explicitly through constructors rather than modifying it after creation.
Security Assessment
This change has no security impact
Impact Description
Low-level refactoring of internal backend code that moves functionality between classes without changing behavior. No changes to external interfaces or security boundaries.