local install helm dataproc #4601

Open
wants to merge 2 commits into
base: master

bpblanken (Collaborator) commented Jan 17, 2025

Ok! This is fully working now!

Logs on the local pipeline worker:

===== Luigi Execution Summary =====

Scheduled 5 tasks of which:
* 5 ran successfully:
    - 1 CreateDataprocClusterTask(reference_genome=GRCh38, dataset_type=SNV_INDEL, run_id=20250122-154349)
    - 1 RsyncToSeqrAppDirsTask(...)
    - 1 RunPipelineOnDataprocTask(...)
    - 1 TriggerHailBackendReload(...)
    - 1 WriteSuccessFileTask(...)

This progress looks :) because there were no failed tasks or missing dependencies

And the successful dataproc job

I also confirmed search worked!

bpblanken marked this pull request as ready for review January 22, 2025 16:36

bpblanken (Collaborator, Author) commented Jan 22, 2025

One other important thing: I had to bump the open source pipeline to hail 0.2.132 to get everything to work together. IIRC I wasn't able to get a docker image built with a version less than 0.2.132, and 0.2.131 includes this extremely important release note. Bumping the dataproc image required bumping to python 3.11, which in turn required a whole dependency refactor, etc.

I have not yet moved our airflow hail dataproc environment from its current setup, which means it is now pinned to a non-latest pipeline version. This is fine for now, but we likely want to move forward with python 3.11 and the new version of hail in our pipeline too.

I did sanity check that the hail version used by hail-search can read tables produced by 0.2.132, but didn't do any further evaluation.
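A sanity check of this kind can be as small as opening a table and printing its schema from the reader's environment. The sketch below is only an illustration, not the check that was actually run here: the helper name `can_read_table` and the `read_table` injection parameter are hypothetical, and it assumes `hail` is installed in the reader environment and that `path` points at a table written by the newer pipeline.

```python
def can_read_table(path, read_table=None):
    """Try to open a Hail table at `path` and describe it.

    Returns (ok, detail). `read_table` is a hypothetical injection point so
    the helper can be exercised without Hail; it defaults to hl.read_table.
    """
    if read_table is None:
        import hail as hl  # imported lazily so this module loads without hail

        read_table = hl.read_table
    try:
        table = read_table(path)
        table.describe()  # prints the table schema
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"
```

Opening and describing a table only shows that the reader can load it; it does not validate the contents, which matches the "no further evaluation" caveat above.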

PR for moving our pipeline is here: https://github.com/broadinstitute/seqr-pipeline-airflow/pull/312

hanars (Collaborator) commented Jan 22, 2025

Ideally we should run the same version of python for everything - can you make a ticket to update hail-search and main seqr to 3.11?

bpblanken (Collaborator, Author) commented

Closed the hail-search/vlm upgrade PR in favor of keeping the pipeline on hail 0.2.132 and the readers on 0.2.128.

bpblanken requested review from hanars and jklugherz January 23, 2025 19:48

bpblanken (Collaborator, Author) commented Jan 23, 2025

Actually, I just realized we can bump python to 3.11 without bumping hail. I will do that.

EDIT: turns out we just happen to be on an unfortunate hail version:

bblanken@instance-20250113-212703:~$ docker run -it hailgenetics/hail:0.2.128-py3.11
root@a5066ed1cf1c:/# python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux

The 0.2.128 image tagged py3.11 actually ships Python 3.10.12. This was fixed in later versions.
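A mismatch like this can be caught automatically by comparing the Python version implied by the image tag with the interpreter that actually runs inside the container. The following is a minimal sketch under stated assumptions, not something from this PR: the helper names `python_version_from_tag` and `tag_matches_interpreter` are hypothetical, and it assumes tags end in a `-py<major>.<minor>` suffix as in `0.2.128-py3.11`.

```python
import re
import sys


def python_version_from_tag(tag):
    """Return (major, minor) parsed from a tag suffix like '-py3.11', or None."""
    match = re.search(r"-py(\d+)\.(\d+)$", tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tag_matches_interpreter(tag, version_info=None):
    """True if the tag's Python version equals the given (or running) interpreter's."""
    expected = python_version_from_tag(tag)
    if expected is None:
        raise ValueError(f"no -py<major>.<minor> suffix in tag: {tag!r}")
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    # Values taken from the log above: the tag says 3.11, the interpreter says 3.10.12.
    print(tag_matches_interpreter("0.2.128-py3.11", (3, 10, 12)))  # prints False
```

Run inside the container without the explicit `version_info` argument, the same call compares the tag against the live interpreter, so it could serve as a build-time guard before bumping an image.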
