Merge pull request #3770 from broadinstitute/dev
Dev
hanars authored Dec 5, 2023
2 parents 6bd1328 + 4d8c0fa commit 506234e
Showing 183 changed files with 1,001 additions and 467 deletions.
1 change: 1 addition & 0 deletions .github/workflows/hail-search-unit-tests.yaml
@@ -28,5 +28,6 @@ jobs:
- name: Run coverage tests
run: |
export DATASETS_DIR=./hail_search/fixtures
export ONT_ENABLED=true
coverage run --source="./hail_search" --omit="./hail_search/__main__.py","./hail_search/test_utils.py" -m pytest hail_search/
coverage report --fail-under=99
6 changes: 6 additions & 0 deletions deploy/LOCAL_INSTALL.md
@@ -140,6 +140,12 @@ The steps below describe how to annotate a callset and then load it into your on
```bash
docker-compose up -d pipeline-runner # start the pipeline-runner container
```

1. authenticate into your google cloud account.
This is required for hail to access buckets hosted on gcloud.
```bash
docker-compose exec pipeline-runner gcloud auth application-default login
```

1. if you haven't already, download VEP and other reference data to the docker image's mounted directories.
This should be done once per build version, and does not need to be repeated for subsequent loading jobs.
1 change: 1 addition & 0 deletions hail_search/constants.py
@@ -12,6 +12,7 @@
NEW_SV_FIELD = 'new_structural_variants'
SCREEN_KEY = 'SCREEN' # uses all caps to match filter provided by the seqr UI
CLINVAR_KEY = 'clinvar'
CLINVAR_MITO_KEY = 'clinvar_mito'
HGMD_KEY = 'hgmd'
STRUCTURAL_ANNOTATION_FIELD = 'structural'

Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/.README.txt.crc
Binary file not shown.
4 changes: 2 additions & 2 deletions hail_search/fixtures/GRCh38/MITO/annotations.ht/README.txt
@@ -1,3 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.115-10932c754edb
Created at 2023/10/20 10:41:48
Written with version 0.2.124-13536b531342
Created at 2023/11/22 10:50:28
Binary file not shown.
Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/metadata.json.gz
Binary file not shown.
@@ -1,3 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.115-10932c754edb
Created at 2023/09/13 17:19:40
Written with version 0.2.124-13536b531342
Created at 2023/11/27 16:00:11
Binary file not shown.
@@ -0,0 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.124-13536b531342
Created at 2023/11/21 12:19:09
Empty file.
Binary file not shown.
@@ -0,0 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.124-13536b531342
Created at 2023/11/21 12:34:02
Empty file.
Binary file not shown.
@@ -1,3 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.115-10932c754edb
Created at 2023/09/11 13:50:59
Written with version 0.2.124-13536b531342
Created at 2023/11/27 16:02:25
Binary file not shown.
@@ -1,3 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.115-10932c754edb
Created at 2023/09/11 13:52:41
Written with version 0.2.124-13536b531342
Created at 2023/11/27 16:03:24
Binary file not shown.
@@ -1,3 +1,3 @@
This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
Written with version 0.2.115-10932c754edb
Created at 2023/09/13 13:15:29
Written with version 0.2.124-13536b531342
Created at 2023/11/27 16:03:50
Binary file not shown.
21 changes: 15 additions & 6 deletions hail_search/queries/base.py
@@ -95,7 +95,7 @@ def annotation_fields(self):
'genotypes': lambda r: r.family_entries.flatmap(lambda x: x).filter(
lambda gt: hl.is_defined(gt.individualGuid)
).group_by(lambda x: x.individualGuid).map_values(lambda x: x[0].select(
'sampleId', 'individualGuid', 'familyGuid',
'sampleId', 'sampleType', 'individualGuid', 'familyGuid',
numAlt=hl.if_else(hl.is_defined(x[0].GT), x[0].GT.n_alt_alleles(), self.MISSING_NUM_ALT),
**{k: x[0][field] for k, field in self.GENOTYPE_FIELDS.items()},
**{_to_camel_case(k): v(x[0], k, r) for k, v in self.COMPUTED_GENOTYPE_FIELDS.items()},
@@ -159,15 +159,15 @@ def _format_enum(cls, r, field, enum, empty_array=False, format_array_values=Non
if hasattr(value, 'map'):
if empty_array:
value = hl.or_else(value, hl.empty_array(value.dtype.element_type))
value = value.map(lambda x: cls._enum_field(x, enum, **kwargs))
value = value.map(lambda x: cls._enum_field(field, x, enum, **kwargs))
if format_array_values:
value = format_array_values(value, r)
return value

return cls._enum_field(value, enum, **kwargs)
return cls._enum_field(field, value, enum, **kwargs)

@staticmethod
def _enum_field(value, enum, ht_globals=None, annotate_value=None, format_value=None, drop_fields=None, enum_keys=None, **kwargs):
def _enum_field(field_name, value, enum, ht_globals=None, annotate_value=None, format_value=None, drop_fields=None, enum_keys=None, include_version=False, **kwargs):
annotations = {}
drop = [] + (drop_fields or [])
value_keys = value.keys()
@@ -183,9 +183,12 @@ def _enum_field(value, enum, ht_globals=None, annotate_value=None, format_value=
else:
annotations[field] = enum_array[value[value_field]]

if include_version:
annotations['version'] = ht_globals['versions'][field_name]

value = value.annotate(**annotations)
if annotate_value:
annotations = annotate_value(value, enum, ht_globals)
annotations = annotate_value(value, enum)
value = value.annotate(**annotations)
value = value.drop(*drop)

@@ -348,7 +351,8 @@ def _filter_entries_table(self, ht, sample_data, inheritance_mode=None, inherita

@classmethod
def _add_entry_sample_families(cls, ht, sample_data):
sample_index_id_map = dict(enumerate(hl.eval(ht.sample_ids)))
ht_globals = hl.eval(ht.globals)
sample_index_id_map = dict(enumerate(ht_globals.sample_ids))
sample_id_index_map = {v: k for k, v in sample_index_id_map.items()}
sample_index_id_map = hl.dict(sample_index_id_map)
sample_individual_map = {s['sample_id']: s['individual_guid'] for s in sample_data}
@@ -386,6 +390,7 @@ def _add_entry_sample_families(cls, ht, sample_data):
family_entries=family_sample_indices.map(lambda sample_indices: sample_indices.map(
lambda i: ht.entries[i].annotate(
sampleId=sample_index_id_map.get(i),
sampleType=cls._get_sample_type(ht_globals),
individualGuid=sample_index_individual_map.get(i),
familyGuid=sample_index_family_map.get(i),
affected_id=sample_index_affected_status.get(i),
@@ -395,6 +400,10 @@ def _add_entry_sample_families(cls, ht, sample_data):

return ht, sample_id_family_index_map, num_families

@classmethod
def _get_sample_type(cls, ht_globals):
return ht_globals.sample_type

def _filter_inheritance(self, ht, inheritance_mode, inheritance_filter, sample_data, sample_id_family_index_map):
any_valid_entry = lambda x: self.GENOTYPE_QUERY_MAP[HAS_ALT](x.GT)

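The `include_version` change to `_enum_field` above can be illustrated without Hail. The sketch below is not part of the commit: it uses plain dicts in place of Hail structs, handles only scalar `*_id` fields, and the `enum_field` helper, the enum labels, and the version string are all hypothetical.

```python
def enum_field(field_name, value, enum, ht_globals=None, include_version=False):
    """Plain-dict analogue of the enum formatting step (assumption: scalar ids only)."""
    annotations = {}
    drop = []
    for field, labels in enum.items():
        id_field = f"{field}_id"
        if id_field in value:
            # Replace the integer id with its label from the enum list.
            annotations[field] = labels[value[id_field]]
            drop.append(id_field)

    if include_version:
        # The version is looked up by the table field name, so no per-field lambda is needed.
        annotations["version"] = ht_globals["versions"][field_name]

    result = {**value, **annotations}
    for key in drop:
        result.pop(key)
    return result


# Hypothetical globals and row, for illustration only.
ht_globals = {"versions": {"clinvar_mito": "2023-11-26"}}
enum = {"pathogenicity": ["Pathogenic", "Likely_pathogenic", "Benign"]}
row = {"alleleId": 123, "pathogenicity_id": 1}

formatted = enum_field("clinvar_mito", row, enum, ht_globals, include_version=True)
print(formatted)
```

Passing the field name into the helper is what lets the version lookup move out of the `annotate_value` callback, which in the diff no longer receives `ht_globals`.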
28 changes: 19 additions & 9 deletions hail_search/queries/mito.py
@@ -1,11 +1,15 @@
import hail as hl

from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_LIKELY_PATH_FILTER, CLINVAR_PATH_FILTER, \
from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_MITO_KEY, CLINVAR_LIKELY_PATH_FILTER, CLINVAR_PATH_FILTER, \
CLINVAR_PATH_RANGES, CLINVAR_PATH_SIGNIFICANCES, ALLOWED_TRANSCRIPTS, ALLOWED_SECONDARY_TRANSCRIPTS, PATHOGENICTY_SORT_KEY, CONSEQUENCE_SORT, \
PATHOGENICTY_HGMD_SORT_KEY
from hail_search.queries.base import BaseHailTableQuery, PredictionPath, QualityFilterFormat


def _clinvar_sort(clinvar_field, r):
return hl.or_else(r[clinvar_field].pathogenicity_id, ABSENT_PATH_SORT_OFFSET)


class MitoHailTableQuery(BaseHailTableQuery):

DATA_TYPE = 'MITO'
@@ -48,6 +52,7 @@ class MitoHailTableQuery(BaseHailTableQuery):
PATHOGENICITY_FILTERS = {
CLINVAR_KEY: ('pathogenicity', CLINVAR_PATH_RANGES),
}
PATHOGENICITY_FIELD_MAP = {CLINVAR_KEY: CLINVAR_MITO_KEY}

GLOBALS = BaseHailTableQuery.GLOBALS + ['versions']
CORE_FIELDS = BaseHailTableQuery.CORE_FIELDS + ['rsid']
@@ -69,24 +74,28 @@ class MitoHailTableQuery(BaseHailTableQuery):
**BaseHailTableQuery.BASE_ANNOTATION_FIELDS,
}
ENUM_ANNOTATION_FIELDS = {
'clinvar': {'annotate_value': lambda value, enum, ht_globals: {
'conflictingPathogenicities': MitoHailTableQuery._format_enum(
value, 'conflictingPathogenicities', enum, enum_keys=['pathogenicity']),
'version': ht_globals['versions'].clinvar,
}},
CLINVAR_MITO_KEY: {
'response_key': CLINVAR_KEY,
'include_version': True,
'annotate_value': lambda value, enum: {
'conflictingPathogenicities': MitoHailTableQuery._format_enum(
value, 'conflictingPathogenicities', enum, enum_keys=['pathogenicity']),
},
},
TRANSCRIPTS_FIELD: {
**BaseHailTableQuery.ENUM_ANNOTATION_FIELDS['transcripts'],
'annotate_value': lambda transcript, *args: {'major_consequence': transcript.consequence_terms.first()},
'drop_fields': ['consequence_terms'],
}
}

CLINVAR_SORT = _clinvar_sort
SORTS = {
CONSEQUENCE_SORT: lambda r: [
hl.min(r.sorted_transcript_consequences.flatmap(lambda t: t.consequence_term_ids)),
hl.min(r.selected_transcript.consequence_term_ids),
],
PATHOGENICTY_SORT_KEY: lambda r: [hl.or_else(r.clinvar.pathogenicity_id, ABSENT_PATH_SORT_OFFSET)],
PATHOGENICTY_SORT_KEY: lambda r: [_clinvar_sort(CLINVAR_MITO_KEY, r)],
**BaseHailTableQuery.SORTS,
}
SORTS[PATHOGENICTY_HGMD_SORT_KEY] = SORTS[PATHOGENICTY_SORT_KEY]
@@ -233,7 +242,8 @@ def _get_clinvar_path_filters(pathogenicity):

def _has_path_expr(self, terms, field):
subfield, range_configs = self.PATHOGENICITY_FILTERS[field]
enum_lookup = self._get_enum_lookup(field, subfield)
field_name = self.PATHOGENICITY_FIELD_MAP.get(field, field)
enum_lookup = self._get_enum_lookup(field_name, subfield)

ranges = [[None, None]]
for path_filter, start, end in range_configs:
@@ -245,7 +255,7 @@ def _has_path_expr(self, terms, field):
ranges.append([None, None])

ranges = [r for r in ranges if r[0] is not None]
value = self._ht[field][f'{subfield}_id']
value = self._ht[field_name][f'{subfield}_id']
return hl.any(lambda r: (value >= r[0]) & (value <= r[1]), ranges)

def _format_results(self, ht, *args, **kwargs):
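The `PATHOGENICITY_FIELD_MAP` lookup in `_has_path_expr` above translates the filter key sent by the UI (`clinvar`) into the table field that actually holds the data (`clinvar_mito`), falling back to the key itself when no mapping exists. A minimal pure-Python sketch of that fallback, not taken from the commit, follows. The `MitoQuery` and `SnvIndelQuery` classes, the `has_path` method, the row dicts, and the id ranges are hypothetical stand-ins for the Hail-based originals.

```python
CLINVAR_KEY = "clinvar"
CLINVAR_MITO_KEY = "clinvar_mito"
HGMD_KEY = "hgmd"


class MitoQuery:
    # Filter key -> table field; keys without an entry map to themselves.
    PATHOGENICITY_FIELD_MAP = {CLINVAR_KEY: CLINVAR_MITO_KEY}

    def __init__(self, row):
        self._row = row

    def has_path(self, field, subfield, ranges):
        field_name = self.PATHOGENICITY_FIELD_MAP.get(field, field)
        value = self._row[field_name][f"{subfield}_id"]
        # True when the id falls inside any inclusive [start, end] range.
        return any(start <= value <= end for start, end in ranges)


class SnvIndelQuery(MitoQuery):
    # An empty map restores the identity lookup for the subclass.
    PATHOGENICITY_FIELD_MAP = {}


# Hypothetical rows and ranges, for illustration only.
mito_row = {"clinvar_mito": {"pathogenicity_id": 2}}
snv_row = {"clinvar": {"pathogenicity_id": 9}, "hgmd": {"class_id": 0}}

mito_hit = MitoQuery(mito_row).has_path(CLINVAR_KEY, "pathogenicity", [(0, 5)])
snv_hit = SnvIndelQuery(snv_row).has_path(CLINVAR_KEY, "pathogenicity", [(0, 5)])
hgmd_hit = SnvIndelQuery(snv_row).has_path(HGMD_KEY, "class", [(0, 1)])
```

Overriding the map with `{}` in the subclass, as `snv_indel.py` does further down, keeps the shared method unchanged while each data type resolves its own field name.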
11 changes: 8 additions & 3 deletions hail_search/queries/multi_data_types.py
@@ -1,15 +1,20 @@
import hail as hl
import os

from hail_search.constants import ALT_ALT, REF_REF, CONSEQUENCE_SORT, OMIM_SORT, GROUPED_VARIANTS_FIELD
from hail_search.queries.base import BaseHailTableQuery
from hail_search.queries.mito import MitoHailTableQuery
from hail_search.queries.snv_indel import SnvIndelHailTableQuery
from hail_search.queries.sv import SvHailTableQuery
from hail_search.queries.gcnv import GcnvHailTableQuery
from hail_search.queries.ont_snv_indel import OntSnvIndelHailTableQuery

QUERY_CLASS_MAP = {
cls.DATA_TYPE: cls for cls in [SnvIndelHailTableQuery, MitoHailTableQuery, SvHailTableQuery, GcnvHailTableQuery]
}
ONT_ENABLED = os.environ.get('ONT_ENABLED')

QUERY_CLASSES = [SnvIndelHailTableQuery, MitoHailTableQuery, SvHailTableQuery, GcnvHailTableQuery]
if ONT_ENABLED:
QUERY_CLASSES.append(OntSnvIndelHailTableQuery)
QUERY_CLASS_MAP = {cls.DATA_TYPE: cls for cls in QUERY_CLASSES}
SNV_INDEL_DATA_TYPE = SnvIndelHailTableQuery.DATA_TYPE


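The `ONT_ENABLED` gate above relies on the truthiness of `os.environ.get(...)`, which returns a string or `None`. The sketch below is not part of the commit; it passes the environment as a plain dict so the behaviour can be checked in isolation. The stand-in classes and the `build_query_class_map` helper are hypothetical, and `'SNV_INDEL'` is an assumed value for `SnvIndelHailTableQuery.DATA_TYPE`, which the diff does not show.

```python
class SnvIndelQuery:
    DATA_TYPE = "SNV_INDEL"  # assumed value, not shown in the diff


class OntSnvIndelQuery(SnvIndelQuery):
    DATA_TYPE = "ONT_SNV_INDEL"


def build_query_class_map(environ):
    """Register the ONT class only when ONT_ENABLED is set to a non-empty string."""
    query_classes = [SnvIndelQuery]
    if environ.get("ONT_ENABLED"):
        query_classes.append(OntSnvIndelQuery)
    return {cls.DATA_TYPE: cls for cls in query_classes}


default_map = build_query_class_map({})
enabled_map = build_query_class_map({"ONT_ENABLED": "true"})
# Any non-empty string is truthy, so "false" also enables the class.
false_string_map = build_query_class_map({"ONT_ENABLED": "false"})
empty_string_map = build_query_class_map({"ONT_ENABLED": ""})
```

With this pattern the feature is disabled only by leaving the variable unset or empty. A deployment that sets `ONT_ENABLED=false` would still register the ONT query class.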
12 changes: 12 additions & 0 deletions hail_search/queries/ont_snv_indel.py
@@ -0,0 +1,12 @@
from hail_search.queries.base import BaseHailTableQuery
from hail_search.queries.snv_indel import SnvIndelHailTableQuery


class OntSnvIndelHailTableQuery(SnvIndelHailTableQuery):

DATA_TYPE = 'ONT_SNV_INDEL'

CORE_FIELDS = BaseHailTableQuery.CORE_FIELDS

def _get_loaded_filter_ht(self, *args, **kwargs):
return None
7 changes: 5 additions & 2 deletions hail_search/queries/snv_indel.py
@@ -1,6 +1,6 @@
import hail as hl

from hail_search.constants import HGMD_KEY, HGMD_PATH_RANGES, \
from hail_search.constants import CLINVAR_KEY, CLINVAR_MITO_KEY, HGMD_KEY, HGMD_PATH_RANGES, \
GNOMAD_GENOMES_FIELD, PREFILTER_FREQ_CUTOFF, PATH_FREQ_OVERRIDE_CUTOFF, PATHOGENICTY_SORT_KEY, PATHOGENICTY_HGMD_SORT_KEY, \
SCREEN_KEY, SPLICE_AI_FIELD
from hail_search.queries.base import PredictionPath, QualityFilterFormat
@@ -45,6 +45,7 @@ class SnvIndelHailTableQuery(MitoHailTableQuery):
**MitoHailTableQuery.PATHOGENICITY_FILTERS,
HGMD_KEY: ('class', HGMD_PATH_RANGES),
}
PATHOGENICITY_FIELD_MAP = {}

BASE_ANNOTATION_FIELDS = {
k: v for k, v in MitoHailTableQuery.BASE_ANNOTATION_FIELDS.items()
@@ -57,10 +58,12 @@ class SnvIndelHailTableQuery(MitoHailTableQuery):
'format_value': lambda value: value.region_types.first(),
},
}
ENUM_ANNOTATION_FIELDS[CLINVAR_KEY] = ENUM_ANNOTATION_FIELDS.pop(CLINVAR_MITO_KEY)

SORTS = {
**MitoHailTableQuery.SORTS,
PATHOGENICTY_HGMD_SORT_KEY: lambda r: MitoHailTableQuery.SORTS[PATHOGENICTY_SORT_KEY](r) + [r.hgmd.class_id],
PATHOGENICTY_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r)],
PATHOGENICTY_HGMD_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r), r.hgmd.class_id],
}

def _prefilter_entries_table(self, ht, *args, **kwargs):
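The line `ENUM_ANNOTATION_FIELDS[CLINVAR_KEY] = ENUM_ANNOTATION_FIELDS.pop(CLINVAR_MITO_KEY)` above renames a key inside a class body. A minimal sketch of the pattern follows, not taken from the commit. It assumes the subclass first builds its own dict from the parent's (the diff elides that part), and the class names and config values are hypothetical.

```python
CLINVAR_KEY = "clinvar"
CLINVAR_MITO_KEY = "clinvar_mito"


class MitoQuery:
    ENUM_ANNOTATION_FIELDS = {
        CLINVAR_MITO_KEY: {"response_key": CLINVAR_KEY, "include_version": True},
        "transcripts": {},
    }


class SnvIndelQuery(MitoQuery):
    # A new dict is created here, so the rename below does not touch the parent's dict.
    ENUM_ANNOTATION_FIELDS = {**MitoQuery.ENUM_ANNOTATION_FIELDS, "screen": {}}
    # Move the clinvar config from the mito field name to the plain clinvar field name.
    ENUM_ANNOTATION_FIELDS[CLINVAR_KEY] = ENUM_ANNOTATION_FIELDS.pop(CLINVAR_MITO_KEY)


parent_keys = sorted(MitoQuery.ENUM_ANNOTATION_FIELDS)
child_keys = sorted(SnvIndelQuery.ENUM_ANNOTATION_FIELDS)
```

The copy is shallow, so both classes share the same inner config dict. Calling `pop` directly on the parent's dict instead would remove `clinvar_mito` from `MitoQuery` as well.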
4 changes: 4 additions & 0 deletions hail_search/queries/sv.py
@@ -50,6 +50,10 @@ class SvHailTableQuery(BaseHailTableQuery):
)],
}

@classmethod
def _get_sample_type(cls, *args):
return cls.DATA_TYPE.split('_')[-1]

def _filter_annotated_table(self, *args, parsed_intervals=None, exclude_intervals=False, **kwargs):
if parsed_intervals:
interval_filter = hl.array(parsed_intervals).any(lambda interval: hl.if_else(
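The `_get_sample_type` override above replaces the lookup in table globals with a value derived from the class's own `DATA_TYPE`. The sketch below is not part of the commit and shows the two code paths side by side. `SimpleNamespace` stands in for the evaluated Hail globals, and the `DATA_TYPE` values `'SV_WGS'` and `'SV_WES'`, along with the `GcnvQuery` subclass relationship, are assumptions that the diff does not confirm.

```python
from types import SimpleNamespace


class BaseQuery:
    DATA_TYPE = None

    @classmethod
    def _get_sample_type(cls, ht_globals):
        # Default: the sample type is stored in the table globals.
        return ht_globals.sample_type


class SvQuery(BaseQuery):
    DATA_TYPE = "SV_WGS"  # assumed value

    @classmethod
    def _get_sample_type(cls, *args):
        # Override: ignore the globals and take the suffix of DATA_TYPE.
        return cls.DATA_TYPE.split("_")[-1]


class GcnvQuery(SvQuery):
    DATA_TYPE = "SV_WES"  # assumed value; inherits the override


ht_globals = SimpleNamespace(sample_type="WES")

base_type = BaseQuery._get_sample_type(ht_globals)
sv_type = SvQuery._get_sample_type(ht_globals)
gcnv_type = GcnvQuery._get_sample_type(ht_globals)
```

Because the override is a classmethod that reads `cls.DATA_TYPE`, any subclass that only changes `DATA_TYPE` gets the matching sample type without redefining the method.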
4 changes: 1 addition & 3 deletions hail_search/requirements-test.txt
@@ -4,16 +4,14 @@
#
# pip-compile hail_search/requirements-test.in
#
aiohttp==3.8.6
aiohttp==3.9.0
# via pytest-aiohttp
aiosignal==1.3.1
# via aiohttp
async-timeout==4.0.2
# via aiohttp
attrs==23.1.0
# via aiohttp
charset-normalizer==3.2.0
# via aiohttp
coverage==5.1
# via -r requirements-test.in
exceptiongroup==1.1.3