Updates for new reference data structure #4509

Merged
merged 14 commits into from
Dec 4, 2024
1 change: 0 additions & 1 deletion hail_search/constants.py
@@ -19,7 +19,6 @@
 MOTIF_FEATURES_KEY = 'motif_feature'
 REGULATORY_FEATURES_KEY = 'regulatory_feature'
 CLINVAR_KEY = 'clinvar'
-CLINVAR_MITO_KEY = 'clinvar_mito'
 HGMD_KEY = 'hgmd'
 STRUCTURAL_ANNOTATION_FIELD = 'structural'
 FAMILY_GUID_FIELD = 'familyGuids'
Binary file not shown.
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/06/10 16:51:30
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 13:07:33
Binary file not shown.
Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/.README.txt.crc
Binary file not shown.
4 changes: 2 additions & 2 deletions hail_search/fixtures/GRCh38/MITO/annotations.ht/README.txt
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/10/14 16:14:00
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 11:15:26
Binary file not shown.
Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/metadata.json.gz
Binary file not shown.
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/06/14 15:14:52
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 12:35:22
Binary file not shown.

This file was deleted.

Binary file not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:48:02
Binary file not shown.

This file was deleted.

Binary file not shown.

This file was deleted.

Binary file not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:46:25
Binary file not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:46:46
Binary file not shown.
2 changes: 1 addition & 1 deletion hail_search/queries/base.py
@@ -1218,7 +1218,7 @@ def gene_counts(self):

     def lookup_variants(self, variant_ids, include_project_data=False, **kwargs):
         self._parse_intervals(intervals=None, variant_ids=variant_ids, variant_keys=variant_ids)
-        ht = self._read_table('annotations.ht', drop_globals=['paths', 'versions'])
+        ht = self._read_table('annotations.ht', drop_globals=['versions'])
         ht = ht.filter(hl.is_defined(ht[XPOS]))

         annotation_fields = self.annotation_fields(include_genotype_overrides=False)
21 changes: 9 additions & 12 deletions hail_search/queries/mito.py
@@ -5,7 +5,7 @@
 import logging
 import os

-from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_MITO_KEY, CLINVAR_LIKELY_PATH_FILTER, \
+from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_LIKELY_PATH_FILTER, \
     CLINVAR_PATH_FILTER, \
     CLINVAR_PATH_RANGES, CLINVAR_PATH_SIGNIFICANCES, ALLOWED_TRANSCRIPTS, ALLOWED_SECONDARY_TRANSCRIPTS, \
     PATHOGENICTY_SORT_KEY, CONSEQUENCE_SORT, \
@@ -14,7 +14,6 @@
 from hail_search.queries.base import BaseHailTableQuery, PredictionPath, QualityFilterFormat, MAX_PARTITIONS

 REFERENCE_DATASETS_DIR = os.environ.get('REFERENCE_DATASETS_DIR', '/seqr/seqr-reference-data')
-REFERENCE_DATASET_SUBDIR = 'cached_reference_dataset_queries'

 logger = logging.getLogger(__name__)

@@ -57,15 +56,14 @@ class MitoHailTableQuery(BaseHailTableQuery):
         'haplogroup_defining': PredictionPath('haplogroup', 'is_defining', lambda v: hl.or_missing(v, 'Y')),
         'hmtvar': PredictionPath('hmtvar', 'score'),
         'mitotip': PredictionPath('mitotip', 'trna_prediction'),
-        'mut_taster': PredictionPath('dbnsfp_mito', 'MutationTaster_pred'),
-        'sift': PredictionPath('dbnsfp_mito', 'SIFT_score'),
+        'mut_taster': PredictionPath('dbnsfp', 'MutationTaster_pred'),
+        'sift': PredictionPath('dbnsfp', 'SIFT_score'),
         'mlc': PredictionPath('local_constraint_mito', 'score'),
     }

     PATHOGENICITY_FILTERS = {
         CLINVAR_KEY: ('pathogenicity', CLINVAR_PATH_RANGES),
     }
-    PATHOGENICITY_FIELD_MAP = {CLINVAR_KEY: CLINVAR_MITO_KEY}

     GLOBALS = BaseHailTableQuery.GLOBALS + ['versions']
     CORE_FIELDS = BaseHailTableQuery.CORE_FIELDS + ['rsid']
@@ -86,7 +84,7 @@ class MitoHailTableQuery(BaseHailTableQuery):
         **BaseHailTableQuery.BASE_ANNOTATION_FIELDS,
     }
     ENUM_ANNOTATION_FIELDS = {
-        CLINVAR_MITO_KEY: {
+        CLINVAR_KEY: {
             'response_key': CLINVAR_KEY,
             'include_version': True,
             'annotate_value': lambda value, enum: {
@@ -109,7 +107,7 @@ class MitoHailTableQuery(BaseHailTableQuery):
             hl.min(r.sorted_transcript_consequences.flatmap(lambda t: t.consequence_term_ids)),
             hl.min(r.selected_transcript.consequence_term_ids),
         ],
-        PATHOGENICTY_SORT_KEY: lambda r: [_clinvar_sort(CLINVAR_MITO_KEY, r)],
+        PATHOGENICTY_SORT_KEY: lambda r: [_clinvar_sort(CLINVAR_KEY, r)],
         **BaseHailTableQuery.SORTS,
     }
     SORTS[PATHOGENICTY_HGMD_SORT_KEY] = SORTS[PATHOGENICTY_SORT_KEY]
@@ -363,7 +361,7 @@ def _get_loaded_filter_ht(self, key, get_filters, **kwargs):
         if ht_filter is False:
             self._filter_hts[key] = False
         else:
-            ht = self._read_table(f'{REFERENCE_DATASET_SUBDIR}/{self.PREFILTER_TABLES[key]}')
+            ht = self._read_table(f'{self.PREFILTER_TABLES[key]}')
             if ht_filter is not True:
                 ht = ht.filter(ht_filter(ht))
             self._filter_hts[key] = ht
@@ -372,7 +370,7 @@ def _get_loaded_filter_ht(self, key, get_filters, **kwargs):

     @classmethod
     def _get_table_dir(cls, path):
-        if REFERENCE_DATASET_SUBDIR in path:
+        if any(prefilter_table_path in path for prefilter_table_path in cls.PREFILTER_TABLES.values()):
             return REFERENCE_DATASETS_DIR
         return super()._get_table_dir(path)
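
Note on the `_get_table_dir` change above: the check no longer looks for a fixed subdirectory name in the path and instead tests whether the path contains any of the configured prefilter table paths. A minimal standalone sketch of that routing follows. The table name in `PREFILTER_TABLES` and the `DEFAULT_TABLE_DIR` value are hypothetical and do not appear in this diff; the real method is a classmethod that falls back to the base class.

```python
# Standalone sketch of the path routing; not the project's actual class.
REFERENCE_DATASETS_DIR = '/seqr/seqr-reference-data'  # default value shown in the diff
DEFAULT_TABLE_DIR = '/hail_datasets'  # hypothetical stand-in for the base-class directory

# Hypothetical prefilter table paths, for illustration only.
PREFILTER_TABLES = {'clinvar': 'clinvar_path_variants.ht'}


def get_table_dir(path, prefilter_tables=PREFILTER_TABLES):
    """Return the reference-data directory for prefilter tables, else the default."""
    # Substring match, mirroring the `prefilter_table_path in path` test in the diff.
    if any(table_path in path for table_path in prefilter_tables.values()):
        return REFERENCE_DATASETS_DIR
    return DEFAULT_TABLE_DIR
```

Because the test is substring containment, any path that happens to contain a prefilter table name is routed to the reference-data directory, so the prefilter table names presumably need to stay distinct from the other table names.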

@@ -486,8 +484,7 @@ def _get_clinvar_path_filters(pathogenicity):

     def _has_path_expr(self, ht, terms, field):
         subfield, range_configs = self.PATHOGENICITY_FILTERS[field]
-        field_name = self.PATHOGENICITY_FIELD_MAP.get(field, field)
-        enum_lookup = self._get_enum_lookup(field_name, subfield)
+        enum_lookup = self._get_enum_lookup(field, subfield)

         ranges = [[None, None]]
         for path_filter, start, end in range_configs:
@@ -499,7 +496,7 @@ def _has_path_expr(self, ht, terms, field):
                 ranges.append([None, None])

         ranges = [r for r in ranges if r[0] is not None]
-        value = ht[field_name][f'{subfield}_id']
+        value = ht[field][f'{subfield}_id']
         return hl.any(lambda r: (value >= r[0]) & (value <= r[1]), ranges)

     def _format_results(self, ht, *args, **kwargs):
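
Note on the tail of `_has_path_expr` above: it drops any range whose start was never set, then tests whether the enum id falls inside at least one remaining inclusive range. A plain-Python analogue of that final check, as a sketch only: the function name `in_any_range` is hypothetical, and the real code builds a lazy Hail expression with `hl.any` rather than evaluating eagerly.

```python
def in_any_range(value, ranges):
    """Return True if value lies in any inclusive [start, end] range."""
    # Discard placeholder ranges whose start was never filled in.
    closed = [r for r in ranges if r[0] is not None]
    # Inclusive on both ends, matching (value >= r[0]) & (value <= r[1]).
    return any(start <= value <= end for start, end in closed)
```

With only the placeholder `[[None, None]]` the list of ranges is empty after filtering, so the check is false for every value.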
13 changes: 5 additions & 8 deletions hail_search/queries/snv_indel_37.py
@@ -1,8 +1,8 @@
 from collections import OrderedDict
 import hail as hl

-from hail_search.constants import CLINVAR_KEY, CLINVAR_MITO_KEY, HGMD_KEY, HGMD_PATH_RANGES, \
-    GNOMAD_GENOMES_FIELD, PREFILTER_FREQ_CUTOFF, PATH_FREQ_OVERRIDE_CUTOFF, PATHOGENICTY_SORT_KEY, PATHOGENICTY_HGMD_SORT_KEY, \
+from hail_search.constants import CLINVAR_KEY, HGMD_KEY, HGMD_PATH_RANGES, \
+    GNOMAD_GENOMES_FIELD, PREFILTER_FREQ_CUTOFF, PATH_FREQ_OVERRIDE_CUTOFF, PATHOGENICTY_HGMD_SORT_KEY, \
     SPLICE_AI_FIELD, GENOME_VERSION_GRCh37
 from hail_search.queries.base import PredictionPath, QualityFilterFormat
 from hail_search.queries.mito import MitoHailTableQuery
@@ -28,10 +28,10 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
         GNOMAD_GENOMES_FIELD: {'filter_af': 'AF_POPMAX_OR_GLOBAL', 'het': None, 'sort': 'gnomad'},
     }
     PREDICTION_FIELDS_CONFIG = {
-        'cadd': PredictionPath('cadd', 'PHRED'),
+        'cadd': PredictionPath('dbnsfp', 'CADD_phred'),
         'eigen': PredictionPath('eigen', 'Eigen_phred'),
-        'mpc': PredictionPath('mpc', 'MPC'),
-        'primate_ai': PredictionPath('primate_ai', 'score'),
+        'mpc': PredictionPath('dbnsfp', 'MPC_score'),
+        'primate_ai': PredictionPath('dbnsfp', 'PrimateAI_score'),
         SPLICE_AI_FIELD: PredictionPath(SPLICE_AI_FIELD, 'delta_score'),
         'splice_ai_consequence': PredictionPath(SPLICE_AI_FIELD, 'splice_consequence'),
         'mut_taster': PredictionPath('dbnsfp', 'MutationTaster_pred'),
@@ -43,7 +43,6 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
         **MitoHailTableQuery.PATHOGENICITY_FILTERS,
         HGMD_KEY: ('class', HGMD_PATH_RANGES),
     }
-    PATHOGENICITY_FIELD_MAP = {}
     ANNOTATION_OVERRIDE_FIELDS = [SPLICE_AI_FIELD]

     CORE_FIELDS = MitoHailTableQuery.CORE_FIELDS + ['CAID']
@@ -60,11 +59,9 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
             'format_value': lambda value: value.region_types.first(),
         },
     }
-    ENUM_ANNOTATION_FIELDS[CLINVAR_KEY] = ENUM_ANNOTATION_FIELDS.pop(CLINVAR_MITO_KEY)

     SORTS = {
         **MitoHailTableQuery.SORTS,
-        PATHOGENICTY_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r)],
         PATHOGENICTY_HGMD_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r), r.hgmd.class_id],
     }
