Merge pull request #4519 from broadinstitute/dev
Dev
bpblanken authored Dec 5, 2024
2 parents ef13b05 + 5d48616 commit f59d904
Showing 179 changed files with 1,084 additions and 1,587 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/local-install-tests.yml
@@ -13,4 +13,7 @@ jobs:
     steps:
       - uses: actions/checkout@v2
       - name: Run test_local_deployment script
-        run: ./test_local_deployment.sh
+        run: |
+          mkdir ./data
+          chmod 777 ./data
+          ./test_local_deployment.sh
2 changes: 2 additions & 0 deletions deploy/LOCAL_INSTALL.md
@@ -30,6 +30,8 @@ The steps below describe how to create a new empty seqr instance with a single A
 SEQR_DIR=$(pwd)
 
 wget https://raw.githubusercontent.com/broadinstitute/seqr/master/docker-compose.yml
+wget https://raw.githubusercontent.com/broadinstitute/seqr/master/deploy/postgres/initdb.sql
+mv initdb.sql ./data/postgres_init/initdb.sql
 
 docker compose up -d seqr # start up the seqr docker image in the background after also starting other components it depends on (postgres, redis, elasticsearch). This may take 10+ minutes.
 docker compose logs -f seqr # (optional) continuously print seqr logs to see when it is done starting up or if there are any errors. Type Ctrl-C to exit from the logs.
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -8,7 +8,7 @@ services:
       - PGPORT=5433
       - POSTGRES_PASSWORD=docker-compose-postgres-password
     volumes:
-      - ./deploy/postgres/initdb.sql:/docker-entrypoint-initdb.d/initdb.sql
+      - ./data/postgres_init/initdb.sql:/docker-entrypoint-initdb.d/initdb.sql
       - ./data/postgres:/var/lib/postgresql/data
     healthcheck:
       test: pg_isready -h postgres -U postgres
1 change: 0 additions & 1 deletion hail_search/constants.py
@@ -19,7 +19,6 @@
 MOTIF_FEATURES_KEY = 'motif_feature'
 REGULATORY_FEATURES_KEY = 'regulatory_feature'
 CLINVAR_KEY = 'clinvar'
-CLINVAR_MITO_KEY = 'clinvar_mito'
 HGMD_KEY = 'hgmd'
 STRUCTURAL_ANNOTATION_FIELD = 'structural'
 FAMILY_GUID_FIELD = 'familyGuids'
Binary files not shown.
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/06/10 16:51:30
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 13:07:33
Binary files not shown.
Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/.README.txt.crc
Binary files not shown.
4 changes: 2 additions & 2 deletions hail_search/fixtures/GRCh38/MITO/annotations.ht/README.txt
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/10/14 16:14:00
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 11:15:26
Binary files not shown.
Binary file modified hail_search/fixtures/GRCh38/MITO/annotations.ht/metadata.json.gz
Binary files not shown.
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
-Written with version 0.2.128-eead8100a1c1
-Created at 2024/06/14 15:14:52
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 12:35:22
Binary files not shown.

This file was deleted.

Binary files not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:48:02
Binary files not shown.

This file was deleted.

Binary files not shown.

This file was deleted.

Binary files not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:46:25
Binary files not shown.
@@ -0,0 +1,3 @@
+This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
+Written with version 0.2.133-4c60fddb171a
+Created at 2024/12/04 10:46:46
Binary files not shown.
2 changes: 1 addition & 1 deletion hail_search/queries/base.py
@@ -1218,7 +1218,7 @@ def gene_counts(self):
 
     def lookup_variants(self, variant_ids, include_project_data=False, **kwargs):
         self._parse_intervals(intervals=None, variant_ids=variant_ids, variant_keys=variant_ids)
-        ht = self._read_table('annotations.ht', drop_globals=['paths', 'versions'])
+        ht = self._read_table('annotations.ht', drop_globals=['versions'])
         ht = ht.filter(hl.is_defined(ht[XPOS]))
 
         annotation_fields = self.annotation_fields(include_genotype_overrides=False)
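
For context, a minimal Hail sketch of what the narrower drop_globals argument amounts to, assuming _read_table simply drops the named global fields after reading (its implementation is not shown in this diff):

    import hail as hl

    # Sketch only: read the annotations table and drop just the 'versions' global,
    # leaving the 'paths' global available to downstream lookup code.
    ht = hl.read_table('annotations.ht')
    ht = ht.drop('versions')
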
21 changes: 9 additions & 12 deletions hail_search/queries/mito.py
@@ -5,7 +5,7 @@
 import logging
 import os
 
-from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_MITO_KEY, CLINVAR_LIKELY_PATH_FILTER, \
+from hail_search.constants import ABSENT_PATH_SORT_OFFSET, CLINVAR_KEY, CLINVAR_LIKELY_PATH_FILTER, \
     CLINVAR_PATH_FILTER, \
     CLINVAR_PATH_RANGES, CLINVAR_PATH_SIGNIFICANCES, ALLOWED_TRANSCRIPTS, ALLOWED_SECONDARY_TRANSCRIPTS, \
     PATHOGENICTY_SORT_KEY, CONSEQUENCE_SORT, \
@@ -14,7 +14,6 @@
 from hail_search.queries.base import BaseHailTableQuery, PredictionPath, QualityFilterFormat, MAX_PARTITIONS
 
 REFERENCE_DATASETS_DIR = os.environ.get('REFERENCE_DATASETS_DIR', '/seqr/seqr-reference-data')
-REFERENCE_DATASET_SUBDIR = 'cached_reference_dataset_queries'
 
 logger = logging.getLogger(__name__)
 
@@ -57,15 +56,14 @@ class MitoHailTableQuery(BaseHailTableQuery):
         'haplogroup_defining': PredictionPath('haplogroup', 'is_defining', lambda v: hl.or_missing(v, 'Y')),
         'hmtvar': PredictionPath('hmtvar', 'score'),
         'mitotip': PredictionPath('mitotip', 'trna_prediction'),
-        'mut_taster': PredictionPath('dbnsfp_mito', 'MutationTaster_pred'),
-        'sift': PredictionPath('dbnsfp_mito', 'SIFT_score'),
+        'mut_taster': PredictionPath('dbnsfp', 'MutationTaster_pred'),
+        'sift': PredictionPath('dbnsfp', 'SIFT_score'),
         'mlc': PredictionPath('local_constraint_mito', 'score'),
     }
 
     PATHOGENICITY_FILTERS = {
         CLINVAR_KEY: ('pathogenicity', CLINVAR_PATH_RANGES),
     }
-    PATHOGENICITY_FIELD_MAP = {CLINVAR_KEY: CLINVAR_MITO_KEY}
 
     GLOBALS = BaseHailTableQuery.GLOBALS + ['versions']
     CORE_FIELDS = BaseHailTableQuery.CORE_FIELDS + ['rsid']
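
PredictionPath itself is imported from hail_search.queries.base and its definition is not part of this diff; as an illustration only, an entry like PredictionPath('dbnsfp', 'SIFT_score') can be read as a (source annotation, subfield) pair resolved against a table row (the real class also accepts an optional formatting callable, as the haplogroup_defining entry above shows):

    from collections import namedtuple

    # Hypothetical stand-in for the real PredictionPath in hail_search.queries.base
    PredictionPath = namedtuple('PredictionPath', ['source', 'field'])

    def resolve_prediction(ht, path):
        # e.g. PredictionPath('dbnsfp', 'SIFT_score') -> ht.dbnsfp.SIFT_score
        return ht[path.source][path.field]
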
@@ -86,7 +84,7 @@ class MitoHailTableQuery(BaseHailTableQuery):
         **BaseHailTableQuery.BASE_ANNOTATION_FIELDS,
     }
     ENUM_ANNOTATION_FIELDS = {
-        CLINVAR_MITO_KEY: {
+        CLINVAR_KEY: {
             'response_key': CLINVAR_KEY,
             'include_version': True,
             'annotate_value': lambda value, enum: {
@@ -109,7 +107,7 @@ class MitoHailTableQuery(BaseHailTableQuery):
             hl.min(r.sorted_transcript_consequences.flatmap(lambda t: t.consequence_term_ids)),
             hl.min(r.selected_transcript.consequence_term_ids),
         ],
-        PATHOGENICTY_SORT_KEY: lambda r: [_clinvar_sort(CLINVAR_MITO_KEY, r)],
+        PATHOGENICTY_SORT_KEY: lambda r: [_clinvar_sort(CLINVAR_KEY, r)],
         **BaseHailTableQuery.SORTS,
     }
     SORTS[PATHOGENICTY_HGMD_SORT_KEY] = SORTS[PATHOGENICTY_SORT_KEY]
@@ -363,7 +361,7 @@ def _get_loaded_filter_ht(self, key, get_filters, **kwargs):
         if ht_filter is False:
             self._filter_hts[key] = False
         else:
-            ht = self._read_table(f'{REFERENCE_DATASET_SUBDIR}/{self.PREFILTER_TABLES[key]}')
+            ht = self._read_table(f'{self.PREFILTER_TABLES[key]}')
             if ht_filter is not True:
                 ht = ht.filter(ht_filter(ht))
             self._filter_hts[key] = ht
@@ -372,7 +370,7 @@ def _get_loaded_filter_ht(self, key, get_filters, **kwargs):
 
     @classmethod
     def _get_table_dir(cls, path):
-        if REFERENCE_DATASET_SUBDIR in path:
+        if any(prefilter_table_path in path for prefilter_table_path in cls.PREFILTER_TABLES.values()):
             return REFERENCE_DATASETS_DIR
         return super()._get_table_dir(path)
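
With the cached_reference_dataset_queries subdirectory gone, prefilter tables are now routed to the reference datasets directory by table name rather than by path prefix. A self-contained sketch of the new dispatch, using a made-up PREFILTER_TABLES mapping and fallback directory:

    # Made-up values for illustration; the real PREFILTER_TABLES is defined on the
    # query class and maps prefilter keys to table file names.
    PREFILTER_TABLES = {'high_af': 'high_af_variants.ht'}
    REFERENCE_DATASETS_DIR = '/seqr/seqr-reference-data'
    DEFAULT_TABLE_DIR = '/hail-search-data'  # hypothetical fallback

    def get_table_dir(path):
        # Any path naming a prefilter table lives under the reference datasets dir
        if any(table_name in path for table_name in PREFILTER_TABLES.values()):
            return REFERENCE_DATASETS_DIR
        return DEFAULT_TABLE_DIR

    assert get_table_dir('high_af_variants.ht') == REFERENCE_DATASETS_DIR
    assert get_table_dir('annotations.ht') == DEFAULT_TABLE_DIR
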

@@ -486,8 +484,7 @@ def _get_clinvar_path_filters(pathogenicity):
 
     def _has_path_expr(self, ht, terms, field):
         subfield, range_configs = self.PATHOGENICITY_FILTERS[field]
-        field_name = self.PATHOGENICITY_FIELD_MAP.get(field, field)
-        enum_lookup = self._get_enum_lookup(field_name, subfield)
+        enum_lookup = self._get_enum_lookup(field, subfield)
 
         ranges = [[None, None]]
         for path_filter, start, end in range_configs:
@@ -499,7 +496,7 @@ def _has_path_expr(self, ht, terms, field):
                 ranges.append([None, None])
 
         ranges = [r for r in ranges if r[0] is not None]
-        value = ht[field_name][f'{subfield}_id']
+        value = ht[field][f'{subfield}_id']
         return hl.any(lambda r: (value >= r[0]) & (value <= r[1]), ranges)
 
     def _format_results(self, ht, *args, **kwargs):
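
The membership test itself is unchanged: the variant's pathogenicity enum id is checked against the closed id ranges assembled above. A standalone toy example of the same hl.any pattern:

    import hail as hl

    # Toy values: an enum id of 3 tested against the id ranges [0, 2] and [5, 7]
    value = hl.int32(3)
    ranges = hl.literal([[0, 2], [5, 7]])
    in_range = hl.any(lambda r: (value >= r[0]) & (value <= r[1]), ranges)
    print(hl.eval(in_range))  # False: 3 falls in neither range
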
13 changes: 5 additions & 8 deletions hail_search/queries/snv_indel_37.py
@@ -1,8 +1,8 @@
 from collections import OrderedDict
 import hail as hl
 
-from hail_search.constants import CLINVAR_KEY, CLINVAR_MITO_KEY, HGMD_KEY, HGMD_PATH_RANGES, \
-    GNOMAD_GENOMES_FIELD, PREFILTER_FREQ_CUTOFF, PATH_FREQ_OVERRIDE_CUTOFF, PATHOGENICTY_SORT_KEY, PATHOGENICTY_HGMD_SORT_KEY, \
+from hail_search.constants import CLINVAR_KEY, HGMD_KEY, HGMD_PATH_RANGES, \
+    GNOMAD_GENOMES_FIELD, PREFILTER_FREQ_CUTOFF, PATH_FREQ_OVERRIDE_CUTOFF, PATHOGENICTY_HGMD_SORT_KEY, \
     SPLICE_AI_FIELD, GENOME_VERSION_GRCh37
 from hail_search.queries.base import PredictionPath, QualityFilterFormat
 from hail_search.queries.mito import MitoHailTableQuery
@@ -28,10 +28,10 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
         GNOMAD_GENOMES_FIELD: {'filter_af': 'AF_POPMAX_OR_GLOBAL', 'het': None, 'sort': 'gnomad'},
     }
     PREDICTION_FIELDS_CONFIG = {
-        'cadd': PredictionPath('cadd', 'PHRED'),
+        'cadd': PredictionPath('dbnsfp', 'CADD_phred'),
         'eigen': PredictionPath('eigen', 'Eigen_phred'),
-        'mpc': PredictionPath('mpc', 'MPC'),
-        'primate_ai': PredictionPath('primate_ai', 'score'),
+        'mpc': PredictionPath('dbnsfp', 'MPC_score'),
+        'primate_ai': PredictionPath('dbnsfp', 'PrimateAI_score'),
         SPLICE_AI_FIELD: PredictionPath(SPLICE_AI_FIELD, 'delta_score'),
         'splice_ai_consequence': PredictionPath(SPLICE_AI_FIELD, 'splice_consequence'),
         'mut_taster': PredictionPath('dbnsfp', 'MutationTaster_pred'),
@@ -43,7 +43,6 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
         **MitoHailTableQuery.PATHOGENICITY_FILTERS,
         HGMD_KEY: ('class', HGMD_PATH_RANGES),
     }
-    PATHOGENICITY_FIELD_MAP = {}
     ANNOTATION_OVERRIDE_FIELDS = [SPLICE_AI_FIELD]
 
     CORE_FIELDS = MitoHailTableQuery.CORE_FIELDS + ['CAID']
@@ -60,11 +59,9 @@ class SnvIndelHailTableQuery37(MitoHailTableQuery):
             'format_value': lambda value: value.region_types.first(),
         },
     }
-    ENUM_ANNOTATION_FIELDS[CLINVAR_KEY] = ENUM_ANNOTATION_FIELDS.pop(CLINVAR_MITO_KEY)
 
     SORTS = {
         **MitoHailTableQuery.SORTS,
-        PATHOGENICTY_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r)],
         PATHOGENICTY_HGMD_SORT_KEY: lambda r: [MitoHailTableQuery.CLINVAR_SORT(CLINVAR_KEY, r), r.hgmd.class_id],
     }

@@ -14,6 +14,7 @@
 from reference_data.management.commands.update_gencc import GenCCReferenceDataHandler
 from reference_data.management.commands.update_clingen import ClinGenReferenceDataHandler
 from reference_data.management.commands.update_refseq import RefseqReferenceDataHandler
+from reference_data.models import GeneInfo
 
 
 logger = logging.getLogger(__name__)
@@ -52,6 +53,9 @@ def handle(self, *args, **options):
         update_failed = []
 
         if not options["skip_gencode"]:
+            if GeneInfo.objects.count() > 0:
+                logger.info('Skipping update_all_reference_data because GeneInfo is already loaded')
+                return
             # Download latest version first, and then add any genes from old releases not included in the latest release
             # Old gene ids are used in the gene constraint table and other datasets, as well as older sequencing data
             update_gencode(LATEST_GENCODE_RELEASE, reset=True)
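
A possible refinement, noted here as a suggestion rather than part of the commit: when only presence matters, Django's exists() lets the database stop at the first matching row instead of counting them all, so an equivalent guard could be:

    # Suggested equivalent of the guard above (not in the commit)
    if GeneInfo.objects.exists():
        logger.info('Skipping update_all_reference_data because GeneInfo is already loaded')
        return
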
@@ -11,6 +11,7 @@
 from reference_data.management.commands.update_gencc import GenCCReferenceDataHandler
 from reference_data.management.commands.update_clingen import ClinGenReferenceDataHandler
 from reference_data.management.commands.update_refseq import RefseqReferenceDataHandler
+from reference_data.models import GeneInfo
 
 
 def omim_exception(omim_key):
@@ -78,7 +79,15 @@ def test_update_all_reference_data_command(self):
             call_command('update_all_reference_data')
         self.assertEqual(str(err.exception), 'Error: one of the arguments --omim-key --use-cached-omim --skip-omim is required')
 
+        # Test update is skipped when data is already loaded
+        self.mock_update_gencode.assert_not_called()
+        self.mock_omim.assert_not_called()
+        self.mock_cached_omim.assert_not_called()
+        self.mock_update_records.assert_not_called()
+        self.mock_update_hpo.assert_not_called()
+
         # Test update all gencode, no skips, fail primate_ai and mgi
+        GeneInfo.objects.all().delete()
         call_command('update_all_reference_data', '--omim-key=test_key')
 
         calls = [
13 changes: 12 additions & 1 deletion seqr/utils/search/add_data_utils.py
@@ -6,11 +6,13 @@
 from seqr.models import Sample, Individual, Project
 from seqr.utils.communication_utils import send_project_notification, safe_post_to_slack
 from seqr.utils.logging_utils import SeqrLogger
+from seqr.utils.middleware import ErrorsWarningsException
 from seqr.utils.search.utils import backend_specific_call
 from seqr.utils.search.elasticsearch.es_utils import validate_es_index_metadata_and_get_samples
 from seqr.views.utils.airtable_utils import AirtableSession, ANVIL_REQUEST_TRACKING_TABLE
 from seqr.views.utils.dataset_utils import match_and_update_search_samples, load_mapping_file
 from seqr.views.utils.export_utils import write_multiple_files
+from seqr.views.utils.pedigree_info_utils import get_no_affected_families
 from settings import SEQR_SLACK_DATA_ALERTS_NOTIFICATION_CHANNEL, BASE_URL, ANVIL_UI_URL, \
     SEQR_SLACK_ANVIL_DATA_LOADING_CHANNEL
@@ -144,14 +146,23 @@ def _upload_data_loading_files(projects: list[Project], user: User, file_path: s
         'Individual_ID': F('individual_id'),
         'Paternal_ID': F('father__individual_id'), 'Maternal_ID': F('mother__individual_id'), 'Sex': F('sex'),
     })
-    annotations = {'project': F('family__project__guid'), **file_annotations}
+    annotations = {'project': F('family__project__guid'), 'affected_status': F('affected'), **file_annotations}
     individual_filter = {'id__in': individual_ids} if individual_ids else {'family__project__in': projects}
     data = Individual.objects.filter(**individual_filter).order_by('family_id', 'individual_id').values(
         **dict(annotations))
 
     data_by_project = defaultdict(list)
+    affected_by_family = defaultdict(list)
     for row in data:
         data_by_project[row.pop('project')].append(row)
+        affected_by_family[row['Family_GUID']].append(row.pop('affected_status'))
+
+    no_affected_families = get_no_affected_families(affected_by_family)
+    if no_affected_families:
+        families = ', '.join(sorted(no_affected_families))
+        raise ErrorsWarningsException(errors=[
+            f'The following families have no affected individuals and can not be loaded to seqr: {families}',
+        ])
 
     header = list(file_annotations.keys())
     files = [(f'{project_guid}_pedigree', header, rows) for project_guid, rows in data_by_project.items()]
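
get_no_affected_families is imported from seqr.views.utils.pedigree_info_utils and its body is not part of this diff; judging from the inline check it replaces in anvil_workspace_api.py below, it plausibly looks like this sketch:

    from seqr.models import Individual

    # Hypothetical sketch; the real helper lives in seqr.views.utils.pedigree_info_utils
    def get_no_affected_families(affected_by_family):
        return {
            family_guid for family_guid, statuses in affected_by_family.items()
            if Individual.AFFECTED_STATUS_AFFECTED not in statuses
        }
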
23 changes: 6 additions & 17 deletions seqr/views/apis/anvil_workspace_api.py
@@ -188,7 +188,7 @@ def create_project_from_workspace(request, namespace, name):
         error = 'Field(s) "{}" are required'.format(', '.join(missing_fields))
         return create_json_response({'error': error}, status=400, reason=error)
 
-    pedigree_records, _ = _parse_uploaded_pedigree(request_json)
+    pedigree_records = _parse_uploaded_pedigree(request_json)
 
     # Create a new Project in seqr
     project_args = {
@@ -229,7 +229,7 @@ def add_workspace_data(request, project_guid):
         error = 'Field(s) "{}" are required'.format(', '.join(missing_fields))
         return create_json_response({'error': error}, status=400, reason=error)
 
-    pedigree_records, records_by_family = _parse_uploaded_pedigree(request_json, project=project)
+    pedigree_records = _parse_uploaded_pedigree(request_json, project=project)
 
     previous_samples = get_search_samples([project]).filter(dataset_type=Sample.DATASET_TYPE_VARIANT_CALLS)
     sample = previous_samples.first()
@@ -239,8 +239,9 @@ def add_workspace_data(request, project_guid):
         }, status=400)
     sample_type = sample.sample_type
 
+    families = {record[JsonConstants.FAMILY_ID_COLUMN] for record in pedigree_records}
     previous_loaded_individuals = previous_samples.filter(
-        individual__family__family_id__in=records_by_family,
+        individual__family__family_id__in=families,
    ).values_list('individual_id', 'individual__individual_id', 'individual__family__family_id')
     missing_samples_by_family = defaultdict(list)
     for _, individual_id, family_id in previous_loaded_individuals:
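
The family ids are now derived directly from the parsed pedigree records. A toy illustration of the deduplicating set comprehension, assuming JsonConstants.FAMILY_ID_COLUMN is the string 'familyId':

    # Toy records; the real records are parsed pedigree rows
    records = [{'familyId': 'F1'}, {'familyId': 'F1'}, {'familyId': 'F2'}]
    families = {record['familyId'] for record in records}
    assert families == {'F1', 'F2'}
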
@@ -279,22 +280,10 @@ def _parse_uploaded_pedigree(request_json, project=None):
         errors.append('The following samples are included in the pedigree file but are missing from the VCF: {}'.format(
             ', '.join(missing_samples)))
 
-    records_by_family = defaultdict(list)
-    for record in pedigree_records:
-        records_by_family[record[JsonConstants.FAMILY_ID_COLUMN]].append(record)
-
-    no_affected_families = [
-        family_id for family_id, records in records_by_family.items()
-        if not any(record[JsonConstants.AFFECTED_COLUMN] == Individual.AFFECTED_STATUS_AFFECTED for record in records)
-    ]
-
-    if no_affected_families:
-        errors.append('The following families do not have any affected individuals: {}'.format(', '.join(no_affected_families)))
-
     if errors:
         raise ErrorsWarningsException(errors, [])
 
-    return pedigree_records, records_by_family
+    return pedigree_records


def _trigger_add_workspace_data(project, pedigree_records, user, data_path, sample_type, previous_loaded_ids=None, get_pedigree_json=False):
@@ -331,7 +320,7 @@ def _trigger_add_workspace_data(project, pedigree_records, user, data_path, samp
     try:
         email_body = f"""Hi {user.get_full_name() or user.email},
 We have received your request to load data to seqr from AnVIL. Currently, the Broad Institute is holding an
-internal retreat or closed for the winter break so we are unable to load data until mid-January
+internal retreat or closed for the winter break so we may not be able to load data until mid-January
 {loading_warning_date.year + 1}. We appreciate your understanding and support of our research team taking
 some well-deserved time off and hope you also have a nice break.
 - The seqr team