Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

show RNAseq type in sample hover #4171

Merged
merged 16 commits into from
Jul 1, 2024
Merged

show RNAseq type in sample hover #4171

merged 16 commits into from
Jul 1, 2024

Conversation

jklugherz
Copy link
Contributor

Screenshot 2024-06-17 at 3 29 57 PM

@jklugherz jklugherz marked this pull request as ready for review June 21, 2024 18:32
@jklugherz jklugherz requested review from bpblanken and hanars June 21, 2024 18:32
@@ -43,7 +43,20 @@ def family_page_data(request, family_guid):
has_case_review_perm = has_case_review_permissions(project, request.user)

sample_models = Sample.objects.filter(individual__family=family)
samples = get_json_for_samples(sample_models, project_guid=project.guid, family_guid=family_guid, skip_nested=True, is_analyst=is_analyst)
additional_values = {
'rnaSeqTpm': Case(When(Exists(RnaSeqTpm.objects.filter(sample_id=OuterRef('pk'))), then=Value('TPM')), default=None),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you should be able to do this using the related field on the sample model without having to write query objects with outer refs - something like When(rnaseqtpm__isnull=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had the __isnull syntax but the query was timing out with more than 1 rnaseq type. I could add indexes on the foreign keys to the rnaseq tables on the sample model?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be indices on the foreign keys already, so that sounds like a case where what django is doing is not what we want. You could try using the debug toolbar to see the actual executed SQL and figure out why its so inefficient and adjust it, but honestly that feels like overkill and its probably better not to try to use those fields in this case

That said, if you are going to make 3 distinct queries to each of the subtypes and then iterate through all the sample models and add annotations to them, using the additional_values does not really save us any compute and just makes the whole thing less readable in the end. In that case, you are better off with getting lists of sample Ids for each type and then adding to rnaSeqTypes based on that, something like

rna_type_samples = {
    'TPM': set(RnaSeqTpm.objects.filter(sample__in= sample_models).values_list('sample', flat=True),
    'Expression Outlier': set(RnaSeqOutlier.objects.filter(...
    'Splice Outlier': set(RnaSeqSpliceOutlier.objects.filter(...
}
...
sample['rnaSeqTypes'] = [type for type, sample_ids in rna_type_samples.items() if sample['id'] in sample_ids]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it this way ^ which does not require changes to the JS code as you suggested here.

Is there a general rule for what belongs in additional_values and when it's better to query related tables separately?

@jklugherz jklugherz requested a review from hanars June 27, 2024 15:38
@@ -169,7 +169,7 @@ const DataDetails = React.memo(({ loadedSamples, individual, mmeSubmission }) =>
/>
) : <MmeStatusLabel title="Submitted to MME" dateField="lastModifiedDate" color="violet" individual={individual} mmeSubmission={mmeSubmission} />
)}
{individual.hasRnaOutlierData && (
{loadedSamples.some(sample => sample.rnaSeqTypes?.length > 0) && (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasRnaOutlierData was based on whether there was rnaseqoutlier or rnaseqspliceoutlier data, for samples with only tpm data this would have been false, so we should update the behavior here to reflect that. Also it only considered active samples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted this condition check to match the behavior to show the link only if the sample is active and has either rnaseqoutlier or rnaseqspliceoutlier data

`data was${isOutdated ? ' previously ' : ''} ${hoverDetails ? `${hoverDetails} on ${new Date(loadedSample.loadedDate).toLocaleDateString()}` : 'loaded'}` :
'no data available'}
<VerticalSpacer height={5} />
{loadedSample.sampleType && loadedSample.sampleType === SAMPLE_TYPE_RNA &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rnaSeqTypes will only have values for RNA samples so you can remove this part of the condition check

Copy link
Contributor Author

@jklugherz jklugherz Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this part of the check some old(?) rna samples which do not have any matches in the 3 rnaseqtype tables will display like this:
image

There are 76 of these samples all created on 2016-12-05

WITH sample_rnaseq as (
    select s.guid, s.id, s.created_date,
           seqr_rnaseqtpm.id as tpm_id, seqr_rnaseqoutlier.id as outlier_id, sr.id as splice_outlier_id
    from seqr_sample s
    left join seqr_rnaseqtpm on s.id = seqr_rnaseqtpm.sample_id
    left join seqr_rnaseqoutlier on s.id = seqr_rnaseqoutlier.sample_id
    left join public.seqr_rnaseqspliceoutlier sr on s.id = sr.sample_id
    where s.sample_type in ('RNA')
)
SELECT * FROM sample_rnaseq
WHERE tpm_id is null
and outlier_id is null
and splice_outlier_id is null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're all probably from the R0246_rna_seq_bc project, I've been using that for testing since I was certain it would have rna samples given the name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh we should only be showing this for active samples, not inactive sample

@jklugherz jklugherz requested a review from hanars June 27, 2024 21:26
@jklugherz jklugherz merged commit 0492c04 into dev Jul 1, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants