question about sequin concentration #72

sparthib · 2024-11-19T18:57:56Z

Hi @cying111, thank you so much for putting this dataset together. I am interested in comparing different quantification methods using this dataset, and I am specifically interested in this sample:

SGNex_Hct116_directcDNA_replicate3_run2, which, according to the sample spreadsheet, has RNA sequin Mix A at this concentration: 1% RNA sequin Mix A v1.0 @3ng

I was wondering if you could help break down what the 1% and 3 ng mean in this case.

In the RNA Mix A spreadsheet, under version 1, there are, for example, these four quantities:

| Sequin | Mix A (version 1) |
|---|---|
| R1_11_1 | 161.132813 |
| R1_11_2 | 80.5664063 |
| R1_12_1 | 1.77720014 |
| R1_12_2 | 28.4352022 |

Does 1% mean that 1% of these values correspond to the number of reads found in the FASTQ file? I'm still confused about what 3 ng refers to. I hope you can help clarify.

Thank you,

Sowmya


cying111 · 2024-11-29T06:50:34Z

Hi @sparthib,

We are glad to know that you find this resource helpful! And I am very sorry for the late reply.

For the spike-in concentration, 1% means that the spike-in makes up 1% of the total RNA, so you can interpret it as: of the total sequencing reads for that sample, 1% are expected to be spike-in reads. 3 ng is just the amount of spike-in RNA, so it should be 1% of the total mRNA amount for the sample.
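The arithmetic in this reply can be sketched as below. This is a minimal illustration, assuming the spike-in's share of input RNA carries over directly to its share of reads (no length, capture or mapping bias). The function names are hypothetical, and the 1,000,000-read total is a made-up example, not a value from the dataset.

```python
# Sketch of the "1% @3ng" arithmetic; assumes reads scale with input RNA amount.

SPIKE_IN_FRACTION = 0.01  # "1%" from the sample spreadsheet
SPIKE_IN_MASS_NG = 3.0    # "@3ng" from the sample spreadsheet


def implied_total_rna_ng(spike_mass_ng: float, fraction: float) -> float:
    """Total input RNA implied when `spike_mass_ng` makes up `fraction` of it."""
    return spike_mass_ng / fraction


def expected_spike_in_reads(total_reads: int, fraction: float) -> float:
    """Reads expected from the spike-in if its read share equals its input share."""
    return total_reads * fraction


# 3 ng at 1% implies roughly 300 ng of total input RNA.
print(implied_total_rna_ng(SPIKE_IN_MASS_NG, SPIKE_IN_FRACTION))
# Hypothetical library of 1,000,000 reads: about 10,000 spike-in reads expected.
print(expected_spike_in_reads(1_000_000, SPIKE_IN_FRACTION))
```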

Hope this clarifies your question!

Thank you
Warm regards,
Ying

sparthib · 2024-12-05T16:28:40Z

Thanks for your response @cying111! Do you have suggestions on how to go about finding the true counts or CPM of the spike-ins or SIRVs in each of the samples?

I came across the SIRV-1 concentration calculator in the main README, and I am not sure I am using it correctly. It would be great if there were a pre-existing table with the true counts for the spiked samples listed here.
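One way to turn the Mix A table into expected CPM values is to combine it with the 1% spike-in fraction from the earlier reply. This is only a sketch under stated assumptions: it treats each spike-in read as coming from one molecule, so a sequin's share of spike-in reads equals its share of the summed Mix A concentrations. That is roughly plausible for long-read cDNA data but would need a length correction for short reads. `expected_cpm` is a hypothetical helper, not part of any SG-NEx tooling, and the four values are only the ones quoted in this thread.

```python
from typing import Dict


def expected_cpm(mix_concentrations: Dict[str, float],
                 spike_in_fraction: float) -> Dict[str, float]:
    """Expected counts per million (CPM) for each sequin.

    Assumption: each sequin's share of spike-in reads equals its share of the
    summed mix concentrations, and spike-ins make up `spike_in_fraction` of all
    reads. The concentration units cancel out, so only relative values matter.
    """
    total = sum(mix_concentrations.values())
    spike_cpm = spike_in_fraction * 1_000_000  # CPM shared by all spike-ins together
    return {name: spike_cpm * conc / total
            for name, conc in mix_concentrations.items()}


# Only the four Mix A (version 1) values quoted in this thread. A real analysis
# would load every sequin in the mix; otherwise the normalisation is wrong.
mix_a_subset = {
    "R1_11_1": 161.132813,
    "R1_11_2": 80.5664063,
    "R1_12_1": 1.77720014,
    "R1_12_2": 28.4352022,
}

for name, value in expected_cpm(mix_a_subset, 0.01).items():
    print(f"{name}\t{value:.1f}")
```

Observed CPM per sequin could then be compared against these expected values, for example on a log scale.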

Thanks,
Sowmya

sparthib · 2024-12-30T00:02:39Z

Hi @cying111, as a follow-up: I am going through the transcriptome-aligned BAM files, and I expected only 1% of the reads to be spike-ins. For example, I expected that in sample SGNex_Hct116_cDNA_replicate3_run3 most of the transcripts the reads align to would have Ensembl IDs, but it turns out they are all spike-ins. I'm not sure whether I am misinterpreting this. Also, a lot of the alignments here are secondary/supplementary. How would you suggest I go about calculating the CPM and comparing it against the known concentrations?
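One common way to handle secondary and supplementary records is to count only each read's primary alignment. The sketch below does that using the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary), turns the counts into CPM over primary mapped reads, and reports what fraction of those reads land on spike-in references. This is a simple illustration, not the method used by the dataset authors. The spike-in name prefixes (`R1_`, `R2_`, `SIRV`) are a hypothetical assumption to adjust to the reference actually used, and using primary mapped reads as the CPM denominator is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# SAM FLAG bits from the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def primary_counts(alignments: Iterable[Tuple[Optional[str], int]]) -> Counter:
    """Count one record per read: skip unmapped, secondary and supplementary records."""
    skip = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    counts: Counter = Counter()
    for reference_name, flag in alignments:
        if flag & skip:
            continue
        counts[reference_name] += 1
    return counts


def cpm(counts: Counter) -> Dict[str, float]:
    """Counts per million, using primary mapped reads as the denominator."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ref: n / total * 1_000_000 for ref, n in counts.items()}


def spike_in_fraction(counts: Counter,
                      prefixes: Tuple[str, ...] = ("R1_", "R2_", "SIRV")) -> float:
    """Fraction of primary reads on references whose names start with `prefixes`.

    The default prefixes are a hypothetical choice for sequin/SIRV names.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    spike = sum(n for ref, n in counts.items() if ref.startswith(prefixes))
    return spike / total


# Toy records: (reference_name, FLAG). Flag 16 (reverse strand) is still primary.
records = [
    ("R1_11_1", 0),
    ("R1_11_1", FLAG_SECONDARY),
    ("ENST00000000001", 16),
    ("R1_11_1", FLAG_SUPPLEMENTARY),
    (None, FLAG_UNMAPPED),
]
counts = primary_counts(records)
print(counts)
print(cpm(counts))
print(spike_in_fraction(counts))
```

With pysam installed, the `(reference_name, flag)` pairs could come from iterating over `pysam.AlignmentFile(path, "rb")` and taking each record's `reference_name` and `flag`. Counting only primary alignments assigns each multi-mapping read to a single transcript; quantifiers that distribute such reads probabilistically will give different numbers.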

Additionally, I am trying to obtain the length of the transcripts these reads originate from. Should I calculate it from the length of the sequences in the transcriptome FASTA, or obtain it directly from the GTF file? (I'm assuming the length in the GTF file includes intronic regions?)
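For the spliced transcript length, one option is to sum the exon lengths per `transcript_id` in the GTF. A `transcript` row's start-end span does include introns, so it is not the spliced length. The sketch below assumes a standard tab-separated GTF with `transcript_id "..."` attributes, and FASTA headers whose first whitespace-separated token is the transcript ID (GENCODE-style `|`-delimited headers would need extra parsing). Both helpers are hypothetical. For a FASTA generated from the same annotation, the two lengths would be expected to agree.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_lengths_from_gtf(lines: Iterable[str]) -> Dict[str, int]:
    """Spliced length = sum of exon lengths (GTF coordinates are 1-based, inclusive)."""
    lengths: Dict[str, int] = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip gene/transcript/CDS rows; only exons count
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        lengths[match.group(1)] += end - start + 1
    return dict(lengths)


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Sequence length per record; the ID is the first token after '>'."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy transcript: two 100-bp exons separated by a 100-bp intron.
gtf_example = [
    'chr1\tsrc\ttranscript\t1\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t201\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(transcript_lengths_from_gtf(gtf_example))  # spliced length, not the 300-bp span
```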

Thanks!
