question about sequin concentration #72

Open
sparthib opened this issue Nov 19, 2024 · 3 comments

@sparthib

sparthib commented Nov 19, 2024

Hi @cying111, thank you so much for putting this dataset together. I am interested in comparing different quantification methods using this dataset, and I am specifically interested in this sample:

SGNex_Hct116_directcDNA_replicate3_run2, which according to the sample spreadsheet contains RNA sequin Mix A at this concentration: 1% RNA sequin Mix A v1.0 @3ng

Could you help break down what the 1% and 3ng mean in this case?

In the RNA Mix A spreadsheet, under version 1, there are, for example, these four quantities:

| Sequin | Mix A (version 1) |
| --- | --- |
| R1_11_1 | 161.132813 |
| R1_11_2 | 80.5664063 |
| R1_12_1 | 1.77720014 |
| R1_12_2 | 28.4352022 |

Does 1% mean that 1% of each of these values is the number of corresponding reads found in the FASTQ file? I'm also still confused about what 3ng refers to. I hope you can help clarify.

Thank you,

Sowmya

@cying111
Collaborator

Hi @sparthib,

We are glad to know that you find this resource helpful! I am very sorry for the late reply.

For the spike-in concentration, 1% means that the spike-in makes up 1% of the total RNA, so you can expect about 1% of the total sequencing reads for that sample to be spike-in reads. 3ng is simply the amount of spike-in RNA, so it should be 1% of the total mRNA amount for the sample.
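To make the arithmetic concrete, here is a minimal sketch. It assumes the 1% is a fraction of RNA mass and that reads are sampled roughly in proportion to it; the function names and the one-million-read library size are made up for illustration and do not come from the SG-NEx metadata.

```python
def total_rna_ng(spike_in_ng: float, spike_in_fraction: float) -> float:
    """Total RNA mass implied by a spike-in mass and its fraction of the total."""
    if not 0 < spike_in_fraction <= 1:
        raise ValueError("spike_in_fraction must be in (0, 1]")
    return spike_in_ng / spike_in_fraction


def expected_spike_in_reads(total_reads: int, spike_in_fraction: float) -> float:
    """Rough expectation only: assumes reads are sampled in proportion to the fraction."""
    return total_reads * spike_in_fraction


# 3 ng of spike-in at 1% implies about 300 ng of RNA in total.
implied_total_ng = round(total_rna_ng(3.0, 0.01), 6)

# Hypothetical library of 1,000,000 reads: about 10,000 would be spike-in reads.
implied_spike_reads = round(expected_spike_in_reads(1_000_000, 0.01), 6)

print(implied_total_ng, implied_spike_reads)
```

In practice the observed spike-in fraction can differ from 1%, so this is an expectation to compare against rather than a guarantee.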

Hope this clarifies your question!

Thank you
Warm regards,
Ying

@sparthib
Author

sparthib commented Dec 5, 2024

Thanks for your response @cying111! Do you have suggestions on how to go about finding the true counts or CPM of the spike-ins or SIRVs in each of the samples?

I came across the SIRV-1 concentration calculator on the main README, and I am not sure I am using it correctly. It would be great if there were a pre-existing table with the true counts for the spiked samples listed here.
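For reference, one way to turn known mix concentrations into expected CPM values can be sketched as follows. This is an assumption-laden illustration, not the SG-NEx or SIRV calculator method: it assumes read counts scale with the listed concentration, that spike-ins make up the stated 1% of reads, and it normalises over only the four Mix A (version 1) rows quoted earlier in this thread. A real calculation would normalise over every sequin in the mix. The function name `expected_cpm` is hypothetical.

```python
# Concentrations copied from the four Mix A (version 1) rows quoted in this thread.
# The full mix contains more sequins; these four are used purely for illustration.
MIX_A_V1 = {
    "R1_11_1": 161.132813,
    "R1_11_2": 80.5664063,
    "R1_12_1": 1.77720014,
    "R1_12_2": 28.4352022,
}


def expected_cpm(concentrations: dict, spike_in_fraction: float = 0.01) -> dict:
    """Expected counts per million for each spike-in.

    Assumption: reads are proportional to the listed concentration, and the
    whole spike-in pool receives `spike_in_fraction` of all reads.
    """
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("concentrations must sum to a positive value")
    pool_cpm = 1e6 * spike_in_fraction  # CPM shared by the whole spike-in pool
    return {name: pool_cpm * conc / total for name, conc in concentrations.items()}


for name, value in expected_cpm(MIX_A_V1).items():
    print(f"{name}\t{value:.2f}")
```

Whether concentration should additionally be weighted by transcript length depends on the protocol (fragmented short reads versus full-length long reads), so that choice is left out of the sketch.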

Thanks,
Sowmya

@sparthib
Author

sparthib commented Dec 30, 2024

Hi @cying111, as a follow-up: I am going through the transcriptome-aligned BAM files, and I expected only 1% of the reads to be spike-ins. For example, I expected most of the transcripts that this sample, SGNex_Hct116_cDNA_replicate3_run3, aligns to to have Ensembl IDs, but it turns out they are all spike-ins? I'm not sure if I am misinterpreting this. Also, I see that a lot of the alignments here are secondary/supplementary. How would you suggest I go about calculating the CPM and comparing it against the known concentrations?
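One common way to handle the secondary/supplementary alignments is to count each read once, using only its primary alignment, and compute CPM from those counts. The sketch below is an assumption about a reasonable approach, not the maintainers' recommendation. It reads SAM-formatted text lines (for example, the output of `samtools view -h`) and uses the standard SAM flag bits 0x4 (unmapped), 0x100 (secondary), and 0x800 (supplementary). The example records and the reference name `ENST_EXAMPLE` are made up.

```python
from collections import Counter

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
SKIP = UNMAPPED | SECONDARY | SUPPLEMENTARY  # same filter as samtools view -F 0x904


def primary_counts(sam_lines):
    """Count primary, mapped alignments per reference (transcript) name."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & SKIP:
            continue
        counts[fields[2]] += 1
    return counts


def cpm(counts):
    """Counts per million, normalised by the total number of counted reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ref: 1e6 * n / total for ref, n in counts.items()}


# Made-up records: one primary spike-in hit, its secondary copy, one primary
# hit to a hypothetical Ensembl transcript, one supplementary, one unmapped.
example = [
    "@SQ\tSN:R1_11_1\tLN:1000",
    "read1\t0\tR1_11_1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read1\t256\tR1_11_2\t1\t0\t10M\t*\t0\t0\t*\t*",
    "read2\t16\tENST_EXAMPLE\t5\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t2048\tR1_11_1\t50\t60\t5M5H\t*\t0\t0\tACGTA\t*",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]

counts = primary_counts(example)
print(dict(counts))
print(cpm(counts))
```

Counting reference names among the primary alignments is also a quick way to check what fraction of reads actually map to spike-ins rather than to Ensembl transcripts.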

Additionally, I am trying to obtain the length of the transcript each of these alignments originates from. Should I calculate that from the length of the sequences in the transcriptome FASTA, or obtain it directly from the GTF file? (I'm assuming the length in the GTF file includes intronic regions?)
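On the length question: in a GTF, the `transcript` row spans the genomic region including introns, while the `exon` rows for the same `transcript_id` can be summed to get the spliced length. GTF coordinates are 1-based and inclusive, so each exon contributes `end - start + 1`. That sum should normally match the sequence length in a transcriptome FASTA built from the same annotation. A minimal sketch, with made-up records and hypothetical function names:

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def spliced_lengths(gtf_lines):
    """Sum exon lengths per transcript_id (introns excluded)."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        lengths[match.group(1)] += end - start + 1  # 1-based, inclusive
    return dict(lengths)


def fasta_lengths(fasta_lines):
    """Sequence length per record, keyed by the first token of the header."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Made-up transcript: genomic span 1..1000, two exons of 100 bp each.
gtf = [
    'chrX\texample\ttranscript\t1\t1000\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chrX\texample\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chrX\texample\texon\t901\t1000\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
]
fasta = [">tx1 made-up record", "A" * 120, "C" * 80]

print(spliced_lengths(gtf))
print(fasta_lengths(fasta))
```

Either source works as long as the FASTA and GTF describe the same transcript models; the FASTA route is usually simpler because it needs no exon bookkeeping.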

Thanks!
