Different ways of computing the fitness of an ORF from the normalized reads #25
Using this normalization, do you take into account that the number of insertions is not constant across chromosomes? And also the differences in the number of reads between different SATAY experiments?
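A minimal sketch of the two corrections this question raises. It rests on assumptions that do not come from the analysis scripts. First, reads are scaled to reads per million (RPM) to remove differences in sequencing depth between experiments. Second, that value is divided by the relative insertion density of the gene's chromosome, meaning insertions per bp on that chromosome divided by the genome-wide value. The function name `normalize_reads` and its dictionary inputs are hypothetical.

```python
def normalize_reads(gene_reads, gene_chrom, total_reads, chrom_insertions, chrom_lengths):
    """Hypothetical normalization of per-gene reads for one SATAY experiment.

    gene_reads:       {gene: read count}
    gene_chrom:       {gene: chromosome name}
    total_reads:      total mapped reads in this experiment
    chrom_insertions: {chromosome: number of insertions}
    chrom_lengths:    {chromosome: length in bp}
    """
    # Genome-wide insertion density (insertions per bp) used as the reference.
    genome_density = sum(chrom_insertions.values()) / sum(chrom_lengths.values())
    normalized = {}
    for gene, reads in gene_reads.items():
        chrom = gene_chrom[gene]
        if chrom_insertions[chrom] == 0:
            continue  # no insertions on this chromosome: density undefined, skip
        # 1) Reads per million: corrects for different sequencing depth.
        rpm = reads * 1e6 / total_reads
        # 2) Relative density > 1 means this chromosome received more
        #    insertions than the genome average, so its genes are scaled down.
        rel_density = (chrom_insertions[chrom] / chrom_lengths[chrom]) / genome_density
        normalized[gene] = rpm / rel_density
    return normalized
```

For example, two genes with 100 reads each, on chromosomes with 200 and 100 insertions per 1000 bp, end up at 75 and 150 respectively, because the first chromosome is twice as densely transposed as the second.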
In this case I at least get that the fitness of dbem1 is lower than that of dbem3, which is what I naively expect. When I try to predict the genes most correlated with dpl1, to get its positive and negative interactors, I get a list in which the gene names with a "-new" label are predicted interactors that do not match the current annotations in SGD; the ones without the label match current knowledge. The script can be found here: https://github.com/Gregory94/LaanLab-SATAY-DataAnalysis/blob/dev_Leila/Python_scripts/corrrelation-based-networks/fitness-WT-mutant-from-reads.py
I simply took the values from your 'reads_per_bp_80%' column for each gene and divided them by the value for HO, as a first try. I opened this issue to discuss other possible ideas to implement.
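The first try described here can be sketched as follows. Each gene's 'reads_per_bp_80%' value is divided by the value at the HO locus, which is treated as a neutral reference, so 1.0 means "as fit as an HO insertion". The dictionary input and the key `"HO"` are assumptions about how the column would be loaded.

```python
def fitness_relative_to_ho(reads_per_bp, ho_key="HO"):
    """Express each gene's reads-per-bp value relative to the HO locus.

    reads_per_bp: {gene: value from the 'reads_per_bp_80%' column}
    ho_key:       key of the HO locus in that mapping (assumed name)
    """
    ho = reads_per_bp[ho_key]
    if ho <= 0:
        raise ValueError("HO reference value must be positive")
    # HO is assumed neutral, so its own relative fitness becomes 1.0.
    return {gene: value / ho for gene, value in reads_per_bp.items()}
```

Usage: `fitness_relative_to_ho({"HO": 2.0, "BEM1": 0.5, "BEM3": 1.5})` gives `{"HO": 1.0, "BEM1": 0.25, "BEM3": 0.75}`.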
Liedewij suggested, in the meeting with Werner, using an exponentially decaying weighted average over the non-coding regions within 10 kb around a gene. This would give more weight to non-coding regions close to the gene, as they are more likely to be affected by the same low-frequency effects than the non-coding regions further away.
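A minimal sketch of the suggested weighting. Each non-coding bin within 10 kb of the gene gets weight exp(-d / λ), where d is its distance from the gene. The decay length λ = 2000 bp and the (distance, value) input layout are assumptions that would need tuning; the function name is hypothetical.

```python
import math


def weighted_noncoding_background(bins, decay_length=2000.0, window=10_000):
    """Exponentially weighted average of non-coding signal around a gene.

    bins:         iterable of (distance_bp, value) for non-coding bins, where
                  value could be reads per bp in that bin (hypothetical layout)
    decay_length: e-folding distance in bp (assumed value)
    window:       only bins within this distance are used (10 kb)
    """
    num = 0.0
    den = 0.0
    for distance, value in bins:
        if distance > window:
            continue  # outside the 10 kb neighbourhood
        w = math.exp(-distance / decay_length)  # closer bins weigh more
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no non-coding bins within the window")
    return num / den
```

The gene's own reads per bp could then be divided by this local background instead of by a single genome-wide reference; whether that is the better choice is exactly what this issue is meant to discuss.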
Notes from Werner on calculating the fitness from insertion/read data: suppose (to a first approximation) that every possible transposon insertion (labelled here a, b, c, d, …, n) has the same initial frequency and is found in N_0 cells in the population. Each cell with that insertion then grows according to its fitness ω_n, commonly defined as inversely proportional to its doubling time τ_n. You observe N(n,T), you control T, and you will have to estimate N_0 based on the known fitness of insertions in non-coding regions or at the HO locus.
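Written out, this model says N(n,T) = N_0 · 2^(T/τ_n). Hence T/τ_n = log2(N(n,T)/N_0), and because ω is inversely proportional to the doubling time, ω_n/ω_ref = τ_ref/τ_n. The sketch below first estimates N_0 from a reference insertion whose doubling time τ_ref is assumed known, then returns fitness relative to that reference. It assumes N(n,T) is already a cell abundance rather than a read count (see the next comment). The function names are hypothetical.

```python
import math


def estimate_n0(n_ref_T, T, tau_ref):
    """N_0 from a reference insertion (e.g. HO or non-coding) with assumed
    known doubling time tau_ref, using N_ref(T) = N_0 * 2**(T / tau_ref)."""
    return n_ref_T / 2 ** (T / tau_ref)


def relative_fitness(n_T, n0, T, tau_ref):
    """omega_n / omega_ref for an insertion with abundance n_T at time T.

    From N(n,T) = N_0 * 2**(T / tau_n): T / tau_n = log2(N(n,T) / N_0).
    Since omega ~ 1 / tau, omega_n / omega_ref = tau_ref / tau_n.
    """
    if n_T <= 0:
        raise ValueError("abundance must be positive to take the logarithm")
    generations = math.log2(n_T / n0)  # number of doublings, T / tau_n
    return generations * tau_ref / T    # (T / tau_n) / (T / tau_ref)
```

For example, with T = 10 and τ_ref = 2 the reference undergoes 5 doublings, so an observed reference abundance of 320 gives N_0 = 10. An insertion that reaches 10 · 2^2.5 cells then has half the reference fitness. Insertions with N(n,T) < N_0 come out negative, which this simple model does not interpret.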
I guess the question about the negative binomial has to do with the fact that the abundance of cells with insertion n at t = T, N(n,T), is what we want to know, but what we observe is the number of reads for insertion n, which cannot be directly coupled to the number of cells carrying that insertion?
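One way to make this concrete, offered as an assumption rather than as the model from any paper cited here: treat the read count of insertion n as negative-binomially distributed around a mean proportional to N(n,T), with dispersion r, so that variance = mean + mean²/r. The sketch gives the log-probability of a read count under that model and a method-of-moments estimate of r. That estimate would come from insertions assumed to share the same expected abundance, for example HO or non-coding sites. The function names are hypothetical.

```python
import math
import statistics


def nb_logpmf(k, mu, r):
    """Log-probability of k reads under a negative binomial with mean mu > 0
    and dispersion r > 0 (variance = mu + mu**2 / r)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def moment_dispersion(counts):
    """Method-of-moments estimate of r from read counts of insertions that
    are assumed to have the same expected abundance."""
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if var <= mu:
        return math.inf  # no overdispersion: Poisson limit
    return mu ** 2 / (var - mu)
```

A small r means the reads scatter much more than Poisson sampling alone would cause. That extra scatter is the reason a single read count is a noisy proxy for N(n,T).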
I came across this article (https://academic.oup.com/nar/article/45/11/e93/3044354) that discusses how to get genetic interactions from transposon sequencing data. It uses Bayesian statistics to determine whether a change in read count is significant, i.e. whether it can be attributed to a change in fitness rather than to stochasticity and/or noise. For this they also use a negative binomial distribution, which they estimate using normal distributions. I don't understand the details yet, but maybe it is interesting. They have also made a Python script (https://orca1.tamu.edu/essentiality/GI/) available, in which you can enter data (in the same format as our data) and it will determine significant changes in read counts. But they have a slightly different experimental setup, so I am not sure if it would be useful for us.
That is great! We should definitely take a look at it. I knew that someone would already have had the same idea hahaha, but I hadn't found it yet. Thanks, Greg! (Reply by email to Gregory94's comment of Thu, Oct 29, 2020, 15:17.)
Interesting! Their method is indeed a bit different from ours, but it might be a good idea to store a sample of the induction culture (just before starting the reseed), @Wteunisse.
@EKingma, your comment follows from what they describe on page 2, right?
This is another paper that also takes a sample before reseeding and compares it with the cells after selection. Its approach to determining genetic interactions is somewhat closer to what we are currently trying to do. The paper by DeJesus et al. (2017), the one I mentioned earlier, compares its results with this paper and is not very enthusiastic about it, but I think it might still be interesting.
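If a sample is stored just before reseeding, each gene can be compared directly between that sample (t0) and the sample after selection (t1). The sketch below does this with a common, generic approach, not the method of either paper. Both samples are scaled to reads per million so that sequencing depth cancels, and the per-gene log2 fold change is then computed with a pseudocount to avoid log(0). The function name and the pseudocount of 1 are assumptions.

```python
import math


def log2_fold_changes(reads_t0, reads_t1, pseudocount=1.0):
    """Per-gene log2 fold change between a pre-reseed sample (t0) and a
    post-selection sample (t1).

    reads_t0, reads_t1: {gene: read count}. Genes present in only one
    sample are skipped.
    """
    total0 = sum(reads_t0.values())
    total1 = sum(reads_t1.values())
    lfc = {}
    for gene in reads_t0.keys() & reads_t1.keys():
        # Reads per million, so differences in sequencing depth cancel.
        rpm0 = reads_t0[gene] * 1e6 / total0
        rpm1 = reads_t1[gene] * 1e6 / total1
        lfc[gene] = math.log2((rpm1 + pseudocount) / (rpm0 + pseudocount))
    return lfc
```

In terms of Werner's growth model, the measured t0 sample replaces the estimated N_0. Assuming reads are proportional to cells, the log2 fold change then tracks T/τ_n up to an offset shared by all genes.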
We can discuss here different approaches for obtaining relative fitness values from the arbitrary units of normalized reads per ORF.