PCTSEA guide for developers
This guide is intended to help future developers continue improving this software.
See a flow chart here
Both the command-line and web versions depend on the pctsea-core module. This module defines the PCTSEA.java class, which implements the analysis logic. Most of that logic is in the run() method, which is extensively commented so that it can be followed step by step.
This class is well documented, with many comments throughout the code. However, here is some additional information that might be useful:
To change the scoring methods, focus on the following method:
```java
private int calculateScoresToRankSingleCells(
        List<SingleCell> singleCellList,
        GeneExpressionsRetriever interactorExpressions,
        ScoringSchema scoringSchema,
        boolean writeScoresFile,
        boolean outputToLog,
        boolean getExpressionsUsedForScore,
        boolean takeZerosForCorrelation,
        double minCorrelation) throws IOException
```
Depending on the ScoringMethod of the ScoringSchema, this method calculates a different score for each SingleCell. These scores are used to sort the single cells into a ranked list. The Kolmogorov-Smirnov test then uses this ranked list to calculate the enrichment score.
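The sketch below shows the ranking step on its own. It assumes each cell already has a numeric score and sorts the cells from highest to lowest. ScoredCell and RankingSketch are hypothetical names used only for illustration; they are not PCTSEA classes. Ties and cells without a score get no special treatment here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a SingleCell whose score has already been computed.
record ScoredCell(String name, String cellType, double score) {}

class RankingSketch {
    // Returns a new list sorted from highest to lowest score; the input list is not modified.
    static List<ScoredCell> rankByScore(List<ScoredCell> cells) {
        List<ScoredCell> ranked = new ArrayList<>(cells);
        ranked.sort(Comparator.comparingDouble(ScoredCell::score).reversed());
        return ranked;
    }
}
```

For example, consider three cells scored 0.12, 0.87 and 0.45. The sorted list starts with the 0.87 cell and ends with the 0.12 cell, which is the order an enrichment step would walk through.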
Inside calculateScoresToRankSingleCells, a switch statement calls the appropriate method depending on the ScoringMethod:
```java
switch (scoringMethod) {
    case PEARSONS_CORRELATION:
        singleCell.calculateCorrelation(interactorExpressions, getExpressionsUsedForScore, minCorrelation);
        break;
    case SIMPLE_SCORE:
        singleCell.calculateSimpleScore(interactorExpressions, getExpressionsUsedForScore, minCorrelation);
        break;
    case DOT_PRODUCT:
        singleCell.calculateDotProductScore(interactorExpressions, takeZerosForCorrelation, getExpressionsUsedForScore);
        break;
    case REGRESSION:
        singleCell.calculateRegressionCoefficient(interactorExpressions, getExpressionsUsedForScore);
        break;
    default:
        throw new IllegalArgumentException("Method " + scoringMethod.getScoreName() + " still not supported.");
}
```
Note that the scores are actually computed inside each SingleCell object.
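Here is a hypothetical sketch of the kind of value a score such as PEARSONS_CORRELATION computes. It is a Pearson correlation between a cell's expression values and the interactor expression values, assuming both are already aligned gene by gene in two arrays. This is not the actual calculateCorrelation implementation. It ignores the minCorrelation and getExpressionsUsedForScore parameters, whose exact handling is defined in SingleCell.

```java
class PearsonSketch {
    // Hypothetical helper: Pearson correlation of two gene-aligned expression vectors.
    // Returns NaN when either vector is constant, because the correlation is undefined then.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two aligned vectors with at least 2 values");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```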
Once every single cell has a similarity score against the input protein list, we use the ranked list of single cells in a Kolmogorov-Smirnov test, following an approach similar to Gene Set Enrichment Analysis (GSEA). This is implemented in the method calculateEnrichmentScore, and the enrichment scores are stored in the CellTypeClassification objects.
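The following is a minimal sketch of an unweighted Kolmogorov-Smirnov running-sum enrichment score in the spirit of GSEA. It assumes the input is the list of cell-type labels in rank order, highest score first. The names are hypothetical, and calculateEnrichmentScore may differ. For example, it could weight each step by the cell's score, as the weighted GSEA statistic does.

```java
import java.util.List;

class EnrichmentSketch {
    // Unweighted running sum: step up by 1/nHits at each cell of the target type,
    // step down by 1/nMisses at every other cell. The enrichment score is the
    // running-sum value with the largest absolute deviation from zero (sign kept).
    static double enrichmentScore(List<String> rankedCellTypes, String cellType) {
        int n = rankedCellTypes.size();
        int nHits = 0;
        for (String t : rankedCellTypes) {
            if (t.equals(cellType)) {
                nHits++;
            }
        }
        int nMisses = n - nHits;
        if (nHits == 0 || nMisses == 0) {
            return 0.0; // no contrast between the target type and the rest
        }
        double running = 0.0;
        double es = 0.0;
        for (String t : rankedCellTypes) {
            running += t.equals(cellType) ? 1.0 / nHits : -1.0 / nMisses;
            if (Math.abs(running) > Math.abs(es)) {
                es = running;
            }
        }
        return es;
    }
}
```

With the ranking T, T, B, B, the running sum for T goes 0.5, 1.0, 0.5, 0.0, so its score is 1.0. The mirror-image walk for B gives -1.0.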
Then, following the same principles described in the GSEA article, we assess the significance of the enrichment scores by permutation. We randomly permute the cell types of the single cells and recalculate the enrichment scores, repeating until we have a null distribution from which to calculate a p-value. This is implemented in the method calculateSignificanceByCellTypesPermutations. After permuting the cell types, it calls calculateEnrichmentScore with the flag permutatedData=true. The p-value for each real enrichment score $x$ of each cell type is the number of random enrichment scores $x'$ greater than or equal to $x$, divided by the total number of random enrichment scores obtained for that cell type.
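Here is a minimal sketch of the permutation test described above, using an unweighted enrichment score as a stand-in. The ranking stays fixed and only the cell-type labels are shuffled. The p-value is the fraction of random scores greater than or equal to the observed one. All names, the fixed seed and the number of permutations are illustrative assumptions. The real method may also differ in details, such as treating positive and negative scores separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class PermutationSketch {
    // Unweighted running-sum enrichment score (signed maximum deviation from zero).
    static double enrichmentScore(List<String> ranked, String cellType) {
        int nHits = 0;
        for (String t : ranked) {
            if (t.equals(cellType)) {
                nHits++;
            }
        }
        int nMisses = ranked.size() - nHits;
        if (nHits == 0 || nMisses == 0) {
            return 0.0;
        }
        double running = 0.0, es = 0.0;
        for (String t : ranked) {
            running += t.equals(cellType) ? 1.0 / nHits : -1.0 / nMisses;
            if (Math.abs(running) > Math.abs(es)) {
                es = running;
            }
        }
        return es;
    }

    // Shuffles the cell-type labels over the fixed ranking, recomputes the score each time,
    // and returns (number of random scores >= observed) / (number of permutations).
    static double permutationPValue(List<String> rankedCellTypes, String cellType,
                                    int nPermutations, long seed) {
        double observed = enrichmentScore(rankedCellTypes, cellType);
        List<String> labels = new ArrayList<>(rankedCellTypes);
        Random rng = new Random(seed);
        int atLeastAsHigh = 0;
        for (int i = 0; i < nPermutations; i++) {
            Collections.shuffle(labels, rng);
            if (enrichmentScore(labels, cellType) >= observed) {
                atLeastAsHigh++;
            }
        }
        return (double) atLeastAsHigh / nPermutations;
    }
}
```

Suppose the two T cells sit at the top of eight cells. The observed score is then 1.0, and only shuffles that put both T cells first (1 in 28) reach it. The estimated p-value should therefore land near 0.036.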
Once we have a p-value per cell type, we calculate an FDR for each cell type. To do this, we use the real enrichment scores $x_t$ of all cell types $t$ and all the random enrichment scores $x'_t$ of all cell types $t$. The FDR for a given cell type $t$ is the number of random enrichment scores greater than or equal to $x_t$ ($s_{null}$), divided by the number of real enrichment scores greater than or equal to $x_t$ ($s_{obs}$). However, a normalization factor based on the number of cells in the cell type $t$ is applied to that ratio. See the following lines of code:
```java
// snull is the number of random scores >= x_t
// sobs is the number of real scores >= x_t
// nobs is the total number of real scores
// nnull is the total number of random scores
final int nobs = totalRealNormalizedScores.size();
final int nnull = totalRandomNormalizedScores.size();
fdr = (1.0 * snull / sobs) * (1.0 * nobs / nnull);
```
This is implemented at the end of the method calculateSignificanceByCellTypesPermutations.
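Written as a formula, the line of code above is $\mathrm{FDR}_t = \frac{s_{null}}{s_{obs}} \cdot \frac{n_{obs}}{n_{null}}$. This equals the fraction of random scores at or above $x_t$ divided by the fraction of real scores at or above $x_t$. The sketch below computes it from two plain lists of scores. FdrSketch and fdr are hypothetical names, and the lists stand in for totalRealNormalizedScores and totalRandomNormalizedScores. No capping at 1 is applied because the original line does not show one.

```java
import java.util.List;

class FdrSketch {
    // fdr = (snull / sobs) * (nobs / nnull), where snull and sobs count the random
    // and real scores that are >= xt, and nnull and nobs are the list sizes.
    // Returns NaN when no real score reaches xt or there are no random scores.
    static double fdr(double xt, List<Double> realScores, List<Double> randomScores) {
        long sobs = realScores.stream().filter(s -> s >= xt).count();
        long snull = randomScores.stream().filter(s -> s >= xt).count();
        int nobs = realScores.size();
        int nnull = randomScores.size();
        if (sobs == 0 || nnull == 0) {
            return Double.NaN;
        }
        return (1.0 * snull / sobs) * (1.0 * nobs / nnull);
    }
}
```

For example, take real scores 3.0, 2.0, 1.0, random scores 2.5, 0.5, 1.5, 0.2, 3.5, 0.1, and $x_t = 2.0$. Two of three real scores and two of six random scores are at or above 2.0, giving $(2/2)\cdot(3/6) = 0.5$.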
Proteomics Yates Laboratory
Salvador Martínez-Bartolomé (salvador at scripps.edu)
Research Associate
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037
Git-Hub profile