
Comparison of surface accessibility in multiple samples

Salvador Martinez-Bartolome edited this page Dec 21, 2018 · 4 revisions

The program is implemented in the class edu.scripps.yates.pcq.quantsite.QuantSiteOutputComparator.
Given the jar file QuantSiteoutputComparator.jar, the command to run it is:

java -jar QuantSiteoutputComparator.jar -f /home/salvador/file_with_paths -out output -RInf 1000 -number_sigmas 2

An explanation of the parameters:

 java -jar QuantSiteoutputComparator.jar
with the following parameters:
 -f,--input_file <arg>             [MANDATORY] Full path to a file containing pairs (separated by TAB) of sample names and full path to the
                                   peptideNodeTable of a PCQ run to compare
 -md,--minimum_discoveries <arg>   [OPTIONAL] minimum number of discoveries (significantly different between two samples) required for a quantified
                                   site to be in the output files. If not provided, there will be no minimum number, although no quant sites without
                                   any significantly different site between 2 samples will be reported in the Excel output file.
 -ns,--number_sigmas <arg>         [OPTIONAL] number of sigmas that will be used to decide whether an INFINITY ratio is significantly different than a
                                   FINITE ratio.
                                   If R1=POSITIVE_INFINITY and R2 < avg_distribution + ns*sigma_distribution_of_ratios, then R2 is significantly
                                   different.
                                   If R1=NEGATIVE_INFINITY and R2 > avg_distribution + ns*sigma_distribution_of_ratios, then R2 is significantly
                                   different.
 -out,--output_file_name <arg>     [MANDATORY] Output file name that will be created in the current folder
 -pvc,--pvalue_correction <arg>    [OPTIONAL] p-value correction method to apply. Valid values are: BH,BY,BONFERRONI,HOCHBERG,HOLM,HOMMEL. If not
                                   provided, the method will be BY (Reference: Yoav Benjamini, Daniel Yekutieli, "The control of the false discovery
                                   rate in multiple testing under dependency", Ann. Statist., Vol. 29, No. 4 (2001), pp. 1165-1188,
                                   DOI:10.1214/aos/1013699998 JSTOR:2674075)
 -qvt,--qvalue_threshold <arg>     [OPTIONAL] q-value threshold to apply to the corrected p-values. A value between 0 and 1 is permitted. If not
                                   provided, a threshold of 0.05 will be applied.
 -RInf,--replace_infinity <arg>    [OPTIONAL] -RInf replaces +/- Infinity with a user defined (+/-) value in the output summary table file
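
The input file passed with -f can be illustrated with a small Python sketch. It assumes one pair per line (the help text only says the pairs are TAB-separated), and the sample names and paths below are hypothetical placeholders, not files from a real PCQ run.

```python
def format_input_file(pairs):
    """Render (sample_name, path) pairs as TAB-separated lines, one pair per line."""
    return "".join(f"{name}\t{path}\n" for name, path in pairs)


def parse_input_file(text):
    """Parse TAB-separated lines back into (sample_name, path) pairs."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 TAB-separated fields, got {len(fields)}"
            )
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


# Hypothetical sample names and paths, for illustration only.
example_pairs = [
    ("sampleA", "/path/to/runA/peptideNodeTable.txt"),
    ("sampleB", "/path/to/runB/peptideNodeTable.txt"),
]
example_text = format_input_file(example_pairs)
```

Writing example_text to a file and passing its full path to -f would give the comparator two samples to compare, under the one-pair-per-line assumption above.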

Detailed method explanation:

PCQ_site_comparator is a post-analysis tool that statistically compares the relative abundance of individual peptide nodes or sites across multiple samples. It reports the difference in ratio values as well as the corresponding p-value for each pairwise comparison. Quantitative ratios (mean and standard deviation) for each quantified site are collected from multiple PCQ runs. A pairwise t-test is performed for each quantified site that is represented as a peptide node in PCQ, and the t-test result is stored in a pairwise sample comparison matrix that encompasses all samples. The t-test is a two-sample t-test between the individual ratio measurements of the site in one sample and the individual ratio measurements of the same site in the other sample.
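
As a rough illustration of the pairwise comparison for one quantified site, the sketch below computes a two-sample t statistic for every pair of samples. The text does not say whether equal variances are assumed, so the Welch (unequal variance) form used here is an assumption, and the function and variable names are hypothetical. The p-value would then be read from a Student t distribution with the returned degrees of freedom; that step is omitted to keep the sketch within the Python standard library.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for two lists of ratio measurements."""
    if len(a) < 2 or len(b) < 2:
        return math.nan, math.nan  # a variance needs at least two measurements
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    se2 = va + vb
    if se2 == 0:
        return math.nan, math.nan  # identical constant samples: t is undefined
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pairwise_t(site_ratios):
    """Map each unordered pair of sample names to its (t, df) result."""
    return {
        (s1, s2): welch_t(site_ratios[s1], site_ratios[s2])
        for s1, s2 in combinations(sorted(site_ratios), 2)
    }
```

Here site_ratios is a dictionary from sample name to the list of individual log ratio measurements of one site in that sample.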
Peptide nodes may not be quantified with a relative ratio value but may instead be only “light” or “heavy”, which becomes an infinity value upon logarithmic conversion. If a peptide node encompasses several individual infinity measurements, PCQ applies a “majority rules” approach: if infinity values of opposite sign are present, the sign shared by the majority of the individual infinity measurements in the quant site’s peptide node is the one carried forward into PCQ_site_comparator. If in the pairwise sample comparison both samples are infinity, the comparison is significant when the signs of the infinities are opposite and non-significant when the signs are the same.
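
The handling of infinities described above can be sketched as follows. This is a minimal illustration, not the PCQ implementation: the function names are hypothetical, the majority is taken among the infinite measurements only, and returning None on a tie is an assumption, since the text does not say how ties are resolved.

```python
import math


def majority_infinity(measurements):
    """Return +inf or -inf by majority sign among infinite values.

    Returns None when there is no infinity or when both signs are tied
    (tie handling is an assumption, not documented behavior).
    """
    positive = sum(1 for r in measurements if r == math.inf)
    negative = sum(1 for r in measurements if r == -math.inf)
    if positive == negative:
        return None
    return math.inf if positive > negative else -math.inf


def infinities_differ(r1, r2):
    """For two infinite site values: significant only if the signs are opposite."""
    if not (math.isinf(r1) and math.isinf(r2)):
        raise ValueError("both values must be infinite")
    return r1 != r2
```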
In cases where the site was measured with a finite ratio value in one sample and infinity in the other, the infinity value is replaced with the mean and standard deviation (rmean, SDmean) of all ratio measurements of the respective sample. The ratio value of the first sample (rexp1) is considered significant if:

  • rexp1 < rmean + 2 * SDmean, for rexp1 > 0, and
  • rexp1 > rmean + 2 * SDmean, for rexp1 < 0.
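
The two rules above can be written down literally as a small Python function. This is only a transcription of the rules as stated on this page, with hypothetical names; the factor 2 is exposed as a parameter on the assumption that it corresponds to the -ns option, and treating rexp1 = 0 as non-significant is an assumption because neither rule covers it. The actual program may differ in detail.

```python
def finite_vs_infinity_significant(r_exp1, r_mean, sd_mean, number_sigmas=2.0):
    """Apply the two stated rules to a finite ratio r_exp1.

    r_mean and sd_mean are the mean and standard deviation that stand in
    for the infinity value of the other sample.
    """
    cutoff = r_mean + number_sigmas * sd_mean
    if r_exp1 > 0:
        return r_exp1 < cutoff  # first rule
    if r_exp1 < 0:
        return r_exp1 > cutoff  # second rule
    return False  # assumption: r_exp1 == 0 is covered by neither rule
```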

The result of this significance test is displayed as a p-value of “0.0” in the sample comparison matrix.
If there is no value in one of the samples, the pairwise comparison is reported as “NaN”.
All p-values except the ones described above are corrected for multiple hypothesis testing (*).
Finally, over all pairwise comparisons in the sample comparison matrix, the entries with a p-value below the threshold (0.05 unless the user defines another) are counted as discoveries (**). PCQ_site_comparator outputs the distribution of the number of discoveries per site for further evaluation.
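
For readers who want to see what the default correction does, here is a minimal sketch of the Benjamini-Yekutieli adjustment followed by the discovery count. It is a standard-library illustration with hypothetical function names, not the code used by the program; it assumes the list passed in contains only the p-values that are subject to correction (no “NaN” entries and none of the fixed “0.0” entries).

```python
import math


def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_discoveries(q_values, threshold=0.05):
    """Count matrix entries below the threshold, skipping NaN entries."""
    return sum(1 for q in q_values if not math.isnan(q) and q < threshold)
```

For example, the p-values 0.01, 0.04 and 0.03 are adjusted to roughly 0.055, 0.073 and 0.073, so none of them counts as a discovery at a threshold of 0.05.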

The program outputs:

  • A single TSV file with the values used to calculate the p-values for each quantified site in each sample (output_comparison.tsv).
  • A single text file with the matrices of p-values for each quantified site (output_comparison_QVALUES_matrixes.txt). The quantified sites are sorted by descending number of discoveries.
  • A single Excel file with the matrices of p-values for each quantified site, with each quantified site in a separate sheet. The quantified sites are sorted by descending number of discoveries.

References
(*) Yoav Benjamini, Daniel Yekutieli, "The control of the false discovery rate in multiple testing under dependency", Ann. Statist., Vol. 29, No. 4 (2001), pp. 1165-1188, DOI:10.1214/aos/1013699998 JSTOR:2674075
(**) A discovery is a p-value below the threshold (0.05) in a comparison of ratios between two samples.