A Python package for finding co-regulated phosphosites from phosphoproteomics and proteomics data.
Docs to come.
PhDc uses optimized clustering to find co-regulated phosphosite modules. Then, PhDc uses protein and phosphoprotein abundance data from kinases and phosphatases to nominate module regulators. It can also use sample annotation data to identify modules significantly correlated with clinical variables.
- Use regularized linear regression to normalize phosphopeptide data by protein data.
- Column/sample normalize data.
a. Median normalization.
b. Upper quartile normalization.
c. Two-component median normalization.
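
The normalization steps above can be sketched roughly as follows. This is a minimal illustration, not PhDc's actual implementation: the function names `median_normalize`, `upper_quartile_normalize` and `ridge_residuals` are hypothetical, the data are assumed to be log2 ratios in a sites x samples NumPy array, and the ridge fit assumes one phosphosite regressed on its parent protein across samples with no missing values. Two-component median normalization is not shown.

```python
import numpy as np


def median_normalize(data):
    """Subtract each sample's (column's) median so every column is centered at 0."""
    return data - np.nanmedian(data, axis=0, keepdims=True)


def upper_quartile_normalize(data):
    """Subtract each sample's 75th percentile instead of its median."""
    return data - np.nanpercentile(data, 75, axis=0, keepdims=True)


def ridge_residuals(phospho, protein, alpha=1.0):
    """Regress one site's phospho values on its protein values; return residuals.

    Closed-form ridge fit with a single predictor and an unpenalized intercept.
    The residuals are the part of the phospho signal not explained by protein.
    """
    x = protein - protein.mean()
    y = phospho - phospho.mean()
    beta = (x @ y) / (x @ x + alpha)
    return y - beta * x


# Toy example: 6 sites x 4 samples with a per-sample offset to remove.
rng = np.random.default_rng(0)
log_ratios = rng.normal(size=(6, 4)) + np.array([0.0, 0.5, -0.3, 1.0])
centered = median_normalize(log_ratios)

# Toy example: one site whose phospho signal mostly follows its protein.
protein = rng.normal(size=20)
phospho = 0.8 * protein + rng.normal(scale=0.1, size=20)
normalized_site = ridge_residuals(phospho, protein, alpha=1.0)
```

A larger `alpha` shrinks the fitted slope toward zero, so less of the protein signal is removed from the phospho values.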
For either mode 1 or 2, PhDc can also be used to find reproducibly clustered sites.
TODO: describe hypercluster a little
- Nominate regulators: given protein and phosphoprotein abundances of kinases and phosphatases, PhDc can find the most correlated putative regulators for each module.
- Correlate module scores with continuous and categorical clinical features. Identify the modules most highly correlated with each feature.
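
To make the two steps above concrete, here is a small sketch under stated assumptions. It treats a module score as the mean of the module's sites in each sample and ranks regulators by absolute Pearson correlation. Both choices are simplifications made for illustration, and `module_scores` and `correlate_rows` are hypothetical helpers, not PhDc functions. The same correlation could be applied to a continuous clinical variable; categorical variables would need a different test and are not shown.

```python
import numpy as np


def module_scores(site_values, labels):
    """Average the sites in each module; returns (module ids, modules x samples)."""
    modules = np.unique(labels)
    scores = np.vstack([site_values[labels == m].mean(axis=0) for m in modules])
    return modules, scores


def correlate_rows(a, b):
    """Pearson correlation between every row of a and every row of b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


# Toy data: 4 sites x 5 samples, two modules, two candidate regulators.
sites = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.2, 2.1, 2.9, 4.2, 5.1],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [4.8, 3.1, 4.2, 0.9, 2.2],
])
labels = np.array([0, 0, 1, 1])
regulators = np.array([
    [0.9, 2.2, 3.1, 3.9, 5.2],  # tracks module 0
    [5.1, 2.8, 4.1, 1.2, 1.9],  # tracks module 1
])

modules, scores = module_scores(sites, labels)
corr = correlate_rows(regulators, scores)     # regulators x modules
best_regulator = np.abs(corr).argmax(axis=0)  # top regulator index per module
```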
n$^|$ = # p-sites in original table
m = # samples
k = # different sample annotations
- Phosphosite x Sample table of log2 relative phospho abundances (e.g. from iTRAQ). csv or tsv. n$^|$*m
- Sample x annotation table for clinical annotations. csv or tsv. m*k
- Continuous/categorical labels for annotation table. csv or tsv. k*1
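
As a rough illustration of how the three input tables fit together, the sketch below loads small in-memory stand-ins with pandas and checks that their dimensions agree. The table contents, the column names, and the `check_inputs` helper are made up for this example and are not part of PhDc. For tsv files, `sep="\t"` would be passed to `pd.read_csv`.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the three input files (contents are made up).
phospho_csv = io.StringIO(
    "site,s1,s2,s3\n"
    "P1_S10,0.5,-0.2,1.1\n"
    "P2_T45,-1.0,0.3,0.0\n"
)
annotation_csv = io.StringIO(
    "sample,age,subtype\n"
    "s1,61,A\n"
    "s2,48,B\n"
    "s3,70,A\n"
)
label_csv = io.StringIO(
    "annotation,type\n"
    "age,continuous\n"
    "subtype,categorical\n"
)

phospho = pd.read_csv(phospho_csv, index_col=0)         # sites x samples
annotations = pd.read_csv(annotation_csv, index_col=0)  # samples x annotations
column_types = pd.read_csv(label_csv, index_col=0)      # annotations x 1


def check_inputs(phospho, annotations, column_types):
    """Raise ValueError if sample or annotation names do not line up."""
    if set(phospho.columns) != set(annotations.index):
        raise ValueError("samples differ between phospho and annotation tables")
    if set(annotations.columns) != set(column_types.index):
        raise ValueError("annotation columns differ from the label table")


check_inputs(phospho, annotations, column_types)
```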
n = # p-sites after normalization
m = # samples
i = # different optimization conditions
j = # modules in best clustering
l = # putative regulators, i.e. kinases and phosphatases
- Protein-normalized phospho data (optional, only if normalizing) n*m
- Phosphosite x clustering attempts table of p-site module labels. tsv. n*i (optional, only if optimizing)
- Best scoring clustering result table of p-site module labels. tsv. n*1
a. Heatmaps of p-sites x samples with annotations for each module
b. Module scores x samples. tsv. j*m
i. Clustered heatmap of module scores vs samples
c. Coefficients of each kinase and phosphatase x modules. tsv. l*j
i. Clustered heatmap of high scoring regulators vs modules.
- Reproducible clusters, phosphosites x cluster labels n*1. (optional, only if using optimized clustering)
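
One common way to reason about reproducibility across clustering attempts is to count how often each pair of sites lands in the same cluster. The sketch below applies that idea to a table shaped like the n*i label table. It is an assumption about how reproducibility could be measured, not a description of how PhDc derives its reproducible clusters, and `coclustering_frequency` is a hypothetical name.

```python
import numpy as np


def coclustering_frequency(labels):
    """Fraction of clustering runs in which each pair of sites shares a label.

    labels: sites x runs integer array. Label values only need to be
    consistent within a run, not across runs.
    """
    n_sites, n_runs = labels.shape
    together = np.zeros((n_sites, n_sites))
    for run in range(n_runs):
        column = labels[:, run]
        together += column[:, None] == column[None, :]
    return together / n_runs


# Toy table: 4 sites x 3 clustering runs.
labels = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
])
freq = coclustering_frequency(labels)
```

Pairs with a frequency close to 1 cluster together under most conditions; a threshold on this matrix is one possible way to define reproducibly clustered sites.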
phosphodisco
|_parsers.py
|_classes.py
|_analyze_clusters.py
|_visualize.py
|_cli.py
|_tests
|_test_parsers.py
|_test_analyze_clusters.py
|_test_visualize.py
|_test_cli.py
data
|_kinase_list.txt
|_phosphatase_list.txt
|_all_list.txt
|_acetylase_list.txt
|_deacetylase_list.txt
docs
|_docs.rst
|_conf.py
|_requirements.txt
LICENSE
README.md
setup.py