This repository has been archived by the owner on Apr 21, 2022. It is now read-only.

Commit
Merge pull request #25 from a-slide/dev
Dev
a-slide authored May 18, 2020
2 parents 0b780bd + 04259e8 commit e11e705
Showing 223 changed files with 91,877 additions and 2,446 deletions.
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -7,9 +7,18 @@
* Implement autodoc from docstring
* Fix and test CLI

### 09/10/2019 v-0.4.0
### 01/04/2020 v-0.4.0

* General improvement of logging message output
* Implement fancy color logger
* Add position tracking within intervals
* Add Comp_Report module to generate HTML reports of significant candidates

### 15/01/2019 v-0.4.5

* Add tabular text reports to Comp_report
* Add (default) option to write out all the intervals in Meth_Comp with reasons why excluded or included in DM analysis
* Improve Comp_report summary table and include new fields from Meth_Comp
* Tidy output folder structure for reports generated by meth_report
* Add Chromosome ideogram plot to summary report
* A fasta reference is now required to run Comp_Report
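
Because the FASTA reference becomes a required argument, existing calls to `Comp_Report` need one more parameter. The snippet below is a minimal sketch of how a caller might guard against the missing argument before invoking the report. The helper `comp_report_kwargs` is hypothetical and not part of pycoMeth; the parameter names and file paths are taken from the API usage notebook in this commit, and the import path in the trailing comment is an assumption.

```python
def comp_report_kwargs(methcomp_fn, gff3_fn, ref_fasta_fn, **options):
    """Hypothetical helper: collect Comp_Report keyword arguments.

    Raises ValueError if any of the three required input files is not given,
    including the FASTA reference that is now mandatory.
    """
    required = {
        "methcomp_fn": methcomp_fn,
        "gff3_fn": gff3_fn,
        "ref_fasta_fn": ref_fasta_fn,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("Missing required argument(s): " + ", ".join(missing))
    # Optional settings (outdir, n_top, pvalue_threshold, ...) pass through untouched
    return {**required, **options}


kwargs = comp_report_kwargs(
    methcomp_fn="./data/Yeast_CGI_meth_comp.tsv.gz",
    gff3_fn="./data/yeast.gff3",
    ref_fasta_fn="./data/yeast.fa",
    outdir="yeast_html",
    pvalue_threshold=0.05,
)

# With pycoMeth installed, the call would then look like this
# (import path assumed, not confirmed by this changelog):
# from pycoMeth.Comp_Report import Comp_Report
# Comp_Report(**kwargs)
```

The helper only checks that the arguments are present; whether the files exist and are readable is left to `Comp_Report` itself.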
83 changes: 48 additions & 35 deletions docs/Comp_Report/API_usage.ipynb
@@ -24,8 +24,8 @@
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2020-04-07T14:13:14.870400Z",
"start_time": "2020-04-07T14:13:14.032380Z"
"end_time": "2020-05-18T22:05:17.635551Z",
"start_time": "2020-05-18T22:05:17.060005Z"
},
"init_cell": true,
"scrolled": true
@@ -51,18 +51,18 @@
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2020-04-07T14:13:14.882221Z",
"start_time": "2020-04-07T14:13:14.871885Z"
"end_time": "2020-05-18T22:05:17.652370Z",
"start_time": "2020-05-18T22:05:17.637512Z"
},
"init_cell": true
},
"outputs": [
{
"data": {
"text/markdown": [
"**Comp_Report** (methcomp_fn, gff3_fn, outdir, n_top, max_tss_distance, pvalue_threshold, min_diff_llr, verbose, quiet, progress, kwargs)\n",
"\n",
"**Comp_Report** (methcomp_fn, gff3_fn, ref_fasta_fn, outdir, n_top, max_tss_distance, pvalue_threshold, min_diff_llr, n_len_bin, verbose, quiet, progress, kwargs)\n",
"\n",
"Generate an HTML report of significantly differentially methylated CpG intervals from `Meth_Comp` text output. Significant intervals are annotated with their closest transcript TSS.\n",
"\n",
"---\n",
"\n",
@@ -74,15 +74,19 @@
"\n",
"Path to an **Ensembl GFF3** file containing genomic annotations. Only transcript details are extracted.\n",
"\n",
"* **ref_fasta_fn** (required) [str]\n",
"\n",
"Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)\n",
"\n",
"* **outdir** (default: \"\") [str]\n",
"\n",
"Directory in which to write the HTML reports. Defaults to the current directory.\n",
"\n",
"* **n_top** (default: 50) [int]\n",
"* **n_top** (default: 100) [int]\n",
"\n",
"Number of top interval candidates for which to generate an interval report. If there are not enough significant candidates this is automatically scaled down.\n",
"\n",
"* **max_tss_distance** (default: 100000) [int]\n",
"* **max_tss_distance** (default: 500000) [int]\n",
"\n",
"Maximal distance to a transcription start site when searching for transcripts close to interval candidates\n",
"\n",
@@ -94,6 +98,10 @@
"\n",
"Minimal llr boundary for negative and positive median llr. 1 is recommended for visualization purposes.\n",
"\n",
"* **n_len_bin** (default: 500) [int]\n",
"\n",
"Number of genomic intervals for the longest chromosome of the ideogram figure\n",
"\n",
"* **verbose** (default: False) [bool]\n",
"\n",
"* **quiet** (default: False) [bool]\n",
@@ -131,11 +139,11 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2020-04-07T14:13:21.209471Z",
"start_time": "2020-04-07T14:13:19.469317Z"
"end_time": "2020-05-18T22:03:12.150416Z",
"start_time": "2020-05-18T22:03:11.494377Z"
},
"scrolled": false
},
@@ -147,30 +155,33 @@
"\u001b[01;34m## Checking options and input files ##\u001b[0m\n",
"\u001b[37m\t[DEBUG]: Options summary\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tPackage name: pycoMeth\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tPackage version: 0.4.0\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tTimestamp: 2020-04-07 15:13:19.471381\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tkwargs\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tprogress: False\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tquiet: False\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tverbose: True\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tmin_diff_llr: 1\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tpvalue_threshold: 0.05\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tmax_tss_distance: 100000\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tn_top: 50\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \toutdir: yeast_html\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tgff3_fn: ./data/yeast.gff3\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tPackage version: 0.4.3\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tTimestamp: 2020-05-18 23:03:11.505853\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tmethcomp_fn: ./data/Yeast_CGI_meth_comp.tsv.gz\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tgff3_fn: ./data/yeast.gff3\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tref_fasta_fn: ./data/yeast.fa\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \toutdir: yeast_html\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tn_top: 100\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tmax_tss_distance: 500000\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tpvalue_threshold: 0.05\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tmin_diff_llr: 1\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tn_len_bin: 1000\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tverbose: True\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tquiet: False\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tprogress: False\u001b[0m\n",
"\u001b[37m\t[DEBUG]: \tkwargs\u001b[0m\n",
"\u001b[01;34m## Loading and preparing data ##\u001b[0m\n",
"\u001b[32m\tLoading Methcomp data from TSV file\u001b[0m\n",
"\u001b[32m\tLoading transcript info from GFF file\u001b[0m\n",
"\u001b[32m\tLoading transcripts info from GFF file\u001b[0m\n",
"\u001b[32m\tLoading chromosome info from reference FASTA file\u001b[0m\n",
"\u001b[32m\tNumber of significant intervals found (adjusted pvalue<0.05): 1\u001b[0m\n",
"\u001b[01;31mERROR: Low number of significant sites. The summary report will likely contains errors\u001b[0m\n",
"\u001b[01;31mERROR: Number of significant intervals lower than number of top candidates to plot\u001b[0m\n",
"\u001b[32m\tGenerating file names for top candidates reports\u001b[0m\n",
"\u001b[32m\tComputing source md5\u001b[0m\n",
"\u001b[01;34m## Parsing methcomp data ##\u001b[0m\n",
"\u001b[32m\tIterating over significant intervals and generating top candidates reports\u001b[0m\n",
"\u001b[37m\t[DEBUG]: Ploting top candidates: V-65,380-65,612\u001b[0m\n",
"\u001b[37m\t[DEBUG]: Ploting top candidates: ('V', 65380, 65612)\u001b[0m\n",
"\u001b[32m\tGenerating summary report\u001b[0m\n"
]
}
@@ -179,6 +190,7 @@
"Comp_Report (\n",
" methcomp_fn = \"./data/Yeast_CGI_meth_comp.tsv.gz\",\n",
" gff3_fn = \"./data/yeast.gff3\",\n",
" ref_fasta_fn=\"./data/yeast.fa\",\n",
" outdir = \"yeast_html\",\n",
" pvalue_threshold = 0.05,\n",
" verbose=True)"
@@ -193,11 +205,11 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2020-04-07T14:13:56.452332Z",
"start_time": "2020-04-07T14:13:23.787913Z"
"end_time": "2020-05-18T22:05:36.615867Z",
"start_time": "2020-05-18T22:05:24.153058Z"
},
"scrolled": true
},
@@ -209,14 +221,14 @@
"\u001b[01;34m## Checking options and input files ##\u001b[0m\n",
"\u001b[01;34m## Loading and preparing data ##\u001b[0m\n",
"\u001b[32m\tLoading Methcomp data from TSV file\u001b[0m\n",
"\u001b[32m\tLoading transcript info from GFF file\u001b[0m\n",
"\u001b[01;31mERROR: Not all the chromosomes found in the data file are present in the GFF3 file. This will lead to missing transcript ids\u001b[0m\n",
"\u001b[32m\tLoading transcripts info from GFF file\u001b[0m\n",
"\u001b[32m\tLoading chromosome info from reference FASTA file\u001b[0m\n",
"\u001b[32m\tNumber of significant intervals found (adjusted pvalue<0.01): 3532\u001b[0m\n",
"\u001b[32m\tGenerating file names for top candidates reports\u001b[0m\n",
"\u001b[32m\tComputing source md5\u001b[0m\n",
"\u001b[01;34m## Parsing methcomp data ##\u001b[0m\n",
"\u001b[32m\tIterating over significant intervals and generating top candidates reports\u001b[0m\n",
"\tProgress: 100%|██████████| 3.53k/3.53k [00:29<00:00, 122 intervals/s] \n",
"\tProgress: 100%|██████████| 3.53k/3.53k [00:09<00:00, 384 intervals/s]\n",
"\u001b[32m\tGenerating summary report\u001b[0m\n"
]
}
@@ -225,8 +237,9 @@
"Comp_Report (\n",
" methcomp_fn = \"./data/Medaka_CGI_meth_comp.tsv.gz\",\n",
" gff3_fn = \"./data/medaka.gff3\",\n",
" ref_fasta_fn=\"./data/medaka.fa\",\n",
" outdir = \"medaka_html\",\n",
" n_top=25,\n",
" n_top=50,\n",
" progress=True)"
]
}
@@ -235,9 +248,9 @@
"celltoolbar": "Initialization Cell",
"hide_input": false,
"kernelspec": {
"display_name": "pycoMeth",
"display_name": "Python [conda env:pycoMeth]",
"language": "python",
"name": "pycometh"
"name": "conda-env-pycoMeth-py"
},
"language_info": {
"codemirror_mode": {
@@ -249,7 +262,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.7.7"
}
},
"nbformat": 4,