Skip to content

Latest commit

 

History

History
executable file
·
147 lines (108 loc) · 11.3 KB

README.md

File metadata and controls

executable file
·
147 lines (108 loc) · 11.3 KB

QTLToolKit

DOI

Languages

QTLToolKit is a pipeline written in the BASH, Perl and Python languages to run cis- or trans- quantitative trait loci analyses using QTLtools and tensorQTL efficiently. It will aid in selecting the region, get descriptive statistics on the variants used, parse results, create diagnostic plots where necessary, create LocusZoom style regional association plots when needed, and concatenate everything in tables.

All scripts are annotated for debugging purposes - and future reference. The scripts will work within the context of a certain Linux environment, in this case a CentOS7 system on a SUN Grid Engine background, using qsub submission systems. As such we have tested QTLToolKit on CentOS6.6, CentOS7, and OS X El Capitan+ (version 10.11.[x]).

back-to-top


INSTALLATION

You can use the scripts locally to run analyses on a Unix-based system, like Mac OS X (El Capitan+). We need to make an appropriate directory to download git to, and install this git.

Step 1: make a directory, and go there.
mkdir -p ~/git/ && cd ~/git
Step 2: clone this git, unless it already exists.
if [ -d ~/git/QTLToolKit/.git ]; then \
		cd ~/git/QTLToolKit && git pull; \
	else \
		cd ~/git/ && git clone https://github.com/swvanderlaan/QTLToolKit.git; \
	fi

back-to-top


USAGE

The only script the user should use is the QTLAnalyzer.sh script in conjunction with a configuration file qtl.conf.

By typing...

bash QTLAnalyzer.sh $(pwd)/qtl.conf

...the user will control what analysis will be done. Simply typing bash QTLAnalyzer.sh will produce an extensive error-message explaining what arguments are expected. Note: it is absolutely pivotal to use $(pwd) to indicate the whole path to the configuration file, because this is used by the script(s) for the creation of directories etc.

In addition, there are multiple scripts that work in union or solo. Below a description of each.

Script Description Location Usage
prepare_QTL.sh You can use this scripts to prepare the input-data. Root Standalone
This script will:
- create, and index the 'phenotype' bed-file, i.e.
the expression or methylation data.
- convert VCF files from bgen-files.
QTLAnalyzer.sh Analysis script. This is the 'master' script that Root Main script
controls the whole analysis in conjunction with the
configuration file.
qtl.config The configuration file in which you can add/edit paths Configuration
NominalResultsParser.py Parses the nominal QTL analysis results for downstream QTLToolKit
workflow.
QTL_QC.R Quality control of QTL analysis results. QTLToolKit
QTLChecker.sh Checks and wraps up the QTL analysis results. Root QTLToolKit
QTLClumpanator.py Clumps results to focus only on a particular list of Root QTLToolKit
variants/loci.
QTLPlotter.sh Creates relevant plots of analysis results, including Root QTLToolKit
LocusZoom plots after eQTL analysis results. Note:
regional association plots are (so far) not possible
for mQTL analyses, as there are many CpGs in a typical
analysis. This would require a bit more integrated
plotting to show the regional effects on proximal CpGs.
QTLSummarizer.sh Summarises the QTL-analysis results into a folder, and Root QTLToolKit
zipped files. Including a small, incomplete analysis
report (see TO DO list below).
QTLSumEditor.py Adds linkage disequilibrium r^2 when CLUMP="Y" into Root QTLToolKit
the summarised results.
QTLSumParser.py Parses some of the summarised results when CLUMP="Y". Root QTLToolKit
parse_clumps_eqtl.pl Script to parse clumps of QTL results. Root/SCRIPTS Legacy
parse_input.py Utility script to get the number of loci per chromosome. Root/SCRIPTS QTLToolKit
BED_Annotation_Creator.R Create appropriate annotation file for QTL analyses. Root/SCRIPTS Standalone
BED_Creator_DNAmArrays.R Create the required BED files from DNAmArray data. Root/SCRIPTS Standalone
SE_Creator.R Create a SummarizedExperiment object in R of the Root/SCRIPTS Standalone
expression, methylation or other 'omics'-data.
parseTable.pl Utility script to parse a table. Root/SCRIPTS QTLToolKit/Standalone
removedupes.pl Remove duplicate lines from a text-table. Root/SCRIPTS QTLToolKit/Standalone
runFDR_cis.R Correct for false-discovery rate (FDR), used for Root/SCRIPTS QTLToolKit
functional density, and Regulatory Trait Cconcordance
(RTC), and functional enrichment analysis.
QTLTransHitParser.py Parser of trans-QTL-analysis results. Root QTLToolKit (BETA)
runFDR_ftrans.R Correct for FDR, used after trans-QTL-analysis. Root/SCRIPTS QTLToolKit (BETA)
runFDR_atrans.R Extract and FDR-correct adjusted and permuted trans- Root/SCRIPTS QTLToolKit (BETA)
QTL-analysis results.
plotTrans.R Plot trans-QTL analysis results. Root/SCRIPTS QTLToolKit (BETA)

back-to-top


COMMON WORKFLOW

QTLToolKit Workflow (version: beta 1.9.0)

back-to-top


TO DO

There are definitely improvements needed. Below of things I'd like to add or edit in the (near) future. Priorities are indicated according to MoSCoW (must have, should have, could have, would have)

  • - M - simplify the configuration using a config-file, similar to GWASToolKit
  • - C - add proper --help flag
  • - C - clean up codes further, especially with respect to the various error-flags
  • - C - add in checks of the environment, similar to slideToolkit scripts
  • - M - add in some code to produce a simple report
  • - M - edit QTL_QC.R script
    • - S - to check the delimiter automatically of the annotation file
    • - S - add in the data.table() to read and write tables using fread or fwrite
    • - M - the eQTL-part (nom/perm for cis) to match with the new 'strand' column (as the column numbers have changed by the addition of the 'strand' column in the output)
    • - M - double check the trans-QTL-part to match with the new 'strand' column (as the column numbers have changed by the addition of the 'strand' column in the output)
    • - M - double check the mQTL-part to match with the new 'strand' column (as the column numbers have changed by the addition of the 'strand' column in the output)
  • - C - add an annotation creation script
  • - M - add a routine (somewhere) to remove CpGs (probes) containing SNPs or that map to multiple locations. Refer to: Zhou W. et al. Nucleic Acids Res. 2016.
  • - M - add in script to create BED-files.
  • - M - update workflow image.

back-to-top


The MIT License (MIT)

Copyright (c) 2014-2024 Sander W. van der Laan (s.w.vanderlaan [at] gmail [dot] com) | Lennart L.P. Landsmeer | Jacco Schaap.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Reference: http://opensource.org.