LorenaDerezanin/pipeline_test

Variant calling pipeline

A Snakemake workflow for calling and annotating short variants.
The workflow takes paired-end (PE) Illumina short-read data (FASTQ files) as input and outputs annotated variant calls in a VCF file. The input directory contains PE Illumina reads from a publicly available SARS-CoV-2 dataset (SRA accession SRR15660643), downsampled to 16000 paired reads (sample.R1.paired.fq.gz and sample.R2.paired.fq.gz).
A FASTA file with the Wuhan-Hu-1 reference genome (GenBank accession MN908947.3) is included in the reference directory (MN908947.3.fasta), along with the VEP cache required to annotate genomic features.
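Before running the workflow, it can help to confirm that both mate files are intact and hold the same number of reads. A minimal Python sketch, assuming standard four-line FASTQ records with no wrapped sequence lines; the helper names are hypothetical:

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count records in a FASTQ file, gzipped or plain.

    Assumes every record spans exactly four lines
    (header, sequence, '+', quality).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pairs(r1, r2):
    """Return the per-file read count if R1 and R2 agree, else raise."""
    n1, n2 = count_fastq_records(r1), count_fastq_records(r2)
    if n1 != n2:
        raise ValueError(f"mate files differ: {n1} vs {n2} reads")
    return n1
```

For example, check_pairs(Path("input") / "sample.R1.paired.fq.gz", Path("input") / "sample.R2.paired.fq.gz") returns the read count per mate file. The input/ path is a guess; point it at wherever the repository keeps the reads.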

Usage

git clone https://github.com/LorenaDerezanin/pipeline_test

Step 1: Install Miniconda

Miniconda is a minimal conda installer. Running the pipeline in an isolated conda environment avoids dependency hell and ensures reproducibility.

Step 2 (recommended): Install mamba, a faster package manager

conda install mamba -n base -c conda-forge

This step is recommended to speed up environment setup. Mamba is a drop-in replacement for conda that downloads packages in parallel and resolves dependencies faster and more reliably. If you continue with conda, replace mamba with conda in Step 3.
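Steps 2 and 3 boil down to "use mamba if it is installed, otherwise conda". The Python sketch below builds the Step 3 command that way; the helper name env_create_command is hypothetical, and the lookup function is injectable so the fallback can be tested without either tool installed:

```python
import shutil


def env_create_command(name="snek", env_file="envs/snek.yml", which=shutil.which):
    """Build the Step 3 command, preferring mamba and falling back to conda."""
    for frontend in ("mamba", "conda"):
        if which(frontend):  # which() returns a path, or None if not on PATH
            return [frontend, "env", "create", "-n", name, "-f", env_file]
    raise RuntimeError("neither mamba nor conda was found on PATH")
```

Run from the repository root, subprocess.run(env_create_command(), check=True) creates the environment with whichever tool is available.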

Step 3: Recreate conda environment

cd pipeline_test/

mamba env create -n snek -f envs/snek.yml

Step 4: Activate environment

conda activate snek
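conda activate sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment, so a wrapper script can check that snek is active before launching the workflow. A minimal sketch; the helper names are hypothetical:

```python
import os


def active_conda_env():
    """Name of the currently activated conda environment, or None."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def require_env(expected="snek"):
    """Exit with a hint unless the expected environment is active."""
    current = active_conda_env()
    if current != expected:
        raise SystemExit(f"activate the '{expected}' environment first (current: {current})")
    return current
```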

Step 5: Run pipeline

snakemake --use-conda --cores 4 --verbose

The suggested --cores 4 is meant for running the pipeline locally; increase it when running on a cluster.
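The Step 5 command can also be assembled programmatically. This sketch defaults to the suggested 4 cores, capped at the CPUs the machine reports (a design choice of the sketch, not of the repository), and can append Snakemake's --dry-run flag, which lists the jobs that would run without executing them. The helper name snakemake_command is hypothetical:

```python
import os


def snakemake_command(cores=None, dry_run=False):
    """Build the Step 5 command line.

    By default uses the suggested 4 cores, or fewer if the machine has fewer CPUs.
    """
    if cores is None:
        cores = min(4, os.cpu_count() or 1)
    cmd = ["snakemake", "--use-conda", "--cores", str(cores), "--verbose"]
    if dry_run:
        cmd.append("--dry-run")  # show planned jobs only; nothing is executed
    return cmd
```

A dry run first, for example subprocess.run(snakemake_command(dry_run=True), check=True), is a cheap way to confirm that all inputs are found before committing to the full run.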

Troubleshooting

If conda fails to install Snakemake v6.15, install it with mamba instead: mamba install snakemake.
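To confirm which Snakemake release ended up in the environment, the output of snakemake --version can be compared against the 6.15 series mentioned above. A minimal sketch; the helper names are hypothetical, and the parser only reads the first major.minor[.patch] number it finds:

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from text such as '6.15.5'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    major, minor, patch = match.groups(default="0")  # missing patch -> 0
    return int(major), int(minor), int(patch)


def installed_snakemake_version():
    """Run `snakemake --version`; return None if snakemake is not on PATH."""
    exe = shutil.which("snakemake")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


def is_expected_series(version, expected=(6, 15)):
    """True if version is in the expected major.minor series."""
    return version is not None and version[:2] == expected
```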

Pipeline content

Bioinformatics tools used in the workflow, all run through Snakemake wrappers from the Snakemake Wrappers Repository:

  • fastQC
  • multiQC
  • trim_galore
  • bwa
  • samtools
  • picard
  • freebayes
  • bcftools
  • vep

To do:

  • Docker container + conda/mamba
  • AWS/Google Cloud deployment
  • unit tests
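The tool list names the steps but not how the rules are wired together. Snakemake derives its job DAG from rule inputs and outputs; the sketch below mimics that with Python's graphlib (3.9+) under an assumed, typical short-variant ordering (QC and trimming, alignment, sorting and duplicate marking, calling, filtering, annotation). The step names and dependencies are hypothetical; the repository's Snakefile defines the real rules:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: each step lists the steps whose outputs it consumes.
# This wiring is an illustrative assumption, not read from the repository.
STEPS = {
    "fastqc": set(),
    "trim_galore": set(),
    "multiqc": {"fastqc", "trim_galore"},          # aggregates QC reports
    "bwa_index": set(),                            # index the reference
    "bwa_mem": {"trim_galore", "bwa_index"},       # align trimmed reads
    "samtools_sort": {"bwa_mem"},
    "picard_markduplicates": {"samtools_sort"},
    "samtools_index": {"picard_markduplicates"},
    "freebayes": {"samtools_index"},               # call variants
    "bcftools_filter": {"freebayes"},
    "vep": {"bcftools_filter"},                    # annotate with the VEP cache
}

# static_order() yields every step after all of its predecessors.
order = list(TopologicalSorter(STEPS).static_order())
print(" -> ".join(order))
```

To inspect the actual DAG of this workflow, snakemake --dag | dot -Tsvg > dag.svg renders it (Graphviz required).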
