Snakemake_plant_methylSeekR

Snakemake pipeline for analysis of bisulfite-seq data


Aim

A Snakemake pipeline for the reproducible analysis of paired-end Illumina bisulfite-seq data. Mapping and methylation calling are done with BSseeker2, and regions of low or no methylation are identified with MethylSeekR. The pipeline adds a few features that make it better suited to the analysis of plant samples.

Content of the repository

  • Snakefile, containing the target outputs and the rules that generate them from the input files.

  • data/, folder containing a subset of a pair of paired-end FASTQ files, used to test the pipeline locally (tomato leaf bisulfite genomic sequencing reads; SRR503393).

  • genome/, folder containing a small fragment of chromosome 12 of the tomato genome, to be used for the local test.

  • envs/, folder containing the Conda environments the Snakefile needs to run. To use Snakemake, you first need to create and activate an environment containing Snakemake (envs/global_env.yaml).

  • samples.tsv, a tab-separated values file describing the samples (sample names, species, tissue, ...) and giving the paths to their FASTQ files relative to the Snakefile. Edit this file to match your samples.
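The exact columns of samples.tsv are defined by the Snakefile and are not listed here. As a minimal sketch, assuming hypothetical columns named sample, species, tissue, fq1 and fq2 (and hypothetical FASTQ file names), a sample sheet of this kind can be parsed with Python's standard csv module:

```python
import csv
import io

# Hypothetical column names; check the header of the real samples.tsv.
REQUIRED_COLUMNS = ("sample", "fq1", "fq2")


def read_samples(handle):
    """Parse a tab-separated sample sheet into {sample_name: row_dict}."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samples.tsv is missing columns: {missing}")
    samples = {}
    for row in reader:
        name = row["sample"]
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = row
    return samples


# Hypothetical example sheet; FASTQ paths are relative to the Snakefile.
example = (
    "sample\tspecies\ttissue\tfq1\tfq2\n"
    "leaf_rep1\tSolanum lycopersicum\tleaf\t"
    "data/leaf_R1.fastq.gz\tdata/leaf_R2.fastq.gz\n"
)
samples = read_samples(io.StringIO(example))
print(samples["leaf_rep1"]["fq1"])  # data/leaf_R1.fastq.gz
```

With a real file, open it with `open("samples.tsv", newline="")` and pass the handle to `read_samples`. Failing early on missing columns or duplicate names gives a clearer error than a failed rule later in the run.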

Usage

Conda environment

First, create a Conda environment in which to run Snakemake.

  1. Create a virtual environment named "BSanalysis" from the global_env.yaml file with the following command: conda env create --name BSanalysis --file envs/global_env.yaml
  2. Then, activate this virtual environment with conda activate BSanalysis

When run with --use-conda, Snakemake then takes care of installing and loading the packages and software required by each step of the pipeline.

Configuration file

The config.yaml file specifies the sample list (samples.tsv), the genome reference FASTA file, the directories to use, and other settings. The main Snakefile reads this file to set its parameters.
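Snakemake loads a YAML configuration file into a dictionary named `config` through its `configfile:` directive, and Snakefiles commonly combine those values with sample names to list their target files. The plain-Python sketch below imitates that step. All keys, values and output file names are hypothetical, and `expand_targets` is a stand-in for Snakemake's own `expand()` helper, not a copy of it:

```python
from itertools import product

# Stand-in for the dictionary Snakemake builds from config.yaml through its
# `configfile:` directive. Every key and value below is hypothetical.
config = {
    "samples": "samples.tsv",
    "genome": "genome/chr12_fragment.fa",
    "outdir": "results",
    "contexts": ["CG", "CHG", "CHH"],
}


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of wildcard values,
    mirroring what Snakemake's expand() does by default."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, values)))
        for values in product(*(wildcards[k] for k in keys))
    ]


targets = expand_targets(
    config["outdir"] + "/{sample}/UMR_{context}.bed",
    sample=["leaf_rep1"],
    context=config["contexts"],
)
print(targets)
# ['results/leaf_rep1/UMR_CG.bed', 'results/leaf_rep1/UMR_CHG.bed', 'results/leaf_rep1/UMR_CHH.bed']
```

Keeping paths and contexts in the configuration rather than in the rules means a new dataset only needs an edited config.yaml and samples.tsv.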

Snakemake execution

Snakemake is a workflow management system that reads a master file (usually called Snakefile) listing the steps (rules) to execute, and infers their execution order from the input and output files of each rule. More information is available in the Snakemake documentation.
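Snakemake derives the order of the steps from file dependencies: a rule runs after every rule that produces one of its inputs. The idea can be sketched with Python's `graphlib` (Python 3.9+). The rule names and file names below are hypothetical and do not reproduce the actual rules of this Snakefile:

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each declares the files it reads and the file it writes.
rules = {
    "trim": {"output": "trimmed.fq", "input": ["raw.fq"]},
    "align": {"output": "mapped.bam", "input": ["trimmed.fq", "genome.fa"]},
    "call": {"output": "methylation_calls.txt", "input": ["mapped.bam"]},
    "segment": {"output": "umr.bed", "input": ["methylation_calls.txt"]},
}

# Map each output file to the rule that produces it.
producer = {r["output"]: name for name, r in rules.items()}

# A rule depends on every rule producing one of its inputs; inputs that no
# rule produces (raw reads, genome) are treated as existing source files.
graph = {
    name: {producer[f] for f in r["input"] if f in producer}
    for name, r in rules.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['trim', 'align', 'call', 'segment']
```

Because the order comes from the dependency graph rather than from the position of rules in the file, independent jobs (for example, different samples) can also run in parallel.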

Samples

Samples are listed in samples.tsv and are picked up by the Snakefile automatically. Edit the sample names and paths in this file to match your data.

Dry run

Run snakemake -np to perform a dry run: -n lists the jobs that would be executed without running them, and -p prints their shell commands.

Real run

Run snakemake --use-conda and set the number of cores with --cores (for instance, --cores 10 for ten cores). For cluster execution, refer to the Snakemake documentation. Do not omit --use-conda: it is required for the installation and loading of the dependencies used by the rules of the pipeline.

Main outputs

  • BED files containing the unmethylated regions (UMRs) and low-methylated regions (LMRs), separated by sequence context: CG, CCG, CWG, CHG and CHH.
  • A BED file containing "active" regions, i.e. regions in which cytosines in both the CG and CHG contexts are unmethylated.
  • Log files containing reports from the fastp, BSseeker2 and methylation-calling steps.
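The context names follow the usual notation, where H stands for A, C or T and W for A or T, so CCG and CWG are the two subtypes of CHG. The sketch below classifies a cytosine on the + strand by its context and then computes "active" regions as the overlap of CG and CHG UMRs. Treating "active" regions as a plain interval intersection is an assumption about what the pipeline does, and the sketch handles a single chromosome with sorted, non-overlapping half-open (BED-style) intervals; on real BED files, bedtools intersect performs the same kind of operation.

```python
def cytosine_context(seq, i):
    """Context of the C at 0-based position i on the + strand:
    'CG', 'CCG' or 'CWG' (the CHG subtypes), or 'CHH'.
    Returns None at the sequence end or next to ambiguous bases (e.g. N)."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a C")
    if i + 1 >= len(seq):
        return None
    nxt = seq[i + 1]
    if nxt == "G":
        return "CG"
    if nxt not in "ACT":  # ambiguous base
        return None
    if i + 2 >= len(seq):
        return None
    third = seq[i + 2]
    if third == "G":
        return "CCG" if nxt == "C" else "CWG"
    if third in "ACT":
        return "CHH"
    return None


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open
    (start, end) intervals from the same chromosome."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


seq = "CGCAGCCGCTT"
print([cytosine_context(seq, k) for k in (0, 2, 5, 6, 8)])
# ['CG', 'CWG', 'CCG', 'CG', 'CHH']

cg_umrs = [(100, 500), (900, 1200)]   # hypothetical CG UMRs
chg_umrs = [(300, 1000), (1100, 1300)]  # hypothetical CHG UMRs
print(intersect(cg_umrs, chg_umrs))
# [(300, 500), (900, 1000), (1100, 1200)]
```

The example sequence and intervals are made up for illustration. Cytosines on the − strand would be classified the same way on the reverse complement.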

Parameters

The default settings are optimized for plant samples. They can be changed in config.yaml.

Directed Acyclic Graph of jobs

[Figure: directed acyclic graph (DAG) of the pipeline jobs]