Snakemake_plant_methylSeekR

Snakemake pipeline for analysis of bisulfite-seq data

Aim

Snakemake pipeline for reproducible analysis of paired-end Illumina bisulfite-seq data. Mapping and methylation calling are done with BSseeker2. Regions of low or no methylation are identified with MethylSeekR. The pipeline adds a few features that make it more suitable for the analysis of plant samples.

Content of the repository

Snakefile, containing the targeted outputs and the rules to generate them from the input files.
data/, folder containing subsets of paired-end fastq files used to test the pipeline locally (tomato leaf bisulfite genomic sequence reads; SRR503393).
genome/, folder containing a small fragment of chromosome 12 of the tomato genome, to be used for the local test.
envs/, folder containing the environments needed for the Snakefile to run. To use Snakemake, you must first create and activate an environment containing snakemake (envs/global_env.yaml).
samples.tsv, a tab-separated values file containing the sample names (species, tissue, ...) and the paths to the fastq files relative to the Snakefile. Change this file according to your samples.

Usage

Conda environment

First, create an environment for Snakemake with the Conda package manager.

Create a virtual environment named "BSanalysis" from the global_env.yaml file with the following command: conda env create --name BSanalysis --file envs/global_env.yaml
Then, activate this virtual environment with conda activate BSanalysis

The Snakefile will then take care of installing and loading the packages and software required by each step of the pipeline.

Configuration file

The config.yaml file specifies the sample list (samples.tsv), the genomic reference fasta file, the directories to use, and so on. The main Snakefile reads this file to build its parameters.
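
As a rough illustration of how configuration values can feed into rule parameters: in a Snakefile, the configfile directive loads the YAML file into a dictionary named config. The sketch below mimics that with a plain Python dictionary. The keys (samples, genome, result_dir), the paths, and the result_path helper are hypothetical placeholders, not names taken from this pipeline; check config.yaml for the real ones.

```python
from pathlib import PurePosixPath

# Stand-in for the dictionary Snakemake builds from the config file.
# All keys and values here are hypothetical examples.
config = {
    "samples": "samples.tsv",
    "genome": "genome/example.fasta",
    "result_dir": "results/",
}


def result_path(cfg, *parts):
    """Join path components under the configured result directory."""
    return str(PurePosixPath(cfg["result_dir"], *parts))


# prints results/mapped/sample1.bam
print(result_path(config, "mapped", "sample1.bam"))
```

Keeping directories in the configuration rather than hard-coding them in rules means a run can be redirected by editing a single file.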

Snakemake execution

The Snakemake workflow management system reads a master file (usually called Snakefile) that lists the steps to be executed and defines their order. More information is available in the Snakemake documentation.

Samples

Samples are listed in the samples.tsv file and are picked up by the Snakefile automatically. Edit the sample names and fastq paths to match your data.
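
To make the idea of a sample sheet concrete, here is a minimal sketch of reading such a tab-separated file into a lookup table. The column names (sample, fq1, fq2) and the example row are assumptions made for illustration; the header of the real samples.tsv may differ, so adapt the names accordingly.

```python
import csv
import io

# Hypothetical sample sheet: the column names below are assumptions,
# so adapt them to the header of the real samples.tsv.
EXAMPLE_TSV = (
    "sample\tfq1\tfq2\n"
    "leaf_1\tdata/leaf_1_R1.fastq.gz\tdata/leaf_1_R2.fastq.gz\n"
)


def read_samples(handle):
    """Return a dict mapping each sample name to its pair of fastq paths."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: (row["fq1"], row["fq2"]) for row in reader}


samples = read_samples(io.StringIO(EXAMPLE_TSV))
print(samples["leaf_1"])
```

For a file on disk, pass an open file handle instead of the in-memory string used here.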

Dry run

Use the command snakemake -np to perform a dry run that prints out the rules and commands.

Real run

Type snakemake --use-conda and set the number of cores with --cores, for instance --cores 10 for ten cores. For cluster execution, please refer to the Snakemake reference. Do not omit --use-conda: it is required for installing and loading the dependencies used by the rules of the pipeline.
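
If the pipeline is launched from a script rather than typed by hand, the dry-run and real-run invocations described above could be wrapped as in this sketch. The function names are hypothetical; the flags are the ones named in this README, and snakemake is assumed to be on the PATH of the activated environment.

```python
import subprocess


def snakemake_command(cores=1, dry_run=False):
    """Build the snakemake argument list for a dry run or a real run."""
    if dry_run:
        # -n: dry run, -p: print the shell commands of each rule
        return ["snakemake", "-np"]
    return ["snakemake", "--use-conda", "--cores", str(cores)]


def run_pipeline(cores=1, dry_run=False):
    """Run snakemake in the current directory; raise if it exits non-zero."""
    subprocess.run(snakemake_command(cores, dry_run), check=True)


# prints snakemake --use-conda --cores 10
print(" ".join(snakemake_command(cores=10)))
```

Calling run_pipeline(dry_run=True) first is a cheap way to check that all input files are found before committing cores to a real run.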

Main outputs

bed files containing the unmethylated (UMR) and low-methylated (LMR) regions, separated by sequence context (CG, CCG, CWG, CHG and CHH).
bed file containing "active" regions, i.e. regions in which the cytosines in both CG and CHG context are unmethylated.
log files containing reports of the fastp, BSseeker2 and methylation-calling steps.
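
One way to picture the "active" regions is as the overlap between regions that are unmethylated in CG context and regions that are unmethylated in CHG context. The sketch below intersects two sorted interval lists in that spirit. It illustrates the idea only and is not taken from the pipeline's own scripts; the function name and the example coordinates are hypothetical.

```python
def intersect_regions(a, b):
    """Intersect two sorted lists of (chrom, start, end) intervals.

    Intervals are half-open, as in BED files. Each list must be sorted
    by chromosome name (string order) and start, with no overlaps
    within a list.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance the list that is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Drop whichever interval ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out


# Hypothetical unmethylated regions in CG and CHG context.
cg = [("chr12", 100, 500), ("chr12", 900, 1200)]
chg = [("chr12", 300, 1000)]
# prints [('chr12', 300, 500), ('chr12', 900, 1000)]
print(intersect_regions(cg, chg))
```

Because both lists are walked once, the cost grows linearly with the number of regions.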

Parameters

The default settings are optimized for plant samples. They can be altered in config.yaml.

Directed Acyclic Graph of jobs

The graph of jobs is available as dag.png in the repository.

Repository: KoesGroup/Snakemake_plant_methylSeekR
