Epigenomics Workflow on Galaxy and Jupyter

biotools:Epigenomics_Workflow_on_Galaxy_and_Jupyter

Epigenomics Workflow on Galaxy and Jupyter, RRID:SCR_017544

Over the last decade, extensive epigenomics data has been generated. Analyzing these data can be challenging and usually requires bioinformatics knowledge. Here, we present a complete two-step pipeline for combined ChIP-Seq and RNA-Seq data analysis.

First, a Galaxy container runs the bulk analysis of ChIP-Seq and RNA-Seq data. The workflows are designed to download data from SRA and export results locally. The major workflow steps are:
- Trimming with Trimmomatic
- Mapping with Bowtie2
- ChIP-Seq:
  - Alignment filtering and deduplication
  - Generation of BigWig files
  - Peak calling with MACS2 and epic2
- RNA-Seq:
  - Read counting
  - Differential expression analysis with DESeq2

Second, a Jupyter container uses the files generated by Galaxy to complete the data analysis. Two notebooks are provided, each with a preview of the results from every command cell. They run:
- ChIP-Seq:
  - Differential binding with MAnorm
  - Peak annotation
  - Metagene/heatmap plots of read distribution on genes
- Complete dataset:
  - Functional annotation of results
  - Combination of ChIP-Seq and RNA-Seq results
  - Generation of tables and figures

Additionally, a script is provided to run the data analysis specifically on a Brassica dataset. In that case, Galaxy downloads the raw sequencing files from SRA and runs the analysis.

Usage

Docker

To use the images, Docker must be installed on the system (link to documentation). Basic Docker commands are:

- docker images: show all downloaded or built images.
- docker run: download (if needed) and run a Docker image. A container is launched as an instance of that image. Multiple options are available to handle the interaction between the local system and the container.
- docker ps -a: list all containers, including stopped ones.
- docker stop <my_container>: stop a running container.
- docker start <my_container>: start a stopped container.
- docker exec -it <my_container> bash: open a shell inside a running container.
- docker rm <my_container>: delete a stopped container.
- docker rmi <image_id>: delete a Docker image.

Galaxy in Docker

The epigenomics Galaxy image is based on bgruening/galaxy-stable (link). The key additions are:

- The default user has administrative permissions
- Tools to run the epigenomics analysis are pre-installed
- Workflows are provided to run ChIP-Seq and RNA-Seq data analysis
- Accessory files are included to run the Brassica data analysis

The workflows start from two-column text files that list SRA accession numbers in the first column and file names in the second column. By default, the workflows expect paired-end ChIP-Seq reads and single-end RNA-Seq reads. They can be customized to change this behavior.
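
As an illustration of the input format, the following sketch writes a two-column table and checks that every line has exactly two fields. The file name, the accessions (SRR0000001, SRR0000002) and the sample names are placeholders, and the tab separator is an assumption; the workflows may expect a different delimiter.

```shell
#!/bin/sh
# Sketch only: file name, accessions and sample names are placeholders,
# and the tab delimiter is an assumption not stated in this guide.
set -eu

input_file="chipseq_samples.txt"

# Column 1: SRA accession; column 2: file name to use for that run.
printf '%s\t%s\n' \
  "SRR0000001" "sample_A_chip" \
  "SRR0000002" "sample_A_input" > "$input_file"

# Succeeds only if every line has exactly two non-empty tab-separated fields.
check_table() {
  awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad + 0 }' "$1"
}

if check_table "$input_file"; then
  echo "Input table looks well-formed: $input_file"
else
  echo "Malformed input table: $input_file" >&2
fi
```

Running a check like this before uploading the table to Galaxy can catch a missing file name or a stray empty line early.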

Quick start

Initialize the container.

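mkdir -p ~/DockerFolders/run_v1

# cont_name must be set before docker run; any container name works (galaxy1 is only an example)
cont_name=galaxy1

docker run \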
-d \
-v ~/DockerFolders/run_v1:/export/ \
-p 8080:80 \
--name "${cont_name}" \
mpaya/epigenomics_galaxy:1.0
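
Galaxy may need some time to start inside the container before it accepts requests. A minimal sketch of a polling helper follows. The name wait_until is hypothetical, and using Galaxy's /api/version endpoint on the mapped port 8080 as a readiness probe is an assumption, not something this guide prescribes.

```shell
#!/bin/sh
# Sketch only: wait_until is a hypothetical helper, not part of the images.

# wait_until TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Assumed readiness probe (port 8080 as mapped with -p 8080:80):
# wait_until 60 5 curl -sf http://localhost:8080/api/version
```

The probe command is passed as arguments, so any other check (for example, a different URL or port) can be substituted without changing the helper.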

Run the Brassica data analysis.

cd ~/DockerFolders/run_v1/galaxy-central/lib/brassica_data/
bash run_analysis.sh 8080

Jupyter in Docker

The epigenomics Jupyter image is based on jupyter/datascience-notebook (link). It contains:

Kernels
- Python
- R
- bash
- Julia
Software
- R and Python libraries for data analysis
- MAnorm
- ngs.plot
- Miniconda 2 and 3
Notebooks
1. Bash notebook for differential binding analysis and ChIP-Seq data plotting
2. R notebook for results annotation and visualization

Quick start

Running a container on a local machine automatically opens Jupyter in a web browser. In Jupyter, the export folder is ~/work.

docker run \
-p 8888:8888 \
--name nb1 \
-v ~/DockerFolders/run_v1/analysis:/home/jovyan/work \
mpaya/epigenomics_jupyter:2.0

To continue with the Brassica analysis, ~/work is mapped to the analysis folder created by the Brassica data analysis on Galaxy, so that Jupyter can find and load the results.
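
Because docker run -v creates the host folder if it is missing, a mistyped path would give Jupyter an empty ~/work without any error. The sketch below checks the folder before the container is started. The helper has_results and the ANALYSIS_DIR variable are hypothetical names introduced for illustration only.

```shell
#!/bin/sh
# Sketch only: has_results and ANALYSIS_DIR are illustrative names.

# Succeeds if $1 is a directory containing at least one entry.
has_results() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Default path taken from the quick start example above.
analysis_dir="${ANALYSIS_DIR:-$HOME/DockerFolders/run_v1/analysis}"

if has_results "$analysis_dir"; then
  echo "Galaxy results found in $analysis_dir"
else
  echo "No results in $analysis_dir; run the Galaxy analysis first" >&2
fi
```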

Output

Results are stored in the folder first created when running Galaxy, in this example ~/DockerFolders/run_v1/analysis. In summary, the results consist of:

Galaxy
- Basic read statistics (MultiQC)
- Alignment files (.bam)
- Track files (.bigwig)
- ChIP-Seq peaks (.bed)
- RNA-Seq results (counts and DEGs)
Jupyter
- Differentially bound peaks (table from MAnorm)
- Annotated peaks
- Metagene plots and heatmaps
- Other figures and tables
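
To check that a run produced the expected file types, the results folder can be scanned by extension. This is a minimal sketch: count_ext is a hypothetical helper, the RESULTS_DIR variable is introduced for illustration, and the subfolder layout under the analysis folder is not assumed, so the search is recursive.

```shell
#!/bin/sh
# Sketch only: count_ext and RESULTS_DIR are illustrative names.

# Print the number of regular files under directory $1 whose names end in .$2
count_ext() {
  find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Default path taken from the example in this guide.
results_dir="${RESULTS_DIR:-$HOME/DockerFolders/run_v1/analysis}"

if [ -d "$results_dir" ]; then
  # Extensions as listed in the output summary above.
  for ext in bam bigwig bed; do
    printf '%s\t%s\n' "$ext" "$(count_ext "$results_dir" "$ext")"
  done
else
  echo "Results folder not found: $results_dir" >&2
fi
```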
