mpaya/epigenomics_pipeline

biotools:Epigenomics_Workflow_on_Galaxy_and_Jupyter

Epigenomics Workflow on Galaxy and Jupyter, RRID:SCR_017544

Epigenomics Workflow on Galaxy and Jupyter

Over the last decade, extensive epigenomics data have been generated. Analyzing these data can be challenging and usually requires bioinformatics knowledge. Here, we present a complete two-step pipeline for combined ChIP-Seq and RNA-Seq data analysis.

Contents

Two Docker images were prepared to run the analysis in a coordinated way.

  • First, a container running Galaxy runs the bulk of the ChIP-Seq and RNA-Seq analysis. The workflows are designed to download data from SRA and export results locally. The major workflow steps are:
    • Trimming with Trimmomatic
    • Mapping with Bowtie2
    • ChIP-Seq:
      • Alignment filtering and deduplication
      • Generation of BigWig files
      • Peak calling with MACS2 and epic2
    • RNA-Seq:
      • Read counting
      • Differential expression analysis with DESeq2
  • Second, a container running Jupyter uses the files generated by Galaxy and completes the data analysis. Two notebooks are provided, with a preview of the results of each command cell. They run:
    • ChIP-Seq:
      • Differential binding with MAnorm
      • Peak annotation
      • Metagene/heatmap plots of read distribution on genes
    • Complete dataset:
      • Functional annotation of results
      • Combination of ChIP-Seq and RNA-Seq results
      • Generation of tables and figures

Additionally, a script is provided to run the data analysis specifically on a Brassica dataset. In that case, Galaxy downloads the raw sequencing files from SRA and runs the analysis.

Usage

Docker

To use the images, Docker must be installed on the system (see the Docker documentation). Basic Docker commands are:

  • docker images: show all downloaded/built images.
  • docker run: download (if needed) and run a Docker image. A container is launched as an instance of that image. Multiple options are available to handle the interaction between the local system and the container.
  • docker ps -a: list all containers.
  • docker stop <my_container>: stop a running container.
  • docker start <my_container>: start a stopped container.
  • docker exec -it <my_container> bash: access a container from a terminal.
  • docker rm <my_container>: delete a stopped container.
  • docker rmi <image_id>: delete a docker image.
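
The commands above can be chained into a typical container lifecycle. The following is only a sketch: the run helper and the container name my_container are hypothetical, and by default the commands are printed rather than executed, so the block is safe to try on a machine without Docker.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
# "run" is a hypothetical helper written for this example.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

cont="my_container"   # placeholder container name

run docker images          # which images are available locally?
run docker ps -a           # which containers exist, running or stopped?
run docker stop "$cont"    # stop the running container
run docker start "$cont"   # start it again later; its state is preserved
run docker stop "$cont"    # a container must be stopped before removal
run docker rm "$cont"      # delete the stopped container
```

Calling the script with DRY_RUN=0 in the environment would execute the same sequence for real.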

Galaxy in Docker

The epigenomics Galaxy image is based on bgruening/galaxy-stable. The key additions are:

  • The default user has administrative permissions
  • Tools to run the epigenomics analysis are pre-installed
  • Workflows are provided to run ChIP-Seq and RNA-Seq data analysis
  • Accessory files are included to run the Brassica data analysis

The workflows are designed to start from two-column text files that list SRA accession numbers in the first column and file names in the second column. The default workflows expect paired-end ChIP-Seq reads and single-end RNA-Seq reads. They can be customized to change this behavior.
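
As an illustration of the two-column input, the sketch below writes a small accession file and checks its layout. It assumes the columns are tab-separated, which is not stated here; the accession numbers and sample names are placeholders, not real data, and check_accession_file is a hypothetical helper.

```shell
# Placeholder accessions and file names; replace with real SRA run IDs.
accession_file="$(mktemp)"
printf 'SRR0000001\tchip_rep1\n' > "$accession_file"
printf 'SRR0000002\tchip_rep2\n' >> "$accession_file"

# Count lines that do not have exactly two non-empty, tab-separated fields.
# (Tab as the delimiter is an assumption of this example.)
check_accession_file() {
    awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { bad++ } END { print bad + 0 }' "$1"
}

bad_lines="$(check_accession_file "$accession_file")"
echo "malformed lines: ${bad_lines}"
```

A non-zero count points to lines that would need fixing before the file is given to a workflow.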

Quick start

Initialize the container.

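mkdir -p ~/DockerFolders/run_v1

# Container name; "epigenomics_galaxy" is only an example, any unused name works.
cont_name="epigenomics_galaxy"

docker run \
-d \
-v ~/DockerFolders/run_v1:/export/ \
-p 8080:80 \
--name "${cont_name}" \
mpaya/epigenomics_galaxy:1.0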

Run Brassica data analysis.

cd ~/DockerFolders/run_v1/galaxy-central/lib/brassica_data/
bash run_analysis.sh 8080
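
Galaxy can take a while to become reachable after the container starts. The following is a minimal sketch of a wait loop that could precede the script. It assumes curl is installed, that Galaxy answers on its /api/version endpoint, and that it is served on host port 8080 as in the command above; wait_for_http is a hypothetical helper and not part of the pipeline.

```shell
# Poll a URL until it responds or the attempts run out.
# Arguments: URL, number of attempts (default 60), delay in seconds (default 5).
wait_for_http() {
    url="$1"
    tries="${2:-60}"
    delay="${3:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: fail on HTTP errors; -s: silent; -o: discard the response body
        if curl -fs --max-time 10 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use (not executed here):
# wait_for_http "http://localhost:8080/api/version" && bash run_analysis.sh 8080
```

The function returns a non-zero status when the server never answers, so the analysis script is only launched once Galaxy responds.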

Jupyter in Docker

The epigenomics Jupyter image is based on jupyter/datascience-notebook. It contains:

  • Kernels
    • Python
    • R
    • bash
    • Julia
  • Software
    • R and Python libraries for data analysis
    • MAnorm
    • ngs.plot
    • Miniconda 2 and 3
  • Notebooks
    1. Bash notebook for differential binding analysis and ChIP-Seq data plotting
    2. R notebook for results annotation and visualization

Quick start

Running a container on a local machine automatically opens Jupyter in a web browser. In Jupyter, the export folder is ~/work.

docker run \
-p 8888:8888 \
--name nb1 \
-v ~/DockerFolders/run_v1/analysis:/home/jovyan/work \
mpaya/epigenomics_jupyter:2.0

To continue the Brassica analysis, the command above maps ~/work to the analysis folder created when the Brassica data analysis was run on Galaxy, so that Jupyter can find and load the results.

Output

Results are stored in the folder created when Galaxy was first run, in this example ~/DockerFolders/run_v1/analysis. In summary, the results consist of:

  • Galaxy
    • Basic read statistics (MultiQC)
    • Alignment files (.bam)
    • Track files (.bigwig)
    • ChIP-Seq peaks (.bed)
    • RNA-Seq results (counts and DEGs)
  • Jupyter
    • Differentially bound peaks (table from MAnorm)
    • Annotated peaks
    • Metagene plots and heatmaps
    • Other figures and tables
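
A quick way to see what a run produced is to count the output files by extension. This is only a sketch: count_by_ext is a hypothetical helper, the folder is the example path used above, and the internal layout of the analysis folder is not assumed (the search is recursive).

```shell
# Count files with a given extension anywhere below a directory.
# $1: directory to search, $2: extension without the leading dot.
count_by_ext() {
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Example path from the quick start; override by setting analysis_dir.
analysis_dir="${analysis_dir:-$HOME/DockerFolders/run_v1/analysis}"

if [ -d "$analysis_dir" ]; then
    for ext in bam bigwig bed; do
        echo "${ext}: $(count_by_ext "$analysis_dir" "$ext")"
    done
else
    echo "analysis folder not found: $analysis_dir" >&2
fi
```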
