biotools:Epigenomics_Workflow_on_Galaxy_and_Jupyter
Epigenomics Workflow on Galaxy and Jupyter, RRID:SCR_017544
Over the last decade, extensive epigenomics data has been generated. Analyzing these data can be challenging and usually requires bioinformatics knowledge. Here, we present a complete two-step pipeline for combined ChIP-Seq and RNA-Seq data analysis.
Two Docker images were prepared to run the analysis in a coordinated way.
- First, a container running Galaxy will run the bulk analysis of ChIP-Seq and RNA-Seq data. The workflows are designed to download data from SRA and export results locally. The major steps in the workflows are:
  - Trimming with Trimmomatic
  - Mapping with Bowtie2
  - ChIP-Seq:
    - Alignment filtering and deduplication
    - Generation of BigWig files
    - Peak calling with MACS2 and epic2
  - RNA-Seq:
    - Read counting
    - Differential expression analysis with DESeq2
- The second container running Jupyter will use the files generated by Galaxy and finish data analysis. Two notebooks are provided with a preview of results from each command cell. They run:
  - ChIP-Seq:
    - Differential binding with MAnorm
    - Peak annotation
    - Metagene/heatmap plots of read distribution on genes
  - Complete dataset:
    - Functional annotation of results
    - Combination of ChIP-Seq and RNA-Seq results
    - Generation of tables and figures
Additionally, a script is provided to run the data analysis specifically on a Brassica dataset. In that case, Galaxy will download the raw sequencing files from SRA and run the analysis.
To use the images, Docker must be installed on the system (link to documentation). Basic docker commands are:
- docker images: show all downloaded/built images.
- docker run: download (if needed) and run a docker image. A container is launched as an instance of that image. Multiple options are available to handle the interaction between the local system and the container.
- docker ps -a: list all containers.
- docker stop <my_container>: stop a running container.
- docker start <my_container>: start a stopped container.
- docker exec -it <my_container> bash: access a container from a terminal.
- docker rm <my_container>: delete a stopped container.
- docker rmi <image_id>: delete a docker image.
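These commands are often chained when a container is no longer needed. The sketch below wraps that sequence in a hypothetical helper, cleanup_container, and routes every call through a DOCKER variable. Both names are assumptions introduced here for illustration, so that the sequence can be previewed with DOCKER=echo before it touches real containers.

```shell
# Command used to reach Docker; set DOCKER=echo to preview without executing.
# (DOCKER is a convention assumed for this sketch, not a Docker feature.)
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: stop a container, delete it, then list what remains.
# Each step runs only if the previous one succeeded.
cleanup_container() {
  "$DOCKER" stop "$1" &&
  "$DOCKER" rm "$1" &&
  "$DOCKER" ps -a
}
```

Calling `DOCKER=echo cleanup_container nb1` prints the three subcommands that would run, without the leading `docker`, and leaves all containers untouched.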
The epigenomics Galaxy image is based on bgruening/galaxy-stable (link). The key additions are:
- The default user has administrative permissions
- Tools to run the epigenomics analysis are pre-installed
- Workflows are provided to run ChIP-Seq and RNA-Seq data analysis
- Accessory files are included to run the Brassica data analysis
The workflows are designed to start from two-column text files that list SRA accession numbers in the first column and file names in the second column. The default workflows expect paired-end reads for ChIP-Seq data and single-end reads for RNA-Seq data. They can be customized to modify this behavior.
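As a minimal sketch, an input file of this shape could be written and checked as follows. The tab separator, the file name chipseq_samples.txt, the accession pattern, the helper check_sra_table, and the accession numbers themselves are all placeholders assumed for illustration, not values taken from the workflows.

```shell
# Hypothetical helper: check that every line has exactly two tab-separated
# columns and that the first one looks like an SRA run accession.
# Assumption: columns are tab-separated; the workflows may expect otherwise.
check_sra_table() {
  awk -F '\t' '
    NF != 2 || $1 !~ /^[SED]RR[0-9]+$/ || $2 == "" {
      printf "line %d is malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Placeholder accessions and file names, for illustration only.
printf 'SRR0000001\tchip_rep1\nSRR0000002\tchip_rep2\n' > chipseq_samples.txt
check_sra_table chipseq_samples.txt && echo "chipseq_samples.txt looks valid"
```

The function returns a non-zero status and reports the offending line numbers when a line does not have two columns, which makes it usable as a guard before uploading the file to Galaxy.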
Initialize the container. The cont_name variable must be set beforehand to the container name of your choice.
mkdir -p ~/DockerFolders/run_v1
docker run \
-d \
-v ~/DockerFolders/run_v1:/export/ \
-p 8080:80 \
--name "${cont_name}" \
mpaya/epigenomics_galaxy:1.0
Run Brassica data analysis.
cd ~/DockerFolders/run_v1/galaxy-central/lib/brassica_data/
bash run_analysis.sh 8080
The epigenomics Jupyter image is based on jupyter/datascience-notebook (link). It contains:
- Kernels
  - Python
  - R
  - bash
  - Julia
- Software
  - R and Python libraries for data analysis
  - MAnorm
  - ngs.plot
  - Miniconda 2 and 3
- Notebooks
  - Bash notebook for differential binding analysis and ChIP-Seq data plotting
  - R notebook for results annotation and visualization
Running a container on a local machine automatically opens Jupyter in a web browser. In Jupyter, the export folder is ~/work.
docker run \
-p 8888:8888 \
--name nb1 \
-v ~/DockerFolders/run_v1/analysis:/home/jovyan/work \
mpaya/epigenomics_jupyter:2.0
To continue with the Brassica analysis, ~/work is mapped to the analysis folder created when running the Brassica data analysis on Galaxy, so that Jupyter can find and load the results.
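Because the notebooks depend on that folder, it can help to confirm it exists and is not empty before launching the Jupyter container. The check below is a minimal sketch; the helper name analysis_ready is hypothetical, and the path is the example run folder used in this document.

```shell
# Hypothetical pre-flight check: succeed only if the folder exists
# and contains at least one entry.
analysis_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

run_dir="$HOME/DockerFolders/run_v1"   # example run folder from this document
if analysis_ready "$run_dir/analysis"; then
  echo "Found Galaxy results in $run_dir/analysis"
else
  echo "No results yet in $run_dir/analysis; run the Galaxy step first"
fi
```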
Results are stored in the folder first created when running Galaxy, in this example ~/DockerFolders/run_v1/analysis. In summary, results consist of:
- Galaxy
  - Basic read statistics (MultiQC)
  - Alignment files (.bam)
  - Track files (.bigwig)
  - ChIP-Seq peaks (.bed)
  - RNA-Seq results (counts and DEGs)
- Jupyter
  - Differentially bound peaks (table from MAnorm)
  - Annotated peaks
  - Metagene plots and heatmaps
  - Other figures and tables
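A quick inventory of the Galaxy outputs listed above can be taken by counting files per extension. This is a sketch only: the helper count_by_ext is hypothetical, and because the internal layout of the analysis folder is not described here, the search is simply recursive.

```shell
# Hypothetical helper: count regular files with a given extension
# anywhere below a folder.
count_by_ext() {
  find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

results="$HOME/DockerFolders/run_v1/analysis"   # example path from this document
if [ -d "$results" ]; then
  for ext in bam bigwig bed; do
    echo "$ext: $(count_by_ext "$results" "$ext")"
  done
fi
```

A count of zero for an extension suggests that the corresponding workflow step has not finished or that its results were exported elsewhere.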