This repo contains bash scripts and a Snakemake workflow for bulk transcriptomic (RNA-seq) analyses of samples originating from human tissue or cell culture. The workflow accepts paired-end or single-end sequencing reads generated on Illumina HiSeq/MiSeq or BGI Genomics platforms.
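Because both paired-end and single-end reads are accepted, each sample's FASTQ files have to be grouped and its layout determined before processing. The following is a minimal Python sketch of one way to do that. It assumes a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention, and the function `pair_fastqs` is illustrative only, not part of this repo.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention (an assumption, not the repo's):
#   <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, also accepting .fq and uncompressed files.
_MATE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.f(?:ast)?q(?:\.gz)?$")


def pair_fastqs(paths):
    """Group FASTQ paths by sample.

    Returns {sample: (R1, R2)} for paired-end samples and
    {sample: (R1, None)} for single-end samples.
    """
    mates = defaultdict(dict)
    for p in paths:
        m = _MATE.match(Path(p).name)
        if m is None:
            raise ValueError(f"unrecognised FASTQ file name: {p}")
        mates[m["sample"]][m["mate"]] = p

    layout = {}
    for sample, files in sorted(mates.items()):
        if "1" not in files:
            # An R2 file without its R1 mate is almost certainly a mistake.
            raise ValueError(f"{sample}: found R2 without R1")
        layout[sample] = (files["1"], files.get("2"))
    return layout
```

For example, `pair_fastqs(["S1_R1.fastq.gz", "S1_R2.fastq.gz", "S2_R1.fq.gz"])` treats `S1` as paired-end and `S2` as single-end. Failing loudly on unexpected names is deliberate, because a silently dropped mate would turn a paired-end sample into a single-end one.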
Briefly, the workflow consists of the following steps:
- quality control to assess read contamination and base quality
- adapter detection and trimming (adapters are detected automatically for Illumina data only; for other technologies, please provide the adapter sequences)
- read alignment against the human reference genome/transcriptome (build hg38/GRCh38)
- quality control of the aligned BAM files
- sorting and indexing of BAM files
- gene-level quantification
- differential gene expression analysis
- gene set enrichment analysis
- protein pathway analysis
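As a rough illustration of how the trimming, alignment, BAM sorting/indexing and counting steps chain together for one sample, here is a minimal Python sketch that only builds the command lines (it does not run them). The output directories, reference paths, thread count and the function name `rnaseq_commands` are hypothetical placeholders, and gzipped FASTQ input is assumed. The repo's actual Snakemake rules may use different options.

```python
import shlex


def rnaseq_commands(sample, r1, r2=None, *, star_index="ref/star_index",
                    gtf="ref/genes.gtf", adapter=None, adapter_r2=None, threads=8):
    """Return per-sample commands (as argv lists) in workflow order.

    All paths are hypothetical placeholders; input FASTQs are assumed gzipped.
    """
    paired = r2 is not None
    t1 = f"trimmed/{sample}_R1.fastq.gz"
    t2 = f"trimmed/{sample}_R2.fastq.gz"

    # 1. Read QC and adapter trimming with fastp (HTML/JSON reports for MultiQC).
    fastp = ["fastp", "-i", r1, "-o", t1, "-w", str(threads),
             "-h", f"qc/{sample}.fastp.html", "-j", f"qc/{sample}.fastp.json"]
    if paired:
        fastp += ["-I", r2, "-O", t2]
    if adapter:
        # User-supplied adapters, e.g. for non-Illumina libraries.
        fastp += ["--adapter_sequence", adapter]
        if paired and adapter_r2:
            fastp += ["--adapter_sequence_r2", adapter_r2]
    elif paired:
        # fastp auto-detects single-end adapters by default; paired-end needs this flag.
        fastp.append("--detect_adapter_for_pe")

    # 2. Alignment to the GRCh38 STAR index, writing an unsorted BAM.
    prefix = f"star/{sample}."
    reads = [t1, t2] if paired else [t1]
    star = ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
            "--readFilesIn", *reads, "--readFilesCommand", "zcat",
            "--outSAMtype", "BAM", "Unsorted", "--outFileNamePrefix", prefix]

    # 3-4. Coordinate-sort and index the BAM with samtools.
    bam = f"{prefix}Aligned.out.bam"
    sorted_bam = f"star/{sample}.sorted.bam"
    sort = ["samtools", "sort", "-o", sorted_bam, bam]
    index = ["samtools", "index", sorted_bam]

    # 5. Gene-level counts with featureCounts (subread).
    counts = ["featureCounts", "-T", str(threads), "-a", gtf,
              "-o", f"counts/{sample}.txt"]
    if paired:
        counts.append("-p")  # paired-end input: count fragments rather than reads
    counts.append(sorted_bam)

    return [fastp, star, sort, index, counts]


if __name__ == "__main__":
    for cmd in rnaseq_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz"):
        print(shlex.join(cmd))
```

Returning argv lists rather than shell strings keeps paths with spaces safe and makes each step easy to hand to `subprocess.run(cmd, check=True)`. BAM-level QC, Salmon quantification and the downstream R analyses are left out of this sketch.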
Tools and dependencies:
- snakemake=7.21.0 (use v5.2.2 when running on a cluster)
- snakemake-minimal=7.21.0
- python=3.11.0
- pandas=0.23
- star=2.7.0
- fastp=0.20.1-0
- subread=2.0.1-0
- multiqc=1.9-0
- yaml=0.2.5
- fastqc=0.11.9=0
- samtools=1.3.1
- salmon=1.4.0
- tximport
- DESeq2
- clusterProfiler
- GSEA
- Gene Ontology
- STRINGdb
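Before launching the workflow, it can help to confirm that the command-line tools above are on `PATH`. Below is a minimal Python sketch using `shutil.which`. The package-to-executable mapping is an assumption based on each tool's usual binary name (e.g. subread provides `featureCounts`), and `missing_executables` is a hypothetical helper. The R/Bioconductor packages (tximport, DESeq2, clusterProfiler, STRINGdb) are not command-line tools and are not checked here.

```python
import shutil

# Assumed executable name for each pinned command-line package.
REQUIRED_EXECUTABLES = {
    "snakemake": "snakemake",
    "star": "STAR",
    "fastp": "fastp",
    "subread": "featureCounts",
    "multiqc": "multiqc",
    "fastqc": "fastqc",
    "samtools": "samtools",
    "salmon": "salmon",
}


def missing_executables(required=REQUIRED_EXECUTABLES):
    """Return (sorted) package names whose executable cannot be found on PATH."""
    return sorted(pkg for pkg, exe in required.items() if shutil.which(exe) is None)
```

Calling `missing_executables()` inside the activated Conda/Mamba environment should return an empty list. This only checks presence, not the pinned versions.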
The Snakemake workflow is still a work in progress, and its contents will change frequently. Below is the current DAG of the rules used in the workflow.
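Snakemake derives this DAG from each rule's inputs and outputs and runs a rule only once everything it depends on is finished. The Python sketch below illustrates that ordering idea with the standard-library `graphlib.TopologicalSorter`. The rule names and dependencies are hypothetical stand-ins, not the repo's actual rules, and real Snakemake schedules per-sample jobs rather than whole rules.

```python
from graphlib import TopologicalSorter

# Hypothetical rule graph: each rule maps to the rules whose outputs it consumes.
RULE_DEPS = {
    "fastqc": set(),
    "fastp": set(),
    "star_align": {"fastp"},
    "multiqc": {"fastqc", "fastp", "star_align"},
    "samtools_sort": {"star_align"},
    "samtools_index": {"samtools_sort"},
    "featurecounts": {"samtools_sort"},
    "deseq2": {"featurecounts"},
}


def execution_waves(deps):
    """Group rules into waves; all rules in a wave have their inputs ready
    and could run in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves
```

With the graph above, `execution_waves(RULE_DEPS)` yields `[["fastp", "fastqc"], ["star_align"], ["multiqc", "samtools_sort"], ["featurecounts", "samtools_index"], ["deseq2"]]`. The acyclicity check is why Snakemake refuses to run a workflow whose rules depend on each other in a loop.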
A separate README describes how to run the Snakemake workflow in a Slurm HPC environment. Instructions for installing and setting up the workflow with Conda/Mamba are also provided.
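For orientation, one common way to run a Snakemake workflow (in both the v5 and v7 releases mentioned above) on Slurm is to have Snakemake submit each job through `sbatch` via its `--cluster` option. The Python sketch below only assembles such a command. The job count, latency wait and `sbatch` options are illustrative assumptions, and the `{resources.mem_mb}` placeholder only works if every rule declares a `mem_mb` resource. The separate README remains the reference for this repo's actual setup.

```python
import shlex


def slurm_snakemake_cmd(jobs=50, latency_wait=60):
    """Build a Snakemake command that submits each job to Slurm with sbatch.

    Assumes every rule declares `threads` and a `mem_mb` resource; values are
    illustrative, not this repo's settings.
    """
    # Snakemake fills in {threads} and {resources.mem_mb} per job at submission time.
    sbatch = "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"
    return ["snakemake", "--use-conda",
            "--jobs", str(jobs),                        # max concurrent cluster jobs
            "--latency-wait", str(latency_wait),        # seconds to wait for outputs on shared FS
            "--cluster", sbatch]


if __name__ == "__main__":
    print(shlex.join(slurm_snakemake_cmd()))
```

Keeping `{threads}` and `{resources.mem_mb}` as placeholders lets each rule request its own CPUs and memory instead of reserving the maximum for every job.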
Author: Shweta Pipaliya
Date Updated: 17.2.2023