feat: add --singcache and set default singularity cache dir
fixes #108
kelly-sovacool committed Sep 10, 2024
1 parent 0b417c9 commit 8e2c529
Showing 4 changed files with 80 additions and 59 deletions.
13 changes: 7 additions & 6 deletions CHANGELOG.md
@@ -1,9 +1,10 @@
# CHARLIE development version

- Major updates to convert CHARLIE from a biowulf-specific to a platform-agnostic pipeline (#102, @kelly-sovacool):
- All rules now use containers instead of envmodules.
- Default config and cluster config files are provided for use on biowulf and FRCE.
- New entry `TEMPDIR` in the config file sets the temporary directory location for rules that require transient storage.
- All rules now use containers instead of envmodules.
- Default config and cluster config files are provided for use on biowulf and FRCE.
- New entry `TEMPDIR` in the config file sets the temporary directory location for rules that require transient storage.
- New `--singcache` argument to provide a singularity cache dir location. The singularity cache dir is automatically set inside `/data/$USER/` or `$WORKDIR/` if `--singcache` is not provided.

# CHARLIE 0.10.1

@@ -29,7 +30,7 @@ Significant upgrades since the last release:
- new job reporting using jobby and its derivatives
- separated creation of BWA and BOWTIE2 index from creation of STAR index to speed things up
- parallelized find_circ
- better cleanup (eg. deleting _STARgenome folders, etc.) for much smaller digital footprint
- better cleanup (eg. deleting \_STARgenome folders, etc.) for much smaller digital footprint
- multitude of comments throughout the snakefiles including listing of output file column descriptions
- preliminary GH actions added

@@ -64,8 +65,8 @@ Significant upgrades since the last release:
# CHARLIE 0.6.1

- customBSJs recalled from STAR alignments
- only for PE
- removes erroneously called CircExplorer BSJs
- only for PE
- removes erroneously called CircExplorer BSJs
- create sense and anti-sense BSJ BAMs and BW for each reference (host+viruses)
- find reads which contribute to CIRI BSJs but not on the STAR list of BSJ reads, see if they contribute to novel (not called by STAR) BSJs and append novel BSJs to customBSJ list

108 changes: 55 additions & 53 deletions README.md
@@ -1,49 +1,50 @@
# CHARLIE
![img](https://img.shields.io/github/issues/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/forks/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/stars/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/license/CCBR/CHARLIE?style=for-the-badge)

![img](https://img.shields.io/github/issues/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/forks/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/stars/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/license/CCBR/CHARLIE?style=for-the-badge)

### Table of Contents

- [CHARLIE - **C**ircrnas in **H**ost **A**nd vi**R**uses ana**L**ysis p**I**p**E**line](#charlie)
- [Table of Contents](#table-of-contents)
- [1. Introduction](#1-introduction)
- [2. Flowchart](#2-flowchart)
- [3. Software Dependencies](#3-software-dependencies)
- [4. Usage](#4-usage)
- [5. License](#5-license)
- [6. Testing](#6-testing)
- [6.1 Test data](#61-test-data)
- [6.2 Expected output](#62-expected-output)
- [Table of Contents](#table-of-contents)
- [1. Introduction](#1-introduction)
- [2. Flowchart](#2-flowchart)
- [3. Software Dependencies](#3-software-dependencies)
- [4. Usage](#4-usage)
- [5. License](#5-license)
- [6. Testing](#6-testing)
- [6.1 Test data](#61-test-data)
- [6.2 Expected output](#62-expected-output)

### 1. Introduction

**C**ircrnas in **H**ost **A**nd vi**R**uses ana**L**ysis p**I**p**E**line
**C**ircrnas in **H**ost **A**nd vi**R**uses ana**L**ysis p**I**p**E**line

Things to know about CHARLIE:

- Snakemake workflow to detect, annotate and quantify (DAQ) host and viral circular RNAs.
- Primirarily developed to run on [BIOWULF](https://hpc.nih.gov/)
- Reach out to [Vishal Koparde](mailto:vishal.koparde@nihgov) for questions/comments/requests.


This circularRNA detection pipeline uses CIRCExplorer2, CIRI2 and many other tools in parallel to detect, quantify and annotate circRNAs. Here is a list of tools that can be run using CHARLIE:

| circRNA Detection Tool | Aligner(s) | Run by default |
| ---------------------- | ---------- | -------------- |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | STAR<sup>1</sup> | Yes |
| [CIRI2](https://sourceforge.net/projects/ciri/files/CIRI2/) | BWA<sup>1</sup> | Yes |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | BWA<sup>1</sup> | Yes |
| [CLEAR](https://github.com/YangLab/CLEAR) | STAR<sup>1</sup> | Yes |
| [DCC](https://github.com/dieterich-lab/DCC) | STAR<sup>2</sup> | Yes |
| [circRNAFinder](https://github.com/bioxfu/circRNAFinder) | STAR<sup>3</sup> | Yes |
| [find_circ](https://github.com/marvin-jens/find_circ) | Bowtie2 | Yes |
| [MapSplice](https://github.com/merckey/MapSplice2) | BWA<sup>2</sup> | No |
| [NCLScan](https://github.com/TreesLab/NCLscan) | NovoAlign | No |
| circRNA Detection Tool | Aligner(s) | Run by default |
| ----------------------------------------------------------- | ---------------- | -------------- |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | STAR<sup>1</sup> | Yes |
| [CIRI2](https://sourceforge.net/projects/ciri/files/CIRI2/) | BWA<sup>1</sup> | Yes |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | BWA<sup>1</sup> | Yes |
| [CLEAR](https://github.com/YangLab/CLEAR) | STAR<sup>1</sup> | Yes |
| [DCC](https://github.com/dieterich-lab/DCC) | STAR<sup>2</sup> | Yes |
| [circRNAFinder](https://github.com/bioxfu/circRNAFinder) | STAR<sup>3</sup> | Yes |
| [find_circ](https://github.com/marvin-jens/find_circ) | Bowtie2 | Yes |
| [MapSplice](https://github.com/merckey/MapSplice2) | BWA<sup>2</sup> | No |
| [NCLScan](https://github.com/TreesLab/NCLscan) | NovoAlign | No |

> Note: STAR<sup>1</sup>, STAR<sup>2</sup>, STAR<sup>3</sup> denote 3 different sets of alignment parameters, etc.
> Note: BWA<sup>1</sup>, BWA<sup>2</sup> denote 2 different alignment parameters, etc.
### 2. Flowchart

![](docs/images/CHARLIE_v0.8.x.png)

For complete documentation with tutorial go [here](https://CCBR.github.io/CHARLIE/).
@@ -54,33 +55,32 @@ For complete documentation with tutorial go [here](https://CCBR.github.io/CHARLI

The following version of various bioinformatics tools are using within CHARLIE:

| tool | version |
| ------------- | --------- |
| blat | 3.5 |
| bedtools | 2.30.0 |
| bowtie | 2-2.5.1 |
| bowtie | 1.3.1 |
| bwa | 0.7.17 |
| circexplorer2 | 2.3.8 |
| cufflinks | 2.2.1 |
| cutadapt | 4.4 |
| fastqc | 0.11.9 |
| hisat | 2.2.2.1 |
| java | 18.0.1.1 |
| multiqc | 1.9 |
| parallel | 20231122 |
| perl | 5.34 |
| picard | 2.27.3 |
| python | 2.7 |
| python | 3.8 |
| sambamba | 0.8.2 |
| samtools | 1.16.1 |
| STAR | 2.7.6a |
| stringtie | 2.2.1 |
| ucsc | 450 |
| R | 4.0.5 |
| novocraft | 4.03.05 |

| tool | version |
| ------------- | -------- |
| blat | 3.5 |
| bedtools | 2.30.0 |
| bowtie | 2-2.5.1 |
| bowtie | 1.3.1 |
| bwa | 0.7.17 |
| circexplorer2 | 2.3.8 |
| cufflinks | 2.2.1 |
| cutadapt | 4.4 |
| fastqc | 0.11.9 |
| hisat | 2.2.2.1 |
| java | 18.0.1.1 |
| multiqc | 1.9 |
| parallel | 20231122 |
| perl | 5.34 |
| picard | 2.27.3 |
| python | 2.7 |
| python | 3.8 |
| sambamba | 0.8.2 |
| samtools | 1.16.1 |
| STAR | 2.7.6a |
| stringtie | 2.2.1 |
| ucsc | 450 |
| R | 4.0.5 |
| novocraft | 4.03.05 |

### 4. Usage

@@ -155,6 +155,7 @@ Required Arguments:

Optional Arguments:

--singcache|-c : singularity cache directory. Default is `/data/${USER}/.singularity` if available, or falls back to `${WORKDIR}/.singularity`. Use this flag to specify a different singularity cache directory.
--host|-g : supply host at command line. hg38 or mm39. (--runmode=init only)
--additives|-a : supply comma-separated list of additives at command line. ERCC or BAC16Insert or both (--runmode=init only)
--viruses|-v : supply comma-separated list of viruses at command line (--runmode=init only)
@@ -219,8 +220,8 @@ This will create the folder provided by `-w=`. The user should have write permis
Test data (1 paired-end subsample and 1 single-end subsample) have been including under the `.tests/dummy_fastqs` folder. After running in `-m=init`, `samples.tsv` should be edited to point the copies of the above mentioned samples with the column headers:
- sampleName
- path_to_R1_fastq
- sampleName
- path_to_R1_fastq
- path_to_R2_fastq
Column `path_to_R2_fastq` will be blank in case of single-end samples.
@@ -234,6 +235,7 @@ bash <path to charlie> -w=<path to output dir> -m=dryrun
This will create the reference fasta and gtf file based on the selections made in the `config.yaml`.
#### Run
If `-m=dryrun` was sucessful, then simply do `-m=run`. The output will look something like this
```
@@ -307,5 +309,5 @@ Expected output from the sample data is stored under `.tests/expected_output`.
More details about running test data can be found [here](https://ccbr.github.io/CHARLIE/tutorial).
> DISCLAIMER:
>
>
> CHARLIE is built to be run only on [BIOWULF](https://hpc.nih.gov). A newer HPC-agnostic version of CHARLIE is planned for 2024.
17 changes: 17 additions & 0 deletions charlie
@@ -483,6 +483,8 @@ function run() {

preruncleanup

$EXPORT_SING_CACHE_DIR_CMD

snakemake -s $SNAKEFILE\
--directory $WORKDIR \
--printshellcmds \
@@ -520,6 +522,7 @@ function run() {
cd \$SLURM_SUBMIT_DIR
$MODULE_LOAD
$EXPORT_SING_CACHE_DIR_CMD
snakemake -s $SNAKEFILE \
--directory $WORKDIR \
@@ -618,6 +621,9 @@ function main(){
-w=*|--workdir=*)
WORKDIR="${i#*=}"
;;
-c=*|--singcache=*)
SING_CACHE_DIR="${i#*=}"
;;
-z|--changegrp)
CHANGEGRP=1
;;
@@ -645,6 +651,17 @@ function main(){
WORKDIR=$(readlink -f "$WORKDIR")
echo "Working Dir: $WORKDIR"

if [[ -z "$SING_CACHE_DIR" ]]; then
if [[ -d "/data/$USER" ]]; then
SING_CACHE_DIR="/data/$USER/.singularity"
else
SING_CACHE_DIR="${WORKDIR}/.singularity"
fi
echo "singularity cache dir (--singcache) is not set, using ${SING_CACHE_DIR}"
fi
mkdir -p $SING_CACHE_DIR
EXPORT_SING_CACHE_DIR_CMD="export SINGULARITY_CACHEDIR=\"${SING_CACHE_DIR}\""

# required files
CONFIGFILE="${WORKDIR}/config.yaml"
CLUSTERFILE="${WORKDIR}/cluster.json"
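
The fallback order added to `main()` above (explicit `--singcache` value, then `/data/$USER/.singularity` if `/data/$USER` exists, then `${WORKDIR}/.singularity`) can be sketched as a standalone function. This is only an illustration, not code from the commit: the function name `resolve_sing_cache_dir` and its third `data_root` parameter are hypothetical, added here so the logic can be exercised on machines without a `/data` directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): mirrors the fallback order
# used in main() when choosing the singularity cache directory.
resolve_sing_cache_dir() {
    local requested="$1"            # value passed via --singcache (may be empty)
    local workdir="$2"              # resolved working directory
    local data_root="${3:-/data}"   # assumption: made configurable for testing
    local user="${USER:-$(id -un)}"

    if [ -n "$requested" ]; then
        # An explicit --singcache value always wins.
        printf '%s\n' "$requested"
    elif [ -d "${data_root}/${user}" ]; then
        # Per-user data directory exists (e.g. on biowulf).
        printf '%s\n' "${data_root}/${user}/.singularity"
    else
        # Otherwise fall back to a cache inside the working directory.
        printf '%s\n' "${workdir}/.singularity"
    fi
}

# Demo in a throwaway directory so nothing is created under /data or /.
demo_workdir="$(mktemp -d)"
SING_CACHE_DIR="$(resolve_sing_cache_dir "" "$demo_workdir" "$demo_workdir/no-such-root")"
mkdir -p "$SING_CACHE_DIR"
export SINGULARITY_CACHEDIR="$SING_CACHE_DIR"
echo "singularity cache dir: $SINGULARITY_CACHEDIR"
```

The sketch exports `SINGULARITY_CACHEDIR` directly instead of storing an `export ...` command in a string. That is a design choice for the illustration: when a command held in a variable is expanded unquoted, any quote characters embedded in the string are kept literally rather than removed, so a direct, quoted export is the simpler form when the command does not need to be written into a generated script.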
1 change: 1 addition & 0 deletions docs/tutorial.md
@@ -128,6 +128,7 @@ Required Arguments:

Optional Arguments:

--singcache|-c : singularity cache directory. Default is `/data/${USER}/.singularity` if available, or falls back to `${WORKDIR}/.singularity`. Use this flag to specify a different singularity cache directory.
--host|-g : supply host at command line. hg38 or mm39. (--runmode=init only)
--additives|-a : supply comma-separated list of additives at command line. ERCC or BAC16Insert or both (--runmode=init only)
--viruses|-v : supply comma-separated list of viruses at command line (--runmode=init only)