SCRAP: a bioinformatic pipeline for the analysis of small chimeric RNA-seq data

Mills WT, Eadara S, Jaffe AE, Meffert MK. 2023. SCRAP: a bioinformatic pipeline for the analysis of small chimeric RNA-seq data. RNA 29: 1–17. doi:10.1261/rna.079240.122

File or Directory Name	Description
adapters/	Contains adapter sequences for CLASH and CLEAR-CLIP
annotation/	Contains annotation files for human, mouse, and C. elegans, as well as a miRNA family file
bin/	Contains SCRAP scripts
fasta/	Contains miRBase and tRNA FASTA files
PLATFORM-SETUP.md	Preliminary setup instructions for each compatible platform
README.md	Contains instructions to install and run SCRAP
SCRAP_environment.yml	File for creating a Conda environment with the requisite tools for running SCRAP

adapters

File Name	Description
CLASH_Human_Adapters.txt	Adapters used in the CLASH libraries
CLEAR-CLIP_Mouse_Adapters.txt	Adapters used in the CLEAR-CLIP libraries

annotation

File Name	Description
miR_Family.txt	Tab-delimited list detailing which miRNA families contain which miRBase miRNAs
mouse.annotation.bed.gz	Mouse genome annotation file used to annotate peaks after calling
human.annotation.bed.gz	Human genome annotation file used to annotate peaks after calling
worm.annotation.bed.gz	C. elegans genome annotation file used to annotate peaks after calling

bin

File Name	Description
Reference_Installation.sh	Script for configuring reference files required for SCRAP
SCRAP.sh	Script for processing raw FASTQ files to identify sncRNA and genomic alignment of reads
Peak_Calling.sh	Script for calling peaks using output files from SCRAP.sh
Peak_Annotation.sh	Script for annotating bed file produced by Peak_Calling.sh with gene names and features

fasta

File Name	Description
miRBase.fasta	FASTA file containing miRNAs downloaded from miRBase (accessed July 15, 2022)
miRBase.hairpin.fasta	FASTA file containing miRNA hairpin sequences obtained from miRBase (accessed July 15, 2022)
GtRNAdb.fasta	FASTA file containing tRNA sequences obtained from GtRNAdb (accessed July 15, 2022)
tRFdb.fasta	FASTA file containing tRNA fragment sequences obtained from tRFdb (accessed July 15, 2022)

Installation

SCRAP is a cross-platform pipeline that runs on Windows Subsystem for Linux (WSL), macOS, and Ubuntu. To install SCRAP, you need one of these platforms, along with Git and Miniconda. See PLATFORM-SETUP.md for platform-specific instructions.

Once in the directory where you would like the SCRAP source to be cloned, run:

git clone https://github.com/Meffert-Lab/SCRAP.git

Create the Conda environment by running (requires Miniconda, see PLATFORM-SETUP.md):

conda install -n base conda-forge::mamba
mamba env create -f SCRAP/SCRAP_environment.yml -n SCRAP

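Note: as of Dec 2022, bioconda does not build packages for osx-arm64. If you are using an Apple silicon (M1) Mac, try the following workaround, which tells Conda to install Intel (osx-64) packages into the environment instead: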

conda create -n SCRAP python=3.8
conda activate SCRAP
conda config --env --set subdir osx-64
conda env update --file SCRAP/SCRAP_environment.yml --prune

You can find more information about this at the following links:

Execute the Reference_Installation.sh script with the following command line parameters:

Flag	Description
`-r`	Path to reference directory (e.g. `SCRAP`)
`-m`	Three-letter miRBase species abbreviation
`-g`	Reference genome abbreviation
`-s`	Indicate the species used for annotation (`human` (H. sapiens), `mouse` (M. musculus), or `worm` (C. elegans))

Note: You should check NCBI for the latest reference genome available for the species you are using, as these change over time.

Three-letter miRBase Species Abbreviations

Abbreviation	Species
hsa	Homo sapiens
mmu	Mus musculus
rno	Rattus norvegicus
dme	Drosophila melanogaster
cel	Caenorhabditis elegans
ath	Arabidopsis thaliana

Example code for configuring human references:

bash SCRAP/bin/Reference_Installation.sh \
    -r SCRAP/ \
    -m hsa \
    -g hg38 \
    -s human
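For the three species that have annotation files, the `-m` and `-s` values follow directly from the tables above. The sketch below is a hypothetical convenience function, `reference_args`, that is not part of SCRAP. It assumes SCRAP was cloned into `./SCRAP` and leaves the genome build (`-g`) to you, since the latest build should be checked at NCBI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SCRAP): print Reference_Installation.sh
# arguments for one of the three species that ship with annotation files.
# Assumes SCRAP was cloned into ./SCRAP; the genome build is user-supplied.
reference_args() {
    local species="$1" genome="$2" mirbase
    case "$species" in
        human) mirbase="hsa" ;;   # Homo sapiens
        mouse) mirbase="mmu" ;;   # Mus musculus
        worm)  mirbase="cel" ;;   # Caenorhabditis elegans
        *) echo "unsupported species: $species" >&2; return 1 ;;
    esac
    if [ -z "$genome" ]; then
        echo "usage: reference_args <human|mouse|worm> <genome build>" >&2
        return 1
    fi
    # Same flags and order as the human example above
    printf '%s\n' "-r SCRAP/ -m $mirbase -g $genome -s $species"
}
```

For example, `bash SCRAP/bin/Reference_Installation.sh $(reference_args human hg38)` expands to the human command shown above. The substitution is deliberately left unquoted so the shell splits it into separate flags, which is safe here because none of the values contain spaces.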

Running SCRAP

Ensure that your data files are organized as follows:

│───SCRAP
│	│
│	│
│	└───bin        
│	│	SCRAP.sh
│	│	Peak_Calling.sh
│	│	Peak_Annotation.sh
│	│	Reference_Installation.sh
│	│
│	└───fasta
│	│	miRBase.fasta
│	│	miRBase.hairpin.fasta
│	│	GtRNAdb.fasta
│	│	tRFdb.fasta
│	└───annotation
│		human.annotation.bed
│		miR_Family.txt
│		mouse.annotation.bed
│		worm.annotation.bed
│
│
│
└───files 
	│
	│
	└───sample1
	│	sample1_R1.fastq.gz
	│	sample1_R2.fastq.gz
	│
	└───sample2
	│	sample2_R1.fastq.gz
	│	sample2_R2.fastq.gz
	│
	└───sample3
		sample3_R1.fastq.gz
		sample3_R2.fastq.gz
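Before running SCRAP.sh, it can help to confirm that your data match this layout. This is a minimal sketch with a hypothetical `check_layout` function (not part of SCRAP). It assumes each sample directory is named after its sample, that FASTQ files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` as in the tree, and that single-end samples need only the `_R1` file.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SCRAP): confirm that every
# sample directory under a data directory holds <sample>_R1.fastq.gz and,
# for paired-end data, <sample>_R2.fastq.gz.
check_layout() {
    local data_dir="${1%/}" paired="$2" status=0 sample_dir sample
    for sample_dir in "$data_dir"/*/; do
        [ -d "$sample_dir" ] || continue   # glob matched no directories
        sample=$(basename "$sample_dir")
        if [ ! -f "${sample_dir}${sample}_R1.fastq.gz" ]; then
            echo "missing: $sample/${sample}_R1.fastq.gz" >&2
            status=1
        fi
        if [ "$paired" = "yes" ] && [ ! -f "${sample_dir}${sample}_R2.fastq.gz" ]; then
            echo "missing: $sample/${sample}_R2.fastq.gz" >&2
            status=1
        fi
    done
    return "$status"
}
```

`check_layout CLASH_Human/ no` prints one line per missing file and returns a non-zero status if anything is missing.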

Execute the SCRAP.sh script with the following command line parameters:

Flag	Description
`-d`	Path to directory containing sample directories
`-a`	Path to adapter file
`-p`	Indicate whether samples are paired-end (`yes` or `no`)
`-f`	Indicate whether or not to filter out pre-miRNAs and tRNAs (`yes` or `no`)
`-r`	Path to reference directory (e.g. `SCRAP`)
`-m`	Three-letter miRBase species abbreviation
`-g`	Reference genome abbreviation

The adapter file is a tab-delimited .txt file containing the sample name, 5' adapter, 3' adapter, 5' barcode, and 3' barcode. This file can be generated with a text editor or in Excel and saved as a tab-delimited text file.
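As an alternative to Excel, the adapter file can be written from the shell. In this minimal sketch, the sample names and all sequences are placeholders, not real adapters or barcodes: replace them with those from your own library preparation. It assumes one row per sample, sample names that match the sample directory names, and no header row. Compare the result against `adapters/CLASH_Human_Adapters.txt` to confirm these details.

```bash
#!/usr/bin/env bash
# Sketch: write a tab-delimited adapter file with one row per sample.
# Columns: sample, 5' adapter, 3' adapter, 5' barcode, 3' barcode.
# ALL SEQUENCES BELOW ARE PLACEHOLDERS; no header row is written (assumption).
out="my_adapters.txt"
printf '%s\t%s\t%s\t%s\t%s\n' \
    sample1 ACGTACGTACGT TGCATGCATGCA AAAA CCCC \
    sample2 ACGTACGTACGT TGCATGCATGCA GGGG TTTT \
    > "$out"

# Sanity check: every row must have exactly five tab-separated fields
awk -F'\t' 'NF != 5 { bad = 1; print "line " NR ": " NF " fields" > "/dev/stderr" }
            END { exit bad }' "$out"
```

The `awk` check catches the most common problem with hand-edited files: a row with a missing or extra tab.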

Example code for analyzing CLASH data:

bash SCRAP/bin/SCRAP.sh \
    -d CLASH_Human/ \
    -a CLASH_Human/CLASH_Human_Adapters.txt \
    -p no \
    -f yes \
    -r SCRAP/ \
    -m hsa \
    -g hg38

After SCRAP.sh has finished, each sample folder will contain a file ending in .aligned.unique.bam.
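To find samples that did not finish, a small loop can look for that file in each sample folder. `missing_bams` is a hypothetical helper, not part of SCRAP, and it relies only on the `.aligned.unique.bam` suffix stated above. It uses the Bash builtin `compgen -G` to test whether a glob matches any file.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check (not part of SCRAP): print the name of every
# sample directory that does not yet contain a *.aligned.unique.bam file.
missing_bams() {
    local data_dir="${1%/}" sample_dir
    for sample_dir in "$data_dir"/*/; do
        [ -d "$sample_dir" ] || continue   # glob matched no directories
        # compgen -G succeeds only if the glob matches at least one path
        if ! compgen -G "${sample_dir}*.aligned.unique.bam" > /dev/null; then
            basename "$sample_dir"
        fi
    done
}
```

Running `missing_bams CLASH_Human/` prints every sample folder still lacking the file and prints nothing when all samples are done.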

Peak Calling

The .aligned.unique.bam file produced by SCRAP.sh can be used to identify peaks where multiple sncRNAs or sncRNA family members bind to the same region of the genome.

Execute the Peak_Calling.sh script with the following command line parameters:

Flag	Description
`-d`	Path to directory containing sample directories
`-a`	Path to adapter file
`-c`	Indicate the minimum number of reads required to identify a peak
`-l`	Indicate the minimum number of libraries that a peak must be supported by
`-f`	Indicate whether or not peaks should be called by grouping sncRNAs into families (`yes` or `no`)
`-r`	Path to reference directory (e.g. `SCRAP`)
`-m`	Three-letter miRBase species abbreviation
`-g`	Reference genome abbreviation

The adapter file can be the same one used with SCRAP.sh, or simply a .txt file with one sample name per row.
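If you prefer the minimal one-sample-per-row form, it can be derived from the adapter file used with SCRAP.sh. This is a sketch with a hypothetical `make_sample_list` function (not part of SCRAP). It assumes the sample name is in the first column and that the file has no header row. It also strips the carriage returns that Excel on Windows may leave at the end of each line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SCRAP): write one sample name per row,
# taken from column 1 of a tab-delimited adapter file. Trailing carriage
# returns are removed and blank lines are skipped.
make_sample_list() {
    local adapters="$1" out="$2"
    awk -F'\t' '{ sub(/\r$/, "") } NF { print $1 }' "$adapters" > "$out"
}
```

For example, run `make_sample_list CLASH_Human/CLASH_Human_Adapters.txt CLASH_Human/samples.txt`, then pass `-a CLASH_Human/samples.txt` to Peak_Calling.sh.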

Example code for calling peaks with CLASH data previously analyzed with SCRAP.sh:

bash SCRAP/bin/Peak_Calling.sh \
    -d CLASH_Human/ \
    -a CLASH_Human/CLASH_Human_Adapters.txt \
    -c 3 \
    -l 2 \
    -f no \
    -r SCRAP/ \
    -m hsa \
    -g hg38

Peak calling will generate a peaks.bed (or peaks.family.bed) and peakcalling.summary.txt (or peakcalling.family.summary.txt) file in the directory denoted with the -d flag.
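For a quick first look at the results, the sketch below counts peaks, reports their mean width, and tallies peaks per chromosome. It reads only the three standard BED columns (chromosome, 0-based start, end) and does not interpret any additional columns SCRAP may write. `summarize_peaks` is a hypothetical helper, not part of SCRAP, and the skipping of `track`/`browser`/`#` lines is a precaution rather than a known feature of SCRAP's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SCRAP): summarize a BED file of peaks
# using only columns 1-3 (chrom, 0-based start, end).
summarize_peaks() {
    local bed="$1"
    awk '
        /^(track|browser|#)/ || NF < 3 { next }   # skip header or short lines
        { total++; width += $3 - $2; per_chr[$1]++ }
        END {
            printf "peaks\t%d\n", total
            if (total) printf "mean_width\t%.1f\n", width / total
            for (c in per_chr) printf "%s\t%d\n", c, per_chr[c]
        }' "$bed"
}
```

For example, `summarize_peaks CLASH_Human/peaks.bed`. Per-chromosome lines come out in no particular order, so pipe them through `sort` if you need a stable listing.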

Peak Annotation

The peaks.bed (or peaks.family.bed) file produced by Peak_Calling.sh can be annotated with gene names and features. Currently, this function is only available for annotating peaks called for data from human (H. sapiens), mouse (M. musculus), or worm (C. elegans).

Execute the Peak_Annotation.sh script with the following command line parameters:

Flag	Description
`-p`	Path to peaks.bed or peaks.family.bed file
`-r`	Path to reference directory (e.g. `SCRAP`)
`-s`	Indicate the species used for annotation (`human` (H. sapiens), `mouse` (M. musculus), or `worm` (C. elegans))

Example code for annotating peaks with CLASH data identified using Peak_Calling.sh:

bash SCRAP/bin/Peak_Annotation.sh \
    -p CLASH_Human/peaks.bed \
    -r SCRAP/ \
    -s human
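The three stages can be chained in one script. The sketch below is a hypothetical wrapper, `run_scrap_pipeline`, that is not part of SCRAP. It reuses only the flags documented above and hard-codes the CLASH example settings (`-p no` and `-f yes` for SCRAP.sh; `-c 3`, `-l 2`, and `-f no` for Peak_Calling.sh), which you should adapt to your data. With `DRY_RUN=1` it prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SCRAP): run SCRAP.sh, Peak_Calling.sh,
# and Peak_Annotation.sh in order with the CLASH example settings.
# Assumes SCRAP was cloned into ./SCRAP. Set DRY_RUN=1 to only print commands.
run_scrap_pipeline() {
    local data="$1" adapters="$2" mirbase="$3" genome="$4" species="$5"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    $run bash SCRAP/bin/SCRAP.sh -d "$data" -a "$adapters" -p no -f yes \
        -r SCRAP/ -m "$mirbase" -g "$genome" || return 1
    $run bash SCRAP/bin/Peak_Calling.sh -d "$data" -a "$adapters" -c 3 -l 2 -f no \
        -r SCRAP/ -m "$mirbase" -g "$genome" || return 1
    # -f no above means peaks are written to peaks.bed (not peaks.family.bed)
    $run bash SCRAP/bin/Peak_Annotation.sh -p "${data%/}/peaks.bed" \
        -r SCRAP/ -s "$species"
}
```

For example, `DRY_RUN=1 run_scrap_pipeline CLASH_Human/ CLASH_Human/CLASH_Human_Adapters.txt hsa hg38 human` prints the three commands from the examples above. If you call peaks with `-f yes`, the annotation step would need `peaks.family.bed` instead.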
