Step-by-Step-CNV-Pipeline

STEP 1: Base calling and Quality Control (QC)

o Most high throughput sequencers generate output in FastQ format.

o Raw data reads – .fastq (uncompressed) or .fastq.gz (compressed)

Platform	Basecall File Format	Basecaller
Illumina	Binary Base Call (BCL)
PacBio	Basecall File (bash.h5/bax.h5)
ONT	FASTQ	Dorado

o Dorado is a basecaller for Oxford Nanopore Technology (ONT) reads

o Base quality (BQ) score = Phred score. Base quality is important for variant calling.

o Q20 and higher is generally a good score.

o QC tool

- FASTQC is a quality control tool for high throughput sequence data.
- It is written in Java.
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- MultiQC is used to aggregate results from multiple FastQC runs into one single html report

Platform	QC
ONT	NanoQC

STEP 2: Trimming

o Involves removal of poor quality reads and trim adapter sequences

o Trimming tool

- http://www.usadellab.org/cms/?page=trimmomatic

Platform	Tool
Illumina	BBduk or Cutadapt
ONT	Porechop

STEP 3: Mapping or Alignment

o .sam – sequence alignment map; .bam – binary alignment map; .cram – reference based compressed alignment map

o Mapping quality (MQ) is a phred-score of the quality of the alignment

o Mapping methods

Burrows-Wheeler	Hashing
BWA, Bowtie	Stampy

o Mapping tools

Platform	Tool
Illumina	BWA MEM
ONT	Minimap2

o Post-mapping data QC tools

- Qualimap

STEP 4: Variant calling

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Step-by-Step-CNV-Pipeline

About

Releases

Packages

cxw441/Step-by-Step-CNV-Pipeline

Folders and files

Latest commit

History

Repository files navigation

Step-by-Step-CNV-Pipeline

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages