Bioinformatics

Andrea Telatin edited this page Dec 3, 2019 · 9 revisions

Short reads mapping with BWA

Indexing the genome

This is a one-time step: whenever you download a new FASTA file to use as the reference for an alignment, you have to index it first.

bwa index  ~/learn_bash/phage/vir_genomic.fna
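Indexing writes several helper files next to the reference. A small sketch to check that they are all there, assuming the five suffixes that `bwa index` normally produces (.amb, .ann, .bwt, .pac, .sa); the function name `check_bwa_index` is a made-up helper for this page, not part of BWA:

```shell
# Hypothetical helper: report whether all BWA index files exist for a reference
check_bwa_index() {
    ref="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -e "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}"
            missing=1
        fi
    done
    # exit status 0 if every file was found, 1 otherwise
    return "$missing"
}

# Usage (path taken from the command above):
# check_bwa_index ~/learn_bash/phage/vir_genomic.fna && echo "index looks complete"
```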

Alignment: first test!

Now we can align sequences. If you created a simple FASTA file with some fragments of the reference (suppose you called it ~/learn_bash/phage/seq.fa), align it against the indexed genome:

bwa mem ~/learn_bash/phage/vir_genomic.fna ~/learn_bash/phage/seq.fa > ~/first_alignment.sam

Time to inspect your first SAM file!
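A SAM file is plain text: header lines start with `@`, and each alignment is one tab-separated line. One possible way to take a first look, using only standard shell tools (the helper name `sam_peek` is made up for this sketch):

```shell
# Hypothetical helper: print the first six SAM columns
# (read name, flag, reference name, position, MAPQ, CIGAR)
# for the first alignments, skipping the '@' header lines
sam_peek() {
    grep -v '^@' "$1" | cut -f 1-6 | head -n "${2:-10}"
}

# Usage:
# sam_peek ~/first_alignment.sam      # first 10 alignments
# sam_peek ~/first_alignment.sam 3    # first 3 alignments
```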

Alignment: a dataset

bwa mem  ~/learn_bash/phage/vir_genomic.fna ~/learn_bash/phage/vir_reads1.fq > ~/phage.sam
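To get a rough idea of how the alignment went, one option is to count the records whose FLAG (column 2) has the "unmapped" bit (value 4) set. A sketch with plain awk; `sam_count_mapped` is a hypothetical helper, and it counts alignment records (including any secondary or supplementary lines), not unique reads:

```shell
# Hypothetical helper: count mapped and unmapped records in a SAM file
sam_count_mapped() {
    awk -F '\t' '
        /^@/ { next }                               # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }   # FLAG bit 4 set: unmapped
        { mapped++ }                                # everything else is mapped
        END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }
    ' "$1"
}

# Usage:
# sam_count_mapped ~/phage.sam
```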

Samtools primer

samtools is the Swiss Army knife for manipulating SAM files. We will see only the minimal pipeline: converting a SAM file to its binary version (BAM), sorting it by coordinate and finally indexing it.

This is a mock workflow; try to adapt it by replacing each {placeholder} with your own file names:

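# Convert SAM to BAM: two alternatives
# 1) -T names the reference FASTA file
samtools view -b -T {reference} {sam_file} > {bam_output}
# 2) -S marked the input as SAM in old samtools; recent versions detect the format and ignore it
samtools view -b -S {sam_file} > {bam_output}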
 
# Sort a BAM file 
samtools sort -o {sorted_bam} {unsorted_bam} 
 
# Index a _sorted_ BAM file
samtools index {sorted_bam}
# check with 'ls' that a new index file ({sorted_bam}.bai) has been created
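
Put together, the steps above might look like the sketch below for the phage dataset. The output file names are assumptions chosen for illustration, and the `run` wrapper with its `DRY_RUN` variable is not a samtools feature, just a way to preview the commands before executing them:

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# SAM -> BAM -> sorted BAM -> index; output names are derived from the input name
# and each step runs only if the previous one succeeded
sam_to_sorted_bam() {
    sam="$1"
    base="${sam%.sam}"
    run samtools view -b -o "${base}.bam" "$sam" &&
    run samtools sort -o "${base}.sorted.bam" "${base}.bam" &&
    run samtools index "${base}.sorted.bam"
}

# Usage:
# DRY_RUN=1 sam_to_sorted_bam ~/phage.sam   # only print the three commands
# sam_to_sorted_bam ~/phage.sam             # run them (requires samtools)
```

Once the sorted BAM file is in place, the unsorted intermediate BAM is no longer needed and can be deleted to save space.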
