Bioinformatics
Andrea Telatin edited this page Dec 3, 2019
Indexing is a one-step procedure: whenever you download a new FASTA file to use as the reference for an alignment, you have to index it first.
bwa index ~/learn_bash/phage/vir_genomic.fna
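To check that indexing worked, you can look for the index files next to the reference. The sketch below assumes the classic BWA (0.7.x), which writes five files with the extensions .amb, .ann, .bwt, .pac and .sa; other versions or aligners (for example bwa-mem2) use different names. The function name bwa_index_complete is made up for this example.

```shell
# Hypothetical helper: succeed only if all five classic BWA index files
# exist next to the reference FASTA given as first argument.
bwa_index_complete() {
    ref="$1"
    for ext in amb ann bwt pac sa; do
        if [ ! -e "${ref}.${ext}" ]; then
            echo "missing index file: ${ref}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Example: index the reference only when the index is incomplete
# bwa_index_complete ~/learn_bash/phage/vir_genomic.fna || bwa index ~/learn_bash/phage/vir_genomic.fna
```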
Now we can align sequences. If you created a simple FASTA file with some fragments of the reference (suppose you called it ~/learn_bash/phage/seq.fa):
bwa mem ~/learn_bash/phage/vir_genomic.fna ~/learn_bash/phage/seq.fa > ~/first_alignment.sam
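If you do not have such a test file yet, one possible way to build it is to cut a fragment out of the reference itself. This is only a sketch: the function name fasta_fragment and the output header are invented here, and it reads only the first record of the FASTA file, using 1-based inclusive coordinates.

```shell
# Hypothetical helper: print bases START..END (1-based, inclusive) of the
# FIRST record of a FASTA file as a new single-record FASTA.
# Usage: fasta_fragment FILE START END
fasta_fragment() {
    awk -v s="$2" -v e="$3" '
        /^>/ { if (++n > 1) exit; next }   # stop at the second header, skip header lines
        { seq = seq $0 }                   # join the wrapped sequence lines
        END {
            print ">fragment_" s "_" e
            print substr(seq, s, e - s + 1)
        }
    ' "$1"
}

# Example (the coordinates are arbitrary):
# fasta_fragment ~/learn_bash/phage/vir_genomic.fna 101 250 > ~/learn_bash/phage/seq.fa
```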
Time to inspect your first SAM file!
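A SAM file is plain text, so ordinary shell tools are enough for a first look. Header lines start with '@'; every other line is one alignment record with tab-separated columns, the first six being QNAME, FLAG, RNAME, POS, MAPQ and CIGAR. The two function names below are made up for this sketch.

```shell
# Hypothetical helper: print only the header lines (they start with '@').
sam_header() {
    grep '^@' "$1"
}

# Hypothetical helper: print the first six columns of each alignment record
# (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR).
sam_records() {
    grep -v '^@' "$1" | cut -f 1-6
}

# Example:
# sam_header ~/first_alignment.sam
# sam_records ~/first_alignment.sam | head
```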
bwa mem ~/learn_bash/phage/vir_genomic.fna ~/learn_bash/phage/vir_reads1.fq > ~/phage.sam
samtools is the Swiss Army knife for manipulating SAM files. We will see only the minimal pipeline to convert a SAM file to its binary version (BAM), sort it by coordinate and finally index it.
This is a mock workflow; try to adapt it to your own files:
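# Convert SAM to BAM: two alternatives
# 1) -T supplies the reference, needed when the SAM file has no @SQ header lines
samtools view -b -T {reference} {sam_file} > {bam_output}
# 2) -S marks the input as SAM; samtools 1.0 and later detect the format automatically and ignore -S
samtools view -b -S {sam_file} > {bam_output}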
# Sort a BAM file
samtools sort -o {sorted_bam} {unsorted_bam}
# Index a _sorted_ BAM file
samtools index {sorted_bam}
# Run 'ls' to check that a new index file ({sorted_bam}.bai) has been created
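One way to adapt the mock workflow is to wrap the three steps in a small function. This is a sketch, not the only way to do it: the function name sam_to_sorted_bam and the output naming scheme (.bam, .sorted.bam) are assumptions made for this example, and it expects samtools 1.3 or later and a SAM file that already contains its @SQ header lines (as the output of bwa mem does).

```shell
# Hypothetical helper: convert a SAM file to a coordinate-sorted, indexed BAM.
# Usage: sam_to_sorted_bam FILE.sam
sam_to_sorted_bam() {
    sam="$1"
    prefix="${sam%.sam}"                                       # strip the .sam extension
    samtools view -b "$sam" > "${prefix}.bam" &&               # SAM -> BAM
    samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" && # sort by coordinate
    samtools index "${prefix}.sorted.bam"                      # writes ${prefix}.sorted.bam.bai
}

# Example: creates ~/phage.bam, ~/phage.sorted.bam and ~/phage.sorted.bam.bai
# sam_to_sorted_bam ~/phage.sam
```

The steps are chained with && so that a failure in one step stops the ones that depend on it.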
· Bioinformatics at the Command Line - Andrea Telatin, 2017-2020