
Running

Kristy Horan edited this page May 31, 2023 · 18 revisions

Choosing your pipeline

bohra is made up of modules, which are grouped into 7 possible pipelines. All pipelines require, at minimum, an input file containing each sample ID and the paths to read 1 and read 2. If your reads are in a folder, you can generate this file with

bohra generate_input --read_path <path_to_reads>

This will output a file called isolates.tab which can be used for all bohra pipelines.
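For reference, the sketch below shows roughly the layout that isolates.tab is expected to have: one tab-delimited line per sample with the sample ID, the path to read 1 and the path to read 2. It is a hypothetical helper, not part of bohra, and `bohra generate_input` remains the supported way to build the file. The read naming convention (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, optionally with an `_001`-style suffix), the absence of a header row and the names `pair_reads` and `write_isolates_tab` are all assumptions, so compare the output with a file produced by `generate_input` before relying on it.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1[_001].fastq[.gz] or .fq[.gz]
# Adjust READ_PATTERN if your reads are named differently.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def pair_reads(read_dir):
    """Return {sample_id: (read1_path, read2_path)} for fully paired samples."""
    found = {}
    for path in sorted(Path(read_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not look like reads
        reads = found.setdefault(match["sample"], {})
        reads[match["read"]] = str(path.resolve())
    # Keep only samples that have both read 1 and read 2
    return {s: (r["1"], r["2"]) for s, r in found.items() if "1" in r and "2" in r}


def write_isolates_tab(pairs, out_path="isolates.tab"):
    """Write one tab-delimited line per sample: ID, read 1, read 2 (no header assumed)."""
    with open(out_path, "w") as handle:
        for sample in sorted(pairs):
            read1, read2 = pairs[sample]
            handle.write(f"{sample}\t{read1}\t{read2}\n")
```

For example, `write_isolates_tab(pair_reads("reads/"))` would write isolates.tab in the current directory, silently skipping samples that are missing one of the two reads.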

All bohra pipelines first run a setup step, which checks the inputs and sets up the directory, and then output a Nextflow command to run the pipeline. You can also use --proceed to jump straight into the pipeline once setup is complete.

Preview

The preview pipeline is useful if you have an unfamiliar dataset and want a basic overview of each sequence and its suitability. This pipeline generates basic read statistics, kmer IDs and a mash tree, so that you can identify anything that is particularly out of place. To run this pipeline, all you need is the input file described above.

bohra run -i isolates.tab

SNPS

The SNP pipeline can be used if you simply want to identify variants in your genomes and don't require any assembly or typing. You will need the input file and a reference (both .gbk and .fa are supported).

bohra run -i isolates.tab -p snps --no_phylo -r <path_to_reference>

You can also provide a mask file to be used when calculating the core genome, to mask sites such as phage or other regions of the genome that you do not want included in the core. This file should be in BED format.
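A BED mask is a plain tab-delimited text file with at least three columns: sequence name, start and end. BED coordinates are 0-based and half-open, whereas GenBank feature coordinates (for example a phage region taken from a .gbk reference) are 1-based and inclusive, so they need converting. The sketch below writes such a file. The helper names are hypothetical, and it assumes the first column has to match the sequence ID in the reference you pass with -r, which is worth checking for your reference.

```python
from typing import Iterable, Tuple

# (sequence name, start, end) in BED convention: 0-based start, exclusive end
Region = Tuple[str, int, int]


def from_one_based(chrom: str, first: int, last: int) -> Region:
    """Convert a 1-based inclusive interval (as in GenBank features) to BED."""
    return (chrom, first - 1, last)


def write_mask_bed(regions: Iterable[Region], out_path: str = "mask.bed") -> int:
    """Write a minimal 3-column BED file and return the number of bases listed.

    The total simply sums interval lengths, so overlapping regions count twice.
    """
    total = 0
    with open(out_path, "w") as handle:
        for chrom, start, end in regions:
            if start < 0 or end <= start:
                raise ValueError(f"invalid interval {chrom}:{start}-{end}")
            handle.write(f"{chrom}\t{start}\t{end}\n")
            total += end - start
    return total
```

For example, masking a hypothetical phage annotated at positions 1001..25000 of a sequence named `chr1` would be `write_mask_bed([from_one_based("chr1", 1001, 25000)])`, which writes the line `chr1	1000	25000`.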

bohra run -i isolates.tab -p snps --no_phylo -r <path_to_reference> -m mask.bed

This will generate all snippy output files, core genome alignments (from snippy-core) and a pairwise SNP distance matrix, as well as read assessment and kmer ID.
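To make the pairwise SNP distance matrix concrete, the sketch below computes one from a core genome alignment in FASTA format. It follows a common convention of counting only positions where both sequences have an unambiguous A, C, G or T, so gaps and Ns are ignored. This is an illustration, not bohra's implementation: the tool bohra uses and its handling of ambiguous sites may differ, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional

NUCLEOTIDES = frozenset("ACGT")


def read_fasta(path: str) -> Dict[str, str]:
    """Read a (possibly line-wrapped) FASTA alignment into {name: sequence}."""
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # ID is the first word of the header
                parts[name] = []
            elif line and name is not None:
                parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count sites where both sequences have A/C/G/T and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in NUCLEOTIDES and b in NUCLEOTIDES and a != b
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Symmetric pairwise SNP distances with zeros on the diagonal."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = snp_distance(alignment[x], alignment[y])
    return matrix
```

For example, `ACGTACGT` and `ACGTTCGA` are 2 SNPs apart, while `ACGTACGT` and `ACNTACG-` are 0 apart because the N and the gap are skipped.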

Phylogeny

If you would like to generate a core genome alignment and also a phylogenetic tree, you can run the phylogeny pipeline:

bohra run -p phylogeny -i isolates.tab -r <path_to_reference> -m mask.bed (optional)

Alternatively, run the SNP pipeline without --no_phylo:

bohra run -i isolates.tab -p snps -r <path_to_reference> -m mask.bed (optional)

This will generate all the same files as the SNP pipeline, plus a phylogenetic tree.

Assemble

This pipeline is particularly useful if you just want to assemble and annotate the genomes and do not require any variant or phylogenetic analysis. You can also provide existing assemblies that you only want to annotate. Be aware, though, that if assemblies are not provided for all samples, you may end up with a mixture of assemblies generated with different versions of tools, or even different tools. So please take care.

The paths to contigs should be provided in a tab-delimited file, with the sample ID in the first column (this must match the ID in isolates.tab) and the path to the contigs in the second.
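Because contigs.tab must use the same sample IDs as isolates.tab, it can help to build it from isolates.tab and list the samples that have no existing assembly, since those are presumably the ones bohra would assemble itself, producing the mixture described above. The sketch below is a hypothetical helper, not part of bohra. It assumes isolates.tab has no header row and that each assembly sits in one folder named `<sample_id>` plus one of the extensions in `CONTIG_SUFFIXES`; adjust both to your data.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed assembly file extensions; adjust to match your files
CONTIG_SUFFIXES = (".fasta", ".fa", ".fna")


def read_sample_ids(isolates_tab: str) -> List[str]:
    """Return the sample IDs (first column) from isolates.tab, assuming no header."""
    with open(isolates_tab) as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def match_contigs(sample_ids: List[str], contig_dir: str) -> Tuple[Dict[str, str], List[str]]:
    """Map each sample ID to <contig_dir>/<sample_id><suffix> if such a file exists."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for sample in sample_ids:
        candidates = [Path(contig_dir, sample + sfx) for sfx in CONTIG_SUFFIXES]
        existing = [p for p in candidates if p.is_file()]
        if existing:
            found[sample] = str(existing[0].resolve())
        else:
            missing.append(sample)
    return found, missing


def write_contigs_tab(found: Dict[str, str], out_path: str = "contigs.tab") -> None:
    """Write one tab-delimited line per sample: sample ID, path to contigs."""
    with open(out_path, "w") as handle:
        for sample, path in found.items():
            handle.write(f"{sample}\t{path}\n")
```

If `missing` is not empty, you can decide whether to assemble those samples separately first so that all assemblies come from the same tools and versions.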

bohra run -p assemble -i isolates.tab -c contigs.tab (optional)

This will generate read assessment, kmer ID, assemblies and assembly assessment, as well as annotation files.

AMR and typing

bohra comes with mlst, abritAMR and a selection of in silico typing tools, described here. The typer used is determined by the kmer ID, so typers will only run if you have an appropriately formatted kraken database. By default, only AMR genes are reported; if you are also interested in point mutations, use the --abritamr_args flag to provide the species or genus for which point mutations should be identified. These point mutations are drawn from the AMRFinderPlus database.

bohra run -p amr_typing -i isolates.tab -c contigs.tab (optional) --abritamr_args <your_species> (optional - to turn on point mutations)

This will output all the same results as the assemble pipeline, with resistome, virulome, mlst and serotyping (where available).

Default

The default pipeline is a good workhorse for general use. It runs the phylogeny and amr_typing pipelines, so it outputs SNPs, core genome alignments, a phylogeny, assembly statistics, AMR, mlst and typing results.

bohra run -p default -i isolates.tab -r <path_to_reference> -m mask.bed (optional) -c contigs.tab (optional) --abritamr_args <your_species> (optional - to turn on point mutations)

Full (previously pluspan)

This pipeline runs all the bohra modules (with the exception of the preview modules). It is essentially the default pipeline with the addition of a pangenome. It is provided as a separate pipeline because the pangenome can take some time to calculate, particularly on large datasets, and may not always be needed. If there is demand, the pangenome may be moved out into its own standalone pipeline. Let us know if this would be a useful feature for you!

bohra run -p full -i isolates.tab -r <path_to_reference> -m mask.bed (optional) -c contigs.tab (optional) --abritamr_args <your_species> (optional - to turn on point mutations)
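As a quick summary of the pipelines on this page, the sketch below encodes the outputs each one is described as producing and picks the first pipeline, in the order the pipelines are listed here, that covers everything you need. The output labels (`tree`, `mlst` and so on) and the `choose_pipeline` helper are informal and hypothetical, not bohra identifiers; "preview" stands for the pipeline run by the command in the Preview section.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Outputs per pipeline as described on this page (informal labels, not bohra identifiers)
PREVIEW = frozenset({"read_stats", "kmer_id", "mash_tree"})
SNPS = frozenset({"read_stats", "kmer_id", "snippy", "core_alignment", "snp_matrix"})
PHYLOGENY = SNPS | {"tree"}
ASSEMBLE = frozenset({"read_stats", "kmer_id", "assembly", "assembly_stats", "annotation"})
AMR_TYPING = ASSEMBLE | {"resistome", "virulome", "mlst", "serotyping"}
DEFAULT = PHYLOGENY | AMR_TYPING
FULL = DEFAULT | {"pangenome"}  # full excludes the preview-only mash tree

# Dicts keep insertion order, so this is also the search order
PIPELINES: Dict[str, FrozenSet[str]] = {
    "preview": PREVIEW,
    "snps": SNPS,
    "phylogeny": PHYLOGENY,
    "assemble": ASSEMBLE,
    "amr_typing": AMR_TYPING,
    "default": DEFAULT,
    "full": FULL,
}
# Pipelines whose examples on this page pass a reference with -r
NEEDS_REFERENCE = {"snps", "phylogeny", "default", "full"}


def choose_pipeline(needed: Iterable[str]) -> Optional[str]:
    """Return the first pipeline covering every needed output, or None if none does."""
    wanted = set(needed)
    unknown = wanted - set().union(*PIPELINES.values())
    if unknown:
        raise ValueError(f"unknown outputs: {sorted(unknown)}")
    for name, outputs in PIPELINES.items():
        if wanted <= outputs:
            return name
    return None
```

For example, `choose_pipeline({"tree", "resistome"})` returns `"default"`, which is in `NEEDS_REFERENCE`, so you would also pass -r. A `None` result (for instance, a mash tree together with mlst) means no single pipeline covers the request and you would run two of them.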