Command line options
The task to run is selected by passing it as the --task parameter on the command line.
The help options list can be printed on the console via:
# help for general options
phabox2 --help
# help for specific options
phabox2 --task [task] -h
# Example:
phabox2 --task phamer -h
phabox2 --task phagcn -h
The options are also listed below for reference.
The following parameters are common to all phabox2 tasks:
--task Select a program to run:
    end_to_end || Run phamer, phagcn, phatyp, phavip, and cherry once (default)
    phamer || Virus identification
    phagcn || Taxonomy classification
    phatyp || Lifestyle prediction
    cherry || Host prediction
    phavip || Protein annotation
    contamination || Contamination/provirus detection
    votu || vOTU grouping (ANI-based or AAI-based)
    tree || Build phylogenetic trees based on marker genes
--dbdir Path of the downloaded phabox2 database directory (required)
--outpth Root path for the output folder (required)
    All results, including intermediate files and final predictions, are stored in this folder.
--contigs Path of the input FASTA file (required)
--proteins FASTA file of predicted proteins (optional)
--midfolder Folder for intermediate files (optional)
    This folder is created within --outpth to store intermediate files.
--len Minimum contig length || default: 3000
    Contigs shorter than this value are not processed.
--threads Number of threads to use || default: all available threads
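Because contigs shorter than --len are not processed, it can help to check beforehand how many contigs would pass the filter. The helper below is a stand-alone sketch using awk and is not part of phabox2; the function name count_long_contigs is hypothetical, and it assumes that a contig exactly as long as the threshold is kept.

```shell
# Count the sequences in a FASTA file that are at least a given length.
# $1 = FASTA file, $2 = minimum length (defaults to 3000, the --len default)
count_long_contigs() {
  awk -v min="${2:-3000}" '
    /^>/ { if (seen && len >= min) n++; len = 0; seen = 1; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (seen && len >= min) n++; print n + 0 }
  ' "$1"
}

# Usage (file name is a placeholder):
#   count_long_contigs contigs.fa 3000
```

If the count is zero, lowering --len or revisiting the assembly is probably worth considering before starting a run.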
Please note that the end_to_end task runs phamer, phagcn, cherry, phatyp, and phavip together, so each task's options can also be passed to end_to_end.
In addition, contigs predicted as non-virus or with low confidence are not passed on to the subsequent taxonomy, host, and lifestyle prediction tasks.
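As an illustration of how the common parameters combine, the sketch below assembles an end_to_end command line. The database path, output folder, input file name, and thread count are placeholder assumptions, not values shipped with phabox2. The command is printed rather than executed so it can be reviewed first.

```shell
# Placeholder paths and values (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="contigs.fa"
THREADS=8

# Required parameters first, then the optional length filter and thread count.
CMD="phabox2 --task end_to_end --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS"
CMD="$CMD --len 3000 --threads $THREADS"

# Print the command; paste it into a terminal to run it.
echo "$CMD"
```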
The following parameters will be used in specific tasks:
Usage: phabox2 --task phamer [options]
In-task options:
--reject Reject sequences in which the percentage of proteins aligned to known phages is smaller than this value || default: 10 || range from 0 to 20
If the proportion is too low, the prediction for downstream analysis will be unreliable.
Usage: phabox2 --task phagcn [options]
In-task options:
The options below are used to generate a network of virus-virus connections. The current parameters are optimized for ICTV 2024 and are highly accurate for grouping genus-level vOTUs. Make sure you fully understand these parameters before changing them.
--aai Average amino acid identity || default: 75 || range from 0 to 100
--share Minimum number of shared proteins || default: 15 || range from 0 to 100
--pcov Protein-based coverage || default: 80 || range from 0 to 100
--draw Draw network examples for the query virus relationship || default: N || Y or N
--draw is used to plot sub-networks containing the query virus. We use it to generate visualizations for our web server.
However, it only plots the 10 largest sub-networks, so we do not recommend it for general use.
The complete network is provided for visualization (the network_edges.tsv and network_nodes.tsv files);
please check it out via: here
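For reference, a phagcn command line that spells out the network defaults listed above might look like the sketch below. The paths are placeholder assumptions. The command is only printed, so individual values can be adjusted before running it.

```shell
# Placeholder paths (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="contigs.fa"

CMD="phabox2 --task phagcn --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS"
# Network options set to their documented defaults.
CMD="$CMD --aai 75 --share 15 --pcov 80 --draw N"

echo "$CMD"
```

Writing the defaults out explicitly makes it easier to see later which thresholds a given result was produced with.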
Usage: phabox2 --task cherry [options]
In-task options:
The options below are used to generate a network of virus-virus connections. The current parameters are optimized for ICTV 2024 and are highly accurate for grouping genus-level vOTUs. Make sure you fully understand these parameters before changing them.
--aai Average amino acid identity || default: 75 || range from 0 to 100
--share Minimum number of shared proteins || default: 15 || range from 0 to 100
--pcov Protein-based coverage || default: 80 || range from 0 to 100
--draw Draw network examples for the query virus relationship || default: N || Y or N
--draw is used to plot sub-networks containing the query virus. We use it to generate visualizations for our web server.
However, it only plots the 10 largest sub-networks, so we do not recommend it for general use.
The complete network is provided for visualization (the network_edges.tsv and network_nodes.tsv files);
please check it out via: here
The options below are used to predict CRISPRs based on MAGs.
--bfolder Path to the folder that contains MAGs || default: None
The options below are used to align contigs to CRISPRs.
--cpident Alignment identity for CRISPRs || default: 90 || range from 90 to 100
--ccov Alignment coverage for CRISPRs || default: 90 || range from 0 to 100
--blast BLAST program for CRISPRs || default: blastn || blastn or blastn-short
    blastn-short gives more sensitive results but takes longer to run
The default parameters are optimized for predicting the prokaryotic hosts of viruses with 98% accuracy (data from the NCBI RefSeq database). Make sure you fully understand these parameters before changing them.
--magonly Only predict hosts based on the provided MAGs || default: N || Y or N
    Y predicts the host based only on the provided MAGs
    N predicts the host based on the MAGs and the reference database
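The sketch below shows one way the MAG-related options could be combined for host prediction. The paths and the MAG folder name are placeholder assumptions. MAG options are appended only when a folder is given, and the command is printed rather than executed.

```shell
# Placeholder paths (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="contigs.fa"
MAG_DIR="mags"   # leave empty to rely on the reference database alone

CMD="phabox2 --task cherry --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS"
if [ -n "$MAG_DIR" ]; then
  # Predict CRISPRs from the MAGs and restrict host prediction to them.
  CMD="$CMD --bfolder $MAG_DIR --magonly Y"
fi
# CRISPR alignment settings at their documented defaults.
CMD="$CMD --cpident 90 --ccov 90 --blast blastn"

echo "$CMD"
```

Switching --magonly to N would, per the description above, also use the reference database in addition to the MAGs.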
Usage: phabox2 --task phatyp [options]
In-task options:
There are no additional options for lifestyle prediction; only the general options apply.
Please note that running the end_to_end, phamer, phagcn, phatyp, or cherry task automatically runs phavip. The output files are the same.
Usage: phabox2 --task phavip [options]
Usage: phabox2 --task end_to_end [options]
In-task options:
The end-to-end task can skip PhaMer (virus identification).
If the input already consists of viral contigs, run the end-to-end task with --skip Y to skip virus identification.
--skip Whether to skip virus identification (PhaMer) || default: N || Y or N
Please note that the default is --skip N. With --skip N, if PhaMer detects no viruses, the end-to-end task stops the subsequent pipelines and writes a log message saying so.
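A minimal sketch of choosing the --skip value is shown below. The variable INPUT_IS_VIRAL and all paths are hypothetical placeholders introduced for illustration. The command is printed rather than executed.

```shell
# Placeholder paths (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="viral_contigs.fa"

# Set to "yes" when the input contigs are already known to be viral.
INPUT_IS_VIRAL="yes"

if [ "$INPUT_IS_VIRAL" = "yes" ]; then
  SKIP="Y"   # skip PhaMer virus identification
else
  SKIP="N"   # default: run PhaMer first
fi

CMD="phabox2 --task end_to_end --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS --skip $SKIP"
echo "$CMD"
```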
Usage: phabox2 --task contamination [options]
In-task options:
--sensitive Use sensitive mode when searching for prokaryotic genes || default: N || Y or N
    Y gives more sensitive results but takes longer to run
Usage: phabox2 --task votu [options]
In-task options:
--mode Clustering mode, ANI-based or AAI-based || default: ANI || ANI or AAI
AAI-based options:
--aai Average amino acid identity for AAI-based genus grouping || default: 75 || range from 0 to 100
--pcov Protein-level coverage for AAI-based genus grouping || default: 80 || range from 0 to 100
--share Minimum number of shared proteins for AAI-based genus grouping || default: 15 || range from 0 to 100
ANI-based options:
--ani Alignment identity for ANI-based clustering || default: 95 || range from 0 to 100
--tcov Alignment coverage for ANI-based clustering || default: 85 || range from 0 to 100
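Since each clustering mode has its own set of thresholds, the sketch below pairs every --mode value with the options that belong to it, using the documented defaults. The function name votu_cmd and all paths are hypothetical placeholders. The function only prints the command line.

```shell
# Placeholder paths (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="contigs.fa"

# Print a votu command line for the given mode ($1 = ANI or AAI).
votu_cmd() {
  base="phabox2 --task votu --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS --mode $1"
  case "$1" in
    ANI) echo "$base --ani 95 --tcov 85" ;;
    AAI) echo "$base --aai 75 --pcov 80 --share 15" ;;
    *)   echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

votu_cmd ANI
votu_cmd AAI
```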
Usage: phabox2 --task tree [options]
In-task options:
--marker A list of markers used to generate the tree || default: terl portal
    You can choose more than one marker from the list below.
    The marker genes were obtained from RefSeq 2024:
    endolysin || 91% of prokaryotic viruses have endolysin
    holin || 75% of prokaryotic viruses have holin
    head || 77% of prokaryotic viruses have major head
    portal || 84% of prokaryotic viruses have portal
    terl || 92% of prokaryotic viruses have terminase large subunit
    Using combinations of these markers can improve the accuracy of the tree
    but will decrease the number of sequences in the tree.
--mcov Alignment coverage for matching marker genes || default: 50 || range from 0 to 100
--mpident Alignment identity for matching marker genes || default: 25 || range from 0 to 100
--msa Whether to run multiple sequence alignment (MSA) || default: N || Y or N
    Y runs MSA for the marker genes using MAFFT, which takes more time
--tree Whether to build a tree || default: N || Y or N
    Y builds the tree from the marker genes using FastTree, which takes more time
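As a sketch of a full tree-building run, the command below selects the two default markers and switches on both alignment and tree construction. The paths are placeholder assumptions, and passing several markers as a space-separated list is inferred from the documented default "terl portal" rather than confirmed. The command is printed rather than executed.

```shell
# Placeholder paths (assumptions for illustration only).
DB_DIR="phabox_db_v2"
OUT_DIR="phabox_out"
CONTIGS="contigs.fa"

# Markers to combine; more markers means fewer sequences end up in the tree.
MARKERS="terl portal"

CMD="phabox2 --task tree --dbdir $DB_DIR --outpth $OUT_DIR --contigs $CONTIGS"
CMD="$CMD --marker $MARKERS --mcov 50 --mpident 25"
# Both steps are off by default and add run time when enabled.
CMD="$CMD --msa Y --tree Y"

echo "$CMD"
```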