Update logs
Important
The end-to-end task can now skip PhaMer (virus identification).
If users already have viral contigs as their inputs, they can run the end-to-end task with --skip Y to skip virus identification.
However, please note that the default is --skip N.
We also added a log message that tells the user when PhaMer detected no viruses and therefore stopped the downstream steps of the end-to-end task under --skip N.
Important
Added a new column to the PhaGCN output so that bacteriophages can be identified easily. Set a more flexible mode for diamond alignments.
- Fixed typos in the output names and the Wiki
- Allows a more sensitive search during protein alignments
- This may affect the results.
- On the benchmark test set, this version improved overall recall without affecting precision.
Important
Adjusted the default parameters for the phylogenetic task
- marker alignment coverage --mcov: 50
- marker alignment identity --mpident: 25
- Added an error message when a called program (diamond, blast+, fasttree, etc.) fails.
- Fixed typos in the --help descriptions
- Host nodes are added to the CHERRY network for better visualization
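As a rough illustration of how the two marker thresholds interact, the sketch below keeps a hit only when both coverage and identity reach the defaults listed above. It assumes the values are percentages and that the comparison is inclusive; the function name and the hit tuples are hypothetical and are not taken from the PhaBOX code.

```python
# Minimal sketch, not the PhaBOX implementation.
# Assumption: --mcov and --mpident are percentages compared inclusively.

def passes_marker_filter(coverage, identity, mcov=50.0, mpident=25.0):
    """Return True when a marker hit meets both thresholds."""
    return coverage >= mcov and identity >= mpident


# Hypothetical hits: (marker name, alignment coverage %, alignment identity %)
hits = [
    ("marker_a", 80.0, 40.0),  # passes both thresholds
    ("marker_b", 45.0, 90.0),  # coverage below 50
    ("marker_c", 60.0, 20.0),  # identity below 25
]

kept = [name for name, cov, ident in hits if passes_marker_filter(cov, ident)]
print(kept)
```

Raising either default would make the marker search stricter; lowering it would admit more distant hits.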
Updates:
- Fixed a bug where --task tree incorrectly combined DNA and protein sequences in one file
- Fixed some typos in the scripts
Updates:
- All os.system() calls are replaced by subprocess.run(), which provides standard error output and returns a non-zero exit code if one of the calls fails.
- Added short options for --dbdir (-d) and --outpth (-o)
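The difference matters because subprocess.run() exposes both the exit code and the captured standard error of the external program. The sketch below shows the general pattern under stated assumptions: run_tool is a hypothetical helper, and the failing command is a stand-in built from the Python interpreter so that the example runs without diamond or blast+ installed. It is not the PhaBOX code itself.

```python
import subprocess
import sys


def run_tool(cmd):
    """Run an external command and return (ok, stderr_text).

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero exit code signals failure; stderr carries the reason.
        return False, result.stderr.strip()
    return True, ""


# Stand-in for an external program that fails with exit code 3.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('alignment failed'); sys.exit(3)",
]
ok, err = run_tool(failing_cmd)
if not ok:
    print(f"external call failed: {err}")
```

A caller can stop the pipeline as soon as ok is False instead of continuing with missing intermediate files.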
Updates:
- PhaVIP now provides an additional output, phavip_prediction.csv. A detailed explanation of this file can be found in PhaVIP outputs.
- The protein annotation file gene_annotation.tsv now provides the alignment identity and coverage as new columns.
- Fixed a potential issue when building the phylogenetic tree (FastTree) in the tree task. Users should re-download the PhaBOX v2 database if they would like to run the tree task. Download link
- Fixed a typo in the PhaTYP program.
Updates:
- End_to_end mode now only makes predictions on the predicted viruses. For low-confidence viruses and non-viruses with a flag lower than the viral score threshold, we provide a file named uncertain_sequences_for_contamination_task.fa. We suggest that users run the contamination task to check the quality of these sequences first.
- The contamination task now provides four more FASTA files: low_quality_virus.fa, medium_quality_virus.fa, high_quality_virus.fa, and a cropped proviruses.fa. Users can use them to re-run other tasks.
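One way to prepare such a re-run is to merge the quality tiers of interest into a single FASTA file. The sketch below is only an illustration: the file names come from the list above, but merge_fasta, the output name selected_viruses.fa, and the directory layout are assumptions, and the demo writes tiny placeholder files into a temporary directory.

```python
import tempfile
from pathlib import Path

# File names as listed above; which tiers to keep is the user's choice.
QUALITY_FILES = ["high_quality_virus.fa", "medium_quality_virus.fa"]


def merge_fasta(src_dir, names, dest):
    """Concatenate the named FASTA files found in src_dir into dest.

    Missing files are skipped. Returns the number of records written.
    Hypothetical helper for illustration only.
    """
    count = 0
    with open(dest, "w") as out:
        for name in names:
            path = Path(src_dir) / name
            if not path.exists():
                continue
            for line in path.read_text().splitlines():
                if line.strip():
                    out.write(line + "\n")
                    if line.startswith(">"):
                        count += 1
    return count


# Demo with placeholder sequences in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "high_quality_virus.fa").write_text(">c1\nACGT\n")
    Path(tmp, "medium_quality_virus.fa").write_text(">c2\nGGCC\n")
    merged = Path(tmp, "selected_viruses.fa")
    n_records = merge_fasta(tmp, QUALITY_FILES, merged)
    merged_text = merged.read_text()

print(n_records)
```

The merged file can then be supplied as the input to another task.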
Updates:
- Integrated PhaVIP into phabox2 to provide more detailed protein annotations. In this version, PhaVIP is called automatically when running end_to_end, phamer, phatyp, phagcn, and cherry. The output is named gene_annotation.tsv in the xxx_supplementary folder.
- End_to_end/PhaMer now provides an additional file named uncertain_sequences_for_contamination_task.fa, and phamer_prediction.tsv suggests that the user run the contamination task to check the quality of these sequences (probably proviruses or novel viruses).
- CHERRY now provides a full lineage for the host in either the NCBI or GTDB version. However, because CHERRY bases its predictions on sequences from NCBI, some hosts have no corresponding lineage in GTDB.
- CHERRY now assigns a score of 0, rather than 'nan', to unpredicted hosts.
- Fixed an issue where PhaTYP might not output anything when there was no alignment result for the input sequence.
- Fixed a possible problem when the length of the input sequence is equal to the filtering threshold.
- Revised some typos in the help documents.
Updates:
- A vOTU task is added for vOTU grouping.
- A tree task is added for phylogenetic tree construction.
- Please check the Options for detailed information.