This is probably not a bug and maybe also documented somewhere, but I could not find any information about it.
I built a custom ARB database of a subset of the sequences from SILVA release 132 with the following command (after uncompressing):
sina -i SILVA_132_SSURef_Nr99_tax_silva.fasta -o custom_arb_database.arb --prealigned
As a result, the created ARB database has no taxonomy fields.
custom ARB database
official ARB database
I wanted to classify my sequences afterwards with the custom database, but since the field tax_slv does not exist, this results in an empty file. However, if I choose full_name as the LCA field, I get results, but I do not get the entire taxonomic path.
This is the result for one entry with tax_slv (and other tax_* fields) with the official ARB database
The option to generate an ARB file on the fly was meant to allow people unfamiliar with ARB to quickly generate a file SINA can use as a reference. The FASTA file is parsed as >$ID $DESCRIPTION, with $ID mapped to acc and $DESCRIPTION mapped to full_name. That the SILVA FASTA files have $DESCRIPTION == tax_slv is just happenstance, and nothing SINA would know. Allowing people to customise this is a bit beyond what SINA is meant to do.
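To illustrate the >$ID $DESCRIPTION split described above, here is a minimal Python sketch. It is not SINA's actual code: the function name `parse_fasta_header` and the example header are made up for illustration, and the only assumption is that the header is split at the first run of whitespace.

```python
def parse_fasta_header(line):
    """Split a FASTA header into ID and description at the first whitespace.

    Hypothetical helper: it mirrors the mapping described in the thread
    ($ID -> acc, $DESCRIPTION -> full_name), not SINA's real implementation.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Drop the leading '>' and surrounding whitespace, then split once.
    parts = line[1:].strip().split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return {"acc": seq_id, "full_name": description}


# Made-up header in the style of a SILVA FASTA entry.
header = ">AB000001.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria"
fields = parse_fasta_header(header)
print(fields["acc"])
print(fields["full_name"])
```

Under this reading, everything after the ID ends up in a single full_name string, so no separate tax_slv field is ever created from the FASTA header.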
So in answer to 1: To create a custom ARB database, use ARB. You can start from a FASTA and import any fields you might like, split/copy parts of the FASTA header as needed, even add your own "import filter" to parse your type of FASTA header correctly.
In answer to 2: I don't know. Try with --copy-fields full_name, to see what the original path was. Since it works with the SILVA database, but does not work with your custom database, it must be the format of the field. Feel free to post a (small) example ARB database here, and I'll have a look at whether there is something improvable on SINA's side that doesn't impact other use cases.
With the official ARB database (tax_slv and other tax_* fields):

# sina command
sina \
  --in sequences.fasta \
  --out sina.fasta \
  --threads 36 \
  --db SILVA_132_SSURef_Nr99_tax_silva.arb \
  --fs-min 2 \
  --fs-msc 0.3 \
  --fs-full-len 500 \
  --search-min-sim 0.5 \
  --search \
  --search-db SILVA_132_SSURef_Nr99_tax_silva.arb \
  --search-max-result 1 \
  --lca-fields tax_slv,tax_embl,tax_ltp \
  --lca-quorum 0.3 \
  --meta-fmt \
  csv
And here is the same entry with the full_name field of the custom database:

# sina command
sina \
  --in sequences.fasta \
  --out sina.fasta \
  --threads 36 \
  --db custom_arb_database.arb \
  --fs-min 2 \
  --fs-msc 0.3 \
  --fs-full-len 500 \
  --search-min-sim 0.5 \
  --search \
  --search-db custom_arb_database.arb \
  --search-max-result 1 \
  --lca-fields full_name \
  --lca-quorum 0.3 \
  --meta-fmt \
  csv

TRINITY_DN279_c1_g1_i5,0,0,,54,2020-11-09 15:48:47,len=503 path=[1:0-102 4:103-274 18:275-285 19:286-327 20:328-502],Ipomoea nil (Japanese morning glory);,BDFN01001194.1.11177.12965~0.559 ,turn-check disabled
I have no idea why the taxonomy looks so different, but what surprises me more is that there is no taxonomic path here.
So, long introduction, my question is:
full_name as field in my custom ARB database?

Thank you very much for your help!