mag: add pipeline compatible mini test database for upcoming CAT_pack #1408

jfy133 · 2024-11-28T15:22:55Z

No description provided.

prototaxites · 2024-11-28T15:40:30Z

README.md

+## Generate database using CAT_pack prepare
+CAT_pack prepare --db_fasta input_files/sequence_fixedheaders.txt --names input_files/names_reduced.dmp --nodes  input_files/nodes_reduced.dmp --acc2tax input_files/accession2taxid_reduced.dmp --db_dir test/
+
+## Test using uncompressed contigs from metaspades assembly (note: --no_stars was required for some reason but seesms to only occur when we have a single genome in there possible..)


Will this be a problem when we use it for testing in #mag? Or is it OK to just add `--no_stars` to the test config?

The latter as far as I can tell :)

The flag was added for this purpose according to GitHub

prototaxites

LGTM!

(It might be good at some point to add details to the README for the provenance of the test datasets...!)

jfy133 · 2024-11-28T17:10:57Z

> LGTM!
>
> (It might be good at some point to add details to the README for the provenance of the test datasets...!)

I agree but would need @HadrienG @skrakau @d4straub to do some data archaeology for us 😅

Although I was speaking with @muabnezor about the long-read-only support he is adding to mag, and we were wondering whether it would be worth replacing the test data now with something less mysterious that has paired short- and long-read data, before we implement nf-test...

d4straub

I agree that we should document where the data comes from. I'm going to speak with Sabrina about that.

Regarding whether that is related to nf-test: I don't think that matters at all, because no normal user runs the tests with nf-test on old versions (they usually use `-profile test` without nf-test), and devs are probably fine with a past release failing under nf-test because they are usually only interested in the dev branch.

d4straub · 2024-11-29T07:09:57Z

README.md

+curl "https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&save=file&log$=seqview&db=nuccore&report=fasta_cds_aa&id=1992822979&extrafeat=null&conwithfeat=on&hide-cdd=on&ncbi_phid=CE8C15326D6BB8C10000000006490560" -o sequence.txt
+
+## use my scripts to filter NCBI nodes/names/acc2taxid files to just a given taxid (basically fancy iterative greps)
+bash ~/bin/taxdmp_filter.sh 817 ## for nodes/naames


Suggested change

-bash ~/bin/taxdmp_filter.sh 817 ## for nodes/naames
+bash ~/bin/taxdmp_filter.sh 817 ## for nodes/names
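
The script `~/bin/taxdmp_filter.sh` is not part of this PR, so the following is only a rough sketch of what such a filter could look like for `nodes.dmp`, not the script actually used. It assumes the standard NCBI taxdump layout (fields separated by a tab, a pipe, and a tab) and keeps only the requested taxid and its ancestors. The function names `lineage_taxids` and `filter_nodes` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real taxdmp_filter.sh from this PR.
# Assumes nodes.dmp rows look like: tax_id<TAB>|<TAB>parent_tax_id<TAB>|<TAB>rank...

# lineage_taxids NODES_FILE TAXID
# Print TAXID followed by each of its ancestors, one per line, up to the root.
lineage_taxids() {
    awk -F '\t[|]\t' -v start="$2" '
        { parent[$1] = $2 }                  # remember the parent of every node
        END {
            id = start
            while (id in parent) {
                print id
                if (parent[id] == id) break  # the root node is its own parent
                id = parent[id]
            }
        }' "$1"
}

# filter_nodes NODES_FILE TAXID
# Print only the rows of NODES_FILE whose tax_id is on the lineage of TAXID.
filter_nodes() {
    lineage_taxids "$1" "$2" |
        awk -F '\t[|]\t' 'NR == FNR { keep[$1]; next } $1 in keep' - "$1"
}

# Example call (paths are placeholders):
#   filter_nodes nodes.dmp 817 > nodes_reduced.dmp
```

The same list of taxids could then be used to reduce `names.dmp` and the accession-to-taxid table, although the column layout of those files differs.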

d4straub · 2024-11-29T07:10:53Z

README.md

+sed 's/lcl|//g;s/_/ /2' sequence.txt > sequence_fixedheaders.txt
+
+## Generate database using CAT_pack prepare
+CAT_pack prepare --db_fasta input_files/sequence_fixedheaders.txt --names input_files/names_reduced.dmp --nodes  input_files/nodes_reduced.dmp --acc2tax input_files/accession2taxid_reduced.dmp --db_dir test/


Suggested change

-CAT_pack prepare --db_fasta input_files/sequence_fixedheaders.txt --names input_files/names_reduced.dmp --nodes  input_files/nodes_reduced.dmp --acc2tax input_files/accession2taxid_reduced.dmp --db_dir test/
+CAT_pack prepare --db_fasta input_files/sequence_fixedheaders.txt --names input_files/names_reduced.dmp --nodes input_files/nodes_reduced.dmp --acc2tax input_files/accession2taxid_reduced.dmp --db_dir test/
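
To make the `sed` step in the hunk above easier to review, here is a small sketch of what the expression does to a FASTA header. The header below is made up for illustration and is not taken from the real `sequence.txt`; the wrapper function `fix_headers` is likewise hypothetical. The expression removes every `lcl|` prefix and replaces the second underscore on each line with a space, which separates the leading accession from the rest of the header.

```shell
#!/usr/bin/env bash
# Sketch of the header clean-up from the README hunk, wrapped in a function.
# fix_headers is a hypothetical name; the sed expression is the one from the diff.
fix_headers() {
    # s/lcl|//g  : drop the "lcl|" prefix wherever it appears
    # s/_/ /2    : turn only the 2nd underscore on the line into a space
    sed 's/lcl|//g;s/_/ /2'
}

# Made-up header and sequence line, for illustration only.
printf '%s\n' '>lcl|NZ_XX000001.1_prot_WP_000000001.1_1 [gene=abc]' 'MKT' | fix_headers
# prints:
# >NZ_XX000001.1 prot_WP_000000001.1_1 [gene=abc]
# MKT
```

Note that the expression runs on every line, not just headers. That is harmless for protein sequence lines, which contain no underscores, but a header whose accession has no underscore of its own would be split at a different position.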

d4straub · 2024-11-29T07:13:06Z

README.md

+## Generate database using CAT_pack prepare
+CAT_pack prepare --db_fasta input_files/sequence_fixedheaders.txt --names input_files/names_reduced.dmp --nodes  input_files/nodes_reduced.dmp --acc2tax input_files/accession2taxid_reduced.dmp --db_dir test/
+
+## Test using uncompressed contigs from metaspades assembly (note: --no_stars was required for some reason but seesms to only occur when we have a single genome in there possible..)


AFAIK MaxBin2 will force a split of the contigs; it just assumes there are 2+ genomes (at least that was the case a few years ago). But that's fine, I think.

d4straub · 2024-11-30T19:10:39Z

About the origin of the test data: it seems to be from @HadrienG, and there might be a discussion on Slack about it.

jfy133 added 6 commits November 28, 2024 16:16

Add a tiny CAT_pack database

e0806c7

Rename minigut_cat.tar.gz to databases/minigut_cat.tar.gz

a0e6614

Update README.md

af109d9

Rename minigut_cat.tar.gz to minigut_cat.tar.gz

046a3b6

Update README.md

38bd3dc

Update README.md

f194bf4

prototaxites reviewed Nov 28, 2024

prototaxites approved these changes Nov 28, 2024

d4straub approved these changes Nov 29, 2024
