Training doesn't like sig files #67
Comments
The title here is a bit confusing relative to the error: does YACHT not like zip files, or does it not like sig files? I was able to reproduce the error and fixed the condition on the accepted file type, so both .sig.zip and .sig files are now accepted. However, I now get the following error at the step where the file is extracted, where a zip file is expected. Is this necessary in YACHT? I'm unsure how to move on from here.
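For illustration, a relaxed file-type condition of the kind described above could look roughly like this. It is a minimal sketch and not the actual check in make_training_data_from_sketches.py: the function name `reference_kind` and its return values are hypothetical, and the only accepted endings assumed here are the two mentioned in this thread (.sig.zip and .sig).

```python
from pathlib import Path


def reference_kind(ref_file: str) -> str:
    """Classify a reference file by its ending (hypothetical helper).

    Returns 'zip' for a .sig.zip file and 'sig' for a plain .sig file.
    Anything else is rejected with a ValueError.
    """
    name = Path(ref_file).name.lower()
    if name.endswith(".sig.zip"):
        return "zip"
    if name.endswith(".sig"):
        return "sig"
    raise ValueError(
        f"Reference database file {ref_file} must end in .sig.zip or .sig."
    )


if __name__ == "__main__":
    print(reference_kind("ref.sig.zip"))
    print(reference_kind("97_Silva_111_rep_set_euk.fasta.sig"))
```

Returning the kind instead of a plain boolean lets the caller decide later whether an extraction step is needed at all.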
TODO:
I used the demo files to test.
(1) The signature directory is not being created when running on a sig file.
Could someone check if this works on their end? I'll just start from a blank slate if it is not an issue for others.
Issue 1
Sketch ref genomes for a sig file
Running yacht train on a sig file
Returns an error where the signature directory is not being created.
In YACHT, when a sig.zip file is being trained, the signatures are extracted into a directory called Currently, YACHT is unable to extract from a Does a I'm unsure what type of condition that would be because if the signature file has a single signature, do we tell yacht to NOT collect the signatures from
Issue 2
I tested yacht train on a sig.zip file
Works fine for the ref.sig.zip until it reaches
It looks like
Refactor collect_signatures_info to work on these other file types: .sig, .lca, .sqldb, etc.
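One low-effort way to approach this, sketched under assumptions: stage every reference file into a directory first, so that the later collection step sees the same layout whether the input was a zip archive or a plain file. The helper `stage_signatures` and its directory layout are hypothetical and are not part of YACHT; the sketch uses only the Python standard library and does not parse the signatures themselves.

```python
import shutil
import zipfile
from pathlib import Path


def stage_signatures(ref_file: str, out_dir: str) -> list[Path]:
    """Place the contents of ref_file under out_dir (hypothetical helper).

    A zip archive is extracted; any other file is copied unchanged.
    The directory is created in both cases, so a plain .sig input no
    longer leaves the signature directory missing.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(ref_file):
        with zipfile.ZipFile(ref_file) as zf:
            zf.extractall(out)
    else:
        shutil.copy(ref_file, out / Path(ref_file).name)
    # Return every staged file so the caller can iterate over them.
    return sorted(p for p in out.rglob("*") if p.is_file())
```

Formats that are not simple files of signatures (for example a database) would still need their own handling; this sketch only covers the zip versus plain-file split raised in this issue.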
Hi @dkoslicki, this does not seem to be a minor issue. To make YACHT handle different formats, much of the YACHT code and test code needs to be modified. If this isn't urgent, I will address it when I am available later. For this to-do list, @dkoslicki, do you want a .sig file to contain a single genome, multiple genomes, or either? I am not sure if Currently, the accepted
@chunyuma let's take the practical approach: if Regarding Re: the So long and short of it, no need to do significant refactoring, and not super high priority.
Hi @dkoslicki, thanks for the response. Regarding
@chunyuma Here again, it depends on whether you are talking about the reference or the sample. If the sample, then the constraint you mentioned should be there: a sample is expected to be a single sketch (if given two with different k-sizes, how would YACHT know which to choose?).
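The single-sketch constraint on a sample could be enforced along these lines. This is only a sketch: the `Sketch` record and `require_single_sketch` are hypothetical stand-ins for whatever YACHT gets back after loading a sample, and no real loading is done here.

```python
from dataclasses import dataclass


@dataclass
class Sketch:
    """Hypothetical stand-in for one loaded signature."""
    name: str
    ksize: int


def require_single_sketch(sketches: list[Sketch]) -> Sketch:
    """Return the only sketch in a sample, or raise if there is not exactly one."""
    if len(sketches) != 1:
        ksizes = sorted({s.ksize for s in sketches})
        raise ValueError(
            f"Expected exactly one sketch in the sample, "
            f"found {len(sketches)} (k-sizes: {ksizes})."
        )
    return sketches[0]
```

Failing early with the k-sizes in the message tells the user which sketches collided, instead of YACHT silently picking one.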
Traceback (most recent call last):
File "/scratch/temp/Yacht_issue_12/YACHT/make_training_data_from_sketches.py", line 44, in
raise ValueError(f"Reference database file {ref_file} is not a zip file. Please a Sourmash signature database file with Zipfile format.")
ValueError: Reference database file /scratch/temp/Yacht_issue_12/97_Silva_111_rep_set_euk.fasta.sig is not a zip file. Please a Sourmash signature database file with Zipfile format.