I would like to use pyrodigal(-gv) to predict ORFs in eukaryotic viruses, and I find that it generally works well. However, in some cases pyrodigal overestimates the gene size.
For example, chryso_test.txt is a sequence with an ORF that has a start codon at position 20 but no stop codon. If I run pyrodigal with `-c`, this ORF is not found (as expected), but without `-c`, pyrodigal places the start codon outside the sequence.
Would it be possible to add a feature that lets the user require the start codon to be inside the sequence, without also requiring the stop codon to be inside it, as `-c` does? Or is there a reason I'm overlooking why this behaviour would not be desirable?
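To make the distinction concrete, here is a toy forward-strand ORF scan in plain Python with two independent switches, `closed_start` and `closed_stop`. It is not pyrodigal's algorithm and everything in it is an illustrative assumption: only ATG counts as a start codon, only the forward strand is scanned, and when the 5' end is allowed to stay open it simply takes the longest ORF, whereas Prodigal picks starts by score. The `Orf` fields are named after the `partial_begin`/`partial_end` attributes that pyrodigal's genes expose. On a sequence with an ATG at position 20 and no downstream stop, `closed_start=True, closed_stop=False` keeps the ORF starting at position 20, while closing both ends drops it, like `-c`.

```python
from __future__ import annotations

from dataclasses import dataclass

STARTS = {"ATG"}                 # assumption: ATG is the only start codon
STOPS = {"TAA", "TAG", "TGA"}    # standard-code stop codons


@dataclass
class Orf:
    begin: int            # 1-based first base
    end: int              # 1-based last base (inclusive)
    partial_begin: bool   # True if the ORF runs off the 5' edge (no start codon)
    partial_end: bool     # True if the ORF runs off the 3' edge (no stop codon)


def _emit(orfs, seg_begin, touches_5p, first_atg, end, partial_end, closed_start):
    """Record one ORF for the stop-free segment [seg_begin, end) if allowed."""
    if touches_5p and not closed_start:
        begin, partial_begin = seg_begin, True    # open 5' end: extend to the edge
    elif first_atg is not None:
        begin, partial_begin = first_atg, False   # closed 5' end: start at first ATG
    else:
        return                                    # no acceptable start in this segment
    if end - begin >= 6:                          # keep ORFs of at least two codons
        orfs.append(Orf(begin + 1, end, partial_begin, partial_end))


def forward_orfs(seq: str, closed_start: bool, closed_stop: bool) -> list[Orf]:
    """Toy scan of the three forward frames with independent edge switches."""
    seq = seq.upper()
    orfs: list[Orf] = []
    for frame in range(3):
        seg_begin, touches_5p, first_atg = frame, True, None
        pos = frame
        while pos + 3 <= len(seq):
            codon = seq[pos:pos + 3]
            if codon in STARTS and first_atg is None:
                first_atg = pos
            elif codon in STOPS:
                # The segment ends with this stop codon (included in the ORF).
                _emit(orfs, seg_begin, touches_5p, first_atg, pos + 3, False, closed_start)
                seg_begin, touches_5p, first_atg = pos + 3, False, None
            pos += 3
        # Whatever follows the last stop codon runs off the 3' edge.
        if not closed_stop:
            _emit(orfs, seg_begin, touches_5p, first_atg, pos, True, closed_start)
    return orfs


seq = "C" * 19 + "ATG" + "GCC" * 5   # ATG at position 20, no stop codon in frame
print(forward_orfs(seq, closed_start=True, closed_stop=False))
```

Filtering pyrodigal's existing output on `partial_begin` afterwards is not equivalent: it would discard the 5'-partial prediction instead of replacing it with the ORF that starts at position 20, which is why the option has to act inside the gene finder itself.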
For people with a similar problem: while looking for a solution, I found that commenting out lines 1991 to 2000 and lines 2086 to 2095 of pyrodigal/src/pyrodigal/lib.pyx (at commit 48bea81) and rebuilding the package gives me the output I want.
In the meantime I gave it a go myself (see my forked repository), so if you're interested I can draft a pull request. There are probably cleaner ways to implement this, but they would involve breaking changes, which I wanted to avoid.
In summary, the `closed` argument of the `GeneFinder` class now accepts either a list of two booleans, which is unpacked into `closed_start` and `closed_stop`, or a single boolean, in which case both `closed_start` and `closed_stop` are set to that value (so existing pipelines do not break).
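A minimal sketch of that backwards-compatible parsing, in case it helps discussion. `parse_closed` is a hypothetical helper name; this is not the code from the fork or from pyrodigal.

```python
from typing import Sequence, Tuple, Union


def parse_closed(closed: Union[bool, Sequence[bool]]) -> Tuple[bool, bool]:
    """Return (closed_start, closed_stop) from a bool or a list of bools."""
    if isinstance(closed, bool):
        # Old-style call: one value applies to both ends.
        return closed, closed
    values = [bool(v) for v in closed]
    if len(values) == 1:
        # A single-element list is treated like a plain bool.
        return values[0], values[0]
    if len(values) != 2:
        raise ValueError(f"closed must be a bool or two bools, got {closed!r}")
    closed_start, closed_stop = values
    return closed_start, closed_stop


print(parse_closed([True, False]))  # start must be inside, 3' end may stay open
```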
For pyrodigal's CLI, `-c` now takes a string: `none` (default, no closed ends), `both` (both ends closed) or `start` (only the start end closed). Passing `-c` without a value is equivalent to `both`.
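These CLI semantics map naturally onto `argparse`'s `nargs="?"` together with `const` and `default`. This is only a sketch: the `dest="closed"` name and the `CLOSED_MODES` table are assumptions, and pyrodigal's actual CLI code may be organised differently.

```python
import argparse

# Assumed mapping from the proposed -c values to (closed_start, closed_stop).
CLOSED_MODES = {
    "none": (False, False),   # default: genes may run off either edge
    "both": (True, True),     # same behaviour as the current -c flag
    "start": (True, False),   # start must be inside; the 3' end may stay open
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyrodigal")
    parser.add_argument(
        "-c",
        dest="closed",
        nargs="?",
        const="both",           # bare "-c" keeps its old meaning
        default="none",         # no -c at all: both ends open
        choices=sorted(CLOSED_MODES),
        help="which sequence edges are closed (none, both, start)",
    )
    return parser


args = build_parser().parse_args(["-c", "start"])
print(CLOSED_MODES[args.closed])
```

One side effect of `nargs="?"` is that a bare `-c` followed directly by a non-option argument would swallow that argument as the mode and then reject it against `choices`, so this works best when the other inputs are passed through flags.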
That sounds like a reasonable request, especially for viral genomes. I need to think through some of the implications for the code, and also whether this should be folded together with #65 into a larger "how to handle circular topologies" question.
Sounds great! In my forked repository, all unit tests pass with my modified version (except one in test_nodes.py, because I changed the arguments of the extract function), but there may of course be hidden issues that I'm not able to spot.