Biopython update #157
base: master
I commented on some apparent mistakes in the `opencadd/structure/superposition/sequences.py` file.
More importantly, however, the `fasta2select` function (i.e. the only part of the code being changed in this PR) has no explicit unit tests. So even after these comments are resolved, we can't be sure the function is correct until we write some explicit test cases.
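To illustrate what such an explicit test case could look like, here is a minimal sketch. It does not call OpenCADD: `aligned_resid_pairs` is a hypothetical, simplified pure-Python model of the column-pairing step that `fasta2select` presumably performs (pairing residue IDs only at alignment columns where neither sequence has a gap), and the expected pairs are computed by hand. A real test would instead feed a small FASTA alignment to `fasta2select` and compare the returned selection strings.

```python
from typing import List, Sequence, Tuple


def aligned_resid_pairs(
    ref_aln: str,
    mob_aln: str,
    ref_resids: Sequence[int],
    mob_resids: Sequence[int],
    gap: str = "-",
) -> List[Tuple[int, int]]:
    """Hypothetical model: pair residue IDs at columns where neither sequence has a gap."""
    if len(ref_aln) != len(mob_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    i_ref = i_mob = 0  # index of the next residue (non-gap symbol) in each sequence
    for a, b in zip(ref_aln, mob_aln):
        if a != gap and b != gap:
            pairs.append((ref_resids[i_ref], mob_resids[i_mob]))
        # Advance a residue counter only when that sequence consumed a residue.
        if a != gap:
            i_ref += 1
        if b != gap:
            i_mob += 1
    return pairs


def test_gapped_columns_are_skipped():
    # Hand-computed expectation for a toy alignment:
    # columns 1 and 2 each contain a gap, so they produce no pair.
    pairs = aligned_resid_pairs("AC-GT", "A-TGT", [10, 11, 12, 13], [1, 2, 3, 4])
    assert pairs == [(10, 1), (12, 3), (13, 4)]
```

Hand-computing the expected output for a tiny alignment like this is what makes the test explicit: any off-by-one in the residue counters around gaps shows up immediately.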
@@ -190,8 +211,8 @@ def resid(nseq, ipos, t=t, s=s):
res_list = [] # collect individual selection string

# should be the same for both seqs
-GAP = alignment[0].seq.alphabet.gap_char
-if GAP != alignment[1].seq.alphabet.gap_char:
+GAP = find_gap_character(a.seq)
Shouldn't this (line 214) be
GAP = find_gap_character(alignment[0].seq)
instead? `a` is the loop variable from the loop above, so here it simply holds the value from the last iteration, which seems arbitrary. Also, the original lines were:
GAP = alignment[0].seq.alphabet.gap_char
if GAP != alignment[1].seq.alphabet.gap_char:
and the second line has been changed to
if GAP != find_gap_character(alignment[1].seq):
so it looks like the first line should likewise be
GAP = find_gap_character(alignment[0].seq)
instead of
GAP = find_gap_character(a.seq)
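A minimal sketch of the logic suggested above, taking the two aligned sequences directly (`alignment[0].seq` and `alignment[1].seq` in the PR). The body of `find_gap_character` here is a hypothetical stand-in: the PR's real implementation is not visible in this review beyond what appears to be its final `return ''`. The `ValueError` raised on a mismatch is likewise an assumption. The point is only that `GAP` is derived from both alignment entries explicitly, not from the leftover loop variable `a`.

```python
# Hypothetical stand-in for the PR's find_gap_character(); the real body is not
# shown in this review. Returns the first standard gap symbol found, else ''.
STANDARD_GAP_CHARS = ("-", ".")


def find_gap_character(seq):
    """Return the first standard gap character present in seq, or '' if none."""
    for gap in STANDARD_GAP_CHARS:
        if gap in seq:  # substring test works for str (and for Bio.Seq.Seq)
            return gap
    return ""


def common_gap_character(seq0, seq1):
    """Determine GAP from both aligned sequences, not from a leftover loop variable."""
    gap = find_gap_character(seq0)  # i.e. alignment[0].seq, not a.seq
    if gap != find_gap_character(seq1):  # i.e. alignment[1].seq
        # Assumed error handling; the PR's branch body is not shown here.
        raise ValueError("Different gap characters in the two aligned sequences.")
    return gap
```

With this stand-in, a sequence that contains no gaps at all yields `''`, so aligning a gapless sequence against a gapped one would trip the consistency check. It may be worth checking whether the PR's `find_gap_character` has the same edge case.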
@@ -175,7 +195,8 @@ def resid_factory(alignment, seq2resids):
t = np.zeros((nseq, alignment.get_alignment_length()), dtype=int)
s = np.zeros((nseq, alignment.get_alignment_length()), dtype=object)
for iseq, a in enumerate(alignment):
-GAP = a.seq.alphabet.gap_char
+print(a.seq)
Leftover from debugging?
@@ -204,4 +225,4 @@ def resid(nseq, ipos, t=t, s=s):

ref_selection = " or ".join(sel[0])
target_selection = " or ".join(sel[1])
-return {"reference": ref_selection, "mobile": target_selection}
+return {"reference": ref_selection, "mobile": target_selection}
Add an end-of-file newline.
return ''
#raise ValueError("No standard gap character found!")


def fasta2select(
Shouldn't the gap characters be removed explicitly, now that Biopython no longer recognizes them?
From the Biopython documentation:
Another case where the alphabet was used was in declaring the gap character, by default `-` in the various Biopython sequence and alignment parsers. If you are using a different character, you will need to pass this to the `Seq` object `.replace()` method explicitly now:
# Old style
from Bio.Alphabet import generic_dna, Gapped
from Bio.Seq import Seq
my_dna = Seq("ACGT=TT", Gapped(generic_dna, "="))
print(my_dna.ungap())
# New style
from Bio.Seq import Seq
my_dna = Seq("ACGT=TT")
print(my_dna.replace("=", ""))
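Following the documentation quoted above, explicit gap removal could look like the sketch below. `strip_gaps` is a hypothetical helper, not part of OpenCADD or Biopython; it relies only on `.replace()`, which plain strings have and which the quoted docs recommend for `Seq` objects. The default gap set `("-",)` is an assumption matching the parsers' default `-`.

```python
def strip_gaps(seq, gap_chars=("-",)):
    """Return seq with every character in gap_chars removed (hypothetical helper).

    Only .replace() is used, so this works on str and, per the quoted
    Biopython docs, on Bio.Seq.Seq objects as well.
    """
    for gap in gap_chars:
        if gap:  # skip '' (e.g. when no gap character was detected)
            seq = seq.replace(gap, "")
    return seq


# Mirrors the "new style" example from the docs, with '=' as the gap symbol.
print(strip_gaps("ACGT=TT", gap_chars=("=",)))  # ACGTTT
```

Passing the gap symbols explicitly makes the dependency visible at the call site, which is exactly the information the removed alphabet used to carry implicitly.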
@pipaj97 can you have a look at these comments, please?
@dominiquesydow can you please have a look at the CI failures related to the KLIFS package:
Hi @AndreaVolkamer, I fixed the following two items:
1. A future deprecation warning for
We are now doing this instead:
2. Two instances of
In these two cases, we pull all interactions and all conformations. As part of that query, we first pull all structure IDs from KLIFS and then build a single URL containing all of them to retrieve the interactions and conformations. This URL is naturally too long (I guess they rightfully added a limit), so I am now chunking over the list of structure IDs instead.
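The chunking approach described above can be sketched as follows. The default chunk size of 500 and the `fetch` callable are hypothetical placeholders: `fetch` stands in for whatever builds the request for one batch of structure IDs and queries KLIFS, and the actual URL length limit is not stated here.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(
    structure_ids: Sequence[int],
    fetch: Callable[[Sequence[int]], List[R]],
    chunk_size: int = 500,  # hypothetical; pick a value that keeps URLs short enough
) -> List[R]:
    """Call `fetch` once per chunk of IDs and concatenate the results.

    `fetch` is a hypothetical placeholder for the code that builds the
    request URL for one batch of structure IDs and queries KLIFS.
    """
    results: List[R] = []
    for chunk in chunked(structure_ids, chunk_size):
        results.extend(fetch(chunk))
    return results
```

Concatenating the per-chunk results keeps the caller's view unchanged (one list covering all IDs), while each individual request stays bounded in size.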
@AndreaVolkamer I did not touch any KLIFS-unrelated issues.
@AndreaVolkamer OK, I could not resist and applied black to the full package. Can someone look into Python 3.12, which seems to fail with
Description
In the latest version of Biopython, some OpenCADD functionality no longer works.
This PR fixes these incompatibilities.
Todos
Notable points that this PR has either accomplished or will accomplish.
Status