Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Biopython update #157

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Biopython update #157

wants to merge 8 commits into from

Conversation

pipaj97
Copy link
Collaborator

@pipaj97 pipaj97 commented Jun 25, 2023

Description

In the latest version of biopython, some functionalities do not work anymore in OpenCADD.
This PR fixes this problem.

Todos

Notable points that this PR has either accomplished or will accomplish.

  • Update sequence functionalities to work with the latest biopython version
  • add a test for the newly implemented gap function
  • Check and solve CI failures

Status

  • Ready to go

@pipaj97 pipaj97 added the enhancement New feature or request label Jun 25, 2023
@pipaj97 pipaj97 requested a review from AAriam June 25, 2023 12:10
@pipaj97 pipaj97 added the module-structure-superposition Concerns opencadd.structure.superposition module label Jun 25, 2023
@AAriam AAriam changed the base branch from dev to master January 30, 2024 09:17
@AAriam AAriam closed this Jan 30, 2024
@AAriam AAriam reopened this Jan 30, 2024
Copy link
Collaborator

@AAriam AAriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some apparent mistakes that I commented on the opencadd/structure/superposition/sequences.py file.

However, more importantly, the fasta2select function (i.e. the only part of the code being changed in this PR) does not have any explicit unit tests. Therefore, even after resolving these comments, we can't be sure of the correctness of the function until we write some explicit test-cases.

@@ -190,8 +211,8 @@ def resid(nseq, ipos, t=t, s=s):
res_list = [] # collect individual selection string

# should be the same for both seqs
GAP = alignment[0].seq.alphabet.gap_char
if GAP != alignment[1].seq.alphabet.gap_char:
GAP = find_gap_character(a.seq)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this (Line 214) be GAP = find_gap_character(alignment[0].seq) instead?
a is a loop variable from above, and is simply equal to the last value of the loop iteration, which seems quite random. Also, the original lines were:

GAP = alignment[0].seq.alphabet.gap_char
if GAP != alignment[1].seq.alphabet.gap_char:

and the second line has changed to

if GAP != find_gap_character(alignment[1].seq):

so it looks like the first line must also be:

GAP = find_gap_character(alignment[0].seq)

instead of:

GAP = find_gap_character(a.seq)

@@ -175,7 +195,8 @@ def resid_factory(alignment, seq2resids):
t = np.zeros((nseq, alignment.get_alignment_length()), dtype=int)
s = np.zeros((nseq, alignment.get_alignment_length()), dtype=object)
for iseq, a in enumerate(alignment):
GAP = a.seq.alphabet.gap_char
print(a.seq)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from debugging?

@@ -204,4 +225,4 @@ def resid(nseq, ipos, t=t, s=s):

ref_selection = " or ".join(sel[0])
target_selection = " or ".join(sel[1])
return {"reference": ref_selection, "mobile": target_selection}
return {"reference": ref_selection, "mobile": target_selection}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add end-of-file newline

return ''
#raise ValueError("No standard gap character found!")


def fasta2select(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the gap characters be removed explicitly now that BioPython doesn't recognize them?

From BioPython Documentation:

Another case where the alphabet was used was in declaring the gap character, by default - in the various Biopython sequence and alignment parsers. If you are using a different character, you will need to pass this to the Seq object .replace() method explicitly now:

# Old style
from Bio.Alphabet import generic_dna, Gapped
from Bio.Seq import Seq

my_dna = Seq("ACGT=TT", Gapped(generic_dna, "="))
print(my_dna.ungap())
# New style
from Bio.Seq import Seq

my_dna = Seq("ACGT=TT")
print(my_dna.replace("=", ""))

@AndreaVolkamer
Copy link
Member

@pipaj97 can you have a look at these comments please?

@AndreaVolkamer
Copy link
Member

@dominiquesydow can you please have a look in the CI failures realted to the kliffs package:

 =========================== short test summary info ============================
  FAILED opencadd/tests/databases/test_klifs_local_remote.py::TestsAllQueries::test_all_interactions - bravado.exception.HTTPURITooLong: 414 Request-URI Too Long: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>414 Request-URI Too Long</title>
  </head><body>
  <h1>Request-URI Too Long</h1>
  <p>The requested URL's length exceeds the capacity
  limit for this server.<br />
  </p>
  <hr>
  <address>Apache/2.4.38 (Debian) Server at klifs.net Port 443</address>
  </body></html>
  FAILED opencadd/tests/databases/test_klifs_local_remote.py::TestsAllQueries::test_all_conformations - bravado.exception.HTTPURITooLong: 414 Request-URI Too Long: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>414 Request-URI Too Long</title>
  </head><body>
  <h1>Request-URI Too Long</h1>
  <p>The requested URL's length exceeds the capacity
  limit for this server.<br />
  </p>
  <hr>
  <address>Apache/2.4.38 (Debian) Server at klifs.net Port 443</address>
  </body></html>

@dominiquesydow
Copy link
Contributor

dominiquesydow commented Feb 25, 2024

Hi @AndreaVolkamer,

I fixed the following two items:

1. A future deprecation warning for pandas, e.g. here:

/Users/dominique/Documents/GitHub/opencadd/opencadd/databases/klifs/core.py:147: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
  The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
  
  For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  
  
    dataframe["ligand_allosteric.expo_id"].replace(0, "-", inplace=True)

We are doing now instead:

dataframe["ligand_allosteric.expo_id"] = dataframe["ligand_allosteric.expo_id"].replace(0, "-")

2. Two instances of HTTPURITooLong errors when querying KLIFS.

In these two cases we are pulling all interactions and all conformations - part of that query is that we pull all structure IDs from KLIFS, to then build a URL with all of them to retrieve interactions and conformations. This URL is naturally too long (I guess they rightfully added a limit). I am now chunking over the list of structure IDs instead.

@codecov-commenter
Copy link

codecov-commenter commented Feb 25, 2024

Codecov Report

Merging #157 (cb3a39f) into master (c76e87c) will increase coverage by 0.10%.
The diff coverage is 91.42%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@dominiquesydow
Copy link
Contributor

@AndreaVolkamer I did not touch any KLIFS-unrelated issues.

@dominiquesydow
Copy link
Contributor

@AndreaVolkamer ok could not resist & applied black to the full package --- can someone look into Python 3.12, which seems to fail with conda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request module-structure-superposition Concerns opencadd.structure.superposition module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants