More consistently use existing charge caching #1115

Merged Jan 15, 2025 (4 commits)

Conversation

@mattwthompson (Member) commented Nov 21, 2024

Description

A few code paths didn't use the improvements from #1066 / #1069, which caused horrible performance in a number of corner cases. (Writing out GROMACS files in the protein-ligand example took 10 minutes (!), compared to a few seconds for OpenMM.)

I'm more than a little confused as to how from_openmm code paths might have been hit in this example - which just uses SMIRNOFF force fields and Interchange.combine - but I could be misremembering which changes actually affected things.
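
To illustrate why bypassing a cache can be this costly, here is a minimal sketch using a toy class. `ToyElectrostatics` and its `rebuilds` counter are hypothetical and are not the real ElectrostaticsCollection; the use of `functools.cached_property` is also an assumption made for brevity. The toy `_get_charges()` rebuilds the whole charge dictionary on every call, while the `charges` property builds it once, so a per-particle lookup through the uncached path does quadratic work.

```python
from functools import cached_property


class ToyElectrostatics:
    """Toy stand-in for a charge collection; not the Interchange class."""

    def __init__(self, n_particles: int):
        self.n_particles = n_particles
        self.rebuilds = 0  # counts how often the full dict is rebuilt

    def _get_charges(self) -> dict[int, float]:
        # Stand-in for an expensive pass over every particle.
        self.rebuilds += 1
        return {index: 0.0 for index in range(self.n_particles)}

    @cached_property
    def charges(self) -> dict[int, float]:
        # Built on first access, then reused on every later access.
        return self._get_charges()


uncached = ToyElectrostatics(1_000)
for index in range(uncached.n_particles):
    uncached._get_charges()[index]  # rebuilds the dict on every iteration

cached = ToyElectrostatics(1_000)
for index in range(cached.n_particles):
    cached.charges[index]  # rebuilds the dict only on the first access

print(uncached.rebuilds, cached.rebuilds)
```

With N particles, the first loop builds N dictionaries of N entries each, while the second builds one.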

Checklist

  • Add tests
  • Lint
  • Update docstrings


codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.47%. Comparing base (6a515c0) to head (9ad8427).
Report is 5 commits behind head on main.


@mattwthompson (Member Author)

Upstream:

============================= slowest 20 durations =============================
474.12s call     examples/protein_ligand/protein_ligand.ipynb::Cell 20
427.49s call     examples/protein_ligand/protein_ligand.ipynb::Cell 22
188.02s call     examples/protein_ligand/protein_ligand.ipynb::Cell 26
95.74s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 14
85.04s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 18
69.02s call     examples/packed_box/packed_box.ipynb::Cell 6
65.37s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 8
62.18s call     examples/protein_ligand/protein_ligand.ipynb::Cell 7
59.11s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 6
32.89s call     examples/lammps/lammps.ipynb::Cell 6
30.41s call     examples/amber/amber.ipynb::Cell 4
30.02s call     examples/host-guest/host_guest.ipynb::Cell 6
27.48s call     examples/protein_ligand/protein_ligand.ipynb::Cell 9
20.64s call     examples/openmm/openmm.ipynb::Cell 5
19.13s call     examples/protein_ligand/protein_ligand.ipynb::Cell 18
18.69s call     examples/openmm/openmm.ipynb::Cell 4
14.53s call     examples/packed_box/packed_box.ipynb::Cell 4
14.51s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 19
12.16s call     examples/openmm/openmm.ipynb::Cell 0
11.10s call     examples/amber/amber.ipynb::Cell 0
================= 119 passed, 3 skipped in 1247.79s (0:20:47) ==================

These changes:

============================= slowest 20 durations =============================
89.78s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 14
64.33s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 18
63.69s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 8
63.53s call     examples/protein_ligand/protein_ligand.ipynb::Cell 7
57.87s call     examples/ligand_in_water/ligand_in_water.ipynb::Cell 6
38.25s call     examples/packed_box/packed_box.ipynb::Cell 6
33.39s call     examples/lammps/lammps.ipynb::Cell 6
30.02s call     examples/host-guest/host_guest.ipynb::Cell 6
29.95s call     examples/protein_ligand/protein_ligand.ipynb::Cell 22
28.00s call     examples/amber/amber.ipynb::Cell 4
26.46s call     examples/protein_ligand/protein_ligand.ipynb::Cell 9
25.93s call     examples/protein_ligand/protein_ligand.ipynb::Cell 26
21.18s call     examples/openmm/openmm.ipynb::Cell 5
19.35s call     examples/openmm/openmm.ipynb::Cell 4
17.60s call     examples/protein_ligand/protein_ligand.ipynb::Cell 18
15.10s call     examples/packed_box/packed_box.ipynb::Cell 4
11.91s call     examples/openmm/openmm.ipynb::Cell 0
11.38s call     examples/lammps/lammps.ipynb::Cell 0
11.18s call     examples/amber/amber.ipynb::Cell 0
9.00s call     examples/packed_box/packed_box.ipynb::Cell 2
================== 119 passed, 3 skipped in 304.15s (0:05:04) ==================
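
For a rough sense of scale, the speedups can be computed directly from the two reports. This is only arithmetic on the numbers copied from the logs above; the dictionary names are illustrative.

```python
# Timings in seconds, copied from the two pytest duration reports.
upstream = {
    "protein_ligand Cell 22": 427.49,
    "protein_ligand Cell 26": 188.02,
    "total": 1247.79,
}
these_changes = {
    "protein_ligand Cell 22": 29.95,
    "protein_ligand Cell 26": 25.93,
    "total": 304.15,
}

# Ratio of old to new wall time for each entry present in both reports.
speedups = {name: upstream[name] / these_changes[name] for name in upstream}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

Cell 20 of the protein-ligand notebook (474.12s upstream) no longer appears among the slowest 20, so it presumably now takes no more than 9.00s, the last entry in the new list.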

@mattwthompson marked this pull request as ready for review November 21, 2024 18:46
@Yoshanuikabundi (Collaborator) left a comment

Wow that's a big speedup!

Am I right in thinking that this has implications for modifying an ElectrostaticsCollection after the charges have been cached? If so, since we demonstrate modifying collections in the examples, we should probably add some sort of invalidate_cache() method and document this behavior both in the collection docstring and in the relevant examples. Modifying a collection according to the example and having that modification only sometimes reflected in outputs would be quite frustrating.
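
A minimal sketch of the concern, assuming a manually managed cache. `ToyChargeCollection` and `set_charge` are hypothetical names for illustration, and `invalidate_cache()` is only the method proposed above, not an existing API.

```python
from typing import Optional


class ToyChargeCollection:
    """Toy collection with a manually managed charge cache (hypothetical)."""

    def __init__(self, charges: dict[int, float]):
        self._source = dict(charges)  # the data a user may edit
        self._cache: Optional[dict[int, float]] = None

    @property
    def charges(self) -> dict[int, float]:
        if self._cache is None:
            # Stand-in for the expensive charge assembly.
            self._cache = dict(self._source)
        return self._cache

    def set_charge(self, index: int, value: float) -> None:
        # Edits the underlying data but does NOT touch the cache.
        self._source[index] = value

    def invalidate_cache(self) -> None:
        # Hypothetical method, named after the suggestion in this comment.
        self._cache = None


collection = ToyChargeCollection({0: -0.8, 1: 0.4, 2: 0.4})
before = collection.charges[0]

collection.set_charge(0, -0.9)
stale = collection.charges[0]  # still the cached value

collection.invalidate_cache()
fresh = collection.charges[0]  # rebuilt from the edited data
```

In this toy, the edit is invisible until the cache is dropped, which is the "only sometimes reflected in outputs" behavior that would need documenting.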

Might it also make sense to generalize this caching code to the Collection class - perhaps have a parameters: dict[TopologyKey | <etc>, Potential] property there? I imagine most of the cost of _get_charges() is the repeated dictionary lookups, though maybe having to do an attribute lookup would just reproduce that. parameters: dict[TopologyKey | <etc>, tuple[int, ...]] would be a nicer, maybe faster API, but it would mean each collection would have to somehow define an ordered set of parameters. I'm not sure if that would be worthwhile.
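
One way to picture the first option is a base class that resolves every key to its potential once and caches the result. The sketch below uses toy stand-ins: `ToyCollection`, `ToyPotential`, the tuple and string keys, and the use of `functools.cached_property` are all illustrative assumptions, not the Interchange API.

```python
from dataclasses import dataclass, field
from functools import cached_property


@dataclass(frozen=True)
class ToyPotential:
    """Toy stand-in for a potential: just named parameter values."""

    parameters: tuple[tuple[str, float], ...]


@dataclass
class ToyCollection:
    # topology key -> potential key, and potential key -> potential
    key_map: dict[tuple[int, ...], str] = field(default_factory=dict)
    potentials: dict[str, ToyPotential] = field(default_factory=dict)

    @cached_property
    def parameters(self) -> dict[tuple[int, ...], ToyPotential]:
        # Resolve the two-step lookup once for every topology key.
        return {
            topology_key: self.potentials[potential_key]
            for topology_key, potential_key in self.key_map.items()
        }


collection = ToyCollection(
    key_map={(0, 1): "b1", (1, 2): "b1"},
    potentials={"b1": ToyPotential((("k", 500.0), ("length", 1.09)))},
)
resolved = collection.parameters  # computed here, reused afterwards
```

The same staleness question applies: anything that edits `key_map` or `potentials` after the first access would need to drop the cached mapping.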

Other than the needed documentation and a few extremely minor nits, this looks excellent. The code seems much easier to read and with less indirection. Great work!

@@ -185,7 +181,7 @@ def _convert(
_partial_charges: dict[int | VirtualSiteKey, float] = dict()

# Indexed by particle (atom or virtual site) indices
-for key, charge in interchange["Electrostatics"]._get_charges().items():
+for key, charge in interchange["Electrostatics"].charges.items():
@Yoshanuikabundi (Collaborator)

Why not leave the electrostatics_collection assignment in place above and then use it here...

@mattwthompson (Member Author)

I don't completely follow - the appeal here is to save those lookups from above and I'm not sure what the charge variable is doing above in that scope

@@ -585,7 +581,7 @@ def _convert_virtual_sites(
residue_index=molecule.atoms[0].residue_index,
residue_name=molecule.atoms[0].residue_name,
charge_group_number=1,
-charge=interchange["Electrostatics"]._get_charges()[virtual_site_key],
+charge=interchange["Electrostatics"].charges[virtual_site_key],
@Yoshanuikabundi (Collaborator)

... and here?

openff/interchange/smirnoff/_nonbonded.py (outdated, resolved)
openff/interchange/smirnoff/_nonbonded.py (resolved)
@mattwthompson (Member Author)

Thanks @Yoshanuikabundi!

@mattwthompson merged commit 9460db5 into main Jan 15, 2025
23 checks passed
@mattwthompson deleted the use-charge-caching branch January 15, 2025 16:37
@mattwthompson added this to the 0.4.1 milestone Jan 15, 2025
2 participants