
Weird behavior with field replacement and method that uses apply_global_index #1101

Open
ikrommyd opened this issue Jun 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

ikrommyd commented Jun 1, 2024

This

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
# define eta_to_use on both collections; for electrons it includes deltaEtaSC
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

# fresh read of the same file; this time eta_to_use is plain eta for electrons
events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

incorrectly prints

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]

even though the Electron eta_to_use field is defined differently in the two cases. The seemingly redundant events["Photon", "eta_to_use"] = events.Photon.eta assignment is part of the reproducer; if you remove it, the output is correct.
The correct output should be:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]
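
One way to see that the two cases are conflated already at graph-construction time (a diagnostic sketch, not part of the original report; it assumes the suspected collision is visible as identical dask collection names, which dask-awkward arrays expose via .name):

from coffea.nanoevents import NanoEventsFactory

url = "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root"

events = NanoEventsFactory.from_root({url: "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
first = events.Photon.matched_electron.eta_to_use

events = NanoEventsFactory.from_root({url: "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
second = events.Photon.matched_electron.eta_to_use

# if the second graph was served from a stale cache entry, the two lazy
# collections carry the same key even though their definitions differ
print(first.name == second.name)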
ikrommyd added the bug label on Jun 1, 2024
ikrommyd commented Jun 1, 2024

If you remove the second read of the events in the reproducer above and run it in a notebook like this:

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

You will get the correct output the first time you run the cell:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

but on every subsequent rerun of the same cell you will get

[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

pfackeldey commented Jan 25, 2025

This seems to be a bug in dask-awkward's HLG (HighLevelGraph) cache:

from coffea.nanoevents import NanoEventsFactory
from dask_awkward.lib.core import dak_cache

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

# clear cache
dak_cache.clear()

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

correctly yields:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

This looks to me like a cache hit where there should not be one (probably in the ak.with_field/__setitem__ call?).
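
As a toy illustration of how this class of bug arises (my own sketch, not dask-awkward's actual code): if the cache key for a with_field-style operation does not include what the field is set to, the second assignment silently reuses the first graph:

cache = {}

def cached_with_field(base_token, field_name, build_graph):
    # BUG: the key ignores the right-hand side of the assignment, so two
    # different definitions of the same field on the same base array collide
    key = (base_token, field_name)
    if key not in cache:
        cache[key] = build_graph()
    return cache[key]

first = cached_with_field("Electron", "eta_to_use", lambda: "eta + deltaEtaSC")
second = cached_with_field("Electron", "eta_to_use", lambda: "eta")
print(first)   # eta + deltaEtaSC
print(second)  # eta + deltaEtaSC  <- stale: the second definition never ran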

@martindurant I think we need to either fix the cache key calculation here or remove the cache. Do you remember how much speedup it brings when building the DAG?
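
One possible direction for the key fix (a sketch under my own assumptions, not a patch against the actual dak_cache code): fold a full dask token of the operation's inputs, including the right-hand side, into the key, e.g. via dask.base.tokenize:

from dask.base import tokenize

def cache_key(op_name, *args, **kwargs):
    # tokenize hashes the actual argument contents, so two with_field calls
    # with different right-hand sides get distinct keys
    return (op_name, tokenize(*args, **kwargs))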

Silent errors like this, where numerical values are wrong but the computation does not fail, are dangerous, especially since users rarely interact with the numerical values directly (unless they explicitly call .compute() and look at them). So we need to be very sure that this does not happen in other cases as well.

martindurant commented

We should fix it. Many metadata and layer objects are created during a HEP analysis run (unless the user knows how to use partition mapping), and the cost adds up. I assume the original PR had numbers, but much has changed; this would not have been done for a 10% improvement.
