Weird behavior with field replacement and method that uses apply_global_index
#1101
If you remove the second events reading in the above reproducer and run it like this in a notebook:
from coffea.nanoevents import NanoEventsFactory
events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])
You will get the correct output the first time you run the cell, and every following time you rerun the same cell you will be getting
This seems to be a bug in dask-awkward with its HLG cache:
from coffea.nanoevents import NanoEventsFactory
from dask_awkward.lib.core import dak_cache
events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])
# clear cache
dak_cache.clear()
events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])
yields correctly:
This looks to me like a cache is hit where it should not be (probably the
@martindurant I think we need to either fix the cache key calculation here or remove the cache. Do you remember how much speedup this brings for building the DAG? These kinds of silent errors, where numerical values are wrong but the computation doesn't fail, are dangerous, especially when users do not directly interact with the numerical values (unless they explicitly call
We should fix it. Many metadata and layer objects are created during a HEP analysis run (unless the user knows how to use partition mapping) and the cost adds up. I assume the original PR had numbers, but much has changed. This would not have been done for a 10% improvement.
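The failure mode discussed above (a cache hit where the key does not capture everything that affects the result) can be illustrated with a minimal pure-Python sketch. This is not dask-awkward's actual cache or key calculation; `WeakKeyCache`, `FullKeyCache`, `identity`, and the sample values are all hypothetical names and numbers made up for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Result = Tuple[float, ...]


def identity(x: float) -> float:
    """Hypothetical stand-in for whatever operation gets cached."""
    return x


class WeakKeyCache:
    """Hypothetical cache whose key ignores the input values (the bug pattern)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Result] = {}

    def apply(self, field: str, values: Sequence[float],
              fn: Callable[[float], float]) -> Result:
        # The key only contains the field name and the function name, so two
        # calls with different `values` collide on the same cache entry.
        key = (field, fn.__name__)
        if key not in self._store:
            self._store[key] = tuple(fn(v) for v in values)
        return self._store[key]


class FullKeyCache:
    """Same cache, but the key also covers the input values."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, Result], Result] = {}

    def apply(self, field: str, values: Sequence[float],
              fn: Callable[[float], float]) -> Result:
        # Including the values in the key means different inputs never collide.
        key = (field, fn.__name__, tuple(values))
        if key not in self._store:
            self._store[key] = tuple(fn(v) for v in values)
        return self._store[key]


# Made-up numbers standing in for Electron.eta and Electron.deltaEtaSC.
eta = (0.5, -1.2, 2.1)
delta_eta_sc = (0.01, -0.02, 0.03)
corrected = tuple(e + d for e, d in zip(eta, delta_eta_sc))

weak = WeakKeyCache()
first_weak = weak.apply("eta_to_use", corrected, identity)
second_weak = weak.apply("eta_to_use", eta, identity)  # stale entry is returned

full = FullKeyCache()
first_full = full.apply("eta_to_use", corrected, identity)
second_full = full.apply("eta_to_use", eta, identity)  # recomputed for new input

print("weak key returns stale result:", second_weak == corrected)
print("full key returns fresh result:", second_full == eta)
```

With the weak key, the second call silently returns the values from the first assignment and nothing fails, which is the same kind of symptom as in the reproducer. Whether the real cause in dask-awkward is an incomplete key or something else is not established by this sketch.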
This
incorrectly prints
while the eta_to_use field is different in both cases for electrons. The seemingly redundant events["Photon", "eta_to_use"] = events.Photon.eta is part of the reproducer. If you remove it, it works.
The correct output should be