
Weird behavior with field replacement and method that uses apply_global_index #1101

Open
ikrommyd opened this issue Jun 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

ikrommyd commented Jun 1, 2024

This

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
# define eta_to_use on both collections; for electrons it includes deltaEtaSC
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

# fresh read of the same file; this time eta_to_use is plain eta for electrons
events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

incorrectly prints

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]

even though the Electron eta_to_use field is defined differently in the two cases. The seemingly redundant events["Photon", "eta_to_use"] = events.Photon.eta assignment is part of the reproducer; if you remove it, the output is correct.
The correct output should be:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]
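
One way to see that the two cases are conflated already at graph-construction time (a diagnostic sketch, not part of the original report; it assumes the suspected collision is visible as identical dask collection names, which dask-awkward arrays expose via .name):

from coffea.nanoevents import NanoEventsFactory

url = "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root"

events = NanoEventsFactory.from_root({url: "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
first = events.Photon.matched_electron.eta_to_use

events = NanoEventsFactory.from_root({url: "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
second = events.Photon.matched_electron.eta_to_use

# if the second graph was served from a stale cache entry, the two lazy
# collections carry the same key even though their definitions differ
print(first.name == second.name)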
ikrommyd added the bug label on Jun 1, 2024
ikrommyd commented Jun 1, 2024

If you remove the second read of the events in the reproducer above and run it in a notebook like this:

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

You will get the correct output the first time you run the cell:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

but on every subsequent rerun of the same cell you will get

[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

pfackeldey commented Jan 25, 2025

This seems to be a bug in dask-awkward's HLG (HighLevelGraph) cache:

from coffea.nanoevents import NanoEventsFactory
from dask_awkward.lib.core import dak_cache

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta + events.Electron.deltaEtaSC
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

# clear cache
dak_cache.clear()

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
events["Photon", "eta_to_use"] = events.Photon.eta
events["Electron", "eta_to_use"] = events.Electron.eta
print(events.Photon.matched_electron.eta_to_use.compute()[-10:])

correctly yields:

[[], [0.776], [], [2.02], [], [-0.00642], [], [], [], []]
[[], [0.785], [], [2.02], [], [-0.0595], [], [], [], []]

This looks to me like a cache hit where there should not be one (probably in the ak.with_field/__setitem__ call?).
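
As a toy illustration of how this class of bug arises (my own sketch, not dask-awkward's actual code): if the cache key for a with_field-style operation does not include what the field is set to, the second assignment silently reuses the first graph:

cache = {}

def cached_with_field(base_token, field_name, build_graph):
    # BUG: the key ignores the right-hand side of the assignment, so two
    # different definitions of the same field on the same base array collide
    key = (base_token, field_name)
    if key not in cache:
        cache[key] = build_graph()
    return cache[key]

first = cached_with_field("Electron", "eta_to_use", lambda: "eta + deltaEtaSC")
second = cached_with_field("Electron", "eta_to_use", lambda: "eta")
print(first)   # eta + deltaEtaSC
print(second)  # eta + deltaEtaSC  <- stale: the second definition never ran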

@martindurant I think we need to either fix the cache key calculation here or remove the cache. Do you remember how much speedup it brings when building the DAG?
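
One possible direction for the key fix (a sketch under my own assumptions, not a patch against the actual dak_cache code): fold a full dask token of the operation's inputs, including the right-hand side, into the key, e.g. via dask.base.tokenize:

from dask.base import tokenize

def cache_key(op_name, *args, **kwargs):
    # tokenize hashes the actual argument contents, so two with_field calls
    # with different right-hand sides get distinct keys
    return (op_name, tokenize(*args, **kwargs))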

Silent errors like this, where numerical values are wrong but the computation does not fail, are dangerous, especially since users rarely interact with the numerical values directly (unless they explicitly call .compute() and look at them). So we need to be very sure that this does not happen in other cases as well.

martindurant commented

We should fix it. Many metadata and layer objects are created during a HEP analysis run (unless the user knows how to use partition mapping), and the cost adds up. I assume the original PR had numbers, but much has changed; this would not have been done for a 10% improvement.
