Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coffea.lumi_tools.LumiData cannot compute inside a dask.distributed.Client context in v2025.1.1 #1256

Open
rpsimeon34 opened this issue Jan 24, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@rpsimeon34
Copy link
Contributor

rpsimeon34 commented Jan 24, 2025

small_lumi.csv

Describe the bug
Task graphs containing coffea.lumi_tools.LumiData fail to pickle/serialize when trying to compute inside a dask.distributed.Client context.

To Reproduce
Setup is as follows.

import awkward as ak
import dask
import dask_awkward as dak
from dask.distributed import Client
from coffea.lumi_tools import LumiData, LumiList

LUMI_CSV_2023 = "utils/lumis/lumi2023.csv" #produced from brilcalc, see end of "To Reproduce"

runs_eager = ak.Array([368229, 368229, 368229, 368229])
runs = dak.from_awkward(runs_eager,2)
lumis_eager = ak.Array([74, 74, 74, 74])
lumis = dak.from_awkward(lumis_eager,2)

def count_lumi(runs,lumis):
    total_lumi = 0
    my_lumilist = LumiList(runs,lumis)
    my_lumidata = LumiData(LUMI_CSV_2023)
    total_lumi += my_lumidata.get_lumi(my_lumilist)
    return total_lumi

The following code works as intended:

output_noclient = count_lumi(runs,lumis)
coutput_noclient = dask.compute(output_noclient)[0]

where coutput_noclient is a float. The following code fails:

client = Client()

with client:
    output = count_lumi(runs,lumis)
    coutput = dask.compute(output)[0]

Error output is included in "Output" section.

LUMI_CSV_2023 is produced with brilcalc - if it's interesting to know how this is done, I can provide more details.

Expected behavior
Computing with and without a Client should result in a float.

Output

2025-01-24 12:25:28,650 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 8 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f519ff778d0>
 0. Dict-e9470c53-4fde-4b2e-9fba-355b2af184dd
 1. lumilist-unique-chunked-45752f79d9f1cd5c491c6273eea610de
 2. lumilist-unique-finalize-180fa53edd47b037d837d5cc42a9cad3
 3. <dask-awkward.lib.core.ArgsKwargsPackedFunction ob-aa20eed3f086352e6b8d3580ec93efca
 4. sum-finalize-37e6b194124cdba8701c0bcccda3d779
 5. partitions-a42ffbc4519a9ece50642e0abf538c03
 6. getitem-c792600458c5b6f9356df7390e977633
 7. add-2371b017cb36d22d5eacb3d8120c99f8
>.
Traceback (most recent call last):
  File "[/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 60](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=59), in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.PicklingError: Can't pickle <function add at 0x7f51c0fbfec0>: it's not the same object as _operator.add

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 65](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=64), in dumps
    pickler.dump(x)
_pickle.PicklingError: Can't pickle <function add at 0x7f51c0fbfec0>: it's not the same object as _operator.add

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 77](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=76), in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1537](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py#line=1536), in dumps
    cp.dump(obj)
  File "[/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1303](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py#line=1302), in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle '_nrt_python._MemInfo' object

PicklingError                             Traceback (most recent call last)
File [/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py:60](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=59), in dumps(x, buffer_callback, protocol)
     59 try:
---> 60     result = pickle.dumps(x, **dump_kwargs)
     61 except Exception:

PicklingError: Can't pickle <function add at 0x7f51c0fbfec0>: it's not the same object as _operator.add

During handling of the above exception, another exception occurred:

PicklingError                             Traceback (most recent call last)
File [/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py:65](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=64), in dumps(x, buffer_callback, protocol)
     64 buffers.clear()
---> 65 pickler.dump(x)
     66 result = f.getvalue()

PicklingError: Can't pickle <function add at 0x7f51c0fbfec0>: it's not the same object as _operator.add

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File [/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py:366](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py#line=365), in serialize(x, serializers, on_error, context, iterate_collection)
    365 try:
--> 366     header, frames = dumps(x, context=context) if wants_context else dumps(x)
    367     header["serializer"] = name

File [/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py:78](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py#line=77), in pickle_dumps(x, context)
     76     writeable.append(not f.readonly)
---> 78 frames[0] = pickle.dumps(
     79     x,
     80     buffer_callback=buffer_callback,
     81     protocol=context.get("pickle-protocol", None) if context else None,
     82 )
     83 header = {
     84     "serializer": "pickle",
     85     "writeable": tuple(writeable),
     86 }

File [/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py:77](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/pickle.py#line=76), in dumps(x, buffer_callback, protocol)
     76     buffers.clear()
---> 77     result = cloudpickle.dumps(x, **dump_kwargs)
     78 except Exception:

File [/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py:1537](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py#line=1536), in dumps(obj, protocol, buffer_callback)
   1536 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1537 cp.dump(obj)
   1538 return file.getvalue()

File [/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py:1303](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/cloudpickle/cloudpickle.py#line=1302), in Pickler.dump(self, obj)
   1302 try:
-> 1303     return super().dump(obj)
   1304 except RuntimeError as e:

TypeError: cannot pickle '_nrt_python._MemInfo' object

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[6], line 7
      5 print(type(output))
      6 print(output)
----> 7 coutput = dask.compute(output)[0]
      8 print(type(coutput))
      9 print(coutput)

File [/usr/local/lib/python3.11/site-packages/dask/base.py:660](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/dask/base.py#line=659), in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    657     postcomputes.append(x.__dask_postcompute__())
    659 with shorten_traceback():
--> 660     results = schedule(dsk, keys, **kwargs)
    662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File [/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py:392](https://cms01.hep.wisc.edu:8003/user/rsimeon/lab/tree/home/usr/local/lib/python3.11/site-packages/distributed/protocol/serialize.py#line=391), in serialize(x, serializers, on_error, context, iterate_collection)
    390     except Exception:
    391         raise TypeError(msg) from exc
--> 392     raise TypeError(msg, str_x) from exc
    393 else:  # pragma: nocover
    394     raise ValueError(f"{on_error=}; expected 'message' or 'raise'")

TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 8 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f519ff778d0>\n 0. Dict-e9470c53-4fde-4b2e-9fba-355b2af184dd\n 1. lumilist-unique-chunked-45752f79d9f1cd5c491c6273eea610de\n 2. lumilist-unique-finalize-180fa53edd47b037d837d5cc42a9cad3\n 3. <dask-awkward.lib.core.ArgsKwargsPackedFunction ob-aa20eed3f086352e6b8d3580ec93efca\n 4. sum-finalize-37e6b194124cdba8701c0bcccda3d779\n 5. partitions-a42ffbc4519a9ece50642e0abf538c03\n 6. getitem-c792600458c5b6f9356df7390e977633\n 7. add-2371b017cb36d22d5eacb3d8120c99f8\n>')

Desktop (please complete the following information):

  • OS: linux
  • Version: AlmaLinux 9

Additional context
Running within a container with an image inheriting from coffeateam/coffea-dask-almalinux9:2025.1.1-py3.11 and dask_jobqueue version 0.9.0.

@rpsimeon34 rpsimeon34 added the bug Something isn't working label Jan 24, 2025
@lgray
Copy link
Collaborator

lgray commented Jan 24, 2025

Can you attach your .csv file to this issue?

@rpsimeon34
Copy link
Contributor Author

Sorry for the delay - a smaller version of the csv (that still reproduces the behavior) is now attached to the top of the issue.

@ikrommyd
Copy link
Collaborator

ikrommyd commented Jan 25, 2025

Maybe it has to deal with numba inside lumi_tools? numba/numba#8791, numba/numba#8797. Lumi tools also uses numba.typed.Dict

@lgray
Copy link
Collaborator

lgray commented Jan 25, 2025

I'll get to this probably tomorrow. I'd check if updating numba helps, indeed. Though I think the images should be fairly recent?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants