updated zarr compressors API #51

Open: wants to merge 2 commits into base: main

@kashif commented Jan 14, 2025

Describe your changes

Update the usage of the zarr compressors API and fix some warnings raised under zarr v3.

Issue Link

Without this change, I was getting the following warning and error when writing the dataset with `ds.to_zarr`:

/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/group.py:2349: UserWarning: The `compressor` argument is deprecated. Use `compressors` instead.
  compressors = _parse_deprecated_compressor(compressor, compressors)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/mnt/scratch/kashif/mllam-data-prep/mllam_data_prep/__main__.py", line 80, in <module>
    create_dataset_zarr(fp_config=args.config, fp_zarr=args.output)
  File "/mnt/scratch/kashif/mllam-data-prep/mllam_data_prep/create_dataset.py", line 277, in create_dataset_zarr
    ds.to_zarr(fp_zarr, consolidated=True, mode="w", encoding=encoding)
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/core/dataset.py", line 2622, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/backends/api.py", line 2216, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/backends/api.py", line 1952, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1022, in store
    self.set_variables(
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1194, in set_variables
    zarr_array = self._create_new_array(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1089, in _create_new_array
    zarr_array = self.zarr_group.create(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/group.py", line 2234, in create
    return self.create_array(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/group.py", line 2351, in create_array
    self._sync(
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/sync.py", line 187, in _sync
    return sync(
           ^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/sync.py", line 142, in sync
    raise return_result
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/sync.py", line 98, in _runner
    return await coro
           ^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/group.py", line 1119, in create_array
    return await create_array(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/array.py", line 3919, in create_array
    array_array, array_bytes, bytes_bytes = _parse_chunk_encoding_v3(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/array.py", line 4120, in _parse_chunk_encoding_v3
    out_bytes_bytes = tuple(_parse_bytes_bytes_codec(c) for c in maybe_bytes_bytes)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/core/array.py", line 4120, in <genexpr>
    out_bytes_bytes = tuple(_parse_bytes_bytes_codec(c) for c in maybe_bytes_bytes)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kashif/.venv/pytorch/lib/python3.12/site-packages/zarr/registry.py", line 184, in _parse_bytes_bytes_codec
    raise TypeError(f"Expected a BytesBytesCodec. Got {type(data)} instead.")
TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead.
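
The traceback shows a `numcodecs.blosc.Blosc` object reaching zarr's v3 codec parsing, which only accepts a `BytesBytesCodec`. The diff is not reproduced in this conversation, so the following is only a sketch of one possible way to avoid that `TypeError`, not necessarily what this PR does. The helper names `blosc_v3_kwargs` and `make_v3_encoding` are hypothetical. The sketch assumes zarr >= 3 (for `zarr.codecs.BloscCodec`) and an xarray version that forwards a `compressors` encoding key to zarr.

```python
# numcodecs.Blosc integer shuffle constants -> zarr v3 BloscCodec shuffle names.
_SHUFFLE_NAMES = {0: "noshuffle", 1: "shuffle", 2: "bitshuffle"}


def blosc_v3_kwargs(v2_config):
    """Translate a numcodecs-style Blosc config dict into BloscCodec kwargs.

    Hypothetical helper: not part of this PR, zarr, or numcodecs.
    """
    if v2_config.get("id") != "blosc":
        raise ValueError(f"not a Blosc config: {v2_config!r}")
    shuffle = v2_config.get("shuffle", 1)
    if shuffle not in _SHUFFLE_NAMES:
        # e.g. numcodecs AUTOSHUFFLE (-1) has no direct v3 counterpart here
        raise ValueError(f"unsupported shuffle value: {shuffle!r}")
    return {
        "cname": v2_config.get("cname", "lz4"),
        "clevel": v2_config.get("clevel", 5),
        "shuffle": _SHUFFLE_NAMES[shuffle],
        "blocksize": v2_config.get("blocksize", 0),
    }


def make_v3_encoding(var_names, v2_config):
    """Build a per-variable encoding dict using the plural `compressors` key.

    Assumption: the installed xarray passes `compressors` through to zarr v3.
    """
    from zarr.codecs import BloscCodec  # imported lazily; requires zarr >= 3

    codec = BloscCodec(**blosc_v3_kwargs(v2_config))
    return {name: {"compressors": (codec,)} for name in var_names}
```

Under those assumptions, the result of `make_v3_encoding(list(ds.data_vars), {"id": "blosc", "cname": "zstd", "clevel": 3, "shuffle": 2})` could be passed as `encoding=` to the `ds.to_zarr(...)` call seen in the traceback.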

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@kashif changed the title from "updated compressors API" to "updated zarr compressors API" on Jan 14, 2025