Add zarr v2 endpoints to Tiled #774

Open: wants to merge 54 commits into base: main

@genematx (Contributor) commented Aug 6, 2024

This PR exposes Tiled data as a zarr collection on a set of new API endpoints, /zarr/v2/.... This allows one to use zarr clients directly with Tiled, as if it were an external filesystem accessed through fsspec.

Assuming a demo Tiled server is running on 127.0.0.1:8000 (e.g. started with tiled serve demo), one can read its contents into zarr by first specifying a file system mapper and then passing it to zarr:

import zarr
from fsspec import get_mapper

url = "http://localhost:8000/zarr/v2/"
fs_mapper = get_mapper(url)
root = zarr.open(fs_mapper, mode="r")

The resulting object is a zarr.Group, which represents the root of the Tiled catalog tree and supports most of the usual operations on zarr groups:

>>> print(root)
<zarr.hierarchy.Group '/' read-only>

>>> list(root.keys())
['dynamic', 'flat_array', 'high_entropy', 'low_entropy',
'nested', 'scalars', 'structured_data', 'tables']
>>> root.tree()
/
 ├── dynamic (3, 3) float64
 ├── flat_array (100,) float64
 ├── high_entropy (100, 100) int64
 ├── low_entropy (100, 100) int32
 ├── nested
 │   ├── cubes
 │   │   ├── tiny_cube (50, 50, 50) float64
 │   │   └── tiny_hypercube (50, 50, 50, 50, 50) float64
 │   ├── images
 │   │   ├── big_image (10000, 10000) float64
 │   │   ├── medium_image (1000, 1000) float64
 │   │   ├── small_image (300, 300) float64
 │   │   └── tiny_image (50, 50) float64
 │   └── sparse_image (100, 100) float64
 ├── scalars
 │   ├── e_arr (1,) <U7
 │   ├── fortytwo () int64
 │   ├── fsc () <U5
 │   └── pi () float64
 ├── structured_data
 │   ├── pets
 │   └── xarray_dataset
 │       ├── lat (2, 2) float64
 │       ├── lon (2, 2) float64
 │       ├── precipitation (2, 2, 3) float64
 │       ├── temperature (2, 2, 3) float64
 │       └── time (3,) datetime64[ns]
 └── tables
     ├── long_table
     │   ├── A (100000,) float64
     │   ├── B (100000,) float64
     │   └── C (100000,) float64
     ├── short_table
     │   ├── A (100,) uint8
     │   ├── B (100,) uint8
     │   └── C (100,) uint8
     └── wide_table
         ├── A (10,) float64
         ├── B (10,) float64
         ├── C (10,) float64
         ...
         ├── X (10,) float64
         ├── Y (10,) float64
         └── Z (10,) float64
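
Since the endpoints follow the zarr v2 layout, a zarr client reads each array through plain HTTP GETs on chunk keys placed under the array's path. The following is a minimal sketch of how such keys are formed, assuming the zarr v2 default "." dimension separator. The helper name and the (100, 100) chunk shape are made up for illustration; the real chunking is whatever the server reports in the array's .zarray document.

```python
import itertools
import math


def chunk_keys(shape, chunks, separator="."):
    """List the zarr v2 chunk keys for an array with at least one dimension.

    Each key is the chunk's grid index along every dimension, joined by
    `separator` (the zarr v2 default is ".").
    """
    if not shape:
        raise ValueError("0-d arrays are not handled in this sketch")
    # Number of chunks along each dimension, rounding up for partial chunks.
    grid = [math.ceil(size / chunk) for size, chunk in zip(shape, chunks)]
    return [
        separator.join(str(i) for i in index)
        for index in itertools.product(*(range(n) for n in grid))
    ]


# small_image is (300, 300) in the demo tree; the (100, 100) chunk shape
# is an assumption for illustration only.
keys = chunk_keys(shape=(300, 300), chunks=(100, 100))
base = "http://localhost:8000/zarr/v2/nested/images/small_image"
urls = [f"{base}/{key}" for key in keys]
```

Under these assumptions the array is split into a 3 x 3 grid, so a full read would issue nine chunk requests, from .../small_image/0.0 to .../small_image/2.2.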

NOTE: To access Tiled servers that require authentication, pass an API key in the headers of the HTTP requests. With fsspec, this is done by explicitly constructing an HTTPFileSystem object and passing its mapper to zarr:

from fsspec.implementations.http import HTTPFileSystem

headers = {"Authorization": "Apikey your-api-key-goes-here",
           "Content-Type": "application/json"}
fs = HTTPFileSystem(client_kwargs={"headers": headers})
root = zarr.open(fs.get_mapper(url), mode="r")
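
The two steps above could be wrapped in a small helper so that the key is supplied in one place. This is only a sketch: the function names are hypothetical, and the headers are the same two shown in the snippet above.

```python
def apikey_headers(api_key: str) -> dict:
    """Build the HTTP headers that carry a Tiled API key."""
    return {
        "Authorization": f"Apikey {api_key}",
        "Content-Type": "application/json",
    }


def open_zarr_root(url: str, api_key: str):
    """Open the zarr root of an authenticated Tiled server, read-only.

    zarr and fsspec are imported lazily so that the header helper above
    stays usable in environments where they are not installed.
    """
    import zarr
    from fsspec.implementations.http import HTTPFileSystem

    fs = HTTPFileSystem(client_kwargs={"headers": apikey_headers(api_key)})
    return zarr.open(fs.get_mapper(url), mode="r")
```

With a server running, open_zarr_root("http://localhost:8000/zarr/v2/", "your-api-key-goes-here") would return the same kind of read-only group as before.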

The native Tiled data structures are mapped to zarr as follows:

Tiled             | zarr
Container         | Group
Array             | Array
Sparse Array      | Array (dense)
Data Frame        | Group (of columns)
Data Frame Column | Array
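
For a client, the practical consequence of this mapping is which zarr v2 metadata document to expect at a given node: .zgroup for groups and .zarray for arrays. A minimal sketch of that lookup follows. The dictionary and function names are illustrative, and the keys are assumed to match Tiled's structure family names.

```python
# Tiled structure family -> kind of zarr v2 node it is exposed as,
# transcribed from the table above.
ZARR_NODE_KIND = {
    "container": "group",
    "array": "array",
    "sparse": "array",  # served as a dense array
    "table": "group",  # one child array per column
}

# zarr v2 metadata document that describes each kind of node.
METADATA_KEY = {"group": ".zgroup", "array": ".zarray"}


def metadata_key(structure_family: str) -> str:
    """Return the zarr v2 metadata key expected for a Tiled structure family."""
    return METADATA_KEY[ZARR_NODE_KIND[structure_family]]
```

For example, metadata_key("table") gives ".zgroup", while each column inside that table is an array described by its own .zarray.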

Addresses issue #562.

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

@genematx genematx requested a review from danielballan August 6, 2024 21:40
@joshmoore joshmoore mentioned this pull request Sep 9, 2024
2 tasks
@genematx genematx changed the title Add zarr endpoints to Tiled Add zarr v2 endpoints to Tiled Oct 22, 2024
@genematx genematx marked this pull request as ready for review October 22, 2024 16:52
pyproject.toml Outdated
@@ -44,6 +44,7 @@ tiled = "tiled.commandline.main:main"

# This is the union of all optional dependencies.
all = [
"aiohttp",
Member

I cannot find where this is used. Maybe it was needed in some transient state of the PR but is no longer needed.

$ git grep aiohttp
pyproject.toml:    "aiohttp",
pyproject.toml:    "aiohttp",
pyproject.toml:    "aiohttp",

Member

Moreover, since we use starlette for the server and httpx for the client, it would be somewhat odd and redundant to use aiohttp as well.

Contributor Author

@genematx genematx Oct 23, 2024

It is used by fsspec.implementations.http.HTTPFileSystem, which is needed to connect to a Tiled server that requires authentication. I had the same thought yesterday, that this was something I used before but no longer need, but unfortunately that is not the case. We don't need it in the all requirements, though (only for testing); I have fixed that now.
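
Because the dependency is only needed by tests that go through HTTPFileSystem, those tests could be guarded on its availability. A minimal standard-library sketch is below; the helper and flag names are hypothetical, and in a pytest suite pytest.importorskip("aiohttp") is the usual shortcut for the same check.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical flag for tests that need fsspec's HTTPFileSystem,
# which in turn requires aiohttp.
HTTP_FS_AVAILABLE = is_installed("fsspec") and is_installed("aiohttp")
```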

tiled/_tests/test_zarr.py Outdated
arr = zarr.open(fs.get_mapper(url), mode="r")
actual = arr[...]
expected = df[col]
assert numpy.array_equal(actual, expected)
Member

Attempting to write raises a helpful error message:

ReadOnlyError: object is read-only

This behavior should be tested.

Contributor Author

Added a couple of test cases here. Just to note, those errors are raised by fsspec and zarr client objects, even before the request reaches Tiled.

davramov commented Jan 10, 2025

EDIT: I have included my code for this fix here: https://github.com/davramov/tiled/tree/add-zarr-forked

Hi, I am working on serving zarr files to the standalone itkwidgets web browser visualization tool via Tiled using this PR.

I am serving the zarr via tiled using this command:

TILED_CONFIG=config.yaml TILED_ALLOW_ORIGINS="http://localhost:3000" tiled serve directory "../../data/tomo/scratch/" --public --verbose

I ran into an issue: when I pass the Tiled path to itk-vtk-viewer in the URL, http://localhost:3000/?fileToLoad=http://127.0.0.1:8000/zarr/v2/rec20230606_152011_jong-seto_fungal-mycelia_flat-AQ_fungi2_fast, the web app reports a 404 error:

itkVtkViewer.js:2 
GET http://127.0.0.1:8000/zarr/v2/rec20230606_152011_jong-seto_fungal-mycelia_flat-AQ_fungi2_fast/.zattrs 404 (Not Found)

This file does not appear to be served as part of the directory (I noticed that server/zarr.py has routes for .zarray and .zgroup, but not for .zattrs).
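
A quick way to see which zarr v2 metadata documents a node serves is to request each of them and record the HTTP status. The sketch below uses only the standard library; the function names are hypothetical, and probe() performs real network requests, so it needs a running server.

```python
import urllib.error
import urllib.request

# Metadata documents defined by the zarr v2 layout.
ZARR_V2_METADATA_KEYS = (".zgroup", ".zarray", ".zattrs")


def metadata_urls(node_url: str) -> dict:
    """Map each zarr v2 metadata key to its URL under the given node."""
    base = node_url.rstrip("/")
    return {key: f"{base}/{key}" for key in ZARR_V2_METADATA_KEYS}


def probe(node_url: str) -> dict:
    """Return the HTTP status code observed for each metadata key."""
    statuses = {}
    for key, url in metadata_urls(node_url).items():
        try:
            with urllib.request.urlopen(url) as response:
                statuses[key] = response.status
        except urllib.error.HTTPError as err:
            # 4xx and 5xx responses arrive as exceptions; keep the code.
            statuses[key] = err.code
    return statuses
```

A node is either a group or an array, so a 404 on one of .zgroup or .zarray is expected; a 404 on .zattrs is the case described here.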

To solve this I added a ZarrAttrsAdapter class to tiled/adapters/zarr.py:

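# Relies on names the surrounding module is assumed to already import:
# List, Union, Spec, JSON, and zarr.
class ZarrAttrsAdapter: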
    """
    Adapter that exposes a Zarr node's .attrs as JSON.
    """

    structure_family = "node"  # or "container" if you prefer
    specs: List[Spec] = []

    def __init__(self, node: Union[zarr.hierarchy.Group, zarr.core.Array]):
        """
        node: Zarr Group or Array whose attributes we want to serve
        """
        self._node = node

    def metadata(self) -> JSON:
        """
        Return any extra metadata. In this example, it's empty.
        """
        return {}

    def structure(self):
        """
        We have no numeric array data to describe, just JSON attributes.
        """
        return None

    def read(self) -> JSON:
        """
        Return the node's attrs as a plain dictionary.
        """
        return dict(self._node.attrs)

    def __repr__(self) -> str:
        return f"<ZarrAttrsAdapter attrs_keys={list(self._node.attrs.keys())}>"

I also added an additional router in tiled/server/zarr.py called get_zarr_attrs:

@router.get("{path:path}.zattrs", name="Zarr .zattrs metadata")
@router.get("/{path:path}/.zattrs", name="Zarr .zattrs metadata")
async def get_zarr_attrs(
    request: Request,
    entry=SecureEntry(
        scopes=["read:data", "read:metadata"],
        structure_families={
            StructureFamily.table,
            StructureFamily.container,
            StructureFamily.array,
        },
    ),
):
    """
    Return Zarr attributes metadata (.zattrs).
    The entry's metadata is returned as-is, serialized as JSON.
    """
    # If it's an unstructured array, we do not treat it as a group for .zattrs
    if entry.structure_family == StructureFamily.array and not isinstance(
        entry.structure().data_type, StructDtype
    ):
        raise HTTPException(status_code=HTTP_404_NOT_FOUND)

    # Attempt to retrieve .zattrs from entry.metadata
    try:
        metadata_dict = entry.metadata()  # if it's callable
    except TypeError:
        metadata_dict = entry.metadata  # if it's a property

    return Response(
        json.dumps(metadata_dict),
        status_code=200,
        media_type="application/json",
    )

With these changes, I was able to successfully load the zarr volume in the itk-vtk-viewer web app.
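
One thing to watch in a route like the one above: json.dumps raises TypeError when the metadata contains values that are not JSON-native, such as datetimes. A possible hardening is sketched below, assuming that falling back to str() for such values is acceptable. The helper name and the sample metadata are hypothetical.

```python
import json
from datetime import datetime


def zattrs_body(metadata) -> str:
    """Serialize metadata for a .zattrs response.

    Values that the json module cannot encode natively are converted
    with str() instead of raising TypeError.
    """
    return json.dumps(dict(metadata), default=str)


# Made-up metadata containing a value json cannot encode on its own.
body = zattrs_body({"scan_id": 7, "started": datetime(2023, 6, 6, 15, 20, 11)})
```

The fallback is lossy, since the datetime comes back as a plain string on the client side, so it trades fidelity for a response that always serializes.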
