Create staggered hierarchy for CellArr dataset for flexibility for power users #81
Power users may want to manage their `CellArrDataset` more directly with `tiledb.Array` objects, or with URIs that do not share a common prefix. This PR provides an example implementation of how to achieve this by extracting a common base class, which is then extended to reproduce the existing class with the implemented safeguards:

1. `class _CellArrDatasetBase`: base class that is constructed with `tiledb.Array`s. It does not manage (open/close) any arrays. An additional read-only check is added to the constructor.
2. `class _CellArrDatasetUri(_CellArrDatasetBase)`: this class is constructed with URIs pointing to existing `tiledb.Array`s. It opens arrays from these URIs (each URI is taken as is, no prefix prepended) and closes them via `__del__`. The arrays are passed into `super().__init__`.
3. `class CellArrDataset(_CellArrDatasetUri)`: this class has the same interface as before. In the constructor, it simply does some string concatenation to produce the URIs that are then passed into `super().__init__`.

Most users should use the existing class (3). Power users can fall back to (1) or (2) for extra flexibility if needed. The `_` prefix of (1) and (2) indicates that people who use those should not expect support and will need to debug on their own if they use them incorrectly.

Note: This is a draft PR to start a discussion around such power user classes. In my use case, I am planning a processing pipeline that maps one `CellArrDataset` to a new one in each step. The input is considered immutable, so I currently have to copy the metadata each time, even when only the matrix data changes. Using (1) and (2) allows for much greater flexibility in such scenarios and avoids unnecessary data replication.