
feat: Allowing to use zcollection without any dask cluster. #16

Draft
wants to merge 2 commits into
base: develop


Thomas-Z
Collaborator

The objective of these changes is to enable the use of a zcollection without requiring a Dask cluster.

This functionality is made possible by the new distributed parameter, which has been added to all functions and methods that rely on Dask (except for the map and map_overlap functions, as they are specifically designed for Dask usage).
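As a rough illustration of the pattern described above, here is a minimal sketch of a boolean `distributed` switch. The helper `apply_to_partitions` and its signature are hypothetical and are not taken from zcollection; only the `distributed` keyword name comes from this PR. The Dask calls used (`get_client`, `Client.map`, `Client.gather`) belong to the `dask.distributed` package and are only reached when `distributed=True`.

```python
from typing import Any, Callable, Iterable, List


def apply_to_partitions(
    func: Callable[[Any], Any],
    partitions: Iterable[Any],
    *,
    distributed: bool = True,
) -> List[Any]:
    """Apply ``func`` to every partition (hypothetical helper, not zcollection API)."""
    if not distributed:
        # Local path: a plain loop in the calling process, no scheduler involved.
        return [func(item) for item in partitions]

    # Distributed path: imported lazily so the local path works even when
    # no Dask cluster (or no Dask installation) is available.
    import dask.distributed

    client = dask.distributed.get_client()
    futures = client.map(func, list(partitions))
    return client.gather(futures)


# Sequential use, without any Dask cluster:
sizes = apply_to_partitions(len, ["year=2024", "year=2025/month=1"], distributed=False)
```

Keeping the Dask import inside the distributed branch is one possible design choice; it lets the sequential path run inside a worker or in an environment where no client can be obtained.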

However, the choice of the distributed keyword might not be ideal, given that we are working in a context where the distributed package is heavily used, which could lead to some confusion.
The keyword was already in use within a sub-function, so I adopted it, but this is open for discussion.

@robin-cls: Could you review this and confirm whether it meets your needs?
@fbriol: Could you evaluate the implementation and either approve it or suggest an alternative approach?

@Thomas-Z Thomas-Z requested a review from fbriol November 20, 2024 14:11
@Thomas-Z Thomas-Z marked this pull request as draft November 20, 2024 14:11
@Thomas-Z Thomas-Z added the enhancement New feature or request label Nov 20, 2024
…iew.load() functions.

refactor: collection.partitions() and view.partitions() now handle indexer and selected_partitions parameters.
@Thomas-Z
Collaborator Author

Thomas-Z commented Dec 7, 2024

A second commit includes changes that allow loading a specific set of partitions using the new selected_partitions parameter of the collection.load() and view.load() functions.
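To make the idea concrete, here is a minimal sketch of restricting a load to an explicit set of partitions. The function `filter_partitions` is hypothetical, and the assumption that partitions are identified by relative path strings such as `year=2024/month=1` is made only for illustration; the actual type accepted by `selected_partitions` is not specified in this conversation.

```python
from typing import Iterable, List, Optional


def filter_partitions(
    all_partitions: Iterable[str],
    selected_partitions: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep only the requested partitions (hypothetical helper, not zcollection API)."""
    if selected_partitions is None:
        # No selection given: every partition is kept.
        return list(all_partitions)
    wanted = set(selected_partitions)
    # Preserve the original ordering of the collection's partitions.
    return [item for item in all_partitions if item in wanted]


available = ["year=2024/month=1", "year=2024/month=2", "year=2024/month=3"]
to_load = filter_partitions(available, selected_partitions=["year=2024/month=2"])
```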

@robin-cls

Hello @Thomas-Z, thank you for this first implementation. My use case is to call the insert() method inside a Dask worker without submitting any graph to the current Dask scheduler. I am quite busy with another pressing matter for now, but I think I will be able to test your version by the end of the year, if that's not too late for you.


2 participants