DataTask Execution Storing Opened Files Optimization #27

Open
Keith-Bateman opened this issue Apr 23, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@Keith-Bateman
Member

Is your feature request related to a problem? Please describe.
Currently, every read or write goes through a full, expensive sequence of open, seek, read/write, and close. This could be fixed by caching opened files on the executors and reusing them across operations.
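To make the idea concrete, here is a minimal sketch of what "storing opened files" on an executor could look like. It is not taken from the project: `FileHandleCache`, its `capacity` parameter, and the least-recently-used eviction are all hypothetical choices for illustration, and it assumes a POSIX system where `os.pread` and `os.pwrite` are available.

```python
import os
from collections import OrderedDict


class FileHandleCache:
    """Hypothetical per-executor cache: keeps up to `capacity` descriptors open."""

    def __init__(self, capacity=64):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._fds = OrderedDict()  # path -> open file descriptor
        self.opens = 0  # number of real os.open calls, for cost accounting

    def _get(self, path):
        fd = self._fds.get(path)
        if fd is not None:
            self._fds.move_to_end(path)  # mark as most recently used
            return fd
        if len(self._fds) >= self.capacity:
            # Evict the least recently used descriptor to stay under the limit.
            _, victim = self._fds.popitem(last=False)
            os.close(victim)
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.opens += 1
        self._fds[path] = fd
        return fd

    def write(self, path, offset, data):
        # pwrite takes the offset directly, so no separate seek is needed.
        return os.pwrite(self._get(path), data, offset)

    def read(self, path, offset, size):
        return os.pread(self._get(path), size, offset)

    def close_all(self):
        while self._fds:
            _, fd = self._fds.popitem()
            os.close(fd)
```

With a cache like this, two consecutive writes to the same path cost one open instead of two, and the `opens` counter gives a simple way to see how often the cache misses. The capacity bound matters because each executor process has a limit on open descriptors.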

Describe the solution you'd like
Explore the costs and benefits of keeping files open across I/O operations, and evaluate different policies for doing so.
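One possible starting point for the cost side of that exploration is a small timing harness that compares reopening the file for every write against holding it open. This is only a sketch under simple assumptions: the function names are hypothetical, it uses buffered Python file objects on a local temporary file, and real numbers will depend heavily on the storage backend the executors actually use.

```python
import os
import tempfile
import time


def write_reopening(path, chunks):
    """Open, seek, write, and close for every chunk."""
    offset = 0
    for chunk in chunks:
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(chunk)
        offset += len(chunk)


def write_persistent(path, chunks):
    """Open once, then seek and write for every chunk."""
    offset = 0
    with open(path, "r+b") as f:
        for chunk in chunks:
            f.seek(offset)
            f.write(chunk)
            offset += len(chunk)


def time_strategy(strategy, chunks):
    """Run one strategy on a fresh temp file; return (seconds, file contents)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        strategy(path, chunks)
        elapsed = time.perf_counter() - start
        with open(path, "rb") as f:
            contents = f.read()
    finally:
        os.remove(path)
    return elapsed, contents
```

Calling `time_strategy(write_reopening, chunks)` and `time_strategy(write_persistent, chunks)` with the same list of byte chunks gives two timings to compare, and the returned contents can be checked for equality to confirm both strategies wrote the same data.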

Describe alternatives you've considered
Not every file needs to be opened on every executor. I would like to explore making executors responsible for different files, either by forcing the scheduler to push each DataTask to an executor that already has the relevant file opened, or by incorporating that information into a novel scheduling policy.
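As a rough illustration of the file-affinity idea, the sketch below routes each task to an executor that already holds the target file, falling back to the least-loaded executor otherwise. Everything here is hypothetical: `AffinityScheduler`, `assign`, and `complete` are illustrative names, load is just a count of outstanding tasks, and a real policy would also need to handle eviction and hot files that overload a single executor.

```python
class AffinityScheduler:
    """Hypothetical scheduler that prefers executors with the file already open."""

    def __init__(self, num_executors):
        # For each executor: the set of paths it is assumed to have open.
        self.open_files = [set() for _ in range(num_executors)]
        # For each executor: the number of outstanding tasks.
        self.load = [0] * num_executors

    def assign(self, path):
        """Pick an executor for a task on `path` and record the assignment."""
        candidates = [i for i, files in enumerate(self.open_files) if path in files]
        if not candidates:
            # No executor has the file open yet, so consider all of them.
            candidates = range(len(self.load))
        # Among the candidates, choose the least loaded (lowest index on ties).
        chosen = min(candidates, key=lambda i: self.load[i])
        self.open_files[chosen].add(path)
        self.load[chosen] += 1
        return chosen

    def complete(self, executor):
        """Mark one task on `executor` as finished."""
        self.load[executor] -= 1
```

Strict affinity, as sketched, maximizes reuse of opened files but can concentrate load; a softer variant could weigh the cost of a fresh open against the queue length on the executor that holds the file.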

@Keith-Bateman Keith-Bateman added the enhancement New feature or request label Apr 23, 2024