[WIP] Add support for dynamic match policy changes and always use first match for feasibility/satisfiability #1160

Open
wants to merge 3 commits into
base: master

milroy (Member) commented Mar 30, 2024

This PR adds support for dynamically changing the match policy based on a jobspec or other conditions. It also configures the first match policy to be used for satisfiability/feasibility checks to improve performance.

This PR addresses, but likely does not completely solve, the satisfiability/feasibility check performance problem reported in issue #1159. (The system configuration at the time would not have benefited from this change.)
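
To illustrate why a first-match policy suits feasibility checks, here is a minimal sketch, not Fluxion's actual traverser code. The `Node` type and the `best_match`, `first_match`, and `is_satisfiable` functions are hypothetical names invented for this example, and a flat vector stands in for the resource graph. A feasibility check only needs to know whether some match exists, so it can stop at the first fit instead of scoring every candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a resource vertex; not Fluxion's real type.
struct Node {
    int free_cores;
    int score;  // higher is "better" under a scoring policy
};

// Scoring policy: inspects every candidate to find the best-scoring fit.
// `visited` reports how many nodes were inspected.
std::optional<std::size_t> best_match(const std::vector<Node>& nodes,
                                      int cores, std::size_t& visited)
{
    assert(cores > 0);
    std::optional<std::size_t> best;
    visited = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        ++visited;
        if (nodes[i].free_cores < cores)
            continue;
        if (!best || nodes[i].score > nodes[*best].score)
            best = i;
    }
    return best;
}

// First-match policy: returns as soon as one candidate fits.
std::optional<std::size_t> first_match(const std::vector<Node>& nodes,
                                       int cores, std::size_t& visited)
{
    assert(cores > 0);
    visited = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        ++visited;
        if (nodes[i].free_cores >= cores)
            return i;
    }
    return std::nullopt;
}

// Feasibility only asks "does any match exist?", so first match suffices.
bool is_satisfiable(const std::vector<Node>& nodes, int cores,
                    std::size_t& visited)
{
    return first_match(nodes, cores, visited).has_value();
}
```

In this sketch an unsatisfiable request still inspects every node under either policy, so early exit helps only when a match exists.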

milroy added the Status: In Progress and performance (Fluxion performance and scalability) labels Mar 30, 2024

codecov bot commented Mar 30, 2024

Codecov Report

Merging #1160 (4219d35) into master (68c636a) will decrease coverage by 0.1%.
The diff coverage is 89.4%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1160     +/-   ##
========================================
- Coverage    71.0%   71.0%   -0.1%     
========================================
  Files          96      96             
  Lines       12867   12875      +8     
========================================
+ Hits         9147    9150      +3     
- Misses       3720    3725      +5     
Files                              Coverage Δ
resource/traversers/dfu.cpp        87.5% <100.0%> (-0.2%) ⬇️
resource/traversers/dfu_impl.hpp   94.7% <ø> (ø)
resource/traversers/dfu_impl.cpp   82.9% <80.0%> (-0.1%) ⬇️

... and 4 files with indirect coverage changes

vsoch (Member) commented Dec 17, 2024

Potential use cases for this:

  • Folks in the hpc.social Slack are asking about this capability and want a demonstration of it
  • In that same thread, I'd like to do a Flux tutorial that demonstrates it working
  • We would very much find this useful for Kubernetes environments in a custom scheduler plugin.

Some points we talked about:

  • On a very large cluster, a user-submitted custom jobspec that takes 1 minute to match could hugely disrupt the cluster. We'd likely want a way to disable this system-wide, and possibly a whitelist of permitted policies.
  • If this is not done via an official change to jobspec (unlikely to happen soon), we can pass the preference along another way (attributes, for example).
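
The two points above could combine roughly as follows. This is a sketch under stated assumptions, not an existing Fluxion interface: the `PolicyConfig` struct, the `select_policy` function, and the `match-policy` attribute key are all hypothetical, and the policy names in the usage note are illustrative. A per-job preference arrives as an attribute and is honored only if the system-wide switch is on and the requested policy is on the whitelist. Otherwise the configured default applies.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical system-wide configuration; not a real Fluxion structure.
struct PolicyConfig {
    bool allow_per_job_policy;       // system-wide switch
    std::string default_policy;      // used when no valid preference is given
    std::set<std::string> allowed;   // whitelist of policies jobs may request
};

// Picks the match policy for one job from its attributes.
// The "match-policy" key is an assumption made for this example.
std::string select_policy(const PolicyConfig& cfg,
                          const std::map<std::string, std::string>& attributes)
{
    assert(!cfg.default_policy.empty());
    if (!cfg.allow_per_job_policy)
        return cfg.default_policy;          // feature disabled system-wide
    auto it = attributes.find("match-policy");
    if (it == attributes.end())
        return cfg.default_policy;          // job expressed no preference
    if (cfg.allowed.count(it->second) == 0)
        return cfg.default_policy;          // requested policy not whitelisted
    return it->second;
}
```

For example, with a default of `first` and a whitelist of `first` and `low`, a job requesting `low` would get `low`, while a job requesting `high` would fall back to `first`.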

@trws did you ever look at this PR, and do you have any thoughts? Given the above, I'd like to push on it, even if what we end up with isn't part of core and is exposed another way. Thank you!
