Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow scaling custom resource #2833

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

wjiec
Copy link
Contributor

@wjiec wjiec commented Aug 19, 2024

This PR implements the features described in #2779.

Comment on lines 145 to 146
// we must block until all started informers' caches were synced
_ = fac.WaitForCacheSync(f.stopChan)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for blocking here is that command call the internal/dao/registry::Meta.loadCRDs method during initialization (and as part of the init of the whole app), and if it doesn't block here, it will result in an incorrect list of CRDs being read (and theoretically, other resources will have the same problem...).

But this also creates additional problems, in my cluster with a slightly higher load, a total of 96 CRDs takes about ~2s to synchronize.

I didn't think of any good way (or I'm missing something) to solve this problem, does anyone have any ideas?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjiec This could indeed be a problem. As you've noted any cluster with lots of resources will become a pb.
The idea here is to leverage k9s eventual consistance so we would not have to block til all caches are updated.

I think we could perhaps lazy eval the crd instead to gauge whether it is scalable once the user decides to actually view it.
Would this make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derailed I've also considered checking if the crd is scalable when the view is created, but consider a scenario where if k9s needs to display a crd resource that supports scalable as soon as it starts up, and at that point the cache may not have finished synced, then factory.Get will either return a NotFound error or will need to block until that crd finish synced. Doesn't seem like an optimal approach in this case either?

The current logic decides whether to wait for the cache to finish sync based on the wait parameter, which I see is false in most cases, and only in some special cases does it have wait = true, which I think might be acceptable?

Or we make the key binding dynamic, and when the crds sync is done, we update the view to add the scale cmd.

What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjiec Thank you for your input and sorry for the delay... I like the idea of eventually enabling the scaling option once the resource is synced. This allows for a non blocking approach.

Copy link
Owner

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjiec Thank you for taking this on Jayson!

Comment on lines 145 to 146
// we must block until all started informers' caches were synced
_ = fac.WaitForCacheSync(f.stopChan)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjiec This could indeed be a problem. As you've noted any cluster with lots of resources will become a pb.
The idea here is to leverage k9s eventual consistance so we would not have to block til all caches are updated.

I think we could perhaps lazy eval the crd instead to gauge whether it is scalable once the user decides to actually view it.
Would this make sense?

@wjiec wjiec force-pushed the feat/scale-crd branch 4 times, most recently from 07f9053 to afa4070 Compare December 1, 2024 11:00
@wjiec
Copy link
Contributor Author

wjiec commented Dec 1, 2024

@derailed I spent some time looking into the logic in Meta, and I found that all resources (including CRDs) are actually loaded during loadPreferred, and that loadCRDs don't bring any additional changes (even the previous assignment of meta.Name = crd.Name resulted in the wrong GVRs being registered to Meta. resMetas).

loadCRDs now wait for the cache to sync and add scaleCat for the corresponding CRD. In scale_extender it is dynamically decided whether to add a Scale command to the view based on the presence or absence of a scaleCat.

What do you think of this?

@derailed
Copy link
Owner

derailed commented Dec 7, 2024

@wjiec Thank you for taking another pass Jayson!! The catch here is crd loads are currently async. waitForCacheSync will halt the load until all caches are refreshed (not just crds!). This could be a pb on larger clusters and hence I had initially opted to load async thus relying of eventual consistency. Does this make sense or am I missing it?

@derailed derailed added enhancement New feature or request question Further information is requested labels Dec 7, 2024
@wjiec
Copy link
Contributor Author

wjiec commented Dec 8, 2024

@derailed Thank you very much for your reply and reminder, I think I understand what you mean! Sorry, it's my understanding that's wrong. I always thought loadCRDs would only be executed once on k9s startup 😅 I just now understand that the eventual consistency you're talking about says that loadCRDs load a portion at a time until the caches are done synchronizing.

@derailed
Copy link
Owner

@wjiec Thank you Jayson! No worries. I am glad you've asked!!
These can indeed get a little bit tricky ;(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants