ansible-operator: ignore existing resources at startup or conditional watches in watches.yaml #20
Comments
For the record, here's what my little ansible playbook looks like that gets invoked when new namespaces are found: https://github.com/kiali/kiali-operator/blob/master/playbooks/kiali-new-namespace-detected.yml All it does is "touch" the CRs that exist (it sets an annotation on them) so the operator then reconciles them.
Notice a trick here: this playbook throttles the annotation modification to once every minute. Thus, if you create 100 namespaces within 60 seconds, the CR's annotation will only be touched once and only one reconciliation will happen. This helps avoid modifying the CR 100 times (and potentially triggering multiple reconciliations of the CR).
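For illustration only, a minimal sketch of such a "throttled touch" playbook might look like the following. The CR group/version/kind and annotation name here are placeholders, not the actual Kiali ones; the real playbook is at the URL above.

```yaml
# Sketch only: annotate every existing CR so a reconciliation is triggered,
# but use a minute-granularity timestamp as the annotation value so that
# many namespace events inside the same minute amount to a single change.
- hosts: localhost
  gather_facts: no
  collections:
  - kubernetes.core
  tasks:
  - name: Find all existing CRs (apiVersion/kind are placeholders)
    k8s_info:
      api_version: example.com/v1alpha1
      kind: Example
    register: existing_crs

  - name: Touch each CR with a minute-granularity annotation
    k8s:
      state: present
      definition:
        apiVersion: example.com/v1alpha1
        kind: Example
        metadata:
          name: "{{ item.metadata.name }}"
          namespace: "{{ item.metadata.namespace }}"
          annotations:
            # The value only changes once per minute, so repeated runs within
            # the same minute are a no-op and trigger no extra reconciles.
            example.com/last-namespace-event: "{{ '%Y-%m-%dT%H:%M' | strftime }}"
    with_items: "{{ existing_crs.resources }}"
```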
@jmazzitelli this was brought up during the community meeting and we had concerns about having watches on namespaces.
I will try to be as brief as I can to explain what my operator is doing (this is the Kiali operator, BTW). Kiali is the UI console for Istio and OpenShift Service Mesh. The service mesh consists of microservices deployed in some namespaces in a cluster (not necessarily all namespaces). Kiali needs to observe those namespaces so it can do various things (manage and observe resources such as pods and Istio-specific CRs in those namespaces). For security purposes, many users of Kiali DO NOT want to grant cluster-wide access to Kiali. Instead, they only want to grant access to those namespaces that are participating in the service mesh; any other namespaces should be off-limits to Kiali. For this reason, there is a setting in the Kiali CR that holds a list of namespace regexes identifying which namespaces are part of the mesh.

When a user creates a Kiali CR, the Kiali operator reconciles it. The Kiali operator examines the list of namespace regexes specified in that setting and creates a Role/RoleBinding in each matching namespace to grant Kiali access to it.

Now, suppose someone ADDS a new namespace to the cluster which becomes part of the mesh. The Kiali operator needs to add the Role/RoleBinding to the namespace to grant Kiali access to that new namespace. But there is no way for the Kiali operator to know about the new namespace (if it isn't watching for them) because the Kiali CR does not necessarily have to be touched (the new namespace might match one of the regexes already defined in the Kiali CR). The user will have to, minimally, manually "touch" the Kiali CR (change a spec setting, or add/modify a dummy annotation) so the Kiali operator is triggered to run a reconciliation. At that point, the reconciliation will run and the Role/RoleBinding will be created.

Now, suppose someone DELETES an old namespace and then later ADDS it back in again. The deletion will have removed the Role/RoleBinding that the Kiali operator originally created. Adding the namespace back in will do nothing to bring that Role/RoleBinding back, so essentially that namespace is no longer accessible to Kiali, even though it originally was, and even though that namespace was brought back into the mesh.

In each case, having the Kiali operator be notified that a new namespace has been created will allow it to automatically re-reconcile the Kiali CR and handle the Role/RoleBinding creation automatically, without the user having to manually "touch" the Kiali CR to kick off the reconciliation (or even be aware that this needs to be done).
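For context, the per-namespace grant described above is ordinary Kubernetes RBAC. A simplified sketch follows; the names, namespaces, and rules are illustrative placeholders, not the operator's actual templates.

```yaml
# Sketch only: a per-namespace Role plus a RoleBinding granting a service
# account access to resources in one member namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiali-viewer
  namespace: member-namespace
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiali-viewer
  namespace: member-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kiali-viewer
subjects:
- kind: ServiceAccount
  name: kiali-service-account
  namespace: istio-system
```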
/assign I need to dig a bit deeper into potential limitations, as it was brought to my attention that something like this has been discussed in the past and had performance limitations. Will do some research and update with my findings.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
/remove-lifecycle stale
@jmazzitelli: The label(s) language/ansible cannot be applied. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Feature Request
Describe the problem you need a feature to resolve.
I have an ansible operator that wants to reconcile its CRs when new namespaces are added to the cluster. I do this by adding this to watches.yaml:
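(The original snippet was not captured here; a minimal sketch of such a Namespace watch entry, with an illustrative playbook path, might look like the following.)

```yaml
# Sketch of a watches.yaml entry that watches core-group Namespace resources
# and runs a playbook whenever a Namespace event arrives.
# The playbook path is illustrative.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
```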
new-namespace-detected.yml is very small - it just "touches" the CRs that exist (just sets an annotation on them) so the operator then reconciles them (the operator has "watchAnnotationsChanges: true" for the CR Kind in the same watches.yaml).

The issue is: for clusters with hundreds or thousands of namespaces, this means that when the operator starts up, it will run this playbook hundreds/thousands of times. While the playbook takes at most 1 to 2 seconds to complete, multiply that by hundreds or thousands and it can potentially spin the operator for a long period of time at startup (e.g. for every 600 namespaces, the operator will run this playbook that many times, for a total of about 10 to 20 minutes). Not only that, but I don't even need to be told about all of these namespaces because (a) usually, when the operator starts, there are no CRs in the cluster anyway, so there is nothing for my operator to do when processing all the namespaces at startup, and (b) even if there are CRs in the cluster, the first time the CRs are reconciled, the operator will process all of the cluster's existing namespaces anyway. Thus, I really only want to be told about namespaces that are created AFTER the operator starts.
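For reference, the CR entry mentioned above might look roughly like this in the same watches.yaml. The group/version/kind and playbook path are placeholders; watchAnnotationsChanges: true is the setting quoted from the description.

```yaml
# Sketch only: the entry for the operator's own CR Kind, configured so that
# annotation changes (the "touch") trigger a reconciliation.
- version: v1alpha1
  group: example.com
  kind: Example
  playbook: playbooks/reconcile-example.yml
  watchAnnotationsChanges: true
```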
Describe the solution you'd like.
I have 2 ideas about how to avoid the long startup processing. The first is the ideal solution; if that cannot be implemented, the second is an alternative (though not the best one).
Solution Idea 1: Do NOT trigger a reconciliation at all for any resources of the watched Kind that already exist at the time the operator starts up (i.e. do not run the playbook for all namespaces that already exist at the time the operator starts). This solution would add a flag to watches.yaml (something like ignoreExistingResourcesAtStartup: true); when this is true, all resources of the given Kind that already exist at the time the operator starts up will be ignored and the ansible playbook will NOT be triggered for them. Instead, the playbook will only run when resources of the given Kind (in my example, Namespace) are created AFTER the operator starts up. (A sketch of this proposed flag appears below, after Solution Idea 2.)

Solution Idea 2: Alternatively, I would like a conditional placed in the watches.yaml just to be able to tell the operator NOT to watch for new namespaces if a particular "setting" is detected (say, an environment variable placed on the operator Pod). In other words, ignore that particular list item in the watches.yaml and do not watch resources of the given Kind at all. This would be a "cutoff switch" for customers who have very large clusters (as in a large number of namespaces). Those customers would install the operator with this special setting to tell the operator NOT to watch for new namespaces, saving the operator from having to consume a lot of CPU/resources at startup. The drawback is that the customer would need to manually touch the CR so it can reconcile when a new namespace is added. But this would just be a known drawback accepted in exchange for reducing startup time and resource consumption of the operator.
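To make Solution Idea 1 concrete, here is a sketch of how the proposed flag could appear in a watches.yaml entry. Note that ignoreExistingResourcesAtStartup does not exist today; it is the flag being requested in this issue, and the other values are illustrative placeholders.

```yaml
# Hypothetical: ignoreExistingResourcesAtStartup is the flag proposed above,
# not an existing ansible-operator watches.yaml option.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
  # Proposed: skip the pre-existing Namespaces found at startup and only run
  # the playbook for Namespaces created after the operator starts.
  ignoreExistingResourcesAtStartup: true
```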
/language ansible
Workaround
A workaround would be to require the user to put a label on the namespaces they want the operator to automatically detect. When doing this, I would be able to use the already existing "selector" feature of watches.yaml (see https://sdk.operatorframework.io/docs/building-operators/ansible/reference/watches/). But this would require the user to know ahead of time that their namespaces should be labeled properly, it would require the user to have permission to label namespaces, and it would require users to "pollute" their namespace label sets just to support the operator (this is something that customers may not want to do). So, while the workaround is viable, it is not ideal.
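For completeness, a sketch of what that selector-based watch might look like; the label key/value and playbook path are placeholders chosen for illustration.

```yaml
# Sketch only: watch only Namespaces that carry an agreed-upon label, using
# the documented "selector" field of watches.yaml.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
  selector:
    matchLabels:
      example.com/member-of-mesh: "true"
```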