ansible-operator: ignore existing resources at startup or conditional watches in watches.yaml #20
Comments
For the record, here's what my little ansible playbook looks like that gets invoked when new namespaces are found: https://github.com/kiali/kiali-operator/blob/master/playbooks/kiali-new-namespace-detected.yml All it does is "touch" the CRs that exist (it sets an annotation on them) so the operator then reconciles them.
Notice a trick here: this playbook throttles the annotation modification to once every minute. Thus, if you create 100 namespaces within 60 seconds, the CR's annotation will only be touched once and only one reconciliation will happen. This helps avoid modifying the CR 100 times (and potentially triggering multiple reconciliations of the CR).
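For illustration only, a minimal sketch of such a "throttled touch" playbook might look like the following. The CR group/version/kind and annotation name here are placeholders, not the actual Kiali ones; the real playbook is at the URL above.

```yaml
# Sketch only: annotate every existing CR so a reconciliation is triggered,
# but use a minute-granularity timestamp as the annotation value so that
# many namespace events inside the same minute amount to a single change.
- hosts: localhost
  gather_facts: no
  collections:
  - kubernetes.core
  tasks:
  - name: Find all existing CRs (apiVersion/kind are placeholders)
    k8s_info:
      api_version: example.com/v1alpha1
      kind: Example
    register: existing_crs

  - name: Touch each CR with a minute-granularity annotation
    k8s:
      state: present
      definition:
        apiVersion: example.com/v1alpha1
        kind: Example
        metadata:
          name: "{{ item.metadata.name }}"
          namespace: "{{ item.metadata.namespace }}"
          annotations:
            # The value only changes once per minute, so repeated runs within
            # the same minute are a no-op and trigger no extra reconciles.
            example.com/last-namespace-event: "{{ '%Y-%m-%dT%H:%M' | strftime }}"
    with_items: "{{ existing_crs.resources }}"
```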
@jmazzitelli this was brought up during the community meeting and we had concerns about having watches on namespaces.
I will try to be as brief as I can to explain what my operator is doing (this is the Kiali operator, BTW). Kiali is the UI console for Istio and OpenShift Service Mesh. The service mesh consists of microservices deployed in some namespaces in a cluster (not necessarily all namespaces). Kiali needs to observe those namespaces so it can do various things (manage and observe resources such as pods and Istio-specific CRs in those namespaces). For security purposes, many users of Kiali DO NOT want to grant cluster-wide access to Kiali. Instead, they only want to grant access to those namespaces that are participating in the service mesh; any other namespaces should be off-limits to Kiali. For this reason, there is a setting in the Kiali CR that holds a list of namespace regexes identifying which namespaces are part of the mesh.

When a user creates a Kiali CR, the Kiali operator reconciles it. The Kiali operator examines the list of namespace regexes specified in that setting and creates a Role/RoleBinding in each matching namespace to grant Kiali access to it.

Now, suppose someone ADDS a new namespace to the cluster which becomes part of the mesh. The Kiali operator needs to add the Role/RoleBinding to the namespace to grant Kiali access to that new namespace. But there is no way for the Kiali operator to know about the new namespace (if it isn't watching for them) because the Kiali CR does not necessarily have to be touched (the new namespace might match one of the regexes already defined in the Kiali CR). The user will have to, minimally, manually "touch" the Kiali CR (change a spec setting, or add/modify a dummy annotation) so the Kiali operator is triggered to run a reconciliation. At that point, the reconciliation will run and the Role/RoleBinding will be created.

Now, suppose someone DELETES an old namespace and then later ADDS it back in again. The deletion will have removed the Role/RoleBinding that the Kiali operator originally created. Adding the namespace back in will do nothing to bring that Role/RoleBinding back, so essentially that namespace is no longer accessible to Kiali, even though it originally was, and even though that namespace was brought back into the mesh.

In each case, having the Kiali operator be notified that a new namespace has been created will allow it to automatically re-reconcile the Kiali CR and handle the Role/RoleBinding creation automatically, without the user having to manually "touch" the Kiali CR to kick off the reconciliation (or even be aware that this needs to be done).
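For context, the per-namespace grant described above is ordinary Kubernetes RBAC. A simplified sketch follows; the names, namespaces, and rules are illustrative placeholders, not the operator's actual templates.

```yaml
# Sketch only: a per-namespace Role plus a RoleBinding granting a service
# account access to resources in one member namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiali-viewer
  namespace: member-namespace
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiali-viewer
  namespace: member-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kiali-viewer
subjects:
- kind: ServiceAccount
  name: kiali-service-account
  namespace: istio-system
```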
/assign I need to dig a bit deeper into potential limitations, as it was brought to my attention that something like this has been discussed in the past and had performance limitations. Will do some research and update with my findings.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
/remove-lifecycle stale
@jmazzitelli: The label(s) language/ansible cannot be applied. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Feature Request
Describe the problem you need a feature to resolve.
I have an ansible operator that wants to reconcile its CRs when new namespaces are added to the cluster. I do this by adding this to watches.yaml:
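(The original snippet was not captured here; a minimal sketch of such a Namespace watch entry, with an illustrative playbook path, might look like the following.)

```yaml
# Sketch of a watches.yaml entry that watches core-group Namespace resources
# and runs a playbook whenever a Namespace event arrives.
# The playbook path is illustrative.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
```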
new-namespace-detected.yml is very small - it just "touches" the CRs that exist (just sets an annotation on them) so the operator then reconciles them (the operator has "watchAnnotationsChanges: true" for the CR Kind in the same watches.yaml).

The issue is: for clusters with hundreds or thousands of namespaces, this means that when the operator starts up, it will run this playbook hundreds/thousands of times. While the playbook takes at most 1 to 2 seconds to complete, multiply that by hundreds or thousands and it can potentially spin the operator for a long period of time at startup (e.g. for every 600 namespaces, the operator will run this playbook that many times, for a total of about 10 to 20 minutes). Not only that, but I don't even need to be told about all of these namespaces because (a) usually, when the operator starts, there are no CRs in the cluster anyway, so there is nothing for my operator to do when processing all the namespaces at startup, and (b) even if there are CRs in the cluster, the first time the CRs are reconciled, the operator will process all of the cluster's existing namespaces anyway. Thus, I really only want to be told about namespaces that are created AFTER the operator starts.
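For reference, the CR entry mentioned above might look roughly like this in the same watches.yaml. The group/version/kind and playbook path are placeholders; watchAnnotationsChanges: true is the setting quoted from the description.

```yaml
# Sketch only: the entry for the operator's own CR Kind, configured so that
# annotation changes (the "touch") trigger a reconciliation.
- version: v1alpha1
  group: example.com
  kind: Example
  playbook: playbooks/reconcile-example.yml
  watchAnnotationsChanges: true
```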
Describe the solution you'd like.
I have 2 ideas about how to avoid the long startup processing. The first is the ideal solution; if that cannot be implemented, the second is an alternative (though not the best one).
Solution Idea 1: Do NOT trigger a reconciliation at all for any resources of the watched Kind that already exist at the time the operator starts up (i.e. do not run the playbook for all namespaces that already exist at the time the operator starts). This solution would add a flag to watches.yaml (something like ignoreExistingResourcesAtStartup: true); when this is true, all resources of the given Kind that already exist at the time the operator starts up will be ignored and the ansible playbook will NOT be triggered for them. Instead, the playbook will only run when resources of the given Kind (in my example, Namespace) are created AFTER the operator starts up. (A sketch of this proposed flag appears below, after Solution Idea 2.)

Solution Idea 2: Alternatively, I would like a conditional placed in the watches.yaml just to be able to tell the operator NOT to watch for new namespaces if a particular "setting" is detected (say, an environment variable placed on the operator Pod). In other words, ignore that particular list item in the watches.yaml and do not watch resources of the given Kind at all. This would be a "cutoff switch" for customers who have very large clusters (as in a large number of namespaces). Those customers would install the operator with this special setting to tell the operator NOT to watch for new namespaces, saving the operator from having to consume a lot of CPU/resources at startup. The drawback is that the customer would need to manually touch the CR so it can reconcile when a new namespace is added. But this would just be a known drawback accepted in exchange for reducing startup time and resource consumption of the operator.
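To make Solution Idea 1 concrete, here is a sketch of how the proposed flag could appear in a watches.yaml entry. Note that ignoreExistingResourcesAtStartup does not exist today; it is the flag being requested in this issue, and the other values are illustrative placeholders.

```yaml
# Hypothetical: ignoreExistingResourcesAtStartup is the flag proposed above,
# not an existing ansible-operator watches.yaml option.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
  # Proposed: skip the pre-existing Namespaces found at startup and only run
  # the playbook for Namespaces created after the operator starts.
  ignoreExistingResourcesAtStartup: true
```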
/language ansible
Workaround
A workaround would be to require the user to put a label on the namespaces they want the operator to automatically detect. When doing this, I would be able to use the already existing "selector" feature of watches.yaml (see https://sdk.operatorframework.io/docs/building-operators/ansible/reference/watches/). But this would require the user to know ahead of time that their namespaces should be labeled properly, it would require the user to have permission to label namespaces, and it would require users to "pollute" their namespace label sets just to support the operator (this is something that customers may not want to do). So, while the workaround is viable, it is not ideal.
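For completeness, a sketch of what that selector-based watch might look like; the label key/value and playbook path are placeholders chosen for illustration.

```yaml
# Sketch only: watch only Namespaces that carry an agreed-upon label, using
# the documented "selector" field of watches.yaml.
- version: v1
  group: ""
  kind: Namespace
  playbook: playbooks/new-namespace-detected.yml
  selector:
    matchLabels:
      example.com/member-of-mesh: "true"
```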