-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: gitops-engine fixes, plus enable Numaflow Controller deleted and then recreated #471
base: main
Are you sure you want to change the base?
Conversation
… to Unhealthy when the NumaflowController is not healthy (phase failed) and the ChildResourceHealthy is not present in the NumaflowController Signed-off-by: Antonino Fugazzotto <antonino_fugazzotto@intuit.com>
Signed-off-by: Antonino Fugazzotto <antonino_fugazzotto@intuit.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
…wner reference Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
…tedPipelineDefinition function Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
err := r.deleteChildren(ctx, controller.Namespace, controller.Name) | ||
if err != nil { | ||
return ctrl.Result{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the manifests don't have an OwnerReference anymore, the deletion of the children is no longer automatic when NumaflowController is deleted, so we explicitly do it.
// No need to explicitly requeue since a new reconciliation will occur due to changes in watched child resources | ||
return ctrl.Result{}, nil | ||
} | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previous code looked specifically for "version change" to determine if we need to delete/recreate - now do it any time the manifest has a difference, regardless of whether it's due to "version change"
manifestsWithOwnership, err := applyOwnershipToManifests(manifests, controller) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to apply ownership reference, %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer apply OwnerReference since gitops-engine assumes there's no OwnerReferences
if err != nil { | ||
return nil, fmt.Errorf("failed to parse the manifest, %w", err) | ||
} | ||
for _, obj := range targetObjs { | ||
obj.SetNamespace(namespace) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to compare the live vs target manifests effectively, the target manifests need to have "namespace" applied
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
…o replace with force should be sufficient Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Fixes #455
Modifications
This is a follow-up to this previous PR. That PR made it so that if the version number changes, then the previous Numaflow Controller manifests are deleted and then recreated. However, it doesn't take care of the cases in which the version number doesn't change but there's some other change to the manifest of a given version. (In fact, we saw this issue when deploying.)
When the NumaflowController reconciler is invoked either through version change or something else, this PR makes it so that we follow this process:
NumaflowController
reconciler sees the change and finds the manifests associated with the versionScenario 1: It finds the version:
NumaflowController
reconciler deletes the previous manifests and requeuesScenario 2: it doesn't find the version:
NumaflowController
is marked "Failed"gitops-engine code wasn't really working
While working on this PR, I realized that the code we had around the gitops-engine (for applying Numaflow Controller manifests) wasn't truly working the way it was supposed to. The way it's supposed to work is that we first determine if we need to do an "apply" based on whether there's a difference between what we want to apply and what is live, and then only do an "apply" if there's a difference. However, the code was never able to detect what was live, so it was always just doing an "apply".
I realized that the reason it wasn't able to detect what was live is because gitops-engine assumes that manifests don't have "Owner references". I am guessing since it needs to manage the lifecycle of the resources, it doesn't want something else to have applied an OwnerRef to it? In any case, if I remove the owner references that we were applying previously, it is able to detect the live resources and perform the comparison.
Because I removed the Owner reference from the manifests, I had to do the following:
NumaflowController
is deleted (using Finalizer)Verification
Performed the following:
NumaflowController
CR was created since that's newNumaflowController
marked "Failed" andNumaflowControllerRollout
HealthyChildCondition marked False ("degraded"). Numaflow Controller pod still running 1.4.0 and pipeline unpaused.NumaflowController
went back to "Deployed" andNumaflowControllerRollout
HealthyChildCondition back to True.Backward incompatibilities