Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: opt-in to "all" semantics when determining freight's availability to a given stage #3207

Open
3 tasks done
aidan-canva opened this issue Jan 4, 2025 · 5 comments
Open
3 tasks done

Comments

@aidan-canva
Copy link

aidan-canva commented Jan 4, 2025

Checklist

  • I've searched the issue queue to verify this is not a duplicate feature request.
  • I've pasted the output of kargo version, if applicable.
  • I've pasted logs, if applicable.

Note

I'm not actually sure if this is a feature request or a bug - the current functionality caught me by surprise though I could be biased based on my personal use-case.

I've searched the docs, issues, discussions and Discord and I couldn't find any mention of this behavior. I've also read through parts of the code-base to determine if this feature request is supported but not documented. I've found reference to the ControlFlow concept, but it doesn't support any extra customization at present.

Please let me know if this use-case is solvable using existing constructs.

Proposed Feature

(Optionally?) Freight should only be promoted to a downstream Stage when all upstream Stages have successfully verified it.

Motivation

In our environment we have a large number of Stages (each represents a different cluster/environment). Stages are grouped based on risk profile (think: dev, staging, prod) where we would like each Stage in a 'group' to receive a Freight in parallel. Each Stage then runs their own verification process (potentially different / more exhaustive per-Stage). Each Stage also has autoPromotionEnabled - we are relying on the verifications to ensure a safe promotion.

If a single Stage fails to verify a given Freight, it indicates something needs to be investigated and possibly fixed and as a result, the Freight should not be promoted further as it could fail in more critical environments.

In a simple Kargo Pipeline with no parallel steps, the above behavior works as expected: eg:

Warehouse-A -> Stage-Dev -> Stage-Staging -> Stage-Prod

In pipelines that have 'parallel' Stages, Freight will be promoted to a downstream Stage if any single upstream Stage has verified it:


                            - Stage-StagingA -
Warehouse-A -> Stage-Dev  -|                   |- Stage-Prod
                            - Stage-StagingB -

If Freight aaaaaa passed verification in Stage-StagingA but fails in Stage-StagingB, Stage-Prod will receive the Freight.

Suggested Implementation

I am struggling to think of use-cases where the current functionality is the preferred behavior which is making me lean towards this being a bug rather than a feature. If its a bug, I have a small working POC that changes the getAvailableFreight method in a ControlFlow Stage to only return Freight verified by all upstream Stages. Assumably, the same logic could be applied to a regular Stage.

If this is a feature, then I imagine the same logic would be suitable but would need to be configurable - ideally per-Stage. Though maybe this is only a 'bug' when autoPromotionEnabled is set on a Stage and this behavior is otherwise expected?

Let me know if opening a draft PR helps to understand the problem better.

@krancour
Copy link
Member

krancour commented Jan 4, 2025

This is basically #1168. It went stale and got revived (by me) a few times, and the last time it went stale, I left it, simply because in all the time it had been open, it never got much traction. Personally, I agree 100% that this is a useful feature. Just no one's been asking for it.

If you want to do some of the legwork on this, we'd certainly be grateful for that.

A couple words of caution on revising Freight availability logic -- I've got some non-trivial refactoring of that going on in #3100. It's best to hold off on touching that so as not to invite some rough merge conflicts. We would also need the existing "any" semantics to remain the default, with "all" semantics as an opt-in, so that this won't be a breaking change...

With that in mind, my best suggestion is to start by proposing whatever (non-breaking) modifications you think will be required to various resource types to enable this. That would be a great way to drive further discussion.

Thanks for re-raising this and for your willingness to help make it happen!

@krancour krancour changed the title feat: Support ControlFlow 'wait gate' functionality feat: opt-in to "all" semantics when determining freight's availability to a given stage Jan 4, 2025
@aidan-canva
Copy link
Author

aidan-canva commented Jan 4, 2025

I'm still fresh with Kargo and the direction it's going, but here is a high level approach to implementing this:


Scope

From the linked issue #1168, this feature is scoped to the use-case of auto promotions and shouldn't change the behaior of other types of promotion (ie direct Warehouse promotion or manual approvals). The existing any semantics for auto promotion will remain the default when spec.promotionPolicies[*].autoPromotionEnabled is set to true.

Interface changes

The Project CRD's spec.promotionPolicies value should be extended to support an additional property: autoPromotionStrategy which supports the following values:

  • any(default) - Freight is auto-promoted when any of the upstream Stages have verified it.
  • all - Freight is auto-promoted when all of the upstream Stages have verified it.

The strategy names might benefit from being more explicit ie, 'upstream_stages_any' and 'upstream_stages_all'. There could also be other strategies implemented in the future, such as 'majority' (Freight is promoted when a majority of upstream Stages have verified it) or 'percentage' (Freight is promoted when a >X% of upstream Stages have verified it).

The implemented behavior should be consistent for both 'regular Stages' and 'control flow Stages'. Here is an example Project which sets the all strategy:

apiVersion: kargo.akuity.io/v1alpha1
kind: Project
metadata:
  name: example
spec:
  promotionPolicies:
  - stage: test
    autoPromotionEnabled: true
    autoPromotionStrategy: all
  - stage: uat
    autoPromotionEnabled: true
    autoPromotionStrategy: all

I don't believe the UI currently indicates if auto promotion is enabled - maybe that is a gap in UX right now but I don't think this feature would be the driver for any changes.

All other resource types should be able to stay the same.

Implementation

This will likely change with the upcoming refactor, but right now I believe this could be implemented relatively simply within regular_stages.go and control_flow.go (and ideally de-duplicate this logic a bit).

@krancour
Copy link
Member

krancour commented Jan 4, 2025

This seems like a good starting point. If I may suggest a refinement:

I wouldn't want to see the option for this being in the Project resource. As I wrote in another thread:

Promotion policies are defined at the Project-level only for security reasons. Assume that there might be a few developers on a team that have sufficient permission to edit Stages, but lack permissions to promote. If the switch for enabling auto-promoting were at the Stage-level, it would be possible for someone not authorized to promote to make a promotion happen anyway just by switching on auto-promotion. This is why that switch is at the Project-level, where, presumably, someone with greater permissions has the final say on auto-promotions. This is a long way of saying: The existing policy is where it is for security reasons, but doesn't feel like the right place to be defining behavioral things...

... such as opting into different availability semantics.

I think a more appropriate place to put this might be in the FreightRequest type. This is what Stages use to express what kind of Freight they want and what upstream Stages those need to clear to become available to it. Freight's currently available after it's verified in any Stage on that list. It think any new option that alters that behavior to use "all" semantics should be nearby.

My other suggestion is to keep the word "promotion" out of this as much as we can, and especially keep the words "auto promotion" out of this. I totally get that preventing what you consider pre-mature autopromotion was the impetus for opening this issue, but I'd want the rules for what Freight is available to a given Stage to be consistent regardless of whether a Promotion were automatic or manual. Maybe "availabilityStrategy" or something like that?

I don't believe the UI currently indicates if auto promotion is enabled

I think there's a little robot head icon on the Stage if it's enabled.

Once again, thank you so much for your willingness to collaborate on this!

@aidan-canva
Copy link
Author

With #3100 now merged, I'll re-structure my implementation to fit and then open a draft PR for initial feedback.

@krancour
Copy link
Member

Thanks @aidan-canva! I planned to call your attention to that having been merged, but it seems you were watching it. 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants