Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEVCON-2524] output deploy queue to checkrun summary #772

Merged
merged 13 commits into from
Dec 3, 2024

Conversation

tlin4194
Copy link
Contributor

@tlin4194 tlin4194 commented Oct 28, 2024

Issue:

Previously, when a revision is queued, we mark the github status as queued but have no other indication in the check run itself that this check run is waiting on something.
Users have to figure out which check run is holding up the queue which is difficult because deploys for a root can happen off any branch. We can figure this out using the temporal UI however it would be a better CX if we could just surface the queue itself in the check run.

Part 1

Add queue info to checkrun summary on creating new check run or unlock signal. Queue info is shown as [commit SHA](link to specific check run)

Screenshot 2024-10-31 at 1 23 43 PM

Part 2

Check run summaries do not get updated in cases where a a Terraform workflow is pending Confirm/Reject checkrun action between Plan and Apply steps, since the deploy queue lock only tracks the "Unlock" checkrun action.

  • Surfaces deploy queue from queue package to deploy/terraform package. Some complication using Queue structs here due to cyclic dependencies between the two packages.
  • To avoid restructuring the two packages, instead we will pass the GithubCheckRunCache (github package) over from queue worker to deploy/terraform workflow.
  • StateReceiver will use GithubCheckRunCache to update checkrun summaries for each revision on the queue when a Confirm/Reject action is pending.
Screenshot 2024-11-04 at 12 46 37 PM

Test Plan

  1. Recreating Unlock action: open 2 PRs, run atlantis apply -f on PR 1, and try to merge PR 2. PR 2 will be locked by confirm reject action on PR 1. Any subsequent queued deploys will show up as queued commits (with corresponding links) in checkrun summary.
  2. Recreating diverged confirm/reject action: run atlantis apply -f on unmerged PR 1 and confirm. Create PR 3 and run atlantis apply -f on unmerged PR 3.
  3. Merged PR 1 deploy will update to to show pending action on PR 3. Links will go directly to the specific run instead of just the commit in case there were multiple forced applies on PR 1.

Improve Test Cases

Fix queue_test and updater_test where assertions in CreateOrUpdate were not getting caught since the function is called within a temporal worker.

@tlin4194
Copy link
Contributor Author

/ptal #dev-console

smonero
smonero previously approved these changes Oct 30, 2024
@tlin4194 tlin4194 force-pushed the DEVCON-2524-output-deploy-queue-to-checkrun-summary branch from eb843ba to cbf4fa5 Compare October 31, 2024 19:31
smonero
smonero previously approved these changes Nov 6, 2024
}

workflow.GetLogger(ctx).Debug(fmt.Sprintf("Updating action pending summary for deployment id: %s", i.ID.String()))
_, err := n.CheckRunCache.CreateOrUpdate(ctx, i.ID.String(), request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might pose some non-determinism errors, see https://community.temporal.io/t/workflow-determinism/4027, but I could be wrong. If it does, let's add some workflow versioning here

@tlin4194 tlin4194 changed the title Devcon 2524 output deploy queue to checkrun summary [DEVCON-2524] output deploy queue to checkrun summary Nov 8, 2024
@tlin4194 tlin4194 force-pushed the DEVCON-2524-output-deploy-queue-to-checkrun-summary branch from 5fbd36d to d5ef226 Compare November 9, 2024 01:27
@tlin4194
Copy link
Contributor Author

tlin4194 commented Nov 9, 2024

Current PR is tested on staging, with behavior as expected.
The only issue is that after a deploy pending on confirm/reject action gets unblocked, if the queue is currently locked the state receiver cannot restore the unlock button on the checkruns for queued items. The workaround in the UI is that users will just click on the linked previous checkrun unlock from there.

The actual solution requires exposing queue lock state to state receiver. This would require a significant code restructure since LockState cannot be imported to Terraform package without a cyclic dependency. The following steps are moved to a separate PR.
Steps:

  1. Add lockstate logic to state receiver
  2. Move lockstate to independent package so that lockstate can be imported in state receiver
  3. Split up queue interface into 2 interfaces due to interface bloat, one for deploy worker, one for terraform runner.
  4. Fix test cases

When we update checkruns for queued commits (after the blocking commit
gets confirmed or rejected), we need to check if the queue was
previously locked by something else and restore the unlock button for
each commit.

### Sequence of events without this change:
1. Queue is locked due to diverged commits on latest deployment vs
current plan (i.e. if someone previously force applied). All queued
commits should have a unlock button.
2. New commit comes in with a pending confirm/reject action. My other PR
makes it so that all queued checkruns are updated to reflect that the
queue is waiting on this confirm reject action.
<img width="1461" alt="Screenshot 2024-11-08 at 3 08 07 PM"
src="https://github.com/user-attachments/assets/3daf3dc4-a95a-411c-983c-8ad95f53d2e7">
4. Confirm/reject commit gets resolved. All queued commit checkruns get
updated to the default status, without the unlock button. Currently
users would have to click on the commit they want to unlock (first
screenshot below), then click on an older checkrun of that commit
(linked in blue) to access the unlock button (2nd screenshot below).
<img width="877" alt="Screenshot 2024-11-08 at 3 07 08 PM"
src="https://github.com/user-attachments/assets/5c86a489-f18f-45cd-b887-583774ccaa0a">
<img width="1243" alt="Screenshot 2024-11-08 at 3 07 17 PM"
src="https://github.com/user-attachments/assets/0fa5b694-6350-450a-9155-7a3ca7583b48">

### After this change:
4. (Previous step 4) All queued commit checkruns get updated back to its
original status from before the confirm/reject commit was applied,
**including** unlock actions if the queue was previously locked.


### Additional notes:
* LockState is migrated out of the queue package into its own `lock`
package to avoid cyclic imports with terraform module
* bumping max interfacebloat golangci error to allow adding
GetLockState() function to queue interface

### Testing
1. Open PR 1. Force apply PR 1, and verify that the force apply is
successful.
2. Open PR 2. Merge PR 2 to master, and verify that PR 2 deployment
failed and is now locked on PR 1's last commit (there should be an
unlock button and link to the locking commit).
3. Open PR 3. Force apply PR 3, and verify that there is a
Confirm/Reject action.
4. Verify that PR 2's deployment checkrun on master has an updated
summary with link to the pending Confirm/Reject run on PR 3.
5. Reject forced apply on PR 3.
6. Verify that PR 2's deployment checkrun updates again, this time back
to the original checkrun summary from step 2. If the queue was
previously locked (which it was), there should be an unlock button.
@tlin4194 tlin4194 merged commit 58ff899 into main Dec 3, 2024
3 checks passed
@tlin4194 tlin4194 deleted the DEVCON-2524-output-deploy-queue-to-checkrun-summary branch December 3, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants