ROX-20058: Use GCP secret manager + Helm for infra deployments #1015

Merged: 22 commits, Oct 16, 2023

@tommartensen commented Sep 29, 2023

Changes

  • Infra configuration and secrets were moved to GCP Secret Manager, which supports versioning and streaming values directly into helm upgrade.
  • Infra is now deployed via Helm.
  • Helm and secrets utils are available through Make targets, which use the scripts in scripts/deploy.
  • "deployment" and "environment" were consolidated into a single "environment" value, which can be "development" or "production". Previous checks for a "local" deployment have been replaced in two ways: checks against the .Capabilities API, which avoid provisioning K8s resources whose APIs do not exist on the target cluster, and the testMode value, which can be set to disable telemetry.
  • The "infra" namespace is now created by Helm instead of being defined explicitly as a K8s resource.
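The first bullet (streaming values from Secret Manager into helm upgrade) can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts in scripts/deploy: the secret naming scheme infra-server-values-<environment>, the GCP_PROJECT variable, the release name infra-server and the chart path are all hypothetical. It relies on gcloud secrets versions access printing the payload to stdout and on Helm reading a values file from stdin when given "--values -", so the decrypted values never touch disk.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual scripts/deploy tooling from this PR.
# Hypothetical assumptions: the secret naming scheme, the GCP_PROJECT
# variable, the release name "infra-server" and the chart path.
set -euo pipefail

# Accept only the two environments introduced by this PR.
validate_environment() {
  case "${1:-}" in
    development | production) return 0 ;;
    *)
      echo "unknown environment: '${1:-}'" >&2
      return 1
      ;;
  esac
}

# Hypothetical mapping from environment to the Secret Manager secret name.
values_secret_name() {
  validate_environment "${1:-}" || return 1
  echo "infra-server-values-$1"
}

# Stream one secret version straight into Helm. "--values -" makes Helm read
# the values file from stdin, so the decrypted values never land on disk.
# Assumption: Helm creates the namespace via --create-namespace; drop that
# flag if the chart templates the Namespace resource itself.
deploy_infra() {
  local environment="$1" version="${2:-latest}" secret
  secret="$(values_secret_name "$environment")"
  gcloud secrets versions access "$version" \
    --secret="$secret" \
    --project="${GCP_PROJECT:?GCP_PROJECT must be set}" |
    helm upgrade --install infra-server chart/infra-server \
      --namespace infra \
      --create-namespace \
      --values - \
      --set environment="$environment"
}
```

Because Secret Manager versions every update, passing an explicit version (for example deploy_infra production 3) redeploys with an older set of values without touching any local files.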

Misc

  • PR GHA workflow: runs are now synchronized, i.e. only one workflow per PR can run at a time.
  • Deployment GHA workflow: The Slack channel ID was fixed
  • Documentation was updated
  • The Argo Workflows server is now installed through a Helm chart, independently of the infra-server deployment.

How I tested my changes

  • PR workflow (build, PR cluster, e2e tests)
  • Deployed to dev.infra.rox.systems with the fixed Deployment workflow (notification) and from my machine, following the DEPLOYMENT.md instructions.

Rollout on prod

  1. make configuration-download on master

  2. Replace ".Values.deployment" with ".Values.environment" in chart/infra-server/configuration/production/{auth0.yaml, oidc.yaml}

  3. make create-consolidated-values

  4. Validate "oidc_yaml" contains ".Values.environment":

    yq \
        ".oidc_yaml" \
        "chart/infra-server/configuration/production-values-from-files.yaml" \
    | base64 --decode
  5. Remove "auth0_yaml" (migrated to RHSSO), "google_calendar_credentials_json" (functionality removed) and "gke__gke_credentials_json" (pre-RH migration); none of them are used in the chart.

  6. Switch to the tm/helm-charts branch.

  7. ENVIRONMENT=production make secrets-upload

  8. Validate that the new secret version shows the updated values: run ./scripts/deploy/secrets.sh show production latest and select "oidc_yaml"; the output should match step 4.

  9. Validate that you are pointing to the correct cluster: kubectl config current-context.

  10. Delete the argo and infra namespaces and all secrets in the default namespace (except the default token). This is required for a clean slate; otherwise the Helm release conflicts with existing resources that Helm does not manage.

  11. ENVIRONMENT=production make install-argo helm-deploy.
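Steps 2, 4, 8, 9 and 10 are manual checks that are easy to get wrong on production. The helpers below are a hedged sketch of how they could be scripted; all function names are hypothetical, and the "default-token-<suffix>" naming of the default token secret is an assumption (it matches clusters that still auto-create legacy service account token secrets). The text-only helpers can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Sketch of guardrails for the rollout steps above. Function names are
# hypothetical; only sed, grep, awk, yq, base64 and kubectl are called.
set -euo pipefail

# Step 2: rewrite ".Values.deployment" references to ".Values.environment"
# (reads a template on stdin, writes the result to stdout).
rename_deployment_refs() {
  sed 's/\.Values\.deployment/.Values.environment/g'
}

# Steps 4 and 8: succeed only if the text on stdin references
# ".Values.environment" and no longer references ".Values.deployment".
uses_environment_value() {
  local text
  text="$(cat)"
  grep -Fq '.Values.environment' <<<"$text" &&
    ! grep -Fq '.Values.deployment' <<<"$text"
}

# Step 4, automated: decode "oidc_yaml" from the consolidated values file.
check_oidc_values() {
  yq ".oidc_yaml" \
    "chart/infra-server/configuration/production-values-from-files.yaml" |
    base64 --decode |
    uses_environment_value
}

# Step 10 filter: drop the default token from a list of secret names.
# Assumption: it is named "default-token-<suffix>" (legacy token secrets).
secrets_to_delete() {
  awk '!/^(secret\/)?default-token-/'
}

# Steps 9 and 10: refuse to delete anything unless kubectl points at the
# expected context (pass the context name you verified in step 9).
cleanup_for_helm() {
  local expected="$1" current
  current="$(kubectl config current-context)"
  if [[ "$current" != "$expected" ]]; then
    echo "refusing cleanup: context is '$current', expected '$expected'" >&2
    return 1
  fi
  kubectl delete namespace argo infra --ignore-not-found
  # "-o name" prints "secret/<name>"; GNU xargs -r skips an empty list.
  kubectl get secrets --namespace default -o name |
    secrets_to_delete |
    xargs -r kubectl delete --namespace default
}
```

Tying the destructive cleanup to an explicit expected context means a stale kubeconfig fails loudly instead of wiping the wrong cluster.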

@tommartensen self-assigned this Sep 29, 2023
@ghost commented Oct 4, 2023

A single node development cluster (infra-pr-1015) was allocated in production infra for this PR.

CI will attempt to deploy us.gcr.io/stackrox-infra/infra-server:0.8.2-22-gce8e555ae1 to it.

🔌 You can connect to this cluster with:

gcloud container clusters get-credentials infra-pr-1015 --zone us-central1-a --project acs-team-temp-dev

🛠️ And pull infractl from the deployed dev infra-server with:

nohup kubectl -n infra port-forward svc/infra-server-service 8443:8443 &
make pull-infractl-from-dev-server

🚲 You can then use the dev infra instance e.g.:

bin/infractl -k -e localhost:8443 whoami

⚠️ Any clusters that you start using your dev infra instance should have a lifespan shorter than the development cluster instance. Otherwise they will not be destroyed, because the dev infra instance ceases to exist when the development cluster is deleted. ⚠️

Further Development

☕ If you make changes, you can commit and push and CI will take care of updating the development cluster.

🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:

make install-local

Logs

Logs for the development infra depending on your @redhat.com authuser:

Or:

kubectl -n infra logs -l app=infra-server --tail=1 -f

@tommartensen changed the title from "wip: GCP secrets + Helm charts" to "ROX-... : Use GCP secret manager + Helm for infra deployments" Oct 11, 2023
@tommartensen changed the title from "ROX-... : Use GCP secret manager + Helm for infra deployments" to "ROX-20058: Use GCP secret manager + Helm for infra deployments" Oct 11, 2023
@tommartensen marked this pull request as ready for review October 11, 2023 12:45
@tommartensen requested a review from a team as a code owner October 11, 2023 12:45
Contributor

@gavin-stackrox left a comment

Nice! Other than my questions around ManagedCert/Ingress, I think this PR is good to go


 - name: Notify infra channel about new version
   env:
     SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
   uses: slackapi/slack-github-action@v1.23.0
   with:
-    channel-id: C01H4DC33K3 #acs-infra
+    channel-id: CVANK5K5W #acs-infra
Contributor

Yay!

@@ -1,3 +1,4 @@
name: Deploy infra
Contributor

Yay!

Comment on lines -26 to -28
#############
## Linting ##
#############
Contributor

Ugh. Re-organizing files makes review extra hard. A re-org should be a separate PR IMHO. But I'm not going to hold this PR back for it.

@@ -1,4 +1,4 @@
-{{ if ne .Values.deployment "local" -}}
+{{- if .Capabilities.APIVersions.Has "networking.gke.io/v1/ManagedCertificate" -}}
Contributor

This is going to create the managedcertificate on infra PR instances with dev values. How will this affect the dev instance? If we are not sure, then best to avoid.

Contributor Author

The managed certificate will be stuck in the Provisioning state for infra PR instances, because the certificate challenge cannot be reached on {{ .Values.hosts.primary }} or {{ .Values.hosts.secondary }}.

No impact on the dev cluster other than some failed requests from GCP.

Contributor

What about using .Values.testMode to exclude them? I get that they are probably harmless, but given that dependabot can create lots of dev instances, I'd prefer to avoid any issues.

Contributor Author

@tommartensen Oct 13, 2023

OK ce8e555
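The outcome of this thread can be checked without a GKE cluster: helm template accepts --api-versions, which adds entries to .Capabilities.APIVersions so that the Has check above evaluates to true offline. A minimal sketch, assuming the guard from ce8e555 skips the ManagedCertificate when testMode is true (its exact template condition is not shown here) and that the chart renders with its defaults plus any extra --set flags you pass; count_kind and the other function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the ManagedCertificate guard offline with "helm template".
# Assumptions: the chart renders with its default values plus the flags
# passed in, and ce8e555 skips the resource when testMode is true.
set -euo pipefail

# Count rendered manifests of the given kind read from stdin. Only
# unindented "kind: <Kind>" lines count, so nested "kind:" keys are ignored.
count_kind() {
  awk -v kind="$1" '/^kind:/ && $2 == kind { n++ } END { print n + 0 }'
}

# Render the chart while declaring the GKE ManagedCertificate API, so that
# .Capabilities.APIVersions.Has "networking.gke.io/v1/ManagedCertificate"
# evaluates to true even without a GKE cluster.
render_with_managed_cert_api() {
  helm template infra-server chart/infra-server \
    --api-versions networking.gke.io/v1/ManagedCertificate \
    "$@"
}

# Expect zero ManagedCertificates for PR/test instances (testMode=true).
check_testmode_guard() {
  local n
  n="$(render_with_managed_cert_api --set testMode=true | count_kind ManagedCertificate)"
  if [[ "$n" -ne 0 ]]; then
    echo "testMode=true still renders $n ManagedCertificate(s)" >&2
    return 1
  fi
}
```

Rendering offline keeps the check cheap enough to run in the PR workflow, so a regression in the guard would surface before dependabot spins up another batch of dev instances.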

chart/infra-server/templates/ingress.yaml (outdated, resolved)
@tommartensen merged commit de64d2f into master Oct 16, 2023
7 checks passed
@tommartensen deleted the tm/helm-charts branch October 16, 2023 08:01