Fresh install fails #22

Open
mrmarcsmith opened this issue Sep 17, 2021 · 10 comments

Comments

@mrmarcsmith
Contributor

mrmarcsmith commented Sep 17, 2021

I followed the readme with fresh repos and the installation failed. I'm going to include details about everything I did in hopes we can fix this template. I really want jx3 to take off but the adoption of jx3 is totally dependent on these templates working out of the box. I'm going to put in some work and make a PR to fix the easy things I found but I need help with the secrets population issue at the end.

  1. used the template links to create new repos
  2. added my bot account as a collaborator
  3. made this single commit (2 file changes) to the infrastructure repo

Screen Shot 2021-09-16 at 10 53 09 PM

Screen Shot 2021-09-16 at 10 53 23 PM

  4. exported the following env vars
export TF_VAR_jx_bot_token=<REDACTED>
export AWS_PROFILE=dev_root

my ~/.aws/credentials file looks like this:

[default]
aws_access_key_id = <REDACTED>
aws_secret_access_key = <REDACTED>

[dev_root]
role_arn = arn:aws:iam::<REDACTED>:role/OrganizationAccountAccessRole
source_profile = default
  5. ran terraform init and received this error

Screen Shot 2021-09-16 at 11 09 51 PM

  6. removed the ref to fix

Screen Shot 2021-09-16 at 11 11 20 PM

  7. re-ran terraform init and received this error

Screen Shot 2021-09-16 at 11 21 05 PM

  8. ran terraform init -upgrade to fix

  9. ran the code and it passed:

terraform init
terraform plan
terraform apply

Screen Shot 2021-09-16 at 11 49 15 PM

  10. I tailed the logs with jx admin log and saw this error at the end. Random thought: is there some additional step not outlined in the README that I need to perform to populate those secrets manually, or should they be populated automatically?

saved docs/releases.yaml
saved docs/README.md
jx gitops scheduler
jx gitops hash --pod-spec --kind Deployment -s config-root/namespaces/jx/lighthouse-config/config-cm.yaml -s config-root/namespaces/jx/lighthouse-config/plugins-cm.yaml -d config-root/namespaces/jx/lighthouse
jx gitops label --dir config-root/cluster                   gitops.jenkins-x.io/pipeline=cluster
jx gitops label --dir config-root/customresourcedefinitions gitops.jenkins-x.io/pipeline=customresourcedefinitions
jx gitops label --dir config-root/namespaces                gitops.jenkins-x.io/pipeline=namespaces
jx gitops annotate --dir config-root --selector app=pusher-wave kapp.k14s.io/change-group=apps.jenkins-x.io/pusher-wave
jx gitops annotate --dir config-root --selector app.kubernetes.io/name=ingress-nginx kapp.k14s.io/change-group=apps.jenkins-x.io/ingress-nginx
jx gitops label --dir config-root/cluster --kind=Namespace team=jx
jx gitops annotate --dir  config-root/namespaces --kind Deployment --selector app=pusher-wave --invert-selector wave.pusher.com/update-on-config-change=true
jx gitops git setup
found git user.name qube-bot from requirements
found git user.email  from requirements
setup git user  email jenkins-x@googlegroups.com
generated Git credentials file: /workspace/xdg_config/git/credentials with username: qube-bot email: 
git add --all
git commit -m "chore: regenerated" -m "/pipeline cancel"
[main dcd6737] chore: regenerated
 17 files changed, 18 insertions(+), 18 deletions(-)
make[1]: Leaving directory '/workspace/source'
make regen-phase-3
make[1]: Entering directory '/workspace/source'
Already up to date.
To https://github.com/mrmarcsmith/jenkins-x-dev-cluster.git
   ba3cbc2..dcd6737  main -> main
VAULT_ADDR=https://vault.jx-vault:8200 VAULT_NAMESPACE=jx-vault jx secret populate --secret-namespace jx-vault
waiting for vault pod vault-0 in namespace jx-vault to be ready...
pod vault-0 in namespace jx-vault is ready
verifying we have vault installed
about to run: /root/.jx/plugins/bin/vault-1.6.1 version
Vault v1.6.1 (6d2db3f033e02e70202bef9ec896360062b88b03)
verifying we can connect to vault...
about to run: /root/.jx/plugins/bin/vault-1.6.1 kv list secret
Keys
----
accounts/
dockerrepo
mysql
vault is setup correctly!

managed to verify we can connect to vault
VAULT_ADDR=https://vault.jx-vault:8200 jx secret wait -n jx
waiting for the mandatory Secrets to be populated from ExternalSecrets...
jenkins-x-chartmuseum: key secret/data/jx/adminUser missing properties: password, username
jx-basic-auth-user-password: key secret/data/jx/basic/auth/user missing properties: password, key secret/data/jx/basic/auth/user/password missing properties: username
lighthouse-hmac-token: key secret/data/lighthouse/hmac missing properties: token
lighthouse-oauth-token: key secret/data/lighthouse/oauth missing properties: token
nexus: key secret/data/nexus missing properties: password
tekton-container-registry-auth: key secret/data/tekton/container/registry/auth missing properties: .dockerconfigjson
tekton-git: key secret/data/jx/pipelineUser missing properties: token, username
  11. ran jx ui and gathered these relevant error messages

Screen Shot 2021-09-16 at 11 55 59 PM

Screen Shot 2021-09-16 at 11 57 08 PM

  12. ran kubectl get pods --all-namespaces and noticed these failed containers

Screen Shot 2021-09-16 at 11 58 42 PM

  13. ran kubectl logs jx-preview-gc-jobs-27197690-6kjcs, which output only this:
WARNING: could not default pipeline user/email from requirements as file does not exist: jx-requirements.yml
error: creating git credentials: failed to load the boot secret: failed to find boot secret: failed to find secret tekton-git in namespace jx or jx-git-operator: secrets "tekton-git" not found
  14. ran kubectl describe pod jenkins-x-chartmuseum-79c9b8dcd9-vv9sx -n jx
Events:
  Type     Reason                  Age                   From                                 Message
  ----     ------                  ----                  ----                                 -------
  Normal   Scheduled               19m                   default-scheduler                    Successfully assigned jx/jenkins-x-chartmuseum-79c9b8dcd9-vv9sx to ip-10-0-1-246.ec2.internal
  Normal   SuccessfulAttachVolume  19m                   attachdetach-controller              AttachVolume.Attach succeeded for volume "pvc-27fed11e-906d-4e9e-9948-d38a7ec760e5"
  Normal   Pulling                 19m                   kubelet, ip-10-0-1-246.ec2.internal  Pulling image "chartmuseum/chartmuseum:v0.12.0"
  Normal   Pulled                  19m                   kubelet, ip-10-0-1-246.ec2.internal  Successfully pulled image "chartmuseum/chartmuseum:v0.12.0" in 2.599439293s
  Warning  Failed                  17m (x12 over 19m)    kubelet, ip-10-0-1-246.ec2.internal  Error: secret "jenkins-x-chartmuseum" not found
  Normal   Pulled                  4m24s (x70 over 19m)  kubelet, ip-10-0-1-246.ec2.internal  Container image "chartmuseum/chartmuseum:v0.12.0" already present on machine
  15. ran kubectl describe pod lighthouse-foghorn-86b84cb46c-dkrzm -n jx
Events:
  Type     Reason     Age                 From                                 Message
  ----     ------     ----                ----                                 -------
  Normal   Scheduled  21m                 default-scheduler                    Successfully assigned jx/lighthouse-foghorn-86b84cb46c-dkrzm to ip-10-0-3-201.ec2.internal
  Normal   Pulling    21m                 kubelet, ip-10-0-3-201.ec2.internal  Pulling image "ghcr.io/jenkins-x/lighthouse-foghorn:1.1.51"
  Normal   Pulled     21m                 kubelet, ip-10-0-3-201.ec2.internal  Successfully pulled image "ghcr.io/jenkins-x/lighthouse-foghorn:1.1.51" in 3.272873027s
  Warning  Failed     19m (x12 over 21m)  kubelet, ip-10-0-3-201.ec2.internal  Error: secret "lighthouse-oauth-token" not found
  Normal   Pulled     84s (x94 over 21m)  kubelet, ip-10-0-3-201.ec2.internal  Container image "ghcr.io/jenkins-x/lighthouse-foghorn:1.1.51" already present on machine
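
Editor's sketch: the jx secret wait output in the post above packs several missing Vault properties into each line, which makes it hard to see at a glance what needs populating. The Python below is a minimal parsing sketch, not part of jx. It assumes the line shape seen in this log ("name: key path missing properties: a, b", possibly repeated on one line); the function name parse_missing and the SAMPLE constant are hypothetical.

```python
import re

# Sample lines copied from the `jx secret wait` output in this issue.
SAMPLE = """\
waiting for the mandatory Secrets to be populated from ExternalSecrets...
jenkins-x-chartmuseum: key secret/data/jx/adminUser missing properties: password, username
jx-basic-auth-user-password: key secret/data/jx/basic/auth/user missing properties: password, key secret/data/jx/basic/auth/user/password missing properties: username
lighthouse-hmac-token: key secret/data/lighthouse/hmac missing properties: token
tekton-container-registry-auth: key secret/data/tekton/container/registry/auth missing properties: .dockerconfigjson
"""

# One "key <path> missing properties: a, b" clause; a line may hold several,
# so the property list stops at the next ", key " or at the end of the line.
CLAUSE = re.compile(r"key (\S+) missing properties: (.*?)(?=, key |$)")


def parse_missing(text):
    """Map each Secret name to {vault key path: [missing property, ...]}."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(": ")
        # Skip progress lines that do not report missing properties.
        if not sep or "missing properties" not in rest:
            continue
        keys = {}
        for path, props in CLAUSE.findall(rest):
            keys[path] = [p.strip() for p in props.split(",") if p.strip()]
        result[name] = keys
    return result


if __name__ == "__main__":
    for secret, keys in parse_missing(SAMPLE).items():
        for path, props in keys.items():
            print(f"{secret}: {path} -> {', '.join(props)}")
```

Feeding it the full tail of the log gives a per-secret checklist that can be compared against what vault kv list actually shows.
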
@ankitm123
Collaborator

ankitm123 commented Sep 17, 2021

Can you paste the full output of jx admin log? Maybe create a gist. Things (bad ones) happened well before the errors that you posted.

should they be populated automatically?

Yes, once we have the full logs from the operator install (which runs as part of terraform apply), we should be able to see what exactly failed. There have been changes to helm charts and docker images recently, my guess is some chart was probably not updated, but that's just a guess.

WARNING: could not default pipeline user/email from requirements as file does not exist: jx-requirements.yml

This does not look right. Do you have a jx-requirements.yml file in the jx-eks-vault repo?

Basically, jx git operator failed ...

@mrmarcsmith
Contributor Author

Yes, I will paste the full jx admin log soon

@mrmarcsmith
Contributor Author

Here is the jx admin log output: jx_admin_log.txt

Yes, there is a jx-requirements.yml in my jx-eks-vault repo. I will paste the redacted version here.

Please let me know what else I can do to help.

@mrmarcsmith
Contributor Author

apiVersion: core.jenkins-x.io/v4beta1
kind: Requirements
spec:
  autoUpdate:
    enabled: false
    schedule: ""
  cluster:
    chartRepository: http://jenkins-x-chartmuseum.jx.svc.cluster.local:8080
    clusterName: tf-jx-sweeping-insect
    devEnvApprovers:
    - todo
    environmentGitOwner: todo
    gitKind: github
    gitName: github
    gitServer: https://github.com
    project: "7777777777"
    provider: eks
    region: us-east-1
    registry: 7777777777.dkr.ecr.us-east-1.amazonaws.com
  environments:
  - key: dev
    owner: mrmarcsmith
    repository: jenkins-x-dev-cluster
  - key: staging
  - key: production
  ingress:
    domain: 77.77.77.77.nip.io
    kind: ingress
    namespaceSubDomain: -jx.
    tls:
      email: ""
      enabled: false
      production: false
  pipelineUser:
    username: qube-bot
  repository: nexus
  secretStorage: vault
  storage:
  - name: logs
    url: s3://logs-tf-jx-sweeping-insect-20210917777777777777777
  - name: reports
    url: s3://reports-tf-jx-sweeping-insect-20210917777777777777777
  - name: repository
    url: s3://repository-tf-jx-sweeping-insect-202109177777777777777777
  terraform: true
  vault:
    aws:
      dynamoDBRegion: us-east-1
      dynamoDBTable: vault-unseal-tf-jx-sweeping-insect-777777
      kmsKeyId: 77777777-7777-7777-7777-777777777777
      kmsRegion: us-east-1
      s3Bucket: vault-unseal-tf-jx-sweeping-insect-202109177777777777777777
      s3Region: us-east-1
  webhook: lighthouse

I only changed numbers to 7s; everything else is unchanged.

@ankitm123
Collaborator

Looking at the logs, I don't think this is the first boot job. I see quite a few resources that are unchanged in that boot job:

clusterrole.rbac.authorization.k8s.io/jx-build-controller-jx unchanged

Do you happen to have the very first boot job? The first boot job would run when you do terraform apply.
Also, one thing you can add is an email for the pipelineUser in jx-requirements.yml:

pipelineUser:
    username: qube-bot
    email: add-email-here

Commit these changes in your cluster git repo, push them, and tail the logs. Let's see what happens ...

Having said that, the fastest way to debug this would be to get the first boot job log. If that is not available, recreating the cluster and getting fresh logs will help a lot (I feel it might save some time as well).
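
Editor's sketch: the reasoning in this comment is that a first boot job should mostly create resources, while a log full of "unchanged" lines comes from a later run. The Python below turns that into a rough check. It is a heuristic under stated assumptions, not a jx feature: it assumes the log contains kubectl-apply-style result lines ending in created, configured, or unchanged, and the function names are hypothetical. The "unchanged" line is taken from this thread; the "created" line is made up for illustration.

```python
from collections import Counter

# Trailing verbs printed per resource in kubectl-apply-style output.
VERBS = ("created", "configured", "unchanged")


def apply_summary(log_text):
    """Count result lines by their trailing verb."""
    counts = Counter()
    for line in log_text.splitlines():
        words = line.split()
        # A result line looks like "<kind>/<name> <verb>".
        if len(words) >= 2 and words[-1] in VERBS:
            counts[words[-1]] += 1
    return counts


def looks_like_first_boot(log_text):
    """Heuristic (assumption): a first boot creates resources and leaves none unchanged."""
    counts = apply_summary(log_text)
    return counts["created"] > 0 and counts["unchanged"] == 0


# Line quoted in this thread.
LATER_RUN = "clusterrole.rbac.authorization.k8s.io/jx-build-controller-jx unchanged\n"
# Hypothetical line, for illustration only.
FIRST_RUN = "clusterrole.rbac.authorization.k8s.io/jx-build-controller-jx created\n"

if __name__ == "__main__":
    print(looks_like_first_boot(LATER_RUN))
    print(looks_like_first_boot(FIRST_RUN))
```

Running it over a saved jx admin log file before attaching it is a quick way to tell whether the file is the one being asked for here.
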

@mrmarcsmith
Contributor Author

@ankitm123 My mistake, here is the new jx admin log:
new_jx_admin_log.txt

I pushed the email change, but nothing happened in the jx admin log, possibly because the last command in the logs is still waiting for the secrets to populate:
VAULT_ADDR=https://vault.jx-vault:8200 jx secret wait -n jx

@ankitm123
Collaborator

I will take a look tonight and get back to you.

@ankitm123
Collaborator

ankitm123 commented Oct 23, 2021

I'm starting to think this has to do with Kubernetes 1.21. I will check which option disables the iss check in the Helm charts.

I see that you have tried downgrading to 1.20, and it still did not work. Can you post the output of the external-secrets pod in the secret-infra namespace and the vault pods in the jx-vault namespace?

I don't see anything suspicious in the boot log 😕

@jenkinsvctrn

I got the same issue over here with v1.15.47.

@ankitm123
Collaborator

ankitm123 commented Dec 4, 2021

Use the latest version. There's a weirdness in the release pipeline, so it gets pegged to this version (1.15.47). Uninstall the cluster created using this version, and use the latest: https://github.com/jenkins-x/terraform-aws-eks-jx/releases/tag/v1.18.2
