ROX-21530: Cert Expiry Dashboard #283

Merged on Nov 18, 2024 (7 commits)

@aaa5kameric aaa5kameric commented Oct 28, 2024

Description

Two steps preceded this PR:

The first step was monitoring certificate expiration, that is, tracking the expiration dates of digital certificates. The Certificate Monitoring PR (ROX-21530-certificate-monitoring) extracts expiration timestamps from certificates and exposes them as metrics to Prometheus.

The second step was alerting. The Certificate Alerting PR (ROX-21530-certificate-alerting) has one external dependency: it relies on the monitoring PR (ROX-21530-certificate-monitoring) to extract and expose the timestamp metrics. In the alerting phase, we defined Prometheus rules and tests (RHACSFleetschardCertificateExpiring.yaml) for certificates expiring within:

WARNING: <= 7 days (RHACSFleetshardCertificateExpiringSoon)

CRITICAL: <= 1 day (RHACSFleetshardCertificateExpiringCritical)
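
For illustration only, the two thresholds can be sketched in Python. This is not the Prometheus rule from the alerting PR. It assumes the metric value is a Unix timestamp in seconds, and the function name `classify_expiry` and the severity strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Thresholds taken from the PR description.
WARNING_WINDOW = timedelta(days=7)
CRITICAL_WINDOW = timedelta(days=1)


def classify_expiry(expiration_ts: float, now: datetime) -> Optional[str]:
    """Classify a certificate by the time left until it expires.

    expiration_ts is assumed to be a Unix timestamp in seconds;
    now must be a timezone-aware datetime.
    """
    expires_at = datetime.fromtimestamp(expiration_ts, tz=timezone.utc)
    remaining = expires_at - now
    if remaining <= CRITICAL_WINDOW:
        return "critical"  # <= 1 day left (also covers already-expired certs)
    if remaining <= WARNING_WINDOW:
        return "warning"   # <= 7 days left
    return None            # no alert


if __name__ == "__main__":
    now = datetime(2024, 10, 28, tzinfo=timezone.utc)
    in_three_days = (now + timedelta(days=3)).timestamp()
    print(classify_expiry(in_three_days, now))
```

The critical check runs first, so a certificate inside the one-day window is reported once, at the higher severity.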

Lastly, this PR adds a Grafana table panel called Certificates Expiry, built from the Prometheus metric acs_fleetshard_certificate_expiration_timestamp. The table is located in the RHACS Dataplane - Cluster Metrics section.

Jira Ticket: https://issues.redhat.com/browse/ROX-21530

Dashboard Screenshots:

image

image

Link to draft dashboard: https://grafana-route-rhacs-observability.apps.acs-int-us-01.isbr.p1.openshiftapps.com/d/ae1jamsury800e/rhacs-dataplane-cluster-metrics-copy-amina-copy?orgId=1&from=1730045176177&to=1730131576177&viewPanel=148

@aaa5kameric aaa5kameric requested a review from a team as a code owner October 28, 2024 11:08
@ludydoo ludydoo left a comment

Minor change to integrate with the Central instance filter
Screenshot 2024-10-29 at 11 33 11 AM

resources/grafana/sources/rhacs-cluster-overview.json (outdated, resolved)
@stehessel stehessel left a comment

Could you please move the new widget to the bottom of the page? Right now it seems to be at the top of the dashboard. I would also make the table bigger, like the other tables. Also, this is how the table renders for me:

SCR-20241029-ktea

resources/grafana/sources/rhacs-cluster-overview.json (outdated, resolved)
stehessel commented Oct 31, 2024

Some more suggestions:

  • Sort the table such that certs that will expire soonest are at the top.
  • Add Organization column to table.
  • Make Expiration the second column after Namespace, then Organization, then the rest.
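
The requested layout can be sketched in Python to make the intended ordering concrete. In the PR itself this would be done in the Grafana dashboard JSON, not in code. The function name `arrange_table` and the `Secret` column are hypothetical, and `Expiration` is assumed to hold a sortable value such as a Unix timestamp.

```python
from typing import Any, Dict, List, Tuple

# Column order requested in the review: Namespace, Expiration, Organization, then the rest.
PREFERRED_COLUMNS = ["Namespace", "Expiration", "Organization"]

Row = Dict[str, Any]


def arrange_table(rows: List[Row]) -> Tuple[List[str], List[Row]]:
    """Return (column order, rows sorted so the soonest expiration comes first)."""
    if not rows:
        return [], []
    # Ascending sort on the expiration value puts soon-to-expire certs at the top.
    ordered_rows = sorted(rows, key=lambda row: row["Expiration"])
    leading = [col for col in PREFERRED_COLUMNS if col in rows[0]]
    rest = [col for col in rows[0] if col not in PREFERRED_COLUMNS]
    return leading + rest, ordered_rows


if __name__ == "__main__":
    sample = [
        {"Secret": "tls-a", "Namespace": "ns-1", "Organization": "org-x", "Expiration": 1731000000},
        {"Secret": "tls-b", "Namespace": "ns-2", "Organization": "org-y", "Expiration": 1730500000},
    ]
    columns, ordered = arrange_table(sample)
    print(columns)
    for row in ordered:
        print([row[col] for col in columns])
```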

ludydoo commented Nov 6, 2024

@aaa5kameric can you add new screenshots?

aaa5kameric commented

/retest

@aaa5kameric aaa5kameric reopened this Nov 14, 2024
@aaa5kameric aaa5kameric requested a review from ludydoo November 15, 2024 16:27
ludydoo commented Nov 18, 2024

@stehessel can you re-review 🙏

@aaa5kameric aaa5kameric merged commit 0bbe4f4 into master Nov 18, 2024
2 checks passed
@aaa5kameric aaa5kameric deleted the ROX-21530-Cert-Expiry-Dashboard branch November 18, 2024 14:05
3 participants