added prometheus rules to the helm chart #99

Merged
merged 5 commits into from
Jul 8, 2024

Conversation

miminar
Contributor

@miminar miminar commented Jul 3, 2024

  • ContainerEphemeralStorageUsageAtLimit
  • ContainerEphemeralStorageUsageReachingLimit
  • EphemeralStorageVolumeFilledUp
  • EphemeralStorageVolumeFillingUp

These are the alerts we currently use. If you wish, I could make the individual alerts toggleable.

Additional alerts such as NodeOutOfEphemeralStorage and NodeRunningOutOfEphemeralStorage could be added, but since we don't have them yet, I'd leave them for a possible follow-up.

Please take a look.

miminar added 3 commits July 3, 2024 10:21
- ContainerEphemeralStorageUsageAtLimit
- ContainerEphemeralStorageUsageReachingLimit
- EphemeralStorageVolumeFilledUp
- EphemeralStorageVolumeFillingUp

Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
@@ -102,6 +102,7 @@ function main() {
"dev.grow.image=${internal_registry}/${grow_repo_image}"
"metrics.adjusted_polling_rate=true"
"pprof=true"
"prometheus.rules.enable=false"
Contributor Author

To validate the rules, would it actually be better to deploy prometheus-operator-crds in the test environment and enable this flag, even though the rules won't have any effect there?
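
For reference, a minimal sketch of what enabling the flag would look like in a values file rather than through `--set`. The key path is taken from the `prometheus.rules.enable` flag in the diff above; that the rendered resource is a PrometheusRule, and therefore needs the prometheus-operator CRDs to be installed first, is an assumption based on the discussion here.

```yaml
# values.yaml fragment (sketch): same setting as --set prometheus.rules.enable=true
# Assumption: the cluster already has the PrometheusRule CRD installed,
# for example via the prometheus-operator-crds chart mentioned above.
prometheus:
  rules:
    enable: true
```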

Owner

@jmcgrath207 jmcgrath207 Jul 3, 2024

Interesting, so I guess I would have to match the Application version to the CRD version mentioned here.
https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack/56.13.1#from-55-x-to-56-x

If so, I am open to that change.

FWIW, here is where we deploy the servicemonitor for our e2e test currently.
https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/blob/master/scripts/create_kind.sh#L25

Contributor Author

Alright, I will take a look and update this tomorrow.

Contributor Author

Should be resolved. Please double-check.

Contributor Author

miminar commented Jul 3, 2024

Also, to further avoid spam, the following inhibit rules could be recommended (see kubectl explain alertmanagerconfigs.monitoring.coreos.com.spec.inhibitRules):

        - source_matchers:
            - alertname="EphemeralStorageVolumeFilledUp"
          target_matchers:
            - severity="warning"
            - alertname="EphemeralStorageVolumeFillingUp"
          equal:
            - pod_namespace
            - pod_name
            - volume_name
        - source_matchers:
            - alertname="ContainerEphemeralStorageUsageAtLimit"
          target_matchers:
            - severity="warning"
            - alertname="ContainerEphemeralStorageUsageReachingLimit"
          equal:
            - pod_namespace
            - pod_name
            - exported_container
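
The snippet above uses the `source_matchers` / `target_matchers` syntax of a plain Alertmanager configuration file. If the rules are instead placed in an AlertmanagerConfig resource (the one the `kubectl explain` command refers to), the same intent would probably be written with `sourceMatch` / `targetMatch` entries, roughly as sketched below. The resource name and namespace are hypothetical, and the field names should be checked against `kubectl explain` for the installed CRD version.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: ephemeral-storage-inhibit   # hypothetical name
  namespace: monitoring             # hypothetical namespace
spec:
  inhibitRules:
    # A "filled up" volume alert silences the matching "filling up" warning.
    - sourceMatch:
        - name: alertname
          matchType: "="
          value: EphemeralStorageVolumeFilledUp
      targetMatch:
        - name: severity
          matchType: "="
          value: warning
        - name: alertname
          matchType: "="
          value: EphemeralStorageVolumeFillingUp
      equal:
        - pod_namespace
        - pod_name
        - volume_name
    # An "at limit" container alert silences the matching "reaching limit" warning.
    - sourceMatch:
        - name: alertname
          matchType: "="
          value: ContainerEphemeralStorageUsageAtLimit
      targetMatch:
        - name: severity
          matchType: "="
          value: warning
        - name: alertname
          matchType: "="
          value: ContainerEphemeralStorageUsageReachingLimit
      equal:
        - pod_namespace
        - pod_name
        - exported_container
```

In both forms, the `equal` list restricts inhibition to alerts that share the same pod (and volume or container), so a critical alert on one pod does not hide warnings on another.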

I'll try to squeeze it into the README.

@miminar miminar marked this pull request as draft July 3, 2024 14:27
@miminar miminar force-pushed the container-prometheus-rules branch 2 times, most recently from c433b3c to 20b7330 Compare July 4, 2024 05:00
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
@miminar miminar force-pushed the container-prometheus-rules branch from 20b7330 to 742dede Compare July 4, 2024 05:14
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
@miminar miminar force-pushed the container-prometheus-rules branch from 742dede to 5c99b91 Compare July 4, 2024 05:15
@miminar miminar marked this pull request as ready for review July 4, 2024 05:18
Owner

@jmcgrath207 jmcgrath207 left a comment

Awesome! Great job @miminar . Releasing this shortly.

@jmcgrath207 jmcgrath207 merged commit 82d1431 into jmcgrath207:master Jul 8, 2024
2 checks passed
Owner

jmcgrath207 commented Jul 8, 2024

Just released 1.11.1 with your change.

Contributor Author

miminar commented Jul 8, 2024

Awesome, thank you for the swift review and the new release!
