-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added prometheus rules to the helm chart #99
added prometheus rules to the helm chart #99
Conversation
- ContainerEphemeralStorageUsageAtLimit - ContainerEphemeralStorageUsageReachingLimit - EphemeralStorageVolumeFilledUp - EphemeralStorageVolumeFillingUp Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
scripts/deploy.sh
Outdated
@@ -102,6 +102,7 @@ function main() { | |||
"dev.grow.image=${internal_registry}/${grow_repo_image}" | |||
"metrics.adjusted_polling_rate=true" | |||
"pprof=true" | |||
"prometheus.rules.enable=false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure validity of the rules, it may actually be better to deploy prometheus-operator-crds in the test env and enable this flag, even though the rules won't have any effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so I guess I would have to match the Application version
to the CRD version mentioned here.
https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack/56.13.1#from-55-x-to-56-x
If so, I am open to that change.
FWIW, here is where we deploy the servicemonitor for our e2e test currently.
https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics/blob/master/scripts/create_kind.sh#L25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I will take a look and update this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be resolved. Please double-check.
Also, to further avoid spam, the following inhibit rules could be recommended (viz - source_matchers:
- alertname="EphemeralStorageVolumeFilledUp"
target_matchers:
- severity="warning"
- alertname="EphemeralStorageVolumeFillingUp"
equal:
- pod_namespace
- pod_name
- volume_name
- source_matchers:
- alertname="ContainerEphemeralStorageUsageAtLimit"
target_matchers:
- severity="warning"
- alertname="ContainerEphemeralStorageUsageReachingLimit"
equal:
- pod_namespace
- pod_name
- exported_container I'll try to squeeze it into the README |
c433b3c
to
20b7330
Compare
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
20b7330
to
742dede
Compare
Signed-off-by: Michal Minář <michal.minar@id.ethz.ch>
742dede
to
5c99b91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Great job @miminar . Releasing this shortly.
Just released |
Awesome, thank you for swift review and new release! |
That's what we have/use at the moment. If you wish, I could make the individual alerts toggle-able.
There is a potential for additional
NodeOutOfEphemeralStorage
andNodeRunningOutOfEphemeralStorage
or similar, but since we don't have them yet, I'd leave them for a possible follow-up.Please take a look.