Hi there! I created extra rules that give me auto-resolve capability. It actually works pretty well, but sometimes the auto-resolve (for example, checking whether an HTTP endpoint is available again) does not work. The alert then remains unresolved in PagerDuty until I resolve it manually. Any ideas on how to debug this so I can fix it? Is there another way of implementing auto-resolve with ElastAlert2? Here is my auto-resolve rule for HTTP endpoints (this is a template for Terraform):

Thanks in advance!
Replies: 1 comment
This documentation describes a way to receive resolve alerts: https://elastalert2.readthedocs.io/en/latest/recipes/faq.html#how-can-i-get-a-resolve-event

However, in your situation, since your method does work sometimes, I suggest continuing down the debug path, because changing to the documented method could result in the same behavior: works sometimes, but not always. What you might find is that ElastAlert2 is already detecting the resolve event correctly, and something else, such as a network glitch or a PagerDuty glitch, is preventing the alert from getting to PagerDuty.

Also, if you're missing an occasional resolve trigger, consider that you might be missing outage triggers too, due to the same underlying problem.

Next steps would be to enable verbose logging for ElastAlert2 and, when the problem occurs, inspect the logs to track down whether ElastAlert2 detected the resolve event or not. The verbose logs can be overwhelming, so be prepared to sit down and sort through them for a while. To enable verbose logging: https://elastalert2.readthedocs.io/en/latest/elastalert.html?highlight=verbose#logging

If you need even more logging, you can adjust the log levels in the config.yaml. See the example config (near the bottom).
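To make the log sorting a bit less painful, here is a minimal Python sketch that pulls out only the lines mentioning a given rule, plus a little surrounding context. It assumes nothing about the ElastAlert2 log format beyond the rule name appearing as plain text in the relevant lines. The file name `elastalert.log` and the rule name `http-endpoint-resolve` are hypothetical placeholders, so substitute your own.

```python
from collections import deque
from typing import Iterable, Iterator


def lines_mentioning(lines: Iterable[str], needle: str, context: int = 2) -> Iterator[str]:
    """Yield each line containing `needle`, plus up to `context` lines before and after it."""
    before = deque(maxlen=context)  # rolling buffer of recent non-matching lines
    remaining_after = 0             # trailing context lines still to emit after a hit
    for raw in lines:
        line = raw.rstrip("\n")
        if needle in line:
            yield from before       # leading context not yet emitted
            before.clear()
            yield line
            remaining_after = context
        elif remaining_after > 0:
            yield line
            remaining_after -= 1
        else:
            before.append(line)


# Usage sketch (hypothetical file and rule names):
# with open("elastalert.log", encoding="utf-8") as fh:
#     for line in lines_mentioning(fh, "http-endpoint-resolve", context=3):
#         print(line)
```

If the filtered output shows the resolve rule matching around the time the endpoint recovered, the problem is more likely on the delivery side (network or PagerDuty). If the rule never shows up as matching, the query or timing of the rule itself is the place to look.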