-
Interesting idea. I'm not sure off the top of my head what the best approach is to confirm that ElastAlert is successfully connected to ES and has correctly loaded the desired rules. What I do, instead of relying on the cluster health mechanism, is add a deadman's switch alert to my ruleset. That alert fires every 3 minutes because it's scanning for a consistently logged heartbeat event in the ES index. Every time the rule fires, it triggers an alert out to a third-party deadman's switch service, typically via a simple URL GET/POST (these services are very affordable and worth it). The deadman's switch service then notifies a Slack webhook, or sends a text to the devops team, whenever the ping fails to arrive within a set threshold of time, such as 6 minutes (instead of 3, so that we avoid false positives due to a network hiccup). The advantage of this design is that you get notified in all of the following circumstances:
1. The ElastAlert process has crashed or stopped running.
2. ElastAlert is running but has failed to connect to ES, or has not correctly loaded and evaluated its rules.
3. The heartbeat events have stopped arriving in the ES index, meaning the ingestion pipeline itself is broken.
The health probe you are proposing will detect scenario 1 but will fail to detect scenarios 2 and 3. The deadman's switch solution, however, covers all three.
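For reference, here's a minimal sketch of what such a deadman's-switch rule could look like, using ElastAlert's `frequency` rule type and its HTTP POST alerter. The index pattern, the `event.type: heartbeat` field, and the ping URL are all assumptions; adjust them to whatever your heartbeat pipeline actually writes.

```yaml
# deadman.yaml -- sketch of a deadman's-switch ElastAlert rule.
# Field names, index pattern, and ping URL are assumptions, not a known config.
name: deadman-heartbeat
type: frequency            # fires whenever at least one heartbeat is seen
index: heartbeat-*
num_events: 1
timeframe:
  minutes: 3
realert:
  minutes: 3               # re-fire (i.e. ping the service) roughly every 3 minutes
filter:
- term:
    event.type: heartbeat  # hypothetical heartbeat marker field
alert:
- post                     # ElastAlert's HTTP POST alerter
http_post_url: "https://deadman.example.com/ping/YOUR-TOKEN"
```

The deadman's switch service is then configured with a threshold longer than the rule's period (e.g. 6 minutes), so that a single delayed ping doesn't page anyone.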
-
Hi, thanks for this lib! We should add probes to the Helm chart; otherwise, when an error occurs, Kubernetes will not restart the pod. That would be very bad, as we would lose our alerting system :/
I am willing to help. The problem now is: how can we detect the health?
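As a starting point, here's a sketch of what a liveness probe in the chart's pod spec might look like. The `pgrep` check and the timing values are assumptions, and note its limitation: it only proves the process is alive, not that ElastAlert is connected to ES or that rules are actually firing, which is the gap the deadman's-switch approach addresses.

```yaml
# Sketch of a liveness probe for the Helm chart's pod template.
# The command and timings are assumptions, not the chart's actual values.
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - pgrep -f elastalert    # hypothetical check: the ElastAlert process exists
  initialDelaySeconds: 60    # give ElastAlert time to load rules on startup
  periodSeconds: 30
  failureThreshold: 3        # restart the pod after ~90s of failures
```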