Add probes to helm chart - otherwise when error occurs k8s will not restart it #49

fzyzcjy · 2021-02-06T06:39:19Z

fzyzcjy
Feb 6, 2021

Hi thanks for this lib! We should add probes to helm chart - otherwise when error occurs k8s will not restart it. That will be very bad, as then we will lose our alerting system :/

I am willing to help. Now the problem is, how can we detect the health?

jertel · 2021-02-06T23:50:49Z

jertel
Feb 6, 2021
Maintainer

Interesting idea. I'm not sure off the top of my head the best approach to confirm that Elastalert is successfully connected to ES, and has correctly loaded the desired rules.

What I do, instead of relying on the cluster health mechanism, is to add a Deadman's Switch alert to my ruleset. That alert fires every 3 minutes because it's scanning for a consistently logged heartbeat event in the ES index. Therefore, every time it fires the rule it will trigger the alert out to a third-party deadman's switch service, typically a simple URL GET/POST (these services are very affordable and worth it). The deadman's switch service then will notify a slack webhook, or send a text to the devops team, whenever the alert fails to arrive within a set threshold of time, such as 6 minutes (instead of 3, so that we avoid false positives due to a network hiccup). The advantage to this design is that you will get notified in the following circumstances:

Elastalert has an internal fault or a connection issue with Elasticsearch.
The network connectivity outbound from the Kubernetes cluster, or the Elastalert pod gets disconnected. Kube-dns/core-dns service has/had known issues resulting in this scenario.
The entire Kubernetes cluster has failed. This has happened to me with Azure's AKS multiple times.

The health probe you are proposing will detect scenario 1 but will fail to detect scenarios 2 and 3. However, the deadman's switch solution covers all three.

5 replies

fzyzcjy Feb 7, 2021
Author

Sounds interesting! Do you mean this one or some other service?

jertel Feb 7, 2021
Maintainer

It's been a couple years since I looked into them, so I can't recall a specific one. I ended up starting my own cloud-based Deadman's Switch service, because I had specific requirements. I have been using it for the past couple of years now.

fzyzcjy Feb 7, 2021
Author

Cloud you please look a bit at your deployment to see which service? Thanks!

jertel Feb 8, 2021
Maintainer

It's not an advertised service. It currently supports POSTing to a webhook URL, such as Slack. If you're interested in using it you can reach me at the email address listed in the helm chart.yaml.

fzyzcjy Feb 8, 2021
Author

@jertel Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add probes to helm chart - otherwise when error occurs k8s will not restart it #49

{{title}}

Replies: 1 comment 5 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Add probes to helm chart - otherwise when error occurs k8s will not restart it #49

fzyzcjy Feb 6, 2021

Replies: 1 comment · 5 replies

jertel Feb 6, 2021 Maintainer

fzyzcjy Feb 7, 2021 Author

jertel Feb 7, 2021 Maintainer

fzyzcjy Feb 7, 2021 Author

jertel Feb 8, 2021 Maintainer

fzyzcjy Feb 8, 2021 Author

fzyzcjy
Feb 6, 2021

Replies: 1 comment 5 replies

jertel
Feb 6, 2021
Maintainer

fzyzcjy Feb 7, 2021
Author

jertel Feb 7, 2021
Maintainer

fzyzcjy Feb 7, 2021
Author

jertel Feb 8, 2021
Maintainer

fzyzcjy Feb 8, 2021
Author