Signalilo Heartbeat Implementation Question #123
Comments
That seems to be a mismatch between the example heartbeat service in the README.md and the description. We should update either the heartbeat service example to have
Or update the
The internal heartbeat used to work like this in the beginning, but it looks like this got accidentally changed in 3a863dd. It should be fairly straightforward to change Lines 102 to 104 in d27c2c8
startHeartbeat() .
Having such a facility would be nice in general, but at this time we're only maintaining Signalilo and are reviewing external contributions on a best-effort basis, since we're moving away from Icinga2 internally and the need for Signalilo is no longer present for us.
If we want to make a major change to how we handle Signalilo/Icinga liveness, I'd probably go for the second approach you outlined, since both the first and last options suffer from the same issue as the current heartbeat alert: they don't work as you'd want them to if the whole service host or the API user is dropped. Also, the current heartbeat service should already inform you about most Icinga-side misconfigurations, since the heartbeat should go critical if the Signalilo API user is missing and Signalilo can't update the heartbeat. The one case it won't catch is the dropped Service Host, but that one is hard to catch with anything that's sent to Icinga.
Additionally, we could extend the current heartbeat implementation to ensure that we reply with HTTP 500 to Alertmanager if we couldn't update the heartbeat service. This would then generate an AlertmanagerFailedToSendAlerts alert, which may or may not arrive in Icinga, but you should notice that something is off since the corresponding heartbeat will be CRITICAL or UNKNOWN depending on your configuration.
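For illustration, the "reply with HTTP 500 if the heartbeat update failed" idea could be sketched roughly as below. This is not Signalilo's code (Signalilo is written in Go); it is a minimal Python sketch of the logic only, and the names `HeartbeatState`, `record`, and `webhook_status` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatState:
    """Remembers whether the most recent heartbeat update reached Icinga."""

    last_update_ok: bool = True

    def record(self, update: Callable[[], None]) -> None:
        # Run one heartbeat update (hypothetical callable) and remember
        # whether it raised an error.
        try:
            update()
            self.last_update_ok = True
        except Exception:
            self.last_update_ok = False


def webhook_status(state: HeartbeatState) -> int:
    """HTTP status to return to Alertmanager for an incoming webhook call."""
    # Assumption: a 500 signals to Alertmanager that delivery failed,
    # which is what the comment above relies on.
    return 200 if state.last_update_ok else 500
```

The webhook handler would consult `webhook_status` before processing alerts, so a failed heartbeat update surfaces on the Alertmanager side as well as in Icinga.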
Hi @simu, thank you for your reply. Regarding:
Yeah, I think that's true if 3a863dd is reverted. And I'm up for:
Regarding
Also regarding:
The third option has two sides: it copes with the Icinga service being fully down, and it will not create delayed requests between Alertmanager, Signalilo, and Icinga. The point is that it should use the same
P.S. Sad to hear that you're moving away from Icinga; this middleware is nice 👍.
This is more of a question to discuss, but from what I see:
README.md states that the alert will be in the UNKNOWN state if the heartbeat is triggered, but it will actually be in the CRITICAL state. I think the heartbeat service example was changed and someone forgot to update the description.
README.md states: "On startup, Signalilo checks if the matching heartbeat service is available in Icinga, otherwise it exits with a fatal error". I understand this to mean that if the heartbeat service doesn't exist (404), or if there are any other failures such as 4xx/5xx, Signalilo will die, but I don't see this behavior at the moment. Maybe it broke at some point?
AlertmanagerFailedToSendAlerts|AlertmanagerClusterFailedToSendAlerts. The problem is that it will create delays.
IcingaApiErorrs for such error handling, which must be created in the same way as Heartbeat and will show whether there were any errors in the last minute. A small minus: with multiple Signalilo replicas it could start flapping.
In case even updates of the IcingaApiErorrs service fail, Signalilo can instantly reply to Alertmanager about failures. After 1m there will be no errors in the Icinga API, as no requests were made, and we will try to update IcingaApiErorrs: if it fails, wait 1 minute again and reply to Alertmanager with 500; if it passes, start accepting alerts from Alertmanager.
What do you think?
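To make the last proposal more concrete, here is a rough sketch of the fail-fast behaviour: reply 500 immediately while the status service cannot be updated, and retry the update only after one minute. It is an illustration in Python rather than Signalilo's Go code. `IcingaGate`, `update_status`, and `RETRY_INTERVAL` are hypothetical names, and as a simplification the retry is triggered by the next incoming alert rather than by a timer.

```python
import time
from typing import Callable, Optional

RETRY_INTERVAL = 60.0  # seconds; the "1 minute" from the proposal


class IcingaGate:
    """Rejects alerts while the error-status service cannot be updated."""

    def __init__(
        self,
        update_status: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # update_status is a hypothetical callable that updates the
        # IcingaApiErorrs-style service and raises on any API failure.
        self._update_status = update_status
        self._clock = clock
        self._blocked_until: Optional[float] = None

    def handle_alert(self) -> int:
        """Return the HTTP status to send back to Alertmanager."""
        now = self._clock()
        if self._blocked_until is not None and now < self._blocked_until:
            # Still inside the wait window: fail fast without calling Icinga.
            return 500
        try:
            self._update_status()
        except Exception:
            # Update failed: wait another interval before trying again.
            self._blocked_until = now + RETRY_INTERVAL
            return 500
        # Update passed: accept alerts again.
        self._blocked_until = None
        return 200
```

The injectable `clock` is only there so the waiting logic can be exercised without sleeping. With multiple replicas, each instance would keep its own gate, so the flapping concern mentioned above still applies to the shared Icinga service.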