Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimir.rules.kubernetes: Add support for clustered mode #669

Merged
merged 6 commits into from
May 2, 2024

Conversation

56quarters
Copy link
Contributor

@56quarters 56quarters commented Apr 24, 2024

PR Description

This change adds support for updating Mimir rules based on PrometheusRule objects when running in clustered mode. It does this by picking a single instance to be the leader responsible for making changes using the Mimir API. Other non-leader instances will not process events or make calls to the Mimir API.

Which issue(s) this PR fixes

Related: #158
Related: #616
Related: #648

Notes to the Reviewer

It might be helpful to review this by commit:

  • a05ac6b Introduces leader election such that only a single Alloy instance will receive Kubernetes events and make requests to the Mimir API based on them. This was a relatively small change but lacked any unit tests.
  • 2d46f69 Was a larger refactor that kept the same idea as the previous commit but moved things around and introduced a few internal interfaces so that the interesting logic in Component::Run() could be unit tested.
  • 0023f5f Removes the warning in the docs introduced by mimir.rules.kubernetes: Add warning about clustering to docs #648

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

This change adds support for updating Mimir rules based on PrometheusRule
objects when running in clustered mode. It does this by picking a single
instance to be the leader responsible for making changes using the Mimir
API. Other non-leader instances will not process events or make calls to
the Mimir API.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters
Copy link
Contributor Author

Question for the maintainers:

I find myself needing to do a fair amount of refactoring of this component in order to test the interesting bits of the clustering logic added to Component::Run(). Before I get too far along, what sort of testing would you like to see for this change? I think the component will be in a better place after this refactoring but it's also a bigger change to review.

@56quarters
Copy link
Contributor Author

Question for the maintainers:

I find myself needing to do a fair amount of refactoring of this component in order to test the interesting bits of the clustering logic added to Component::Run(). Before I get too far along, what sort of testing would you like to see for this change? I think the component will be in a better place after this refactoring but it's also a bigger change to review.

Nevermind, I've managed to make the logic from Component::Run() testable without too many changes.

@56quarters 56quarters force-pushed the 56quarters/cluster-rules branch from 38cf1ca to 06cf6ba Compare April 30, 2024 19:21
This commit introduces a few interfaces that cover some internal methods
of the `Component`. This allows the interesting logic from `Component::Run()`
to be extracted to its own method and tested with mocks of some portions
of the rest of the `Component`.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters force-pushed the 56quarters/cluster-rules branch from 06cf6ba to 2d46f69 Compare April 30, 2024 19:46
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters marked this pull request as ready for review April 30, 2024 20:12
@56quarters 56quarters requested a review from rfratto April 30, 2024 20:12
Copy link
Contributor

@clayton-cornell clayton-cornell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc changes look OK

Comment on lines 27 to 34
{{< admonition type="warning" >}}
This component does not support [clustered mode][]. Using this component as part of a cluster of
{{< param "PRODUCT_NAME" >}} instances will cause them to all attempt to update rules using the
Mimir API, conflicting with each other. When using this component, it must be run in a separate
single-instance deployment of {{< param "PRODUCT_NAME" >}}.

[clustered mode]: ../../../concepts/clustering/
{{< /admonition >}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO instead of removing this outright I think we should add a description for how the component does behave in clustered mode now. (Also noting that this behavior is only present starting with 1.1).

internal/component/mimir/rules/kubernetes/rules.go Outdated Show resolved Hide resolved
internal/component/mimir/rules/kubernetes/rules.go Outdated Show resolved Hide resolved
internal/component/mimir/rules/kubernetes/rules.go Outdated Show resolved Hide resolved
internal/component/mimir/rules/kubernetes/rules.go Outdated Show resolved Hide resolved
56quarters added 2 commits May 1, 2024 14:58
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@56quarters 56quarters requested a review from rfratto May 1, 2024 19:25
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for all the work you've put into this!

@clayton-cornell can you check the new docs changes before we merge?

@56quarters 56quarters requested a review from clayton-cornell May 2, 2024 15:43
Copy link
Contributor

@clayton-cornell clayton-cornell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor rephrase and good to go

Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
@rfratto rfratto merged commit 08f3c59 into main May 2, 2024
13 checks passed
@rfratto rfratto deleted the 56quarters/cluster-rules branch May 2, 2024 18:29
polyrain pushed a commit to polyrain/alloy that referenced this pull request May 2, 2024
This change adds support for updating Mimir rules based on PrometheusRule
objects when running in clustered mode. It does this by picking a single
instance to be the leader responsible for making changes using the Mimir
API. Other non-leader instances will not process events or make calls to
the Mimir API.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
@rfratto rfratto added the backport-to-agent PR should be backported to the agent repo. label May 3, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport-to-agent PR should be backported to the agent repo. frozen-due-to-age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants