-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ring: add GetWithOptions method to adjust per call behavior #632
base: main
Are you sure you want to change the base?
Conversation
@@ -1332,36 +1420,3 @@ func (op Operation) ShouldExtendReplicaSetOnState(s InstanceState) bool { | |||
|
|||
// All states are healthy, no states extend replica set. | |||
var allStatesRingOperation = Operation(0x0000ffff) | |||
|
|||
// numberOfKeysOwnedByInstance returns how many of the supplied keys are owned by given instance. | |||
func (r *Ring) numberOfKeysOwnedByInstance(keys []uint32, op Operation, instanceID string, bufDescs []InstanceDesc, bufHosts []string, bufZones []string) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only called from tests. I've moved most of this logic to the test in question (token_range_test.go
).
35571f9
to
e17f2cc
Compare
@@ -627,27 +628,31 @@ func TestRing_Get_ZoneAwareness(t *testing.T) { | |||
"should succeed if there are enough instances per zone on RF = 3": { | |||
numInstances: 16, | |||
numZones: 3, | |||
totalZones: 3, | |||
replicationFactor: 3, | |||
zoneAwarenessEnabled: true, | |||
expectedInstances: 3, | |||
}, | |||
"should fail if there are instances in 1 zone only on RF = 3": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the change to add an instance in each zone below, this test would fail.
Previously, findInstancesForKey
ignored the actual number of zones, len(r.ringTokensByZone)
, and assumed that replication factor was equivalent. Now that we've changed things so that replication factor is not necessarily the same as the number of zones, we need to add at least one instance in each zone so that findInstancesForKey
is aware of them and only picks a single instance from each zone.
distinctHosts = bufHosts[:0] | ||
distinctZones = bufZones[:0] | ||
|
||
// TODO: Do we need to pass this in to avoid allocations? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to replace passing bufDescs, bufHosts, bufZones
around everywhere with a single struct Buffers
which could include this map but that was a pretty invasive and non-backwards compatible change (not that we have any guarantees in dskit).
6c1dec6
to
7b887a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only reviewed the prod changes so far
|
||
// SupportsExpandedReplication returns true for replication strategies that | ||
// support increasing the replication factor beyond a single instance per zone, | ||
// false otherwise. | ||
SupportsExpandedReplication() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know if this won't break clients with their own ReplicationStrategy
implementations? I'm wondering if we shouldn't add a new interface and check with type assertions in getReplicationSetForKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will break existing ReplicationStrategy
implementations but it's an obvious and simple fix. We could add a new type but I'm not sure what that gets us besides not requiring changes from people that have their own ReplicationStrategy
implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's the only thing I can think of. A separate method in the existing interface is also fine as long as the list of optional features doesn't keep growing. (feel free to go either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. I'll leave it as-is for now and we can revisit if we end up having more than just this feature we need to test against.
distinctZones = bufZones[:0] | ||
|
||
// TODO: Do we need to pass this in to avoid allocations? | ||
hostsPerZone = make(map[string]int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Can't we still use a slice with a struct? Have you ran BenchmarkRing_Get
(maybe also adding a test case there with a different RF)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we don't need to pass this in to avoid an allocation 🤷 . In the new version, Get
ends up being ~7% slower than the old version
$ benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: github.com/grafana/dskit/ring
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Ring_Get/with_zone_awareness-16 1.145µ ± ∞ ¹ 1.233µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 835.5n ± ∞ ¹ 943.5n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 881.8n ± ∞ ¹ 934.8n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 647.6n ± ∞ ¹ 676.1n ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 859.7n 926.0n +7.71%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was more concerned about the added CPU time. 7% isn't a huge deal. In a random cluster at GL, ring.Get
is only consuming 1.5% of distributor CPU. So this increase won't be material. Is doing a slice with a struct{ ... }
complicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is doing a slice with a
struct{ ... }
complicated?
I don't understand what you mean by this. Are you suggesting something to replace the map[string]int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. So previously we had a buffer with zones []string
. That was in place of a map because iterating it is faster than accessing a map.
But now we also care about how many instances per zone we have, so you went with a map. To keep the lookup speed of a slice we can have something like this
type zoneInstanceCount struct {
zone string
numInstances int
}
hostsPerZone := make([]zoneInstanceCount, 0, 0)
findHostsPerZone := func(zone string) int {
for _, c := range hostsPerZone {
if c.zone == zone {
return c.numInstances
}
}
return 0
}
addInstanceToZone := func(zone string) {
for i, c := range hostsPerZone {
if c.zone == zone {
hostsPerZone[i].numInstances++
}
}
}
the zoneInstanceCount can be a package type and the funcs can also be proper methods.
hostsPerZone
can also be taken from a buffer after you make the change you mentioned with a Buffers
type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll benchmark a change like this. I think I'd rather do the Buffers
type change in a separate PR and get that merged before this one (it's low risk, just annoying for consumers of dskit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a hacky version of this here. It doesn't seem like using slices is any faster so I'm going to leave this as-is.
$ benchstat map.txt slice.txt
goos: linux
goarch: amd64
pkg: github.com/grafana/dskit/ring
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
│ map.txt │ slice.txt │
│ sec/op │ sec/op vs base │
Ring_Get/without_zone_awareness-16 893.2n ± ∞ ¹ 932.0n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 670.4n ± ∞ ¹ 735.1n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/with_zone_awareness-16 1.238µ ± ∞ ¹ 1.401µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 943.8n ± ∞ ¹ 1023.0n ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 914.6n 995.4n +8.84%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ map.txt │ slice.txt │
│ B/op │ B/op vs base │
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
│ map.txt │ slice.txt │
│ allocs/op │ allocs/op vs base │
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
This change adds a new method that accepts 0 or more `Option` instances that modify the behavior of the call. These options can (currently) be used to adjust the replication factor for a particular key or use buffers to avoid excessive allocation. The most notable changes are in the `Ring.findInstancesForKey` method which is the core of the `Ring.Get` method. Instead of keeping track of distinct zones and assuming that only a single instance per zone would ever be returned, we keep a map of the number of instances found in each zone. Part of grafana/mimir#9944
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
7b887a7
to
bf94082
Compare
What this PR does:
This change adds a new method that accepts 0 or more
Option
instances that modify the behavior of the call. These options can (currently) be used to adjust the replication factor for a particular key or use buffers to avoid excessive allocation.The most notable changes are in the
Ring.findInstancesForKey
method which is the core of theRing.Get
method. Instead of keeping track of distinct zones and assuming that only a single instance per zone would ever be returned, we keep a map of the number of instances found in each zone.Which issue(s) this PR fixes:
Part of grafana/mimir#9944
Previous attempt #620
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]