[ops dashboard] Partition Balance needs a rename #42

hcoyote · 2024-10-23T18:33:52Z

          "expr": "100 * abs(1-(sum(stddev by (redpanda_topic) (sum(redpanda_kafka_max_offset{redpanda_namespace=\"kafka\",redpanda_cloud_data_cluster_name=~\"\"}) by (redpanda_topic,redpanda_partition))) /  sum(avg by (redpanda_topic) ((sum(redpanda_kafka_max_offset{redpanda_namespace=\"kafka\",redpanda_cloud_data_cluster_name=~\"\"}) by (redpanda_topic,redpanda_partition))))))",

This panel currently shows how evenly writes are spread across partitions in the cluster. The name is confusing because "partition balance" can also mean the balance of partition replicas (data placement) or of partition leadership (which brokers own updates to a primary partition and the replication to its followers).

I get that the intent is to show how well data is distributed on writes, but I don't think the panel makes that clear. We should probably rename it to something like "Partition Write Distribution" and give it an info block so people understand how to interpret it.

On a cluster with only a handful of topics, it's probably OK to read this as topic write evenness. On complex clusters with hundreds or thousands of topics/partitions, I'm not sure how it should come across: the write load could be heavily skewed toward a subset of topics while the panel still reports a relatively even distribution.
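
To make that last concern concrete, here is a rough Python sketch of what I believe the expression computes: the sum of per-topic standard deviations divided by the sum of per-topic averages, turned into a percentage. The function name, topic names, and offset values are all made up for illustration, and it assumes one `redpanda_kafka_max_offset` value per partition. PromQL's `stddev` is the population standard deviation, so the sketch uses `pstdev`.

```python
from statistics import mean, pstdev


def partition_balance(max_offsets):
    """Approximate the panel expression.

    max_offsets maps a topic name to a list of per-partition max offsets
    (hypothetical input shape, one value per partition).
    """
    # sum(stddev by (redpanda_topic) (...)): population stddev per topic, summed
    total_stddev = sum(pstdev(parts) for parts in max_offsets.values())
    # sum(avg by (redpanda_topic) (...)): mean per topic, summed
    total_avg = sum(mean(parts) for parts in max_offsets.values())
    return 100 * abs(1 - total_stddev / total_avg)


# Hypothetical cluster: one hot topic and 99 nearly idle ones,
# each perfectly even across its own partitions.
skewed_across_topics = {"hot": [1_000_000] * 6}
skewed_across_topics.update({f"cold-{i}": [10] * 6 for i in range(99)})

print(partition_balance(skewed_across_topics))  # 100.0, despite the topic skew
print(partition_balance({"t": [150, 50]}))      # 50.0, uneven within one topic
print(partition_balance({"t": [200, 0]}))       # 0.0, all writes on one partition
```

If this reading is right, the panel only reflects evenness within each topic. Skew between topics does not lower the number, which is the case I'd want the info block to call out.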
