
Counters reported as Gauges in Prometheus metrics #3031

Open
danielgblanco opened this issue Sep 10, 2024 · 10 comments

@danielgblanco

danielgblanco commented Sep 10, 2024

What happened:
Some of the Prometheus metrics exported by the VPC CNI plugin are defined with inaccurate metric types. For example:

The metric awscni_add_ip_req_count is exported as a gauge, but its value is cumulative and only ever incremented. In fact, it appears to be used as a counter in:

prometheusmetrics.AddIPCnt.Inc()

It seems that awscni_del_ip_req_count is correctly exported as a counter.

I probably don't have enough context on this to make a judgement call. However, I think there are probably more Gauges that are operating as Counters.
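One rough way to look for other candidates is to scan the text exposition output for metrics declared as gauge whose names look like counters. The sketch below is only a heuristic under stated assumptions: the suffix list (`_count`, `_total`) is a guess at naming conventions, the helper name `suspectGauges` is hypothetical, and the sample scrape output (including `example_free_ips` and all values) is made up for illustration. A match is a hint to go and read the code, not proof of a wrong type.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// suspectGauges returns the names of metrics that are declared as gauges in
// Prometheus text exposition output but whose names end in a suffix that
// conventionally signals a counter. The suffix list is an assumption.
func suspectGauges(exposition string) []string {
	suffixes := []string{"_count", "_total"}
	var suspects []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// A type declaration has the form: # TYPE <name> <type>
		if len(fields) != 4 || fields[0] != "#" || fields[1] != "TYPE" || fields[3] != "gauge" {
			continue
		}
		for _, s := range suffixes {
			if strings.HasSuffix(fields[2], s) {
				suspects = append(suspects, fields[2])
				break
			}
		}
	}
	return suspects
}

func main() {
	// Hypothetical scrape output; values and example_free_ips are invented.
	sample := `# TYPE awscni_add_ip_req_count gauge
awscni_add_ip_req_count 42
# TYPE awscni_del_ip_req_count counter
awscni_del_ip_req_count 17
# TYPE example_free_ips gauge
example_free_ips 4
`
	fmt.Println(suspectGauges(sample)) // prints [awscni_add_ip_req_count]
}
```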

Attach logs
N/A

What you expected to happen:
I'd expect metrics to follow the semantic conventions defined in https://prometheus.io/docs/concepts/metric_types/

How to reproduce it (as minimally and precisely as possible):
Using Prometheus exporters.

Anything else we need to know?:
This may not be a critical issue if systems use Prometheus as the backend. However, it becomes a problem when Prometheus metrics are transformed into other representations. For example, OpenTelemetry Collectors will read this metric as a Gauge, which gives the aggregation a different meaning (e.g. one can change the temporality of counters from cumulative to delta or vice versa).
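To illustrate why the declared type matters downstream, here is a minimal sketch of a cumulative-to-delta conversion. It is not the OpenTelemetry Collector's implementation; the function name `cumulativeToDelta` and the readings are hypothetical. The conversion only makes sense for a value known to be a monotonic counter, so a pipeline that sees the metric typed as a gauge would have no reason to apply it.

```go
package main

import "fmt"

// cumulativeToDelta converts successive cumulative counter readings into
// per-interval deltas. A reading lower than the previous one is treated as a
// counter reset (for example, after a process restart), in which case the new
// reading itself is taken as the delta.
func cumulativeToDelta(samples []float64) []float64 {
	if len(samples) < 2 {
		return nil
	}
	deltas := make([]float64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d < 0 {
			d = samples[i] // counter reset: count from zero again
		}
		deltas = append(deltas, d)
	}
	return deltas
}

func main() {
	// Made-up cumulative readings; the last one simulates a reset.
	readings := []float64{10, 15, 15, 22, 3}
	fmt.Println(cumulativeToDelta(readings)) // prints [5 0 7 3]
}
```

For a true gauge (say, a pool size that goes up and down), the same subtraction would misread every decrease as a reset, which is why the type declaration has to be right before this kind of transformation is applied.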

Environment:

  • Kubernetes version (use kubectl version): 1.28.12
  • CNI Version: 1.16.3
  • OS (e.g: cat /etc/os-release): Bottlerocket 1.21.0
  • Kernel (e.g. uname -a): x86_64 GNU/Linux
@orsenthil
Member

@danielgblanco - could you verify this with the latest version of VPC CNI? This was fixed in recent versions of the CNI (1.18.3).

@orsenthil
Member

This is still defined as a Gauge in master:

https://github.com/aws/amazon-vpc-cni-k8s/blob/master/utils/prometheusmetrics/prometheusmetrics.go#L64

We need to fix this to use a Counter.

@danielgblanco
Author

Sorry, I've been on PTO. Thanks for the follow-up.

@hbhasker

hbhasker commented Oct 3, 2024

Looks like it's not just the AddIPCnt metric that was incorrectly changed to a gauge. I think this was done incorrectly in 2ac9e0a#diff-6c65a620b5206565cbd61b3390a33e02146dac52f62735909564c6b963127968L182


github-actions bot commented Dec 3, 2024

This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Dec 3, 2024

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Dec 18, 2024
@danielgblanco
Author

Can this be re-opened please? I believe it's still an issue.

@orsenthil
Member

This will be resolved with the next release, VPC CNI 1.19.3.

@orsenthil orsenthil reopened this Jan 9, 2025
@danielgblanco
Author

Apologies, I've just seen #3093 which I hope fixes it? Thanks!

@orsenthil
Member

orsenthil commented Jan 9, 2025

Yes. That's correct. You can use the manifest from the master file, and it should resolve it.

@github-actions github-actions bot removed the stale label Jan 10, 2025

3 participants