Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Execution attempt logged with information severity #2362

Closed
martintmk opened this issue Oct 24, 2024 · 9 comments
Closed

[Bug]: Execution attempt logged with information severity #2362

martintmk opened this issue Oct 24, 2024 · 9 comments
Labels
bug stale Stale issues or pull requests

Comments

@martintmk
Copy link
Contributor

martintmk commented Oct 24, 2024

Describe the bug

Some of our service teams that switched to Polly reported increased CPU utilization. After digging for a while we found that this was caused by increased amount of logs being produced. They have set the minimum log level to Information to not overwhelm the logging infra.

What suprised me, is that Polly logs each execution attempt (Retry, Hedging) with information severity. Is there any reason doing so? Why can't we just use Debug for this by default?

Repro Code:

var services = new ServiceCollection();
services.AddLogging(builder => builder.AddConsole().SetMinimumLevel(LogLevel.Debug));

var pipeline = services
    .AddResiliencePipeline("my-pipeline", builder => builder.AddRetry(new()))
    .BuildServiceProvider()
    .GetRequiredKeyedService<ResiliencePipeline>("my-pipeline");

await pipeline.ExecuteAsync(async token => true);

Gives the following console output:

dbug: Polly[1]
      Resilience pipeline executing. Source: 'my-pipeline/(null)', Operation Key: '(null)'
info: Polly[3]
      Execution attempt. Source: 'my-pipeline/(null)/Retry', Operation Key: '', Result: 'True', Handled: 'False', Attempt: '0', Execution Time: 5.3998ms
dbug: Polly[2]
      Resilience pipeline executed. Source: 'my-pipeline/(null)', Operation Key: '', Result: 'True', Execution Time: 60.2843ms

As a workaround, they have used the SeverityProvider (#1859) to switch the default level for ExecutionAttempt to Debug.

@martintmk martintmk added the bug label Oct 24, 2024
@peter-csala
Copy link
Contributor

IMHO if we want to change telemetry logs' severity then we should not focus only on the ExecutionAttempt rather we should re-evaluate for all logs.

@martincostello
Copy link
Member

I assumed the levels were all as-intended as part of the telemetry work for v8.

I can see a case for it being either Info or Debug, so I don't really think this is a bug. I think this is more a case of an unforeseen side-effect of adding logging to Polly.

If we changed the default behaviour, we might break people who do want this information, and this is unlikely to be the sort of behaviour change people would find in their CI. It then also requires code changes to retain the previous behaviour, not just config.

Feels like the workaround is more using a feature by-design to change the behaviour.

I'd agree we should check all the logs and re-evaluate them if we do want to change, but doing so outside a major release feels wrong to me (but maybe I'm being over-cautious).

@martintmk
Copy link
Contributor Author

Thanks folks, Martin you are right this might not really be bug.

My main point is that we should review what logs are we generating on the hot + happy path. I believe, for this category of logs, everything should be at debug level to prevent the problems mentioned in the issue info.

@peter-csala
Copy link
Contributor

I think we could create an opt-in version for V8. We could extend the TelemetryOptions with a flag, like LowerLogSeverityForHighPerformance or EnablePerformanceOptimization or whatever name you like.

This flag would lower the log severity for a predefined list of telemetry log types (those that are usuallt on the hot path). The advantage of this approach compared to the direct usage of SeverityProvider is that user can easily turn this on to reduce the quantity of logs. Of course we should provide a proper documentation, but the Polly users don't have to know all the nitty gritty details.

@martincostello
Copy link
Member

I think we could create an opt-in version for V8. We could extend the TelemetryOptions with a flag, like LowerLogSeverityForHighPerformance or EnablePerformanceOptimization or whatever name you like.

Having a "make go fast" option doesn't seem great to me, because it would inevitably attract "why isn't that the default then?" questions.

Maybe a "performance optimizations" section in the docs would be an alternative? It could cover things like customizing the severities as well as overriding the general Polly log level to Warning+ etc. I imagine most people are logging Polly at Information level via having "Default": "Information" in their application's logging configuration.

@peter-csala
Copy link
Contributor

Naming is hard 😅 Less verbose or Important ones only might be better. My main idea here was to introduce a simple switch which lowers the severity of the chatty telemetry events.

Obviously it has a trade off: less logs, less visibility. I'm not sure ASP.NET Core has a similar feature, if so we could follow that pattern.

@martincostello
Copy link
Member

I think the problem with that is that the severity is pretty subjective when it comes to info vs. debug and warn vs. error and we've already (rightly or wrongly) applied an opinion on what events are what level, so having a second opinion on what they all are baked-in feels a bit weird to me.

Maybe with hindsight we may have tweaked them slightly before shipping v8, but I think if you're really performance conscious, we've already provided ways for people to tweak that as much as they want.

If over time there's feedback from multiple users on this we could consider doing something, but I think docs is the "cheaper" solution at this stage.

Copy link
Contributor

This issue is stale because it has been open for 60 days with no activity. It will be automatically closed in 14 days if no further updates are made.

@github-actions github-actions bot added the stale Stale issues or pull requests label Dec 28, 2024
Copy link
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug stale Stale issues or pull requests
Projects
None yet
Development

No branches or pull requests

3 participants