Backoff retry delay in status check is increasing to hours rendering services offline #2897
Comments
Hi @eugene-sadovsky, it is intended behaviour that the retry time increases. But you are right that the waiting time might get too high.
Thank you for the quick response 🙇🏼
I think the main issue is that the back-off time is never reset to zero after a successful retry. It will just saturate to
If this is really true, that would be a bug in Project Reactor, I think.
Yeah, this is the behavior I observed. You can reproduce it by running my gist; it closely resembles the code in
Spring Boot Admin Server information
Version: 3.1.4
Spring Boot version: 3.1.0
Client information
Consul
Description
Exponential back-off delay in de.codecentric.boot.admin.server.services.IntervalCheck is increasing to hours. I noticed that after running SBA for 2+ weeks, previously registered services go offline for hours and then become available again. Restarting SBA helps right away. This is always accompanied by the error message:
Unexpected error in status-check: reactor.core.Exceptions$OverflowException: Could not emit tick NN due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)
After some investigation, it looks like this happens when the checkAllInstances method times out (takes longer to complete than the check interval), which triggers a retry. The back-off interval keeps increasing with each failure over the lifetime of the SBA process and eventually grows to hours. It actually takes about 12+ retries. The situation improved after lowering spring.boot.admin.timeout.health to 3 seconds. By default, the health endpoint timeout is equal to spring.boot.admin.status-interval (10s).
Here's the code snippet that reproduces this behavior. It will slow down with each retry.
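As a rough illustration of the arithmetic only (this is not the gist mentioned above and not Reactor's actual implementation), the sketch below models an exponential back-off whose retry counter is either cumulative or reset after each successful retry. The class BackoffGrowth, its method names, the 1-second minimum delay, the doubling factor and the absence of jitter are all assumptions made for the example.

```java
import java.time.Duration;

/** Hypothetical model of exponential back-off; names and parameters are illustrative only. */
final class BackoffGrowth {

    /** Delay before the next retry: minBackoff * 2^previousRetries, capped at maxBackoff (no jitter). */
    public static Duration delay(long previousRetries, Duration minBackoff, Duration maxBackoff) {
        // Cap the exponent so the multiplication stays well inside the range of a long.
        long exponent = Math.min(previousRetries, 30);
        Duration raw = minBackoff.multipliedBy(1L << exponent);
        return raw.compareTo(maxBackoff) > 0 ? maxBackoff : raw;
    }

    /**
     * Delay applied to the last of {@code isolatedFailures} failures, where every
     * failure is followed by a successful retry before the next failure occurs.
     */
    public static Duration lastDelay(int isolatedFailures, boolean resetOnSuccess,
                                     Duration minBackoff, Duration maxBackoff) {
        long counter = 0;
        Duration last = Duration.ZERO;
        for (int i = 0; i < isolatedFailures; i++) {
            last = delay(counter, minBackoff, maxBackoff); // failure: wait, then retry
            counter++;
            if (resetOnSuccess) {
                counter = 0; // the retry succeeded, so start from the minimum again
            }
        }
        return last;
    }

    public static void main(String[] args) {
        Duration min = Duration.ofSeconds(1);      // assumed minimum back-off
        Duration uncapped = Duration.ofDays(365);  // assumed "effectively no cap"

        // Cumulative counter: the 13th isolated failure waits 2^12 seconds.
        System.out.println(lastDelay(13, false, min, uncapped));
        // Counter reset after each success: every isolated failure waits the minimum.
        System.out.println(lastDelay(13, true, min, uncapped));
        // Cumulative counter with a one-minute cap: the delay saturates at the cap.
        System.out.println(lastDelay(13, false, min, Duration.ofMinutes(1)));
    }
}
```

Under these assumptions, 13 isolated failures with a cumulative counter give a delay of 2^12 s = 4096 s, roughly 68 minutes, which is consistent with the "12+ retries" observation. Reactor's Retry.backoff builder does offer maxBackoff(...) and transientErrors(true) options that appear to correspond to the cap and the reset modelled here; whether either is the right change for IntervalCheck is for the maintainers to decide.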