Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SubConn idleness when connections are lost #4298

Closed
dfawley opened this issue Mar 25, 2021 · 8 comments
Closed

Support SubConn idleness when connections are lost #4298

dfawley opened this issue Mar 25, 2021 · 8 comments
Assignees
Labels
P2 Type: Feature New features or improvements in behavior

Comments

@dfawley
Copy link
Member

dfawley commented Mar 25, 2021

C and Java both have this behavior. Go does not, and it can cause problems where idle clients are actively reconnecting to backends they don't need to use. Note that if a round robin LB policy is active, it will still maintain persistent connections to all backends, which is by design. Pick first, however, will stay disconnected until the next RPC.

@dfawley dfawley added P2 Type: Feature New features or improvements in behavior labels Mar 25, 2021
@dfawley dfawley self-assigned this Mar 25, 2021
@dfawley
Copy link
Member Author

dfawley commented Mar 25, 2021

While related, #1719 and #1786 are separate issues.

@menghanl
Copy link
Contributor

This includes when a GOAWAY is received. The client should stay IDLE until a new RPC is started.

@Jun10ng

This comment has been minimized.

@Jun10ng
Copy link

Jun10ng commented Apr 16, 2021

I jump this issue from #4282
from my understanding, after client receive GOAWAY, the connection that received GOAWAYshould become IDLE instead of STOP to prevent client creates a new connection and GracefulStop() closes two connection at the same time?

PLMK if I understand it incorrectly.

Thanks!

@menghanl
Copy link
Contributor

No, the old connection that received GOAWAY is DRAINING. The new connection should be in IDLE.

The current behavior is, the new connection will be CONNECTING -> READY. So a new TCP connection is always created immediately, even if there's no RPC to use it.
This feature request is to delay the new connection, until there's a new RPC.

@ipochi
Copy link

ipochi commented Jan 4, 2022

Any updates on this feature ? @dfawley

In our setup I have a gut feeling this is whats causing the memory leak.

client is Kubernetes api-server and the grpc-server runs as a proxy(apiserver-network-proxy) on the cluster as a pod, acting as a bridge between control plane and worker nodes.

We have a situation where we are seeing memory leaks but not able to identify the root cause of the issue. For some reason setting MaxConnectionIdle doesn't work. However setting MaxConnectionAge and MaxConnectionAgeGrace works but is not ideal as it terminates the connection regardless of activity from the client. Despite setting the mentioned fields, even if there are no active gRPC connections or open filedescriptors as seen in the prometheus metrics, only ~10% memory reduction is seen ?

Could it possibly be the case where we are hitting this issue ?

@dfawley
Copy link
Member Author

dfawley commented Jan 5, 2022

Any updates on this feature?

This was implemented in #4613. gRPC will now not reconnect until an RPC is attempted if a connection is lost.

It's unlikely that this kind of thing would fix memory leak issues, although it will make gRPC consume fewer resources when a ClientConn goes idle and the server closes the connection (e.g. due to max idle or max age settings).

Note that MaxConnectionAgeGrace on the server mostly addresses your concerns about "terminates the connection regardless of activity from the client". It can lead to RPCs being hard-stopped after the grace period, but only if they take longer than the grace period. New RPCs will go on a new connection, and the old connection will only be closed when there are no more RPCs on it or when the grace period expires.

@dfawley
Copy link
Member Author

dfawley commented Jan 5, 2022

Closing issue since the feature it describes is implemented by #4613.

@dfawley dfawley closed this as completed Jan 5, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P2 Type: Feature New features or improvements in behavior
Projects
None yet
Development

No branches or pull requests

4 participants