-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock when calling Session.Close() while control connection is reconnecting #1687
Comments
Thanks! This could be the same issue as described in #1677. |
We switched from separate mutex for closing to sessionStateMu in 312a614. This change introduced a deadlock. We don't need to hold the mutex for the whole duration of Close(), we only need to update the status atomically. Previously IsClosed() returned true only after all closing is done (because of the deferred unlock). We can emulate that by setting isClosed at the end of Close(). We need a new variable to ensure that Close() is only executed once. Fixes apache#1687
Oh sorry, I glanced at that one but I didn't find anything related to batches when reproducing this so I kind of ignored it but you're right it looks like the same issue especially when looking at the stack traces. |
I opened a pull request with a possible fix, but I haven't verified it yet. Please let me know if it fixes the issue for you. I will be mostly unavailable next ~10 days so I will look into it in ~2 weeks. |
I have a deadlock that is pretty similar in description to this in v1.6.0
It looks like a race condition between
|
Nice catch, I suggest you add a new issue with those details, the fix should be pretty simple. |
Got it, thanks #1752 |
I ran into this while testing #1680
Calling
Session.Close()
while the control connection is reconnecting can lead to a deadlock.This can be reproduced with the following integration test:
After a few seconds if you dump goroutines you will see that one of them will be stuck here:
And another one here:
controlConn.close()
gets stuck writing to thec.quit
channel and this happens while the goroutine is holding aLock
fors.sessionStateMu
.The
c.quit
channel is supposed to be read fromcontrolConn.heartBeat()
but if this goroutine is in the process of callingcontrolConn.reconnect(refreshring bool)
then it will eventually callc.session.initialized()
(as part ofcontrolConn.setupConn(conn *Conn)
) which tries to obtain aRLock
fors.sessionStateMu
and a deadlock happens.The text was updated successfully, but these errors were encountered: