FIX[MQB]: race in admin domain remove #567
Force-pushed from 8222790 to 4bac73e.
Looks good, may need a few tweaks.
    d_state = e_PREREMOVE;
    d_teardownRemoveCb = bsl::nullptr_t();
}

void Domain::removeDomainComplete()
{
    bslmt::LockGuard<bslmt::Mutex> guard(&d_mutex);  // LOCK
There may be no need for removeDomainComplete. If the Domain knows it is being removed and it calls tearDownCb in teardownRemove / d_teardownRemoveCb in unregisterQueue, it can make the state transition to e_POSTREMOVE.
In any case, changing Domain state (including d_teardownRemoveCb) should be done under the lock.
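The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: ToyDomain, its pendingQueues counter, and the use of std::mutex / std::lock_guard in place of bslmt::Mutex / bslmt::LockGuard are all assumptions made so the example is self-contained. The state and the stored callback are only touched under the lock, and the callback is invoked after the lock is released.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-in for the domain; not the mqbblp::Domain implementation.
class ToyDomain {
  public:
    enum State { e_STARTED, e_PREREMOVE, e_POSTREMOVE };

    // First pass: store the callback and enter e_PREREMOVE under the lock.
    // If no queues are pending, complete the removal right away.
    void teardownRemove(std::function<void()> cb, int pendingQueues)
    {
        assert(pendingQueues >= 0);
        std::function<void()> toCall;
        {
            std::lock_guard<std::mutex> guard(d_mutex);  // LOCK
            d_pendingQueues = pendingQueues;
            if (d_pendingQueues == 0) {
                d_state = e_POSTREMOVE;
                toCall.swap(cb);
            }
            else {
                d_state = e_PREREMOVE;
                d_teardownRemoveCb.swap(cb);
            }
        }
        if (toCall) {
            toCall();  // invoked outside the lock
        }
    }

    // The last queue to unregister takes the callback and makes the
    // transition to e_POSTREMOVE, all under the same lock.
    void unregisterQueue()
    {
        std::function<void()> toCall;
        {
            std::lock_guard<std::mutex> guard(d_mutex);  // LOCK
            if (d_pendingQueues > 0) {
                --d_pendingQueues;
            }
            if (d_pendingQueues == 0 && d_teardownRemoveCb) {
                d_state = e_POSTREMOVE;
                toCall.swap(d_teardownRemoveCb);  // member is now empty
            }
        }
        if (toCall) {
            toCall();  // invoked outside the lock
        }
    }

    State state() const
    {
        std::lock_guard<std::mutex> guard(d_mutex);  // LOCK
        return d_state;
    }

  private:
    mutable std::mutex    d_mutex;
    State                 d_state = e_STARTED;
    int                   d_pendingQueues = 0;
    std::function<void()> d_teardownRemoveCb;
};
```

Swapping the callback into a local before calling it means a late unregisterQueue finds an empty d_teardownRemoveCb and does nothing, so the callback runs at most once.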
The reason I did it outside of teardownRemove / d_teardownRemoveCb in unregisterQueue is that I wanted the second pass to be able to go through only after the first pass has fully finished, and I didn't want the second pass to be issued between these two lines. I'm not fully sure this concern is valid, though.
As for locking: I thought the state is an AtomicInt, so we probably don't need to protect it with another mutex?
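For context on the atomic question, here is a small sketch using std::atomic<int> as an assumed stand-in for the AtomicInt mentioned above; tryTransition and the enum values are hypothetical names. An atomic makes each individual load or store safe, and a compare-exchange makes a single check-then-set indivisible, so only one caller can win a given transition. It does not make a change to the state plus a change to a separate member (such as d_teardownRemoveCb) atomic as a pair, and that combination is what a mutex covers.

```cpp
#include <atomic>
#include <cassert>

enum State : int { e_STARTED = 0, e_REMOVING = 1, e_STOPPED = 2 };

// Atomically move 'state' from 'from' to 'to'. Returns true only for the
// single caller that observes 'from' and wins the transition; any other
// caller sees the state already changed and gets false.
bool tryTransition(std::atomic<int>& state, int from, int to)
{
    assert(from != to);
    int expected = from;
    return state.compare_exchange_strong(expected, to);
}
```

A second pass that calls tryTransition(state, e_STARTED, e_REMOVING) while the first pass is still in e_REMOVING is rejected, which addresses the "between these two lines" worry for the state variable alone.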
Is the concern about removing the domain before mqbblp::Queue? Because the queue keeps a raw pointer to the domain?
Force-pushed from f2b7804 to ad9642c.
    // 5. Mark DOMAIN REMOVED to accept the second pass

    // 3. Mark DOMAIN REMOVED to accept the second pass
    bmqu::SharedResource<DomainManager> self(this);
Careful with bmqu::SharedResource. Calling self.acquire() followed by destructing/invalidating without releasing the shared_ptr will result in a deadlock.
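The hazard described above can be illustrated with a toy model. ToySharedResource below is a hypothetical class written for this example, not the bmqu::SharedResource implementation; it only mimics the behaviour the comment describes, where invalidation waits for every acquired shared_ptr to be released. The invalidateFor timeout is an assumption added purely so the blocking case can be observed without hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical toy; does not own the pointed-to object.
template <class T>
class ToySharedResource {
    T*                      d_ptr;
    std::mutex              d_mutex;
    std::condition_variable d_cv;
    int                     d_outstanding = 0;
    bool                    d_valid = true;

  public:
    explicit ToySharedResource(T* ptr) : d_ptr(ptr) { assert(ptr); }

    // Returns a shared_ptr whose deleter only decrements the outstanding
    // count. Returns an empty pointer once the resource is invalidated.
    std::shared_ptr<T> acquire()
    {
        {
            std::lock_guard<std::mutex> guard(d_mutex);
            if (!d_valid) {
                return std::shared_ptr<T>();
            }
            ++d_outstanding;
        }
        return std::shared_ptr<T>(d_ptr, [this](T*) {
            std::lock_guard<std::mutex> guard(d_mutex);
            --d_outstanding;
            d_cv.notify_all();
        });
    }

    // Stops new acquisitions, then waits up to 'timeout' for outstanding
    // pointers to be released. Returns false if some are still held, which
    // is the situation where an untimed wait would never return.
    bool invalidateFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(d_mutex);
        d_valid = false;
        return d_cv.wait_for(lock, timeout, [this] {
            return d_outstanding == 0;
        });
    }
};
```

If the thread that calls the invalidating function is the same one still holding the acquired pointer, nothing can ever release it, so releasing (for example with reset()) before invalidating is the safe order.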
Force-pushed from a974d15 to 5897540.
Force-pushed from 99fff14 to a6792a2.
Looks good. One small comment
Force-pushed from c02472a to 0f92b2d.
thank you
Force-pushed from 0f92b2d to fbc2f58.
1. Check queue open status in the cluster thread to prevent a race when checking whether the queue is actively used.
2. Consolidate purge and GC, since both are called in the cluster thread.
3. Decide whether to call d_teardownCb based on whether the function pointer is nullptr, since d_state can be rewritten when shutdown is called after DOMAINS REMOVE.
4. Remove e_REMOVING and e_REMOVED, since these states are unnecessary when we check whether d_teardownRemoveCb is assigned to decide whether to call it.
5. Change e_PREREMOVE to e_REMOVING.
6. Replace the use of e_POSTREMOVE with e_STOPPED, since the state of a domain could be changed to e_POSTREMOVE after e_STOPPING in a late unregisterQueue.
7. Explicitly invalidate the SharedResource of DomainManager; the reason is commented in the code.

Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
Force-pushed from fbc2f58 to 41f9b5b.