Fix registration logic in lifecycle manager #52
Conversation
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Thanks for working on this! The approach is a lot better. Left a minor comment to consider before we can merge this.
```diff
-ok = ok && (
-  transitionGraph.atGoalState(state.value(), transition) ||
-  client->change_state(transition));
+ok &= transitionGraph.atGoalState(state.value(), transition) ||
```
Nit: by switching to bitwise AND we lose the short-circuit optimization when `ok` is false, so can we keep the original implementation? Let me know if I missed something else.

Also, I'm not sure what the point of returning a boolean from this function is, since we might have successfully transitioned some nodes and failed for others, and the user would not know which ones succeeded. But that is beyond the scope of this PR.
This is an interesting one. The idea behind this change is that previously, as soon as `ok` became false, the short-circuiting would make all the following clients skip transitioning as well. With the new logic, without short-circuiting, all the following clients will still transition. This is also why I added the `continue` above: before, the short-circuiting made sure we never called `.value()` on an empty state (new nodes do indeed have a state), but since the short-circuiting is removed we need an early guard clause.
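For illustration, here is a minimal sketch of the resulting loop. `atGoalState`, `change_state`, and `changeStateForAllNodes` appear in the PR; the `Client` and `TransitionGraph` types and the `get_state` helper are hypothetical stand-ins:

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins: only atGoalState/change_state appear in the diff.
struct TransitionGraph {
  bool atGoalState(int state, int transition) const { return state == transition; }
};
struct Client {
  std::function<std::optional<int>()> get_state;
  std::function<bool(int)> change_state;
};

bool changeStateForAllNodes(std::vector<Client>& clients,
  const TransitionGraph& transitionGraph, int transition)
{
  bool ok = true;
  for (auto& client : clients) {
    auto state = client.get_state();
    if (!state.has_value()) {
      // Early guard: with the short-circuiting removed, .value() below would
      // otherwise be reached for a client with no state, so skip it here.
      ok = false;
      continue;
    }
    // Bitwise AND accumulates failures without short-circuiting, so one
    // failed node no longer blocks the transitions of the nodes after it.
    ok &= transitionGraph.atGoalState(state.value(), transition) ||
      client.change_state(transition);
  }
  return ok;
}
```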
…cle_manager.hpp
Co-authored-by: yadunund <yadunund@gmail.com>
This PR works toward #48 by fixing corner cases in workcell bringup / bringdown.
The logic in the lifecycle manager was flawed, specifically in the branch where nodes already exist and a new one is added.
Two main issues are fixed by this PR.
Re-registering the same workcell.
If Nexus has a single workcell, and that workcell is brought down without proper cleanup (i.e. it suddenly crashes) and then re-added, we will go into the branch I linked above. However, when we try to get the target state (nexus/nexus_lifecycle_manager/include/nexus_lifecycle_manager/lifecycle_manager.hpp, lines 376 to 381 in a0eff3c), we will actually query the workcell we are trying to create right now, and hence try to transition it to its current state, ending up in a NOOP. The new workcell will be successfully added but remain unconfigured.
`workcell_1` is created, `workcell_1` is killed, `workcell_2` is created.
Another corner case of the check above: now we will actually try to get the target state from a workcell that died and was not cleaned up properly, and will thus be unreachable, so adding new nodes will fail.
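To make both failure modes concrete, here is a hypothetical sketch of the flawed lookup; the `Node` type and `targetStateFromExistingNode` are illustrative stand-ins, not the actual code at lines 376 to 381:

```cpp
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative stand-in for a managed workcell node.
struct Node {
  bool reachable = true;               // false if it crashed without cleanup
  std::string state = "unconfigured";
};

// Sketch of the flawed branch: the target state for a newly added node is
// read from whatever node is already in the (assumed non-empty) map.
std::optional<std::string> targetStateFromExistingNode(
  const std::unordered_map<std::string, Node>& nodes)
{
  const Node& existing = nodes.begin()->second;
  if (!existing.reachable) {
    // Corner case 2: the queried node died without cleanup, so the state
    // query fails and adding the new node fails with it.
    return std::nullopt;
  }
  // Corner case 1: if "existing" is the very workcell being re-registered,
  // this returns its current (unconfigured) state and the subsequent
  // transition is a NOOP.
  return existing.state;
}

int main() {
  std::unordered_map<std::string, Node> nodes{
    {"workcell_1", Node{false, "unconfigured"}}};
  if (!targetStateFromExistingNode(nodes)) {
    std::cout << "target state query failed; cannot add workcell_2\n";
  }
}
```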
The solution
The solution I came up with is actually a large simplification of the logic. I just created a new class variable `_target_state` that contains the state that we want the nodes to be in: with `autostart` the target state will be `ACTIVE`, otherwise `UNCONFIGURED`.

- `_autostart` itself can be removed; now it just sets `_target_state` to `ACTIVE` instead.
- `system_active` can be removed; we just check that `_target_state` is `ACTIVE`.
- In `changeStateForAllNodes` we will also update this internal `_target_state`.
- Newly added nodes are simply transitioned to `_target_state` (see the sketch below).
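A minimal sketch of the resulting class shape; `_target_state`, `system_active`, and `changeStateForAllNodes` are named in the description above, while the `State` enum and everything else here are hypothetical simplifications:

```cpp
// Hypothetical lifecycle states; the real manager uses the ROS 2 lifecycle
// states (ACTIVE, UNCONFIGURED, ...).
enum class State { Unconfigured, Active };

class LifecycleManager {
public:
  // autostart no longer needs to be stored (_autostart is removed); it only
  // picks the initial target state.
  explicit LifecycleManager(bool autostart)
  : _target_state(autostart ? State::Active : State::Unconfigured) {}

  // system_active reduces to a check against the target state.
  bool system_active() const { return _target_state == State::Active; }

  // Transitioning all nodes also records the new target, so nodes added
  // later know which state to go to.
  void changeStateForAllNodes(State goal) {
    _target_state = goal;
    // ... transition every managed node toward goal ...
  }

  // New nodes are simply transitioned to _target_state on registration.
  void addNode(/* node handle */) {
    // ... transition the new node to _target_state ...
  }

private:
  State _target_state;
};
```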
The drawback
The only "corner" case I can think of is when we want to manage a set of nodes at different states, in which case the concept of a single target state that new nodes should be transitioned to will not be applicable. Two main points about this:

- The previous behavior was no better: it would just query the state of an existing node in the `unordered_map` and transition new nodes to that state.
- `unordered_map` is... unordered... so the first item could be literally any node depending on what the hashing does (demonstrated below).
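A quick demonstration of that last point (the node names here are hypothetical):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// The "first" element of an unordered_map is whatever the hashing puts
// first, not the insertion order, so begin() may name any workcell.
int main() {
  std::unordered_map<std::string, int> nodes;
  nodes["workcell_1"] = 1;
  nodes["workcell_2"] = 2;
  nodes["workcell_3"] = 3;
  std::cout << nodes.begin()->first << "\n";  // implementation-defined
}
```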