Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to ros2dds bridge #37

Merged
merged 4 commits into from
Dec 26, 2024
Merged

Migrate to ros2dds bridge #37

merged 4 commits into from
Dec 26, 2024

Conversation

luca-della-vedova
Copy link
Member

@luca-della-vedova luca-della-vedova commented Oct 30, 2024

Closes #17.

This seems to work but required a fair bit of changes in the nexus_network_configuration package, so we should probably rethink what that package does and how.

Specifically, it had a builtin support for generating configs from redf, but it actually didn't and just used hardcoded endpoints here.
Furthermore, while with the dds plugin we can just specify endpoints as strings, the ros2dds plugin requires us to clearly say whether each substring is a publisher, subscriber, service server, service client, action server or action client.
The way I approached it in this PR is the same as the previous PR, updating the config file accordingly, however there are a few important changes to consider

Redf compatibility

redf does not have a way to specify for each topic / service etc. which nodes can publish and which can subscribe, so the only way to include redf config in its current implementation would be to allow each entity to be both, which breaks the isolation a bit since (for example), we currently do:

# System orchestrator
    allow:
      service_servers: [[...] "/register_workcell"]

# Workcell
    allow:
      service_clients: ["/register_workcell"]

Allowing both would mean technically allowing a rogue workcell to advertise a /register_workcell client which could create a lot of trouble in the system.
Note however that this is also an issue in the current implementation.

Because there was no usage of redf for network configuration I couldn't test it and whether it worked or not before I'm fairly sure it will stop working with this PR.

Queries timeout

The ros2dds bridge adds a concept of timeout for a query. They can be pretty granular but the main issue is that they also apply to actions so if an action was to take longer than the specified value it would fail.
I set it to 10 minutes as a default, but this should probably be double checked (increased? made more granular?).

Namespaces

Sadly, while the ros2dds bridge supports namespaces, they currently also apply them to global topics. This means that if we set a bridge with namespace workcell_1 and asked it to allow a subscription to /chatter it would silently only allow /workcell_1/chatter. This makes it pretty painful because workcells need to access global services to register, so we can't rely on namespaces and have to add them manually to each workcell configuration.

Warning spam

Last but not least, running the ros2dds bridge results in a lot of warnings being printed, they seem to be harmless but are still very noisy. An upstream issue has been opened and acknowledged by Zettascale.

@luca-della-vedova luca-della-vedova changed the base branch from main to luca/jazzy_support October 30, 2024 06:30
@luca-della-vedova luca-della-vedova changed the base branch from luca/jazzy_support to main October 30, 2024 06:31
@luca-della-vedova luca-della-vedova changed the base branch from main to luca/jazzy_support October 30, 2024 09:37
@luca-della-vedova luca-della-vedova force-pushed the luca/jazzy_support branch 2 times, most recently from 688818e to f48d054 Compare November 1, 2024 04:45
@luca-della-vedova luca-della-vedova force-pushed the luca/jazzy_support branch 5 times, most recently from 100ab0c to 822ea75 Compare November 19, 2024 05:56
Base automatically changed from luca/jazzy_support to main November 21, 2024 02:46
@luca-della-vedova luca-della-vedova force-pushed the luca/ros2dds_bridge branch 3 times, most recently from 90c8c3c to ed94ad7 Compare November 21, 2024 02:51
@luca-della-vedova luca-della-vedova changed the title WIP migrating to ros2dds bridge Migrate to ros2dds bridge Nov 21, 2024
@luca-della-vedova luca-della-vedova marked this pull request as ready for review November 21, 2024 03:08
Comment on lines +21 to +24
# TODO(luca) check if transporter needs available endpoint
service_clients: ["/.*/queue_task", "/.*/remove_pending_task", "/.*/pause", "/.*/signal", "/.*/get_state", "/.*/change_state", "/.*/is_task_doable"]
action_servers: []
# TODO(luca) check if we really need transporter client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea if these TODOs are still valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The demo / integration test works well but the configuration of these endpoints depends on where nodes are ran / how complete we want this config file.
The main crux is that in this specific implementation the system orchestrator and the transporter node run in the same domain ID so technically it is not necessary to allow all the transporter interfaces through Zenoh.
On the other hand, if we want to have this file be kind of a template that would also allow external transporters to connect to the system orchestrator we would need to add their interfaces here.

For the estop I'm just not 100% sure how it is supposed to work, since it seems it is a global topic do we just halt the whole system as soon as one workcell decides to estop? That sounds a bit dangerous so I left a note here.

I'm erring on the side of "make this a template" at least until a proper redf integration for generation of config files is not done and permissively allow all the interfaces we might need but what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the estop I'm just not 100% sure how it is supposed to work, since it seems it is a global topic do we just halt the whole system as soon as one workcell decides to estop? That sounds a bit dangerous so I left a note here.

Yeah we halt processes across all workcells if the estop is triggered. This was a requirement.

@luca-della-vedova
Copy link
Member Author

Warning spam was fixed upstream and in the bumped dependency 20c2eb6

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to run a second job after the first one completes but that could be unrelated to the changes here.

nexus_system_orchestrator-1] [ERROR] [1735196375.435664264] [system_orchestrator]: No workcell is able perform task [2]
[nexus_system_orchestrator-1] [ERROR] [1735196375.435700924] [system_orchestrator]: One or more tasks cannot be performed!

@Yadunund
Copy link
Member

I can confirm that with the main branch, I am able to run the pick_from_conveyor and place_on_conveyor processes back to back but am unable to do so with this branch.

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@luca-della-vedova
Copy link
Member Author

Sigh. I bumped the patch version from 1.0.2 to 1.0.4 to avoid the warning spam and thought it was innocuous enough that I didn't have to retest the whole stack, but somehow that version (and the following 1.1.0) are broken.
I could reproduce, reverting it in d813ae9 helped, can you give it a spin?

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@Yadunund Yadunund merged commit a61c71e into main Dec 26, 2024
2 of 3 checks passed
@Yadunund Yadunund deleted the luca/ros2dds_bridge branch December 26, 2024 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to newer zenoh-plugin-ros2dds bridge
2 participants