Real time benchmarking / reasoning #615

Draft
wants to merge 3 commits into base: master

fmauch
Contributor

@fmauch fmauch commented Jan 30, 2023

This PR aims at providing more detailed information about setting up a system for stable communication with the robot.

This is best combined with UniversalRobots/Universal_Robots_Client_Library#139

Planned additions (maybe in other PRs):

  • Support non-blocking read in our HW-Interface node
  • Update PREEMPT_RT build instructions

We did a test running the driver on different kernels with different configurations.
These are the results from those tests.

You might still be able to control the robot using a non-real-time system. This is, however, not recommended.

To get an almost real-time capable system, there are two methods available:
Contributor

High-level, but "almost real-time" is not really how you'd phrase this.

I believe the correct terminology would be: best effort, soft real-time and hard real-time.

Contributor Author

Thank you for the feedback. Yes, writing this didn't feel too good. I deliberately chose none of the above, since I was not sure which category the different configurations belong in. A PREEMPT_RT kernel-based system using SCHED_FIFO could probably be called soft real-time, but for the lowlatency kernel I am not sure.

From my understanding, soft real-time and hard real-time are more or less defined categories of real-time systems, while I wanted to state that the measures above will give you a system that comes close to the behavior of a real-time system (I guess "close to a soft real-time system" would be the better terminology here).

Contributor

Without going through the text in detail, I'd say low-latency kernel is best-effort.

PREEMPT_RT probably soft real-time, as you already suggested -- although there seems to be some discussion/contention around that. I'd expect companies like VxWorks and QNX to deem it soft, while the goal of the project has been summarised as "making Linux hard real-time capable".

Contributor

@gavanderhoorn gavanderhoorn Jan 31, 2023

I wanted to state that the measures above will give you a system that comes close to the behavior of a real-time system

perhaps describing the behaviour of the system would be better than trying to give it a name.

You seem to have a large set of measurements. Why not just state "with PREEMPT_RT, you get average activation latency of X, max jitter of Y, etc" (maybe you already do this)? Then link those KPIs to behaviour of the driver and/or robot in typical usage scenarios ("with PREEMPT_RT and a direct cable connection, a typical MoveIt-plus-10-other-nodes configuration should be able to achieve 500 Hz, while .."). Something like that.
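As an illustration of the kind of KPIs meant here, the following is a minimal sketch that measures how late a periodic loop wakes up and reduces the samples to a mean, a maximum, and a jitter figure. It is not the measurement setup used for this PR: the function names, the 2 ms period (matching the 500 Hz mentioned above), and the definition of jitter as maximum minus minimum latency are all assumptions made for the example.

```python
import statistics
import time


def measure_wakeup_latency(period_s=0.002, cycles=200):
    """Sleep until each periodic deadline and record how late every wake-up was."""
    latencies = []
    next_deadline = time.monotonic() + period_s
    for _ in range(cycles):
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Positive value: the loop woke up after the deadline it was aiming for.
        latencies.append(time.monotonic() - next_deadline)
        next_deadline += period_s
    return latencies


def summarize(latencies):
    """Reduce raw latencies (seconds) to a few KPIs in microseconds.

    Jitter is taken as max minus min here, which is an assumption of this sketch.
    """
    micro = [value * 1e6 for value in latencies]
    return {
        "mean_us": statistics.fmean(micro),
        "max_us": max(micro),
        "jitter_us": max(micro) - min(micro),
    }


if __name__ == "__main__":
    print(summarize(measure_wakeup_latency()))
```

Running the same script on a generic, a lowlatency, and a PREEMPT_RT kernel would give numbers that can be compared directly, although a Python loop adds overhead of its own and will not match what a C++ control thread sees.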

Contributor Author

I think that's not the point I want to make there, since we have the separate document for that kind of analysis, but a description-based approach seems to be a good solution.

PREEMPT_RT probably soft real-time, as you already suggested -- although there seems to be some discussion/contention around that. I'd expect companies like VxWorks and QNX to deem it soft, while the goal of the project has been summarised as "making Linux hard real-time capable".

Which is why I wanted to avoid such terms ;-)

@github-actions

github-actions bot commented May 1, 2023

This PR hasn't made any progress for quite some time and will be closed soon. Please comment if it is still relevant.

@github-actions github-actions bot added the Stale label May 1, 2023
@shuobh

shuobh commented Jun 21, 2023

One additional point for comparison: we've integrated the proper freedrive mode and compared the performance by switching between trajectory execution and freedrive mode for 20k trajectories. The setup with the low-latency kernel fails 1% of the time with an error saying that it cannot execute a trajectory in freedrive mode, while the real-time kernel failed only once.

I also want to circle back to an issue about SCHED_FIFO. We tried to dig into it, and it seems to be related to user privileges: setting the priority returns "Operation not permitted". Some suggest running the application with sudo privileges, but that is not easy with ROS and might introduce other issues. Is there another way to fix this?

@fmauch
Contributor Author

fmauch commented Jul 3, 2023

I also want to circle back to an issue about SCHED_FIFO. We tried to dig into it, and it seems to be related to user privileges: setting the priority returns "Operation not permitted". Some suggest running the application with sudo privileges, but that is not easy with ROS and might introduce other issues. Is there another way to fix this?

We have that covered by our tutorial. You can give your user the privilege to set the priority. I would not recommend running the application using sudo.
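For readers who want to see what requesting the priority without sudo looks like, here is a minimal sketch in Python. It is not the driver's code (the driver is C++ and sets the scheduling policy itself); the function name and the priority value of 50 are assumptions chosen for illustration. It shows that the "Operation not permitted" message corresponds to an error the program can catch and handle by continuing with the default scheduler.

```python
import os


def try_enable_fifo(priority=50):
    """Ask the kernel to schedule the caller with SCHED_FIFO.

    Returns True on success. Returns False if the platform lacks the call or
    the request is rejected, for example with "Operation not permitted" when
    the user has no real-time priority allowance.
    """
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_FIFO"):
        return False  # scheduler API not available on this platform
    try:
        # pid 0 addresses the caller itself.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as error:
        print(f"Could not switch to SCHED_FIFO: {error}")
        return False
    return True


if __name__ == "__main__":
    if try_enable_fifo():
        print("Running with SCHED_FIFO")
    else:
        print("Continuing with the default scheduling policy")
```

Falling back instead of aborting keeps the application usable on systems where the privilege has not been granted, at the cost of less predictable timing.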

@shuobh

shuobh commented Jul 3, 2023

I also want to circle back to an issue about SCHED_FIFO. We tried to dig into it, and it seems to be related to user privileges: setting the priority returns "Operation not permitted". Some suggest running the application with sudo privileges, but that is not easy with ROS and might introduce other issues. Is there another way to fix this?

We have that covered by our tutorial. You can give your user the privilege to set the priority. I would not recommend running the application using sudo.

I followed the tutorial to set user privileges but the problem is still there, same as other people mentioned in the post I linked. I've pasted my setup below.

infinity@nuc-41-robot-35:~$ id -Gn
infinity adm cdrom sudo audio dip plugdev lpadmin pulse pulse-access lxd sambashare realtime
infinity@nuc-41-robot-35:~$ cat /etc/security/limits.conf
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#        - NOTE: group and wildcard limits are not applied to root.
#          To apply a limit to the root user, <domain> must be
#          the literal username root.
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#        - chroot - change root to directory (Debian-specific)
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4
@realtime soft rtprio 99
@realtime soft priority 99
@realtime soft memlock 102400
@realtime hard rtprio 99
@realtime hard priority 99
@realtime hard memlock 102400
# End of file
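
The file above only states what the limits should be. A minimal sketch for checking which limits the running session actually received is shown below. It does not diagnose the problem reported here; the function names are made up for the example, and it assumes the entries are applied at session start by pam_limits, in which case a session opened before the change would still report the old values.

```python
import resource


def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def effective_realtime_limits():
    """Return the (soft, hard) limits that apply to the current process."""
    report = {}
    for name in ("RLIMIT_RTPRIO", "RLIMIT_MEMLOCK"):
        limit_id = getattr(resource, name, None)
        if limit_id is None:
            continue  # limit not available on this platform
        soft, hard = resource.getrlimit(limit_id)
        report[name] = (format_limit(soft), format_limit(hard))
    return report


if __name__ == "__main__":
    # With the configuration above in effect, RLIMIT_RTPRIO should read 99.
    # RLIMIT_MEMLOCK is reported in bytes, so 102400 KB shows up as 104857600.
    for name, (soft, hard) in effective_realtime_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If the script is started from the same shell that launches the driver, the output reflects the limits the driver process will inherit.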

@fmauch
Contributor Author

fmauch commented Jul 3, 2023

@shuobh tbh I don't know what's going wrong there. However, this is taking the discussion in this PR off topic. Please open a separate issue for that. Although, as I said, I don't quite know what is going on there.

@samialperen

@shuobh @fmauch What is the current status of this on the ROS 2 side? I think the documentation is not fully up to date. What is the suggested way to get the best performance? I see that in this MR a comment by @fmauch says "Regardless of the actual kernel used, setting the producer thread to FIFO scheduling seems to improve robustness." However, in this thread @shuobh suggests that the RT kernel performs better based on his tests. Could you shed a bit more light on this? The documentation in the repos is not up to date :(

Development

Successfully merging this pull request may close these issues.

Trajectory Execution Failed Sliently due to "Sending data through socket failed."
4 participants