Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate symbols exported from skrouterd and find improvement opportunities #1164

Open
jiridanek opened this issue Jul 15, 2023 · 2 comments
Assignees
Labels
task Minor work item - not a bug or feature

Comments

@jiridanek
Copy link
Contributor

jiridanek commented Jul 15, 2023

tl;dr: Do we have a serious -rdynamic problem?

No, because our binaries are not built with -fPIC (which would make all calls indirect, going through PLT).
The binaries are built with -fPIE, which does not have this effect.

There is a small inefficiency in that all symbols that can be exported are currently being exported due to use of -rdynamic, which probably makes the startup and looking up symbols from Python the first time tiny bit slower.

Therefore, I propose not to do -fno-semantic-interposition, because it has no effect on the generated code.
However, I propose doing -fvisibility=hidden, because we are already marking the symbols that do need to be exported with the explicit visibility annotation.
See the QD_EXPORT macro that does just that.

The risk to the above is that we forget to mark something as visible, and we only learn about the mistake at runtime, when Python tries to lookup the symbol and it fails.
Hopefully this will be caught by CI, but in the worst case, it can result in a crash in production.

The router tries to do most of its symbol lookups during initialization, but not everything is looked up in that single place

def __init__(self) -> None:
# `dlopen(NULL, ...)` opens the current executable; the router used to dlopen libqpid-dispatch.so before
super().__init__(name=None, mode=os.RTLD_LAZY | os.RTLD_NOLOAD)
# Types
self.qd_dispatch_p = c_void_p
# No check on qd_error_* functions, it would be recursive
self._prototype(self.qd_error_code, c_long, [], check=False)
self._prototype(self.qd_error_message, c_char_p, [], check=False)
self._prototype(self.qd_log_entity, c_long, [py_object])
self._prototype(self.qd_dispatch_configure_router, None, [self.qd_dispatch_p, py_object])
self._prototype(self.qd_dispatch_prepare, None, [self.qd_dispatch_p])
self._prototype(self.qd_dispatch_configure_listener, c_void_p, [self.qd_dispatch_p, py_object])
self._prototype(self.qd_dispatch_configure_connector, c_void_p, [self.qd_dispatch_p, py_object])
self._prototype(self.qd_dispatch_configure_ssl_profile, c_void_p, [self.qd_dispatch_p, py_object])
self._prototype(self.qd_dispatch_configure_tcp_listener, c_void_p, [self.qd_dispatch_p, py_object])
self._prototype(self.qd_dispatch_configure_tcp_connector, c_void_p, [self.qd_dispatch_p, py_object])

Investigation

There are multiple aspects to this.
First, -rdynamic switch, then the -fno-semantic-interposition, and -fvisibility=hidden

-rdynamic

Router binary is built with -rdynamic which is enabled by doing

set(CMAKE_ENABLE_EXPORTS TRUE)

This is necessary so that Python part of skrouterd can lookup symbols for C functions to call.

% nm --dynamic router/skrouterd | wc -l
2340

There is about 2k symbols being exported like this.
On Linux, the executables are ELF files, same as shared libraries, so that's how an executable can work like a bit of both.

And, by the way, some shared libraries are executable, such as /usr/lib64/ld-linux-x86-64.so.2 --help.

The number of functions marked with QD_EXPORT macro is far smaller, CLion IDE says 55.
We can limit what gets exported like this using -fvisibility=hidden, as described in https://stackoverflow.com/questions/37142301/rdynamic-for-select-symbols-only/76671650

-fno-semantic-interposition

The flag and what it does is well described in blog http://maskray.me/blog/2021-05-09-fno-semantic-interposition.
Another source is this Fedora proposal (where I first learned about it) https://fedoraproject.org/wiki/Releases/32/ChangeSet#Build_Python_with_-fno-semantic-interposition_for_better_performance.

This is useful only for shared libraries, that is, when compiling with -fPIC.
Router gets built as -FPIE, which does not use the PLT (Procedure Linkage Table) to turn all calls into indirect ones.

With -fPIC and with semantic interposition (which is the default on GCC),

void f(void){}
void main(void) { f(); }

gets compiled (the call to f) as

call    f()@PLT

without semantic interpostion (with -fno-semantic-interposition), we get

call    f() [clone .localalias]

Even though this does not affect skrouterd directly, and it does not affect Proton (since that is compiled statically), it probably does affect libwebsockets and libunwind, which get compiled into a dll.

Possible further investigation

Therefore, it might make sense to use -fno-semantic-interposition for those two dependencies mentioned above.

-fvisibility=hidden

To support that, what needs to be exported needs to be annotated explicitly with __attribute__((visibility("default"))) __attribute__((used))

This is in the codebase for a long time, through the QD_EXPORT macro, added in

So far, the correctness (that I got them all) was never tested in production, only by my ad-hoc attempts to

  1. compile the codebase on Windows (where hidden visibility was the default; I guess we still had the shared dll back then?)
  2. compile the codebase as c++, because unless declared extern C, the compiler mangles the symbol name according to c++ rules

The benefit to doing this is probably tangible, but most likely very small.
While the risk is also tangible and small, but the consequences (if we forget to export something that should've been exported, and don't catch that in testing) will be bad.
If I were you, I probably wouldn't do it.

Edit: interesting blog about visibility and interposing, https://www.airs.com/blog/archives/307

Additional option, -Wl,--gc-sections

There is the idea to

  • compile libwebsockets and libunwind statically, if possible
  • and then, use -ffunction-sections, -fdata-sections, -Wl,--gc-sections to tell the linker to identify unused functions and not link them.

This might be useful to eliminate unused parts of Proton, LWS, and libunwind. Of course, there is the same risk as above, if some object is looked up dynamically, it will be eliminated and then later found missing at runtime, with bad consequences.

@jiridanek jiridanek added the task Minor work item - not a bug or feature label Jul 15, 2023
@jiridanek jiridanek self-assigned this Jul 15, 2023
@jiridanek
Copy link
Contributor Author

Here's a fairer symbol count before and after -fvisibility=hidden.

Before:

% nm --dynamic --defined-only router/skrouterd | wc -l
1741

After:

% nm --dynamic --defined-only router/skrouterd | wc -l
70

@jiridanek
Copy link
Contributor Author

jiridanek commented Jul 15, 2023

Regarding --gc-sections, using with dynamically linked Proton ended up reducing skrouterd size from 4964 kbytes in size to 4956 as reported from du ;P

With static Proton,

% du router/skrouterd
6836 router/skrouterd

% du router/skrouterd
6824 router/skrouterd

And when I put -ffunction-sections -fdata-sections to Proton compilation options

% du router/skrouterd
6836 router/skrouterd

% du router/skrouterd
6824 router/skrouterd

one more attempt

% du router/skrouterd
6404 router/skrouterd

Phew, finally some decent reduction in size. I put in -fvisibility=hidden, no mucking with --gc. Looks like hiding symbols makes the binary a bit smaller. We are already doing LTO...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Minor work item - not a bug or feature
Projects
None yet
Development

No branches or pull requests

1 participant