Add the extra statistics of relative relocations in large binaries #1375
I see the need to do something like this, but I'd like to learn a little bit more about the background.
Also, please share some performance numbers for the linker if you have any. I'm just curious how useful mold is compared to other linkers for users like you.
@rui314 Sure,
I use several techniques. For example, based on
I know the structure of my binary. It requires some tricks and non-standard solutions, as I mentioned earlier, including reordering.
This depends on various factors. Typically, the text section size ranges from 1.9 to 2.6 GB. Unfortunately, I cannot disclose the size of the data segment or the size of the binary after stripping debug info. Of course, debug info makes up the major part.
I try to avoid dynamically loaded dependencies. Implementations of standard libraries for C++ and C, runtime libraries, and compiler-dependent routines can be linked statically. This is not a regular practice and can be error-prone in general cases.
Here is my understanding: You have some control over how object files are compiled, but you don't want to compile everything with Is my understanding correct? I was contacted by another big-tech company regarding resolving the same issue. I want to create a feature that works for everybody, so please hold off on approving this PR. For now, please maintain this as your local private patch.
@rui314 Yes, your understanding is absolutely correct. Thanks!
Motivation

I am using the `mold` linker to quickly link a large monolithic application (over 20 GB with debug information). The primary challenge with my binary is the constantly growing (and sometimes uncontrolled) portion of the business logic. Furthermore, the structure of the application's dependencies is highly heterogeneous, and I cannot control how they were compiled: with or without `PIE`, and with or without `-mcmodel=large`. This leads to unpredictable issues during linking; for example, certain relocations (e.g., PC-relative ones) cannot be resolved because sections containing business logic have become too large (e.g., `R_X86_64_32S` requires the value to fit in a signed 32-bit field, i.e., within ±2 GiB).

Using `-mcmodel=large` and producing only absolute relocations for all components of the binary is not feasible in my case. Therefore, I need a method to detect the relocations nearest to overflowing. Based on your design principles of determinism and build reproducibility, I can rely on the fact that the resulting binary structure will not change significantly from one build to another.

Solution
For each architecture, there are `apply_reloc_alloc` and `apply_reloc_nonalloc` methods in `InputSection` where the `check` routine verifies the relocation range depending on the relocation type. We can update the `check` routine to record the minimum distance to the upper and lower bounds of the range for the current section. After processing all relocation entries of a section, we can update the global minimums in the context.

As a result, we will obtain two new metrics: `relative_relocations_offset_infimum` and `relative_relocations_offset_supremum`, which can be interpreted as indicators of "how much space is still available" in the large binary. Although these aren't universal indicators, they may be extremely helpful for managing large monolithic applications.

Impact
`--stats` option is enabled.
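
The bookkeeping described under Solution could look roughly like the sketch below. It is a standalone illustration, not the code from this PR and not mold's actual `check` routine: `RangeSlack`, `check_and_record`, and `merge` are hypothetical names, and the half-open range `lo <= val < hi` is an assumption about how the bounds are expressed.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical tracker (name is illustrative): the smallest remaining
// headroom seen so far towards each end of a relocation's valid range.
struct RangeSlack {
  int64_t to_lower = std::numeric_limits<int64_t>::max();
  int64_t to_upper = std::numeric_limits<int64_t>::max();

  // Fold another tracker (e.g. a per-section one) into this one
  // (e.g. a global one) by keeping the smaller headroom on each side.
  void merge(const RangeSlack &other) {
    to_lower = std::min(to_lower, other.to_lower);
    to_upper = std::min(to_upper, other.to_upper);
  }
};

// Assumed contract: a value is valid when lo <= val < hi.
// Returns false on overflow; otherwise records how far the value
// is from each bound and returns true.
inline bool check_and_record(int64_t val, int64_t lo, int64_t hi,
                             RangeSlack &slack) {
  if (val < lo || hi <= val)
    return false;
  slack.to_lower = std::min(slack.to_lower, val - lo);
  slack.to_upper = std::min(slack.to_upper, hi - 1 - val);
  return true;
}
```

In this sketch, each section would own one `RangeSlack` while its relocations are applied and then `merge` it into a shared instance. A real multi-threaded linker would need to do that merge under synchronization or with atomic minimum updates, which the sketch leaves out.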