Overriding libc FS-related calls in Emscripten (for custom user's FS goodness) #23302

Open
vadimkantorov opened this issue Jan 5, 2025 · 10 comments

Comments

@vadimkantorov

vadimkantorov commented Jan 5, 2025

I've implemented a read-only virtual FS for supporting "data package-like" ZIP archives with musl libc by copying libc.a (for further linkage), prefixing FS-related symbols with orig_ using objcopy (e.g. open -> orig_open), and then defining open(...) (and other related FS functions) in my own code (which calls orig_open under the hood when the accessed path does not match the virtual FS one).
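To make the dispatch concrete, here is a minimal sketch of the idea, not the actual implementation. The names VFS_PREFIX, vfs_owns_path, vfs_open_entry and my_open are hypothetical. In the real setup my_open would be called open, and orig_open would be the objcopy-renamed libc symbol rather than the stand-in defined here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical mount point of the virtual FS (an assumption for this sketch). */
#define VFS_PREFIX "/zipfs/"

/* Stand-in for the renamed libc symbol (open -> orig_open via objcopy).
   It simply forwards to the real open so the sketch links without a patched libc.a. */
static int orig_open(const char *path, int flags, mode_t mode)
{
    return open(path, flags, mode);
}

/* Returns 1 when the path belongs to the virtual FS, 0 otherwise. */
int vfs_owns_path(const char *path)
{
    return strncmp(path, VFS_PREFIX, strlen(VFS_PREFIX)) == 0;
}

/* Hypothetical archive lookup. The FS is read-only, so write access is rejected;
   every lookup fails with ENOENT because no archive is attached in this sketch. */
static int vfs_open_entry(const char *path, int flags)
{
    (void)path;
    if ((flags & O_ACCMODE) != O_RDONLY) {
        errno = EROFS;
        return -1;
    }
    errno = ENOENT;
    return -1;
}

/* Would be named open() in the real setup. */
int my_open(const char *path, int flags, mode_t mode)
{
    if (vfs_owns_path(path))
        return vfs_open_entry(path, flags);
    return orig_open(path, flags, mode);  /* everything else goes to libc */
}
```

Only the way orig_open is obtained differs between platforms, so the dispatch logic itself can stay shared.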

Is it feasible to reuse my custom open(...) code with Emscripten/libc somehow?

For now, LLVM does not have support for renaming WASM symbols:

Does Emscripten support using dlsym(RTLD_NEXT, "open"); specifically for libc?

Or maybe is there another way to share my implementation code between regular musl libc and Emscripten/libc (and hook/override the Emscripten/libc default FS-related functions)?

(in this way I can come up with my own portable impl of "data packages", so that the data packages can be used with both Emscripten and not Emscripten, and so that multiple compression formats of data packages can be supported and experimenting with no ahead-of-time decompression)

Thanks!


Basically looking for a portable (so should work outside Emscripten context as well), minimalistic (hopefully header-only), non-intrusive libc-level virtual FS support - my take was overriding some FS posix/libc/c functions sufficient for my application, a more complete virtual FS support is a more complex (and still leaky) feat... Some related attempts:

@sbc100
Collaborator

sbc100 commented Jan 5, 2025

How does this work on non-emscripten platforms? Do they all currently use RTLD_NEXT?

Emscripten does not currently support RTLD_NEXT, and furthermore we strongly prefer static linking, where RTLD_NEXT simply cannot work. Do you have any other existing methods for doing this other than RTLD_NEXT?

wasm-ld does support the --wrap command line flag which is designed for exactly this kind of thing: https://ftp.gnu.org/pub/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html

--wrap symbol
Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to __wrap_symbol. Any undefined reference to __real_symbol will be resolved to symbol. This can be used to provide a wrapper for a system function. The wrapper function should be called __wrap_symbol. If it wishes to call the system function, it should call __real_symbol. Here is a trivial example:
void *
__wrap_malloc (size_t c)
{
  printf ("malloc called with %zu\n", c);
  return __real_malloc (c);
}
If you link other code with this file using --wrap malloc, then all calls to malloc will call the function __wrap_malloc instead. The call to __real_malloc in __wrap_malloc will call the real malloc function. You may wish to provide a __real_malloc function as well, so that links without the --wrap option will succeed. If you do this, you should not put the definition of __real_malloc in the same file as __wrap_malloc; if you do, the assembler may resolve the call before the linker has a chance to wrap it to malloc.

However, that would require adding a whole bunch of linker flags, which might not be desirable.
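Applied to open, a wrapper could look roughly like the sketch below. VFS_PREFIX and the USE_LD_WRAP macro are hypothetical names introduced for illustration. The link line in the comment is an assumption about how the flag would be passed through emcc to wasm-ld.

```c
/* Assumed build line (one --wrap flag per wrapped symbol):
     emcc app.c wrap_open.c -DUSE_LD_WRAP -Wl,--wrap=open            */
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

#define VFS_PREFIX "/zipfs/"   /* hypothetical virtual FS mount point */

#ifdef USE_LD_WRAP
/* With --wrap=open the linker resolves __real_open to the original open. */
int __real_open(const char *path, int flags, ...);
#else
/* Fallback so this file also builds when --wrap is not used. */
#define __real_open open
#endif

int __wrap_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* open() is variadic: the mode argument is only present with O_CREAT. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    if (strncmp(path, VFS_PREFIX, strlen(VFS_PREFIX)) == 0) {
        errno = ENOENT;   /* placeholder for a real archive lookup */
        return -1;
    }
    return __real_open(path, flags, mode);
}
```

Note that --wrap only redirects undefined references to the symbol, so calls made from inside the object file that defines open are not affected.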

If the objcopy solution is something you would consider, I think there has been some work on llvm/llvm-project#50623, so it's possible that could be an option in the future.

Another option is to add internal aliases for the "real" versions of these symbols. Musl already does this itself for a lot of symbols, e.g.:

weak_alias(__isprint_l, isprint_l);
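For illustration, here is a rough sketch of how such an alias pair behaves. The macro is modeled on musl's weak_alias, and the names demo_libc_open and demo_open are hypothetical. It assumes GCC or Clang on a target that supports the alias attribute (ELF or wasm).

```c
#include <stddef.h>

/* Modeled on musl's macro: declare `new` as a weak alias of `old`. */
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* "libc side": the strong internal name holds the implementation. */
int demo_libc_open(const char *path)
{
    /* Toy behavior: pretend absolute paths open as fd 3. */
    return (path != NULL && path[0] == '/') ? 3 : -1;
}

/* The public name is only a weak alias, so a strong definition of demo_open
   in another object file would take precedence at link time, while
   demo_libc_open stays reachable for the override to call. */
weak_alias(demo_libc_open, demo_open);
```

With this layout no symbol renaming is needed: the override calls the internal name directly.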

@vadimkantorov
Author

vadimkantorov commented Jan 5, 2025

How does this work on non-emscripten platforms? Do they all currently use RTLD_NEXT?

I have two working variants:

  1. using dlsym(RTLD_NEXT, "open") for when libc is linked dynamically and libdl is available
  2. for static linkage with musl libc, I prepared my custom copy of libc.a where I orig_-prefixed the file-related functions

So I wonder whether either of these two is also applicable to Emscripten/libc. (For (2), llvm/llvm-project#50623 should probably be implemented first. I also had problems where some object files somehow re-export optind/optarg etc., which causes multiple-definition conflicts with libc, so I had to prefix these in libc too.)

wasm-ld does support the --wrap command line flag which is designed for exactly this kind of thing

Wow, this is very interesting! Should it also work for static linkage with regular musl libc?

I also wonder is it possible to "hide" certain symbols, especially data-related symbols like optind, from libc during static linkage (without having to make a copy with renamed symbols)?

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

2. for static linkage with musl libc, I prepared my custom copy of libc.a where I orig_-prefixed the file-related functions

How about adding aliases for all of the original libc symbols you want to wrap? Adding those aliases to the emscripten version of musl seems like a reasonable request. Then method (2) would work fine with emscripten and without needing two different builds of libc.

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

I also wonder is it possible to "hide" certain symbols, especially data-related symbols like optind, from libc during static linkage (without having to make a copy with renamed symbols)?

Not that I know of.

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

  2. for static linkage with musl libc, I prepared my custom copy of libc.a where I orig_-prefixed the file-related functions

How about adding aliases for all of the original libc symbols you want to wrap? Adding those aliases to the emscripten version of musl seems like a reasonable request. Then method (2) would work fine with emscripten and without needing two different builds of libc.

This method seems less invasive and also avoids needing two different builds of libc.

@vadimkantorov
Author

How about adding aliases for all of the original libc symbols you want to wrap? Adding those aliases to the emscripten version of musl seems like a reasonable request.

Do you mean adding the orig_ symbols to the Emscripten libc? I think it may be more sustainable to just improve llvm-objcopy for wasm: llvm/llvm-project#50623 (and figure out whether Emscripten/libc could be manipulated similarly to regular musl libc.a), as the exact list of file-related functions is still somewhat application-dependent, and other users may want to override other groups of symbols.

I'll also try --wrap - if it works, it's perfect for my narrow usecase (if I manage to bypass the problem with duplicated optind/optarg/optopt symbols)

And I still wonder whether something like this would work with Emscripten/libc (variant (1)), as I don't fully understand the state of dynamic linking / dlsym support, and especially of linkage to libc:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <dlfcn.h>
#include <sys/stat.h>

/* Interposed open(): logs the access, then forwards to the next
   definition of open in symbol lookup order (normally libc's). */
int open(const char *path, int flags, mode_t mode)
{
    typedef int (*orig_func_type)(const char *pathname, int flags, mode_t mode);
    fprintf(stderr, "log_file_access_preload: open(\"%s\", %d)\n", path, flags);
    orig_func_type orig_func = (orig_func_type)dlsym(RTLD_NEXT, "open");
    if (!orig_func) {
        /* No further definition of open was found. */
        errno = ENOSYS;
        return -1;
    }
    return orig_func(path, flags, mode);
}

@vadimkantorov
Author

Not that I know of.

I wonder what would happen if I use --wrap to hide symbols like optind from libc, but without having the __wrap_ version defined... (and instead having optind in another object file)

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

RTLD_NEXT does not work today in emscripten, so I think your options are:

  1. Modify libc sources to add aliases explicitly
  2. Use --wrap to have the linker add aliases
  3. Use objcopy to rename symbols.

(1) and (2) should work today. (3) requires some work on the llvm side, which I believe has started, but who knows when it will be ready.

@sbc100
Collaborator

sbc100 commented Jan 6, 2025

(1) seems fairly easy since you can just apply it to whatever symbols you need to wrap, and you can easily add more symbols if you discover you need more.

I don't believe any of these methods work for non-function symbols like optind (RTLD_NEXT also doesn't work for non-function symbols).

@curiousdannii
Contributor

curiousdannii commented Jan 6, 2025

I previously used --wrap to intercept getc(stdin) and use my own async input method. Worked great.

See here: https://github.com/curiousdannii/emglken/blob/v0.6/src/getc.c

(I no longer use it because I'm now completely avoiding the libc stdio API and effectively added my own syscalls instead.)
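As a rough illustration of that kind of wrapper (this is not the code from the linked file), a synchronous sketch might look like the following. The names pending_input, set_pending_input and USE_LD_WRAP are hypothetical, and the real async input source is replaced by a plain string buffer.

```c
/* Assumed link flag when building for real: -Wl,--wrap=getc */
#include <stdio.h>

#ifdef USE_LD_WRAP
/* With --wrap=getc the linker resolves __real_getc to libc's getc. */
int __real_getc(FILE *f);
#else
/* Fallback so the file also builds without --wrap. */
#define __real_getc fgetc
#endif

/* Hypothetical input queue standing in for a custom input method. */
static const char *pending_input = "";

void set_pending_input(const char *s)
{
    pending_input = s;
}

int __wrap_getc(FILE *f)
{
    if (f == stdin) {
        /* Serve stdin from the custom queue instead of libc. */
        if (*pending_input == '\0')
            return EOF;
        return (unsigned char)*pending_input++;
    }
    return __real_getc(f);  /* all other streams use the original getc */
}
```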


3 participants