Use statically linked loader #2500
Comments
I also noticed that Windows already uses a launcher executable: rules_python/python/private/py_executable_bazel.bzl Lines 62 to 68 in e823657
I wrote such a launcher executable (template) for Linux to replace the stage1 bootloader script:
#include <errno.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include "tools/cpp/runfiles/runfiles.h"
using bazel::tools::cpp::runfiles::Runfiles;
std::string find_python_interpreter(Runfiles& runfiles,
const std::string& interpreter_path) {
if (interpreter_path.length() > 0 && interpreter_path[0] == '/') {
// An absolute path, i.e. platform runtime
return interpreter_path;
} else if (interpreter_path.find('/') != std::string::npos) {
// A runfiles-relative path
return runfiles.Rlocation(interpreter_path);
} else {
// A plain word, e.g. "python3". Rely on searching PATH
return interpreter_path;
}
}
int main(int argc, char** argv) {
std::string STAGE1_BOOTSTRAP = argv[0];
std::string STAGE2_BOOTSTRAP = "%stage2_bootstrap%";
std::string PYTHON_BINARY = "%python_binary%";
std::string error;
std::unique_ptr<Runfiles> runfiles(
Runfiles::Create(STAGE1_BOOTSTRAP, BAZEL_CURRENT_REPOSITORY, &error));
if (runfiles == nullptr) {
std::cerr << "ERROR: Could not resolve runfiles root: " << error << std::endl;
return 1;
}
std::string python_exe = find_python_interpreter(*runfiles, PYTHON_BINARY);
if (!std::filesystem::is_regular_file(python_exe)) {
std::cerr << "ERROR: Python interpreter not found: " << python_exe
<< std::endl;
return 1;
}
// TODO check if executable to provide better error
std::string stage2_bootstrap = runfiles->Rlocation(STAGE2_BOOTSTRAP);
// Don't prepend a potentially unsafe path to sys.path
// See: https://docs.python.org/3.11/using/cmdline.html#envvar-PYTHONSAFEPATH
// NOTE: Only works for 3.11+
// We inherit the value from the outer environment in case the user wants to
// opt-out of using PYTHONSAFEPATH. To opt-out, they have to set
// `PYTHONSAFEPATH=` (empty string). This is because Python treats the empty
// value as false, and any non-empty value as true.
int result = setenv("PYTHONSAFEPATH", "1", false);
if (result != 0) {
std::cerr << "ERROR: Failed to set PYTHONSAFEPATH: " << strerror(errno)
<< std::endl;
}
// TODO set RUNFILES_DIR env var to runfiles root
// Why does runfiles->Rlocation(".") not work?
// We use `exec` instead of a child process so that signals sent directly
// (e.g. using `kill`) to this process (the PID seen by the calling process)
// are received by the Python process. Otherwise, this process receives the
// signal and would have to manually propagate it. See
// https://github.com/bazelbuild/rules_python/issues/2043#issuecomment-2215469971
// for more information.
std::vector<const char*> args(argv + 1, argv + argc);
args.insert(args.begin(), {python_exe.c_str(), stage2_bootstrap.c_str()});
// execvp requires the argument array to end with a null pointer.
args.push_back(nullptr);
// const_cast is safe: https://stackoverflow.com/a/19505361
execvp(args[0], const_cast<char**>(args.data()));
// If execvp returns, there was an error.
const int exec_errno = errno;
std::cerr << "ERROR: Failed to execute " << python_exe << ": "
<< strerror(exec_errno) << std::endl;
return 1;
}
This template needs to be evaluated to resolve the following variables: `%stage2_bootstrap%` and `%python_binary%`.
I tested this with the following:
load("@rules_python//python:defs.bzl", "py_binary")
load("@rules_oci//oci:defs.bzl", "oci_image", "oci_load")
load("@rules_pkg//:pkg.bzl", "pkg_tar")
load("@rules_cc//cc:defs.bzl", "cc_binary")
genrule(
name = "loader_src",
srcs = ["loader.cc.tmpl"],
outs = ["loader.cc"],
# requires `--@rules_python//python/config_settings:bootstrap_impl=script` to create the stage2 bootstrap
cmd = 'sed -e "s:%stage2_bootstrap%:_main/zz/_zz_stage2_bootstrap.py:" -e "s:%python_binary%:rules_python~~python~python_3_11_x86_64-unknown-linux-gnu/bin/python3:" "$<" > "$@"',
local = 1,
)
cc_binary(
name = "loader",
srcs = ["loader.cc"],
deps = [
"@bazel_tools//tools/cpp/runfiles",
],
)
py_binary(
name = "zz",
srcs = ["zz.py"],
)
pkg_tar(
name = "zz_layer",
srcs = [
"loader",
":zz",
],
include_runfiles = True,
strip_prefix = "/",
)
oci_image(
name = "zz_image",
base = "@distroless_base",
entrypoint = ["/zz/loader"],
tars = [":zz_layer"],
workdir = "/",
)
oci_load(
name = "zz_image.tar",
image = ":zz_image",
repo_tags = ["zz/zz_image:latest"],
)
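The `TODO check if executable` note in the launcher above could be handled with POSIX `access(2)`. The following is only a minimal sketch: the helper name `describe_interpreter_problem` and its messages are hypothetical and not part of rules_python.

```cpp
#include <unistd.h>

#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not part of rules_python): returns an empty string if
// `path` looks like a runnable interpreter, otherwise a human-readable reason.
std::string describe_interpreter_problem(const std::string& path) {
  std::error_code ec;  // Use the non-throwing filesystem overloads.
  if (!std::filesystem::exists(path, ec)) {
    return "Python interpreter not found: " + path;
  }
  if (!std::filesystem::is_regular_file(path, ec)) {
    return "Python interpreter is not a regular file: " + path;
  }
  // access(2) with X_OK checks execute permission for the real user ID.
  if (access(path.c_str(), X_OK) != 0) {
    return "Python interpreter is not executable: " + path;
  }
  return "";
}
```

In the launcher, a non-empty result would be printed to `std::cerr` before returning 1, in place of the single `is_regular_file` check.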
cc @groodt who I think also liked the idea of a native binary to launch things
I made a proposal a while ago, but nothing has really progressed: bazelbuild/proposals#275
I'm supportive of the idea, I'm just concerned about teams having to bring additional toolchains for compiling native launchers. Ideally it would be out of the box with Bazel, since I think there are many interpreted languages that could benefit from a native launcher, but that arguably is more challenging to solve than solving it in rules_python. Now that the rules are fully extracted out of bazelbuild/bazel, I imagine this could be something that is tackled eventually.
But it's probably simpler to have a small Docker image with Python in it, or an approach like the one posted above, which is a neat solution to the problem.
As of #1929 we have the flag
The problem is that it requires Python twice, once in the image to bootstrap and then additionally packaged as part of the runfiles. A full Python runtime is not "small".
re: code: That looks pretty promising! The main case I think is missing is the zip case. I guess we could statically link zlib into it (and we don't necessarily have to use zip; we could use another format)?
For prototyping this, having the py_executable macro create a cc_binary is probably the easiest thing to do. Wiring it in is probably going to be hacky, but such is a prototype.
For the final code, though, I see two options:
(2) Use cc_binary as-is, but modify it after the fact, similar to the Windows launcher. If we had a way to modify the contents of a binary (to perform the necessary string replacements), then this seems preferable to (1). The Windows launcher does some sort of trick to append a couple of extra lines onto the binary, which works, but also seems a bit hacky. Being able to, e.g., stick the paths in a special ELF section or something seems much more appealing.
Also, this doesn't have to use C++. Anything that produces a native, standalone executable would suffice (e.g. Rust is all the rage now).
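To illustrate the "modify the binary after the fact" idea without any ELF tooling: the launcher could reserve a fixed-size, NUL-padded slot that begins with a recognizable marker, and a post-build step could overwrite that slot in place so the file size and all offsets stay unchanged. This is only a sketch under those assumptions. The function name, marker, and slot size are hypothetical, and it is neither how the Windows launcher works nor an ELF-section implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical post-build patcher: overwrites a fixed-size slot, located by
// its leading `marker`, inside the bytes of an already-linked binary.
// Returns false if the marker is missing or `value` (plus a terminating NUL)
// does not fit into `slot_size` bytes. The buffer length never changes.
bool patch_placeholder(std::vector<char>& binary, const std::string& marker,
                       std::size_t slot_size, const std::string& value) {
  if (marker.empty() || value.size() + 1 > slot_size) return false;
  auto it = std::search(binary.begin(), binary.end(),
                        marker.begin(), marker.end());
  if (it == binary.end()) return false;
  if (static_cast<std::size_t>(binary.end() - it) < slot_size) return false;
  const auto slot_end = it + static_cast<std::ptrdiff_t>(slot_size);
  std::fill(it, slot_end, '\0');              // Clear the whole slot.
  std::copy(value.begin(), value.end(), it);  // Write the real path.
  return true;
}
```

At runtime the launcher would then read the slot as an ordinary C string. The trade-off is a hard upper bound on path length, fixed when the launcher is compiled.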
Yeah, it does, except it's primitive and out of our control, so it's more of a headache than a help for us. I wanted to replace it with a PowerShell-based thing after bootstrap=script is made the default, to reduce the number of ways we bootstrap programs.
Yep! That's one of the reasons I made that flag a string instead of boolean :)
There are sneaky things that can be done with the shebang so that it uses the hermetic interpreter, but I'll need to dig it out. Overall, I agree though. Just noting that there are mechanisms to work around this at the moment if desperate.
With the script-based bootstrap, you can probably more easily just use a custom stage1 bootstrap. This avoids any issues of trying to fit a program into the one line that shebang processing accepts. Also, in Marcel's case, that may not work anyway: he says his image doesn't have any shells available at all.
Hm, actually, I wonder if you could stick a prebuilt binary in as the stage1 bootstrap file. This might make it easier to prototype a native launcher, at the least. It'll get passed through
An alternative is to use something like the runtime env toolchain or a platform runtime. The runtime env toolchain's "interpreter" is simply a shell script that basically does
A platform runtime is when a fixed path is used, i.e. setting `py_runtime.interpreter_path = "/usr/bin/python"`.
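The three ways of naming an interpreter discussed in this thread (a platform runtime with a fixed absolute path, an in-runfiles runtime, and a bare command name resolved through PATH) correspond to the three branches of `find_python_interpreter` in the launcher above. A small sketch that makes the mapping explicit; the enum and function names are hypothetical and chosen only for illustration:

```cpp
#include <string>

// Hypothetical names, for illustration only.
enum class InterpreterKind {
  kPlatformRuntime,  // Absolute path such as "/usr/bin/python".
  kRunfiles,         // Runfiles-relative path, to be resolved via Rlocation.
  kPathLookup,       // Bare word such as "python3", left to a PATH search.
};

// Mirrors the branch order of find_python_interpreter: a leading '/' wins,
// then any other '/', and everything else is treated as a command name.
InterpreterKind classify_interpreter(const std::string& path) {
  if (!path.empty() && path[0] == '/') {
    return InterpreterKind::kPlatformRuntime;
  }
  if (path.find('/') != std::string::npos) {
    return InterpreterKind::kRunfiles;
  }
  return InterpreterKind::kPathLookup;
}
```

Only the `kRunfiles` case needs the runfiles library; the other two can be handed to `execvp` unchanged, since `execvp` searches PATH only when the name contains no slash.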
For completeness, if we want to distribute binaries together with rules_python releases, I think Aspect has a blog post on something related: https://blog.aspect.build/releasing-bazel-rulesets-rust
They are using Rust to create their launcher that builds a
Since we are already depending on
While I would have liked to pick another language, I decided on C++ for the following reasons:
We would prefer to just use the runtime as part of the runfiles in order to avoid knowing or figuring out where in the image the runtime is. Also we would like to be independent of the base image and not require others to add Python and configure the paths correctly.
I only briefly checked the rules and it looks like there is no real macro as part of
Ah, right, there isn't a common macro for both binaries and tests. Each has its own macro that calls its own rule (this isn't for a particular reason, just something that organically happened).
This is what I tried but getting the values for
🚀 feature request
Relevant Rules
py_binary
Description
With a similar motivation as #691, we would like to package a py_binary (including runfiles) into an oci_image and run it within a minimal base image like distroless_base in order to minimize the attack surface. Such an image does not come with a shell or the other tools required by #1929, so that change unfortunately doesn't help us.
Describe the solution you'd like
Use a statically linked executable as loader.
Describe alternatives you've considered
Add more stuff to the base image. This is suboptimal, as it increases not only the size but also the attack surface.