Always install pre-built wheels #177

Open
Tracked by #21
jmbowman opened this issue Feb 1, 2023 · 3 comments


@jmbowman

jmbowman commented Feb 1, 2023

There are a few reasons we would prefer to always install Python packages as wheels rather than source tarballs:

  • Faster, especially for packages with C or Rust extensions but even for pure Python packages
  • Enables us to eliminate hundreds of MB of compilation tools from our Docker images
  • More secure, by removing opportunities for arbitrary code execution during installation

However, not all of our dependencies have wheels on PyPI for the versions we use, and of those that do, not all provide wheels for every processor architecture we would like to support. This can be worked around, though. Tentative proposal:

  • Document the set of binary formats (OS/architecture combinations) we want to support
  • Make sure that all of the Open edX packages get all of the appropriate wheels pushed to PyPI with each release. This will generally be a single universal wheel unless the package contains native code extensions.
  • Set up at least one package server for 2U to host wheels for its private code and dependencies which don't have all the needed wheels on PyPI. Ideally there would be a second managed by tCRIL to host just the missing dependency wheels for the benefit of the entire Open edX community.
  • Create a repository and Docker image(s) with all of the dependencies needed to build the missing dependency wheels. This could also be used in development to try out new packages with missing wheel variants on PyPI, building them here and then supplying them to the dev environment's devpi instance.
  • Update pip configuration to allow installation from the custom package server(s)
  • Remove native compilation tooling and development header packages from all the other Docker images.
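As a rough illustration of the pip configuration step, the sketch below builds a pip command that installs only wheels and also searches extra package servers. `--only-binary :all:` and `--extra-index-url` are existing pip options; the helper name `wheel_only_pip_command`, the server URL, and the requirements path are assumptions made up for this example, not anything defined in this issue.

```python
import sys


def wheel_only_pip_command(requirements_file, extra_index_urls=()):
    """Build a pip command line that refuses to fall back to source builds."""
    command = [
        sys.executable, "-m", "pip", "install",
        # Fail instead of building from an sdist when no matching wheel exists.
        "--only-binary", ":all:",
    ]
    for url in extra_index_urls:
        # Extra servers are searched in addition to the default index (PyPI).
        command += ["--extra-index-url", url]
    command += ["-r", requirements_file]
    return command


if __name__ == "__main__":
    # Hypothetical server URL and requirements path, for illustration only.
    print(" ".join(wheel_only_pip_command(
        "requirements/base.txt",
        ["https://wheels.example.com/simple/"],
    )))
```

With this in place, a dependency that lacks a wheel for the current platform would make the install fail rather than trigger a compile, which is presumably the signal needed to know a wheel has to be built and uploaded to the custom server.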

Some of the benefits could be obtained by instead/also using a builder pattern for Dockerfiles such that the compilation toolchain isn't actually in the images used for development and production, but that leaves some of the security risk, a longer image build time, and occasional hassles in development when installing a new package with a missing binary wheel.

Some package server options:

Some tooling that helps build and upload wheels:

@kdmccormick

> Make sure that all of the Open edX packages get all of the appropriate wheels pushed to PyPI with each release. This will generally be a single universal wheel unless the package contains native code extensions.

For ~95% of Open edX repos, would this be as simple as setting {'bdist_wheel':{'universal':'1'}}?

If so, would that on its own be worth doing, whether or not we go for the rest of the issue?

> Create a repository and Docker image(s) with all of the dependencies needed to build the missing dependency wheels. This could also be used in development to try out new packages with missing wheel variants on PyPI, building them here and then supplying them to the dev environment's devpi instance.

I think I would want to look at a list of the packages that are missing wheels so we can get an idea of how much build time and image size we'd save. If the savings would be significant, then I'm in favor of this.

> Some of the benefits could be obtained by instead/also using a builder pattern for Dockerfiles

FWIW, Tutor and most of its plugins use the builder pattern, yet I still want to bring the build time and image size down.

@jmbowman
Author

jmbowman commented Feb 3, 2023

I think we'd want to start with a repo health check that checks PyPI for wheel availability; that would probably go a long way towards answering your questions. I suspect only around 2% of our package dependencies have any native code extensions, which is why it feels silly to me that we're bloating our build process because of those.
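A minimal sketch of what such a check might look like, assuming it reads the per-release file list from PyPI's JSON API (`https://pypi.org/pypi/<name>/<version>/json`, whose `urls` entries carry `filename` and `packagetype`). The function names, the status labels, and the `REQUIRED_PLATFORMS` values are placeholders invented for this example; the real set of supported platforms would come from the documentation step in the proposal. The platform match is deliberately coarse and ignores CPU architecture.

```python
import json
from urllib.request import urlopen

# Assumed example targets; the real list is still to be documented.
REQUIRED_PLATFORMS = ("manylinux", "macosx")


def fetch_release_files(name, version):
    """Return the file list PyPI publishes for one release (network call)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urlopen(url) as response:
        return json.load(response)["urls"]


def platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    return wheel_filename[: -len(".whl")].split("-")[-1]


def wheel_status(files, required=REQUIRED_PLATFORMS):
    """Classify a release as 'sdist-only', 'pure', 'complete', or 'partial'."""
    tags = [
        platform_tag(entry["filename"])
        for entry in files
        if entry["packagetype"] == "bdist_wheel"
    ]
    if not tags:
        return "sdist-only"
    if "any" in tags:
        # A platform-independent wheel installs everywhere.
        return "pure"
    # Coarse substring match: each required platform family needs one wheel.
    covered = all(any(prefix in tag for tag in tags) for prefix in required)
    return "complete" if covered else "partial"
```

Running `wheel_status(fetch_release_files(name, version))` over each pinned requirement would give a per-package list of `sdist-only` and `partial` entries, which is roughly the list of missing wheels asked about earlier in the thread.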

@kdmccormick

Copied to openedx/edx-platform#34444
