Adds a source container that freezes the state of source repos #283
Speedy!
The source container is only meant to be used in CI for the following purposes:
- protocol normalization: we want to support protocols like https, ssh, local file, etc., for fetching the various source repositories. Instead of letting the individual FW/model Dockerfiles handle this logic, we can do it all at once when building the source container. Once the repo is cloned into the source container, we will mount it in subsequent build processes so that from the perspective of the FW/model dockerfiles, they are always checking out a local repo.
- code archival: the source container will be preserved in our container registry for reproducible builds, e.g. when the workflow needs to be rerun.
- achieving the above two goals while ensuring the FW/model Dockerfiles are still usable out-of-the-box in standalone mode.
Given the above requirements, I feel that we can make a minimal Dockerfile.src that looks like
FROM scratch
ARG SRC_PATH=/src
# copy everything under SRC_PATH (resolved relative to the build context) to /src in the image
COPY ${SRC_PATH}/ /src/
while the logic for handling different source protocols can be done in CI via a bash function in .github/workflows/scripts/clone-repo.sh:
function clone-repo() {
# $1: SRC, e.g. https://github.com/google/jax.git#jax-v0.4.16
# $2: DST, e.g. /src/jax-source
}
and then in _build_src.yaml, we can do something like:
...
jobs:
  build:
    steps:
      - name: check out source repos locally
        run: |
          for package in jax xla pax ...; do
            src_var="SRC_${package}"  # name of the variable holding this package's source
            clone-repo "${!src_var}" "/src/${package}-source"  # bash indirect expansion
          done
      - name: build source container
        # use build-push-action with one build-arg $SRC_PATH=/src
...
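The `clone-repo` stub and the per-package lookup above could be filled in roughly as follows. This is only a sketch under stated assumptions: `split_src`, `clone_repo`, and `src_for_package` are hypothetical names, the part after `#` is assumed to be a branch, tag, or commit that `git checkout` understands, and package names are assumed to be trusted identifiers. The function is spelled `clone_repo` because a hyphenated function name is accepted by bash but not by every POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not code from this PR.

# Split "URL#REF" into SRC_URL and SRC_REF.
# SRC_REF is left empty when the input carries no "#REF" suffix.
split_src() {
  case "$1" in
    *"#"*) SRC_URL="${1%%#*}"; SRC_REF="${1#*#}" ;;
    *)     SRC_URL="$1";       SRC_REF="" ;;
  esac
}

# $1: SRC, e.g. https://github.com/google/jax.git#jax-v0.4.16
# $2: DST, e.g. /src/jax-source
clone_repo() {
  split_src "$1"
  # git accepts https://, ssh://, and local paths as the repository argument,
  # so one code path covers all of these protocols.
  git clone "$SRC_URL" "$2" || return 1
  if [ -n "$SRC_REF" ]; then
    # Assumption: REF is something `git checkout` can resolve.
    git -C "$2" checkout "$SRC_REF" || return 1
  fi
}

# Print the value of the variable named SRC_<package> (empty if unset).
# eval keeps this portable; in bash, ${!name} does the same job.
# Assumption: $1 is a trusted identifier such as "jax".
src_for_package() {
  eval "printf '%s\n' \"\${SRC_$1:-}\""
}
```

With `SRC_jax` set in the environment, the CI step would then call something like `clone_repo "$(src_for_package jax)" /src/jax-source` for each package.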
I can for sure move the main cloning logic outside of the dockerfile into a script, but I feel like by moving cloning outside of the dockerfile, we create more logic that lives outside of the dockerfile, which should be hermetic. One benefit of structuring it in this way is buildkit will parallelize the independent stages :) The way I've initially proposed the structure is that the dockerfile by default needs no other instructions to build the source container, but we can customize it via build-args or build-contexts; whereas I think you're proposing cloning outside the dockerfile and copying it all in from the single default build context. Perhaps I'm not seeing something you're seeing. Could you share more of your thought process?
Honestly, I don't have a 100% structured idea here as this is new ground for us :) Some points are (not necessarily fully connected yet):
Also, what are your thoughts on how we're going to use the source container? I can think of the following options:
That's a good point. One crummy thing is that dockerfiles aren't very flexible, so moderately complex functionality (like having variadic inputs) tends to be quite verbose. Sure, I can move it out to a script.
One thing I'm still struggling with is the level of complexity this introduces to our build process and whether we can make it simpler for those who want to manually build with our dockerfiles. I fear that we may be creating something that can only be run manually by inspecting our CI and chaining together the logic.
I’m also recalling that one property we want from this is the ability to pin all sources, but also be able to tweak something small if we find there to be an issue. So actually let me go back and make sure I start from what I think we need to pin the state of the ecosystem. I’ll assume we need something like the following file:
# repo   ref   optional[url|path]                            clone
jax      main  https://github.com/google/jax.git             True
t5x      main  https://github.com/google-research/t5x.git    True
paxml    main  ./paxml-local                                 True
seqio    main  https://github.com/google/seqio.git           False
This file will be used as the input to our entire build and should be present inside the container. Then we can have some script, say Then it’s just a matter of cloning the repos to the path specified under the I also include Also we can allow things like
---
I’m actually now thinking the source container:
What are your thoughts on this?
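A reader for the pinning file proposed above might look like the sketch below. It is not code from this PR: `list_clone_jobs` is a hypothetical name, the file is assumed to be whitespace-separated with all four columns present on every row, and a `clone` value of `True` is assumed to mean "this repo should be cloned". It only prints the work to be done rather than cloning anything.

```shell
#!/usr/bin/env bash
# Sketch only: list_clone_jobs is a hypothetical helper, not code from this PR.

# Print "repo<TAB>ref<TAB>location" for every row whose clone column is True.
# $1: path to a manifest with columns: repo ref url|path clone
list_clone_jobs() {
  # The `|| [ -n "$repo" ]` keeps a final row that lacks a trailing newline.
  while read -r repo ref location clone _ || [ -n "$repo" ]; do
    # Skip blank lines and "#" comment/header lines.
    case "$repo" in ""|"#"*) continue ;; esac
    if [ "$clone" = "True" ]; then
      printf '%s\t%s\t%s\n' "$repo" "$ref" "$location"
    fi
  done < "$1"
}
```

Each printed row could then be handed to whatever cloning step is settled on, which keeps the pinned state of every repo in one reviewable file.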
Adds initial support for a "source" container that can either build from source given --build-args, or be overridden given extra --build-contexts. Examples: