RFC: Remove Stacks #167

Closed
wants to merge 5 commits into from

sclevine
Member

Readable

This is rough and needs more detail, but I'm opening this as ready-for-review given the sweeping nature of the changes proposed.

CC: @ekcasey @jkutner @samj1912 @BarDweller

Signed-off-by: Stephen Levine <stephen.levine@gmail.com>
@sclevine sclevine requested a review from a team as a code owner June 17, 2021 01:10
@sclevine sclevine requested review from ekcasey, hone, jkutner and nebhale June 17, 2021 01:11
Signed-off-by: Stephen Levine <stephen.levine@gmail.com>

A buildpack app may have a build.Dockerfile and/or run.Dockerfile in its app directory. A run.Dockerfile is applied to the selected runtime base image after the detection phase. A build.Dockerfile is applied to the build-time base image before the detection phase.

Both Dockerfiles must accept `base_image` and `build_id` args. The `base_image` arg allows the lifecycle to specify the original base image. The `build_id` arg allows the app developer to bust the cache after a certain layer and must be defaulted to `0`.
Contributor

@jabrown85 Jun 17, 2021

What exactly does lifecycle pass into base_image? The examples below specify LABEL io.buildpacks.image.distro=ubuntu in a run.Dockerfile but I would have thought those would exist on the base_image.

Member Author

The value of base_image is always the original image that needs to be extended. The run.Dockerfile you're referencing below would be used to create a stack from something like ubuntu:bionic. A command like pack create-stack could take run.Dockerfile and build.Dockerfile, perform a normal docker build, and then validate that all required fields/files are present.
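
As a rough illustration of how a platform might pass these args, the sketch below assembles a `docker build` invocation. The helper name `dockerfile_build_argv` and the image/tag values are hypothetical; `-f`, `-t`, and `--build-arg` are standard `docker build` flags. The RFC does not prescribe this command, and kaniko or BuildKit could be used instead.

```python
from typing import List


def dockerfile_build_argv(dockerfile: str, base_image: str, tag: str,
                          build_id: str = "0", context: str = ".") -> List[str]:
    """Assemble a `docker build` command that extends `base_image`.

    Hypothetical helper: the RFC only says both Dockerfiles accept
    `base_image` and `build_id` args, not how a platform passes them.
    """
    return [
        "docker", "build",
        "-f", dockerfile,
        "--build-arg", f"base_image={base_image}",
        "--build-arg", f"build_id={build_id}",
        "-t", tag,
        context,
    ]


# Example values only; "example/run:extended" is a made-up tag.
argv = dockerfile_build_argv("run.Dockerfile", "ubuntu:bionic", "example/run:extended")
print(" ".join(argv))
```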

Member

Looking at

Note: kaniko, BuildKit, and/or the original Docker daemon may be used to apply Dockerfiles at the platform's discretion.

and also

allows the lifecycle to specify the original base image

I wonder if maybe it's worth clarifying further how this would work. I'm assuming for the build image, the lifecycle could use kaniko during the existing build phase. But extending the run image would imply a new phase...

Member

The build_id arg allows the app developer to bust the cache after a certain layer and must be defaulted to 0.

Could you describe a bit further how this would work?

Member Author

See the examples below -- if you use build_id in a RUN instruction, that layer and all layers under it will never be cached due to the value changing.
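
The cache-busting effect can be pictured with a toy model of layer caching. This is a simplified sketch, not the actual algorithm used by Docker, BuildKit, or kaniko: it assumes each layer's cache key is a hash of its parent's key plus the instruction text after `${build_id}` substitution. The function name `layer_keys` is hypothetical.

```python
import hashlib
from typing import Dict, List


def layer_keys(instructions: List[str], args: Dict[str, str]) -> List[str]:
    """Return one cache key per instruction.

    Each key hashes the parent key plus the instruction text after
    `${name}` substitution, so a changed arg invalidates that layer
    and every layer that follows it.
    """
    keys, parent = [], ""
    for instruction in instructions:
        for name, value in args.items():
            instruction = instruction.replace("${" + name + "}", value)
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


steps = [
    "RUN apt-get update",                               # no build_id: stays cached
    "RUN echo ${build_id} && apt-get install -y curl",  # cache is busted here
    "RUN echo done",                                    # rebuilt because its parent changed
]
first = layer_keys(steps, {"build_id": "0"})
second = layer_keys(steps, {"build_id": "1"})
reused = [a == b for a, b in zip(first, second)]
print(reused)  # [True, False, False]
```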

Member Author

I'm assuming for the build image, the lifecycle could use kaniko during the existing build phase. But extending the run image would imply a new phase...

This is what I'm thinking as well. For pack, this phase could happen in parallel with the builder phase. Happy to add more detail.

@cmoulliard Jun 30, 2021

kaniko

Kaniko implies running the build process in a Docker container or Kubernetes pod. This is not needed with Google JIB (https://github.com/GoogleContainerTools/jib) or buildah (https://github.com/containers/buildah).

Member Author

These tools all use different approaches:

  • JIB doesn't require containers at all, but it's specific to JVM-based apps.
  • Kaniko can be used as a library and invoked within an existing container (entirely in userspace, like JIB).
  • Buildah requires either containers or fuse/chroot.

For in-cluster builds, kaniko's approach is least-privileged. For local builds, Docker/BuildKit (or buildah on Linux) all seem like good options.

Happy to remove or extend the list of suggested technologies.

run.Dockerfile used to create a runtime base image:

```
ARG base_image
```
Contributor

Is this supposed to be a plain Dockerfile with FROM ubuntu:18.04?

Member Author

No, but that would work also. The idea is to use the same format for all stack-related Dockerfiles (creating, pre-build extension, at-build extension).

Contributor

aemengo commented Jun 17, 2021

On App-specified Dockerfiles:
I like the idea of an artifact in your app directory that extends the build process. I have mixed feelings about it being a Dockerfile.

My worry is that the application developer will see something they're used to and abuse it. For instance, putting other commands that a buildpack should be used for (like installing golang). Furthermore, it blunts our value proposition to see this other method of container building. Having this artifact instead be an "inline-buildpack" or even just some buildpack.sh would ease my worry.

Also I'd like to think that app-specified Dockerfiles, though made possible, should be frowned upon. Perhaps platforms like pack can display a warning message, or kpack can allow operators to reject applications with these files in place. As long as it's known that writing arbitrary build code isn't a best practice.

These are my thoughts about the application developer persona. For other personas, I have no strong feelings here.

@jabrown85
Contributor

I like the idea of changing what a stack is and simplifying that. What do you think about pushing the docker execution to an extension spec with a much smaller interface?

Something like exec.d, where, if present on a build or run image, extension binaries (/cnb/lifecycle/heroku-stack-extender) are executed to extend the build or run image. A stack author may choose to implement a Dockerfile-based extension like you propose, or they could decide to execute an inline-style list of commands located in a project.toml. Or even simpler, the stack author could in turn install OS packages listed in project.toml. Kind of like docker credential helpers. The project could donate an apt one for bionic/focal that is safe for rebase.

The app developer may not be able to move stacks without moving from one extension method to another...but maybe that is ok?

FROM ${base_image}
ARG build_id=0

LABEL io.buildpacks.unsafe=true
Member

Is there a case for adding this label any time the run image is extended?

Member Author

Good catch! Flipped the label to io.buildpacks.rebasable. Not sure if it should be inherited though.

sclevine added 2 commits June 17, 2021 17:51
Signed-off-by: Stephen Levine <stephen.levine@gmail.com>
Signed-off-by: Stephen Levine <stephen.levine@gmail.com>
Comment on lines +100 to +101
LABEL io.buildpacks.image.distro=ubuntu
LABEL io.buildpacks.image.version=18.04
Member Author

These could be derived automatically from /etc/os-release when present.
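
A minimal sketch of that derivation, assuming the labels map to the `ID` and `VERSION_ID` fields of `/etc/os-release` (the fields the RFC text names for canonicalization). The function names are hypothetical, and the parser handles only simple `KEY=value` lines with optional quotes.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from os-release content, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def distro_labels(os_release_text: str) -> Dict[str, str]:
    """Map ID and VERSION_ID to the labels used in this RFC's examples."""
    fields = parse_os_release(os_release_text)
    labels = {}
    if "ID" in fields:
        labels["io.buildpacks.image.distro"] = fields["ID"]
    if "VERSION_ID" in fields:
        labels["io.buildpacks.image.version"] = fields["VERSION_ID"]
    return labels


sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="18.04"\n'
print(distro_labels(sample))
```

When a field is absent, the corresponding label is simply omitted, so a tool such as the proposed `pack create-stack` could fall back to explicitly set labels.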

Member

Who would be responsible for adding this label? Would the lifecycle add it to the exported image?

Member Author

pack create-stack could add it if not set already

Member Author

sclevine commented Jun 18, 2021

@aemengo:

My worry is that the application developer will see something they're used to and abuse it. For instance, putting other commands that a buildpack should be used for (like installing golang).

I agree, this is definitely a concern. The proposed functionality allows users to apply Dockerfiles to tasks that might be accomplished more efficiently with buildpacks. That said, if a user finds value in combining buildpacks and Dockerfiles, I'd like to make that as easy as possible. I'd like to optimize for easy adoption. If buildpacks are genuinely valuable, I don't think we should avoid giving the user choice.

Furthermore, it blunts our value proposition to see this other method of container building.

I don't believe that buildpacks can meaningfully improve the experience of installing OS packages, compared to Dockerfiles. Given that Dockerfiles are the accepted solution for this already, I think we should make it as easy as possible to use them.

Perhaps platforms like pack can display a warning message, or kpack can allow operators to reject applications with these files in place. As long as it's known that writing arbitrary build code isn't a best practice.

At the very least, the pack CLI should warn loudly when your runtime base image is not rebasable (which will happen when run.Dockerfile is used naively). Kpack could choose to implement this functionality entirely outside of its Image CRD / app source. Platforms should not be required to implement any form of run.Dockerfile / build.Dockerfile.

@jabrown85:

What do you think about pushing the docker execution to an extension spec with a much smaller interface?

👍 I think all functionality related to modifying base images (i.e., run/build.Dockerfile) should be optional for platforms to implement and exist outside of the core spec.

Or even simpler, the stack author could in turn install OS packages listed in project.toml.

That's actually what I'm trying to avoid with this proposal: solving OS package installation as part of the buildpacks project. I think @aemengo's point here is prescient:

app-specified Dockerfiles, though made possible, should be frowned upon.

By forcing the user to use run.Dockerfile and build.Dockerfile (which are both recognizable as Dockerfiles and not an interface created by the buildpacks project) we make it clear that these constructs are escape hatches that don't provide additional value over Dockerfiles. Dockerfiles don't provide safe rebasing, automated version updates, etc. -- and they never will. As soon as we have a packages list in project.toml, I think we're on the hook to solve a hard, distribution-specific problem.


More generally, I don't think buildpacks are quite popular enough yet for restrictive interfaces to make sense. We should focus on adding value on top of what already exists in the ecosystem, reduce the number of concepts (stacks, mixins, etc.), unnecessary terms ("buildpackage"), and bespoke interfaces that make the project inaccessible, and do more to make value apparent to users (e.g., #160).


When an app image is rebased, `pack rebase` will fail if packages are removed from the new runtime base image. This check may be skipped by passing a new `--force` flag to `pack rebase`.
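
The check described above amounts to a set difference between the package lists of the old and new runtime base images. The following is a sketch under that assumption; `check_rebase` and `removed_packages` are hypothetical names, and packages are represented as version-less PURL strings.

```python
from typing import Iterable, Set


def removed_packages(old_base: Iterable[str], new_base: Iterable[str]) -> Set[str]:
    """Packages present in the old base image but absent from the new one."""
    return set(old_base) - set(new_base)


def check_rebase(old_base: Iterable[str], new_base: Iterable[str],
                 force: bool = False) -> None:
    """Raise if the new base image drops packages, unless `force` is set."""
    missing = removed_packages(old_base, new_base)
    if missing and not force:
        raise RuntimeError(
            "rebase would remove packages: " + ", ".join(sorted(missing))
        )


old = {"pkg:deb/ubuntu/curl", "pkg:deb/ubuntu/libc6"}
new = {"pkg:deb/ubuntu/libc6"}
check_rebase(old, new, force=True)  # allowed: the check is skipped
try:
    check_rebase(old, new)
except RuntimeError as err:
    print(err)
```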

## Runtime Base Image Selection
Member Author

@ekcasey Re: your comment on runtime base image selection and app-specified Dockerfiles not playing well together (i.e., app-specified Dockerfiles can't fulfill package requirements from buildpacks): what if we allow users to label the Dockerfiles with package names (in version-less PURL format) that could be matched against (and thus remove) buildpacks-required packages?

Member Author

(Note that the "label" would be something like a comment at the top of the Dockerfile, not an image label.)

Member

what if we allow users to label the Dockerfiles with package names (in version-less PURL format) that could be matched against (and thus remove) buildpacks-required packages?

Hmm, this fills the required purpose, but it seems like it's moving away from the simplicity of "just a Dockerfile" towards something that more closely resembles a list of "provided mixins"? I need to chew on this a little more.

Member Author

Agree, feels bad to me also

Signed-off-by: Stephen Levine <stephen.levine@gmail.com>

Builders may specify an ordered list of runtime base images, where each entry may contain a list of runtime base image mirrors.

Buildpacks may specify a list of package names (as PURL URLs without versions or qualifiers) in a `packages` table in the build plan.
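
One way to read these two paragraphs together is sketched below. The selection rule (pick the first image in the ordered list whose packages cover everything the buildpacks require) is an assumption for illustration, as are the function name `select_run_image` and the `image`/`packages` dictionary keys; the RFC text does not define them.

```python
from typing import Dict, List, Optional, Set


def select_run_image(candidates: List[Dict], required: Set[str]) -> Optional[str]:
    """Return the first candidate whose packages cover `required`.

    `candidates` is ordered by builder preference. Each entry is a dict
    with hypothetical keys "image" (a reference) and "packages"
    (version-less PURLs found in that image).
    """
    for candidate in candidates:
        if required <= set(candidate["packages"]):
            return candidate["image"]
    return None


candidates = [
    {"image": "example/run:tiny", "packages": ["pkg:deb/ubuntu/libc6"]},
    {"image": "example/run:full",
     "packages": ["pkg:deb/ubuntu/libc6", "pkg:deb/ubuntu/curl"]},
]
print(select_run_image(candidates, {"pkg:deb/ubuntu/curl"}))
```

Under this reading, a buildpack that requires nothing gets the smallest (first) image, and a build with an unsatisfiable requirement gets `None`, which a platform would presumably surface as an error.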
Member Author

@ekcasey some discussion on PURL version ranges:

package-url/purl-spec#93
package-url/purl-spec#66

Some of the comments in those threads suggest that PURL could be used for querying, with qualifiers used for matching against ranges. Is it worth re-considering whether we move the build plan to PURL as well? Then we could represent the packages table with normal build plan entries. Seems like it could cleanly unify the build plan, sbom, run image selection, and dependency validation.


## Dockerfiles

Note: kaniko, BuildKit, and/or the original Docker daemon may be used to apply Dockerfiles at the platform's discretion.
Member

Does "at the platform's discretion" mean that a platform can provide whatever mechanism it wants for buildpack users to select/provide Dockerfiles?

Member Author

The statement was only intended to suggest that the underlying technology for applying Dockerfiles is up to the platform. E.g., BuildKit if you're using the BuildKit frontend, kaniko if you're using kpack or Tekton, etc.


### App-specified Dockerfiles

A buildpack app may have a build.Dockerfile and/or run.Dockerfile in its app directory. A run.Dockerfile is applied to the selected runtime base image after the detection phase. A build.Dockerfile is applied to the build-time base image before the detection phase.
Member

Is this something we're spec'ing, or is this a platform detail of Pack?

I'd like to see these build/run Dockerfiles defined in project.toml.

Member Author

I like @jabrown85's idea of putting all Dockerfile-related functionality into an extension spec.

Do you mean the locations may be overridden in project.toml, or are you thinking inline?

Member

I was thinking of overriding the location, but now I'm interested in inline too

- Buildpacks cannot install OS packages directly, only select runtime base images.

# Alternatives
[alternatives]: #alternatives
Member

I think there's also a variant of stackpacks where we aren't so strict as the current RFC. All the complexity came in when we tried to put guardrails around them and ensure rebase always worked.

The Dockerfiles in this proposal could be easily replaced with a stackpack that's just a bin/detect and bin/build and no guarantees about rebase.

Member Author

Will add this as an alternative.

I think our mistake is larger than trying to preserve rebase though. As mentioned in #167 (comment), I think stackpacks leave us on the hook to solve a hard problem. Even if we don't break rebase, how are we going to ensure that packages stay up-to-date? I'd rather implement the existing solution (and associated set of user expectations) first.

Member

Sure, I agree the problem is larger (rebase is just an example here).

My original vision for Stackpacks was something dead simple: a type of buildpack that runs as root. I don't think that's much different than a Dockerfile. If we try to attach a bunch of guardrails/constraints/etc we'll probably end up in the same spot.

That said, I think the original very simple Stackpacks concept could co-exist with the Dockerfile mechanism.

# Unresolved Questions
[unresolved-questions]: #unresolved-questions

- Should we use the build plan to allow buildpacks to specify package requirements? This allows, e.g., a requirement for "python" to be satisfied by either a larger runtime base image or by a buildpack. Opinion: no, too complex and difficult to match package names and plan entry names, e.g., python2.7 vs. python2 vs. python.
Member

Agree. This is where stackpacks got messy. I don't think stackpacks themselves were the problem, but rather all the stuff like this that we tacked on.

Member

IIUC the current proposal doesn't shut the door to doing something like that in the future, right? If so maybe we could revisit this question when we find that we need it.

Member

jkutner commented Jun 21, 2021

@aemengo @sclevine

Furthermore, it blunts our value proposition to see this other method of container building.

I don't believe that buildpacks can meaningfully improve the experience of installing OS packages, compared to Dockerfiles. Given that Dockerfiles are the accepted solution for this already, I think we should make it as easy as possible to use them.

I think an "apt-buildpack" would improve the experience of installing OS packages. For example: automatically cleaning up var/cache, ensuring apt-get update, etc.

That said I'm still in favor of exploring the Dockerfile approach.

@jabrown85
Contributor

@sclevine is rebase in this RFC going to execute the run.Dockerfile? Or are you saying these Dockerfiles will only be executed during build?

@sclevine
Member Author

Dockerfiles would only be applied during build. The rebasable label is used to signify that users may rebase layers under the Dockerfile (e.g., because it only writes to /opt).


For Linux-based images, each field should be canonicalized against values specified in `/etc/os-release` (`$ID` and `$VERSION_ID`).

The `stacks` list in `buildpack.toml` is replaced by a `platforms` list, where each entry corresponds to a different buildpack image that is exported into a [manifest index](https://github.com/opencontainers/image-spec/blob/master/image-index.md). Each entry may contain multiple valid values for Distribution and/or Version, but only a single OS and Architecture. Each entry may also contain a list of package names (as PURL URLs without versions or qualifiers) that specify detect-time and build-time (but not runtime) OS package dependencies. Buildpacks may express runtime OS package dependencies during detection (see "Runtime Base Image Selection" below).
Member

Each entry may contain multiple valid values for Distribution and/or Version, but only a single OS and Architecture.

How would this work? How would the exporter know which Distribution/Version combinations are valid?

Buildpacks may express runtime OS package dependencies during detection

What's the advantage in removing the ability to declare this statically?

Member Author

How would the exporter know which Distribution/Version combinations are valid?

Each entry represents a separate artifact.

E.g., x86_64 Linux version of the buildpack is compatible with Ubuntu 18.04 and Ubuntu 16.04.

What's the advantage in removing the ability to declare this statically?

I didn't see much of an advantage to it, given that potential run images might be selected right before the build. Also, the static list in each platform in buildpack.toml could be used to specify build-time packages, which do need static validation (when creating a builder).

Member

I think we need an example of what this buildpack.toml would look like. I am 100% on board with the idea, but I am curious about how we list packages in an entry that contains multiple distributions, given that some identifiers (e.g., pkg:rpm/fedora/curl) might only make sense for certain distributions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make sure I understand the point correctly -- say the buildpack is something related to node.js, so it would need the npm tool during the build (pulling the dependencies from npmjs.org). The buildpack.toml would then include something like pkg:rpm/fedora/npm and it would result in npm package being installed during the build phase?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during the build phase?

How would such an installation then take place, and using which library or tool? Would the container running the lifecycle creator be able to execute a command to install an RPM, which requires root?

Member Author

The buildpack.toml would then include something like pkg:rpm/fedora/npm and it would result in npm package being installed during the build phase?

Not exactly -- this list would only be used before the build to ensure that the build-time base image contains OS packages that are required by the buildpack at build-time. E.g., it could be used to assert that curl is available at build time. If a platform wants to create a base image dynamically based on buildpack.toml contents, that sounds interesting, but tooling to do that is out-of-scope for this RFC.

Member Author

Example platform table in: #172


## Mixins

The mixins label on each base image is replaced by a layer in each base image containing a single file consisting of a CycloneDX-formatted list of packages. Each package entry has a [PURL](https://github.com/package-url/purl-spec)-formatted ID that uniquely identifies the package.
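
To make the proposed replacement for mixins concrete, here is a sketch that reads a CycloneDX document and reduces each component's PURL to the version-less form used elsewhere in this RFC. It assumes the JSON encoding of CycloneDX (the RFC does not say JSON or XML) and relies only on the standard `components` array and per-component `purl` field. The function names and the sample data are hypothetical.

```python
import json
from typing import Set


def versionless_purl(purl: str) -> str:
    """Drop the subpath, qualifiers, and version from a PURL string."""
    purl = purl.split("#", 1)[0]   # subpath
    purl = purl.split("?", 1)[0]   # qualifiers
    head, sep, _version = purl.rpartition("@")
    return head if sep else purl


def packages_from_bom(bom_json: str) -> Set[str]:
    """Collect version-less PURLs from a CycloneDX JSON document."""
    bom = json.loads(bom_json)
    return {
        versionless_purl(component["purl"])
        for component in bom.get("components", [])
        if "purl" in component
    }


# Made-up sample data for illustration.
sample_bom = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.3",
    "version": 1,
    "components": [
        {"type": "library", "name": "curl", "version": "7.58.0",
         "purl": "pkg:deb/ubuntu/curl@7.58.0?arch=amd64"},
        {"type": "library", "name": "libc6", "version": "2.27",
         "purl": "pkg:deb/ubuntu/libc6@2.27"},
    ],
})
print(sorted(packages_from_bom(sample_bom)))
```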
Member

Is this for run images too? How would the lifecycle get this information for selecting the run image?

Think it's here:

the output replaces the label io.buildpacks.sbom

Member Author

Yep, the label is supposed to be a reference to the layer that contains the SBoM. I should make that more clear.

Member

I am not 100% sold on combining the stack SBoM and packages list. I know the SBoM by definition does include the packages, but it may also include a lot of other information (provenance, licenses), and it could be useful to pull out the piece that is required for validation into a more easily consumable format (and one that is less likely to change if for example we switch from CycloneDX to SPDX for the BOM).

Member Author

it could be useful to pull out the piece that is required for validation into a more easily consumable format

Why should we implement logic to transform the data into a different format and put both formats on the image? We could just use one (ideally standardized, non-CNB-specific) format, and transform it when we need to validate.

one that is less likely to change if for example we switch from CycloneDX to SPDX for the BOM

If we commit to a format and change it, we're going to have to update the lifecycle to parse it regardless.


Contributor

aemengo commented Jun 22, 2021

@sclevine @jkutner

That said I'm still in favor of exploring the Dockerfile approach.

My last words on App-specified Dockerfiles.

I see a lot of potential for this. Being an escape hatch, it might be used just as much as project.toml.

That said, these files are self-marketing. I'm worried that this is a missed opportunity to draw attention to the buildpacks project by using, as @sclevine put it: an interface not created by the buildpacks project.


The same Dockerfiles may be used to create new stacks or modify existing stacks outside of the app build process. For both app-specified and stack-modifying Dockerfiles, any specified labels override existing values.

Dockerfiles that are used to create a stack must create a `/cnb/stack/genpkgs` executable that outputs a CycloneDX-formatted list of packages in the image with PURL IDs when invoked. This executable is executed after any run.Dockerfile or build.Dockerfile is applied, and the output replaces the label `io.buildpacks.sbom`. This label doubles as a Software Bill-of-Materials for the base image. In the future, this label will serve as a starting point for the application SBoM.
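
For a Debian-based stack, the core of a `genpkgs`-style executable might look like the sketch below. It is an illustration only: it assumes the package listing comes from something like `dpkg-query -W -f='${Package}\t${Version}\n'`, emits the JSON encoding of CycloneDX, and uses a simplified percent-encoding for PURL fields. The function name `bom_from_dpkg` and the sample listing are hypothetical.

```python
import json
from urllib.parse import quote


def bom_from_dpkg(listing: str, distro: str) -> dict:
    """Build a minimal CycloneDX-style document from "name<TAB>version" lines.

    Assumes input shaped like the output of
    dpkg-query -W -f='${Package}\t${Version}\n' (not executed here).
    """
    components = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, _, version = line.partition("\t")
        purl = f"pkg:deb/{distro}/{quote(name, safe='')}@{quote(version, safe='')}"
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            "purl": purl,
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.3",
        "version": 1,
        "components": components,
    }


# Made-up listing for illustration.
listing = "curl\t7.58.0-2ubuntu3\nlibc6\t2.27-3ubuntu1\n"
bom = bom_from_dpkg(listing, "ubuntu")
print(json.dumps(bom, indent=2))
```

A real executable would need distribution-specific logic for each package manager, which matches the suggestion in the thread that the CNB project could implement it for common distros.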
@fg-j Jun 22, 2021

How would the validity of this binary be assured? Is this binary published by the project and included (by the platform?) in Dockerfile-extended stacks? Is it the responsibility of the Dockerfile writer to create (or validate) their own binary? I worry about a malicious binary that produces false information about the packages contained in a build/run image.

Member Author

The binary would be root-owned and added by the Dockerfiles that created the build-time and runtime base images. The distribution-specific logic in the binary could be implemented for common distros by the CNB project.

Given that all Dockerfiles can run as root, they must all be fully-trusted. If an untrusted Dockerfile is allowed to run, it could cause the binary to produce false information without touching the binary itself (e.g., via LD_PRELOAD, or by modifying the package DB). It's up to Dockerfile authors to ensure supply chain integrity for any components they add.

The approach proposed here, which I suppose will rely on docker build or podman build, will only work with a local Docker daemon, because Docker uses the root user by default. It will not work at all if the image is built in a pod (Kubernetes, OpenShift), where a random or fixed non-root UID is used.

Member

From what I understand, this might not be necessary, as the buildpacks lifecycle could use kaniko to execute the Dockerfile in the context of the build or run image (see #167 (comment)). @sclevine is this correct?

Member Author

Correct, kaniko can be used for in-cluster builds. The lifecycle already has build-time phases that require in-container root.

@sclevine
Member Author

@aemengo

That said, these files are self-marketing. I'm worried that this is a missed opportunity to draw attention to the buildpacks project by using, as @sclevine put it, an interface not created by the buildpacks project.

I wonder if the specific file names (e.g., run.Dockerfile / build.Dockerfile) might actually help buildpacks, marketing-wise. Without context in buildpacks, I might not associate a file called "project.toml" or even "buildpacks.toml" with building containers. But I would be very curious about how/why specially named Dockerfiles are being used by a repo, especially if I open them up and see parameterized base images, etc.

@jabrown85
Contributor

How would you feel about specifically named stages in a single Dockerfile?

ARG base_image
FROM ${base_image} as cnb-build
ARG build_id=0

LABEL io.buildpacks.image.distro=ubuntu
LABEL io.buildpacks.image.version=18.04
LABEL io.buildpacks.rebasable=true

ENV CNB_USER_ID=1234
ENV CNB_GROUP_ID=1235

RUN groupadd cnb --gid ${CNB_GROUP_ID} && \
  useradd --uid ${CNB_USER_ID} --gid ${CNB_GROUP_ID} -m -s /bin/bash cnb

USER ${CNB_USER_ID}:${CNB_GROUP_ID}
COPY genpkgs /cnb/stack/genpkgs

FROM ${base_image} as cnb-run
ARG build_id=0

LABEL io.buildpacks.rebasable=true

RUN echo ${build_id}

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

@sclevine
Member Author

How would you feel about specifically named stages in a single Dockerfile

I considered this but decided against it, because intermediate images are generally not tagged. Also, COPY --from already works with multiple separate Dockerfiles, if we decide it would be useful.

I'm not opposed to exploring this though. (Assuming you meant to parameterize the run/build base images separately.)

Instead of a stack ID, runtime and build-time base images are labeled with the following canonicalized metadata:
- OS (e.g., "linux", `$GOOS`)
- Architecture (e.g., "x86_64", `$GOARCH`)
- Distribution (optional) (e.g., "ubuntu", `$ID`)

Member

Are there any cases where $ID_LIKE is useful?

Member Author

I only find the combination of $ID and $VERSION_ID to be especially useful (for establishing ABI). I suppose $ID could be useful for knowledge of common tooling (e.g., apt), and sometimes that could work for $ID_LIKE as well? Maybe something we could add later?


For Linux-based images, each field should be canonicalized against values specified in `/etc/os-release` (`$ID` and `$VERSION_ID`).
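
As a rough illustration of that canonicalization, the following Python sketch reads `$ID` and `$VERSION_ID` out of os-release content. The function names and the returned keys `distro` and `version` are assumptions made for this example; the RFC does not prescribe an implementation.

```python
import shlex


def parse_os_release(text):
    """Parse os-release content into a dict of KEY -> unquoted value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split removes the quoting.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def distro_metadata(text):
    """Return the distribution fields used for image labels (hypothetical keys)."""
    fields = parse_os_release(text)
    return {"distro": fields.get("ID"), "version": fields.get("VERSION_ID")}


if __name__ == "__main__":
    # Reads the file named in the RFC text; only meaningful on a Linux system.
    with open("/etc/os-release") as f:
        print(distro_metadata(f.read()))
```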

The `stacks` list in `buildpack.toml` is replaced by a `platforms` list, where each entry corresponds to a different buildpack image that is exported into a [manifest index](https://github.com/opencontainers/image-spec/blob/master/image-index.md). Each entry may contain multiple valid values for Distribution and/or Version, but only a single OS and Architecture. Each entry may also contain a list of package names (as PURL URLs without versions or qualifiers) that specify detect-time and build-time (but not runtime) OS package dependencies. Buildpacks may express runtime OS package dependencies during detection (see "Runtime Base Image Selection" below).

Member

I think we need an example of what this buildpack.toml would look like. I am 100% on the idea but I am curious about how we list packages in an entry that contains multiple distributions, given some identifiers pkg:rpm/fedora/curl might only make sense for certain distributions.


The `stacks` list in `buildpack.toml` is replaced by a `platforms` list, where each entry corresponds to a different buildpack image that is exported into a [manifest index](https://github.com/opencontainers/image-spec/blob/master/image-index.md). Each entry may contain multiple valid values for Distribution and/or Version, but only a single OS and Architecture. Each entry may also contain a list of package names (as PURL URLs without versions or qualifiers) that specify detect-time and build-time (but not runtime) OS package dependencies. Buildpacks may express runtime OS package dependencies during detection (see "Runtime Base Image Selection" below).
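
The matching rule described above (one OS and Architecture per entry, possibly several Distribution and Version values) could be sketched as follows. The dictionary keys (`os`, `arch`, `distros`, `versions`) are hypothetical, since the RFC does not define a concrete schema here, and treating a missing list as "any value" is an assumption of this sketch.

```python
def platform_matches(entry, image):
    """Return True if a buildpack `platforms` entry accepts the image metadata.

    Both arguments are plain dicts with hypothetical keys; the RFC does not
    define a concrete schema.
    """
    if entry["os"] != image["os"] or entry["arch"] != image["arch"]:
        return False
    # Assumption: an empty or missing list means any value is acceptable.
    distros = entry.get("distros") or []
    versions = entry.get("versions") or []
    if distros and image.get("distro") not in distros:
        return False
    if versions and image.get("version") not in versions:
        return False
    return True


def select_platform(entries, image):
    """Return the first entry that matches the image, or None."""
    for entry in entries:
        if platform_matches(entry, image):
            return entry
    return None
```

For example, an entry listing `ubuntu` with versions `18.04` and `20.04` would accept an Ubuntu 20.04 base image of the same OS and architecture, while an entry with no distribution list would accept any distribution.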

App image builds fail if the build image and selected run image have mismatched metadata. We may consider introducing a flag to skip this validation.

Member

I am assuming "mismatched metadata" only applies to things like Architecture and Distribution, given we typically expect mismatched packages?

Member

I am also curious about what happens to stacks like io.paketo.stacks.tiny here. The build image is an ubuntu distribution but the run image isn't (although it is derived from an ubuntu distribution, I believe).

Member Author

Happy to strike this requirement for $ID and $VERSION_ID.

Member Author

Updated the wording in #172 so that tiny can leave off the distro/version to be compatible with all distros/versions. Also mentioned that flags/labels could be used to skip validation in the future.


## Mixins

The mixins label on each base image is replaced by a layer in each base image containing a single file consisting of a CycloneDX-formatted list of packages. Each package entry has a [PURL](https://github.com/package-url/purl-spec)-formatted ID that uniquely identifies the package.

Member

I am not 100% sold on combining the stack SBoM and packages list. I know the SBoM by definition does include the packages, but it may also include a lot of other information (provenance, licenses), and it could be useful to pull out the piece that is required for validation into a more easily consumable format (and one that is less likely to change if, for example, we switch from CycloneDX to SPDX for the BOM).


When an app image is rebased, `pack rebase` will fail if packages are removed from the new runtime base image. This check may be skipped by passing a new `--force` flag to `pack rebase`.
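
The rebase check could look roughly like the Python sketch below. It assumes packages are compared by PURL with version, qualifiers, and subpath stripped, which the RFC does not state explicitly; the function names are hypothetical and the `force` parameter merely stands in for the proposed `--force` flag.

```python
def versionless(purl):
    """Strip subpath, qualifiers, and version from a PURL string."""
    for sep in ("#", "?", "@"):
        purl = purl.split(sep, 1)[0]
    return purl


def removed_packages(old_purls, new_purls):
    """Return packages present in the old run image but absent from the new one."""
    old = {versionless(p) for p in old_purls}
    new = {versionless(p) for p in new_purls}
    return sorted(old - new)


def check_rebase(old_purls, new_purls, force=False):
    """Raise if packages were removed, unless force is set (sketch only)."""
    missing = removed_packages(old_purls, new_purls)
    if missing and not force:
        raise RuntimeError(
            "packages removed from new run image: " + ", ".join(missing)
        )
    return missing
```

Version changes alone do not trip this check; only a package that disappears from the new runtime base image does.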

## Runtime Base Image Selection

Member

what if we allow users to label the Dockerfiles with packages names (in version-less PURL format) that could be matched against (and thus remove) buildpacks-required packages?

Hmm, this fills the required purpose, but it seems like it's moving away from the simplicity of "just a Dockerfile" towards something that more closely resembles a list of "provided mixins"? I need to chew on this a little more.

[what-it-is]: #what-it-is

Summary of changes:
- Replace mixins with a CycloneDX-formatted list of packages.

Is it possible to add a link to the project you refer to here? I suppose it corresponds to https://cyclonedx.org/?

Member Author

That's correct, will add a link in the smaller RFCs that will replace this one.


### Validations

Buildpack base image metadata and packages specified in `buildpack.toml`'s `platforms` list are validated against the runtime and build-time base images.

Would it be possible to use YAML or JSON to declare such metadata instead of TOML?

Member Author

I think we'd need to introduce a separate file, since all buildpack configuration is currently in buildpack.toml. Alternatively, we could permit an alternate format version of the buildpack.toml file (e.g., buildpack.yml).

Why do you prefer YAML or JSON?


Dockerfiles that are used to create a stack must create a `/cnb/stack/genpkgs` executable that outputs a CycloneDX-formatted list of packages in the image with PURL IDs when invoked. This executable is executed after any run.Dockerfile or build.Dockerfile is applied, and the output replaces the label `io.buildpacks.sbom`. This label doubles as a Software Bill-of-Materials for the base image. In the future, this label will serve as a starting point for the application SBoM.

### Examples

Is it possible to create, as part of a GitHub repo, a more concrete example starting from an existing buildpack and stack, showing how it could be converted into a Buildpacks DockerStacks tree of files?

@sclevine
Member Author

Deprecating in favor of #172, #173, #174

@sclevine sclevine closed this Jun 30, 2021

@cmoulliard cmoulliard left a comment

I agree, this is definitely a concern. The proposed functionality allows users to apply Dockerfiles to tasks that might be accomplished more efficiently with buildpacks. That said, if a user finds value in combining buildpacks and Dockerfiles, I'd like to make that as easy as possible. I'd like to optimize for easy adoption. If buildpacks are genuinely valuable, I don't think we should avoid giving the user choice.

I think there is a misunderstanding. We don't necessarily want to give a developer the right to install something, as proposed with the build.Dockerfile file. Rather, we want to allow the builder step executed by the creator to perform a privileged command, such as installing a package, creating a certificate, updating a file owned by root, or allowing a Java runtime to bind to local port 80. Even though this code doesn't execute a privileged command, it gives an example of what we could do on Fedora, CentOS, or RHEL using a tool able to install an RPM: https://github.com/paketo-buildpacks/libjvm/blob/main/build.go#L57-L78


The same Dockerfiles may be used to create new stacks or modify existing stacks outside of the app build process. For both app-specified and stack-modifying Dockerfiles, any specified labels override existing values.

Dockerfiles that are used to create a stack must create a `/cnb/stack/genpkgs` executable that outputs a CycloneDX-formatted list of packages in the image with PURL IDs when invoked. This executable is executed after any run.Dockerfile or build.Dockerfile is applied, and the output replaces the label `io.buildpacks.sbom`. This label doubles as a Software Bill-of-Materials for the base image. In the future, this label will serve as a starting point for the application SBoM.

Will the genpkgs executable become part of the lifecycle? Will it be called during the bin buildpack step to install a package and set the $PATH for that package (e.g., maven, jdk, ...)?
