Golang implementation of image-builder with buildkit backend #1633

Open
rajaskakodkar opened this issue Nov 26, 2024 · 11 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@rajaskakodkar

Is your feature request related to a problem? Please describe.

Currently, there are a bunch of issues with building and testing image-builder:

  1. There is no clear demarcation between OS-level tasks and Kubernetes-related tasks, making it a tightly coupled system where a change in OS config warrants a rebuild of the entire system and not just the OS layer, and vice versa for Kubernetes.
  2. Ansible configurations have grown exponentially, making the project extremely flexible and configurable but also difficult to maintain.
  3. Packer continues to be a build-time dependency, and it has been pinned to a version because of licensing issues. Ref: Investigate implications of new Packer licensing #1246
  4. While Goss provides some validation for the artifacts of the project, there is still a gap in "e2e test coverage" in the form of inspect tests, which can actually provide a clear signal of the final state of the machine images. This is also highlighted in Testing of all providers and distros #1605

Describe the solution you'd like

One-liner pitch: A Golang implementation of image-builder with BuildKit as its backend, providing OCI layers for OS and Kubernetes related actions.

Slightly more detailed version: The idea is to transform the build system of image-builder into the following steps:

  1. Create a raw disk from scratch with loop devices, etc. mounted on it.
  2. Superimpose the distro provider filesystem on the disk.
    2.1. To create this filesystem, start with an OCI Image of the OS and implement all the tasks done by image-builder at an OS level in Golang with BuildKit LLB as its backend https://github.com/moby/buildkit/blob/c1dacbc5ce0544ff72f7dc8acd9b99f015c2021a/docs/dev/dockerfile-llb.md
  3. Independently create an OCI Image with all the Kubernetes and friends (containerd, etc) related tasks and curate the filesystem as expected by Kubernetes and Cluster API
  4. Superimpose 2 and 3 to provide the combined filesystem for the machine image
  5. Mount the filesystem on the raw disk created in 1
  6. Use independent tools like openvmdk, vhd, ami, etc to create machine images for various providers

This enables a clear demarcation between the OS and the Kubernetes layers and moves the project to a Golang implementation. It can then be extended with an e2e suite of inspect tests for automated testing.
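
To illustrate step 4 (superimposing the OS and Kubernetes filesystems), here is a minimal sketch in Go. It assumes each layer can be modelled as a simple path-to-content map; real OCI layers are tar archives with whiteout handling, and in the proposal BuildKit LLB would do this work. `Layer` and `Superimpose` are hypothetical names, not part of image-builder or BuildKit.

```go
package main

import (
	"fmt"
	"sort"
)

// Layer is a hypothetical, simplified stand-in for an OCI layer:
// it maps an absolute file path to that file's content.
type Layer map[string]string

// Superimpose merges layers in order into a new Layer.
// When two layers contain the same path, the later layer wins,
// mirroring how upper layers shadow lower ones in an overlay.
func Superimpose(layers ...Layer) Layer {
	merged := Layer{}
	for _, layer := range layers {
		for path, content := range layer {
			merged[path] = content
		}
	}
	return merged
}

func main() {
	// Illustrative contents only; these are not real image-builder outputs.
	osLayer := Layer{
		"/etc/os-release":           "ID=example",
		"/etc/sysctl.d/99-k8s.conf": "# OS default",
	}
	k8sLayer := Layer{
		"/usr/bin/kubelet":          "<binary>",
		"/etc/sysctl.d/99-k8s.conf": "net.ipv4.ip_forward=1",
	}

	rootfs := Superimpose(osLayer, k8sLayer)

	// Print in a stable order, since Go map iteration order is random.
	paths := make([]string, 0, len(rootfs))
	for p := range rootfs {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s => %s\n", p, rootfs[p])
	}
}
```

Because the layers are built independently, a change to `osLayer` or `k8sLayer` only requires rebuilding that one input before merging again.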

This also moves away from Packer.

This is distro agnostic and can work for Windows as well.

Describe alternatives you've considered

systemd-sysext https://man.archlinux.org/man/systemd-sysext.8.en has appeared in the community and maybe there is a middle ground here to figure out how to integrate the OCI approach with systemd-sysext. Comments welcome!

A caveat here is that systemd-sysext will work only on systemd distros and not on Windows.

Another alternative is bootc for Linux distros - https://github.com/containers/bootc

Additional context

My partners in crime in brainstorming this have been @randomvariable and @clebs and we want to check the appetite of the community for this feature. Happy to contribute to this effort!

cc @AverageMarcus @mboersma


/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 26, 2024
@rajaskakodkar
Author

cc @t-lo to see how systemd-sysext fits here

@AverageMarcus
Member

Some quick thoughts...

  1. Have you PoC'd this approach? I did a little research after our chat at KubeCon and couldn't find anything reliable on how to convert an OCI image to a VM disk image. Things like the kernel usually aren't included in OCI images and need some special attention.

  2. You mention "Independently create an OCI Image with all the Kubernetes and friends (containerd, etc) related tasks and curate the filesystem as expected by Kubernetes and Cluster API" to have a separation between OS and non-OS things, but I'm not sure this is actually feasible in practice. Please correct me if I'm wrong, but I'm pretty sure different OSes expect things to be in different locations - e.g. Flatcar vs. Debian vs. Windows. I'm not sure we could get away with creating a generic layer that will work for all distros.

  3. I'd like to hear from the Flatcar folks that have been working on sysext (thanks for pinging Thilo) and get their thoughts on how this could integrate.

  4. Big +1 on improving testability! 💙 We're sorely lacking in this area in the project and it's hit us a few times now.

  5. I like the thought of "reining in" the project a bit. The current amount of configurability is not maintainable and makes all support issues difficult. It would be good to have a moderately strict amount of configuration that we support with this new approach, but I also know that people have a need for more, so we'd need to think of some "hook" that people could use to layer on their additional, unsupported configurations.

  6. I like the idea of having things more modular. We have a section that is base-OS, a section that is the Kubernetes binary layer and another section that is "convert to specific cloud provider". The benefit of this is we could have more maintainers of each of these specific areas. E.g. someone from Azure that is just responsible for the "convert to Azure disk image" bit. That way us maintainers don't have to be experts on everything (because that's impossible 😅)
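
As a rough sketch of the modular split in point 6, the "convert to specific cloud provider" section could sit behind a small Go interface so that each provider's conversion is owned separately. Everything below is hypothetical (`Converter`, `Registry`, `placeholderConverter` and the `azure`/`vhd` pairing are made up for illustration), and the placeholder only computes an output file name; it performs no real disk conversion.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Converter is a hypothetical interface for the "convert to specific
// cloud provider" stage. Each provider team could own one implementation.
type Converter interface {
	// Format returns the provider key, e.g. "azure".
	Format() string
	// Convert takes the path of the raw disk and returns the output path.
	Convert(rawDiskPath string) (string, error)
}

// Registry maps provider keys to their converters.
type Registry struct {
	converters map[string]Converter
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{converters: map[string]Converter{}}
}

// Register adds a converter and rejects duplicate provider keys.
func (r *Registry) Register(c Converter) error {
	if _, exists := r.converters[c.Format()]; exists {
		return fmt.Errorf("converter for %q already registered", c.Format())
	}
	r.converters[c.Format()] = c
	return nil
}

// Lookup returns the converter for a provider key, if any.
func (r *Registry) Lookup(format string) (Converter, bool) {
	c, ok := r.converters[format]
	return c, ok
}

// placeholderConverter is illustrative only: it swaps the file extension
// and does NOT convert any disk data.
type placeholderConverter struct {
	format string
	ext    string
}

func (p placeholderConverter) Format() string { return p.format }

func (p placeholderConverter) Convert(rawDiskPath string) (string, error) {
	base := strings.TrimSuffix(rawDiskPath, filepath.Ext(rawDiskPath))
	return base + "." + p.ext, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register(placeholderConverter{format: "azure", ext: "vhd"}); err != nil {
		panic(err)
	}
	if c, ok := reg.Lookup("azure"); ok {
		out, _ := c.Convert("build/disk.raw")
		fmt.Println(out)
	}
}
```

The base-OS and Kubernetes stages would stay unaware of providers; only the registry lookup at the end of the build depends on the target.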

@AverageMarcus AverageMarcus pinned this issue Nov 26, 2024
@AverageMarcus AverageMarcus added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 26, 2024
@t-lo
Contributor

t-lo commented Nov 26, 2024

@rajaskakodkar Thanks for pulling me in! TL;DR: sysexts would blend well with just about any image building approach you choose.

We would preferably compose sysexts into the system at provisioning time. This way, we can provision stock distro images instead of being required to host our own, self-built images. However, we absolutely recognise use cases where pre-built images are preferred, and the additional overhead of self-hosting images is acceptable. In these cases, using sysexts would be as easy as adding 2 or 3 files to well-known locations in the disk image. This should be very straightforward and does not require any exotic actions like e.g. running tools from inside the disk image (as is currently the case with image-builder). Given the flexibility of sysexts, this could either be done at step 2 or step 3/4 of your process - without losing the ability to update the Kubernetes bits independently.

Re: using OCI images, @AverageMarcus raises a good point - you need to figure out the boot process. There's some prior work for that in the bootc project, which specialises in bootable containers, and requires (to my understanding) specially crafted container images with kernel and bootloader: https://containers.github.io/bootc/intro.html.

Re: Marcus' 2nd point, there are minor differences but for Kubernetes specifically, these are negligible - Kubernetes already does a great job at being self-contained. Providers would need to adjust service management (likely systemd on most systems) and maybe logging. I'm not sure about Windows though.

I also wouldn't underestimate vendor support - providing images for cloud vendors will need integration work on the base OS, as well as continued testing. This too is a reason we prefer provisioning-time composition, as we can directly benefit from the upstream distros' integration work for various clouds :)

Lastly, and just out of curiosity, did you have a look at mkosi? It seems to be the go-to tool for building distro images nowadays: https://github.com/systemd/mkosi/ and I know of at least one Kubernetes distro - Edgeless' "Constellation" - that uses mkosi and is happy with it. It currently doesn't support Windows though, and I personally don't have much experience with it (the tooling we use in Flatcar to build vendor images predates mkosi by several years, and we didn't investigate integrating mkosi with Flatcar yet).

@rajaskakodkar
Author

Thanks @AverageMarcus and @t-lo for your feedback!

> 1. Have you PoC'd this approach? I did a little research after our chat at KubeCon and couldn't find anything reliable on how to convert an OCI image to a VM disk image. Things like the kernel usually aren't included in OCI images and need some special attention.

> you need to figure out the boot process. There's some prior work for that in the bootc project, which specialises in bootable containers, and requires (to my understanding) specially crafted container images with kernel and bootloader

I have done some experiments that can be PoC'd. The idea is to create a raw disk with things mounted from the host filesystem. The OCI images will provide overlays for what is dictated by image-builder for the OS and Kubernetes bits. GRUB would then be used to help with the boot process.

> 1. You mention "Independently create an OCI Image with all the Kubernetes and friends (containerd, etc) related tasks and curate the filesystem as expected by Kubernetes and Cluster API" to have a separation between OS and non-OS things, but I'm not sure this is actually feasible in practice. Please correct me if I'm wrong, but I'm pretty sure different OSes expect things to be in different locations - e.g. Flatcar vs. Debian vs. Windows. I'm not sure we could get away with creating a generic layer that will work for all distros.

> there are minor differences but for Kubernetes specifically, these are negligible - Kubernetes already does a great job at being self-contained. Providers would need to adjust service management (likely systemd on most systems) and maybe logging. I'm not sure about Windows though.

As @t-lo has pointed out, Kubernetes is self-contained and the differences per OS are negligible. Alternatively, there can be another OCI layer to make room for these differences, e.g. OS -> Kubernetes -> Niche-OS-and-Kubernetes.

> We have a section that is base-OS, a section that is the Kubernetes binary layer and another section that is "convert to specific cloud provider". The benefit of this is we could have more maintainers of each of these specific areas.

I think this sums up the idea pretty well!

> Lastly, and just out of curiosity, did you have a look at mkosi?

I haven't, and thanks for bringing that up!

For next steps, I will try to PoC something and then we can whip up a well-crafted proposal.

@justinsb
Contributor

There's a lot of prior art in the history of this project, but it was deleted in #1175

Personally, it feels like bootc is a good wave to ride.

@AverageMarcus
Member

> The idea is to create a raw disk with things mounted from the host filesystem.

That sounds like we'd still need Packer to provision machines with each of the OSs we want to build for? Or am I misunderstanding?

I'm very keen to see a very basic PoC that can create a VM image based on Ubuntu, Flatcar and Windows (without Kubernetes) just to get a feel for what this will look like in practice and what would be required from the host system etc.

@t-lo
Contributor

t-lo commented Nov 28, 2024

For a Flatcar PoC please have a look at bake_flatcar_image.sh in our sysext bakery. This script can be used to embed any given sysext into a stock Flatcar OS image. It supports producing "generic" images as well as vendor images for all 35 vendors currently supported by Flatcar (public and private clouds, bare metal, etc.). It's not in Go though, it's in Bash 😅 A brief intro to the OS image bake script is here.

E.g. for integrating Flatcar into the PoC we can leverage that script to embed the Kubernetes sysext, like so:

wget https://raw.githubusercontent.com/flatcar/sysext-bakery/refs/heads/main/bake_flatcar_image.sh
chmod 755 bake_flatcar_image.sh
./bake_flatcar_image.sh --fetch kubernetes:kubernetes-v1.31.3-x86-64.raw

This will produce a Flatcar disk image with Kubernetes embedded - and since it's a sysext, it can still be updated in place. The sysext will be fetched from the Bakery - for a full list of Kubernetes sysexts available check out the Bakery releases. We could of course also "bake" an OS image with a local (custom built) sysext - the create_kubernetes_sysext.sh bakery script would come in handy here.

We can also produce a vendor image - a ready-made cloud image with guest tools for a specific cloud. For AWS, e.g., run

./bake_flatcar_image.sh --fetch --vendor ami kubernetes:kubernetes-v1.31.3-x86-64.raw

which will produce flatcar_production_ami_image.bin.bz2, an OS image with Kubernetes sysext embedded.
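
If a Go-based PoC were to shell out to the bake script, the argument list could be assembled as in the sketch below. It only uses the flags shown in the two commands above (`--fetch`, `--vendor`). The sysext naming pattern `kubernetes:kubernetes-<version>-<arch>.raw` is an assumption generalised from that single example, and `BakeArgs` is a hypothetical helper, not part of the sysext bakery.

```go
package main

import (
	"fmt"
	"strings"
)

// BakeArgs builds the arguments for bake_flatcar_image.sh.
// vendor may be empty, in which case no --vendor flag is passed
// (matching the first command shown above).
// ASSUMPTION: the sysext name follows the pattern seen in the example,
// "kubernetes:kubernetes-<version>-<arch>.raw".
func BakeArgs(k8sVersion, arch, vendor string) []string {
	args := []string{"--fetch"}
	if vendor != "" {
		args = append(args, "--vendor", vendor)
	}
	sysext := fmt.Sprintf("kubernetes:kubernetes-%s-%s.raw", k8sVersion, arch)
	return append(args, sysext)
}

func main() {
	// Reproduces the two invocations from the comment above.
	fmt.Println("./bake_flatcar_image.sh " + strings.Join(BakeArgs("v1.31.3", "x86-64", ""), " "))
	fmt.Println("./bake_flatcar_image.sh " + strings.Join(BakeArgs("v1.31.3", "x86-64", "ami"), " "))
}
```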

Deployments would merely need to add a kubelet config at /var/lib/kubelet/config.yaml at provisioning time so nodes could join clusters.

@t-lo
Contributor

t-lo commented Nov 28, 2024

Oh and, with the new --mutable option from systemd 256 onwards, Kubernetes sysexts have become usable on general purpose distros, too - Ubuntu 24.10 and Fedora 41 already ship that version, implicitly supporting mutability.

The main issue was that as soon as you merge a sysext, the whole of /usr becomes read-only. General purpose distros don't take that very well. But now we can create a symlink /var/lib/extensions.mutable/usr/ → /usr/ and run systemd-sysext --mutable=auto merge, and /usr remains writable.
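
A minimal Go sketch of the symlink step described above, for a builder that prepares a mounted image root. `EnableMutableUsr`, `MergeArgs` and `demo` are hypothetical helper names; the sketch works inside a scratch directory rather than on `/`, and it only returns the `systemd-sysext` arguments instead of running the command.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// EnableMutableUsr creates <root>/var/lib/extensions.mutable/usr as a
// symlink to /usr, as described in the comment above. root is the
// mounted image root (use "/" for a live system). It is idempotent.
func EnableMutableUsr(root string) (string, error) {
	dir := filepath.Join(root, "var", "lib", "extensions.mutable")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	link := filepath.Join(dir, "usr")
	if target, err := os.Readlink(link); err == nil {
		if target == "/usr" {
			return link, nil // already set up
		}
		return "", fmt.Errorf("%s points to %s, expected /usr", link, target)
	}
	if err := os.Symlink("/usr", link); err != nil {
		return "", err
	}
	return link, nil
}

// MergeArgs returns the command line quoted in the comment above.
// It is not executed here because it needs root and systemd 256 or later.
func MergeArgs() []string {
	return []string{"systemd-sysext", "--mutable=auto", "merge"}
}

// demo runs EnableMutableUsr twice against a temporary root and
// returns the target the resulting symlink points to.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "sysext-root-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	if _, err := EnableMutableUsr(root); err != nil {
		return "", err
	}
	link, err := EnableMutableUsr(root) // second call must be a no-op
	if err != nil {
		return "", err
	}
	return os.Readlink(link)
}

func main() {
	target, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("symlink target:", target)
	fmt.Println("then run:", MergeArgs())
}
```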

This way, we can cover Ubuntu with sysexts too, and only Windows is left as a special case.

@AverageMarcus
Member

> and only Windows is left as a special case.

I suspect that will always be the case to some degree so not too worried about that. 😁

@rajaskakodkar
Author

rajaskakodkar commented Nov 30, 2024

This is great, @t-lo

> That sounds like we'd still need Packer to provision machines with each of the OSs we want to build for? Or am I misunderstanding?

We won't need Packer, @AverageMarcus!

I think things will become clearer with a PoC! I am going to find some time to get that out, but I expect some delay due to prior commitments.

@AverageMarcus
Member

Sounds good!

There's no rush. It's also coming up to the holiday season so I don't know about others but I'm going to be away for the next ~1 month anyway 😆

5 participants