Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DockerImage build step #572

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

anti-social
Copy link
Contributor

@anti-social anti-social commented Jan 9, 2022

Implemented:

  • Build step options: registry, image, tag, path, insecure
  • Short form: !DockerImage python, !DockerImage python:3.10-slim
  • Insecure registries
  • Registries that don't support authentication
  • Cache layers; wait if another process is downloading the layer

@anti-social anti-social marked this pull request as draft January 9, 2022 23:39
@anti-social anti-social changed the title [wip] DockerImage build step [wip] DockerImage build step, closes #555 Jan 9, 2022
@anti-social anti-social changed the title [wip] DockerImage build step, closes #555 [wip] DockerImage build step Jan 9, 2022
@anti-social anti-social force-pushed the docker-image branch 3 times, most recently from 06d6039 to 2cfe178 Compare January 10, 2022 22:38
@anti-social
Copy link
Contributor Author

@tailhook Could you review the PR

Copy link
Owner

@tailhook tailhook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a bit tricky async code and the absense of docs look good.

Also did you do any semantic changes in tar file handling, except making it more configurable? It's hard to spot if there is something significant changed.

});

let mut layers_tasks = vec!();
for (digest, tx) in layers_digests.iter().zip(downloaded_layer_txs.into_iter()) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of spawning tasks, it's better to use FuturesUnordered, like this:

    let layers = layers_digests.iter().zip(downloaded_layer_txs.into_iter())
        .map(|(digest, tx)| {
            async move {
               // same code as in task
            }
        })
        .collect::<FuturesUnordered<_>>()
        .try_collect().await?;

This way you don't need channels (results returned right into the collection), and cancellation works seamlesly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way you don't need channels

I need channels to notify an unpack task that a corresponding layer was downloaded. Thus unpacking downloaded layers occurs while other layers are still downloading.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put unpack task just next to the download task into the same future, but yeah, you have to wait that previous layer is unpacked, right? So yes, oneshot is probably a way to ensure that.

@anti-social
Copy link
Contributor Author

Other than a bit tricky async code and the absense of docs look good.

I'm considering not doing а concurrent download of docker layers.

@tailhook
Copy link
Owner

Other than a bit tricky async code and the absense of docs look good.

I'm considering not doing а concurrent download of docker layers.

Why? FuturesUnordered are fun to use, I think you just have to play with it a little bit to make yourself comfortable. Technically you can also use FuturesUnordered like spawn (i.e. use push), but that code might be too imperative :)

@anti-social
Copy link
Contributor Author

anti-social commented Feb 15, 2022

With FuturesUnordered coroutines are launched without an order. So there is a possible situation when the first layer will be downloaded last. As a result an unpacking won't be executed concurrently with layers downloading.

I've tested a concurrent downloading a little. Large images become ready for use faster when their layers are downloaded in a sequence. Especially when the first layer is huge.

@anti-social
Copy link
Contributor Author

anti-social commented Feb 15, 2022

I would rather use mpmc channel for a concurrent layers downloading. But I think it's redundant as the concurrent downloading cannot significantly increase total downloading speed due to limited network bandwidth.

@anti-social anti-social marked this pull request as ready for review February 15, 2022 21:58
@anti-social anti-social changed the title [wip] DockerImage build step DockerImage build step Feb 15, 2022
@tailhook
Copy link
Owner

With FuturesUnordered coroutines are launched without an order. So there is a possible situation when the first layer will be downloaded last. As a result an unpacking won't be executed concurrently with layers downloading.

I've tested a concurrent downloading a little. Large images become ready for use faster when their layers are downloaded in a sequence. Especially when the first layer is huge.

It's the same issue with spawned futures :) Just add a way to sync this. Something like this:

let channels = (0..layers.len()+1).map(|_| oneshot::channel()).collect::<Vec<_>>();
let tx = channels.map(|(tx, _)| tx);
let rx = channels.map(|(tx, _)| rx);
tx.remove(0).send(|()| ()); // Activate first one
layers.zip(tx).zip(rx).map(|((l, tx), rx)| async {
  download().await;
  rx.recv().await;
  unpack().await;
  tx.send(()).await;
}).collect::<FuturesUnordered>().await;

(and maybe also add a semaphore to limit number of simultaneous downloads)

But I think it's redundant as the concurrent downloading cannot significantly increase total downloading speed due to limited network bandwidth.

Downloading in parallel to unpacking, and also setup cost on small files could make it not as clean. But yes, we can try simpler implementation first. Also indicatif is quite useful on big files.


(default ``[localhost]``) A list of docker insecure registries.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is insecure registry? Does it allow no-TLS access?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, insecure registries don't support https.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean can you add a tiny clarification to the docs?

@anti-social
Copy link
Contributor Author

It's the same issue with spawned futures

Yeah, I know the first implementation also could download image layers out of order.

Also indicatif is quite useful on big files.

Progress bar is on the todo list. But I decided to do it later. Guess I can implement it in this PR.

@anti-social
Copy link
Contributor Author

Updated documentation.

We need to know size of a layer to display a downloading progress. But with current dkregistry api it's not possible to get content length of the layer. I'll think about it later.

@anti-social
Copy link
Contributor Author

@tailhook Could you review it and merge if all is OK

@anti-social
Copy link
Contributor Author

The certificate for ubuntu.zerogw.com expired on 4/23/2022.

@@ -26,6 +26,7 @@ of empty container):
* :step:`SubConfig`
* :step:`Container`
* :step:`Tar`
* :step:`DockerImage`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this option really use the "Docker" branding in the name? It'd be confusing for the Podman users as well for anything else coming later. I saw that the tests seem to refer to buildah and the OCI Images. So maybe just play with OCI instead of specific brand names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly just Image would be ok. I don't know any other container standarts.

Another choice: FetchImage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants