Allow use of skopeo's sync command instead of copy #58

Comments
On every task invocation, the skopeo relay runs `skopeo --insecure-policy copy --dest-tls-verify=false --src-creds={user}:{password} docker://docker.io/library/busybox:1.28 docker://172.17.0.1:5000/library/busybox:1.28`. I let the task run twice and checked the log of the local registry (at debug level), watching for repeated transfers. Upon the second task invocation, there is overall much less log activity, and only three requests show up. This all looks like skopeo already avoids re-transferring unchanged layers.
Is there a way to convince it to use `sync` instead?
It's possible, but wouldn't really change anything. Looking at the implementations of the skopeo `copy` and `sync` commands, they appear to use the same underlying image copy function. If overhead is a concern, I think the best would be to first measure traffic volume from/to the involved registries for the first and second sync rounds. That would give us a better feeling for what the numbers actually are.
They may call the same function, but they aren't identical. `copy` always copies the image (fortunately just the metadata if the image already exists), while `sync` will see that the image already exists and not update anything. With a simple alpine image, this is the difference between 76K and 46K of network traffic. I am going to run some comparisons on larger images to see whether that is a set amount per layer and/or how it scales. I am also a bit concerned that Kubernetes might see the image metadata change as an image change and restart pods on a deploy when there isn't a need to. I haven't verified that this is really a concern.
If I look at the sync namespace after running the above sync, the metadata has not been updated and it still shows the modification time from the copy.
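The skip-if-unchanged behavior described here can be illustrated with a small sketch. This is not dregsy's or skopeo's actual code; it only shows the idea: a registry-style manifest digest is computed on the source side and compared with what the destination already holds, and a push happens only when they differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the sha256 digest of a manifest blob, in the
// "sha256:<hex>" form registries use.
func digest(manifest []byte) string {
	sum := sha256.Sum256(manifest)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// shouldPush reports whether the source manifest needs to be pushed,
// i.e. whether its digest differs from what the destination already has.
func shouldPush(srcManifest []byte, destDigest string) bool {
	return digest(srcManifest) != destDigest
}

func main() {
	m := []byte(`{"schemaVersion":2,"layers":[]}`)
	fmt.Println(shouldPush(m, digest(m)))      // false: identical, skip the push
	fmt.Println(shouldPush(m, "sha256:other")) // true: digests differ, push
}
```

With logic like this, an unchanged tag costs only a digest comparison instead of a manifest rewrite, which is consistent with the modification time staying put after a `sync`.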
I just wrote a quick-n-dirty patch that seems to do the right thing. It isn't perfect (there isn't any argument checking, for example), but it is working for me and calling `sync` instead of `copy`.

```yaml
# relay config sections
skopeo:
  # path to the skopeo binary; defaults to 'skopeo', in which case it needs to
  # be in PATH
  binary: skopeo
  # directory under which to look for client certs & keys, as well as CA certs
  # (see note below)
  #certs-dir: /etc/skopeo/certs.d
  # skopeo mode, 'copy' or 'sync'. Defaults to 'copy' if not set
  mode: sync
```
❯ git diff

```diff
diff --git a/internal/pkg/relays/skopeo/skopeo.go b/internal/pkg/relays/skopeo/skopeo.go
index bc3254a..894ac13 100644
--- a/internal/pkg/relays/skopeo/skopeo.go
+++ b/internal/pkg/relays/skopeo/skopeo.go
@@ -33,6 +33,7 @@ const defaultCertsBaseDir = "/etc/skopeo/certs.d"
 
 var skopeoBinary string
 var certsBaseDir string
+var skopeoMode string
 
 //
 func init() {
diff --git a/internal/pkg/relays/skopeo/skopeorelay.go b/internal/pkg/relays/skopeo/skopeorelay.go
index f47a3c0..7f4517b 100644
--- a/internal/pkg/relays/skopeo/skopeorelay.go
+++ b/internal/pkg/relays/skopeo/skopeorelay.go
@@ -33,6 +33,7 @@ const RelayID = "skopeo"
 type RelayConfig struct {
 	Binary   string `yaml:"binary"`
 	CertsDir string `yaml:"certs-dir"`
+	Mode     string `yaml:"mode"`
 }
 
 //
@@ -55,6 +56,11 @@ func NewSkopeoRelay(conf *RelayConfig, out io.Writer) *SkopeoRelay {
 		if conf.CertsDir != "" {
 			certsBaseDir = conf.CertsDir
 		}
+		if conf.Mode != "" {
+			skopeoMode = conf.Mode
+		} else {
+			skopeoMode = "copy"
+		}
 	}
 
 	return relay
@@ -86,11 +92,17 @@ func (r *SkopeoRelay) Sync(srcRef, srcAuth string, srcSkipTLSVerify bool,
 
 	cmd := []string{
 		"--insecure-policy",
-		"copy",
+		skopeoMode,
+	}
+
+	if skopeoMode == "sync" {
+		cmd = append(cmd, "--src=docker")
+		cmd = append(cmd, "--dest=docker")
 	}
 
 	if srcSkipTLSVerify {
 		cmd = append(cmd, "--src-tls-verify=false")
 	}
 	if destSkipTLSVerify {
 		cmd = append(cmd, "--dest-tls-verify=false")
 	}
@@ -126,11 +138,20 @@ func (r *SkopeoRelay) Sync(srcRef, srcAuth string, srcSkipTLSVerify bool,
 	errs := false
 	for _, tag := range tags {
 		log.WithField("tag", tag).Info("syncing tag")
-		if err := runSkopeo(r.wrOut, r.wrOut, verbose, append(cmd,
-			fmt.Sprintf("docker://%s:%s", srcRef, tag),
-			fmt.Sprintf("docker://%s:%s", destRef, tag))...); err != nil {
-			log.Error(err)
-			errs = true
+		if skopeoMode == "copy" {
+			if err := runSkopeo(r.wrOut, r.wrOut, verbose, append(cmd,
+				fmt.Sprintf("docker://%s:%s", srcRef, tag),
+				fmt.Sprintf("docker://%s:%s", destRef, tag))...); err != nil {
+				log.Error(err)
+				errs = true
+			}
+		} else {
+			if err := runSkopeo(r.wrOut, r.wrOut, verbose, append(cmd,
+				fmt.Sprintf("%s:%s", srcRef, tag),
+				destRef)...); err != nil {
+				log.Error(err)
+				errs = true
+			}
 		}
 	}
```
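The per-mode argument handling in the patch above can be seen in isolation with a small sketch. `buildArgs` is a hypothetical helper, not a dregsy function; it only mirrors the refs the patch passes in each mode: `copy` takes a fully qualified `docker://` ref on both sides per tag, while `sync` takes a source ref plus a bare destination repository and the `--src`/`--dest` transport flags.

```go
package main

import "fmt"

// buildArgs assembles skopeo arguments for either mode, mirroring the
// argument shapes used in the patch (hypothetical helper for illustration).
func buildArgs(mode, srcRef, destRef, tag string) []string {
	args := []string{"--insecure-policy", mode}
	if mode == "sync" {
		// sync: transport flags, source ref with tag, bare destination repo
		args = append(args, "--src=docker", "--dest=docker",
			fmt.Sprintf("%s:%s", srcRef, tag), destRef)
	} else {
		// copy: fully qualified docker:// refs on both sides
		args = append(args,
			fmt.Sprintf("docker://%s:%s", srcRef, tag),
			fmt.Sprintf("docker://%s:%s", destRef, tag))
	}
	return args
}

func main() {
	src, dst := "docker.io/library/busybox", "172.17.0.1:5000/library/busybox"
	fmt.Println(buildArgs("copy", src, dst, "1.28"))
	fmt.Println(buildArgs("sync", src, dst, "1.28"))
}
```

Collapsing both branches into one helper like this would also avoid duplicating the `runSkopeo` call and error handling in the relay's tag loop.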
You must have read my thoughts ;-) I was going to create a branch today with the exact same change. I looked again at the image copy function used from both commands.
I did a few measurements myself for both modes.
Overall, we can confirm that layers are not actually copied with each sync round, but rather that destination manifests get rewritten. Adding the skopeo mode setting would be a useful enhancement, so users can opt for `sync`.
Interesting... I ran a quick test with an alpine image and saw about 2x more total traffic with `copy` than with `sync` -- but that was a very small image. I got pulled into a different fire yesterday afternoon, so I didn't get a chance to test with a much larger image. My suspicion is that a larger image with more layers will show almost the same traffic. I'll see if I can get back to that later this morning.
I chose the busybox image because it is very small, just about 1MB, so the overhead would be easier to see. For the second round, I also quickly ran the test suite with skopeo mode set to `sync`.
hi @xelalexv, I was wondering if we can reconsider this issue. I have recently switched over from using sync mode (similar to what @sjthespian has changed in the code) to copy mode, so that I can use tag filtering.

Every time we add a new tag and re-run the golang task, we have to wait for the copy. If this were just me using this tool, I would totally be OK to wait, but this is a tool in the critical path, where multiple folks are sending requests to run dregsy tasks simultaneously, so congestion can last longer than expected. Also, one of the issues I observed when porting from sync to copy is the mappings in copy mode:

in sync mode:

this will achieve the same result of importing the image to internal-registry/library/golang:tag. All in all, would you consider adding sync as an option, to minimize the time it takes to check whether an image has already been imported?
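For illustration, a task with the mapping the commenter describes might look like this. This is a sketch only: the registry names and tags are placeholders, and it assumes dregsy's usual `tasks`/`mappings` config layout.

```yaml
tasks:
  - name: mirror-golang
    source:
      registry: registry-1.docker.io
    target:
      registry: internal-registry
    mappings:
      # with 'copy', each listed tag is copied individually;
      # with 'sync', unchanged tags could be skipped via digest comparison
      - from: library/golang
        to: library/golang
        tags: ['1.19', '1.20']
```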
Making this available as a config option is not complicated as such, I'd say. The real problem, as mentioned above in my last post, is that switching to `sync` changes the mapping behavior.

So it may not be possible to use `sync` in all cases.
I see. In this case, is the best option to just do two separate jobs: request-based tasks, and maintenance tasks? I.e., I would just need to generate a config based on the latest dregsy config, appending the tag filtering for each image when running.
I am using skopeo to sync my registries, but every time it runs it transfers data, even if it has already copied the image in the prior pass. I see the last modified time in the destination registry update on each run. My expectation was that this would work more like rsync -- i.e. only copying data that has changed on each task run.
Even more importantly, I would expect skopeo to copy only layers that have changed. The way it is running now, it looks like it will copy layers that already exist on the destination multiple times if they are used by multiple images.
Is there a way, short of editing the tags list in the yaml file after a successful copy, to only have dregsy sync images that don't exist in the destination?
I am using a local build of dregsy built from the master branch in GitHub and skopeo 1.4.1.
Obfuscated config and log output: