Changed script to python, added S3 and auth support
Morten Rasmussen authored and Morten Rasmussen committed Dec 20, 2018
1 parent 349e808 commit b21c40e
Showing 5 changed files with 232 additions and 138 deletions.
8 changes: 5 additions & 3 deletions Dockerfile
@@ -1,4 +1,6 @@
FROM alpine:3.6
RUN apk add --no-cache curl bash jq
ADD docker-registry-cleanup.sh /docker-registry-cleanup.sh
CMD /docker-registry-cleanup.sh
RUN apk add --no-cache python3 ca-certificates
ADD docker-registry-cleanup.py /docker-registry-cleanup.py
ADD requirements.txt /requirements.txt
RUN pip3 install -r requirements.txt && chmod +x /docker-registry-cleanup.py
CMD python3 /docker-registry-cleanup.py
60 changes: 48 additions & 12 deletions README.md
@@ -15,33 +15,69 @@ A feature request to be able to explicitly garbage collect untagged manifests is
This repo is meant as a workaround until we have the necessary tooling in Docker and registry to handle this without 3rd party tools.

## Usage
Replace `<path-to-registry>` and `<registry-url>` in the commands below. See the *example* below if needed.

To do a dry-run, add `-e DRY_RUN=true`.
### Running against local storage
See the *examples* below if needed.

After running this, run garbage collection on the registry to free up disk space.

#### For a normal http registry:
| Variable name | Required | Description | Example |
| --- | --- | --- | --- |
REGISTRY_URL | Yes | The URL to the registry | `http://example.com:5000/` |
REGISTRY_DIR | No | The path to the registry directory. Not needed when running the Docker container with the registry directory mounted at `/registry` (see examples) | `/registry` |
SELF_SIGNED_CERT | No | Set this if using a self-signed cert | `true` |
REGISTRY_AUTH | No | Set this when the registry uses HTTP basic auth | `username:password` |
DRY_RUN | No | Set this to do a dry-run (i.e. don't delete anything, just show what would be done) | `true` |

#### Examples of running against local storage:
Simplest way:
```
docker run -it -v /home/someuser/registry:/registry -e REGISTRY_URL=http://192.168.77.88:5000 mortensrasmussen/docker-registry-manifest-cleanup
```

To test it without changing anything in your registry:
```
docker run -it -v /home/someuser/registry:/registry -e REGISTRY_URL=http://192.168.77.88:5000 -e DRY_RUN="true" mortensrasmussen/docker-registry-manifest-cleanup
```

With more options:
```
docker run -it -v <path-to-registry>:/registry -e REGISTRY_URL=<registry-url> mortensrasmussen/docker-registry-manifest-cleanup
docker run -it -v /home/someuser/registry:/registry -e REGISTRY_URL=http://192.168.77.88:5000 -e SELF_SIGNED_CERT="true" -e REGISTRY_AUTH="myuser:sickpassword" mortensrasmussen/docker-registry-manifest-cleanup
```

#### For an https registry with self-signed certificates:
### Running against S3 storage
See the *examples* below if needed.

After running this, run garbage collection on the registry to free up disk space.

| Variable name | Required | Description | Example |
| --- | --- | --- | --- |
REGISTRY_URL | Yes | The URL to the registry | `http://example.com:5000/` |
REGISTRY_STORAGE | Yes | Must be set to `S3` to run against S3 storage | `S3` |
ACCESS_KEY | Yes | The S3 access key | `XXXXXXGZMXXXXQMAGXXX` |
SECRET_KEY | Yes | The S3 secret key | `zfXXXXXEbq/JX++XXXAa/Z+ZCXXXXypfOXXXXC/X` |
BUCKET | Yes | The name of the bucket | `registry-bucket-1` |
REGION | Yes | The region in which the bucket is located | `eu-central-1` |
REGISTRY_DIR | No | Only needed if the registry is not stored at the root of the bucket | `/path/to/registry` |
SELF_SIGNED_CERT | No | Set this if using a self-signed cert | `true` |
REGISTRY_AUTH | No | Set this when the registry uses HTTP basic auth | `username:password` |
DRY_RUN | No | Set this to do a dry-run (i.e. don't delete anything, just show what would be done) | `true` |

#### Examples of running against S3 storage
Simplest way:
```
docker run -it -v <path-to-registry>:/registry -e REGISTRY_URL=<registry-url> -e CURL_INSECURE=true mortensrasmussen/docker-registry-manifest-cleanup
docker run -it -e REGISTRY_URL=http://192.168.77.88:5000 -e REGISTRY_STORAGE="S3" -e ACCESS_KEY="XXXXXXGZMXXXXQMAGXXX" -e SECRET_KEY="zfXXXXXEbq/JX++XXXAa/Z+ZCXXXXypfOXXXXC/X" -e BUCKET="registry-bucket-1" -e REGION="eu-central-1" mortensrasmussen/docker-registry-manifest-cleanup
```

#### Dry-run
To test it without changing anything in your registry:
```
docker run -it -v <path-to-registry>:/registry -e REGISTRY_URL=<registry-url> -e DRY_RUN=true mortensrasmussen/docker-registry-manifest-cleanup
docker run -it -e DRY_RUN="true" -e REGISTRY_URL=http://192.168.77.88:5000 -e REGISTRY_STORAGE="S3" -e ACCESS_KEY="XXXXXXGZMXXXXQMAGXXX" -e SECRET_KEY="zfXXXXXEbq/JX++XXXAa/Z+ZCXXXXypfOXXXXC/X" -e BUCKET="registry-bucket-1" -e REGION="eu-central-1" mortensrasmussen/docker-registry-manifest-cleanup
```

#### Example:
With more options:
```
docker run -it -v /home/someuser/registry:/registry -e REGISTRY_URL=http://192.168.50.87:5000 mortensrasmussen/docker-registry-manifest-cleanup
docker run -it -e REGISTRY_URL=http://192.168.77.88:5000 -e REGISTRY_STORAGE="S3" -e ACCESS_KEY="XXXXXXGZMXXXXQMAGXXX" -e SECRET_KEY="zfXXXXXEbq/JX++XXXAa/Z+ZCXXXXypfOXXXXC/X" -e BUCKET="registry-bucket-1" -e REGION="eu-central-1" -e SELF_SIGNED_CERT="true" -e REGISTRY_AUTH="myuser:sickpassword" mortensrasmussen/docker-registry-manifest-cleanup
```

## License
This project is distributed under [Apache License, Version 2.0.](LICENSE)

Copyright © 2017 Morten Steen Rasmussen
Copyright © 2018 Morten Steen Rasmussen
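The script reads the environment variables from the README tables at startup. As an illustration, the sketch below gathers them into a single object. It is a minimal, hypothetical helper (`Settings` and `load_settings` are not part of the script) that assumes the same defaults the script uses: `/registry` for local storage and `/` on S3. Two behaviours are choices of this sketch rather than of the script: it splits `REGISTRY_AUTH` on the first colon only, so passwords may contain `:`, and it strips a trailing slash from `REGISTRY_URL` because `/v2/` is appended to it.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class Settings:
    # Hypothetical container for the variables listed in the README tables
    registry_url: str
    registry_dir: str
    storage_on_s3: bool
    dry_run: bool
    verify_cert: bool
    auth: Optional[Tuple[str, str]]
    s3: Optional[Mapping[str, str]]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping (e.g. os.environ).

    Raises KeyError naming the first missing required variable.
    """
    storage_on_s3 = env.get("REGISTRY_STORAGE") == "S3"
    s3 = None
    if storage_on_s3:
        # All four S3 variables are required when REGISTRY_STORAGE=S3
        s3 = {name: env[name] for name in ("ACCESS_KEY", "SECRET_KEY", "BUCKET", "REGION")}
    auth = None
    if "REGISTRY_AUTH" in env:
        # Split on the first colon only, so the password may itself contain ":"
        user, password = env["REGISTRY_AUTH"].split(":", 1)
        auth = (user, password)
    return Settings(
        # Assumption of this sketch: drop a trailing "/" before "/v2/" is appended
        registry_url=env["REGISTRY_URL"].rstrip("/"),
        registry_dir=env.get("REGISTRY_DIR", "/" if storage_on_s3 else "/registry"),
        storage_on_s3=storage_on_s3,
        dry_run=env.get("DRY_RUN") == "true",
        # As in the script, the mere presence of SELF_SIGNED_CERT disables verification
        verify_cert="SELF_SIGNED_CERT" not in env,
        auth=auth,
        s3=s3,
    )
```

Called as `load_settings(os.environ)`, a missing required variable surfaces as a `KeyError` that can be reported the same way the script reports its own missing variables.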
173 changes: 173 additions & 0 deletions docker-registry-cleanup.py
@@ -0,0 +1,173 @@
import glob
import urllib3
from requests.auth import HTTPBasicAuth
import requests
import json
import re
import os
import boto
from boto.s3.key import Key

############################
######## Functions #########
############################
def exit_with_error(message):
    print(message)
    print("Exiting")
    exit(1)

# Initial setup
try:
    if "DRY_RUN" in os.environ and os.environ['DRY_RUN'] == "true":
        dry_run_mode = True
        print("Running in dry-run mode. No changes will be made.")
        print()
    else:
        dry_run_mode = False
    if "REGISTRY_STORAGE" in os.environ and os.environ['REGISTRY_STORAGE'] == "S3":
        print("Running against S3 storage")
        storage_on_s3 = True
        s3_access_key = os.environ['ACCESS_KEY']
        s3_secret_key = os.environ['SECRET_KEY']
        s3_bucket = os.environ['BUCKET']
        s3_region = os.environ['REGION']
        if "REGISTRY_DIR" in os.environ:
            registry_dir = os.environ['REGISTRY_DIR']
        else:
            registry_dir = "/"
    else:
        print("Running against local storage")
        storage_on_s3 = False
        if "REGISTRY_DIR" in os.environ:
            registry_dir = os.environ['REGISTRY_DIR']
        else:
            registry_dir = "/registry"
    registry_url = os.environ['REGISTRY_URL']
except KeyError as e:
    exit_with_error("Missing environment variable: %s" % (e))

# Optional vars
if "REGISTRY_AUTH" in os.environ:
    # Split on the first colon only, so the password may itself contain ":"
    auth_user, auth_password = os.environ["REGISTRY_AUTH"].split(":", 1)
    registry_auth = HTTPBasicAuth(auth_user, auth_password)
else:
    registry_auth = None
if "SELF_SIGNED_CERT" in os.environ:
    cert_verify = False
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
else:
    cert_verify = True

# Check connection to registry
try:
    r = requests.get("%s/v2/" % (registry_url), auth=registry_auth, verify=cert_verify)
    if r.status_code == 401:
        exit_with_error("Got an authentication error connecting to the registry. Check credentials, or add REGISTRY_AUTH='username:password'")
except requests.exceptions.SSLError as e:
    exit_with_error("Got an SSLError connecting to the registry. It might be a self-signed cert; if so, set SELF_SIGNED_CERT=true")
except requests.exceptions.RequestException as e:
    exit_with_error("Could not contact registry at %s - error: %s" % (registry_url, e))

# Set variables
repo_dir = registry_dir + "/docker/registry/v2/repositories"
blob_dir = registry_dir + "/docker/registry/v2/blobs"
all_manifests = set()
linked_manifests = set()
linked_manifest_files = set()
file_list = set()
if storage_on_s3:
    bucket_size = 0

    # Connect to bucket
    conn = boto.s3.connect_to_region(s3_region, aws_access_key_id=s3_access_key, aws_secret_access_key=s3_secret_key)
    bucket = conn.get_bucket(s3_bucket)
    s3_file_list = bucket.list()

    # Get all the filenames in the bucket as well as their total size
    for key in s3_file_list:
        bucket_size += key.size
        file_list.add(key.name)
else:
    # Local storage
    for filename in glob.iglob("%s/**" % (registry_dir), recursive=True):
        if os.path.isfile(filename):
            file_list.add(filename)

for filename in file_list:
    if filename.endswith("link"):
        if "_manifests/revisions/sha256" in filename:
            all_manifests.add(re.sub('.*docker/registry/v2/repositories/.*/_manifests/revisions/sha256/(.*)/link','\\1',filename))
        elif "_manifests/tags/" in filename and filename.endswith("/current/link"):
            linked_manifest_files.add(filename)

#fetch linked_manifest_files
for filename in linked_manifest_files:
    if storage_on_s3:
        k = Key(bucket)
        k.key = filename

        # Get the shasum from the link file
        shasum = k.get_contents_as_string().decode().split(":")[1]

        # Get the manifest JSON to check if it's a manifest list
        k.key = "%s/sha256/%s/%s/data" % (blob_dir, shasum[0:2], shasum)
        manifest = json.loads(k.get_contents_as_string().decode())

    else:
        with open(filename, 'r') as f:
            shasum = f.read().split(":")[1]
        with open("%s/sha256/%s/%s/data" % (blob_dir, shasum[0:2], shasum)) as f:
            manifest = json.load(f)

    # The tagged manifest (or manifest list) itself is in use
    linked_manifests.add(shasum)

    # Schema 1 manifests have no top-level mediaType, hence .get()
    if manifest.get("mediaType") == "application/vnd.docker.distribution.manifest.list.v2+json":
        # Also keep every platform manifest the list references. Those digests
        # look like "sha256:<hex>", so strip the prefix to match all_manifests.
        for mf in manifest["manifests"]:
            linked_manifests.add(mf["digest"].split(":")[1])

unused_manifests = all_manifests - linked_manifests

if len(unused_manifests) == 0:
    print("No manifests without tags found. Nothing to do.")
    if storage_on_s3:
        print("For reference, the size of the bucket is currently: %s bytes" % (bucket_size))
else:
    print("Found " + str(len(unused_manifests)) + " manifests without tags. Deleting")
    # Counters
    current_count = 0
    cleaned_count = 0
    failed_count = 0
    total_count = len(unused_manifests)

    for manifest in unused_manifests:
        current_count += 1
        status_msg = "Cleaning %s of %s" % (current_count, total_count)
        if dry_run_mode:
            status_msg += " ..not really, due to dry-run mode"
        print(status_msg)

        # Find every repo that links this manifest revision
        repos = set()
        for file in file_list:
            if "_manifests/revisions/sha256/%s" % (manifest) in file and file.endswith("link"):
                repos.add(re.sub(".*docker/registry/v2/repositories/(.*)/_manifests/revisions/sha256.*", "\\1", file))

        for repo in repos:
            if dry_run_mode:
                print("DRY_RUN: Would have run an HTTP DELETE request to %s/v2/%s/manifests/sha256:%s" % (registry_url, repo, manifest))
            else:
                r = requests.delete("%s/v2/%s/manifests/sha256:%s" % (registry_url, repo, manifest), auth=registry_auth, verify=cert_verify)
                if r.status_code == 202:
                    cleaned_count += 1
                else:
                    failed_count += 1
                    print("Failed to clean manifest %s from repo %s with response code %s" % (manifest, repo, r.status_code))

    print("Job done. Cleaned %s of %s manifests." % (cleaned_count, total_count))
    print()
    print()
    if storage_on_s3:
        print("For reference, the size of the bucket before this run was: %s bytes" % (bucket_size))
    print()
    print("Please run a garbage-collect on the registry now to free up disk space.")
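At its core the script computes a set difference: every manifest revision linked in any repository, minus the digests reachable from a tag's `current/link`, where a tagged manifest list also keeps alive the platform manifests it references. The sketch below isolates that step from the filesystem, S3 and HTTP so it can run on in-memory data. The names `find_unused_manifests`, `bare` and `delete_urls` are hypothetical; the path patterns and the DELETE URL format are taken from the script. Like the script, it compares digests globally across repositories, and it normalises every digest to bare hex so the `sha256:` prefix on manifest-list entries does not prevent a match.

```python
import json
import re
from typing import Dict, Iterator, Set, Tuple

MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"
# Same path shapes the cleanup script matches against
REVISION_RE = re.compile(
    r".*docker/registry/v2/repositories/(.*)/_manifests/revisions/sha256/([0-9a-f]+)/link$"
)
TAG_RE = re.compile(r".*docker/registry/v2/repositories/.*/_manifests/tags/[^/]+/current/link$")


def bare(digest: str) -> str:
    """Turn 'sha256:<hex>' (or plain '<hex>') into '<hex>'."""
    return digest.split(":", 1)[-1].strip()


def find_unused_manifests(files: Dict[str, str], blobs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each untagged manifest digest to the repositories that still link it.

    files: storage path -> content of each link file.
    blobs: bare hex digest -> manifest JSON text.
    """
    revisions: Dict[str, Set[str]] = {}
    tagged: Set[str] = set()
    for path, content in files.items():
        match = REVISION_RE.match(path)
        if match:
            revisions.setdefault(match.group(2), set()).add(match.group(1))
        elif TAG_RE.match(path):
            digest = bare(content)
            tagged.add(digest)  # the tagged manifest itself is in use
            manifest = json.loads(blobs[digest])
            # Schema 1 manifests have no top-level mediaType, hence .get()
            if manifest.get("mediaType") == MANIFEST_LIST:
                tagged.update(bare(entry["digest"]) for entry in manifest["manifests"])
    return {d: repos for d, repos in revisions.items() if d not in tagged}


def delete_urls(registry_url: str, unused: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (repo, URL) pairs for the DELETE requests the script would issue."""
    for digest, repos in sorted(unused.items()):
        for repo in sorted(repos):
            yield repo, "%s/v2/%s/manifests/sha256:%s" % (registry_url, repo, digest)
```

In the script, `files` would come from the local `glob` walk or the S3 bucket listing, and `blobs` from the `blobs/sha256/<first two hex chars>/<digest>/data` objects.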
123 changes: 0 additions & 123 deletions docker-registry-cleanup.sh

This file was deleted.

6 changes: 6 additions & 0 deletions requirements.txt
@@ -0,0 +1,6 @@
boto==2.49.0
certifi==2018.11.29
chardet==3.0.4
idna==2.8
requests==2.21.0
urllib3==1.24.1
