The Ollama registry behaves somewhat like a Docker registry, but is not fully compatible with one, so ordinary Docker pull-through caches do not work with it. This project provides a pull-through cache that you can deploy in your local network to speed up pull times from the Ollama registry.
- Install Go
- Clone the repo
- Build the binary:
  ```sh
  go build -o proxy main.go
  ```
- Execute the binary:
  ```sh
  ./proxy
  ```
- The cache will be available at `http://localhost:9200`
- Run via Docker:
  ```sh
  docker run -p 9200:9200 -v ./cache_dir_docker:/pull-through-cache ghcr.io/beans-bytes/ollama-registry-pull-through-proxy:latest
  ```
  This mounts the local folder `cache_dir_docker` to the cache directory of the container, which will hold the cached Ollama files.
- The cache will be available at `http://localhost:9200`
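To check that the proxy is up, you can request a manifest through it. This assumes the proxy mirrors the upstream registry's v2 manifest paths; the model name here is just an example:

```sh
curl http://localhost:9200/v2/library/llama3/manifests/latest
```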
Pulling through the cache is quite easy: just prepend `http://localhost:9200/library/` to the image you want to run or pull.

This:

```sh
ollama pull <image>:<tag>
```

becomes:

```sh
ollama pull --insecure http://localhost:9200/library/<image>:<tag>
```

Note: As the proxy runs on a non-HTTPS endpoint, we need to add the `--insecure` flag to the command.
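For example (the model name and tag are just an illustration):

```sh
ollama pull --insecure http://localhost:9200/library/llama3:latest
```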
P.S. Please give a thumbs up on the PR linked below (ollama/ollama#5241), so that the default registry of the Ollama client can be overridden to use the cache. That will look nicer and work better.
This proxy is based on a worker architecture. It has a worker pool whose size can be configured via the `NUM_DOWNLOAD_WORKERS` environment variable.

When a request comes in, the proxy checks whether the file is already in the cache. If not, it marks the file as queued and serves the request from upstream. In the background, a worker picks the file from the queue and downloads it. From then on, the file is served from the local cache instead of from upstream. A sketch of this flow follows below.
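A minimal sketch of that flow, not the actual implementation: the type and function names (`proxy`, `handle`, `worker`) are illustrative, and the real proxy additionally streams the upstream response back to the client, which is omitted here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// job is a file a worker should download from the upstream registry.
type job struct{ path string }

// proxy sketches the cache/queue behavior: a shared queue channel plus
// a set of already-queued paths, guarded by a mutex.
type proxy struct {
	cacheDir string
	queue    chan job
	mu       sync.Mutex
	queued   map[string]bool
}

// handle serves a request from the local cache if the file exists;
// otherwise it queues a background download and serves from upstream.
func (p *proxy) handle(path string) {
	if _, err := os.Stat(filepath.Join(p.cacheDir, path)); err == nil {
		fmt.Println("cache hit, serving locally:", path)
		return
	}
	p.mu.Lock()
	alreadyQueued := p.queued[path]
	p.queued[path] = true
	p.mu.Unlock()
	if !alreadyQueued {
		p.queue <- job{path: path}
	}
	fmt.Println("cache miss, serving from upstream:", path)
}

// worker drains the queue; NUM_DOWNLOAD_WORKERS of these run concurrently.
func (p *proxy) worker(id int) {
	for j := range p.queue {
		// The real worker downloads from UPSTREAM_ADDRESS and writes
		// the file into CACHE_DIR; here we only log.
		fmt.Printf("worker %d: downloading %s into cache\n", id, j.path)
	}
}

func main() {
	p := &proxy{
		cacheDir: "./cache_dir",
		queue:    make(chan job, 64),
		queued:   make(map[string]bool),
	}
	for i := 0; i < 2; i++ { // e.g. NUM_DOWNLOAD_WORKERS=2
		go p.worker(i)
	}
	p.handle("library/llama3/blobs/sha256-abc") // miss: queued + upstream
	p.handle("library/llama3/blobs/sha256-abc") // still a miss, but not re-queued
	time.Sleep(100 * time.Millisecond)          // let the workers finish (demo only)
}
```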
| Environment Variable | Description | Default |
|---|---|---|
| `PORT` | The port the proxy listens on | `9200` |
| `DUMP_UPSTREAM_REQUESTS` | Whether the proxy should dump the upstream requests | `false` |
| `CACHE_DIR` | Directory where the cache is stored | `./cache_dir` |
| `NUM_DOWNLOAD_WORKERS` | Number of workers that download the files | `1` |
| `MANIFEST_LIFETIME` | The lifetime of a cached model manifest. Manifests change from time to time on the registry, and we want to pick up updates | `240h` (10 days) |
| `UPSTREAM_ADDRESS` | The upstream Ollama registry | `https://registry.ollama.ai/` |
| `LOG_LEVEL` | The log level of the proxy | `info` |
| `LOG_FORMAT_JSON` | Whether logs should be emitted in JSON format; useful for external log tools (e.g. when running in Kubernetes) | `false` |
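For example, to run the Docker image with a larger worker pool, a shorter manifest lifetime, and JSON logs (the values here are illustrative; `MANIFEST_LIFETIME` appears to use Go-style duration strings, as in the `240h` default):

```sh
docker run -p 9200:9200 \
  -v ./cache_dir_docker:/pull-through-cache \
  -e NUM_DOWNLOAD_WORKERS=4 \
  -e MANIFEST_LIFETIME=72h \
  -e LOG_FORMAT_JSON=true \
  ghcr.io/beans-bytes/ollama-registry-pull-through-proxy:latest
```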
Feel free to create issues and PRs. The project is tiny as of now, so there are no dedicated contribution guidelines.
Disclaimer: This is a side project. Don't expect any fast responses on anything.
- It is a fix for ollama/ollama#914 (comment)
- To make its behavior work better, we would need this PR merged: ollama/ollama#5241
MIT