Lots of machines in cloud but only a few in docker-machine #115

Open
buffcode opened this issue Nov 22, 2023 · 3 comments

Comments

@buffcode

We are currently running 4.1.0 (I will upgrade later today), and for several versions now we have had the problem that docker-machine creates servers but somehow fails to keep track of them.

I recently deleted about 30 servers in Hetzner Cloud by hand that docker-machine ls no longer listed (if it ever did), although they were definitely created this way.

We are using docker-machine to spin up cloud runners for GitLab CI, so every runner has a fixed prefix and is easily recognizable.

Is there a way to sync docker-machine with Hetzner Cloud so that these servers get picked up again? Or a way for docker-machine to recognize these unmanaged machines and remove them? They are eating into our resource limits and our bills as well :)
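Since every runner shares a fixed name prefix, one manual way to spot such servers is a set difference: names that exist in Hetzner Cloud with the runner prefix but are missing from docker-machine's list. The sketch below is a minimal illustration, not a feature of docker-machine or the driver. It assumes both name lists have already been exported (for example from the Hetzner Cloud console and from docker-machine ls -q), and the third server name in the example data is made up.

```python
def find_orphans(cloud_servers, machine_names, prefix):
    """Return cloud server names with `prefix` that docker-machine does not list."""
    known = set(machine_names)
    return sorted(
        name for name in cloud_servers
        if name.startswith(prefix) and name not in known
    )


if __name__ == "__main__":
    # Example data; the last runner name is hypothetical.
    cloud = [
        "runner-ovfjcph1-runner-1700632818-61a5a758",
        "runner-ovfjcph1-runner-1700639271-6e1e6488",
        "runner-ovfjcph1-runner-1700640000-deadbeef",
        "unrelated-db-server",  # no runner prefix, so it is never reported
    ]
    local = [
        "runner-ovfjcph1-runner-1700632818-61a5a758",
        "runner-ovfjcph1-runner-1700639271-6e1e6488",
    ]
    for name in find_orphans(cloud, local, "runner-ovfjcph1-"):
        print(name)  # prints runner-ovfjcph1-runner-1700640000-deadbeef
```

Review the result before deleting anything: the two lists are taken at different moments and machine creation is racy, so a runner that is still being set up could show up here.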

Which logs could I provide to help debug this? The problem usually builds up over several weeks rather than showing up daily.

@buffcode
Author

After upgrading to 5.0.1 and creating all of the missing servers:

runner-ovfjcph1-runner-1700632818-61a5a758   -        hetzner   Error                                         Unknown    could not execute drivers.MustBeRunning: could not get server by ID: limit of 5000 requests per hour for XXXX:XXXX:c0c:b1cc::1 reached (rate_limit_exceeded)
runner-ovfjcph1-runner-1700639271-6e1e6488   -        hetzner   Error                                         Unknown    could not execute drivers.MustBeRunning: could not get server by ID: limit of 3600 requests per hour reached (rate_limit_exceeded)

Could this also affect which machines and states each side knows about?

@buffcode
Author

Now that the API is accessible again, I can confirm that docker-machine and Hetzner Cloud are out of sync.

Docker reports 19 servers while Hetzner currently has 42 servers.

@JonasProgrammer
Owner

Hi,

sorry for only getting back to you now; I was dealing with some medical issues.

It is indeed possible for Hetzner and the driver to get out of sync. docker-machine implements a rather basic RPC protocol, and the server creation logic boils down to a best-effort pre-create check (which tries to ensure that machine creation will succeed), the actual creation, and then waiting for the machine to come up.
Depending on which step fails, docker-machine may conclude that the machine was never created and remove its files; the driver, on the other hand, only tears the server down when something fails during its own creation steps.
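To make this failure mode concrete, here is a toy model of the behaviour described above. It is not the driver's or docker-machine's actual code, and the step names are hypothetical: the driver cleans up only failures inside its own steps, while docker-machine forgets the machine whenever any step fails. A failure after the driver has returned, modelled here as an assumed "provision" step, leaves a server that exists in the cloud but not locally.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical step names; failures in DRIVER_STEPS are cleaned up by the driver.
DRIVER_STEPS = ("precreate", "create", "wait")
LATER_STEPS = ("provision",)


@dataclass
class World:
    """Toy model of the two sides that can drift apart."""
    cloud: set = field(default_factory=set)  # servers that exist at Hetzner
    local: set = field(default_factory=set)  # machines docker-machine remembers


def create_machine(world: World, name: str, fail_at: Optional[str] = None) -> bool:
    """Simulate one creation attempt; return True if it succeeded."""
    world.local.add(name)
    for step in DRIVER_STEPS + LATER_STEPS:
        if step == "create":
            world.cloud.add(name)  # from here on the server exists (and is billed)
        if step == fail_at:
            if step in DRIVER_STEPS:
                world.cloud.discard(name)  # driver tears down within its own steps
            world.local.discard(name)  # docker-machine forgets the machine either way
            return False
    return True


if __name__ == "__main__":
    world = World()
    create_machine(world, "runner-1")                       # succeeds
    create_machine(world, "runner-2", fail_at="wait")       # cleaned up on both sides
    create_machine(world, "runner-3", fail_at="provision")  # leaves an orphan
    print(sorted(world.cloud - world.local))  # prints ['runner-3']
```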

Unfortunately the setup process is wonky and inherently racy. There are some options to configure retry behavior, intended specifically for dealing with rate limiting, but nothing is guaranteed. The best I can recommend is to check the servers manually after an abnormal creation failure, perhaps tagging them beforehand so they are easier to identify.
I run into this problem myself when I terminate docker-machine prematurely during development, which sometimes leaves resources (including running servers) behind. It's annoying, but so far the manual approach described above is the best I have come up with.
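The driver's own retry options are not shown here. As a generic illustration of the technique such options typically rely on, the sketch below retries a call with capped exponential backoff and random jitter when it hits a rate limit. RateLimitExceeded, with_backoff and the fake server lookup are hypothetical names, not part of the driver or of Hetzner's client libraries.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for an API error with code rate_limit_exceeded."""


def with_backoff(call: Callable[[], T], attempts: int = 5, base: float = 1.0,
                 cap: float = 60.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on rate limiting, doubling the maximum wait each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except RateLimitExceeded:
            delay = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out concurrent runners
    return call()  # last attempt: let any error propagate to the caller


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_get_server():
        # Hypothetical lookup that is rate-limited twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitExceeded("limit of 3600 requests per hour reached")
        return {"id": 42, "status": "running"}

    print(with_backoff(flaky_get_server, sleep=lambda s: None), calls["n"])
```

Backoff only spreads requests out over time; every attempt that reaches the API still counts against the hourly limits quoted in the logs above.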
