Container creation sometimes fails in the 32-bit ARM test job #1780
Comments
This was referenced Jan 18, 2025
Merged
Thanks for keeping track of this! My hope is that over time this issue will go away - the runners are still new and maybe the tracking software they have still has some shortcomings, growing pains if you will.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Jan 24, 2025
In the AArch64/ARM64 (64-bit, non-containerized) test-fast job, this uses the `ubuntu-22.04-arm` runner instead of the `ubuntu-24.04-arm` runner. This is to avoid the errors described in GitoxideLabs#1790, i.e., to work around rust-lang/rust#135867.
Such problems have not been observed on the 22.04 runner, including in tests intended to find them, and switching to it seems to be a complete workaround for the problem. In contrast, continuing to use the 24.04 runner, but attempting to work around the problem by switching from the stable to the beta channel, looks like it would greatly decrease the frequency of the errors but not eliminate them.
A problem with `actions/checkout` failing is likewise observed on the 24.04 runner only, so using 22.04 avoids that too.
Because that seems like a complete workaround, this also reverts 50da7cb (GitoxideLabs#1792). That is to say that the ARM64 test-fast job is again in the `test-fast` matrix. It is capable of cancelling or being cancelled by the other `test-fast` checks. Code duplication in the workflow is somewhat decreased. The job will again block PR auto-merge.
Similar errors do not seem to have occurred in the `test-32bit` job that runs an arm32v7 Docker image in `ubuntu-24.04-arm`, and it is not clear that changing the runner image would help with GitoxideLabs#1780, nor even if that issue is still happening. Therefore, it is not changed there at this time.
This affects only ARM Linux runners. The x86-64 runners continue to use `ubuntu-latest`, which is currently resolved to `ubuntu-24.04`, and that does not need to be changed. Likewise, the `macos-latest` runners use ARM processors (Apple Silicon) and they are fine.
Various experiments were done in a separate workflow. This commit also removes that workflow, because it is not actively needed anymore, and because, if kept, it would have to be modified to avoid running hundreds of extra checks on each and every push.
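
The runner change described in this commit message could look roughly like the fragment below. This is only a sketch under assumptions: the matrix key name `os` and the checkout step are illustrative, and the real `test-fast` matrix may list other runners that are not mentioned here. Only the runner labels come from the commit message.

```yaml
# Hypothetical sketch of a test-fast matrix; the key name `os` is an assumption.
jobs:
  test-fast:
    strategy:
      matrix:
        os:
          - ubuntu-latest      # x86-64, currently resolves to ubuntu-24.04
          - ubuntu-22.04-arm   # ARM64, pinned to 22.04 instead of ubuntu-24.04-arm
          - macos-latest       # Apple Silicon
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
```

Keeping the ARM64 runner inside the same matrix, rather than in a separate job, is what lets it share cancellation behavior with the other `test-fast` checks.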
Current behavior 😯
One of the changes in #1777 was to add an arm32v7 test job that runs in a container on the new arm64 runner (cbe3793, fbc27b5), analogous to the preexisting i386 test job that runs in a container on an amd64 runner. It looks like this may be brittle, with container creation failing from time to time. This is the failure noted in #1778 (comment).
This resembles, and is probably identical to, some failures I had seen when testing #1777, which I had erroneously assumed (or hoped) were due to a hiccup in infrastructure rather than a persistent problem. This issue tracks that in case it is a persistent problem, which seems likely. If it happens again and no fix is apparent, I can revert the parts of #1777 that are about 32-bit testing, while keeping 87387c2 from it, which does not seem to have had any problems.
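
For orientation, a containerized 32-bit job of the kind described above might look roughly like the following. This is a minimal sketch, not the workflow from #1777: the job name, the matrix key names, and the `debian:stable-slim` image tag are assumptions made for illustration. Only the `i386` and `arm32v7` image namespaces and the runner labels are taken from the discussion.

```yaml
# Hypothetical sketch; job name, matrix keys, and image tag are assumptions.
jobs:
  test-32bit:
    strategy:
      matrix:
        include:
          - runner-os: ubuntu-latest      # amd64 host
            container-arch: i386
          - runner-os: ubuntu-24.04-arm   # arm64 host
            container-arch: arm32v7
    runs-on: ${{ matrix.runner-os }}
    # The runner creates this container while setting up the job,
    # before any of the steps below are run.
    container: ${{ matrix.container-arch }}/debian:stable-slim
    steps:
      - name: Show the machine type reported inside the container
        run: uname -m
```

Because the container is created during job setup, a failure at that point cannot be caught or retried by anything in `steps`.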
Something that probably isn't the cause
It is possible for a 64-bit ARM processor not to be capable of natively executing 32-bit ARM instructions--unlike 64-bit and 32-bit x86, this capability is not universal. When that happens, if `binfmt_misc` is configured to provide emulation via QEMU, a container of the incompatible architecture can still be run, but it will run much slower and some things may not work. However, while that was an early concern I had, as far as I can tell from the error that does not seem to be a factor here. Furthermore, in another repository, I checked in a reverse shell that no such architecture was enabled in `binfmt_misc` (EliahKagan/arm@496d9c1), and also even tried turning off `binfmt_misc` (EliahKagan/arm@efa15ff), and a 32-bit ARM binary was still able to run.
Expected behavior 🤔
The container specified in the `container:` key should start up at least as reliably in jobs on the ARM runner as other runners.
Git behavior
Not directly applicable, but Git does test on various platforms. Cursory inspection of the `runs-on` keys in this workflow suggests Git may not be using the new `ubuntu-24.04-arm` or `ubuntu-22.04-arm` GHA runners at this time.
Steps to reproduce 🕹
I'm unsure what factors trigger this, or if it is effectively random. It seems likely that it will happen again, but I'm not certain, so I'm opening this issue rather than immediately changing the workflow.
When I was working on the PR, I think it happened most often when I had two pushes separated by a very short time. My first thought was that it might have to do with caching. That is implausible, though, at least with respect to the caching of Rust dependencies that we are doing, because the failure happens much earlier, when the GitHub Actions runner software runs Docker to set up the job, before any steps of the job have begun.
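
Since the failure occurs during job setup, steps inside the affected job cannot observe it. One way to gather context would be a separate, non-containerized job on the same runner that inspects `binfmt_misc` and starts a 32-bit ARM container directly. The fragment below is a hypothetical sketch, not part of the project's workflow: the job name, step names, and the `arm32v7/debian:stable-slim` image are assumptions, and it is not known whether running Docker this way would reproduce the failure.

```yaml
# Hypothetical diagnostic job; all names here are illustrative assumptions.
jobs:
  inspect-arm-runner:
    runs-on: ubuntu-24.04-arm
    steps:
      - name: List binfmt_misc handlers (emulation entries would appear here)
        run: ls -l /proc/sys/fs/binfmt_misc || true
      - name: Show Docker and host details
        run: docker info
      - name: Run a 32-bit ARM container directly
        run: docker run --rm --platform linux/arm/v7 arm32v7/debian:stable-slim uname -m
```

If the last step succeeds while no QEMU handler is listed in the first, that would be consistent with the observation above that 32-bit ARM binaries run natively on this runner.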