We are running into an issue where, very infrequently, Docker containers running inside our Sysbox container on AWS EC2 cannot perform any operations on their filesystem.
The following reproduces the issue roughly one in ten thousand times:
docker run --runtime=sysbox-runc --hostname=syscont nestybox/alpine-docker:latest /bin/sh -c "dockerd & until docker version > /dev/null 2>&1; do sleep 1; done && docker --version && docker pull alpine:3.20 && docker run --pull never alpine:3.20 chmod 03775 /home"
When the issue is reproduced, chmod 03775 /home fails with chmod: /home: No such file or directory.
A full reproduction
Launch an Ubuntu 24 EC2 instance and run the following as root to set up the system to use sysbox/docker:
Then run the following to build the inner Docker container:
Then run:
This will run a loop inside the Sysbox container. As stated before, it fails roughly 1 in 10000 times, although the failures are very sporadic.
Factors we've considered
We've reproduced this issue using various versions of Docker 26 and Docker 27, on both Ubuntu 22.04 and Ubuntu 24.04. We've reproduced it using sysbox-runc directly and using docker run --runtime sysbox-runc. We've seen it fail with both ext4 and xfs filesystems outside the Sysbox container. We haven't seen it fail when running Docker outside of Sysbox. We've seen it fail the first time the inner Docker container runs on a machine, so it isn't caused by the repeated invocation of Docker inside the container. However, we tried to reproduce it on GitHub Actions using the following job and have not been able to, so it may be specific to the Linux kernel on AWS's Ubuntu AMIs:
jobs:
  repro:
    runs-on: ubuntu-latest
    steps:
      - run: |
          wget https://downloads.nestybox.com/sysbox/releases/v0.6.5/sysbox-ce_0.6.5-0.linux_amd64.deb
          docker rm $(docker ps -a -q) -f || true
          sudo apt-get install ./sysbox-ce_0.6.5-0.linux_amd64.deb
          sudo systemctl status sysbox -n20
          docker run --runtime=sysbox-runc --hostname=syscont nestybox/alpine-docker:latest /bin/sh -c "dockerd & until docker version > /dev/null 2>&1; do sleep 1; done && docker --version && docker pull alpine:3.20 && i=1; while [ \$i -le 10000 ]; do echo \$i && docker run --pull never alpine:3.20 chmod 03775 /home && sleep 0.4; i=\$((i + 1)); done"
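Note that the loop in that job keeps iterating after a failed docker run. For anyone trying to catch the failure in the act, a variant that stops at the first failing iteration may be more useful, since it leaves the broken state in place for inspection. The following is only a sketch in POSIX sh; `run_until_failure` is a hypothetical helper name, not part of Sysbox or Docker.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sysbox or Docker): run a command up to
# MAX times and stop at the first non-zero exit status.
# Usage: run_until_failure MAX command [args...]
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            # Report which iteration failed, then stop so nothing is cleaned up.
            echo "iteration $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

Inside the Sysbox container, once dockerd is up and alpine:3.20 has been pulled, this could be invoked as `run_until_failure 10000 docker run --pull never alpine:3.20 chmod 03775 /home`.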
Other things we know
We've done some investigation inside the Sysbox container when this happens. It appears the overlayfs directory that Docker creates is totally broken in these cases. For example, in one case where the issue occurred with the postgres Docker image, we ran
sudo ls /var/lib/docker/overlay2/e691c7a9a4426ded12e56cb89d599758efbecdd32154234305649f1625080519/merged/var/run
and saw there was a postgresql directory inside of that directory, as expected. However, trying to write a file to that postgresql directory and trying to move that postgresql directory both yielded No such file or directory errors, both from inside the Docker container and from outside the Docker container.
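A check along these lines can be scripted so that the same probe is run on every suspect directory. This is a rough sketch only: `probe_dir` is a hypothetical helper, and it renames a scratch file inside the directory rather than moving the directory itself, so it approximates the manual test described above without being identical to it.

```shell
#!/bin/sh
# Hypothetical helper: check whether a directory that shows up in a listing
# can actually be written to. Prints "ok" or "broken: <step>".
probe_dir() {
    dir=$1
    probe="$dir/.probe.$$"   # scratch file name, unique per shell process
    if [ ! -d "$dir" ]; then
        echo "broken: not a directory"
        return 1
    fi
    if ! touch "$probe" 2>/dev/null; then
        echo "broken: create"
        return 1
    fi
    if ! mv "$probe" "$probe.moved" 2>/dev/null; then
        rm -f "$probe"
        echo "broken: rename"
        return 1
    fi
    rm -f "$probe.moved"
    echo "ok"
}
```

For the case above it would be pointed at the postgresql directory under the container's merged overlay2 path, from both inside and outside the inner Docker container.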
@TAGraves, thanks for the detailed description. Question: during problem reproduction, can you instantiate a second sysbox container? If so, is the problem seen in that second container too? At first glance, I can't think of anything obvious, so I'm just trying to narrow down the issue with these questions.
Also, can you please check if there's any relevant sysbox-related log in your journalctl? Otherwise, enable debug logs for the sysbox-mgr daemon and try to reproduce again.