Possible race condition in stage1? #35

Open
nh2 opened this issue Jul 20, 2017 · 3 comments

Comments

@nh2

nh2 commented Jul 20, 2017

I got this on an OVH cloud server running Ubuntu 14.04:

```
>>> Validating checksum
nixos-minimal-16.09.680.4e14fd5-x86_64-linux.iso: OK
>>> Extracting ISO
mount: /dev/loop0 is write-protected, mounting read-only
Parallel unsquashfs: Using 8 processors
44678 inodes (49014 blocks) to write

[=======================================================================|] 49014/49014 100%

created 37671 files
created 13602 directories
created 7007 symlinks
created 0 devices
created 0 fifos
>>> Embarking stage1!
>>> Setting up chroot networking
>>> Looking for NixOS init... find: './proc/1902': No such file or directory
```

Running it again made it go past that without problems.

Maybe there's a race?

@jeaye
Owner

jeaye commented Jul 20, 2017

Hm, I think you're onto something. Here is the relevant code:

```bash
## Enable networking
log "Setting up chroot networking"
cd host
mkdir -p etc dev proc sys
cp /etc/resolv.conf etc/external-resolv.conf
for fn in dev dev/shm dev/pts proc sys; do mount --bind "/$fn" "$fn"; done

## Patch the ISO for local chroot
log_start "Looking for NixOS init... "
INIT=$(find . -type f -path '*nixos*/init')
log_end "$INIT"
```

If any of those mounts hasn't finished yet (even though the command runs synchronously), the `find` may fail. In your case, it looks like mounting `proc` took extra long.

Are you able to keep testing this to see how often you can reproduce it? If so, I would recommend changing that `mount --bind` to `mount --bind -o sync` and seeing whether the issue goes away. In the meantime, I'll look further into solutions.
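
To rule the mount hypothesis in or out while testing, one option is to confirm that each bind mount from the snippet is actually listed in `/proc/self/mountinfo` before running `find`. This is a minimal sketch, assuming Linux and mount points without spaces (the kernel escapes those, e.g. a space as `\040`); `is_mounted` is a hypothetical helper, not part of the installer.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the installer): succeed if the given absolute
# path is a mount point of the current process, i.e. it appears in field 5
# of /proc/self/mountinfo. Assumes Linux and paths without escaped characters.
is_mounted() {
  awk -v target="$1" '$5 == target { found = 1 } END { exit !found }' /proc/self/mountinfo
}

# Assumed usage, right after the bind-mount loop (we are inside host/ then):
#   for fn in dev dev/shm dev/pts proc sys; do
#     is_mounted "$(pwd -P)/$fn" || echo "not mounted yet: $fn" >&2
#   done
```

`pwd -P` is used because `mountinfo` lists physical paths, so a symlinked working directory would otherwise never match.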

@jeaye
Owner

jeaye commented Jul 20, 2017

After some more research and asking around, it looks like the race has nothing to do with the mounts: `find` is simply racy when the tree it walks changes underneath it. It does have some flags to handle issues like this, and I think we can exclude `proc` from the search entirely as well.
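
A minimal sketch of that mitigation, assuming GNU findutils: `-ignore_readdir_race` is a GNU `find` option (not POSIX) that suppresses the error when an entry disappears between being read from its directory and being examined, and `-prune` keeps `find` out of the bind-mounted `proc`, `sys` and `dev` trees altogether. The function name `find_nixos_init` is hypothetical; it keeps the `cd host` / `find .` pattern of the original snippet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: look for the NixOS init under the extracted ISO root
# without walking the live, constantly changing bind mounts.
# The body uses ( ) so the cd happens in a subshell and does not leak.
# -ignore_readdir_race is GNU-specific; it comes first because GNU find
# warns when global options appear after tests.
find_nixos_init() (
  cd "$1" || exit 1
  find . -ignore_readdir_race \
    \( -path ./proc -o -path ./sys -o -path ./dev \) -prune \
    -o -type f -path '*nixos*/init' -print
)

# Assumed usage, in place of the original find call:
#   INIT=$(find_nixos_init host)
```

Pruning alone removes the `/proc` race seen here; `-ignore_readdir_race` only covers other entries that might vanish mid-walk.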

In short, this happened because, while `find` was running, process 1902 exited and its `./proc/1902` entry was removed, but `find` still tried to open it. So this has nothing to do with the NixOS installation in particular and should be easy to mitigate. I'll leave this open for now and push a fix tonight.
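
The mechanism is easy to observe directly: a `/proc/<pid>` directory exists only while that process exists (until its parent reaps it), so an entry that `find` has just listed can vanish before `find` opens it. A minimal Linux-only sketch; `proc_entry_lifecycle` is a hypothetical name used for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical demo (Linux only): /proc/<pid> appears and disappears with
# the process, which is exactly what tripped up find in this issue.
proc_entry_lifecycle() {
  sleep 1 &            # start a short-lived child process
  local pid=$!
  if [ -d "/proc/$pid" ]; then echo "before: present"; else echo "before: missing"; fi
  wait "$pid"          # let the child exit and reap it
  if [ -d "/proc/$pid" ]; then echo "after: present"; else echo "after: missing"; fi
}
```

Calling `proc_entry_lifecycle` should print `before: present` followed by `after: missing`.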

Thanks for reporting this in such a helpful fashion!

@srid

srid commented Jan 6, 2018

Just hit this with Debian Stretch on an OVH dedicated server, and reproduced it a second time as well. Adding a `sleep 2` before the `find` fixed it for now.
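
A fixed `sleep` only shrinks the window: processes can still start and exit under `./proc` while `find` walks it. If the failure is `find`'s non-zero exit status aborting the script (an assumption; the thread does not show whether the installer uses `set -e`), a sketch that keeps `find`'s output and fails only when no init was found does not depend on timing. `locate_init` is a hypothetical name; excluding `proc` from the search, as suggested earlier in the thread, is still the cleaner fix.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to `sleep 2`: tolerate find errors about entries
# that vanished mid-walk, and fail only if nothing matching was found.
locate_init() (
  cd "$1" || exit 1
  # find may exit non-zero even after printing a valid match; keep the output.
  init=$(find . -type f -path '*nixos*/init' 2>/dev/null) || true
  if [ -z "$init" ]; then
    echo "NixOS init not found under $1" >&2
    exit 1
  fi
  printf '%s\n' "$init"
)
```

The trade-off is that `2>/dev/null` also hides unrelated `find` errors, such as permission problems.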

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants