[Skymeld] Symlink planting for external repos races against action execution #22073
Cc @joeleba

Thanks for filing the bug. I'll look into this. Incidentally I was looking at aspect-build/rules_js#1412 today and was about to ask for a minimal repro.

@bazel-io flag

@bazel-io fork 7.2.0

It's not unlikely that this is also the cause of #20886.

Update: I expect to have a fix submitted sometime this week.
Consider the following scenario:

- Top level targets `A` and `B`
- To execute actions in `A`, we need to plant symlinks for `NestedSet<Package>: [A1, C1, [D1]]`
- To execute actions in `B`, we need to plant symlinks for `NestedSet<Package>: [B1, C1, [D1]]`

In the end, we expect to see symlinks to `A1`, `B1`, `C1` and `D1`.

**What went wrong**

With the current code, there are 2 possible race conditions:

1. Transitive NestedSet level:
   - Start to plant symlinks for `A`
   - `NestedSet [D1]` added to `handledPackageNestedSets`. _No symlink planted yet_.
   - Start to plant symlinks for `B`
   - `NestedSet [D1]` seen as "handled" and immediately skipped. Planted `B1` and `C1`. `B` moves on to execution.

   => actions from `B` that require `D1` would fail (no such file or directory).
2. Individual symlink level:
   - Start to plant symlinks for `A`
   - `C1` added to `lazilyPlantedSymlinks`. _No symlink planted yet_.
   - Start to plant symlinks for `B`
   - `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped. Planted `B1` and `D1`. `B` moves on to execution.

   => actions from `B` that require `C1` would fail (no such file or directory).

**The Solution**

To prevent these race conditions, we can plant the symlinks for top level targets `A` and `B` sequentially. This gives us the guarantee that, for an action `foo` under a top level target `A`, `foo` is only executed when all the necessary symlinks for `A` are already planted. The above scenario would look like:

- Start to plant symlinks for `A`
- The `TopLevelTargetReadyForSymlinkPlanting` event for `B` arrives and is held in the sequential event queue
- Plant all symlinks. `lazilyPlantedSymlinks: [A1, C1, D1]`. `A` moves on to execution.
- Start to plant symlinks for `B`
- `NestedSet [D1]` already seen in `handledPackageNestedSets` and immediately skipped.
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
- Planted `B1`. `B` moves on to execution.
As a (hopefully not premature) optimization, the symlinks under a single top level target are planted in parallel.

Fixes bazelbuild#22073

Verified locally with something similar to the repro in bazelbuild#22073 (comment).

PiperOrigin-RevId: 628080361
Change-Id: Ic6c1a6606d26400c46aa98bfeddc844abd075d0a
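The individual-symlink race (scenario 2 above) is a check-then-act bug: an entry is marked as handled before the symlink it stands for exists. The following is a minimal sketch under stated assumptions, not Bazel's code. Only the name `lazilyPlantedSymlinks` comes from the commit message; the class, its methods, the in-memory "execroot" set and the two threads standing in for targets `A` and `B` are hypothetical. Two `CountDownLatch`es force the bad interleaving deterministically, much like the `Thread.sleep` used in the repro.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of "mark as handled, then plant" (not Bazel's code). */
final class MarkThenPlantRace {

  /** Returns whether target B could see C1 at the moment it moved on to execution. */
  static boolean bSawC1() {
    // Entries are marked as handled BEFORE the symlink exists: this is the bug.
    Set<String> lazilyPlantedSymlinks = ConcurrentHashMap.newKeySet();
    // Hypothetical stand-in for the execroot: symlinks that actually exist.
    Set<String> plantedOnDisk = ConcurrentHashMap.newKeySet();
    CountDownLatch aMarkedC1 = new CountDownLatch(1);
    CountDownLatch bStartedExecution = new CountDownLatch(1);
    AtomicBoolean result = new AtomicBoolean();

    Thread a = new Thread(() -> {
      lazilyPlantedSymlinks.add("C1"); // A claims C1...
      aMarkedC1.countDown();
      await(bStartedExecution);        // ...but is delayed before planting it.
      plantedOnDisk.add("C1");
    });
    Thread b = new Thread(() -> {
      try {
        await(aMarkedC1);
        if (lazilyPlantedSymlinks.add("C1")) {
          plantedOnDisk.add("C1");     // Not reached: C1 already looks "handled".
        }
        plantedOnDisk.add("B1");
        // B believes its symlinks are ready and starts executing actions.
        result.set(plantedOnDisk.contains("C1"));
      } finally {
        bStartedExecution.countDown();
      }
    });
    a.start();
    b.start();
    join(a);
    join(b);
    return result.get();
  }

  private static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  private static void join(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("B saw C1: " + bSawC1()); // prints "B saw C1: false"
  }
}
```

The same shape applies to scenario 1, with `handledPackageNestedSets` and a whole `NestedSet` in place of the single `C1` entry.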
Commit 52adf0b
Co-authored-by: Googler <leba@google.com>
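The ordering the fix relies on can be sketched as follows: targets are handled one at a time, symlinks within a target are planted in parallel, and a target only moves on once its own symlinks exist. This is a minimal model under stated assumptions, not Bazel's `IncrementalPackageRoots`. A single-thread `ExecutorService` stands in for the sequential event queue, a fixed pool of four threads stands in for parallel planting, and an in-memory set stands in for the execroot. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of the fix's ordering (not Bazel's IncrementalPackageRoots). */
final class SequentialPlanting {
  private final Set<String> lazilyPlantedSymlinks = ConcurrentHashMap.newKeySet();
  // Stand-in for the execroot: symlinks that actually exist.
  private final Set<String> plantedOnDisk = ConcurrentHashMap.newKeySet();
  // One thread handles one top-level target at a time (the "sequential event queue").
  private final ExecutorService perTargetQueue = Executors.newSingleThreadExecutor();
  // Symlinks belonging to a single top-level target are planted in parallel.
  private final ExecutorService plantingPool = Executors.newFixedThreadPool(4);

  /** The returned future completes only once every symlink in {@code packages} exists. */
  Future<?> onTopLevelTargetReady(List<String> packages) {
    return perTargetQueue.submit(() -> {
      List<Future<?>> inFlight = new ArrayList<>();
      for (String pkg : packages) {
        if (lazilyPlantedSymlinks.add(pkg)) {
          inFlight.add(plantingPool.submit(() -> plantedOnDisk.add(pkg)));
        }
        // Otherwise skip: an earlier target's task already finished planting it,
        // or it is already in inFlight for this same target.
      }
      for (Future<?> f : inFlight) {
        f.get(); // Wait for this target's symlinks before it moves on to execution.
      }
      return null;
    });
  }

  boolean isPlanted(String pkg) {
    return plantedOnDisk.contains(pkg);
  }

  void shutdown() {
    perTargetQueue.shutdown();
    plantingPool.shutdown();
  }

  /** Replays the A/B scenario; true if all of B's symlinks exist once B is ready. */
  static boolean demo() {
    SequentialPlanting roots = new SequentialPlanting();
    try {
      Future<?> a = roots.onTopLevelTargetReady(List.of("A1", "C1", "D1"));
      Future<?> b = roots.onTopLevelTargetReady(List.of("B1", "C1", "D1"));
      b.get(); // B would start executing actions here.
      boolean ready = List.of("B1", "C1", "D1").stream().allMatch(roots::isPlanted);
      a.get();
      return ready;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      roots.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("B's symlinks ready: " + demo()); // prints "B's symlinks ready: true"
  }
}
```

Skipping an already-claimed package is safe in this model only because the previous target's task fully finished, including waiting on its planting futures, before the next target's task started. That is the guarantee the sequential queue provides, while the inner pool keeps planting within one target parallel.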
A fix for this issue has been included in Bazel 7.2.0 RC1. Please test out the release candidate and report any issues as soon as possible.
Description of the bug:
When Skymeld is enabled, execroot symlinks for the roots of external repositories are created when a `TopLevelTargetReadyForSymlinkPlanting` event is fired and received by `IncrementalPackageRoots#topLevelTargetReadyForSymlinkPlanting`. Notably, there is no happens-before relationship between the filesystem operation that creates a symlink and the execution of the actions that depend on it. This results in a race: the symlink may not yet exist when an action runs, which causes an `IOException`.

Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
1. Insert `Thread.sleep(10000)` before line https://cs.opensource.google/bazel/bazel/+/d8613c222f233406165c623b1a92179faed142ba:src/main/java/com/google/devtools/build/lib/buildtool/SymlinkForest.java;l=474.
2. Run `bazel build //...` in https://github.com/bazelbuild/examples/tree/main/frontend.

This should reliably reproduce bazelbuild/examples#421.
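The injected `Thread.sleep` only widens the window between the event being received and the symlink actually being created. The sketch below is a generic illustration of the missing happens-before edge, not the fix that was merged. It contrasts an action that merely starts after planting has been *requested* with one chained on planting's *completion* via `CompletableFuture.thenApply`. The class, the `external_repo` name and the in-memory execroot are hypothetical, and a `CountDownLatch` plays the role of the sleep so that the outcome is deterministic.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration of ordering action execution after symlink planting. */
final class PlantBeforeExecute {
  // Stand-in for the execroot: external repo symlinks that currently exist.
  private final Set<String> execroot = ConcurrentHashMap.newKeySet();

  /** Planting step; the latch models the delay the repro injects with Thread.sleep. */
  private void plant(String repo, CountDownLatch delay) {
    try {
      delay.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    execroot.add(repo);
  }

  /** An "action" that only succeeds if the repo symlink is present. */
  private boolean runAction(String repo) {
    return execroot.contains(repo);
  }

  /** Planting is fired off, but nothing orders the action after it. */
  static boolean racy() {
    PlantBeforeExecute p = new PlantBeforeExecute();
    CountDownLatch delay = new CountDownLatch(1);
    CompletableFuture<Void> planting =
        CompletableFuture.runAsync(() -> p.plant("external_repo", delay));
    boolean ok = p.runAction("external_repo"); // Runs while planting is still delayed.
    delay.countDown();
    planting.join();
    return ok;
  }

  /** The action is a dependent stage, so it cannot start until planting has completed. */
  static boolean ordered() {
    PlantBeforeExecute p = new PlantBeforeExecute();
    CountDownLatch delay = new CountDownLatch(1);
    CompletableFuture<Boolean> action =
        CompletableFuture.runAsync(() -> p.plant("external_repo", delay))
            .thenApply(unused -> p.runAction("external_repo"));
    delay.countDown();
    return action.join();
  }

  public static void main(String[] args) {
    System.out.println("racy: " + racy() + ", ordered: " + ordered());
    // prints "racy: false, ordered: true"
  }
}
```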
Which operating system are you running Bazel on?
Any
What is the output of `bazel info release`?

HEAD
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response
What's the output of `git remote get-url origin; git rev-parse HEAD`?

No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response