Skip to content

Commit

Permalink
Fix part of #59, part of #3926: Upgrade compute affected tests (#4929)
Browse files Browse the repository at this point in the history
## Explanation
Fix part of #59
Fix part of #3926

As part of developing downstream PRs for #59, it was discovered that PRs
which change a LOT of files (such as #4937) can run into problems where
ComputeAffectedTests simply times out trying to compute the entire list
of affected targets.

**Critical performance and compatibility fixes**

There have been past efforts to optimize the affected tests workflows
(bucketing, breaking up some of the computations), but it was discovered
that the most expensive part of the process is running the
``rbuildfiles`` query to figure out which BUILD files were affected by
changed files. It was known this was an expensive step in the past, but
it wasn't clear until this PR exactly how to address it. This PR changes
the script to now:
- Filter ``rbuildfiles`` to only run under first-party targets (since it
shouldn't be possible for third-party BUILD files to be affected by
first-party changes). This reduces the search graph.
- Introduce Bazel command sharding for this step like the script already
does for others. This breaks up the command to run on a subset of files
multiple times, combining the output rather than running a single
command with a large input. It seems that ``rbuildfiles`` does not scale
linearly with its input size, so this drastically improves the script's
performance. It's thought that this approach is also more logically
correct due to more correct sibling matching semantics, but it's a bit
hard to reason about ``bazel query`` behavior at times so I can't be
100% confident in this. Nevertheless, the existing tests pass and I
haven't seen any issues from using these changes in downstream PRs.

Separately, another issue was discovered wherein some commands
(including certain Bazel commands) can actually cause
``CommandExecutorImpl`` to soft-lock and always time out. This was due
to an issue in the previous implementation wherein it would wait to read
a command's stdout until after the timeout has been completed (i.e. it
assumed the process would finish). This isn't correct, however: stdout
is blocking I/O, and some commands are implemented to only continue
execution after their standard output is read. The new implementation
makes use of coroutine actors to consume stdout and stderr at the same
time as waiting for execution to ensure all commands can continue
execution and that they finish within the desired timeout. Note that the
new ``ScriptBackgroundCoroutineDispatcher`` was actually introduced (in
#5313) specifically to support this new ``CommandExecutorImpl``
implementation, though it has since been found to have lots of other
nice benefits by providing scripts with a reliable mechanism for
performing asynchronous operations without having to manage their own
execution dispatcher.

Command execution for Bazel commands has also been updated to time out
after 5 minutes rather than the previous 1 minute. Despite the
optimizations and robustness improvements above, some commands still
take quite some time to run for especially large and complex cases.
While this change may result in a slower failure turnaround in cases
when commands are soft-locked, it should result in better CI and script
robustness in the long-term.

**Better support for chained PRs & possibly merge queues**

ComputeAffectedTests was also updated to use a merge base commit rather
than a reference to the develop branch (where this new commit hash is
provided by the CI workflow). The idea behind this is that the merge
base commit is:
- More logically correct (as ``ComputeAffectedTests`` is meant to run in
contexts where a branch wants to be merged into a destination).
- Better compatible with chained PRs. This allows for **significantly**
better CI performance in chained PR situations since now only the tests
relevant to the child PR will be run rather than all tests for the child
& its parental hierarchy (see downstream PRs' CI runs to see this
working in action).
- Hopefully better compatibility with merge queues (#3926). This hasn't
been verified, but the flexibility in customizing the destination for
affected tests should be the main prerequisite to properly supporting
merge queues.

**Other changes**

``GitClient`` was updated to have a peace-of-mind check to ensure the
base commit (provided as explained in the previous section) matches the
merge base of the current branch. This should always be true (except
maybe in merge queues--this will need to be verified). Note that this is
only a soft warning, not an assertion failure.

``RepositoryFile`` was cleaned up slightly to be a bit more consistent
with other directory management approaches done in scripts. I can see
this being refactored more in the future. Callsite behavior isn't
expected to be affected by these changes.

Some script tests were updated to have consistent formatting (which
required updating the TODO exemptions). ``TodoIssueResolvedCheckTest``
and ``TodoOpenCheckTest`` also had some of their test file management
cleaned up a bit.

**A note on testing**

These are inherently difficult things to test. I've verified what I
could via CI and general observation, but I've also largely relied on
existing tests to catch regressions (and many were caught during changes
to the script). Since these are mainly implementation and not behavioral
changes, I'm comfortable with the level of testing that was done.

## Essential Checklist
- [x] The PR title and explanation each start with "Fix #bugnum: " (If
this PR fixes part of an issue, prefix the title with "Fix part of
#bugnum: ...".)
- [x] Any changes to
[scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets)
files have their rationale included in the PR explanation.
- [x] The PR follows the [style
guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide).
- [x] The PR does not contain any unnecessary code changes from Android
Studio
([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)).
- [x] The PR is made from a branch that's **not** called "develop" and
is up-to-date with "develop".
- [x] The PR is **assigned** to the appropriate reviewers
([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)).

## For UI-specific PRs only
N/A -- This is an infrastructure-only change.

---------

Co-authored-by: Adhiambo Peres <59600948+adhiamboperes@users.noreply.github.com>
Co-authored-by: Sean Lip <sean@seanlip.org>
  • Loading branch information
3 people authored May 22, 2024
1 parent 6f5f8fa commit 24b1721
Show file tree
Hide file tree
Showing 26 changed files with 257 additions and 196 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/unit_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -76,10 +76,12 @@ jobs:
id: compute-test-matrix
env:
compute_all_targets: ${{ contains(github.event.pull_request.title, '[RunAllTests]') }}
# See: https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#pull_request.
base_commit_hash: ${{ github.event.pull_request.base.sha }}
# https://unix.stackexchange.com/a/338124 for reference on creating a JSON-friendly
# comma-separated list of test targets for the matrix.
run: |
bazel run //scripts:compute_affected_tests -- $(pwd) $(pwd)/affected_targets.log origin/develop compute_all_tests=$compute_all_targets
bazel run //scripts:compute_affected_tests -- $(pwd) $(pwd)/affected_targets.log $base_commit_hash compute_all_tests=$compute_all_targets
TEST_BUCKET_LIST=$(cat ./affected_targets.log | sed 's/^\|$/"/g' | paste -sd, -)
echo "Affected tests (note that this might be all tests if configured to run all or on the develop branch): $TEST_BUCKET_LIST"
echo "::set-output name=matrix::{\"affected-tests-bucket-base64-encoded-shard\":[$TEST_BUCKET_LIST]}"
Expand Down
74 changes: 37 additions & 37 deletions scripts/assets/todo_open_exemptions.textproto
Original file line number Diff line number Diff line change
Expand Up @@ -232,24 +232,24 @@ todo_open_exemption {
}
todo_open_exemption {
exempted_file_path: "scripts/src/javatests/org/oppia/android/scripts/todo/TodoIssueResolvedCheckTest.kt"
line_number: 69
line_number: 71
line_number: 73
line_number: 79
line_number: 77
line_number: 93
line_number: 95
line_number: 97
line_number: 103
line_number: 131
line_number: 133
line_number: 139
line_number: 101
line_number: 128
line_number: 130
line_number: 136
line_number: 140
line_number: 142
line_number: 143
line_number: 145
line_number: 146
line_number: 177
line_number: 179
line_number: 185
line_number: 189
line_number: 191
line_number: 192
line_number: 172
line_number: 174
line_number: 180
line_number: 184
line_number: 186
line_number: 187
}
todo_open_exemption {
exempted_file_path: "scripts/src/javatests/org/oppia/android/scripts/todo/TodoOpenCheckTest.kt"
Expand Down Expand Up @@ -281,58 +281,58 @@ todo_open_exemption {
line_number: 287
line_number: 288
line_number: 292
line_number: 336
line_number: 337
line_number: 338
line_number: 342
line_number: 411
line_number: 339
line_number: 343
line_number: 412
line_number: 418
line_number: 420
line_number: 438
line_number: 413
line_number: 419
line_number: 421
line_number: 439
line_number: 440
line_number: 441
line_number: 442
line_number: 457
line_number: 443
line_number: 458
line_number: 461
line_number: 476
line_number: 459
line_number: 462
line_number: 477
line_number: 478
line_number: 479
line_number: 480
line_number: 481
line_number: 482
line_number: 485
line_number: 483
line_number: 486
line_number: 487
line_number: 505
line_number: 509
line_number: 569
line_number: 488
line_number: 506
line_number: 510
line_number: 570
line_number: 576
line_number: 578
line_number: 599
line_number: 571
line_number: 577
line_number: 579
line_number: 600
line_number: 601
line_number: 602
line_number: 603
line_number: 640
line_number: 604
line_number: 641
line_number: 644
line_number: 677
line_number: 642
line_number: 645
line_number: 678
line_number: 679
line_number: 680
line_number: 681
line_number: 682
line_number: 683
line_number: 686
line_number: 684
line_number: 687
line_number: 688
line_number: 736
line_number: 740
line_number: 689
line_number: 737
line_number: 741
}
todo_open_exemption {
exempted_file_path: "scripts/static_checks.sh"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import org.oppia.android.scripts.common.ScriptBackgroundCoroutineDispatcher
import org.oppia.android.scripts.proto.AffectedTestsBucket
import java.io.File
import java.util.Locale
import java.util.concurrent.TimeUnit
import kotlin.system.exitProcess

private const val COMPUTE_ALL_TESTS_PREFIX = "compute_all_tests="
Expand All @@ -28,27 +29,28 @@ private const val MAX_TEST_COUNT_PER_SMALL_SHARD = 15
* Arguments:
* - path_to_directory_root: directory path to the root of the Oppia Android repository.
* - path_to_output_file: path to the file in which the affected test targets will be printed.
* - base_develop_branch_reference: the reference to the local develop branch that should be use.
* Generally, this is 'origin/develop'.
* - merge_base_commit: the base commit against which the local changes will be compared when
* determining which tests to run. When running outside of CI you can use the result of running:
* 'git merge-base develop HEAD'
* - compute_all_tests: whether to compute a list of all tests to run.
*
* Example:
* bazel run //scripts:compute_affected_tests -- $(pwd) /tmp/affected_test_buckets.proto64 \\
* origin/develop compute_all_tests=false
* abcdef0123456789 compute_all_tests=false
*/
fun main(args: Array<String>) {
if (args.size < 4) {
println(
"Usage: bazel run //scripts:compute_affected_tests --" +
" <path_to_directory_root> <path_to_output_file> <base_develop_branch_reference>" +
" <path_to_directory_root> <path_to_output_file> <merge_base_commit>" +
" <compute_all_tests=true/false>"
)
exitProcess(1)
}

val pathToRoot = args[0]
val pathToOutputFile = args[1]
val baseDevelopBranchReference = args[2]
val baseCommit = args[2]
val computeAllTestsSetting = args[3].let {
check(it.startsWith(COMPUTE_ALL_TESTS_PREFIX)) {
"Expected last argument to start with '$COMPUTE_ALL_TESTS_PREFIX'"
Expand All @@ -62,7 +64,7 @@ fun main(args: Array<String>) {
}
ScriptBackgroundCoroutineDispatcher().use { scriptBgDispatcher ->
ComputeAffectedTests(scriptBgDispatcher)
.compute(pathToRoot, pathToOutputFile, baseDevelopBranchReference, computeAllTestsSetting)
.compute(pathToRoot, pathToOutputFile, baseCommit, computeAllTestsSetting)
}
}

Expand All @@ -81,7 +83,10 @@ class ComputeAffectedTests(
val maxTestCountPerLargeShard: Int = MAX_TEST_COUNT_PER_LARGE_SHARD,
val maxTestCountPerMediumShard: Int = MAX_TEST_COUNT_PER_MEDIUM_SHARD,
val maxTestCountPerSmallShard: Int = MAX_TEST_COUNT_PER_SMALL_SHARD,
val commandExecutor: CommandExecutor = CommandExecutorImpl(scriptBgDispatcher)
val commandExecutor: CommandExecutor =
CommandExecutorImpl(
scriptBgDispatcher, processTimeout = 5, processTimeoutUnit = TimeUnit.MINUTES
)
) {
private companion object {
private const val GENERIC_TEST_BUCKET_NAME = "generic"
Expand All @@ -93,28 +98,28 @@ class ComputeAffectedTests(
* @param pathToRoot the absolute path to the working root directory
* @param pathToOutputFile the absolute path to the file in which the encoded Base64 test bucket
* protos should be printed
* @param baseDevelopBranchReference see [GitClient]
* @param baseCommit see [GitClient]
* @param computeAllTestsSetting whether all tests should be outputted versus only the ones which
* are affected by local changes in the repository
*/
fun compute(
pathToRoot: String,
pathToOutputFile: String,
baseDevelopBranchReference: String,
baseCommit: String,
computeAllTestsSetting: Boolean
) {
val rootDirectory = File(pathToRoot).absoluteFile
check(rootDirectory.isDirectory) { "Expected '$pathToRoot' to be a directory" }
check(rootDirectory.list().contains("WORKSPACE")) {
check(rootDirectory.list()?.contains("WORKSPACE") == true) {
"Expected script to be run from the workspace's root directory"
}

println("Running from directory root: $rootDirectory")
println("Running from directory root: $rootDirectory.")

val gitClient = GitClient(rootDirectory, baseDevelopBranchReference, commandExecutor)
val gitClient = GitClient(rootDirectory, baseCommit, commandExecutor)
val bazelClient = BazelClient(rootDirectory, commandExecutor)
println("Current branch: ${gitClient.currentBranch}")
println("Most recent common commit: ${gitClient.branchMergeBase}")
println("Current branch: ${gitClient.currentBranch}.")
println("Most recent common commit: ${gitClient.branchMergeBase}.")

val currentBranch = gitClient.currentBranch.toLowerCase(Locale.US)
val affectedTestTargets = if (computeAllTestsSetting || currentBranch == "develop") {
Expand All @@ -139,7 +144,7 @@ class ComputeAffectedTests(
}

private fun computeAllTestTargets(bazelClient: BazelClient): List<String> {
println("Computing all test targets")
println("Computing all test targets...")
return bazelClient.retrieveAllTestTargets()
}

Expand All @@ -153,38 +158,45 @@ class ComputeAffectedTests(
val changedFiles = gitClient.changedFiles.filter { filepath ->
File(rootDirectory, filepath).exists()
}.toSet()
println("Changed files (per Git): $changedFiles")
println("Changed files (per Git, ${changedFiles.size} total): $changedFiles.")

// Compute the changed targets 100 files at a time to avoid unnecessarily long-running Bazel
// commands.
val changedFileTargets =
changedFiles.chunked(size = 100).fold(initial = setOf<String>()) { allTargets, filesChunk ->
allTargets + bazelClient.retrieveBazelTargets(filesChunk).toSet()
}
println("Changed Bazel file targets: $changedFileTargets")
println("Changed Bazel file targets (${changedFileTargets.size} total): $changedFileTargets.")

// Similarly, compute the affect test targets list 100 file targets at a time.
val affectedTestTargets =
changedFileTargets.chunked(size = 100)
.fold(initial = setOf<String>()) { allTargets, targetChunk ->
allTargets + bazelClient.retrieveRelatedTestTargets(targetChunk).toSet()
}
println("Affected Bazel test targets: $affectedTestTargets")
println(
"Affected Bazel test targets (${affectedTestTargets.size} total): $affectedTestTargets."
)

// Compute the list of Bazel files that were changed.
val changedBazelFiles = changedFiles.filter { file ->
file.endsWith(".bzl", ignoreCase = true) ||
file.endsWith(".bazel", ignoreCase = true) ||
file == "WORKSPACE"
}
println("Changed Bazel-specific support files: $changedBazelFiles")
println(
"Changed Bazel-specific support files (${changedBazelFiles.size} total): $changedBazelFiles."
)

// Compute the list of affected tests based on BUILD/Bazel/WORKSPACE files. These are generally
// framed as: if a BUILD file changes, run all tests transitively connected to it.
val transitiveTestTargets = bazelClient.retrieveTransitiveTestTargets(changedBazelFiles)
println("Affected test targets due to transitive build deps: $transitiveTestTargets")
println(
"Affected test targets due to transitive build deps (${transitiveTestTargets.size} total):" +
" $transitiveTestTargets."
)

return (affectedTestTargets + transitiveTestTargets).toSet().toList()
return (affectedTestTargets + transitiveTestTargets).distinct()
}

private fun filterTargets(testTargets: List<String>): List<String> {
Expand Down Expand Up @@ -321,7 +333,7 @@ class ComputeAffectedTests(
private val EXTRACT_BUCKET_REGEX = "^//([^(/|:)]+?)[/:].+?\$".toRegex()

/** Returns the [TestBucket] that corresponds to the specific [testTarget]. */
fun retrieveCorrespondingTestBucket(testTarget: String): TestBucket? {
fun retrieveCorrespondingTestBucket(testTarget: String): TestBucket {
return EXTRACT_BUCKET_REGEX.matchEntire(testTarget)
?.groupValues
?.maybeSecond()
Expand Down
21 changes: 11 additions & 10 deletions scripts/src/java/org/oppia/android/scripts/common/BazelClient.kt
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,7 @@ import java.util.Locale
* Utility class to query & interact with a Bazel workspace on the local filesystem (residing within
* the specified root directory).
*/
class BazelClient(
private val rootDirectory: File,
private val commandExecutor: CommandExecutor
) {
class BazelClient(private val rootDirectory: File, private val commandExecutor: CommandExecutor) {
/** Returns all Bazel test targets in the workspace. */
fun retrieveAllTestTargets(): List<String> {
return correctPotentiallyBrokenTargetNames(
Expand Down Expand Up @@ -57,16 +54,19 @@ class BazelClient(
// Note that this check is needed since rbuildfiles() doesn't like taking an empty list.
return if (buildFileList.isNotEmpty()) {
val referencingBuildFiles =
executeBazelCommand(
"query",
runPotentiallyShardedQueryCommand(
"filter('^[^@]', rbuildfiles(%s))", // Use a filter to limit the search space.
buildFiles,
"--noshow_progress",
"--universe_scope=//...",
"--order_output=no",
"rbuildfiles($buildFileList)"
delimiter = ","
)
// Compute only test & library siblings for each individual build file. While this is both
// much slower than a fully combined query & can potentially miss targets, it runs
// substantially faster per query and helps to avoid potential hanging in CI.
// substantially faster per query and helps to avoid potential hanging in CI. Note also that
// this is more correct than a combined query since it ensures that siblings checks are
// properly unique for each file being considered (vs. searching for common siblings).
val relevantSiblings = referencingBuildFiles.flatMap { buildFileTarget ->
retrieveFilteredSiblings(filterRuleType = "test", buildFileTarget) +
retrieveFilteredSiblings(filterRuleType = "android_library", buildFileTarget)
Expand All @@ -77,7 +77,7 @@ class BazelClient(
relevantSiblings,
"--noshow_progress",
"--universe_scope=//...",
"--order_output=no",
"--order_output=no"
)
)
} else listOf()
Expand Down Expand Up @@ -142,6 +142,7 @@ class BazelClient(
queryFormatStr: String,
values: Iterable<String>,
vararg prefixArgs: String,
delimiter: String = " ",
allowPartialFailures: Boolean = false
): List<String> {
// Split up values into partitions to ensure that the argument calls don't over-run the limit.
Expand All @@ -154,7 +155,7 @@ class BazelClient(

// Fragment the query across the partitions to ensure all values can be considered.
return partitions.flatMap { partition ->
val lastArgument = queryFormatStr.format(Locale.US, partition.joinToString(" "))
val lastArgument = queryFormatStr.format(Locale.US, partition.joinToString(delimiter))
val allArguments = prefixArgs.toList() + lastArgument
executeBazelCommand(
"query", *allArguments.toTypedArray(), allowPartialFailures = allowPartialFailures
Expand Down
Loading

0 comments on commit 24b1721

Please sign in to comment.