Commit
4383: Allow slight tolerance for restart health check r=rafal-ch a=rafal-ch

This PR relaxes the NCTL health checks when validating the number of node restarts. This guards against flakiness, i.e. cases where the test finishes correctly and all assertions hold (for example, the network is correctly upgraded), but there were slightly more node restarts during the process than expected.

For example, assuming that 10 restarts are allowed, NCTL will:

* report an error if there are more than 10+50%=15 restarts:
```
NCTL :: Adjusted restarts allowed: 15
NCTL :: ERROR: ALLOWED: 15 < TOTAL: 16
```
* warn if there are more than 10, but no more than 10+50%=15 restarts:
```
NCTL :: WARN: Test would fail without allowed restart adjustment
NCTL :: SUCCESS: ALLOWED: 15 = TOTAL: 13
```
* finish successfully otherwise

Closes #4360

Co-authored-by: Rafał Chabowski <rafal@casperlabs.io>
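The three outcomes described in this commit message can be sketched as a small decision function. This is only an illustration in Python, not the actual NCTL script: the function name `check_restarts`, the `tolerance_percent` parameter, and the use of integer division when computing the adjusted limit are all assumptions. Only the 50% tolerance and the 10 → 15 example come from the commit message.

```python
def check_restarts(allowed: int, total: int, tolerance_percent: int = 50) -> str:
    """Classify a restart count as SUCCESS, WARN, or ERROR.

    Hypothetical helper: NCTL's real check may round or report differently.
    """
    # Adjusted limit = allowed + 50% of allowed (10 -> 15 in the commit's example).
    # Integer division is an assumption about how fractional limits are handled.
    adjusted = allowed + allowed * tolerance_percent // 100

    if total > adjusted:
        # More restarts than even the relaxed limit permits.
        return "ERROR"
    if total > allowed:
        # Passes only because of the tolerance; would fail under the strict limit.
        return "WARN"
    return "SUCCESS"


if __name__ == "__main__":
    for total in (10, 13, 16):
        print(total, check_restarts(allowed=10, total=total))
```

With 10 allowed restarts, a total of 16 falls in the error branch and a total of 13 in the warning branch, matching the two log excerpts in the commit message.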