UCP/PERF: Use final sync up for all UCP ucx_perftest tests #10425
+1
−6
What?
Users complain about overly optimistic performance reports from some ucx_perftest tests (e.g. ucp_put_bw), most visibly with huge messages and a low iteration count. The root cause is the absence of synchronisation between client and server at the end of the test execution. This synchronisation was added in #10310, but only for some tests (UCX_PERF_CMD_TAG, UCX_PERF_CMD_TAG_SYNC, UCX_PERF_CMD_AM). It should be extended to the others.
Why?
There is a discrepancy between the final perf results reported by the client and by the server. It is especially visible with pipeline protocols, but it also occurs with many others. On the client side we measure the time needed to send all the messages, and with a pipeline protocol that is essentially the time needed to send each message to the remote bounce buffer, not the time for the receiver to finish processing it. With the default window size of 32 we see an initial performance spike on the client side. With a larger number of iterations the client and server results converge to the same value, due to the implicit synchronisation between send and recv.
Workaround: increase the iteration count (-n 1000) or decrease the window size (-O 1).
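To make the effect concrete, here is a toy timing model, not UCX code. It assumes a fixed cost `t_post` for the client to hand one message to a bounce-buffer slot, a fixed cost `t_drain` for the server to finish one message, and a slot that can be reused only after its previous message is drained. The names `model_run`, `model_result_t`, `t_post` and `t_drain` are hypothetical, and the time units are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of a windowed sender; not part of UCX. */
typedef struct {
    double client_time; /* when the client has posted its last message */
    double server_time; /* when the server has drained the last message */
} model_result_t;

model_result_t model_run(size_t iters, size_t window,
                         double t_post, double t_drain)
{
    model_result_t res = {0.0, 0.0};
    double client = 0.0, server = 0.0;
    double *drained;
    size_t i;

    assert(window > 0);
    /* drained[s] holds the time at which slot s was last freed */
    drained = calloc(window, sizeof(*drained));
    if (drained == NULL) {
        return res;
    }

    for (i = 0; i < iters; ++i) {
        size_t slot = i % window;

        /* A slot is reused every `window` messages, so the client must
         * wait until its previous occupant has been drained. */
        if ((i >= window) && (drained[slot] > client)) {
            client = drained[slot];
        }
        client += t_post;

        /* The server handles messages in order and cannot start before
         * the message has been posted. */
        if (client > server) {
            server = client;
        }
        server       += t_drain;
        drained[slot] = server;
    }

    free(drained);
    res.client_time = client;
    res.server_time = server;
    return res;
}
```

Under these assumptions, `model_run(32, 32, 1.0, 10.0)` gives a client time of 32 against a server time of 321, so a client-side report would be about ten times too optimistic. `model_run(1000, 32, 1.0, 10.0)` gives 9682 against 10001, and `model_run(32, 1, 1.0, 10.0)` gives 342 against 352, which matches the convergence and the workaround described above.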
How?
The fix is to send an ack message from the receiver to the sender at the end of processing, confirming that all messages were received. For now we only fix the final report; intermediate reports may still show a discrepancy.
Tested manually by running all UCP STREAM_UNI tests with the sync-up message.
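The final-ack idea can be sketched outside UCX with plain POSIX sockets. This is an illustration under assumptions, not the code in this PR: it uses `socketpair` and `fork` instead of UCP endpoints, and the helper names (`write_all`, `read_all`, `send_with_final_ack`) are hypothetical. The sender returns only after the receiver has reported how many bytes it actually consumed, which is the point where a timer should be stopped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            return -1;
        }
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) {
            return -1;
        }
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical sketch: returns the byte count acknowledged by the
 * receiver, or -1 on error. */
int64_t send_with_final_ack(size_t iters, size_t msg_size)
{
    uint64_t acked = 0;
    int failed     = 0;
    int fds[2];
    char *msg;
    pid_t pid;
    size_t i;

    assert(msg_size > 0);
    msg = malloc(msg_size);
    if (msg == NULL) {
        return -1;
    }
    memset(msg, 0xab, msg_size);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        free(msg);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(msg);
        return -1;
    }

    if (pid == 0) {
        /* Receiver: consume every message, then send one final ack. */
        uint64_t total = 0;
        close(fds[0]);
        for (i = 0; i < iters; ++i) {
            if (read_all(fds[1], msg, msg_size) != 0) {
                _exit(1);
            }
            total += msg_size;
        }
        _exit((write_all(fds[1], &total, sizeof(total)) == 0) ? 0 : 1);
    }

    /* Sender: a timer stopped right after this loop would only cover
     * handing the data to the socket buffer. */
    close(fds[1]);
    for (i = 0; (i < iters) && !failed; ++i) {
        failed = write_all(fds[0], msg, msg_size);
    }

    /* Waiting for the ack extends the measurement to actual delivery. */
    if (!failed) {
        failed = read_all(fds[0], &acked, sizeof(acked));
    }

    close(fds[0]);
    free(msg);
    waitpid(pid, NULL, 0);
    return failed ? -1 : (int64_t)acked;
}
```

For example, `send_with_final_ack(64, 4096)` should return `64 * 4096` once the receiver has confirmed all 64 messages. As in the PR, only the end of the run is synchronised here; nothing in this sketch corrects intermediate measurements.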