TL/UCP: select bigger knomial radix for ppn1 #936

Merged: 2 commits, merged Mar 18, 2024

Sergei-Lebedev (Contributor)

What

Use a minimum knomial radix of 3 for ppn1 teams (teams with one process per node). A larger radix improves performance by providing more network parallelism.

# UCC with patch
# OSU MPI-CUDA Broadcast Latency Test v7.3
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
1                       5.97              3.14              8.17        1000
2                       5.89              3.13              8.05        1000
4                       5.93              3.26              8.07        1000
8                       5.88              3.18              8.00        1000
16                      5.89              3.19              7.97        1000
32                      6.00              3.30              8.09        1000
64                      6.62              3.83              8.76        1000
128                     6.67              3.69              8.84        1000
256                     6.34              3.88              8.28        1000
512                     6.34              3.62              8.36        1000
1024                    6.41              4.06              8.48        1000
2048                    6.56              3.87              8.74        1000
4096                    6.92              4.24              9.34        1000
8192                    7.64              4.67             10.12        1000
16384                  14.95              9.16             19.87         100
32768                  18.88             14.46             21.59         100
65536                  28.83             25.68             32.49         100
131072                 39.65             34.42             46.40         100
262144                 50.26             41.51             58.21         100
524288                 75.56             61.76             85.24         100
1048576               123.21             98.07            143.94         100

# UCC without patch
# OSU MPI-CUDA Broadcast Latency Test v7.3
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
1                       6.21              3.58              8.65        1000
2                       6.13              3.51              8.54        1000
4                       6.10              3.41              8.55        1000
8                       6.12              3.42              8.54        1000
16                      6.12              3.50              8.52        1000
32                      6.20              3.56              8.62        1000
64                      6.95              4.30              9.45        1000
128                     6.98              4.04              9.57        1000
256                     6.91              4.68              9.10        1000
512                     7.12              4.67              9.37        1000
1024                    6.93              4.78              9.03        1000
2048                    6.97              4.31              9.44        1000
4096                    7.26              4.80              9.79        1000
8192                    8.23              5.49             11.00        1000
16384                  14.71              8.99             19.75         100
32768                  32.50             25.91             37.50         100
65536                  41.15             32.58             46.29         100
131072                 51.94             40.57             59.31         100
262144                 68.33             53.42             77.67         100
524288                 91.97             72.29            102.24         100
1048576               148.00            111.05            167.37         100

Two review threads on src/components/tl/ucp/tl_ucp_team.c (outdated, resolved)
samnordmann (Collaborator) left a comment:

Copyright

Sergei-Lebedev enabled auto-merge (squash) March 15, 2024 12:07
Sergei-Lebedev merged commit 0d68445 into openucx:master Mar 18, 2024
11 checks passed
Sergei-Lebedev deleted the topic/kn_radix_ppn1 branch March 18, 2024 14:05

3 participants