TL/UCP: Add linear alltoall and allgather algorithms based on xgvmi ucp get #992
base: master
Conversation
Force-pushed from 3b8cbf2 to ecbd9e1
Force-pushed from ecbd9e1 to a362467
Went over the code with Nick, LGTM
I don't think we should do this now, but these algorithms, including sliding-window AR, will not need the allgather in the init function once #909 is merged.
LGTM overall, thanks! I still need to go over the main file tl_ucp_dpu_offload.c. Just a first round of minor review in the meantime.
req_param.op_attr_mask |= UCP_OP_ATTR_FIELD_MEMH;

for (i = *posted; i < host_team_size; i++) {
is it possible that posted is not 0 when entering this function?
Would it make sense to put this first loop in a "start" function instead?
On the first entry *posted will be 0, but after that it will be equal to host_team_size. It's possible to make a standalone start function, but I thought it would be less code to reuse ucc_tl_ucp_dpu_xgvmi_rdma_task_post for both alltoall and allgather, even though it isn't that long (10 lines).
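For context, here is a minimal sketch of what such a resumable post routine could look like, written against plain UCX calls (ucp_get_nbx, UCP_OP_ATTR_FIELD_MEMH). Everything else (the function name and the reqs, remote_addr, rkeys and memh parameters) is a hypothetical placeholder for illustration, not the code in this PR.

#include <ucp/api/ucp.h>

/* Hedged sketch of a resumable "post all gets" routine. On the first call
 * *posted is 0; after the loop it equals host_team_size, so re-entering the
 * routine is a no-op. The reqs/remote_addr/rkeys/memh bookkeeping is
 * hypothetical, not the PR's actual task fields. */
static ucs_status_t post_gets_sketch(ucp_ep_h *eps, int host_team_size,
                                     int *posted, void *dst, size_t seg_len,
                                     const uint64_t *remote_addr,
                                     ucp_rkey_h *rkeys, ucp_mem_h memh,
                                     ucs_status_ptr_t *reqs)
{
    ucp_request_param_t req_param = {0};
    int i;

    /* pass the local memory handle with the request, as in the diff line
     * req_param.op_attr_mask |= UCP_OP_ATTR_FIELD_MEMH; above */
    req_param.op_attr_mask |= UCP_OP_ATTR_FIELD_MEMH;
    req_param.memh          = memh;

    for (i = *posted; i < host_team_size; i++) {
        reqs[i] = ucp_get_nbx(eps[i], (char *)dst + (size_t)i * seg_len,
                              seg_len, remote_addr[i], rkeys[i], &req_param);
        if (UCS_PTR_IS_ERR(reqs[i])) {
            return UCS_PTR_STATUS(reqs[i]);
        }
        (*posted)++;
    }
    return UCS_OK;
}

Because both linear collectives reduce to "get one segment from every peer", a routine of this shape can serve alltoall and allgather alike, which matches the reuse argument above; only the per-peer offsets and lengths would differ.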
ucp_worker_progress(tl_ctx->worker.ucp_worker);

for (i = *completed; i < *posted; i++) {
Here *posted is necessarily equal to host_team_size, right?
Yes, when it gets to this point all the gets will be posted. I think *posted might be clearer, just because we want *posted == *completed at the end. What do you think?
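To complete the picture, a hedged sketch of the matching completion side, using only standard UCX calls (ucp_worker_progress, ucp_request_check_status, ucp_request_free); the reqs array and the *posted/*completed counters are the same hypothetical bookkeeping as in the posting sketch above, not the PR's actual fields.

#include <ucp/api/ucp.h>

/* Hedged sketch of the progress/test side: advance *completed toward
 * *posted (which equals host_team_size once everything is issued). The
 * collective is done when *completed == *posted. */
static ucs_status_t test_gets_sketch(ucp_worker_h worker, int *posted,
                                     int *completed, ucs_status_ptr_t *reqs)
{
    int i;

    ucp_worker_progress(worker);

    for (i = *completed; i < *posted; i++) {
        ucs_status_t status;

        if (reqs[i] == NULL) {
            (*completed)++;          /* this get completed immediately */
            continue;
        }
        status = ucp_request_check_status(reqs[i]);
        if (status == UCS_INPROGRESS) {
            return UCS_INPROGRESS;   /* try again on the next progress call */
        }
        ucp_request_free(reqs[i]);
        (*completed)++;
        if (status != UCS_OK) {
            return status;           /* propagate a failed get */
        }
    }
    return UCS_OK;
}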
Can you update the tests as well?
Force-pushed from 8e17a30 to 419a128
I updated the gtests to test the linear allgather/alltoall.
Force-pushed from 873b576 to ea42acf
Co-authored-by: samnordmann <snordmann@nvidia.com>
Force-pushed from ea42acf to 925c3bb
Can one of the admins verify this patch?
This PR is a follow-up to allreduce sliding window. It adds linear alltoall and allgather algorithms based on XGVMI. They post ucp gets from host to host in a round-robin fashion.
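As a rough illustration of the round-robin ordering (a sketch only; the starting offset and the helper name are assumptions, not taken from this PR), each rank could start its gets at its right neighbor and wrap around so that not every rank targets the same peer first.

/* Hypothetical helper: maps iteration i on rank `rank` to the peer whose
 * buffer is fetched in that iteration, starting at the right neighbor and
 * wrapping around the host team. */
static inline int round_robin_peer(int rank, int i, int host_team_size)
{
    return (rank + 1 + i) % host_team_size;
}

Roughly speaking, for allgather each get would fetch a peer's contribution into that peer's segment of the receive buffer, while for alltoall it would fetch only the block that peer holds for this rank.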