Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TL/MLX5: one-sided mcast reliability init #980

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

MamziB
Copy link
Collaborator

@MamziB MamziB commented May 28, 2024

TL/MLX5: one-sided mcast reliability init that will be used for mcast-based allgather

@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from fc9aed7 to fad5590 Compare May 29, 2024 17:45
@MamziB MamziB requested review from janjust and manjugv May 29, 2024 18:10
@MamziB MamziB self-assigned this May 29, 2024
@artemry-nv
Copy link
Collaborator

bot:retest

1 similar comment
@janjust
Copy link
Collaborator

janjust commented May 31, 2024

bot:retest

@janjust janjust requested a review from nsarka May 31, 2024 17:58
Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide more context and explanation about the PR? It is a big PR so it is challenging to review without context and walkthrough.

I see different new options in ucc_tl_mlx5_mcast_coll_comm_init_spec_t, like enable_truly_zero_copy_pipelined_allgather, one_sided_reliability_enabled. Can you explain what is the role of them and why they cannot be treated in different PRs?

Also, can you add tests for those new features?

src/components/tl/mlx5/mcast/tl_mlx5_mcast.h Outdated Show resolved Hide resolved
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h Show resolved Hide resolved
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h Outdated Show resolved Hide resolved
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h Outdated Show resolved Hide resolved
@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from fad5590 to a2798fd Compare June 5, 2024 05:28
@MamziB
Copy link
Collaborator Author

MamziB commented Jun 5, 2024

@nsarka @janjust @samnordmann Thanks everyone for the constructive comments. I have pushed all the requested changes. Please take a look and feel free to hit the resolve button if they look good.

@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from a2798fd to e1be369 Compare June 5, 2024 19:05
@MamziB
Copy link
Collaborator Author

MamziB commented Jun 6, 2024

/* below data structures are used in async design only */

@samnordmann These options are for algo selection and check if one-sided is enabled. Let me remove the options that are not necessary for this PR.

@MamziB
Copy link
Collaborator Author

MamziB commented Jun 6, 2024

@samnordmann Thanks for the comments. I added a new commit (a separate commit) with all the new requested changes. It will be easier this way to track what has changed, at the end I will squash all the commits into a single one.

@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from 6598665 to b8280d1 Compare June 6, 2024 19:41
Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for addressing the comments!

Copy link
Collaborator

@janjust janjust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for addressing my comments

@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from b8280d1 to c71f4cf Compare June 11, 2024 16:34
@MamziB
Copy link
Collaborator Author

MamziB commented Jun 11, 2024

@Sergei-Lebedev Thanks Sergey for the comments. I have resolved all of them.

@Sergei-Lebedev
Copy link
Contributor

@MamziB pls rebase

@MamziB MamziB force-pushed the mamzi/mcast-allgather-1 branch from c71f4cf to ae5f6f8 Compare June 23, 2024 17:39
@MamziB
Copy link
Collaborator Author

MamziB commented Jun 23, 2024

@Sergei-Lebedev Thanks for the comments, I have rebased it on top of latest master.

@Sergei-Lebedev Sergei-Lebedev merged commit ba203ab into openucx:master Jun 24, 2024
11 checks passed
MamziB added a commit to MamziB/ucc-forked that referenced this pull request Jun 25, 2024
MamziB added a commit to MamziB/ucc-forked that referenced this pull request Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants