-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/WIREUP: Support EP reconfiguration for non wired-up scenarios #10337
base: master
Are you sure you want to change the base?
Conversation
src/ucp/core/ucp_ep.c
Outdated
for (lane = 0; lane < ucp_ep_num_lanes(ep); lane++) { | ||
if (ucp_wireup_ep_test(ucp_ep_get_lane(ep, lane))) { | ||
return lane; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to save wireup_ep
in worker, and then access it in O(1)?
continue; | ||
} | ||
|
||
if (dev_count++ >= max_num_devs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need dev_count
? lane
has the same values here
6194b9a
to
05a18e8
Compare
7261edc
to
3eb6604
Compare
src/ucp/wireup/wireup.c
Outdated
send_reply = /* Always send the reply in case of CM, the client's EP has to | ||
* be marked as REMOTE_CONNECTED */ | ||
has_cm_lane || (msg->dst_ep_id == UCS_PTR_MAP_KEY_INVALID) || | ||
ucp_ep_config(ep)->p2p_lanes; | ||
has_cm_lane || (msg->dst_ep_id == UCS_PTR_MAP_KEY_INVALID) || | ||
ucp_ep_config(ep)->p2p_lanes || is_am_replaced; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a comment for the new case
src/ucp/wireup/wireup.c
Outdated
@@ -720,7 +723,7 @@ ucp_wireup_process_request(ucp_worker_h worker, ucp_ep_h ep, | |||
/* don't mark as connected to remote now in case of CM, since it destroys | |||
* CM wireup EP (if it is hidden in the CM lane) that is used for sending | |||
* WIREUP MSGs */ | |||
if (!ucp_ep_config(ep)->p2p_lanes && !has_cm_lane) { | |||
if (!ucp_ep_config(ep)->p2p_lanes && !has_cm_lane && !is_am_replaced) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a comment
src/ucp/wireup/wireup.c
Outdated
return !ucp_ep_has_cm_lane(ep) && (ep->am_lane != UCP_NULL_LANE) && | ||
(reuse_lane_map[ep->am_lane] == UCP_NULL_LANE) && | ||
!ucp_wireup_ep_test(ucp_ep_get_lane(ep, ep->am_lane)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why do we check cached
am_lane
lane value but not config key? - pls comment every statememt why is it in the condition
src/ucp/core/ucp_ep.c
Outdated
ucp_lane_index_t ucp_ep_config_find_match_lane(const ucp_ep_config_key_t *key1, | ||
ucp_lane_index_t lane1, | ||
const ucp_ep_config_key_t *key2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe invert:
ucp_lane_index_t ucp_ep_config_find_match_lane(const ucp_ep_config_key_t *key1,
const ucp_ep_config_key_t *key2,
ucp_lane_index_t lane2,
)
to align with ucp_ep_config_lane_is_equal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my suggestion also assumed inverting an implementation - to find the lane in key1 which equals to lane2 in key2, otherwise mixing of the arguments in usage is not clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it will be clearer because currently key1 corresponds to old_key and key2 corresponds to new_key
src/ucp/wireup/wireup.c
Outdated
} | ||
} | ||
|
||
return ucp_ep_num_lanes(ep) != num_lanes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this check before the loop
@@ -147,7 +149,7 @@ class test_ucp_ep_reconfig : public ucp_test { | |||
static const unsigned num_iterations = 1000; | |||
/* TODO: remove this when 100MB asan bug is solved */ | |||
#ifdef __SANITIZE_ADDRESS__ | |||
static const size_t msg_sizes[] = {8, 1024, 16384, 65536}; | |||
static const size_t msg_sizes[] = {8, 1024, 16384, 32768}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why reduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a bug in rcache when data size is larger then ~70MB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls update the TODO above
num_shm_lanes(); | ||
} | ||
|
||
unsigned test_ucp_ep_reconfig::entity::num_shm_lanes() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that num devices always equals to num lanes, pls rename the function and comment the assumption on usage
|
||
void init() override | ||
{ | ||
static const std::string ib_tls[] = {"rc_mlx5", "dc_mlx5", "rc_verbs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with modern C++ you can use std::vector then usage of std::any looks simpler
btw, mybe has_any_transport
can be reused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_any_transport
does not help for usecase of shm,ib
because it will return true for IB even if there's only SHM on the setup
UCS_TEST_SKIP_COND_P(test_reconfig_asymmetric, request_reset, | ||
has_transport("shm"), "PROTO_REQUEST_RESET=y") | ||
!has_transport("ib"), "PROTO_REQUEST_RESET=y") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new condition reduces test coverage, why?
src/ucp/wireup/wireup.c
Outdated
/* TODO: 2) Support reconfiguration for separated wireup and AM lanes | ||
* during wireup process (request sent). */ | ||
return !(ep->flags & UCP_EP_FLAG_CONNECT_REQ_QUEUED) || | ||
(ep->am_lane == wireup_lane) || | ||
!ucp_wireup_is_am_lane_replaced(ep, reuse_lane_map); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, from my understanding this PR firstly has the deal with wireup_msg_lane, not an am_lane, then need to rename all related things, like is_am_replaced
-> is_wireup_msg_replaced
, use corresponding lane access wrapper, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it's also deals with different AM lane and wireup lane, provided that wireup process was completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code I see changes related to ep->am_lane
which is equals to wireup_msg_lane
in most particular configurations. And this case is implemented in this PR but common case is in TODO (2)
pls add a summary to the PR, probably my understanding of the change is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it common case?
What?
UCP/WIREUP: Support EP reconfiguration for non wired-up scenarios
Why?
This PR introduces 2 additional usecases supported: