Catch and log exceptions from failed links #8

Open: wants to merge 2 commits into base: master

shurickdaryin

I propose adding simple exception handlers for failed links, so that discovery can finish even when such links are present.

@jgunthorpe
Owner

Thank you for the patch.

What scenarios were you able to use this in?

There are many reasons an SMP send during discovery could fail. This patch seems to deal with the forward direction failing. Is that because an SMA is non-responsive, or something similar?

I would think the most common reason would be a change in the already discovered region, i.e. a link going down?

@shurickdaryin
Author

This helps in two cases observed in our practice:

  1. an active link is faulty: the switch port is up, but cannot transmit anything;
  2. a device is non-responsive: its ports are up, but do not respond to MADs.

In both cases python-rdma's discovery does not finish because of unhandled exceptions. By contrast, standard tools like ibnetdiscover do finish and report the errors they observe. Results of discovery with the proposed patch are consistent with those of ibnetdiscover.
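
To illustrate the idea only (this is not the code from the patch, and it does not use python-rdma's API), here is a minimal sketch of a discovery walk that logs a failed probe and moves on instead of aborting. The names `LinkError`, `discover`, `toy_probe` and the `TOPOLOGY` table are hypothetical and exist just for this example.

```python
import logging
from collections import deque

log = logging.getLogger("discovery")


class LinkError(Exception):
    """Hypothetical stand-in for a MAD timeout or a failed send."""


def discover(start, probe):
    """Breadth-first walk that records failed nodes instead of aborting.

    `probe(node)` is a hypothetical callable: it returns the neighbours of
    `node`, or raises LinkError when the node cannot be queried.
    """
    seen = {start}
    failed = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        try:
            neighbours = probe(node)
        except LinkError as exc:
            # Log the failure and keep going, so the rest of the fabric
            # is still discovered.
            log.warning("skipping %s: %s", node, exc)
            failed.append(node)
            continue
        for peer in neighbours:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, failed


# Toy fabric (assumed for the example): hca2 has its port up but does
# not respond to MADs, as in case 2 above.
TOPOLOGY = {
    "sw1": ["sw2", "hca1"],
    "sw2": ["sw1", "hca2"],
    "hca1": ["sw1"],
    "hca2": ["sw2"],
}


def toy_probe(node):
    if node == "hca2":
        raise LinkError("no response to MAD")
    return TOPOLOGY[node]


seen, failed = discover("sw1", toy_probe)
```

In this sketch the non-responsive device still shows up in `seen`, because its neighbour reported it, and it is also listed in `failed`. That is roughly the behaviour described for ibnetdiscover: the walk finishes and the errors are reported alongside the result.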
