Update TRANSCEIVER_INFO table for ports that don't exist in Config DB #556
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
@@ -2178,6 +2178,15 @@ def init(self):

self.log_notice("XCVRD INIT: After port config is done")
port_mapping_data = port_event_helper.get_port_mapping(self.namespaces)
# remove ports from TRANSCEIVER_INFO table, if they don't exist in CONFIG DB
logical_ports_list = port_mapping_data.logical_port_list
asic_id = multi_asic.get_asic_index_from_namespace(self.namespaces)
Is self.namespaces a list? Maybe we need a loop here?
Thanks, fixed.
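The point raised in this thread can be sketched as follows: if `self.namespaces` is a list, the ASIC index has to be resolved once per namespace rather than once for the whole list. This is only an illustration, not the code that was pushed. `asic_index_from_namespace` is a hypothetical stand-in for `multi_asic.get_asic_index_from_namespace`, and the namespace naming used here (`asicN`, with an empty string for the default namespace) is an assumption.

```python
def asic_index_from_namespace(namespace):
    """Hypothetical stand-in for the real lookup (assumed naming: 'asicN').

    Returns N for a namespace named 'asicN' and 0 for anything else,
    including the empty default namespace.
    """
    prefix = "asic"
    suffix = namespace[len(prefix):]
    if namespace.startswith(prefix) and suffix.isdigit():
        return int(suffix)
    return 0


def asic_indexes(namespaces):
    """Resolve one ASIC index per namespace instead of passing the whole list."""
    return [asic_index_from_namespace(ns) for ns in namespaces]
```

Calling `asic_indexes(["asic0", "asic1"])` walks the list and resolves each entry separately, which is the loop the reviewer asked about.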
… table is changed" This reverts commit 8cdc32f.
aa0135a to 7f9e669
@noaOrMlnx How did we end up in a situation where the TRANSCEIVER_INFO table exists for ports that are NOT in CONFIG_DB?
@noaOrMlnx What crash are we talking about? Please share the backtrace.
@noaOrMlnx Can you please explain why on_remove_logical_port is not taking care of deleting the TRANSCEIVER_INFO table for the relevant ports?
@prgeor please check the following code: sonic-platform-daemons/sonic-xcvrd/xcvrd/xcvrd.py, line 2199 in 0cb3644
MSFT didn't want 'config reload' to remove the TRANSCEIVER_INFO table, as it retriggers the host tx ready setting.
@mihirpat1
@mihirpat1 please let me know if you have any further questions; otherwise please approve so I can merge.
@noaOrMlnx Can you please help address #556 (comment)? Also, can you please add the traceback of the crash to the description of the PR?
@mihirpat1 @prgeor all comments have been addressed.
@noaOrMlnx Please help fix the build failure.
@mihirpat1 Done.
@noaOrMlnx What do you mean by "If user removes port from config DB"? What is this use case?
@noaOrMlnx Please share the steps to reproduce this issue.
@noaOrMlnx as discussed, the reason to NOT delete the TRANSCEIVER_INFO table during Xcvrd deinit was to prevent link flap on Mellanox platforms. By design on Mellanox platforms, a TRANSCEIVER_INFO table update is treated as an optics insertion/deletion, and OA uses this event to reinitialize the port. We had to take this approach to prevent link flap during an Xcvrd restart or crash on Mellanox platforms.
To fix the present issue, which is reconfiguring the complete switch with a new configuration, please delete all of the following tables during the config reload command so that Xcvrd repopulates them according to the newly pushed configuration:
1. TRANSCEIVER_INFO_TABLE = 'TRANSCEIVER_INFO'
2. TRANSCEIVER_FIRMWARE_INFO_TABLE = 'TRANSCEIVER_FIRMWARE_INFO'
3. TRANSCEIVER_DOM_SENSOR_TABLE = 'TRANSCEIVER_DOM_SENSOR'
4. TRANSCEIVER_DOM_THRESHOLD_TABLE = 'TRANSCEIVER_DOM_THRESHOLD'
5. TRANSCEIVER_STATUS_TABLE = 'TRANSCEIVER_STATUS'
6. TRANSCEIVER_PM_TABLE = 'TRANSCEIVER_PM'
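A minimal sketch of the cleanup suggested above, assuming State DB can be modelled as a plain dict keyed by `TABLE|key`. The dict model, the `|` separator, and the function name `clear_transceiver_tables` are assumptions made for illustration; this is not the actual `config reload` code, and a real implementation would go through the SONiC database APIs.

```python
# Table names copied from the list above.
TRANSCEIVER_TABLES = (
    "TRANSCEIVER_INFO",
    "TRANSCEIVER_FIRMWARE_INFO",
    "TRANSCEIVER_DOM_SENSOR",
    "TRANSCEIVER_DOM_THRESHOLD",
    "TRANSCEIVER_STATUS",
    "TRANSCEIVER_PM",
)


def clear_transceiver_tables(state_db, separator="|"):
    """Delete every key that belongs to one of the transceiver tables.

    state_db: dict mapping 'TABLE<separator>key' to a field dict
    (an in-memory stand-in for State DB). Returns the number of keys removed.
    """
    doomed = [
        key for key in state_db
        if key.split(separator, 1)[0] in TRANSCEIVER_TABLES
    ]
    for key in doomed:
        del state_db[key]
    return len(doomed)
```

Matching on the exact table name before the separator keeps `TRANSCEIVER_INFO` from also matching `TRANSCEIVER_INFO_EXTRA`-style names, and leaves unrelated tables untouched.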
@noaOrMlnx closing as not needed.
Description
This change
sonic-platform-daemons/sonic-xcvrd/xcvrd/xcvrd.py
Line 2199 in 0cb3644
When the TRANSCEIVER_INFO table is not deleted from State DB, old ports remain in it.
If we split a port to 8 on SPC4, we must delete the adjacent port, so the old ports are all still in TRANSCEIVER_INFO but no longer in Config DB.
To avoid a crash when searching for non-existent tables and ports, we should delete from the TRANSCEIVER_INFO table all ports that are not in Config DB.
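The cleanup described here amounts to a set difference between the ports in TRANSCEIVER_INFO and the ports in Config DB. The sketch below illustrates that idea with plain Python containers; `remove_stale_ports` and the port names in the usage note are hypothetical, and the real change operates on State DB tables rather than a dict.

```python
def remove_stale_ports(transceiver_info, config_db_ports):
    """Delete entries whose port is absent from Config DB.

    transceiver_info: dict mapping port name to a field dict
    (a stand-in for the TRANSCEIVER_INFO table in State DB).
    config_db_ports: iterable of port names present in Config DB.
    Returns the sorted list of port names that were removed.
    """
    configured = set(config_db_ports)
    stale = sorted(port for port in transceiver_info if port not in configured)
    for port in stale:
        del transceiver_info[port]
    return stale
```

For example, if the table still holds `Ethernet0` and `Ethernet8` but Config DB only lists `Ethernet0` after the split, the call removes `Ethernet8` and leaves `Ethernet0` in place.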
Example of a possible crash:
Motivation and Context
The motivation is to be able to use the CMIS host management feature when we split to 8 and delete some ports from Config DB.
How Has This Been Tested?
With a port split to 8.
Additional Information (Optional)