Fix for redis memory check failure after link flap and also sometimes cpu usage high failure #15732
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
Change looks good to me, thanks!
@wumiaont PR conflicts with 202405 branch
#15955)
* Fix for redis memory check failure after link flap and also sometimes cpu usage high failure (#15732)
co-authorized by: jianquanye@microsoft.com
* Fix syntax issue
Co-authored-by: wumiao_nokia <wu.miao@nokia.com>
Description of PR
The Redis memory usage reported by "redis-cli info memory | grep used_memory_human" is not a stable value. Even on a stable system (BGP converged, no port flapping, etc.), consecutive readings can differ by more than 0.2M.
The following CLI output is from 202405 and 202205.
202405:
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.64
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.74
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.52
202205:
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.02
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.26
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.14
We can see that 202405 has some memory optimization for Redis and uses much less memory than 202205. Because the baseline is smaller, a 0.2M swing is a much larger fraction of roughly 2.6M than of roughly 6.1M, so on 202405 it can easily exceed the 5% memory usage threshold.
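To make the comparison concrete, the short sketch below computes the relative spread of the readings quoted above. The sample values are copied from the CLI output; the helper name `relative_spread` and the choice of (max - min) / min as the measure are assumptions for illustration, not necessarily how the test computes its difference.

```python
# Samples copied from the CLI output above (used_memory_human, in M).
samples = {
    "202405": [2.64, 2.74, 2.52],
    "202205": [6.02, 6.26, 6.14],
}


def relative_spread(values):
    """Return (max - min) / min as a percentage (illustrative measure only)."""
    return (max(values) - min(values)) / min(values) * 100.0


for branch, values in samples.items():
    print(f"{branch}: spread {relative_spread(values):.1f}%")
```

Under this measure the 202405 readings vary by more than 5% on an idle system, while the 202205 readings stay below 5%.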
The solution is to compare the average Redis memory usage before and after the link flap: query the memory usage 5 times at 5-second intervals and take the average of the readings. The threshold is also raised from 5% to 10%. With this fix, the Redis memory check no longer fails on 202405 after link flap.
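The averaging approach can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `parse_used_memory_mb`, `average_redis_memory`, `memory_increase_ok`, and the `run_command` callable (anything that runs a shell command on the DUT and returns its output as a string) are hypothetical names, and checking only the increase relative to the "before" average is an assumption.

```python
import re
import time


def parse_used_memory_mb(info_output):
    """Extract the number from a 'used_memory_human:2.64M' line."""
    match = re.search(r"used_memory_human:([\d.]+)M", info_output)
    if match is None:
        raise ValueError("used_memory_human not found or not reported in M")
    return float(match.group(1))


def average_redis_memory(run_command, samples=5, interval=5, sleep=time.sleep):
    """Query Redis memory `samples` times, `interval` seconds apart, and average."""
    readings = []
    for i in range(samples):
        readings.append(parse_used_memory_mb(run_command("redis-cli info memory")))
        if i < samples - 1:
            sleep(interval)  # no need to wait after the last reading
    return sum(readings) / len(readings)


def memory_increase_ok(before, after, threshold_percent=10.0):
    """Return True if the increase from `before` to `after` is within the threshold."""
    return (after - before) / before * 100.0 <= threshold_percent
```

In a test, `average_redis_memory` would be called once before the link flap and once after, and the two averages passed to `memory_increase_ok`.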
This commit also fixes the orchagent CPU utilization check, which sometimes failed after link flap. The reason is that in a scaled setup (34k routes) orchagent takes more time to settle down.
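The description does not say how the extra settling time is handled. One common way to tolerate a slow-to-settle process is to poll until the reading drops below the threshold or a timeout expires; the sketch below shows that pattern only as an assumption. `wait_until_below` and `read_value` (a callable returning the current CPU percentage for orchagent) are hypothetical names, and any threshold and timeout values are placeholders.

```python
import time


def wait_until_below(read_value, threshold, timeout, interval,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll read_value() until it is below threshold or timeout seconds pass.

    Returns True if the value dropped below the threshold in time, else False.
    """
    deadline = clock() + timeout
    while True:
        if read_value() < threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A larger `timeout` gives orchagent more time to calm down on a scaled setup without loosening the threshold itself.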
Summary:
Fixes # (issue) #15733
Type of change
Back port request
Approach
What is the motivation for this PR?
Fix test failures
How did you verify/test it?
Ran the OC tests with the fix; the test failure was no longer seen.