[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

mssonicbld · 2024-11-14T05:59:10Z

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

Approach

What is the motivation for this PR?

observed consistent headroom wartermark test failure on 7260
and it's known issue of test script, as below RCA:

RCA:

summarize test step first:

PTF send lots of pkt to multiple src ports to fill multiple PG's share buffer
PTF send one or a few pkts to multiple src ports to trigger pfc on multiple PG
check watermark before test headroom's watermark
PTF send pkt to multiple src port to consum headroom pool, and test if watermark changes as expected
after step2,

already send 20 pkts into headroom to trigger PFC on 10 src ports (20 PG)
but, so far, "upper_bound" value is static hardcode "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + 1", didn't consider headroom pool consumption in step2.
since we use dynamically threshold calculating, it can get accurate threshold value, we pretty sure the headroom pool consumption equal "pgs_num" in step2.
so I change "upper_bound" value to "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num", and it pass the tests.

How did you do it?

change "upper_bound" value to "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num"

How did you verify/test it?

this change already verified in MSFT nightly for 202305 and 202311 branch, just commit to github.

Any platform specific information?

this change is dedicated to below platform and topology:

 if (hwsku == 'Arista-7260CX3-D108C8' and self.testbed_type in ('t0-116', 'dualtor-120')) \
 or (hwsku == 'Arista-7260CX3-C64' and self.testbed_type in ('dualtor-aa-56', 't1-64-lag')):
 upper_bound = 2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num

if other platform and topology have hit similar issue, can add affected platform and topo to above condition checking.
Note:
for generic fix, qos refactor project will covert.

Supported testbed topology if it's a new test case?

Documentation

What is the motivation for this PR? observed consistent headroom wartermark test failure on 7260 and it's known issue of test script, as below RCA: RCA: summarize test step first: PTF send lots of pkt to multiple src ports to fill multiple PG's share buffer PTF send one or a few pkts to multiple src ports to trigger pfc on multiple PG check watermark before test headroom's watermark PTF send pkt to multiple src port to consum headroom pool, and test if watermark changes as expected after step2, already send 20 pkts into headroom to trigger PFC on 10 src ports (20 PG) but, so far, "upper_bound" value is static hardcode "2 * margin + 1", didn't consider headroom pool consumption in step2. since we use dynamically threshold calculating, it can get accurate threshold value, we pretty sure the headroom pool consumption equal "pgs_num" in step2. so I change "upper_bound" value to "2 * margin + self.pgs_num", and it pass the tests. How did you do it? change "upper_bound" value to "2 * margin + self.pgs_num" How did you verify/test it? this change already verified in MSFT nightly for 202305 and 202311 branch, just commit to github. Any platform specific information? this change is dedicated to below platform and topology: if (hwsku == 'Arista-7260CX3-D108C8' and self.testbed_type in ('t0-116', 'dualtor-120')) \ or (hwsku == 'Arista-7260CX3-C64' and self.testbed_type in ('dualtor-aa-56', 't1-64-lag')): upper_bound = 2 * margin + self.pgs_num if other platform and topology have hit similar issue, can add affected platform and topo to above condition checking. Note: for generic fix, qos refactor project will covert.

mssonicbld · 2024-11-14T05:59:14Z

Original PR: #15536

XuChen-MSFT

Approved

mssonicbld added the automerge label Nov 14, 2024

mssonicbld mentioned this pull request Nov 14, 2024

fix 7260 headroom pool watermark test failure #15536

Merged

8 tasks

XuChen-MSFT approved these changes Nov 14, 2024

View reviewed changes

mssonicbld merged commit 94a0751 into sonic-net:202405 Nov 14, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

mssonicbld commented Nov 14, 2024

mssonicbld commented Nov 14, 2024

XuChen-MSFT left a comment

[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

Conversation

mssonicbld commented Nov 14, 2024

Description of PR

Type of change

Back port request

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

mssonicbld commented Nov 14, 2024

XuChen-MSFT left a comment

Choose a reason for hiding this comment