Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[action] [PR:15536] fix 7260 headroom pool watermark test failure #15555

Merged
merged 1 commit into from
Nov 14, 2024

Conversation

mssonicbld
Copy link
Collaborator

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

observed consistent headroom wartermark test failure on 7260
and it's known issue of test script, as below RCA:

RCA:

summarize test step first:

  1. PTF send lots of pkt to multiple src ports to fill multiple PG's share buffer
  2. PTF send one or a few pkts to multiple src ports to trigger pfc on multiple PG
  3. check watermark before test headroom's watermark
  4. PTF send pkt to multiple src port to consum headroom pool, and test if watermark changes as expected
    after step2,

already send 20 pkts into headroom to trigger PFC on 10 src ports (20 PG)
but, so far, "upper_bound" value is static hardcode "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + 1", didn't consider headroom pool consumption in step2.
since we use dynamically threshold calculating, it can get accurate threshold value, we pretty sure the headroom pool consumption equal "pgs_num" in step2.
so I change "upper_bound" value to "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num", and it pass the tests.

How did you do it?

change "upper_bound" value to "2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num"

How did you verify/test it?

this change already verified in MSFT nightly for 202305 and 202311 branch, just commit to github.

Any platform specific information?

this change is dedicated to below platform and topology:

 if (hwsku == 'Arista-7260CX3-D108C8' and self.testbed_type in ('t0-116', 'dualtor-120')) \
 or (hwsku == 'Arista-7260CX3-C64' and self.testbed_type in ('dualtor-aa-56', 't1-64-lag')):
 upper_bound = 2 LICENSE README.md SECURITY.md ansible azure-pipelines.yml docs git pyproject.toml sdn_tests setup-container.sh spytest test_reporting tests margin + self.pgs_num

if other platform and topology have hit similar issue, can add affected platform and topo to above condition checking.
Note:
for generic fix, qos refactor project will covert.

Supported testbed topology if it's a new test case?

Documentation

What is the motivation for this PR?
observed consistent headroom wartermark test failure on 7260
and it's known issue of test script, as below RCA:

RCA:

summarize test step first:

PTF send lots of pkt to multiple src ports to fill multiple PG's share buffer
PTF send one or a few pkts to multiple src ports to trigger pfc on multiple PG
check watermark before test headroom's watermark
PTF send pkt to multiple src port to consum headroom pool, and test if watermark changes as expected
after step2,
already send 20 pkts into headroom to trigger PFC on 10 src ports (20 PG)
but, so far, "upper_bound" value is static hardcode "2 * margin + 1", didn't consider headroom pool consumption in step2.
since we use dynamically threshold calculating, it can get accurate threshold value, we pretty sure the headroom pool consumption equal "pgs_num" in step2.
so I change "upper_bound" value to "2 * margin + self.pgs_num", and it pass the tests.

How did you do it?
change "upper_bound" value to "2 * margin + self.pgs_num"

How did you verify/test it?
this change already verified in MSFT nightly for 202305 and 202311 branch, just commit to github.

Any platform specific information?
this change is dedicated to below platform and topology:

            if (hwsku == 'Arista-7260CX3-D108C8' and self.testbed_type in ('t0-116', 'dualtor-120')) \
                or (hwsku == 'Arista-7260CX3-C64' and self.testbed_type in ('dualtor-aa-56', 't1-64-lag')):
                upper_bound = 2 * margin + self.pgs_num
if other platform and topology have hit similar issue, can add affected platform and topo to above condition checking.
Note:
for generic fix, qos refactor project will covert.
@mssonicbld
Copy link
Collaborator Author

Original PR: #15536

Copy link
Contributor

@XuChen-MSFT XuChen-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@mssonicbld mssonicbld merged commit 94a0751 into sonic-net:202405 Nov 14, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants