use informers for pod events instead of Listing #2178

hakuna-matatah · 2022-11-03T17:57:41Z

What type of PR is this?

/kind bug

/kind failing-test

What this PR does / why we need it:

This is a quick fix that lets pod_startup_latencies be calculated reliably on large clusters, without running into apiserver-side TTL issues and without losing events that expire after 1h.

Which issue(s) this PR fixes:

Fixes #

It fixes these issues

Pod startup latency calculation phases are not efficient and may result in error for larger clusters in ClusterLoader2 #2176
Pod startup time phases are inaccurate in longer tests. #2006

Special notes for your reviewer:

linux-foundation-easycla · 2022-11-03T17:57:44Z

The committers listed above are authorized under a signed CLA.

✅ login: hakuna-matatah / name: Harish K (09a0c41, f33b0ba)

k8s-ci-robot · 2022-11-03T17:57:50Z

Hi @hakuna-matatah. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hakuna-matatah · 2022-11-03T18:28:01Z

/easycla

mengqiy · 2022-11-03T20:19:47Z

/ok-to-test

mengqiy · 2022-11-03T20:23:35Z

@hakuna-matatah Your commit is not associated with your account properly. I guess you need to amend it and push again.

hakuna-matatah · 2022-11-03T22:19:33Z

/easycla

mborsz · 2022-11-04T08:12:09Z

/assign @tosi3k

tosi3k · 2022-11-04T09:57:48Z

clusterloader2/pkg/measurement/common/slos/pod_startup_latency.go

@@ -91,7 +94,7 @@ func (p *podStartupLatencyMeasurement) Execute(config *measurement.Config) ([]me
 	if err != nil {
 		return nil, err
 	}
-
+	schedulerName, err := util.GetStringOrDefault(config.Params, "schedulerName", defaultSchedulerName)


Please copy the error checking here from under the case "gather" as well.
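For readers following the thread, the requested pattern can be sketched roughly as below. This is only an illustration: getStringOrDefault is a hypothetical local stand-in for util.GetStringOrDefault, schedulerNameFrom is a made-up helper, and the value given to defaultSchedulerName is an assumption, not taken from the PR.

```go
package main

import "fmt"

// Assumed value for illustration; the PR's actual constant is not shown in the thread.
const defaultSchedulerName = "default-scheduler"

// getStringOrDefault is a hypothetical stand-in for util.GetStringOrDefault:
// it returns def when the key is absent and an error when the value is
// present but is not a string.
func getStringOrDefault(params map[string]interface{}, key, def string) (string, error) {
	v, ok := params[key]
	if !ok {
		return def, nil
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("parameter %q: expected string, got %T", key, v)
	}
	return s, nil
}

// schedulerNameFrom (hypothetical helper) checks err immediately after the
// lookup instead of silently carrying on with an empty scheduler name.
func schedulerNameFrom(params map[string]interface{}) (string, error) {
	schedulerName, err := getStringOrDefault(params, "schedulerName", defaultSchedulerName)
	if err != nil {
		return "", err
	}
	return schedulerName, nil
}

func main() {
	name, err := schedulerNameFrom(map[string]interface{}{})
	fmt.Println(name, err)

	_, err = schedulerNameFrom(map[string]interface{}{"schedulerName": 42})
	fmt.Println(err)
}
```

Without the check, a mistyped parameter would only surface later as a confusing mismatch rather than as an immediate error.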

tosi3k · 2022-11-04T11:09:11Z

clusterloader2/pkg/measurement/common/slos/pod_startup_latency.go

+
+	p.stopSchedCh = make(chan struct{})
+
+	e := informer.NewInformer(


Could we use Controller, a slightly lower-level primitive than Informer, and pass a large (e.g. 10k) WatchListPageSize in the Config?

The reason I'm asking is that large clusters tend to have O(hundreds of thousands) events, and listing them with the default page size (500) may cause the informer's initial list to time out.
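To put rough numbers on the page-size concern, here is a small back-of-the-envelope sketch. The event count of 300,000 is an assumed figure standing in for "hundreds of thousands", and listRequests is a hypothetical helper; it only counts round trips and says nothing about per-request latency.

```go
package main

import "fmt"

// listRequests returns how many paginated LIST calls are needed to read
// totalItems objects when each page holds at most pageSize objects
// (ceiling division). Non-positive inputs yield 0.
func listRequests(totalItems, pageSize int) int {
	if totalItems <= 0 || pageSize <= 0 {
		return 0
	}
	return (totalItems + pageSize - 1) / pageSize
}

func main() {
	const events = 300000 // assumption: stands in for "hundreds of thousands" of events
	for _, pageSize := range []int{500, 10000} {
		fmt.Printf("page size %d: %d LIST requests\n", pageSize, listRequests(events, pageSize))
	}
}
```

Under these assumptions the default page size needs 600 sequential requests for the initial list, while a 10k page size needs 30.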

hakuna-matatah · 2022-11-04

"listing them using default page size (500) may result in informer's initial list getting timed out."

How would it time out? IIUC, I have made sure we do not rely on the client-side timeout defined here for this use case; instead I'm calling the Run method directly here. Am I misinterpreting what you are trying to imply?

tosi3k · 2022-11-09

Ah, I see... We deliberately set the timeout in our wrappers around informers to make sure that the initial list (the one responsible for the cache's sync) completes in a reasonable time. In clusters with O(xxx k) events, the initial list will take ages (O(a few minutes)) with the default page size and may therefore run into the "too old resource version" error. I'd strongly suggest using a larger page size, and hence the Controller primitive, for the List+Watch pattern for events, like we do in https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/measurement/common/loadbalancer_nodesync_latency.go.

CC @mborsz for his thoughts as I'll be OOO for the rest of the week.

k8s-ci-robot · 2023-07-01T12:55:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hakuna-matatah
Once this PR has been reviewed and has the lgtm label, please ask for approval from tosi3k. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

clusterloader2/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-triage-robot · 2024-03-15T00:07:55Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-05-11T19:58:33Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle rotten
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot · 2024-05-17T05:34:03Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-triage-robot · 2024-06-16T05:51:46Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen
Mark this PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2024-06-16T05:51:50Z

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen

Mark this PR as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 3, 2022

k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 3, 2022

k8s-ci-robot requested review from krzysied and mborsz November 3, 2022 17:57

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2022

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 3, 2022

hakuna-matatah force-pushed the master branch from b07f0f7 to b723471 Compare November 3, 2022 22:26

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 3, 2022

k8s-ci-robot assigned tosi3k Nov 4, 2022

tosi3k suggested changes Nov 4, 2022

hakuna-matatah mentioned this pull request Dec 13, 2022

bug fixes for test failures awslabs/kubernetes-iteration-toolkit#334

Merged

hakuna-matatah force-pushed the master branch from 95959a1 to c3bd24d Compare November 16, 2023 05:40

hakuna-matatah force-pushed the master branch 3 times, most recently from 277ebcc to da63d7a Compare December 1, 2023 06:18

hakuna-matatah force-pushed the master branch from 6ed1e1a to 087976f Compare December 15, 2023 20:50

k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 15, 2024

hakuna-matatah force-pushed the master branch from 8a079ce to 41673b0 Compare March 21, 2024 20:52

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 21, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 11, 2024

hakuna-matatah added 2 commits May 14, 2024 01:40

use informers for pod events instead of Listing

f33b0ba

Add direct scheduler throughput test suite

09a0c41

hakuna-matatah force-pushed the master branch from e16f655 to 09a0c41 Compare May 14, 2024 01:46

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2024

k8s-ci-robot closed this Jun 16, 2024
