Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate some e2e testcases #1500

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

fanhaouu
Copy link
Contributor

migrate e2e_duplicatepods e2e_failedpods e2e_topologyspreadconstraint e2e_leaderelection e2e_clientconnection

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 18, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @fanhaouu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 18, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign a7i for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fanhaouu
Copy link
Contributor Author

The current method of using IMAGE_TAG as VERSION in run-e2e-tests.sh to make the image has issues, which results in the failure to obtain the minor version during the image build. Additionally, should we also synchronize and push the image after making it? @ingvagabund

@fanhaouu fanhaouu force-pushed the feat-migrate-e2e-tests branch from 6194ad2 to 16e4377 Compare August 18, 2024 19:39
@ingvagabund
Copy link
Contributor

@fanhaouu can you split the PR into multiple commits? Each commit with a reasonable change. E.g. one commit per a test change. Also, @hsunwenfang has already started migration of TestLeaderElection in #1497.

@ingvagabund
Copy link
Contributor

The current method of using IMAGE_TAG as VERSION in run-e2e-tests.sh to make the image has issues, which results in the failure to obtain the minor version during the image build.

During the image build? Can you share the failure? I am aware of a parsing error when running descheduler. Is this a new failure?

Additionally, should we also synchronize and push the image after making it?

The image is built only for testing purposes. There's a different mechanism configured for building new image after each PR merges.

@fanhaouu
Copy link
Contributor Author

@fanhaouu can you split the PR into multiple commits? Each commit with a reasonable change. E.g. one commit per a test change. Also, @hsunwenfang has already started migration of TestLeaderElection in #1497.

sorryt, I can't split the PR into multiple commits; many of the changes were made together.

@fanhaouu
Copy link
Contributor Author

The current method of using IMAGE_TAG as VERSION in run-e2e-tests.sh to make the image has issues, which results in the failure to obtain the minor version during the image build.

During the image build? Can you share the failure? I am aware of a parsing error when running descheduler. Is this a new failure?

Additionally, should we also synchronize and push the image after making it?

The image is built only for testing purposes. There's a different mechanism configured for building new image after each PR merges.
A parsing error when running descheduler. Since we were passing the version as a specific tag, I have now changed it to IMAGE_TAG=v$(shell date +%Y%m%d)-$(shell git describe --tags)."

@ingvagabund
Copy link
Contributor

sorryt, I can't split the PR into multiple commits; many of the changes were made together.

It's a good exercise to find a way how to split the PR into smaller changes. Please keep in mind we need to help each other when reviewing any PR. The smaller the changes the easier it is to review it. Even if it means to refactor the code in a different PR before you can make a PR that performs the actual migration.

@fanhaouu
Copy link
Contributor Author

fanhaouu commented Aug 19, 2024

sorryt, I can't split the PR into multiple commits; many of the changes were made together.

It's a good exercise to find a way how to split the PR into smaller changes. Please keep in mind we need to help each other when reviewing any PR. The smaller the changes the easier it is to review it. Even if it means to refactor the code in a different PR before you can make a PR that performs the actual migration.

Got it, I understand. I'll pay attention to it next time, but if I have to split this into many PRs, it will consume a lot of my effort and time. The PR #1497 is not finished yet, and there are many issues. It seems like he didn't fully understand how to perform the migration.

@fanhaouu
Copy link
Contributor Author

@ingvagabund There are still some other strategies in the e2e_test.go file. Should those also be migrated?

@hsunwenfang
Copy link

sorryt, I can't split the PR into multiple commits; many of the changes were made together.

It's a good exercise to find a way how to split the PR into smaller changes. Please keep in mind we need to help each other when reviewing any PR. The smaller the changes the easier it is to review it. Even if it means to refactor the code in a different PR before you can make a PR that performs the actual migration.

Got it, I understand. I'll pay attention to it next time, but if I have to split this into many PRs, it will consume a lot of my effort and time. The PR #1497 is not finished yet, and there are many issues. It seems like he didn't fully understand how to perform the migration.

Hi @fanhaouu
I just committed earlier and encounter some problem during the testing
Can you instruct me on the remaining issues?
thx!

@fanhaouu
Copy link
Contributor Author

sorryt, I can't split the PR into multiple commits; many of the changes were made together.

It's a good exercise to find a way how to split the PR into smaller changes. Please keep in mind we need to help each other when reviewing any PR. The smaller the changes the easier it is to review it. Even if it means to refactor the code in a different PR before you can make a PR that performs the actual migration.

Got it, I understand. I'll pay attention to it next time, but if I have to split this into many PRs, it will consume a lot of my effort and time. The PR #1497 is not finished yet, and there are many issues. It seems like he didn't fully understand how to perform the migration.

Hi @fanhaouu I just committed earlier and encounter some problem during the testing Can you instruct me on the remaining issues? thx!

Your submission doesn't seem to have been executed in a k8s cluster with make test-e2e. Otherwise, you would have encountered many issues. Your test case might not even run.If possible, could you review my submission regarding the test for leader election?

@ingvagabund
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 19, 2024
@ingvagabund
Copy link
Contributor

Got it, I understand. I'll pay attention to it next time, but if I have to split this into many PRs, it will consume a lot of my effort and time.

I see a lot of good improvements in the code. Thank you that. My initial review notes:

  • Can you unrename test/e2e/test_clientconnection_test.go? It's easier to review code changes rather than file renames
  • I see you are removing createDeployment and other like duplicates. This deserves its own commit/PR.
  • The same holds for removal of waitForDeschedulerPodRunning/waitForDeschedulerPodAbsent and generalizing the methods. This deserves its own commit/PR.

@fanhaouu
Copy link
Contributor Author

  • I see you are removing createDeployment and other like duplicates. This deserves its own commit/PR.
  • The same holds for removal of waitForDeschedulerPodRunning/waitForDeschedulerPodAbsent and generalizing the methods. This deserves its own commit/PR.

Ok, no problem. I will make some improvements.

@fanhaouu fanhaouu force-pushed the feat-migrate-e2e-tests branch from 16e4377 to a4b52b8 Compare August 19, 2024 18:23
@fanhaouu
Copy link
Contributor Author

/retest

test/run-e2e-tests.sh Show resolved Hide resolved
@@ -101,7 +101,6 @@ func TestTooManyRestarts(t *testing.T) {
if _, err := clientSet.CoreV1().Namespaces().Create(ctx, testNamespace, metav1.CreateOptions{}); err != nil {
t.Fatalf("Unable to create ns %v", testNamespace.Name)
}
defer clientSet.CoreV1().Namespaces().Delete(ctx, testNamespace.Name, metav1.DeleteOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay here in case creating a deployment errors and t.Fatalf is invoked.

Copy link
Contributor Author

@fanhaouu fanhaouu Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't remove this logic. I move it to

clientSet.CoreV1().Namespaces().Delete(ctx, testNamespace.Name, metav1.DeleteOptions{})

@@ -194,9 +196,6 @@ func TestTopologySpreadConstraint(t *testing.T) {
return
}

// Ensure recently evicted Pod are rescheduled and running before asserting for a balanced topology spread
test.WaitForDeploymentPodsRunning(ctx, t, clientSet, deployment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for synchronization as the comment describes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently running the descheduler inside a container, so we just need to continuously wait for the results using wait.PollUntilContextTimeout. We don’t actually know when the descheduler evicts the pods.

if err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {

@@ -181,9 +186,6 @@ func TestTopologySpreadConstraint(t *testing.T) {
plugin.(frameworktypes.BalancePlugin).Balance(ctx, workerNodes)
t.Logf("Finished RemovePodsViolatingTopologySpreadConstraint strategy for %s", name)

t.Logf("Wait for terminating pods of %s to disappear", name)
waitForTerminatingPodsToDisappear(ctx, t, clientSet, deployment.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for synchronization so new pods get a chance to be started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently running the descheduler inside a container, so we just need to continuously wait for the results using wait.PollUntilContextTimeout. We don’t actually know when the descheduler evicts the pods.

if err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t actually know when the descheduler evicts the pods

We do as we expect the descheduler to evict certain pods in a certain time based on the configuration the test sets.

@ingvagabund
Copy link
Contributor

@fanhaouu just a quick review. Lemme know once the PR is ready for review and all the tests are going green. Thank you.

@fanhaouu fanhaouu force-pushed the feat-migrate-e2e-tests branch 7 times, most recently from cd7b9ce to 115f816 Compare August 22, 2024 08:01
@fanhaouu fanhaouu force-pushed the feat-migrate-e2e-tests branch from 115f816 to 9ba1c55 Compare August 22, 2024 10:25
@fanhaouu
Copy link
Contributor Author

fanhaouu commented Aug 22, 2024

@fanhaouu just a quick review. Lemme know once the PR is ready for review and all the tests are going green. Thank you.

That's ok. I think you can start to review. It's really frustrating. I've been making adjustments for a long time.

}
deschedulerPolicyConfigMapObj, err := deschedulerPolicyConfigMap(podlifetimePolicy(podLifeTimeArgs, evictorArgs))
deschedulerPolicyConfigMapObj.Name = fmt.Sprintf("%s-%s", deschedulerPolicyConfigMapObj.Name, testName)
deschedulerPolicyConfigMapObj, err := deschedulerPolicyConfigMap(podlifetimePolicy(podLifeTimeArgs, evictorArgs), func(cm *v1.ConfigMap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of abstracting deschedulerPolicyConfigMap and deschedulerDeployment methods if the apply part is used only once?

@@ -1392,6 +1392,24 @@ func waitForRCPodsRunning(ctx context.Context, t *testing.T, clientSet clientset
}
}

func waitForTerminatingPodsToDisappear(ctx context.Context, t *testing.T, clientSet clientset.Interface, namespace string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are putting back waitForTerminatingPodsToDisappear which was removed in 1e23dbc. Can you consolidate the changes into a single commit?

- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
resourceNames: ["descheduler"]
verbs: ["get", "patch", "delete"]
verbs: ["get", "patch", "delete","create", "update"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular change/commit that resulting in need to add the verbs here? If so, it needs to be part of the same commit/PR.

var meetsExpectations bool
var meetsEvictedExpectations bool
var actualEvictedPodCount int
t.Logf("Check whether the number of evicted pods meets the expectation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message resolve the issue causing the e2e test failures does not describe which issue and what e2e test failure was resolved. Also, this commit changes two e2es. Is the change in TestTopologySpreadConstraint related to the issue/e2e test failure? If not, can you move this change into migrate e2e_topologyspreadconstraint commit?

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 29, 2024

@fanhaouu if you separate generic changes from this PR into separate PRs it will be easier and faster to review the migration. There's still changes in this PR that are changing changes in previous commits of this PR.

@fanhaouu
Copy link
Contributor Author

fanhaouu commented Sep 3, 2024

@fanhaouu if you separate generic changes from this PR into separate PRs it will be easier and faster to review the migration. There's still changes in this PR that are changing changes in previous commits of this PR.

I think submitting too many changes at once is too cumbersome and makes reviewing difficult. Therefore, I’ve decided to split this PR into smaller, individual PRs.

@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2024
@k8s-ci-robot
Copy link
Contributor

@fanhaouu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-descheduler-test-e2e-k8s-master-1-32 196ac65 link true /test pull-descheduler-test-e2e-k8s-master-1-32

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants