Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SS 1091 Fix handling of deleted pods in a release #4

Merged
merged 14 commits into from
Aug 26, 2024
Merged

Conversation

alfredeen
Copy link
Member

@alfredeen alfredeen commented Aug 23, 2024

This PR fixes a bug that incorrectly sets the app status to Deleted. This can occur in several scenarios for example if the app image is changed to an invalid image and then back to a valid image.

A function to fetch the current status directly from k8s via the k8s API client is introduced. This is used in special situations when the k8s stream seems to indicate that a pod has been deleted to make sure this is the case.

Also introduces a CI action and extended unit test coverage of the app status logic.

@alfredeen alfredeen self-assigned this Aug 23, 2024
@alfredeen alfredeen requested a review from a team August 23, 2024 11:11
@alfredeen alfredeen marked this pull request as ready for review August 23, 2024 11:11
@churnikov
Copy link
Contributor

Just a general thought about logging. We have correlation id enabled in serve. We could pass it here as well for when we would be looking for logs, we would be able to trace the whole chain of events even here.

serve_event_listener/status_data.py Outdated Show resolved Hide resolved
Comment on lines +137 to +144
self.assertEqual(
self.status_data.status_data[release].get("status"),
"Running",
f"Release should be Running after delete of first pod, \
ts pod deleted={self.pod.metadata.deletion_timestamp} vs \
ts invalid_pod deleted={self.invalid_pod.metadata.deletion_timestamp} vs \
ts valid_pod created={self.valid_pod.metadata.creation_timestamp}, {msg}",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why did you switch to unittests style asserts? assert has a third argument as well, if that was the reason

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the pattern already used by Viktor which I chose to continue

@@ -74,19 +75,222 @@ def test_replica_scenario(self):
self.new_pod.create(release)
self.status_data.update({"object": self.new_pod})

time.sleep(0.01)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it actually create objects on the cluster, you've added these sleep calls here?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objects are created on the cluster, just in memory k8s client objects. Without these short pauses the object timestamps were exactly equal which would be exceptionally rare in the real world and were problematic for the implementation.

Comment on lines +180 to +189
def test_waiting_container_reason_pending(self):
"""
This scenario tests a k8s pod status object with a container with the following status attributes:
state=waiting, reason=PodInitializing
"""
podstatus = PodStatus()
podstatus.add_container_status("waiting", "PodInitializing")
expected = ("Pending", "", "")
actual = StatusData.determine_status_from_k8s(podstatus)
self.assertEqual(actual, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the tests bellow are perfect case for a parameterized test cases. It would reduce test file size quite a bit I think
But if you want to stick to unittest, then there is a subTest context manager that could be used for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but the tests are not so similar as they might seem at first. There are different methods for init containers and non-init containers, and one test adds both objects. So parameterized tests could combine at most 2 or 3 tests into one.

Co-authored-by: Nikita Churikov <8545082+churnikov@users.noreply.github.com>
@alfredeen alfredeen merged commit dea097e into main Aug 26, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants