lifecycle-cpu-isolation and lifecycle-affinity-required-pods don't appear to be matching pods #1119
Comments
Has this been figured out? Just going through the issue list and saw this hasn't been responded to. |
I haven't heard back about it, and as of the last test run in DCI it seems like this is still being skipped, even though AFAIK they're only working through failures at the moment. |
Looking at the DCI run that you sent me via Slack, the tnf_config.yml looks like it's still using |
There have been multiple iterations on this problem. IIRC the issue was filed with a manual run of the test suite but I gave you the latest DCI run. It's possible that's the disconnect. I can try using the new version of the test suite |
Just in case; on DCI, we are moving to |
OK, I re-ran the tests in DCI again: https://www.distributed-ci.io/jobs/22d3f865-2240-4fb5-972c-6379127b0be2/files Still getting a skip because it's not locating the pods, though. |
@joeldavis84 I'm getting a "job not found" error from that link. |
OK, sorry, I deleted a bunch of jobs in response to partner input and may have gotten a bit overzealous. New test run: https://www.distributed-ci.io/jobs/abc116a5-e157-478b-9c2b-387302f17fae?sort=date From tnf_config: And verified that the label is present (not sure if this is in the DCI results):
In the DCI results I can see "lifecycle-cpu-isolation" being skipped due to no labels:
And from how CATALOG.md is worded I would assume both the mismatch between |
@joeldavis84 , I've been checking your job, and I think that what is happening is that you're not looking at the correct namespace to retrieve the pods. If you check the
In the file called
If you check the list, the pod you're commenting on is not there; I suppose it's in a different namespace. So, you need to really check the I'd not say, for the moment, that it's an issue on the tnf side, because we're running these tests regularly with the latest version and we can confirm this is working fine in tnf v4.2.4 and v4.3.0 for sure. If you need some support here, don't hesitate to reach out to us! |
The pod I was looking at was just a random pod in the CNF that had that label. The pods do exist in the tawon namespace though:
The particular pod not being in the namespace anymore is probably just because the partner is working on the CNF and restarting or recreating pods somehow. |
Is it possible it's related to them using DaemonSets? I don't know how the tests are written but that is another weird thing that they're doing so I don't know if the pods are being tracked down by looking at ReplicaSets and deployments or something. |
As long as the pods are labeled they will be tested regardless of their parent being a daemonset or not. CNFs aren't supposed to use daemonsets per the requirements docs but it shouldn't prevent the pods themselves from being tested if they are labeled. |
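To make the selection rule discussed above concrete, here is a minimal sketch in Python. It is not the tnf implementation. It assumes that a pod is picked up when it lives in one of the configured target namespaces and carries at least one configured label, that the owner kind is never consulted, and that a test with no selected pods is reported as skipped. The names `Pod`, `select_pods_under_test`, `outcome`, and the label key `example.com/under-test` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)
    owner_kind: str = "ReplicaSet"  # deliberately ignored by the selection below


def select_pods_under_test(pods, target_namespaces, target_labels):
    """Return pods in a target namespace that carry at least one target label.

    target_labels is a list of (key, value) pairs. This shape is an
    assumption for illustration, not the real tnf_config.yml schema.
    """
    selected = []
    for pod in pods:
        if pod.namespace not in target_namespaces:
            continue  # wrong namespace: the pod is never seen
        if any(pod.labels.get(key) == value for key, value in target_labels):
            selected.append(pod)
    return selected


def outcome(selected):
    # Assumption: with nothing selected there is nothing to check, so the
    # test is reported as skipped rather than failed.
    return "skipped" if not selected else "evaluated"


if __name__ == "__main__":
    label = ("example.com/under-test", "true")  # hypothetical label
    pods = [
        Pod("ds-pod", "tawon", {label[0]: label[1]}, owner_kind="DaemonSet"),
        Pod("other-ns-pod", "default", {label[0]: label[1]}),
        Pod("unlabeled-pod", "tawon"),
    ]
    picked = select_pods_under_test(pods, {"tawon"}, [label])
    print([p.name for p in picked], outcome(picked))
```

Under these assumptions, a labeled pod owned by a DaemonSet is still selected, while a namespace or label mismatch in the configuration leaves the selection empty and produces a skip instead of a failure.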
We seem to be running into this issue again with a different partner who is using regular deployments. The test is being skipped when it seems like it should be failing due to subdividing CPUs and "limits" != "requests" |
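For reference, the two conditions mentioned in that comment (requests equal to limits, and whole CPUs only) can be sketched as below. This is an illustration in Python, not the check that tnf runs, and the real test described in CATALOG.md may look at more than these two conditions. The function names `parse_cpu` and `cpu_isolation_problems` are hypothetical; the only outside fact relied on is the Kubernetes CPU quantity notation, where `500m` means half a core.

```python
from fractions import Fraction


def parse_cpu(quantity: str) -> Fraction:
    """Convert a Kubernetes CPU quantity such as "2", "0.5" or "500m" to cores."""
    if quantity.endswith("m"):
        # Millicore notation: 1000m equals one full CPU
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def cpu_isolation_problems(requests: str, limits: str) -> list:
    """Return the reasons a container would not meet the two conditions.

    Hypothetical helper: it covers only the mismatch and whole-CPU
    conditions raised in this thread.
    """
    problems = []
    req, lim = parse_cpu(requests), parse_cpu(limits)
    if req != lim:
        problems.append("cpu requests != limits")
    if req.denominator != 1 or lim.denominator != 1:
        problems.append("cpu values are not whole CPUs")
    return problems


if __name__ == "__main__":
    for req, lim in [("2", "2"), ("500m", "1"), ("1500m", "1500m")]:
        print(req, lim, cpu_isolation_problems(req, lim))
```

A container with `500m` requested and `1` as the limit trips both conditions, which is the kind of pod expected to produce a failure once the pods are actually selected for testing.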
Can you send me the claim.json from their run? |
This is the run I was looking at when I made this comment and it's the latest one: https://www.distributed-ci.io/jobs/80912905-b9ed-45dd-9701-ad35584d0712?sort=date |
/usr/tnf/tnfsrc/pkg/testhelper/testhelper.
|
Relevant portion of the TNF config (CNF name redacted, can provide whatever is needed side channel):
And there are pods with requests/limits configured in a way that seems like it would cause the test to fail (mismatched, not using whole CPUs, etc.):
but that aren't found when the test suite runs:
The above is for lifecycle-cpu-isolation, but it's also relevant for lifecycle-affinity-required-pods, as these same pods have affinity rules but not the affinity-related labels or annotations mentioned in the CATALOG.md description, which seems like it should cause a failure rather than a skip.