Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dryRun issue with permissionless user #3530

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

lucferbux
Copy link
Contributor

@lucferbux lucferbux commented Nov 28, 2024

Description

Closes https://issues.redhat.com/browse/RHOAIENG-15479

Fix an issue in which users without cluster admin roles would dry run rolebindings referencing a role that's not there.
Also, avoiding calling setUpTokenAuth twice in kserve.

How Has This Been Tested?

  1. Impersonate a regular user
  2. Create a new project
  3. Go to model serving section
  4. Deploy a new kserve model
  5. Select auth tokens
  6. See there's an issue

These changes will fix that scenario

Test Impact

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Nov 28, 2024
Copy link
Contributor

openshift-ci bot commented Nov 28, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Member

@andrewballantyne andrewballantyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the setup token auth stuff you changed... I'll need to dig into that a bit more. But I have a direct inquiry to the comment you wrote down about this solution. Let me know if I am on the right track.

return createRoleBinding(rolebinding, opts).catch((error) => {
if (error.statusObject?.code === 404 && opts?.dryRun) {
// If dryRun is enabled and the user is not Cluster Admin it seems that there's a k8s error
// that raises a 404 trying to find the role, which is missing from the dryRun
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing from the Dry Run -- because we are dry running everything and the RoleBinding validates the existence of the Role on the cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's right, I assume this is a k8s error, cause it only happens with permisionless users, when dryRunning, creating the rolebinding i think internally tries to get the role it's trying to assign, since it's not yet created it fails. My guess is that with cluster admins k8s does the correct check and it assumes it's dryRunned too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's funny, cause if you change the rolebinding to not use an specific role it works just fine, or if you run this with cluster admins, it's just with this specific scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely strange. It doesn't make sense that the createRoleBinding util would return a 404 since it's a direct call to k8sCreateResource with the RoleBindingModel, the client never does a GET for the role - you mean that is happening internally in the k8s API? Seems bizarre that it would respond with 404 then and not 400.

It also doesn't make sense that this ever worked for admins tbh - I don't see why the API would be aware of the previous dryRun requests, they're all separate HTTP requests. So for some reason that internal check for the role existing isn't happening for admins? I wonder if admin permissions mean overriding checks like that or something? Confusing all around...

This solution seems fine to me based on the behavior we're seeing, I would love to understand it more but I suppose that doesn't block this PR.

@lucferbux lucferbux marked this pull request as ready for review November 28, 2024 22:46
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress This PR is in WIP state label Nov 28, 2024
Copy link

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.57%. Comparing base (a08ad68) to head (90e3338).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
frontend/src/pages/modelServing/utils.ts 80.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3530      +/-   ##
==========================================
+ Coverage   85.42%   85.57%   +0.15%     
==========================================
  Files        1353     1342      -11     
  Lines       31129    31012     -117     
  Branches     8692     8667      -25     
==========================================
- Hits        26591    26538      -53     
+ Misses       4538     4474      -64     
Files with missing lines Coverage Δ
...d/src/pages/modelServing/screens/projects/utils.ts 93.62% <100.00%> (+0.12%) ⬆️
frontend/src/pages/modelServing/utils.ts 92.94% <80.00%> (-1.04%) ⬇️

... and 19 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09ccd93...90e3338. Read the comment docs.

@emilys314
Copy link
Contributor

i tested the PR image on my cluster with a non admin user and the issue was fixed. the extra isModelMesh is slightly confusing but i suppose it makes sense to avoid unnecessary calls

Copy link
Contributor

@mturley mturley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair reviewed and tested on a call with @lucferbux - LGTM

Copy link
Contributor

openshift-ci bot commented Dec 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mturley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 4, 2024
@lucferbux
Copy link
Contributor Author

i tested the PR image on my cluster with a non admin user and the issue was fixed. the extra isModelMesh is slightly confusing but i suppose it makes sense to avoid unnecessary calls

Yes thanks Emily, this is just because debugging i realized in kserve we are calling it twice, one for the inferenceservice creation and the other for servingruntime, it's just to avoid duplicated calls.

@openshift-merge-bot openshift-merge-bot bot merged commit ca3d3e4 into opendatahub-io:main Dec 4, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants