Implemented visualizations for the NIM metrics #3612
Hi @LinoyBitan1. Thanks for your PR. I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
FYI @emilys314
This is our (Matan Talvi and Linoy Bitan) first PR.
/ok-to-test
Can you run the linter / formatter? The CI checks will not pass unless the code follows the linting rules. See https://github.com/opendatahub-io/odh-dashboard/blob/main/CONTRIBUTING.md#linter-testing
Can you also add to the PR description the Jira item this is associated with?
done
The linter and formatter were run, and we've added the updates in the commit along with a few changes in the tests. Let us know if anything else is needed!
  const isKServeNIMEnabled = isProjectNIMSupported(currentProject);

-  const modelMetricsSupported =
-    modelMetricsEnabled && (modelMesh || kserveMetricsEnabled) && !isKServeNIMEnabled;
+  const modelMetricsSupported = modelMetricsEnabled && (modelMesh || kserveMetricsEnabled);
There is another linter error since isKServeNIMEnabled is no longer being used:
https://github.com/opendatahub-io/odh-dashboard/actions/runs/12686647659/job/35374595034?pr=3612#step:9:48
We performed the needed changes.
522c605
to
eebe460
Compare
Is it even possible to deploy a NIM model using modelmesh? I assume it's not, so the isModelMesh(model) ? <ModelMeshMetrics /> branch and the entire file are not needed.
backend/src/utils/constants.ts
@@ -69,7 +69,7 @@ export const blankDashboardCR: DashboardConfig = {
   disableServingRuntimeParams: false,
   disableConnectionTypes: false,
   disableStorageClasses: false,
-  disableNIMModelServing: true,
+  disableNIMModelServing: false,
Is this intentional? I don't see in the JIRA or PR description that we should be changing this. Can you revert this?
@@ -6,8 +6,10 @@ const useModelMetricsEnabled = (): [modelMetricsEnabled: boolean] => {
   ).status;
   const biasMetricsAreaAvailable = useIsAreaAvailable(SupportedArea.BIAS_METRICS).status;

+  const nimMetricsAreaAvailable = useIsAreaAvailable(SupportedArea.BIAS_METRICS).status;
-  const nimMetricsAreaAvailable = useIsAreaAvailable(SupportedArea.BIAS_METRICS).status;
+  const nimMetricsAreaAvailable = useIsAreaAvailable(SupportedArea.NIM_MODEL).status;
I think you intended to check NIM_MODEL?
  if (performanceMetricsAreaAvailable) {
    enabledTabs.push(MetricsTabKeys.PERFORMANCE);
  }
  if (biasMetricsAreaAvailable) {
    enabledTabs.push(MetricsTabKeys.BIAS);
  }
  if (nimMetricsAreaAvailable) {
We changed the disableNIMModelServing flag back to true. In order to test the tab isolation, the flag needs to be changed to false.
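For anyone testing the tab isolation locally, the gating under discussion can be sketched roughly as below. This is a standalone illustration, not the dashboard's code: the `AreaAvailability` type, the `getEnabledTabs` helper, and the `NIM` tab key are hypothetical names, and the real hook reads its flags from `useIsAreaAvailable`.

```typescript
// Hypothetical stand-ins for the dashboard's tab keys; the NIM key is an assumption.
const MetricsTabKeys = {
  PERFORMANCE: 'performance',
  BIAS: 'bias',
  NIM: 'nim',
} as const;
type MetricsTabKey = (typeof MetricsTabKeys)[keyof typeof MetricsTabKeys];

// Availability flags as they might come back from the per-area checks (assumed shape).
type AreaAvailability = {
  performance: boolean;
  bias: boolean;
  nim: boolean;
};

// Build the tab list from independent flags so that one area never enables another's tab.
const getEnabledTabs = (areas: AreaAvailability): MetricsTabKey[] => {
  const enabledTabs: MetricsTabKey[] = [];
  if (areas.performance) {
    enabledTabs.push(MetricsTabKeys.PERFORMANCE);
  }
  if (areas.bias) {
    enabledTabs.push(MetricsTabKeys.BIAS);
  }
  if (areas.nim) {
    enabledTabs.push(MetricsTabKeys.NIM);
  }
  return enabledTabs;
};
```

With each flag read from its own area, turning off NIM should remove only the NIM tab, which is the behavior the flag change is meant to exercise.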
};

export const useFetchKserveKVCacheUsageData = (
  metricsDef: KserveMetricGraphDefinition,
-  metricsDef: KserveMetricGraphDefinition,
+  metricsDef: KserveMetricGraphDefinition | NimMetricGraphDefinition,
I don't think we need to duplicate files (frontend/src/api/prometheus/NimPerformanceMetrics.ts and frontend/src/api/prometheus/kservePerformanceMetrics.ts) with just a different metricsDef type. Can we remove one of them and union the metricsDef type to contain both?
@@ -189,7 +182,8 @@ const MetricsChart: React.FC<MetricsChartProps> = ({
       {hasSomeData ? (
         <Chart
           ariaTitle={title}
-          containerComponent={containerComponent}
+          // containerComponent={containerComponent}
nitpick: remove the old commented-out code
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
It appears there is another linter error: https://github.com/opendatahub-io/odh-dashboard/actions/runs/12769888698/job/35594011880?pr=3612
d36865b
to
859f097
Compare
  // check availability of NIM metrics
  const nimMetricsAreaAvailable = useIsAreaAvailable(SupportedArea.NIM_MODEL).status;
This shouldn't be needed because const isNIMAvailable = servingPlatformStatuses.kServeNIM.enabled; will already check it for you and return false if the area is not available.
We think it is needed, because that check only says whether NIM is available in the cluster, not whether it is available per project. We've verified it with @olavtar.
useIsAreaAvailable(SupportedArea.NIM_MODEL).status; is also only checking whether it's available in the cluster.
const isNIMAvailable = servingPlatformStatuses.kServeNIM.enabled; is the check for cluster-wide NIM enablement, and const isKServeNIMEnabled = project ? isProjectNIMSupported(project) : false is the one checking for NIM project enablement.
For its NIM checks, useServingPlatformStatuses() uses const { isNIMAvailable } = React.useContext(NIMAvailabilityContext); and that has
const isNIMModelServingAvailable = useIsAreaAvailable(SupportedArea.NIM_MODEL).status;
const fetchNIMAvailability = React.useCallback(async () => {
if (!isNIMModelServingAvailable) {
return false;
}
You are totally right!
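To summarize the thread, the two levels of checking can be sketched as follows. This is only an illustration of the reasoning above: the `Project` shape, the stand-in `isProjectNIMSupported`, and the `canShowNimMetrics` helper are hypothetical, and combining the two flags with `&&` is an assumption about how the result would be used.

```typescript
// Hypothetical project shape; the real check inspects the project resource itself.
type Project = { name: string; nimEnabled: boolean };

// Stand-in for the dashboard's isProjectNIMSupported (assumed behavior).
const isProjectNIMSupported = (project: Project): boolean => project.nimEnabled;

const canShowNimMetrics = (
  isNIMAvailable: boolean, // cluster-wide: already false when the NIM area is unavailable
  project: Project | undefined,
): boolean => {
  // Project-level: false when the project is unknown or not NIM-enabled.
  const isKServeNIMEnabled = project ? isProjectNIMSupported(project) : false;
  return isNIMAvailable && isKServeNIMEnabled;
};
```

Since the cluster-wide flag already folds in the area check, a separate `useIsAreaAvailable(SupportedArea.NIM_MODEL)` call adds no information on top of these two values.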
  const servingPlatformStatuses = useServingPlatformStatuses();
  const isNIMAvailable = servingPlatformStatuses.kServeNIM.enabled;
  const { projects } = React.useContext(ProjectsContext);
  const project = projects.find(byName(model.metadata.namespace)) ?? null;
-  const project = projects.find(byName(model.metadata.namespace)) ?? null;
+  const project = projects.find(byName(model.metadata.namespace));
Small nitpick, but ?? null is not needed.
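For context on the nitpick: `Array.prototype.find` returns `undefined` when nothing matches, and a truthiness check treats `undefined` and `null` the same way. A minimal sketch, where the `Project` shape, the stand-in `byName`, and `describeProject` are hypothetical:

```typescript
// Hypothetical, minimal project shape for illustration.
type Project = { metadata: { name: string } };

// Stand-in for the dashboard's byName helper (assumed behavior).
const byName =
  (name: string) =>
  (project: Project): boolean =>
    project.metadata.name === name;

const projects: Project[] = [{ metadata: { name: 'demo' } }];

// find() yields Project | undefined; no `?? null` is needed before a truthiness check.
const found = projects.find(byName('demo'));
const missing = projects.find(byName('absent'));

const describeProject = (project: Project | undefined): string =>
  project ? project.metadata.name : 'no project';
```

The conversion to `null` would only matter if a downstream signature specifically required `Project | null`.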
050e6c9
to
d76e983
Compare
Signed-off-by: LinoyBitan1 <lbitan@redhat.com>
d76e983
to
bf2f0fd
Compare
Description
JIRA - https://issues.redhat.com/browse/NVPE-31
Implemented visualizations for the NIM metrics. The following graphs were added to the NIM tab:
1 - Graph for KV Cache usage over time
2 - Line graph for Running, Waiting and Max Request Count
3 - Line graph with Total Prompt Token Count and Total Generation Token Count
4 - Area chart with Time to First Token
5 - Area chart with Time per Output Token
6 - Donut chart with Success Request and Failed Requests
odh-model-controller creates a ConfigMap next to the InferenceService, called -metrics-dashboard, with data indicating whether metrics are available for the runtime. If they are, the ConfigMap also contains a JSON object specifying the queries required for the graphs listed above.
odh-dashboard fetches this ConfigMap by name and, if metrics are available for the runtime, parses the JSON object and creates the required graphs.
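The parsing step described above might look roughly like the sketch below. The key names (`supported`, `metrics`), the `config` wrapper in the JSON, and the `parseMetricsConfigMap` helper are all assumptions made for illustration; the description does not spell out the ConfigMap layout.

```typescript
// Assumed ConfigMap layout: data.supported is 'true' or 'false' and data.metrics holds JSON.
type ConfigMapLike = { data?: Record<string, string> };

type GraphQuery = { title: string; query: string };
type GraphDefinition = { title: string; type: string; queries: GraphQuery[] };

type ParsedMetrics = { supported: boolean; graphs: GraphDefinition[] };

const parseMetricsConfigMap = (configMap: ConfigMapLike): ParsedMetrics => {
  const supported = configMap.data?.supported === 'true';
  const raw = configMap.data?.metrics;
  if (!supported || !raw) {
    // Metrics are not available for this runtime, so there is nothing to draw.
    return { supported, graphs: [] };
  }
  try {
    const parsed = JSON.parse(raw) as { config?: GraphDefinition[] };
    return { supported, graphs: Array.isArray(parsed.config) ? parsed.config : [] };
  } catch {
    // Malformed JSON (or a null payload): fall back to no graphs instead of crashing the tab.
    return { supported, graphs: [] };
  }
};
```

Returning an empty graph list on any parse failure keeps the tab renderable even if the controller writes an unexpected payload.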
NIM Metrics tab-
NIM Metrics graphs-
How Has This Been Tested?
Use the following doc -
NIM Metrics_ frontend.pdf
Test Impact
Request review criteria:
Self checklist (all need to be checked):
If you have UI changes:
After the PR is posted & before it merges: