Add base test for vLLM and its metrics #1438

Merged: 15 commits, May 20, 2024
Changes from 13 commits
5 changes: 3 additions & 2 deletions ods_ci/tests/Resources/CLI/ModelServing/llm.resource
@@ -787,8 +787,9 @@ Get KServe Default Deployment Mode From DSC
     RETURN    ${mode}
 
 Start Port-forwarding
-    [Arguments]    ${namespace}    ${pod_name}    ${process_alias}=llm-query-process
-    ${process}=    Start Process    oc -n ${namespace} port-forward pod/${pod_name} 8033:8033
+    [Arguments]    ${namespace}    ${pod_name}    ${process_alias}=llm-query-process    ${local_port}=8033
+    ...    ${remote_port}=8033
+    ${process}=    Start Process    oc -n ${namespace} port-forward pod/${pod_name} ${local_port}:${remote_port}
     ...    alias=${process_alias}    stderr=STDOUT    shell=yes
     Process Should Be Running    ${process}
     sleep    7s
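For reference, the updated keyword effectively shells out to a command like the one below; the namespace, pod name, and local port are illustrative placeholders, not values from this PR. Keeping ${remote_port} at its default of 8033 preserves the old behavior while letting callers pick a free local port:

  oc -n my-namespace port-forward pod/my-llm-pod 9000:8033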
10 changes: 10 additions & 0 deletions ods_ci/tests/Resources/Files/llm/model_expected_responses.json
@@ -146,6 +146,16 @@
                 }
             }
         }
+    },
+    {
+        "query_text": "{'role': 'system','content': 'You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.'},{'role': 'user','content': 'Compose a poem that explains the concept of recursion in programming.'}",
+        "models": {
+            "gpt2": {
+                "vllm": {
+                    "chat-completions_response_text": "A friend of mine came over to the house to play with his wife. He was asleep, and he felt like he'd been hit by a speeding car. He's a big guy. He's like the kind of guy who may not have a very good head, but he's big enough to stand up at a table and read something. I was like, \"I'm going to play with this.\"\n\nThat's where I started playing with my car. It was something I never dreamed of doing, but I'd never imagined that it would be such a big deal.\n\nWe started playing with it. When I was about 12, we started playing with it to see how it would turn out. I was 26, and I was playing it for the first time for the first time ever. It was fun. I remember thinking it was like a different game than I had ever played before. I remember thinking the first time we played would be like, \"Oh my god, I've never played a game like this before before.\"\n\nIt was surreal. I was in my 20s at the time. We got to have a party in my house at the time. I was sitting in the living room with my friend, who's 28. We're from Dallas, and his wife is a pretty big girl. He's about 6 feet tall and 250 pounds. On the phone with his friend said, \"Dad, is it possible you'll be able to do this without your beard?\" I was like, \"Absolutely, actually.\" I thought, \"I'm going to do it.\"\n\nI finally did it and it turned out pretty well. I was able to take our photo with our friend, and he got excited and started laughing. He was like, \"That's awesome.\" I sat in his living room for two hours and made sure he was really excited. He was really excited. We ended up having a workshop and we have a lot of stuff to do.\n\nHe just started playing. It's been amazing. I'm like, \"It's going to be huge.\" At first I was like, \"Wow, my god that's amazing.\" I was like, \"Wow, my God that's awesome.\" He's like, \"I'm so excited about this!\" He was like, \"Oh my god, I can't wait to do it!\"\n\nHe had that awesome physique. He was super skinny. He was like, \"I'm so excited about it.\" He was like, \"Really?\" I was like, \"Yeah, I'm so excited! I'm so excited.\" We did it for two weeks and it turned out pretty well.\n\nHe's like, \"I hope it stays that way.\" I was like, \"I hope it stays that way.\" He was like, \"Oh my god, I've never even played with a computer before!\" I was like, \"Yeah, it's just fun to play with a computer.\" He was like, \"Oh my god, I can't wait to play with a computer!\" He was like, \"It's just a cool thing to do!\"\n\nI was doing it with my friend's dog, a puppy.\n\nI was doing it with my friend's dog. People said, \"You think that's cool?\" I said, \"Yeah, that's cool.\" We had the dog. He was a little bit shy and it was a little bit intimidating and scary.\n\nWe played it twice. It was like a game. He was like, \"Oh my God I've never played with a computer before!\" I was like, \"I hope it stays that way.\" He was like, \"Yeah, it's just a cool thing to do!\" He was like, \"Oh my god, I can't wait to do it!\"\n\nWe played it again on the bus, on the weekend.\n\nWe played it again on the weekend.\n\nThen we went to the store and bought a new Canon 5D Mark II.\n\nI couldn't believe what the customer was saying. I was like, \"That sounds amazing!\" He was like, \"That's amazing!\"\n\nHe was like, \"Wow! That's awesome!\" So we were like, \"Wow! That looks awesome!\" He's like, \"Yeah, that looks awesome!\" I was like, \"Wow! That looks awesome! That looks awesome!\"\n\nWe played it twice again.\n\nI was like, \"Wow! That sounds awesome!\" He was like, \"Wow! That sounds awesome! That sounds awesome!\" I was like, \"Wow! That looks awesome!\"\n\nHe was like, \"Wow! That sounds awesome! That looks awesome!\"\n\nI was just like, \"Wow! That looks awesome! That looks awesome!\" He was like"
+                }
+            }
+        }
     }
 ],
 "model-info": {
17 changes: 17 additions & 0 deletions ods_ci/tests/Resources/Files/llm/runtime_query_formats.json
@@ -96,5 +96,22 @@
                 }
             }
         "containers": ["kserve-container"]
+    },
+    "vllm": {
+        "endpoints": {
+            "chat-completions": {
+                "http": {
+                    "endpoint": "v1/chat/completions",
+                    "header": "Content-Type:application/json",
+                    "body": "{'model': '${model_name}','messages': [${query_text}]}",
+                    "response_fields_map": {
+                        "response": "choices",
+                        "completion_tokens": "completion_tokens",
+                        "response_text": "content"
+                    }
+                }
+            }
+        },
+        "containers": ["kserve-container"]
     }
 }
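For illustration, once ${model_name} and ${query_text} are substituted, the vllm chat-completions entry resolves to a request along these lines; the host is a placeholder, and the payload is written as strict JSON rather than the template's single-quote form:

  curl -ks https://<inference-host>/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "gpt2", "messages": [{"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}]}'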
74 changes: 74 additions & 0 deletions ods_ci/tests/Resources/Files/llm/vllm/download_model.yaml
@@ -0,0 +1,74 @@
apiVersion: v1
kind: Namespace
metadata:
  name: vllm-gpt2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vlmm-gpt2-claim
  namespace: vllm-gpt2
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: setup-gpt2-binary
  namespace: vllm-gpt2
  labels:
    gpt-download-pod: 'true'
spec:
  # SonarCloud: Service account tokens should not be mounted in pods;
  # set automountServiceAccountToken to false for this Pod spec.
  volumes:
    - name: model-volume
      persistentVolumeClaim:
        claimName: vlmm-gpt2-claim
  restartPolicy: Never
  initContainers:
    - name: fix-volume-permissions
      image: quay.io/quay/busybox:latest
      imagePullPolicy: IfNotPresent
      securityContext:
        allowPrivilegeEscalation: true
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
          nvidia.com/gpu: "1"
        limits:
          memory: "128Mi"
          cpu: "500m"
          nvidia.com/gpu: "1"
      command: ["sh"]
      args: ["-c", "chown -R 1001:1001 /mnt/models"]
      volumeMounts:
        - mountPath: "/mnt/models/"
          name: model-volume
  containers:
    - name: download-model
      # SonarCloud: Storage limits should be enforced; specify a storage limit for this container.
      image: registry.access.redhat.com/ubi9/python-311:latest
      imagePullPolicy: IfNotPresent
      securityContext:
        allowPrivilegeEscalation: true
      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
          nvidia.com/gpu: "1"
        limits:
          memory: "1Gi"
          cpu: "1"
          nvidia.com/gpu: "1"
      command: ["sh"]
      args: ["-c", "pip install --upgrade pip && pip install --upgrade huggingface_hub && python3 -c 'from huggingface_hub import snapshot_download\nsnapshot_download(repo_id=\"gpt2\", local_dir=\"/mnt/models/gpt2\", local_dir_use_symlinks=False)'"]
      volumeMounts:
        - mountPath: "/mnt/models/"
          name: model-volume
      env:
        - name: TRANSFORMERS_CACHE
          value: /tmp
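A minimal sketch of how this manifest might be exercised by hand, assuming a cluster with a GPU node and an oc session already logged in; the wait timeout is an arbitrary choice, not part of the PR:

  oc apply -f ods_ci/tests/Resources/Files/llm/vllm/download_model.yaml
  oc wait pod/setup-gpt2-binary -n vllm-gpt2 --for=jsonpath='{.status.phase}'=Succeeded --timeout=15m
  oc logs pod/setup-gpt2-binary -n vllm-gpt2 -c download-model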
14 changes: 14 additions & 0 deletions ods_ci/tests/Resources/Files/llm/vllm/vllm-gpt2_inferenceservice.yaml
@@ -0,0 +1,14 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: vllm-gpt2-openai
  namespace: vllm-gpt2
  labels:
    modelmesh-enabled: "true"
spec:
  predictor:
    model:
      runtime: kserve-vllm
      modelFormat:
        name: vLLM
      storageUri: pvc://vlmm-gpt2-claim/
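As a rough manual check, assuming the ServingRuntime below is already applied, readiness can be confirmed through the standard KServe Ready condition; the timeout here is arbitrary:

  oc apply -f ods_ci/tests/Resources/Files/llm/vllm/vllm-gpt2_inferenceservice.yaml
  oc wait inferenceservice/vllm-gpt2-openai -n vllm-gpt2 --for=condition=Ready --timeout=10m
  oc get inferenceservice vllm-gpt2-openai -n vllm-gpt2 -o jsonpath='{.status.url}'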
77 changes: 77 additions & 0 deletions ods_ci/tests/Resources/Files/llm/vllm/vllm_servingruntime.yaml
@@ -0,0 +1,77 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: kserve-vllm
  namespace: vllm-gpt2
spec:
  annotations:
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
    serving.knative.openshift.io/enablePassthrough: "true"
    opendatahub.io/dashboard: "true"
    openshift.io/display-name: "vLLM OpenAI entry point"
    prometheus.io/port: '8080'
    prometheus.io/path: "/metrics/"
  multiModel: false
  supportedModelFormats:
    - name: vLLM
      autoSelect: true
  containers:
    - name: kserve-container
      # image: kserve/vllmserver:latest
      image: quay.io/wxpe/tgis-vllm:release.74803b6
      startupProbe:
        httpGet:
          port: 8080
          path: /health
        # Allow 12 minutes to start
        failureThreshold: 24
        periodSeconds: 30
      readinessProbe:
        httpGet:
          port: 8080
          path: /health
        periodSeconds: 30
        timeoutSeconds: 5
      livenessProbe:
        httpGet:
          port: 8080
          path: /health
        periodSeconds: 100
        timeoutSeconds: 8
      terminationMessagePolicy: "FallbackToLogsOnError"
      terminationGracePeriodSeconds: 120
      args:
        - --port
        - "8080"
        - --model
        - /mnt/models/gpt2
        - --served-model-name
        - "gpt2"
      command:
        - python3
        - -m
        - vllm.entrypoints.openai.api_server
      env:
        - name: STORAGE_URI
          value: pvc://vlmm-gpt2-claim/
        - name: HF_HUB_CACHE
          value: /tmp
        - name: TRANSFORMERS_CACHE
          value: $(HF_HUB_CACHE)
        - name: NUM_GPUS
          value: "1"
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
      ports:
        - containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: "4"
          memory: 8Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "1"
          memory: 4Gi
          nvidia.com/gpu: "1"
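Since the runtime serves both the OpenAI API and Prometheus metrics on port 8080, a quick local smoke test could look like the following; the pod name is a placeholder for whichever predictor pod the InferenceService creates:

  oc -n vllm-gpt2 port-forward pod/<predictor-pod> 8080:8080 &
  curl -s localhost:8080/health
  curl -s localhost:8080/metrics/ | grep '^vllm:' | head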
17 changes: 17 additions & 0 deletions ods_ci/tests/Resources/OCP.resource
@@ -258,3 +258,20 @@ Check If Pod Does Not Exist
     ${rc}    ${output}=    Run And Return Rc And Output
     ...    oc get pod -l ${label_selector} -n ${namespace}
     Should Be Equal    "${rc}"    "1"    msg=${output}
+
+Set Default Storage Class In GCP
+    [Documentation]    If the storage class exists we can assume we are in GCP. We force ssd-csi to be the default class
+    ...    for the duration of this test suite.
+    [Arguments]    ${default}
+    ${rc}=    Run And Return Rc    oc get storageclass ${default}
+    IF    ${rc} == ${0}
+        IF    "${default}" == "ssd-csi"
+            Run    oc patch storageclass standard-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'    #robocop: disable
+            Run    oc patch storageclass ssd-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'    #robocop: disable
+        ELSE
+            Run    oc patch storageclass ssd-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'    #robocop: disable
+            Run    oc patch storageclass standard-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'    #robocop: disable
+        END
+    ELSE
+        Log    Proceeding with default storage class because we're not in GCP
+    END
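The patched annotation can be verified afterwards with a plain oc query; this check is illustrative and not part of the keyword:

  oc get storageclass -o custom-columns='NAME:.metadata.name,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class'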
10 changes: 7 additions & 3 deletions ods_ci/tests/Resources/Page/ODH/Monitoring/Monitoring.resource
@@ -176,9 +176,13 @@ Metrics Should Exist In UserWorkloadMonitoring
         Log    ${index}: ${metric_search_text}
         ${metrics_names}=    Get Thanos Metrics List    thanos_url=${thanos_url}    thanos_token=${thanos_token}
         ...    search_text=${metric_search_text}
-        Should Not Be Empty    ${metrics_names}
-        ${metrics_names}=    Split To Lines    ${metrics_names}
-        Append To List    ${metrics}    @{metrics_names}
+        ${found}=    Run Keyword And Return Status    Should Not Be Empty    ${metrics_names}
+        IF    ${found}
+            ${metrics_names}=    Split To Lines    ${metrics_names}
+            Append To List    ${metrics}    @{metrics_names}
+        ELSE
+            Run Keyword And Continue On Failure    Fail    msg=${metric_search_text} not found
+        END
     END
     RETURN    ${metrics}

@@ -19,6 +19,8 @@
 ${KSERVE_MODE}=    RawDeployment
 ${MODEL_FORMAT}=    pytorch    #vLLM
 ${PROTOCOL}=    grpc    #http
 ${OVERLAY}=    vllm
+
+
 *** Test Cases ***
 Verify User Can Serve And Query A bigscience/mt0-xxl Model
     [Documentation]    Basic tests for preparing, deploying and querying a LLM model
@@ -454,23 +456,6 @@ Suite Teardown
     Set Default Storage Class In GCP    default=standard-csi
     RHOSi Teardown
 
-Set Default Storage Class In GCP
-    [Documentation]    If the storage class exists we can assume we are in GCP. We force ssd-csi to be the default class
-    ...    for the duration of this test suite.
-    [Arguments]    ${default}
-    ${rc}=    Run And Return Rc    oc get storageclass ${default}
-    IF    ${rc} == ${0}
-        IF    "${default}" == "ssd-csi"
-            Run    oc patch storageclass standard-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'    #robocop: disable
-            Run    oc patch storageclass ssd-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'    #robocop: disable
-        ELSE
-            Run    oc patch storageclass ssd-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'    #robocop: disable
-            Run    oc patch storageclass standard-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'    #robocop: disable
-        END
-    ELSE
-        Log    Proceeding with default storage class because we're not in GCP
-    END
-
 Setup Test Variables
     [Arguments]    ${model_name}    ${kserve_mode}=Serverless    ${use_pvc}=${FALSE}    ${use_gpu}=${FALSE}
     ...    ${model_path}=${model_name}
@@ -0,0 +1,102 @@
*** Settings ***
Documentation    Basic vLLM deploy test to validate metrics being correctly exposed in OpenShift
Resource    ../../../../../Resources/Page/ODH/ODHDashboard/ODHModelServing.resource
Resource    ../../../../../Resources/OCP.resource
Resource    ../../../../../Resources/Page/Operators/ISVs.resource
Resource    ../../../../../Resources/Page/ODH/ODHDashboard/ODHDashboardAPI.resource
Resource    ../../../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/ModelServer.resource
Resource    ../../../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/DataConnections.resource
Resource    ../../../../../Resources/CLI/ModelServing/llm.resource
Resource    ../../../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/Permissions.resource
Library    OpenShiftLibrary
Suite Setup    Suite Setup
Suite Teardown    Suite Teardown
Test Tags    KServe


*** Variables ***
${VLLM_RESOURCES_DIRPATH}=    ods_ci/tests/Resources/Files/llm/vllm
${DL_POD_FILEPATH}=    ${VLLM_RESOURCES_DIRPATH}/download_model.yaml
${SR_FILEPATH}=    ${VLLM_RESOURCES_DIRPATH}/vllm_servingruntime.yaml
${IS_FILEPATH}=    ${VLLM_RESOURCES_DIRPATH}/vllm-gpt2_inferenceservice.yaml
${TEST_NS}=    vllm-gpt2
@{SEARCH_METRICS}=    vllm:cache_config_info
...    vllm:num_requests_running
...    vllm:num_requests_swapped
...    vllm:num_requests_waiting
...    vllm:gpu_cache_usage_perc
...    vllm:cpu_cache_usage_perc
...    vllm:prompt_tokens_total
...    vllm:generation_tokens_total
...    vllm:time_to_first_token_seconds_bucket
...    vllm:time_to_first_token_seconds_count
...    vllm:time_to_first_token_seconds_sum
...    vllm:time_per_output_token_seconds_bucket
...    vllm:time_per_output_token_seconds_count
...    vllm:time_per_output_token_seconds_sum
...    vllm:e2e_request_latency_seconds_bucket
...    vllm:e2e_request_latency_seconds_count
...    vllm:e2e_request_latency_seconds_sum
...    vllm:avg_prompt_throughput_toks_per_s
...    vllm:avg_generation_throughput_toks_per_s


*** Test Cases ***
Verify User Can Deploy A Model With Vllm Via CLI
    [Documentation]    Deploy a model (gpt2) using the vllm runtime and confirm that it's running
    [Tags]    Tier1    Sanity    Resources-GPU    RHOAIENG-6264
    ${rc}    ${out}=    Run And Return Rc And Output    oc apply -f ${DL_POD_FILEPATH}
    Should Be Equal As Integers    ${rc}    ${0}
    Wait For Pods To Succeed    label_selector=gpt-download-pod=true    namespace=${TEST_NS}
    ${rc}    ${out}=    Run And Return Rc And Output    oc apply -f ${SR_FILEPATH}
    Should Be Equal As Integers    ${rc}    ${0}
    # TODO: Switch to common keyword for model DL and SR deploy
    # Set Project And Runtime    runtime=vllm    namespace=${TEST_NS}
    # ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=gpt2
    # ...    storage_size=10Gi
    Deploy Model Via CLI    ${IS_FILEPATH}    ${TEST_NS}
    Wait For Pods To Be Ready    label_selector=serving.kserve.io/inferenceservice=vllm-gpt2-openai
    ...    namespace=${TEST_NS}
    Query Model Multiple Times    model_name=gpt2    isvc_name=vllm-gpt2-openai    runtime=vllm    protocol=http
    ...    inference_type=chat-completions    n_times=3    query_idx=8
    ...    namespace=${TEST_NS}    string_check_only=${TRUE}

Verify Vllm Metrics Are Present
    [Documentation]    Confirm vLLM metrics are exposed in OpenShift metrics
    [Tags]    Tier1    Sanity    Resources-GPU    RHOAIENG-6264
    ...    Depends On Test    Verify User Can Deploy A Model With Vllm Via CLI
    ${host}=    llm.Get KServe Inference Host Via CLI    isvc_name=vllm-gpt2-openai    namespace=${TEST_NS}
    ${rc}    ${out}=    Run And Return Rc And Output
    ...    curl -ks ${host}/metrics/
    Should Be Equal As Integers    ${rc}    ${0}
    Log    ${out}
    ${thanos_url}=    Get OpenShift Thanos URL
    ${token}=    Generate Thanos Token
    Metrics Should Exist In UserWorkloadMonitoring    ${thanos_url}    ${token}    ${SEARCH_METRICS}


*** Keywords ***
Suite Setup
    Skip If Component Is Not Enabled    kserve
    RHOSi Setup
    Set Default Storage Class In GCP    default=ssd-csi
    ${is_self_managed}=    Is RHODS Self-Managed
    IF    ${is_self_managed}
        Configure User Workload Monitoring
        Enable User Workload Monitoring
        # TODO: Find reliable signal for UWM being ready
        # Sleep    10m
    END
    Load Expected Responses

Suite Teardown
    [Documentation]    Delete the vLLM test resources and restore the default storage class
    Set Default Storage Class In GCP    default=standard-csi
    ${rc}=    Run And Return Rc    oc delete inferenceservice -n ${TEST_NS} --all
    Should Be Equal As Integers    ${rc}    ${0}
    ${rc}=    Run And Return Rc    oc delete servingruntime -n ${TEST_NS} --all
    Should Be Equal As Integers    ${rc}    ${0}
    ${rc}=    Run And Return Rc    oc delete pod -n ${TEST_NS} --all
    Should Be Equal As Integers    ${rc}    ${0}
    ${rc}=    Run And Return Rc    oc delete namespace ${TEST_NS}
    Should Be Equal As Integers    ${rc}    ${0}
    RHOSi Teardown
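To mirror what Metrics Should Exist In UserWorkloadMonitoring does under the hood, a single vLLM metric can be queried straight from the Thanos querier; the host and token retrieval below are standard OpenShift commands and the metric name is taken from @{SEARCH_METRICS}:

  TOKEN=$(oc whoami -t)
  HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
  curl -ks -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query?query=vllm:num_requests_running"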