Migrating Python model REST protocol test on Triton for KServe (UI -> API) #2133
Changes from 5 commits
063cfe2
855aa8b
8dc1f34
02d28da
569ee66
b5b250f
@@ -0,0 +1,55 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-kserve-runtime
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8002"
  containers:
  - args:
    - tritonserver
    - --model-store=/mnt/models
    - --grpc-port=9000
    - --http-port=8080
    - --allow-grpc=true
    - --allow-http=true
    image: nvcr.io/nvidia/tritonserver:23.05-py3
    name: kserve-container
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: "1"
        memory: 2Gi
    ports:
    - containerPort: 8080
      protocol: TCP
  protocolVersions:
  - v2
  - grpc-v2
  supportedModelFormats:
  - autoSelect: true
    name: tensorrt
    version: "8"
  - autoSelect: true
    name: tensorflow
    version: "1"
  - autoSelect: true
    name: tensorflow
    version: "2"
  - autoSelect: true
    name: onnx
    version: "1"
  - name: pytorch
    version: "1"
  - autoSelect: true
    name: triton
    version: "2"
  - autoSelect: true
    name: xgboost
    version: "1"
  - autoSelect: true
    name: python
    version: "1"
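KServe reads the `supportedModelFormats` list above when an InferenceService names a model format but no runtime. An entry with `autoSelect: true` makes this runtime a candidate for that format. The sketch below is a simplified, hypothetical model of that lookup: it matches on the format name and `autoSelect`, and compares the version as a plain string when one is given. It is not KServe's controller logic, which considers additional fields. The test in this PR sets `serving_runtime` explicitly, so it does not rely on auto-selection. The sketch mainly shows why `pytorch`, which has no `autoSelect`, would never be picked automatically.

```python
from typing import Optional

# Copy of spec.supportedModelFormats from triton-kserve-runtime above.
SUPPORTED_MODEL_FORMATS = [
    {"autoSelect": True, "name": "tensorrt", "version": "8"},
    {"autoSelect": True, "name": "tensorflow", "version": "1"},
    {"autoSelect": True, "name": "tensorflow", "version": "2"},
    {"autoSelect": True, "name": "onnx", "version": "1"},
    {"name": "pytorch", "version": "1"},  # no autoSelect: never picked automatically
    {"autoSelect": True, "name": "triton", "version": "2"},
    {"autoSelect": True, "name": "xgboost", "version": "1"},
    {"autoSelect": True, "name": "python", "version": "1"},
]


def can_auto_select(model_format: str, version: Optional[str] = None) -> bool:
    """Hypothetical helper: does any autoSelect entry match the requested format?

    Simplified for illustration; KServe's real selection considers more fields.
    """
    for entry in SUPPORTED_MODEL_FORMATS:
        if not entry.get("autoSelect", False):
            continue  # entries without autoSelect need an explicit runtime
        if entry["name"] != model_format:
            continue
        if version is None or entry.get("version") == version:
            return True
    return False


if __name__ == "__main__":
    print(can_auto_select("python", "1"))      # True
    print(can_auto_select("pytorch", "1"))     # False: entry has no autoSelect
    print(can_auto_select("tensorflow", "2"))  # True
```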
@@ -0,0 +1,90 @@
*** Settings ***
Documentation       Suite of test cases for Triton in Kserve
Library             OperatingSystem
Library             ../../../../libs/Helpers.py
Resource            ../../../Resources/Page/ODH/JupyterHub/HighAvailability.robot
Resource            ../../../Resources/Page/ODH/ODHDashboard/ODHModelServing.resource
Resource            ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/Projects.resource
Resource            ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/DataConnections.resource
Resource            ../../../Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/ModelServer.resource
Resource            ../../../Resources/Page/ODH/ODHDashboard/ODHDashboardSettingsRuntimes.resource
Resource            ../../../Resources/Page/ODH/Monitoring/Monitoring.resource
Resource            ../../../Resources/OCP.resource
Resource            ../../../Resources/CLI/ModelServing/modelmesh.resource
Resource            ../../../Resources/Common.robot
Resource            ../../../Resources/CLI/ModelServing/llm.resource
Suite Setup         Suite Setup
Suite Teardown      Suite Teardown
Test Tags           Kserve

*** Variables ***
${PYTHON_MODEL_NAME}=    python
${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}=    {"model_name":"python","model_version":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":[4],"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},{"name":"OUTPUT1","datatype":"FP32","shape":[4],"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}
${INFERENCE_REST_INPUT_PYTHON}=    @tests/Resources/Files/triton/kserve-triton-python-rest-input.json
${KSERVE_MODE}=    Serverless    # Serverless
${PROTOCOL}=    http
${TEST_NS}=    tritonmodel
${DOWNLOAD_IN_PVC}=    ${FALSE}
${MODELS_BUCKET}=    ${S3.BUCKET_1}
${LLM_RESOURCES_DIRPATH}=    tests/Resources/Files/llm
${INFERENCESERVICE_FILEPATH}=    ${LLM_RESOURCES_DIRPATH}/serving_runtimes/base/isvc.yaml
${INFERENCESERVICE_FILEPATH_NEW}=    ${LLM_RESOURCES_DIRPATH}/serving_runtimes/isvc
${INFERENCESERVICE_FILLED_FILEPATH}=    ${INFERENCESERVICE_FILEPATH_NEW}/isvc_filled.yaml
${KSERVE_RUNTIME_REST_NAME}=    triton-kserve-runtime


*** Test Cases ***
Test Python Model Rest Inference Via API (Triton on Kserve)    # robocop: off=too-long-test-case
    [Documentation]    Test the deployment of python model in Kserve using Triton
    [Tags]    Tier2    RHOAIENG-16912
    Setup Test Variables    model_name=${PYTHON_MODEL_NAME}    use_pvc=${FALSE}    use_gpu=${FALSE}
    ...    kserve_mode=${KSERVE_MODE}    model_path=triton/model_repository/
    Set Project And Runtime    runtime=${KSERVE_RUNTIME_REST_NAME}    protocol=${PROTOCOL}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${PYTHON_MODEL_NAME}
    ...    storage_size=100Mi    memory_request=100Mi
    ${requests}=    Create Dictionary    memory=1Gi
    Compile Inference Service YAML    isvc_name=${PYTHON_MODEL_NAME}
    ...    sa_name=models-bucket-sa
    ...    model_storage_uri=${storage_uri}
    ...    model_format=python    serving_runtime=${KSERVE_RUNTIME_REST_NAME}
    ...    version="1"
    ...    limits_dict=${limits}    requests_dict=${requests}    kserve_mode=${KSERVE_MODE}
    Deploy Model Via CLI    isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
    ...    namespace=${test_namespace}
    # File is not needed anymore after applying
    Remove File    ${INFERENCESERVICE_FILLED_FILEPATH}
    Wait For Pods To Be Ready    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ...    namespace=${test_namespace}
Comment on lines +40 to +57
This block is repeated in multiple tests; create a test template and re-use it.
I will discuss with @tarukumar and create a template.
    ${pod_name}=    Get Pod Name    namespace=${test_namespace}
    ...    label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
    ${service_port}=    Extract Service Port    service_name=${PYTHON_MODEL_NAME}-predictor    protocol=TCP
    ...    namespace=${test_namespace}
    IF    "${KSERVE_MODE}"=="RawDeployment"
        Start Port-forwarding    namespace=${test_namespace}    pod_name=${pod_name}    local_port=${service_port}
        ...    remote_port=${service_port}    process_alias=triton-process
    END
    Verify Model Inference With Retries    model_name=${PYTHON_MODEL_NAME}    inference_input=${INFERENCE_REST_INPUT_PYTHON}
    ...    expected_inference_output=${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}    project_title=${test_namespace}
    ...    deployment_mode=Cli    kserve_mode=${KSERVE_MODE}    service_port=${service_port}
    ...    end_point=/v2/models/${model_name}/infer    retries=3
    [Teardown]    Run Keywords
    ...    Clean Up Test Project    test_ns=${test_namespace}
    ...    isvc_names=${models_names}    wait_prj_deletion=${FALSE}    kserve_mode=${KSERVE_MODE}
    ...    AND
    ...    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"    Terminate Process    triton-process    kill=true


*** Keywords ***
Suite Setup
    [Documentation]    Suite setup keyword
    Set Library Search Order    SeleniumLibrary
    Skip If Component Is Not Enabled    kserve
    RHOSi Setup
    Load Expected Responses
    Set Default Storage Class In GCP    default=ssd-csi

Suite Teardown
    [Documentation]    Suite teardown keyword
    Set Default Storage Class In GCP    default=standard-csi
    RHOSi Teardown
The keyword already exists in
ods-ci/ods_ci/tests/Tests/1000__model_serving/1007__model_serving_llm/1007__model_serving_llm_models.robot
Line 1305 in 1157066
@rnetser This is out of scope for this PR; it will be checked and updated in upcoming PRs.
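The test passes `${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}` to `Verify Model Inference With Retries`. That keyword's comparison logic is not part of this diff. Whatever it does internally, FP32 outputs are often safer to compare with a tolerance than as an exact JSON string. The sketch below is a hypothetical Python helper, not part of ods-ci. The name `outputs_match` and its `rel_tol`/`abs_tol` defaults are assumptions. It compares `model_name` and `model_version`, then matches each output by `name` and checks `datatype`, `shape`, and element-wise `data` with `math.isclose`.

```python
import json
import math

# Expected response copied from ${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}.
EXPECTED = (
    '{"model_name":"python","model_version":"1","outputs":['
    '{"name":"OUTPUT0","datatype":"FP32","shape":[4],'
    '"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},'
    '{"name":"OUTPUT1","datatype":"FP32","shape":[4],'
    '"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}'
)


def outputs_match(actual_json: str, expected_json: str,
                  rel_tol: float = 1e-6, abs_tol: float = 1e-6) -> bool:
    """Hypothetical helper: compare two V2 inference responses with a float tolerance."""
    actual = json.loads(actual_json)
    expected = json.loads(expected_json)
    for key in ("model_name", "model_version"):
        if actual.get(key) != expected.get(key):
            return False
    # Index actual outputs by tensor name so their order does not matter.
    actual_outputs = {o["name"]: o for o in actual.get("outputs", [])}
    expected_outputs = expected.get("outputs", [])
    if len(actual_outputs) != len(expected_outputs):
        return False
    for exp in expected_outputs:
        act = actual_outputs.get(exp["name"])
        if act is None:
            return False
        if act["datatype"] != exp["datatype"] or act["shape"] != exp["shape"]:
            return False
        if len(act["data"]) != len(exp["data"]):
            return False
        for a, e in zip(act["data"], exp["data"]):
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


if __name__ == "__main__":
    # A response that differs only in the last float digits still matches.
    perturbed = json.loads(EXPECTED)
    perturbed["outputs"][0]["data"][0] += 1e-9
    print(outputs_match(json.dumps(perturbed), EXPECTED))  # True
```

Keying outputs by name rather than position keeps the check independent of the order in which the server lists its tensors.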