Merge branch 'master' into pytorch_api_rest
rpancham authored Jan 15, 2025
2 parents c6eaff6 + dc83836 commit 7d1d2b3
Showing 13 changed files with 172 additions and 20 deletions.
5 changes: 1 addition & 4 deletions ods_ci/tests/Resources/CLI/ModelServing/llm.resource
@@ -30,7 +30,6 @@ ${SERVICEMESH_CR_NS}= istio-system
... triton-kserve-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/triton_servingruntime_{{protocol}}.yaml # robocop: disable
${DOWNLOAD_PVC_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc.yaml
${DOWNLOAD_PVC_FILLED_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_model_in_pvc_filled.yaml

${DOWNLOAD_PROMPTS_PVC_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_prompts_in_pvc.yaml
${DOWNLOAD_PROMPTS_PVC_FILLED_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/download_prompts_in_pvc_filled.yaml
${MATCHING_RATIO}= ${60}
@@ -141,7 +140,7 @@ Compile Inference Service YAML
[Arguments] ${isvc_name} ${model_storage_uri} ${model_format}=caikit ${serving_runtime}=caikit-tgis-runtime
... ${kserve_mode}=${NONE} ${sa_name}=${DEFAULT_BUCKET_SA_NAME} ${canaryTrafficPercent}=${EMPTY} ${min_replicas}=1
... ${scaleTarget}=1 ${scaleMetric}=concurrency ${auto_scale}=${NONE}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY} ${version}=${EMPTY}
... ${requests_dict}=&{EMPTY} ${limits_dict}=&{EMPTY} ${overlays}=${EMPTY} ${version}=${EMPTY}
IF '${auto_scale}' == '${NONE}'
${scaleTarget}= Set Variable ${EMPTY}
${scaleMetric}= Set Variable ${EMPTY}
@@ -199,7 +198,6 @@ Compile Inference Service YAML
Log message=Using defaultDeploymentMode set in the DSC: ${mode}
END


Model Response Should Match The Expectation
[Documentation] Checks that the actual model response matches the expected answer.
... The goals are:
Expand Down Expand Up @@ -952,7 +950,6 @@ Remove Model Mount Path From Runtime
... oc patch servingruntime ${runtime} -n ${namespace} --type='json' -p='[{"op": "remove", "path": "/spec/containers/0/args/1"}]'
Should Be Equal As Integers ${rc} ${0} msg=${out}


Set Runtime Image
[Documentation] Sets up runtime variables for the Suite
[Arguments] ${gpu_type}
4 changes: 2 additions & 2 deletions ods_ci/tests/Resources/Common.robot
@@ -121,9 +121,9 @@ Get All Text Under Element
${elements}= Get WebElements ${parent_element}
${text_list}= Create List
FOR ${element} IN @{elements}
${text}= Run Keyword And Ignore Error
${status} ${text}= Run Keyword And Ignore Error
... Get Element Attribute ${element} textContent
Append To List ${text_list} ${text}
Run Keyword If '${status}' == 'PASS' Append To List ${text_list} ${text}
END
RETURN ${text_list}

@@ -0,0 +1,56 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-kserve-runtime
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8002"
  containers:
    - args:
        - tritonserver
        - --model-store=/mnt/models
        - --grpc-port=9000
        - --http-port=8080
        - --allow-grpc=true
        - --allow-http=true
      image: nvcr.io/nvidia/tritonserver:24.10-py3
      name: kserve-container
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 2Gi
  protocolVersions:
    - v2
    - grpc-v2
  supportedModelFormats:
    - autoSelect: true
      name: tensorrt
      version: "8"
    - autoSelect: true
      name: tensorflow
      version: "1"
    - autoSelect: true
      name: tensorflow
      version: "2"
    - autoSelect: true
      name: onnx
      version: "1"
    - name: pytorch
      version: "1"
    - autoSelect: true
      name: triton
      version: "2"
    - autoSelect: true
      name: xgboost
      version: "1"
    - autoSelect: true
      name: python
      version: "1"

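For context, the manifest above is the new Triton ServingRuntime template introduced by this commit. As a rough, hypothetical sketch only (the suite itself applies its templates through its own CLI keywords), a manifest like this could be registered as a ServingRuntime custom resource with the official Kubernetes Python client; the local file name and namespace below are assumptions:

import yaml
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the test cluster
with open("triton_servingruntime.yaml") as f:  # hypothetical local copy of the manifest above
    runtime = yaml.safe_load(f)

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",   # matches apiVersion serving.kserve.io/v1alpha1
    version="v1alpha1",
    namespace="tritonmodel",     # assumed; the suite uses ${TEST_NS}
    plural="servingruntimes",
    body=runtime,
)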

17 changes: 11 additions & 6 deletions ods_ci/tests/Resources/Page/Components/Menu.robot
@@ -3,17 +3,22 @@ Resource ../ODH/ODHDashboard/ODHDashboard.robot
Library String
Library JupyterLibrary


*** Variables ***
${SIDEBAR_XP} //div[@id="page-sidebar"]


*** Keywords ***
Navigate To Page
[Arguments]
... ${menu}
... ${submenu}=${NONE}
... ${timeout}=10s
Wait Until Element Is Visible //div[@id="page-sidebar"] timeout=${timeout}
Wait Until Element Is Visible ${SIDEBAR_XP} timeout=${timeout}
Wait Until Page Contains ${menu}
${menu}= Set Variable If "${menu}" == "Deployed models" Model Serving ${menu}
IF "${submenu}" == "${NONE}" Run Keyword And Return
... Click Link ${menu}
... Click Button ${SIDEBAR_XP}//button[text()="${menu}"]
${is_menu_expanded}= Menu.Is Menu Expanded ${menu}
IF "${is_menu_expanded}" == "false" Menu.Click Menu ${menu}
Wait Until Page Contains ${submenu}
@@ -23,20 +28,20 @@ Navigate To Page
Click Menu
[Arguments]
... ${menu}
Click Element //button[text()="${menu}"]
Click Element ${SIDEBAR_XP}//button[text()="${menu}"]

Click Submenu
[Arguments]
... ${submenu}
Click Element //a[text()="${submenu}"]
Click Element ${SIDEBAR_XP}//a[text()="${submenu}"]

Is Menu Expanded
[Arguments]
... ${menu}
${is_menu_expanded}= Get Element Attribute //button[text()="${menu}"] attribute=aria-expanded
${is_menu_expanded}= Get Element Attribute ${SIDEBAR_XP}//button[text()="${menu}"] attribute=aria-expanded
RETURN ${is_menu_expanded}

Page Should Contain Menu
[Arguments] ${menu}
Page Should Contain Element //button[text()="${menu}"]
Page Should Contain Element ${SIDEBAR_XP}//button[text()="${menu}"]

@@ -7,7 +7,6 @@ Resource ../../Common.robot

*** Variables ***
${PROJECT_XP}= xpath=//div[text()='Project']
${DISTRIBUITED_WORKLOAD_METRICS_TITLE_XP}= xpath=//h1[text()="Distributed Workload Metrics"]
${DISTRIBUITED_WORKLOAD_METRICS_TEXT_XP}= xpath=//div[text()='Monitor the metrics of your active resources.']
${PROJECT_METRICS_TAB_XP}= xpath=//button[@aria-label="Project metrics tab"]
${WORKLOAD_STATUS_TAB_XP}= xpath=//button[@aria-label="Distributed workload status tab"]
@@ -179,7 +179,7 @@ Open Pipeline Run
Wait Until Page Contains Element xpath=//*[@data-testid="active-runs-tab"] timeout=30s
Click Element xpath=//*[@data-testid="active-runs-tab"]
Wait Until Page Contains Element xpath=//span[text()='${pipeline_run_name}']
Click Element xpath=//span[text()='${pipeline_run_name}']
Click Element xpath=//td[@data-label="Name"]//span[contains(text(), '${pipeline_run_name}')]
Wait Until Page Contains Element xpath=//div[@data-test-id='topology']

# robocop: disable:line-too-long
2 changes: 1 addition & 1 deletion ods_ci/tests/Tests/0500__ide/0502__ide_elyra.robot
@@ -108,7 +108,7 @@ Verify Pipelines Integration With Elyra Running Hello World Pipeline Test #
# We need to navigate to the page because the project name holds a state
# In a fresh cluster, if no state is found, it will select the first one
# In that case, the first one might not be the project that was created
Menu.Navigate To Page Data Science Pipelines
Menu.Navigate To Page Data Science Pipelines Pipelines
Select Pipeline Project By Name ${PRJ_TITLE}
Log ${pipeline_run_name}
Verify Pipeline Run Is Completed ${pipeline_run_name} timeout=5m experiment_name=${experiment_name}
@@ -31,9 +31,9 @@ Verify Workload Metrics Home page Contents
[Tags] RHOAIENG-4837
... Sanity DistributedWorkloads Training WorkloadsOrchestration
Open Distributed Workload Metrics Home Page
Wait For Dashboard Page Title Distributed Workload Metrics
Wait Until Element Is Visible ${DISTRIBUITED_WORKLOAD_METRICS_TEXT_XP} timeout=20
Wait Until Element Is Visible ${PROJECT_METRICS_TAB_XP} timeout=20
Page Should Contain Element ${DISTRIBUITED_WORKLOAD_METRICS_TITLE_XP}
Page Should Contain Element ${DISTRIBUITED_WORKLOAD_METRICS_TEXT_XP}
Page Should Contain Element ${PROJECT_XP}
Page Should Contain Element ${PROJECT_METRICS_TAB_XP}
@@ -143,9 +143,11 @@ Verify The Workload Metrics By Submitting Ray Workload
Open Distributed Workload Metrics Home Page
Select Distributed Workload Project By Name ${PRJ_TITLE}
Select Refresh Interval 15 seconds
# verifying workload metrics in Dark mode
Click Button xpath=//button[@aria-label="dark theme"]
Wait Until Element Is Visible ${DISTRIBUITED_WORKLOAD_RESOURCE_METRICS_TITLE_XP} timeout=20
Wait For Job With Status ${RAY_CLUSTER_NAME} Admitted 30
Wait For Job With Status ${RAY_CLUSTER_NAME} Running 180
Wait For Job With Status ${RAY_CLUSTER_NAME} Running 300

${cpu_requested} = Get CPU Requested ${PRJ_TITLE} ${LOCAL_QUEUE_NAME}
${memory_requested} = Get Memory Requested ${PRJ_TITLE} ${LOCAL_QUEUE_NAME} RayCluster
@@ -25,7 +25,7 @@ ${PRJ_TITLE}= ms-triton-project
${PRJ_DESCRIPTION}= project used for model serving triton runtime tests
${MODEL_CREATED}= ${FALSE}
${PATTERN}= https:\/\/([^\/:]+)
${ONNX_MODEL_NAME}= densenet_onnx
${ONNX_MODEL_NAME}= densenetonnx
${ONNX_MODEL_LABEL}= densenetonnx
${ONNX_GRPC_RUNTIME_NAME}= triton-kserve-grpc
${ONNX_RUNTIME_NAME}= triton-kserve-rest
@@ -407,7 +407,7 @@ Test FIL Model Rest Inference Via UI (Triton on Kserve) # robocop: off=too-lo
... Clean All Models Of Current User
... AND
... Delete Serving Runtime Template From CLI displayed_name=triton-kserve-rest

Test FIL Model Grpc Inference Via UI (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of an fil model in Kserve using Triton
[Tags] Sanity RHOAIENG-15823
Expand Down
@@ -19,8 +19,15 @@ Test Tags Kserve

*** Variables ***
${PYTHON_MODEL_NAME}= python
${ONNX_MODEL_NAME}= densenetonnx
${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON}= {"modelName":"python","modelVersion":"1","id":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":["4"]},{"name":"OUTPUT1","datatype":"FP32","shape":["4"]}],"rawOutputContents":["AgAAAAAAAAAAAAAAAAAAAA==","AAQAAAAAAAAAAAAAAAAAAA=="]}
${INFERENCE_GRPC_INPUT_PYTHONFILE}= tests/Resources/Files/triton/kserve-triton-python-grpc-input.json
${KSERVE_MODE}= Serverless # Serverless
${PROTOCOL_GRPC}= grpc
${EXPECTED_INFERENCE_REST_OUTPUT_PYTHON}= {"model_name":"python","model_version":"1","outputs":[{"name":"OUTPUT0","datatype":"FP32","shape":[4],"data":[0.921442985534668,0.6223347187042236,0.8059385418891907,1.2578542232513428]},{"name":"OUTPUT1","datatype":"FP32","shape":[4],"data":[0.49091365933418274,-0.027157962322235107,-0.5641784071922302,0.6906309723854065]}]}
${INFERENCE_REST_INPUT_PYTHON}= @tests/Resources/Files/triton/kserve-triton-python-rest-input.json
${EXPECTED_INFERENCE_REST_OUTPUT_FILE_ONNX}= tests/Resources/Files/triton/kserve-triton-onnx-rest-output.json
${INFERENCE_REST_INPUT_ONNX}= @tests/Resources/Files/triton/kserve-triton-onnx-rest-input.json
${KSERVE_MODE}= Serverless # Serverless
${PROTOCOL}= http
${TEST_NS}= tritonmodel
@@ -34,6 +41,8 @@ ${KSERVE_RUNTIME_REST_NAME}= triton-kserve-runtime
${PYTORCH_MODEL_NAME}= resnet50
${INFERENCE_REST_INPUT_PYTORCH}= @tests/Resources/Files/triton/kserve-triton-resnet-rest-input.json
${EXPECTED_INFERENCE_REST_OUTPUT_FILE__PYTORCH}= tests/Resources/Files/triton/kserve-triton-resnet-rest-output.json
${PATTERN}= https:\/\/([^\/:]+)
${PROTOBUFF_FILE}= tests/Resources/Files/triton/grpc_predict_v2.proto


*** Test Cases ***
@@ -116,6 +125,90 @@ Test Pytorch Model Rest Inference Via API (Triton on Kserve) # robocop: off=t
... isvc_names=${models_names} wait_prj_deletion=${FALSE} kserve_mode=${KSERVE_MODE}
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true
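
For orientation, the REST inference these API tests exercise follows the KServe v2 protocol: the JSON body from the *-rest-input.json file is POSTed to /v2/models/<model_name>/infer on the predictor. A minimal standalone sketch, assuming an example route hostname and that the endpoint is reachable from the test runner (the suite performs this step through Verify Model Inference With Retries):

import json
import requests

# Assumed example route; the suite resolves the real host for the deployed InferenceService.
host = "resnet50-predictor-tritonmodel.apps.example.com"
with open("tests/Resources/Files/triton/kserve-triton-resnet-rest-input.json") as f:
    payload = json.load(f)

resp = requests.post(
    f"https://{host}/v2/models/resnet50/infer",  # KServe v2 REST inference endpoint
    json=payload,
    verify=False,   # test clusters commonly use self-signed certificates
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # compared against kserve-triton-resnet-rest-output.json in the test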

Test Python Model Grpc Inference Via API (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of python model in Kserve using Triton
[Tags] Tier2 RHOAIENG-16912
Setup Test Variables model_name=${PYTHON_MODEL_NAME} use_pvc=${FALSE} use_gpu=${FALSE}
... kserve_mode=${KSERVE_MODE} model_path=triton/model_repository/

Set Project And Runtime runtime=${KSERVE_RUNTIME_REST_NAME} protocol=${PROTOCOL_GRPC} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${PYTHON_MODEL_NAME}
... storage_size=100Mi memory_request=100Mi
${requests}= Create Dictionary memory=1Gi
Compile Inference Service YAML isvc_name=${PYTHON_MODEL_NAME}
... sa_name=models-bucket-sa
... model_storage_uri=${storage_uri}
... model_format=python serving_runtime=${KSERVE_RUNTIME_REST_NAME}
... version="1"
... limits_dict=${limits} requests_dict=${requests} kserve_mode=${KSERVE_MODE}
Deploy Model Via CLI isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
... namespace=${test_namespace}
# File is not needed anymore after applying
Remove File ${INFERENCESERVICE_FILLED_FILEPATH}
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
... namespace=${test_namespace}
${pod_name}= Get Pod Name namespace=${test_namespace}
... label_selector=serving.kserve.io/inferenceservice=${PYTHON_MODEL_NAME}
${valued} ${host}= Run And Return Rc And Output oc get ksvc ${PYTHON_MODEL_NAME}-predictor -o jsonpath='{.status.url}'
Log ${valued}
${host}= Evaluate re.search(r"${PATTERN}", r"${host}").group(1) re
Log ${host}
${inference_output}= Query Model With GRPCURL host=${host} port=443
... endpoint=inference.GRPCInferenceService/ModelInfer
... json_body=@ input_filepath=${INFERENCE_GRPC_INPUT_PYTHONFILE}
... insecure=${True} protobuf_file=${PROTOBUFF_FILE} json_header=${NONE}
${inference_output}= Evaluate json.dumps(${inference_output})
Log ${inference_output}
${result} ${list}= Inference Comparison ${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON} ${inference_output}
Log ${result}
Log ${list}
[Teardown] Run Keywords
... Clean Up Test Project test_ns=${test_namespace}
... isvc_names=${models_names} wait_prj_deletion=${FALSE} kserve_mode=${KSERVE_MODE}
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true
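
The gRPC call in the test above goes through the suite's Query Model With GRPCURL keyword. A hedged standalone equivalent using grpcurl directly; the hostname is an assumed example, while the proto and input files are the ones referenced in the Variables section:

import subprocess

host = "python-predictor-tritonmodel.apps.example.com"  # assumed; the test extracts it from the ksvc URL
cmd = [
    "grpcurl", "-insecure",                                    # skip TLS verification, as in the test
    "-proto", "tests/Resources/Files/triton/grpc_predict_v2.proto",
    "-d", "@",                                                 # read the JSON request body from stdin
    f"{host}:443",
    "inference.GRPCInferenceService/ModelInfer",
]
with open("tests/Resources/Files/triton/kserve-triton-python-grpc-input.json", "rb") as body:
    result = subprocess.run(cmd, stdin=body, capture_output=True, text=True, check=True)
print(result.stdout)  # should correspond to ${EXPECTED_INFERENCE_GRPC_OUTPUT_PYTHON}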

Test Onnx Model Rest Inference Via API (Triton on Kserve) # robocop: off=too-long-test-case
[Documentation] Test the deployment of onnx model in Kserve using Triton
[Tags] Tier2 RHOAIENG-16908
Setup Test Variables model_name=${ONNX_MODEL_NAME} use_pvc=${FALSE} use_gpu=${FALSE}
... kserve_mode=${KSERVE_MODE} model_path=triton/model_repository/
Log ${ONNX_MODEL_NAME}
Set Project And Runtime runtime=${KSERVE_RUNTIME_REST_NAME} protocol=${PROTOCOL} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${ONNX_MODEL_NAME}
... storage_size=100Mi memory_request=100Mi
${requests}= Create Dictionary memory=1Gi
Compile Inference Service YAML isvc_name=${ONNX_MODEL_NAME}
... sa_name=models-bucket-sa
... model_storage_uri=${storage_uri}
... model_format=onnx serving_runtime=${KSERVE_RUNTIME_REST_NAME}
... version="1"
... limits_dict=${limits} requests_dict=${requests} kserve_mode=${KSERVE_MODE}
Deploy Model Via CLI isvc_filepath=${INFERENCESERVICE_FILLED_FILEPATH}
... namespace=${test_namespace}
# File is not needed anymore after applying
Remove File ${INFERENCESERVICE_FILLED_FILEPATH}
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${ONNX_MODEL_NAME}
... namespace=${test_namespace}
${pod_name}= Get Pod Name namespace=${test_namespace}
... label_selector=serving.kserve.io/inferenceservice=${ONNX_MODEL_NAME}
${service_port}= Extract Service Port service_name=${ONNX_MODEL_NAME}-predictor protocol=TCP
... namespace=${test_namespace}
IF "${KSERVE_MODE}"=="RawDeployment"
Start Port-forwarding namespace=${test_namespace} pod_name=${pod_name} local_port=${service_port}
... remote_port=${service_port} process_alias=triton-process
END
${EXPECTED_INFERENCE_REST_OUTPUT_ONNX}= Load Json File
... file_path=${EXPECTED_INFERENCE_REST_OUTPUT_FILE_ONNX} as_string=${TRUE}
Verify Model Inference With Retries model_name=${ONNX_MODEL_NAME} inference_input=${INFERENCE_REST_INPUT_ONNX}
... expected_inference_output=${EXPECTED_INFERENCE_REST_OUTPUT_ONNX} project_title=${test_namespace}
... deployment_mode=Cli kserve_mode=${KSERVE_MODE} service_port=${service_port}
... end_point=/v2/models/${model_name}/infer retries=3
[Teardown] Run Keywords
... Clean Up Test Project test_ns=${test_namespace}
... isvc_names=${models_names} wait_prj_deletion=${FALSE} kserve_mode=${KSERVE_MODE}
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process triton-process kill=true
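
When ${KSERVE_MODE} is RawDeployment, the test port-forwards the predictor pod and sends the request to localhost instead of an external route. A rough standalone equivalent; the pod name is hypothetical and the port mirrors --http-port=8080 from the runtime:

import json
import subprocess
import time

import requests

pf = subprocess.Popen([
    "oc", "port-forward", "-n", "tritonmodel",        # assumed namespace
    "pod/densenetonnx-predictor-00001-deployment-0",  # hypothetical predictor pod name
    "8080:8080",                                      # mirrors --http-port=8080 in the runtime
])
try:
    time.sleep(5)  # give the tunnel a moment to establish
    with open("tests/Resources/Files/triton/kserve-triton-onnx-rest-input.json") as f:
        payload = json.load(f)
    resp = requests.post("http://localhost:8080/v2/models/densenetonnx/infer", json=payload, timeout=30)
    print(resp.status_code, resp.text)
finally:
    pf.terminate()  # the suite does the same via Terminate Process triton-process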


*** Keywords ***
