Changes to support TorchServe on cpu & gpu #15
So far we have successfully verified this PR on CPU only, with both the fastapi and torchserve settings of the new parameter in config.properties:
We plan to release an update to the related AWS guidance shortly. It contains other important changes but does not yet include this PR. After that we will focus on merging this PR, following additional testing on other architectures (AWS Graviton, GPU, etc.). Thanks
Update 4/17/24: tested this PR using images built for the "torchserve" API server on AWS Graviton and Inferentia 2 based nodes. In both cases there were run-time container errors like:
Also, the /3-pack/Dockerfile.torchserve file needs to have this command before the:
in order for
This Dockerfile.torchserve fails to build ML inference images on inf2 and Graviton based EC2 instances. However, it appears to work without changes on x86_64 and GPU based instances, where it builds images that run on EKS nodes of the matching processor architecture.
In order for the ./pack.sh command to work, the user model-server needs to be explicitly created:
ARG BASE_IMAGE
FROM $BASE_IMAGE
ARG MODEL_NAME
ARG MODEL_FILE_NAME
ARG PROCESSOR
LABEL description="Model $MODEL_NAME packed in a TorchServe container to run on $PROCESSOR"
#DZ: added line to create a user that is later used as an owner of /home/model-server folder
RUN useradd -m model-server
WORKDIR /home/model-server
COPY 3-pack/torchserve torchserve
WORKDIR /home/model-server/torchserve
USER root
COPY 3-pack/torchserve/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
&& chown -R model-server /home/model-server
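Since the unchanged Dockerfile reportedly builds on x86_64 and GPU base images, where the chown to model-server succeeds, that user presumably already exists in those base images. An unconditional useradd would then fail there because the user already exists. A minimal sketch of a guarded variant follows, assuming the base image provides the `id` and `useradd` commands. The `ensure_user` helper name is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of the PR): create a user only when it is
# missing, so the same build step works whether or not the base image
# already defines it.
ensure_user() {
    user="$1"
    if id -u "$user" >/dev/null 2>&1; then
        echo "user $user already exists, skipping useradd"
    else
        useradd -m "$user"
    fi
}

# In the Dockerfile this could be collapsed into a single RUN step, e.g.:
#   RUN id -u model-server >/dev/null 2>&1 || useradd -m model-server
```

This has not been verified against the inf2 or Graviton base images discussed here.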
With the useradd line added, the last command, chown -R model-server /home/model-server, which was failing before, now works. Images are built and available from public ECR here: public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve
However, they fail at runtime with a nondeterministic error:
Normal   Pulling  12m (x5 over 13m)     kubelet  Pulling image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve"
Normal   Pulled   12m                   kubelet  Successfully pulled image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve" in 157ms (157ms including waiting)
Warning  BackOff  3m33s (x48 over 13m)  kubelet  Back-off restarting failed container main in pod bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz_mpi(2bebae59-24ee-4a1f-a98d-c1f0facf5604)
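These events only show that the main container keeps restarting, not why. One reasonable next step is to read the logs of the previous (crashed) container instance. The sketch below only prints the kubectl commands rather than running them. It assumes the `<pod>_<namespace>(<uid>)` pod reference format seen in the BackOff message, and `crashloop_debug_cmds` is a hypothetical helper name.

```shell
# Hypothetical helper: turn a kubelet pod reference such as
# "<pod>_<namespace>(<uid>)" into kubectl commands for inspecting a
# crash-looping container. It prints the commands; it does not run them.
crashloop_debug_cmds() {
    pod_ref="$1"
    container="$2"
    pod="${pod_ref%%_*}"     # text before the first underscore
    rest="${pod_ref#*_}"     # "<namespace>(<uid>)"
    ns="${rest%%\(*}"        # text before the opening parenthesis
    # Logs of the previous, terminated instance of the container
    echo "kubectl -n $ns logs $pod -c $container --previous"
    # Full pod status, including the last termination state and exit code
    echo "kubectl -n $ns describe pod $pod"
}

crashloop_debug_cmds 'bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz_mpi(2bebae59-24ee-4a1f-a98d-c1f0facf5604)' main
```

The printed commands can then be run against the cluster that hosts the pod.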
@dzilbermanvmw Thanks for checking. I haven't tested them on inf2 or Graviton. Will look into these next week.
Adding further details: for CPU, the build and deployment were successful. For GPU, the build was successful; however, for the deployment to succeed, we had to comment out the following in the limits section of the 4-deploy/app-bert-base-multilingual-cased-gpu-g4dn.xlarge/bert-base-multilingual-cased-gpu-0.yaml file. resources:
What is the PR about
This PR integrates TorchServe with this solution.
./test.sh run bmk
From a UX point of view, the user needs to set model_server=torchserve in config.properties. The rest of the flow is the same. Currently, this is supported for CPU only.
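As an illustration of the setting described above, the following sketch reads model_server from a properties file and rejects unexpected values. It assumes config.properties uses plain key=value lines, as in the model_server=torchserve example, and that fastapi and torchserve (the two settings mentioned in this conversation) are the only valid values. The `get_model_server` helper is hypothetical and is not how the project's scripts necessarily read the file.

```shell
# Hypothetical helper: print the model_server value from a properties file,
# or fail if it is missing or not one of the two settings discussed here.
get_model_server() {
    file="$1"
    # Use the last matching assignment and strip surrounding whitespace.
    value="$(grep -E '^[[:space:]]*model_server[[:space:]]*=' "$file" \
        | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')"
    case "$value" in
        fastapi|torchserve)
            printf '%s\n' "$value"
            ;;
        *)
            echo "unsupported or missing model_server: '$value'" >&2
            return 1
            ;;
    esac
}

# Example usage:
#   get_model_server config.properties
```

Validating the value up front gives an early, explicit failure instead of a later error during build or deployment.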
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
CPU Logs
GPU logs