MLFlow model deployments error when deploying PyTorch model from GCS bucket - ModuleNotFoundError: No module named 'models' #85

Open
akasantony opened this issue Aug 30, 2021 · 1 comment

@akasantony

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04): Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.19.0
  • MLflow TorchServe Deployment plugin installed from (source or binary): binary
  • MLflow TorchServe Deployment plugin version (run mlflow deployments --version): 0.1.0
  • TorchServe installed from (source or binary): binary
  • TorchServe version (run torchserve --version): 0.4.2
  • Python version: 3.9.6
  • Exact command to reproduce: mlflow deployments create -t torchserve -m gs://<model_bucket>/models/classnet/48d548cc841d4c2b9a06e975dec88c8e/artifacts/classnet_model --name classnet -C 'MODEL_FILE=models/classnet.py' -C 'HANDLER=model_handler.py' -C 'EXTRA_FILES=transforms.py,artifacts/models/desnse_depth.pt,models/dense_depth.py'

Describe the problem

I have trained a custom PyTorch model for an image classification problem. The model is logged to a Google Cloud Storage bucket. When I try to deploy the model to TorchServe, I get a ModuleNotFoundError: No module named 'models' error. From what I understand, mlflow.pytorch.log_model() calls torch.save(model) internally, which pickles the model class by reference to its module path rather than by its code. This creates a dependency on the original directory structure at load time (see issue 18325).
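To illustrate the directory-structure dependency, here is a minimal sketch that uses the standard pickle module as a stand-in for torch.save/torch.load (which use pickle by default). The module name models and the class LiveNet are illustrative stand-ins for the project's code, and the module is created in memory rather than loaded from models/classnet.py. Pickling records only the class's module path, so once that module can no longer be imported, loading fails with the same error the TorchServe worker reports.

```python
import pickle
import sys
import types

# Create an in-memory module named "models" (hypothetical stand-in for the
# project's models/ package) and define a toy model class inside it.
models = types.ModuleType("models")
exec(
    "class LiveNet:\n"
    "    def __init__(self):\n"
    "        self.weights = [0.1, 0.2]\n",
    models.__dict__,
)
sys.modules["models"] = models

# Pickling stores a reference to models.LiveNet, not the class's source code.
payload = pickle.dumps(models.LiveNet())


def try_load(data):
    """Return "loaded" on success, or the import error message on failure."""
    try:
        pickle.loads(data)
        return "loaded"
    except ModuleNotFoundError as exc:
        return str(exc)


before = try_load(payload)  # "models" is importable here
del sys.modules["models"]   # simulate a serving process without the package
after = try_load(payload)

print(before)  # loaded
print(after)   # No module named 'models'
```

The takeaway is that whatever module path the pickle recorded must be importable, under exactly that name, in the process that loads the model.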

Code to reproduce issue

I saved the MLflow model to a GCS bucket using the script below:
mlflow.pytorch.log_model(model, "{}_model".format('livenet'))

The model is deployed using the command below:
mlflow deployments create -t torchserve -m gs://<model_bucket>/models/classnet/48d548cc841d4c2b9a06e975dec88c8e/artifacts/classnet_model --name classnet -C 'MODEL_FILE=models/classnet.py' -C 'HANDLER=model_handler.py' -C 'EXTRA_FILES=transforms.py,artifacts/models/desnse_depth.pt,models/dense_depth.py'

Other info / logs

2021-08-30 10:13:37,521 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG - /tmp/models/555ed568ad5f4fb4a4ebe1b231e298fb/model.pth
2021-08-30 10:13:37,523 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG - <class 'livenet.LiveNet'>
2021-08-30 10:13:38,338 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG - Backend worker process died.
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/model_service_worker.py", line 183, in <module>
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     worker.run_server()
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/model_service_worker.py", line 155, in run_server
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/model_service_worker.py", line 117, in handle_connection
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2021-08-30 10:13:38,339 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/model_service_worker.py", line 90, in load_model
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/model_loader.py", line 110, in load
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     initialize_fn(service.context)
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/torch_handler/vision_handler.py", line 20, in initialize
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     super().initialize(context)
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/torch_handler/base_handler.py", line 69, in initialize
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     self.model = self._load_pickled_model(model_dir, model_file, model_pt_path)
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/ts/torch_handler/base_handler.py", line 133, in _load_pickled_model
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     state_dict = torch.load(model_pt_path, map_location=self.device)
2021-08-30 10:13:38,340 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/torch/serialization.py", line 607, in load
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/torch/serialization.py", line 882, in _load
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     result = unpickler.load()
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -   File "/opt/conda/envs/vkyc/lib/python3.9/site-packages/torch/serialization.py", line 875, in find_class
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG -     return super().find_class(mod_name, name)
2021-08-30 10:13:38,341 [INFO ] W-9000-spoofnet_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'models'

What component(s) does this bug affect?

Components

  • area/deploy: Main deployment plugin logic
  • area/build: Build and test infrastructure for MLflow TorchServe Deployment Plugin
  • area/docs: MLflow TorchServe Deployment Plugin documentation pages
  • area/examples: Example code
akasantony changed the title from "MLFlow model deployments error when deploying PyTorch model - ModuleNotFoundError: No module named 'models'" to "MLFlow model deployments error when deploying PyTorch model from GCS bucket - ModuleNotFoundError: No module named 'models'" on Aug 30, 2021
@shrinath-suresh
Collaborator

shrinath-suresh commented Sep 7, 2021

@akasantony

From the information you have shared, the model is saved using mlflow.pytorch.log_model, but when it is loaded, the base handler tries to load it as a state dict.

mlflow.pytorch.log_model uses cloudpickle and saves the entire model structure using torch.save. To load such a model, you can refer to the custom handler in the IrisClassification example. In that example, the model is saved using mlflow.pytorch and loaded using torch.load.
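A minimal sketch of the loading side, assuming the handler can prepend the directory that holds the pickled model and its models/ package to sys.path before unpickling. This is not the IrisClassification handler or any TorchServe API: load_pickled_model is a hypothetical helper, the models/livenet.py layout is illustrative, and plain pickle stands in for torch.load so the example runs without PyTorch.

```python
import importlib
import os
import pickle
import sys
import tempfile


def load_pickled_model(model_dir, filename):
    """Hypothetical helper: load a fully pickled model stored in model_dir.

    Prepends model_dir to sys.path so module paths recorded in the pickle
    (e.g. models.livenet) resolve against files shipped alongside it.
    A real handler would call torch.load here; pickle stands in for it.
    """
    if model_dir not in sys.path:
        sys.path.insert(0, model_dir)
    with open(os.path.join(model_dir, filename), "rb") as f:
        return pickle.load(f)


# Demo: build a model directory that mirrors an illustrative project layout.
model_dir = tempfile.mkdtemp()
pkg = os.path.join(model_dir, "models")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "livenet.py"), "w") as f:
    f.write("class LiveNet:\n    def __init__(self):\n        self.weights = [0.1, 0.2]\n")

# "Training" side: the package is importable, so the pickle records models.livenet.LiveNet.
sys.path.insert(0, model_dir)
from models.livenet import LiveNet

with open(os.path.join(model_dir, "model.pth"), "wb") as f:
    pickle.dump(LiveNet(), f)

# "Serving" side: start from a state where the package is not importable.
sys.path.remove(model_dir)
for name in ("models", "models.livenet"):
    sys.modules.pop(name, None)
importlib.invalidate_caches()

model = load_pickled_model(model_dir, "model.pth")
print(type(model).__module__, model.weights)  # models.livenet [0.1, 0.2]
```

Prepending rather than appending to sys.path keeps a same-named models package elsewhere on the path from shadowing the one shipped with the model.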

We are about to make another release, and a lot of changes have gone in since the 0.1.0 release. Could you please install the mlflow-torchserve plugin from source? Reference: https://github.com/mlflow/mlflow-torchserve/blob/master/README.md#installation
