
Commit 1b2037b

fix safetensors issue
1b5d committed May 5, 2023
1 parent 250ec5c commit 1b2037b
Showing 4 changed files with 33 additions and 8 deletions.
1 change: 1 addition & 0 deletions .pylintrc
@@ -1,2 +1,3 @@
 [MESSAGES CONTROL]
 disable=duplicate-code
+generated-members=numpy.*, torch.*
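
For context, `generated-members` tells pylint to skip its `no-member` (E1101) check for attributes matching these patterns, since torch and numpy create many members dynamically that pylint cannot resolve statically. Below is a small illustration of the kind of code this setting keeps pylint from flagging; the snippet is illustrative only and is not part of this repository:

```
import torch


def pick_device() -> torch.device:
    # Without generated-members=torch.*, pylint may report E1101 (no-member)
    # on members it cannot resolve statically, such as torch.cuda.is_available.
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    return torch.device("cpu")
```
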
30 changes: 26 additions & 4 deletions README.md
@@ -18,6 +18,7 @@ tested on CPU with the following models :
 tested on GPU with GPTQ-for-LlaMa with

 - Koala 7B-4bit-128g
+- wizardLM 7B-4bit-128g

 Contribution for supporting more models is welcomed.

@@ -29,8 +30,8 @@ Contribution for supporting more models is welcomed.
 - [x] Support GPTQ-for-LLaMa
 - [ ] Lora support
 - [ ] huggingface pipeline
-- [ ] Write an implementation for OpenAI
-- [ ] Write an implementation for RWKV-LM
+- [ ] Support OpenAI
+- [ ] Support RWKV-LM

 # Usage

@@ -151,16 +152,37 @@ curl --location 'localhost:8000/generate' \

 ## Llama / Alpaca on GPU - using GPTQ-for-LLaMa (beta)

-**Note**: According to [nvidia-docker](https://github.com/NVIDIA/nvidia-docker), you might want to install the [NVIDIA Driver](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html) on your host machine
+**Note**: According to [nvidia-docker](https://github.com/NVIDIA/nvidia-docker), you might want to install the [NVIDIA Driver](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html) on your host machine. Verify that your nvidia environment is set up properly by running:

-You can also run the Llama model using GPTQ-for-LLaMa 4 bit quantization, you can use a docker image specially built for that purpose `1b5d/llm-api:0.0.2-gptq-llama-cuda` instead of the default image.
+```
+docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi
+```
+
+You should see a table showing the current nvidia driver version and some other info:
+```
++---------------------------------------------------------------------------------------+
+| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 11.7 |
+|-----------------------------------------+----------------------+----------------------+
+...
+|=======================================================================================|
+| No running processes found |
++---------------------------------------------------------------------------------------+
+```
+
+You can also run the Llama model using GPTQ-for-LLaMa 4 bit quantization by using a docker image specially built for that purpose, `1b5d/llm-api:0.0.3-gptq-llama-cuda`, instead of the default image.

 a separate docker-compose file is also available to run this mode:

 ```
 docker compose -f docker-compose.gptq-llama-cuda.yaml up
 ```

+or by directly running the container:
+
+```
+docker run --gpus all -v $PWD/models/:/models:rw -v $PWD/config.yaml:/llm-api/config.yaml:ro -p 8000:8000 1b5d/llm-api:0.0.3-gptq-llama-cuda
+```
+
 Example config file:

 ```
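
Once one of the containers above is running, a quick client-side check against the `/generate` endpoint (the endpoint shown in the README's earlier curl example) can confirm that the API is serving. This is only a sketch; the request payload below is an assumed example, not the documented schema:

```
# Hypothetical sanity check for a running llm-api container.
# Assumption: the {"prompt": ...} payload shape is illustrative only; see the
# README's curl example for the actual request format.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "What is the capital of France?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```
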
6 changes: 5 additions & 1 deletion app/llms/gptq_llama/gptq_llama.py
@@ -95,7 +95,11 @@ def __init__(self, params: Dict[str, str]) -> None:
         os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices
         self.device = torch.device(dev)
         self.model = self._load_quant(
-            settings.setup_params["repo_id"], model_path, wbits, group_size, self.device
+            settings.setup_params["repo_id"],
+            model_path,
+            wbits,
+            group_size,
+            cuda_visible_devices,
         )

         self.model.to(self.device)
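
The hunk above passes the raw `cuda_visible_devices` string to `_load_quant` instead of the already-constructed `torch.device`, and the commit message ties the change to safetensors loading. As a rough, hypothetical sketch of how a safetensors checkpoint can be placed on a specific GPU (the function name, arguments, and flow are assumptions, not this project's actual `_load_quant`):

```
import torch
from safetensors.torch import load_file


def load_safetensors_checkpoint(
    model: torch.nn.Module, checkpoint_path: str, gpu_index: str = "0"
) -> torch.nn.Module:
    # safetensors' load_file takes the target device as a string such as
    # "cuda:0", so passing around a plain device identifier is convenient.
    device = f"cuda:{gpu_index}" if torch.cuda.is_available() else "cpu"
    state_dict = load_file(checkpoint_path, device=device)
    model.load_state_dict(state_dict)
    return model
```
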
4 changes: 1 addition & 3 deletions docker-compose.gptq-llama-cuda.yaml
@@ -2,7 +2,7 @@ version: '3'

 services:
   app:
-    image: 1b5d/llm-api:0.0.2-gptq-llama-cuda
+    image: 1b5d/llm-api:0.0.3-gptq-llama-cuda
     container_name: llm-api-app
     ports:
       - "8000:8000"
@@ -11,8 +11,6 @@ services:
     volumes:
       - "./models:/models:rw"
       - "./config.yaml:/llm-api/config.yaml:ro"
-    ulimits:
-      memlock: 16000000000
     deploy:
       resources:
         reservations:
