Loading chatglm-6b-int4-qe throws an error #35

Open
ystyle opened this issue Mar 26, 2023 · 2 comments
ystyle commented Mar 26, 2023

The directory layout is as follows:

lxy52@YSTYLE-PC MINGW64 /d/Code/Python/ChatGLM-6B (main)
$ tree -d -L 2
.
|-- ChatGLM-webui
|   |-- modules
|   |-- outputs
|   `-- scripts
|-- THUDM
|   |-- chatglm-6b
|   |-- chatglm-6b-int4
|   |-- chatglm-6b-int4-qe
|   `-- chatglm-6b-main
|-- examples
|-- limitations
|-- outputs
|   |-- markdown
|   `-- save
`-- resources

15 directories

Running python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\ from /d/Code/Python/ChatGLM-6B/ChatGLM-webui produces the error below; loading chatglm-6b-int4 fails the same way. Is the problem that the model path can't be given as a relative path (with ..\)? Or is it something else? A version from last week worked; it broke after I updated.

(ChatGLM) PS D:\Code\Python\ChatGLM-6B\ChatGLM-webui> python .\webui.py --model-path ..\THUDM\chatglm-6b-int4-qe\
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
GPU memory: 8.59 GB
No compiled kernel found.
Compiling kernels : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.c -shared -o C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Kernels compiled : C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so
Traceback (most recent call last):
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 52, in <module>
    init()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\webui.py", line 24, in init
    load_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 61, in load_model
    prepare_model()
  File "D:\Code\Python\ChatGLM-6B\ChatGLM-webui\modules\model.py", line 42, in prepare_model
    model = model.half().quantize(4).cuda()
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\modeling_chatglm.py", line 1281, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 390, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\lxy52/.cache\huggingface\modules\transformers_modules\quantization.py", line 157, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\Application\Miniconda3\envs\ChatGLM\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "D:\Application\Miniconda3\envs\ChatGLM\lib\ctypes\__init__.py", in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
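
For reference, the "full path with constructor syntax" hint in the final error means constructing ctypes.CDLL with an absolute path instead of going through ctypes.cdll.LoadLibrary, which resolves bare or relative names less reliably on Windows. A minimal sketch of the difference, reusing the kernel path printed in the compile log above:

import ctypes
import os

# Path to the compiled kernel, copied from the "Kernels compiled" line above.
kernel_file = r"C:\Users\lxy52\.cache\huggingface\modules\transformers_modules\quantization_kernels_parallel.so"

# What quantization.py does, per the traceback; on Windows this raises
# FileNotFoundError when the path or one of the library's DLL dependencies
# (e.g. the MinGW OpenMP runtime pulled in by gcc -fopenmp) cannot be found:
#   kernels = ctypes.cdll.LoadLibrary(kernel_file)

# The constructor syntax the error message suggests, with a resolved full path:
kernels = ctypes.CDLL(os.path.abspath(kernel_file))

Since the log shows the .so compiled successfully, the failure here is more likely a missing dependency than a missing file.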
Akegarasu (Owner) commented

try --precision fp16
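
For context: per prepare_model() in the traceback, the int4 precision path calls model.half().quantize(4).cuda(), which is what needs the CPU kernel; --precision fp16 should load the weights in half precision and skip that quantize step entirely (this reading of the flag is an inference from the traceback, not confirmed behavior). A hypothetical invocation against the non-quantized checkpoint from the directory tree above:

python .\webui.py --model-path ..\THUDM\chatglm-6b --precision fp16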


yknBugs commented Apr 8, 2023

This doesn't look like a WebUI problem; it's an issue in ChatGLM itself. See this issue in the upstream repository: THUDM/ChatGLM-6B#162

Also, does loading the pre-quantized model directly actually save VRAM for you? When I load the pre-quantized model, the initial VRAM usage does drop substantially, but each round of conversation then consumes far more VRAM, so overall I get fewer rounds of conversation than if I load the original model and quantize it at load time.
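
For readers weighing the two approaches described here, a minimal sketch of both, assuming the standard ChatGLM loading pattern (AutoModel.from_pretrained with trust_remote_code=True; the hub IDs could equally be the local paths from the directory tree above) and the quantize(4) call visible in the traceback:

from transformers import AutoModel

# Option A: load the already-quantized int4 checkpoint directly
# (smaller initial VRAM footprint).
model_a = AutoModel.from_pretrained(
    "THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()

# Option B: load the original checkpoint and quantize it at load time,
# as ChatGLM-webui's prepare_model() does in the traceback above.
model_b = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True).half().quantize(4).cuda()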
