
Errors loading ggml models #13

Open
bbecausereasonss opened this issue Aug 17, 2023 · 4 comments

@bbecausereasonss

Hey there, been a while. Glad to see there was a recent update. Tried loading a different quant today and hit an error; not sure how to proceed. Also, is there any way to incorporate AutoGPTQ?

--- Launched app
--- Set theme to: native
--- llama.cpp model load parameters: None
--- Loading llama.cpp model...
llama.cpp: loading model from E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 7168
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q4_K - Medium)
llama_model_load_internal: n_ff = 28672
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.18 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_init_from_file: failed to load model
--- Error loading backend:

---Error: Model load failure...
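
For reference, I think the backend boils down to something like this llama-cpp-python call (a minimal sketch; the path and n_ctx are the ones from the log above):

from llama_cpp import Llama

# Sketch of the failing load (path and n_ctx taken from the log above)
model = Llama(
    model_path="E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin",
    n_ctx=2048,
)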

@shinomakoi (Owner)

Hmm, I'm not sure. llama-cpp-python just got updated; maybe see if it works with the updated version. I'll look into AutoGPTQ.

@bbecausereasonss (Author)

Does magi automatically update and build the latest llama.cpp with CUDA support, or do I need to do this manually somehow?

@shinomakoi (Owner)

Not by default; instructions are here: https://github.com/abetlen/llama-cpp-python/#windows-remarks

$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = 1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Should do it (you'll also need the CUDA Toolkit installed: https://developer.nvidia.com/cuda-11-8-0-download-archive).
I could make a pre-compiled version too
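
To confirm the rebuilt wheel actually offloads to the GPU, something like this should work (a sketch; the model path and layer count are placeholders):

from llama_cpp import Llama

# Hypothetical path; n_gpu_layers > 0 is what enables CUDA offload
llm = Llama(
    model_path="path/to/model.bin",
    n_gpu_layers=35,  # number of layers to offload to the GPU
    verbose=True,     # startup log should mention the cuBLAS/CUDA backend if the build worked
)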

@sjdthree commented Oct 7, 2023

Just tried: downloaded a GGUF model (not GGML) and it works. Note: ggerganov/llama.cpp#3070 may be the issue here.

Grabbed the GGUF model from here: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/tree/main
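
For anyone else landing here, a minimal load sketch (the filename is my assumption based on the repo's Q4_K_M variant; adjust to whatever file you downloaded):

from llama_cpp import Llama

# Filename assumed from TheBloke/Llama-2-13B-chat-GGUF; adjust as needed
llm = Llama(model_path="llama-2-13b-chat.Q4_K_M.gguf")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])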
