
Errors loading ggml models #13

Open
bbecausereasonss opened this issue Aug 17, 2023 · 4 comments

@bbecausereasonss

Hey there, been a while. Glad to see there was a recent update. Tried loading a different quant today and hit an error; not sure how to proceed. Also, is there any way to incorporate AutoGPTQ?

--- Launched app
--- Set theme to: native
--- llama.cpp model load parameters: None
--- Loading llama.cpp model...
llama.cpp: loading model from E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 7168
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q4_K - Medium)
llama_model_load_internal: n_ff = 28672
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.18 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_init_from_file: failed to load model
--- Error loading backend:

---Error: Model load failure...
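
For reference, I think the backend boils down to something like this llama-cpp-python call (a minimal sketch; the path and n_ctx are the ones from the log above):

from llama_cpp import Llama

# Sketch of the failing load (path and n_ctx taken from the log above)
model = Llama(
    model_path="E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin",
    n_ctx=2048,
)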

@shinomakoi (Owner)

Hmm, I'm not sure. llama-cpp-python just got updated; maybe see if it works with the updated version. I'll look into AutoGPTQ.

@bbecausereasonss (Author)

Does magi automatically update and build the latest llama.cpp with CUDA support, or do I need to do this manually somehow?

@shinomakoi (Owner)

Not by default; instructions are here: https://github.com/abetlen/llama-cpp-python/#windows-remarks

$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = 1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Should do it (you'll also need the CUDA Toolkit installed: https://developer.nvidia.com/cuda-11-8-0-download-archive).
I could make a pre-compiled version too
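
To confirm the rebuilt wheel actually offloads to the GPU, something like this should work (a sketch; the model path and layer count are placeholders):

from llama_cpp import Llama

# Hypothetical path; n_gpu_layers > 0 is what enables CUDA offload
llm = Llama(
    model_path="path/to/model.bin",
    n_gpu_layers=35,  # number of layers to offload to the GPU
    verbose=True,     # startup log should mention the cuBLAS/CUDA backend if the build worked
)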

@sjdthree commented Oct 7, 2023

Just tried: downloaded a GGUF model (not GGML) and it works. Note: ggerganov/llama.cpp#3070 may be the issue here.

Grabbed the GGUF model from here: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/tree/main
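
For anyone else landing here, a minimal load sketch (the filename is my assumption based on the repo's Q4_K_M variant; adjust to whatever file you downloaded):

from llama_cpp import Llama

# Filename assumed from TheBloke/Llama-2-13B-chat-GGUF; adjust as needed
llm = Llama(model_path="llama-2-13b-chat.Q4_K_M.gguf")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])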
