Errors loading ggml models #13
Hmm, I'm not sure. llama-cpp-python just got updated; maybe see if it works with the updated version. I'll look into AutoGPTQ.
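(For context, AutoGPTQ's loading API at the time looked roughly like the sketch below. The model id is illustrative, not something magi ships with, and the options shown are just the common ones:)

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Illustrative GPTQ checkpoint; any repo laid out for AutoGPTQ should work.
model_id = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",        # GPTQ inference needs a CUDA device
    use_safetensors=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```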
Does magi automatically update and build the latest llama.cpp with CUDA support, or do I need to do this manually somehow?
Not by default. The instructions here should do it: https://github.com/abetlen/llama-cpp-python/#windows-remarks
(You also need the CUDA Toolkit installed: https://developer.nvidia.com/cuda-11-8-0-download-archive)
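For reference, the linked Windows remarks boil down to roughly the following (PowerShell; the exact CMake flag has changed across llama-cpp-python versions, so treat this as a sketch of the idea rather than the definitive incantation):

```powershell
# Force a from-source CMake build with the cuBLAS backend enabled
# (-DLLAMA_CUBLAS=on was the flag used around this time).
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```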
Just tried: downloaded a GGUF model (not GGML) and it works. Note: ggerganov/llama.cpp#3070 may be the issue here. I grabbed the GGUF model from https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/tree/main
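For anyone hitting the same thing, loading a GGUF file through llama-cpp-python directly looks roughly like this (the file name is one of the quants from the TheBloke repo above; n_gpu_layers only helps if the package was built with cuBLAS as described earlier):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # a quant from the repo above
    n_ctx=2048,        # context window
    n_gpu_layers=40,   # layers to offload to the GPU; requires a cuBLAS build
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```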
Hey there, been a while. Glad to see there was a recent update. I tried loading a different quant today and hit an error; not sure how to proceed. Also, is there any way to incorporate AutoGPTQ?
--- Launched app
--- Set theme to: native
--- llama.cpp model load parameters: None
--- Loading llama.cpp model...
llama.cpp: loading model from E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 7168
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q4_K - Medium)
llama_model_load_internal: n_ff = 28672
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.18 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_init_from_file: failed to load model
--- Error loading backend:
---Error: Model load failure...
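For reference, this shape mismatch is the classic symptom of loading a LLaMA-2 70B GGML file without telling the loader about grouped-query attention: with n_embd = 8192 and n_head = 64 the loader expects an 8192 x 8192 wk tensor, but the 70B weights only have 8 KV heads (8 x 128 = 1024 columns, matching the "got 8192 x 1024" above). llama-cpp-python builds from this period exposed an n_gqa parameter for exactly this case; a minimal sketch (the path is taken from the log above, and the parameter may not exist in newer, GGUF-only versions):

```python
from llama_cpp import Llama

# n_gqa=8 tells the loader that 64 query heads share 8 KV heads; without it,
# GGMLv3 70B files fail with the wk.weight shape error shown in the log.
llm = Llama(
    model_path="E:/Backups/Deep Models/LLama/models/orca_mini_v3_70b.ggmlv3.q4_K_M.bin",
    n_gqa=8,
    n_ctx=2048,
)
```

GGUF files record the KV-head count in their metadata, which is presumably why the GGUF model above loads without any extra parameters.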