This is a playground to explore the ExLlama project in a Windows environment.
Download and install the latest versions:
Hint: When installing Visual Studio 2022 it is sufficient to just install the Build Tools for Visual Studio 2022
package. Also make sure that Desktop development with C++
is enabled in the installer.
Clone the repository to a nice place on your machine via:
git clone --recurse-submodules git@github.com:countzero/windows_exllama.git
This repository may reference an outdated version of the exllama repository. To update the submodule to the latest version, execute the following:
git submodule update --remote --merge
Then add, commit and push the changes to make the update available for others:
git add --all; git commit -am "Update exllama submodule to latest commit"; git push
Hint: This is optional because the build script will pull the latest version.
Create a new Conda environment for this project with a specific version of Python:
conda create --name exllama python=3.10
To make Conda available in your current shell execute the following:
conda init
Hint: You can always revert this via conda init --reverse.
Execute the build script to compile ExLlama:
./rebuild_exllama.ps1
Download a large language model (LLM) with weights in the GPTQ format into the ./models
directory. For example, you can download the vicuna-7b-v1.3 model in a quantized GPTQ format via:
git clone https://huggingface.co/TheBloke/vicuna-7B-v1.3-GPTQ ./models/vicuna-7B-v1.3-GPTQ
Hint: See the 🤗 Open LLM Leaderboard for best in class open source LLMs.
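Before pointing ExLlama at a model directory, it can help to sanity-check that the download contains the files a GPTQ model usually ships with. The following is a minimal sketch assuming the typical Hugging Face layout (a config.json, a tokenizer.model, and .safetensors weight files); the exact file set varies per model, so treat the names here as assumptions:

```python
from pathlib import Path

def check_gptq_model(directory: str) -> dict:
    """Report missing config files and found weight files in a model directory.

    Assumes the typical Hugging Face GPTQ layout; adjust the
    file names if your model ships a different set.
    """
    model_dir = Path(directory)
    required = ["config.json", "tokenizer.model"]
    missing = [name for name in required if not (model_dir / name).is_file()]
    weights = sorted(p.name for p in model_dir.glob("*.safetensors"))
    return {"missing": missing, "weights": weights}
```

If the returned "missing" list is non-empty or no weight files are found, the clone above was probably incomplete (for example, because Git LFS files were not pulled).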
Activate the conda environment to make the dependencies available via:
conda activate exllama
Execute the following to chat with a GPTQ formatted model:
python ./vendor/exllama/example_chatbot.py `
--directory "./models/vicuna-7B-v1.3-GPTQ" `
--prompt "./prompts/chatbot.txt" `
--botname "Vicuña" `
--username "User" `
--length 2048 `
--no_newline
Activate the conda environment to make the dependencies available via:
conda activate exllama
Execute the following to benchmark your system:
python ./vendor/exllama/test_benchmark_inference.py `
--directory "./models/vicuna-7B-v1.3-GPTQ" `
--perf
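The --perf flag reports generation throughput in tokens per second. Conceptually that is just the number of generated tokens divided by wall-clock time, as in this small sketch (the generate callable here is a stand-in, not part of the ExLlama API):

```python
import time

def tokens_per_second(generate, num_tokens: int) -> float:
    # Time an arbitrary generation callable and divide tokens by seconds.
    # `generate` is a placeholder; the real benchmark script times the model itself.
    start = time.perf_counter()
    generate(num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```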
Activate the conda environment to make the dependencies available via:
conda activate exllama
Execute the following to measure the perplexity of the GPTQ formatted model:
python ./vendor/exllama/test_benchmark_inference.py `
--directory "./models/vicuna-7B-v1.3-GPTQ" `
--perplexity `
--perplexity_dataset "./vendor/exllama/datasets/wikitext2_val_sample.jsonl"
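For context on what the reported number means: perplexity is the exponential of the average negative log-likelihood per token over the dataset, so lower is better. A tiny illustrative computation (not ExLlama's implementation):

```python
import math

def perplexity(token_log_probs):
    # exp of the mean negative log-likelihood per token; lower is better.
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token scores roughly 4.0.
print(perplexity([math.log(0.25)] * 8))
```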