
Another one to check out #9

Open
bbecausereasonss opened this issue Jun 22, 2023 · 3 comments
@bbecausereasonss

Extend context to 6–8k on any model. *Have not tested it myself.

https://kaiokendev.github.io/til#extending-context-to-8k
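A minimal sketch of the idea behind the linked post: RoPE position interpolation. Dividing token positions by a scale factor compresses an 8192-token sequence into the 0..2048 position range the base model was trained on. Function and parameter names below are illustrative assumptions, not the actual kaiokendev patch or the webui API.

```python
import numpy as np

def rope_angles(seq_len: int, dim: int, scale: float = 4.0,
                base: float = 10000.0) -> np.ndarray:
    """Rotary-embedding angle matrix with positions compressed by `scale`."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    t = np.arange(seq_len) / scale  # the key change vs. vanilla RoPE: t / scale
    return np.outer(t, inv_freq)    # shape (seq_len, dim // 2)

angles = rope_angles(seq_len=8192, dim=128, scale=4.0)
# With scale=4, token 8191 lands at effective position ~2048, still
# inside the base model's trained range.
```

This is what the 'Compress pos embeddings' setting mentioned below corresponds to: a factor of 4 squeezes 8192 positions into the trained 2048.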

@shinomakoi
Owner

Maybe this works now. I can't test it myself though, since I only have 8GB of VRAM. LoRAs seem to work, at least.

Download the LoRA: https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test - and set the path in the app (File>Parameters>Exllama)
Set 'Context' to 8192
Set 'Compress pos embeddings' to 4

@bbecausereasonss
Author

Seen the stuff going on with these long-context models + exllama?

https://huggingface.co/Panchovix

https://huggingface.co/TheBloke

"Please make sure you're using the latest version of text-generation-webui

Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/Robin-13B-v2-SuperHOT-8K-GPTQ.
Click Download.
The model will start downloading. Once it's finished it will say "Done"
Untick Autoload the model
In the top left, click the refresh icon next to Model.
In the Model dropdown, choose the model you just downloaded: Robin-13B-v2-SuperHOT-8K-GPTQ
To use the increased context, set the Loader to ExLlama, set max_seq_len to 8192 or 4096, and set compress_pos_emb to 4 for 8192 context, or to 2 for 4096 context.
Now click Save Settings followed by Reload
The model will automatically load, and is now ready for use!
Once you're ready, click the Text Generation tab and enter a prompt to get started!"
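The compress_pos_emb values in the quoted steps follow from the base LLaMA trained context of 2048. A hedged arithmetic sketch (the helper name is hypothetical, not part of text-generation-webui):

```python
# compress_pos_emb is the target context divided by the base model's
# trained context (2048 for LLaMA-1-family models such as this 13B).
def compress_pos_emb_for(target_ctx: int, trained_ctx: int = 2048) -> int:
    if target_ctx % trained_ctx != 0:
        raise ValueError("target context should be a multiple of the trained context")
    return target_ctx // trained_ctx

print(compress_pos_emb_for(8192))  # 4, as in the quoted steps
print(compress_pos_emb_for(4096))  # 2, as in the quoted steps
```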

@shinomakoi
Owner

They should work here now with the Exllama backend, if you set 'Context' to 8192 and 'Compress pos embeddings' to 4.
