
Support OpenAI and Ollama #60

Open · shavit wants to merge 21 commits into main

Conversation

@shavit (Contributor) commented Feb 23, 2024

This change extends previous work on remote models and adds an OpenAI-compatible backend (#59).

Tasks and discussions:

Ideally the change will not affect what already works with Llama and will keep the diff to the minimum necessary. Upgrades or refactoring can be added at the end.


@prabirshrestha commented

For Ollama it would be good to support the keep_alive parameter so we can control how long the model stays loaded.
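
Ollama's chat endpoint accepts keep_alive as a duration string or a number of seconds (0 unloads the model right after the response, a negative value keeps it loaded). A minimal sketch of what wiring it in could look like; the request struct and its names are illustrative, not the PR's actual code:

```swift
import Foundation

// Sketch: an Ollama /api/chat request body carrying keep_alive.
// "5m" keeps the model loaded for five minutes after the request.
struct OllamaChatRequest: Encodable {
  struct Message: Encodable {
    let role: String
    let content: String
  }

  let model: String
  let messages: [Message]
  let stream: Bool
  let keepAlive: String

  enum CodingKeys: String, CodingKey {
    case model, messages, stream
    case keepAlive = "keep_alive"
  }
}

let body = OllamaChatRequest(
  model: "llama2",
  messages: [.init(role: "user", content: "Hello")],
  stream: false,
  keepAlive: "5m"
)

var request = URLRequest(url: URL(string: "http://localhost:11434/api/chat")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try? JSONEncoder().encode(body)
```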

@shavit (Contributor, Author) commented Feb 25, 2024

The remote server backends will need an API key field, added as an authorization header, and a selected model name from a separate list.

Since the model ID is used to select model names but also the remote model, the settings need another option to choose a backend type. Then the model ID can be reused for remote backends.
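
A rough sketch of how the settings could model this; BackendType, BackendConfig, and the field names are placeholders, not the PR's actual types:

```swift
import Foundation

// Hypothetical settings model: the backend type is chosen separately,
// so the model ID can be reused per backend.
enum BackendType: String, CaseIterable {
  case local    // embedded llama.cpp server, zero config
  case llama    // remote llama.cpp server
  case ollama
  case openai
}

struct BackendConfig {
  var type: BackendType
  var baseURL: URL
  var apiKey: String?        // sent as an Authorization header when present
  var selectedModel: String?

  func authorizedRequest(path: String) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent(path))
    if let apiKey {
      request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    }
    return request
  }
}
```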

@psugihara (Owner) commented

This is cool, great work! Would it make sense to go more general and migrate OllamaBackend -> OpenAIBackend?

Now that there's template support in the llama.cpp server, we could migrate the default LlamaServer logic to the llama.cpp server's OpenAI API and hopefully share all of the code.

@shavit (Contributor, Author) commented Feb 28, 2024

They are similar but not the same:

The Settings and Agent are where all the backends share the same behavior. A chat-completion interface can take context, user messages, and perhaps options such as temperature that can be shared across all backends.
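
One way to express that shared surface is a small protocol; the names below are only illustrative, not the PR's actual API:

```swift
import Foundation

// Hypothetical shared interface: every backend takes the conversation
// so far plus a few common options and streams back completion text.
struct CompletionOptions {
  var temperature: Double = 0.7
  var systemPrompt: String? = nil
}

struct ChatMessage {
  enum Role: String { case system, user, assistant }
  let role: Role
  let content: String
}

protocol ChatBackend {
  func complete(
    messages: [ChatMessage],
    options: CompletionOptions
  ) async throws -> AsyncThrowingStream<String, Error>
}
```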

@psugihara (Owner) commented

Would their OpenAI /v1/chat/completions endpoint give what we need?

https://github.com/ollama/ollama/blob/main/docs/openai.md#endpoints
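
For reference, that endpoint takes a standard chat-completions request. A minimal sketch against Ollama's default port; the function and model name are illustrative:

```swift
import Foundation

// Sketch: call Ollama's OpenAI-compatible chat completions endpoint.
// The port and path are Ollama defaults; the model name is an example.
func chatViaOpenAICompatibleAPI() async throws -> String {
  var request = URLRequest(url: URL(string: "http://localhost:11434/v1/chat/completions")!)
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")

  let payload: [String: Any] = [
    "model": "llama2",
    "messages": [["role": "user", "content": "Hello"]],
    "stream": false
  ]
  request.httpBody = try JSONSerialization.data(withJSONObject: payload)

  let (data, _) = try await URLSession.shared.data(for: request)
  return String(decoding: data, as: UTF8.self)
}
```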

@shavit (Contributor, Author) commented Feb 28, 2024

Yes, I don't remember why I used the other endpoint.

@shavit (Contributor, Author) commented Mar 10, 2024

There are a few more changes to make, such as backend initialization to ensure it is not nil, and resolving the conflicts. Currently the local version of llama.cpp doesn't work, but it could be outdated.

Other notes:

  • The health check only checks for a 200 status instead of the previous response body, which is misleading if it hits other services on the host (see the sketch after this list).
  • There are two model lists on the settings view.
  • The llama.cpp backend does not have a model list.
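
A stricter health check could inspect the body as well as the status code. A hedged sketch of the idea (llama.cpp's server exposes a /health endpoint that reports a status field, though the exact response shape may vary by version):

```swift
import Foundation

// Sketch: treat the backend as healthy only if /health returns 200
// *and* a recognizable body, so hitting an unrelated service on the
// same host/port doesn't read as success.
func isServerHealthy(baseURL: URL) async -> Bool {
  let url = baseURL.appendingPathComponent("health")
  guard let (data, response) = try? await URLSession.shared.data(from: url),
        let http = response as? HTTPURLResponse,
        http.statusCode == 200,
        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
  else { return false }
  return json["status"] as? String == "ok"
}
```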

Related: #51, #26. Closes #59.

shavit marked this pull request as ready for review March 10, 2024 20:58
@psugihara (Owner) commented Mar 11, 2024

Just played with this, very cool. I like the general approach of allowing you to switch backends (and having the 0-config localhost backend by default).

Try merging main for a recent version of llama.cpp (I updated it Friday).

A few other thoughts...

  1. (Screenshot 2024-03-11 at 9.26.19 AM) Since this is used for multiple backends, switch the copy to "Configure your backend based on the model you're using".

  2. (Screenshot 2024-03-11 at 9.26.12 AM)

  3. For max simplicity, maybe we just get rid of the llama.cpp backend list option, since (I think?) you can use llama.cpp via the OpenAI API. It's slightly confusing to me just because "This Computer" is also llama.cpp.

  4. It would be nice to have a sentence or two of copy pointing you to where to start with each backend and/or what it is. This is not necessary to merge though; I can enhance later if you don't have copy ideas.

@shavit (Contributor, Author) commented Mar 12, 2024

  1. Notice that the context parameter only applies to Ollama and the embedded server.
  2. The copy can move up, below the backend instead of the model.
  3. I agree that it can be confusing if users configure it to run against the embedded server, but it is still an option for running remotely.
  4. I added short descriptions.

Also the prompt is ignored now.

shavit added 19 commits March 16, 2024 07:44
  * Add backend types with default URLs
  * Use llama to only run the local instance
  * Make submit(input:) async
  * Create backends for local and remote servers
  * List models for each backend
  * Save backend type
  * Change backends during chat conversations
Each backend has its own config, model, token, and a default host value.

More changes:
  * Importing a single file will open the app and set the model and backend.
  * Each backend has its own model list.
  * Choosing a model will not override other backends.
  * Update model list when file is added or deleted
  * Update completion params
  * Match the picker selection to a model file
  * Update the backend response
  * Select and use imported model file
  * Remove context from backends
  * Provide a fallback baseURL to new backends
  * Pass config to create backends and add another fallback to ensure initialization
  * Use localized strings with markdown.
  * Add system prompt to completion.
  * Add a default port 443 for OpenAI to ensure port value in settings.
  * Determine default value for `selectedModelId`.
Fetch the backend config and create the backend on each reboot.
@psugihara (Owner) commented

Just had some time to test and found a fatal bug when I send a message after switching to the default backend. Not quite sure what's going on.

(Screenshots: 2024-03-24 at 10.05.40 PM and 10.05.20 PM)

  * Create a backend during agent initialization.
  * Start the local llama server in conversation view
@shavit (Contributor, Author) commented Mar 25, 2024

Yes, the backend was an implicitly unwrapped optional in order to surface those errors rather than silence them and not respond at all. Now the backend is initialized together with the agent and uses the default local server.

Maybe the initialization parameters of the agent and error handling can be improved.
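
For example, the agent could require a backend at initialization and fall back to the local server; the types below are placeholders for illustration, not FreeChat's actual Agent:

```swift
// Hypothetical sketch: the agent always owns a backend and falls back
// to the local server, so there is no implicitly unwrapped optional
// left to crash after switching backends.
protocol Backend {
  func respond(to prompt: String) async throws -> String
}

struct LocalLlamaBackend: Backend {
  func respond(to prompt: String) async throws -> String {
    // Placeholder: the real implementation would talk to the embedded
    // llama.cpp server.
    return ""
  }
}

final class Agent {
  private var backend: Backend

  init(backend: Backend? = nil) {
    // Fall back to the default local server instead of leaving it nil.
    self.backend = backend ?? LocalLlamaBackend()
  }

  func switchBackend(to newBackend: Backend) {
    backend = newBackend
  }
}
```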
