
Support OpenAI and Ollama #60

Open · shavit wants to merge 21 commits into main

Conversation

@shavit (Contributor) commented Feb 23, 2024

This change extends previous work on remote models and adds an OpenAI-compatible backend (#59).

Tasks and discussions:

Ideally the change will not affect what already works with Llama and will keep the diff to the minimum necessary. Upgrades or refactoring can be added at the end.


@prabirshrestha commented

For Ollama it would be good to support the keep_alive parameter so we can control how long the model stays loaded.
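
Ollama's chat endpoint accepts keep_alive as a duration string or a number of seconds (0 unloads the model right after the response, a negative value keeps it loaded). A minimal sketch of what wiring it in could look like; the request struct and its names are illustrative, not the PR's actual code:

```swift
import Foundation

// Sketch: an Ollama /api/chat request body carrying keep_alive.
// "5m" keeps the model loaded for five minutes after the request.
struct OllamaChatRequest: Encodable {
  struct Message: Encodable {
    let role: String
    let content: String
  }

  let model: String
  let messages: [Message]
  let stream: Bool
  let keepAlive: String

  enum CodingKeys: String, CodingKey {
    case model, messages, stream
    case keepAlive = "keep_alive"
  }
}

let body = OllamaChatRequest(
  model: "llama2",
  messages: [.init(role: "user", content: "Hello")],
  stream: false,
  keepAlive: "5m"
)

var request = URLRequest(url: URL(string: "http://localhost:11434/api/chat")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try? JSONEncoder().encode(body)
```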

@shavit (Contributor, Author) commented Feb 25, 2024

The remote server backends will need an API key field, added as an authorization header, and a selected model name from a separate list.

Since the model ID is used to select model names but also the remote model, the settings need another option to choose a backend type. Then the model ID can be reused for remote backends.
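
A rough sketch of how the settings could model this; BackendType, BackendConfig, and the field names are placeholders, not the PR's actual types:

```swift
import Foundation

// Hypothetical settings model: the backend type is chosen separately,
// so the model ID can be reused per backend.
enum BackendType: String, CaseIterable {
  case local    // embedded llama.cpp server, zero config
  case llama    // remote llama.cpp server
  case ollama
  case openai
}

struct BackendConfig {
  var type: BackendType
  var baseURL: URL
  var apiKey: String?        // sent as an Authorization header when present
  var selectedModel: String?

  func authorizedRequest(path: String) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent(path))
    if let apiKey {
      request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    }
    return request
  }
}
```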

@psugihara (Owner) commented

This is cool, great work! Would it make sense to go more general and migrate OllamaBackend -> OpenAIBackend?

Now that there's template support in the llama.cpp server, we could migrate the default LlamaServer logic to the llama.cpp server's OpenAI API and hopefully share all of the code.

@shavit (Contributor, Author) commented Feb 28, 2024

They are similar but not the same:

The Settings and Agent are where all the backends share the same behavior. A chat-completion interface can take context, user messages, and perhaps options such as temperature that can be shared across all backends.
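
One way to express that shared surface is a small protocol; the names below are only illustrative, not the PR's actual API:

```swift
import Foundation

// Hypothetical shared interface: every backend takes the conversation
// so far plus a few common options and streams back completion text.
struct CompletionOptions {
  var temperature: Double = 0.7
  var systemPrompt: String? = nil
}

struct ChatMessage {
  enum Role: String { case system, user, assistant }
  let role: Role
  let content: String
}

protocol ChatBackend {
  func complete(
    messages: [ChatMessage],
    options: CompletionOptions
  ) async throws -> AsyncThrowingStream<String, Error>
}
```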

@psugihara (Owner) commented

Would their OpenAI /v1/chat/completions endpoint give what we need?

https://github.com/ollama/ollama/blob/main/docs/openai.md#endpoints
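
For reference, that endpoint takes a standard chat-completions request. A minimal sketch against Ollama's default port; the function and model name are illustrative:

```swift
import Foundation

// Sketch: call Ollama's OpenAI-compatible chat completions endpoint.
// The port and path are Ollama defaults; the model name is an example.
func chatViaOpenAICompatibleAPI() async throws -> String {
  var request = URLRequest(url: URL(string: "http://localhost:11434/v1/chat/completions")!)
  request.httpMethod = "POST"
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")

  let payload: [String: Any] = [
    "model": "llama2",
    "messages": [["role": "user", "content": "Hello"]],
    "stream": false
  ]
  request.httpBody = try JSONSerialization.data(withJSONObject: payload)

  let (data, _) = try await URLSession.shared.data(for: request)
  return String(decoding: data, as: UTF8.self)
}
```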

@shavit (Contributor, Author) commented Feb 28, 2024

Yes, I don't remember why I used the other endpoint.

@shavit (Contributor, Author) commented Mar 10, 2024

There are a few more changes to make, such as backend initialization to ensure it is not nil, and resolving the conflicts. Currently the local version of llama.cpp doesn't work, but it could be outdated.

Other notes:

  • The health check only checks for a 200 status instead of the previous response body, which is misleading if it hits other services on the host (see the sketch after this list).
  • There are two model lists on the settings view.
  • The llama.cpp backend does not have a model list.
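
A stricter health check could inspect the body as well as the status code. A hedged sketch of the idea (llama.cpp's server exposes a /health endpoint that reports a status field, though the exact response shape may vary by version):

```swift
import Foundation

// Sketch: treat the backend as healthy only if /health returns 200
// *and* a recognizable body, so hitting an unrelated service on the
// same host/port doesn't read as success.
func isServerHealthy(baseURL: URL) async -> Bool {
  let url = baseURL.appendingPathComponent("health")
  guard let (data, response) = try? await URLSession.shared.data(from: url),
        let http = response as? HTTPURLResponse,
        http.statusCode == 200,
        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
  else { return false }
  return json["status"] as? String == "ok"
}
```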

Related: #51, #26. Closes #59.

shavit marked this pull request as ready for review March 10, 2024 20:58
@psugihara (Owner) commented Mar 11, 2024

Just played with this, very cool. I like the general approach of allowing you to switch backends (and having the 0-config localhost backend by default).

Try merging main for a recent version of llama.cpp (I updated it Friday).

A few other thoughts...

  1. (Screenshot 2024-03-11 at 9.26.19 AM) Since this is used for multiple backends, switch the copy to "Configure your backend based on the model you're using".

  2. (Screenshot 2024-03-11 at 9.26.12 AM)

  3. For max simplicity, maybe we just get rid of the llama.cpp backend list option, since (I think?) you can use llama.cpp via the OpenAI API. It's slightly confusing to me just because "This Computer" is also llama.cpp.

  4. It would be nice to have a sentence or two of copy pointing you to where to start with each backend and/or what it is. This is not necessary to merge though; I can enhance later if you don't have copy ideas.

@shavit (Contributor, Author) commented Mar 12, 2024

  1. Notice that the context parameter only applies to Ollama and the embedded server.
  2. The copy can move up, below the backend instead of the model.
  3. I agree that it can be confusing if users configure it to run against the embedded server, but it is still an option for running remotely.
  4. I added short descriptions.

Also the prompt is ignored now.

shavit added 19 commits March 16, 2024 07:44
  * Add backend types with default URLs
  * Use llama to only run the local instance
  * Make submit(input:) async
  * Create backends for local and remote servers
  * List models for each backend
  * Save backend type
  * Change backends during chat conversations
Each backend has its own config, model, token, and a default host value.

More changes:
  * Importing a single file will open the app and set the model and backend.
  * Each backend has its own model list.
  * Choosing a model will not override other backends.
  * Update model list when file is added or deleted
  * Update completion params
  * Match the picker selection to a model file
  * Update the backend response
  * Select and use imported model file
  * Remove context from backends
  * Provide a fallback baseURL to new backends
  * Pass config to create backends and add another fallback to ensure initialization
  * Use localized strings with markdown.
  * Add system prompt to completion.
  * Add a default port 443 for OpenAI to ensure port value in settings.
  * Determine default value for `selectedModelId`.
Fetch the backend config and create the backend on each reboot.
@psugihara (Owner) commented

Just had some time to test and found a fatal bug when I send a message after switching to the default backend. Not quite sure what's going on.

(Screenshots: 2024-03-24 at 10.05.40 PM and 10.05.20 PM)

  * Create a backend during agent initialization.
  * Start the local llama server in conversation view
@shavit (Contributor, Author) commented Mar 25, 2024

Yes, the backend was an implicitly unwrapped optional in order to surface those errors rather than silence them and not respond at all. Now the backend is initialized together with the agent and uses the default local server.

Maybe the initialization parameters of the agent and error handling can be improved.
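
For example, the agent could require a backend at initialization and fall back to the local server; the types below are placeholders for illustration, not FreeChat's actual Agent:

```swift
// Hypothetical sketch: the agent always owns a backend and falls back
// to the local server, so there is no implicitly unwrapped optional
// left to crash after switching backends.
protocol Backend {
  func respond(to prompt: String) async throws -> String
}

struct LocalLlamaBackend: Backend {
  func respond(to prompt: String) async throws -> String {
    // Placeholder: the real implementation would talk to the embedded
    // llama.cpp server.
    return ""
  }
}

final class Agent {
  private var backend: Backend

  init(backend: Backend? = nil) {
    // Fall back to the default local server instead of leaving it nil.
    self.backend = backend ?? LocalLlamaBackend()
  }

  func switchBackend(to newBackend: Backend) {
    backend = newBackend
  }
}
```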
