This repository contains basic scripts for running the SEA-LION base LLM locally, as well as for sending requests to SEA-LION Instruct models running on Ollama/TGI servers. It has been tested with the SEA-LION-3B base LLM, and with quantized GGUF files for the 7B/8B instruct models on the server side. The scripts support running prompts in the terminal, as well as text generation/input, question and answer, and translation via a Flask app.
Do check out the range of SEA-LION models available at https://huggingface.co/aisingapore/
Also available on Ollama at https://ollama.com/aisingapore
The scripts provided have been tested in the following environments:

Environment 1:
- Processor: Apple M3 Max
- Memory: 64GB
- OS: macOS Sonoma version 14.5
- Chip Architecture: ARM64

Environment 2:
- Processor: 2.3GHz Quad-Core Intel Core i7
- Memory: 32GB
- OS: macOS Sonoma version 14.5
- Chip Architecture: x86-64

Environment 3:
- Memory: 16GB
- OS: Debian GNU/Linux 11 (Bullseye)
- Chip Architecture: x86-64
- Download Miniconda from the official website.
- Follow the installation instructions for your operating system.
Open your terminal and execute the following commands.

Option 1: Create the environment from the provided environment file:

# Create a new conda environment from sealion_env.yml
conda env create -f sealion_env.yml
# Activate the environment
conda activate sealion_env

Option 2: Create the environment manually:

# Create a new conda environment
conda create -n sealion_env python=3.12
# Activate the environment
conda activate sealion_env
Ensure you have requirements.txt in your project directory. Run:
# Install required packages
pip install -r requirements.txt
To test the model with a simple prompt, run the following Python script in the terminal:

python -m src.sealion_3b_prompt

It will prompt the model with the string The sea lion is a and should return a continuation of this sentence.
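For reference, the sketch below shows a minimal version of such a prompt script using Hugging Face Transformers. It is illustrative rather than the repo's exact code; the model id and the trust_remote_code flag are assumptions based on the SEA-LION 3B model card.

```python
# Minimal sketch (not the repo's exact script) of prompting SEA-LION 3B locally
# via Hugging Face Transformers. Model id and trust_remote_code are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aisingapore/sea-lion-3b"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "The sea lion is a"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```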
Referring to .env.example as an example, create a .env file in the same repository folder. The Model Selection section below explains how to configure the variables used.
Before running the Flask app, the model should first be loaded into Ollama. After installing Ollama, run the following command:
ollama pull aisingapore/llama3-8b-cpt-sea-lionv2-instruct
For a specific quantization, add the available tag. For example:
ollama pull aisingapore/llama3-8b-cpt-sea-lionv2-instruct:q4_k_m
You should then see it when you run ollama list
ollama list
> NAME ID SIZE MODIFIED
> aisingapore/llama3-8b-cpt-sea-lionv2-instruct:latest 648d5f2d7bbe 4.9 GB 23 hours ago
> aisingapore/llama3-8b-cpt-sea-lionv2-instruct:q4_k_m 648d5f2d7bbe 4.9 GB 23 hours ago
Make sure the full model name is set under OLL_API_MODEL in your .env file:
OLL_API_MODEL=aisingapore/llama3-8b-cpt-sea-lionv2-instruct
Note: Ollama should be running before starting up the Flask app; otherwise the server will not be up. Running any Ollama command, e.g. ollama list, should suffice to start it.
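As a quick sanity check (not part of the repo's scripts), you can confirm the Ollama server is reachable and the model has been pulled by querying Ollama's REST API on its default local address:

```python
# Confirms the local Ollama server is up and lists the models it has pulled.
# Assumes Ollama's default address (http://localhost:11434) and its /api/tags endpoint.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json().get("models", [])])
# Expect to see e.g. 'aisingapore/llama3-8b-cpt-sea-lionv2-instruct:latest'
```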
To start up the Flask app:
flask -A src.sealion_app run
To make the app accessible from other machines by exposing ports:
flask -A src.sealion_app run --host '0.0.0.0'
Note: Exposing your Flask app to '0.0.0.0' makes it accessible from any device on the network, which can introduce security risks. Ensure you have proper security measures in place.
Alternatively, run the app via the Python script (this will run in debug mode and expose ports):

python -m src.sealion_app
Once the terminal returns the following, you can access the app via port 5000:
* Serving Flask app 'src.sealion_app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [DD/MMM/YYYY HH:MM:SS] "GET / HTTP/1.1" 200 -
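Once the app is up, a simple reachability check from another terminal could look like the sketch below, assuming the default host and port and the index route shown in the log line above:

```python
# Basic reachability check for the Flask app; assumes the default 127.0.0.1:5000
# address and that the index page is served at "/".
import requests

r = requests.get("http://127.0.0.1:5000/", timeout=5)
print(r.status_code)  # expect 200
```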
If the port is occupied, it may return an error like the following:
Address already in use
Port 5000 is in use by another program. Either identify and stop that program, or start the server with a different port.
On macOS, try disabling the 'AirPlay Receiver' service from System Preferences -> General -> AirDrop & Handoff.
Try assigning the Flask app to a different port instead:
flask -A src.sealion_app run --host '0.0.0.0' --port 5050
If running via the Python script, update the port (located at the end of the script):
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5050, debug=False)
There are 3 types of models made available for selection:

- Locally run model: Run directly from the Python script via Hugging Face Transformers, currently configured to run the sea-lion-3b model under LOCAL_MODEL in .env.example. Able to run on CPU.
- Model on Ollama server: This selection is configured to work with a model running on an Ollama server. OLL_API_URL in .env.example is currently set to the default local address. OLL_API_MODEL should be set to the model name as set in the Ollama Modelfile.
- Multiple models on online server (TGI): This selection is configured to work with multiple models running on a TGI server with API Key authentication. TGI_API_URL should be set to the endpoint URL. TGI_API_KEY should be set to the required API Key; remove it if no authentication is used and it will default to None. Update TGI_SEALION and TGI_LLAMA to the respective model names used on the server. If only one model is used, remove TGI_LLAMA and it will default to None.
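For reference, a .env assembled from the variables described above might look like the sketch below. All values are illustrative placeholders (apart from the Ollama model name used earlier), and the exact URL formats expected should be taken from .env.example.

```
# Locally run model (Hugging Face Transformers)
LOCAL_MODEL=aisingapore/sea-lion-3b

# Model on Ollama server (default local address)
OLL_API_URL=http://localhost:11434
OLL_API_MODEL=aisingapore/llama3-8b-cpt-sea-lionv2-instruct

# Multiple models on a TGI server with API Key authentication
TGI_API_URL=https://your-tgi-endpoint.example.com
TGI_API_KEY=your-api-key
TGI_SEALION=your-sealion-model-name
TGI_LLAMA=your-llama-model-name
```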
Currently there are three functions available, which prompt the LLM differently:

- Text Generation/Input: No additional template; the LLM proceeds with text generation, continuing from where the input prompt ends. Base models should return the initial prompt continued with the generated text, while chat/instruct-tuned models should return a response to the prompt.
- Question and Answer: The prompt is inserted into a Question: {prompt} Answer: template.
- Translation: An additional Language option is provided (Default: English). The prompt is inserted into a '{prompt}' In {language}, this translates to: template.
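To make the templating concrete, the sketch below (illustrative, not the repo's exact code) shows how the three prompt styles described above could be assembled before being sent to the model; the function and mode names are hypothetical:

```python
# Illustrative assembly of the three prompt styles described above.
def build_prompt(mode: str, prompt: str, language: str = "English") -> str:
    if mode == "text_generation":
        # No additional template: the model simply continues from the input text.
        return prompt
    if mode == "question_answer":
        return f"Question: {prompt} Answer:"
    if mode == "translation":
        return f"'{prompt}' In {language}, this translates to:"
    raise ValueError(f"Unknown mode: {mode}")

print(build_prompt("question_answer", "What is a sea lion?"))
# Question: What is a sea lion? Answer:
```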
There are two parameters provided for the app user to toggle:

- Temperature (Default: 0.7, Range: 0.0-1.0): The value used to modulate the next-token probabilities.
- Max New Tokens (Default: 40 for local, 128 for server): The maximum number of tokens to generate, also known as num_predict in some frameworks.
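As an illustration of how these two parameters map onto a server request (not necessarily the payload the Flask app itself sends), an Ollama generate call passes temperature directly and max new tokens as num_predict:

```python
# Example Ollama /api/generate request showing where the two user-facing
# parameters end up; the Flask app's actual request payload may differ.
import requests

temperature = 0.7     # app default
max_new_tokens = 128  # app default for server-side models

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "aisingapore/llama3-8b-cpt-sea-lionv2-instruct",
        "prompt": "The sea lion is a",
        "options": {"temperature": temperature, "num_predict": max_new_tokens},
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["response"])
```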
To run the script for prompting via LangChain:

python -m src.sealion_3b_langchain
This script is currently a work in progress, with different parts commented out to test different ways of prompting the SEA-LION base model. Feel free to experiment with the script!