Skip to content

Latest commit

 

History

History
146 lines (118 loc) · 3.9 KB

README.md

File metadata and controls

146 lines (118 loc) · 3.9 KB

vector-embedding-api

vector-embedding-apiprovides a Flask API server and client to generate text embeddings using either OpenAI's embedding model or the SentenceTransformers library. The API server now supports in-memory LRU caching for faster retrievals, batch processing for handling multiple texts at once, and a health status endpoint for monitoring the server status.

SentenceTransformers supports over 500 models via HuggingFace Hub.

Features 🎯

  • POST endpoint to create text embeddings
    • sentence_transformers
    • OpenAI text-embedding-ada-002
  • In-memory LRU cache for quick retrieval of embeddings
  • Batch processing to handle multiple texts in a single request
  • Easy setup with configuration file
  • Health status endpoint
  • Python client utility for submitting text or files

Installation 💻

To run this server locally, follow the steps below:

Clone the repository: 📦

git clone https://github.com/deadbits/vector-embedding-api.git
cd vector-embedding-api

Set up a virtual environment (optional but recommended): 🐍

virtualenv -p /usr/bin/python3.10 venv
source venv/bin/activate

Install the required dependencies: 🛠️

pip install -r requirements.txt

Usage

Modify the server.conf configuration file: ⚙️

[main]
openai_api_key = YOUR_OPENAI_API_KEY
sent_transformers_model = sentence-transformers/all-MiniLM-L6-v2
use_cache = true/false

Start the server: 🚀

python server.py

The server should now be running on http://127.0.0.1:5000/.

API Endpoints 🌐

Client Usage

A small Python client is provided to assist with submitting text strings or files.

Usage python3 client.py -t "Your text here" -m local

python3 client.py -f /path/to/yourfile.txt -m openai

POST /submit

Submits an individual text string or a list of text strings for embedding generation.

Request Parameters

  • text: The text string or list of text strings to generate the embedding for. (Required)
  • model: Type of model to be used, either local for SentenceTransformer models or openai for OpenAI's model. Default is local.

Response

  • embedding: The generated embedding array.
  • status: Status of the request, either success or error.
  • elapsed: The elapsed time taken for generating the embedding (in milliseconds).
  • model: The model used to generate the embedding.
  • cache: Boolean indicating if the result was retrieved from cache. (Optional)
  • message: Error message if the status is error. (Optional)

GET /health

Checks the server's health status.

Response

  • cache.enabled: Boolean indicating status of the cache
  • cache.max_size: Maximum cache size
  • cache.size: Current cache size
  • models.openai: Boolean indicating if OpenAI embeddings are enabled. (Optional)
  • models.sentence-transformers: Name of sentence-transformers model in use.
{
  "cache": {
    "enabled": true,
    "max_size": 500,
    "size": 0
  },
  "models": {
    "openai": true,
    "sentence-transformers": "sentence-transformers/all-MiniLM-L6-v2"
  }
}

Example Usage

Send a POST request to the /submit endpoint with JSON payload:

{
    "text": "Your text here",
    "model": "local"
}

// multi text submission
{
    "text": ["Text1 goes here", "Text2 goes here"], 
    "model": "openai"
}

You'll receive a response containing the embedding and additional information:

[
  {
    "embedding": [...],
    "status": "success",
    "elapsed": 123,
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }
]

[
  {
    "embedding": [...],
    "status": "success",
    "elapsed": 123,
    "model": "openai"
  }, 
  {
    "embedding": [...],
    "status": "success",
    "elapsed": 123,
    "model": "openai"
  }, 
]