Add AWS Bedrock guide #72

Merged 3 commits on Feb 1, 2024
124 changes: 124 additions & 0 deletions docs/integrations_bedrock.md
@@ -0,0 +1,124 @@
# Integration: AWS Bedrock

This guide will walk you through an example using the Amazon Bedrock SDK with `vecs`. We will create embeddings using the Amazon Titan Embeddings G1 - Text v1.2 model (`amazon.titan-embed-text-v1`), insert those embeddings into a PostgreSQL database using `vecs`, and then query the collection to find the sentences most similar to a given query sentence.

## Create an Environment

First, you need to set up your environment. You will need Python 3.7+ with the `vecs` and `boto3` libraries installed.

You can install the necessary Python libraries using pip:

```sh
pip install vecs boto3
```

You'll also need:

- [Credentials to your AWS account](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)
- [A Postgres Database with the pgvector extension](hosting.md)

## Create Embeddings

Next, we will use Amazon's Titan Embeddings G1 - Text v1.2 model (`amazon.titan-embed-text-v1`) to create embeddings for a set of sentences.

```python
import boto3
import vecs
import json

client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    # Credentials from your AWS account
    aws_access_key_id='<replace_your_own_credentials>',
    aws_secret_access_key='<replace_your_own_credentials>',
    aws_session_token='<replace_your_own_credentials>',
)

dataset = [
    "The cat sat on the mat.",
    "The quick brown fox jumps over the lazy dog.",
    "Friends, Romans, countrymen, lend me your ears",
    "To be or not to be, that is the question.",
]

embeddings = []

for sentence in dataset:
    # invoke the embeddings model for each sentence
    response = client.invoke_model(
        body=json.dumps({"inputText": sentence}),
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json",
    )
    # collect the embedding from the response
    response_body = json.loads(response["body"].read())
    # add the (id, vector, metadata) record to the embedding list
    embeddings.append((sentence, response_body.get("embedding"), {}))
```
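The streaming-body parsing inside the loop can be sketched in isolation. This is a hedged, self-contained illustration: `io.BytesIO` stands in for the real Bedrock response body, and the three-element vector is a made-up placeholder (the Titan model actually returns a 1536-dimensional embedding):

```python
import io
import json

# Simulated Bedrock response: invoke_model returns JSON wrapped in a
# streaming body, which must be .read() before json.loads can parse it.
fake_response = {
    "body": io.BytesIO(json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode())
}

# Same parsing steps as in the loop above
response_body = json.loads(fake_response["body"].read())
embedding = response_body.get("embedding")
```

The same two lines work unchanged against a real `invoke_model` response.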

### Store the Embeddings with vecs

Now that we have our embeddings, we can insert them into a PostgreSQL database using vecs.

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create a collection named 'sentences' with 1536 dimensional vectors
# to match the default dimension of the Titan Embeddings G1 - Text model
sentences = vx.get_or_create_collection(name="sentences", dimension=1536)

# upsert the embeddings into the 'sentences' collection
sentences.upsert(records=embeddings)

# create an index for the 'sentences' collection
sentences.create_index()
```
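A note on the record format: each element passed to `upsert` above is an `(id, vector, metadata)` triple, and in this guide the sentence text doubles as the record id. A minimal sketch with made-up three-dimensional vectors (real Titan embeddings have 1536 dimensions, matching the collection created above):

```python
# Each record is (id, vector, metadata); the sentence text serves as the id.
# Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.
records = [
    ("The cat sat on the mat.", [0.1, 0.0, 0.2], {}),
    ("The quick brown fox jumps over the lazy dog.", [0.0, 0.3, 0.1], {}),
]

ids = [record_id for record_id, _, _ in records]
```

The empty dict is a metadata slot; you could store, for example, a source document name there and filter on it at query time.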

### Querying for Most Similar Sentences

Now, we query the `sentences` collection to find the sentences most similar to a sample query sentence. First, we create an embedding for the query sentence. Then we query the collection we created earlier for the most similar matches.

```python
query_sentence = "A quick animal jumps over a lazy one."

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create an embedding for the query sentence
response = client.invoke_model(
    body=json.dumps({"inputText": query_sentence}),
    modelId="amazon.titan-embed-text-v1",
    accept="application/json",
    contentType="application/json",
)

response_body = json.loads(response["body"].read())

query_embedding = response_body.get("embedding")

# query the 'sentences' collection for the most similar sentences
results = sentences.query(
    data=query_embedding,
    limit=3,
    include_value=True,
)

# print the results
for result in results:
print(result)
```

This returns the three most similar records along with their distance to the query vector.
```
('The quick brown fox jumps over the lazy dog.', 0.27600620558852)
('The cat sat on the mat.', 0.609986272479202)
('To be or not to be, that is the question.', 0.744849503688346)
```
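The second element of each result is a distance score, where smaller means more similar. Assuming the collection uses vecs' default cosine distance (an assumption; the measure depends on how the collection is indexed), the score is 1 minus the cosine similarity of the two vectors. A pure-Python sketch:

```python
import math

def cosine_distance(a, b):
    # cosine distance = 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way -> distance 0; orthogonal -> distance 1
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

This explains why the fox sentence, whose embedding points in nearly the same direction as the query's, scores lowest above.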
2 changes: 1 addition & 1 deletion docs/integrations_openai.md
@@ -4,7 +4,7 @@ This guide will walk you through an example integration of the OpenAI API with t

## Create an Environment

- First, you need to set up your environment. You will need Python 3.7 with the `vecs` and `openai` libraries installed.
+ First, you need to set up your environment. You will need Python 3.7+ with the `vecs` and `openai` libraries installed.

You can install the necessary Python libraries using pip:

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -24,6 +24,7 @@ nav:
- Metadata: concepts_metadata.md
- Integrations:
- OpenAI: integrations_openai.md
+ - Bedrock: integrations_bedrock.md
- HuggingFace Inference Endpoints: integrations_huggingface_inference_endpoints.md
- Support:
- Changelog: support_changelog.md