[Question]: knowledge base : pgvector db structure need help #17446

Open
1 task done
naveenr-btc opened this issue Jan 7, 2025 · 3 comments
Labels
question Further information is requested

Comments

@naveenr-btc

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I already have embeddings of my text. They are stored in a table with the following structure:

my_embeddings (
id BIGSERIAL PRIMARY KEY,
embedding vector(1536)
);

When using the knowledge base, I get an error saying that certain columns are missing. I need to understand which columns are missing, and it would be helpful to get an example of what each of those columns should hold.
Note: in my use case, the LLM needs to draw on the knowledge base for better results, not query the actual content from it.

@naveenr-btc naveenr-btc added the question Further information is requested label Jan 7, 2025

dosubot bot commented Jan 7, 2025

Hey @naveenr-btc! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this together!

To use pgvector with LlamaIndex, your knowledge base table needs to have specific columns. The required columns are:

  • id UUID DEFAULT (gen_random_uuid())
  • embedding VECTOR({num_dimensions})
  • content TEXT
  • metadata JSONB

If tenant_aware is set to True, you also need:

  • tenant_id UUID

Your current table has only the id and embedding columns; it is missing the content and metadata columns. Note that the column list above is taken from the _create_tables method in the NileVectorStore class, which you can refer to for more details on the table structure [1].
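As a rough illustration of what each of the columns listed above might hold, here is a minimal sketch that builds one row as a plain Python dict. The helper name make_row, the sample text, and the metadata keys (file_name, page_label) are assumptions made up for this example and are not part of LlamaIndex; the dimension 1536 is taken from the question.

```python
import json
import uuid

EMBED_DIM = 1536  # matches vector(1536) in the question


def make_row(content, embedding, metadata):
    """Build one row matching the id / embedding / content / metadata layout.

    Hypothetical helper for illustration only; it is not a LlamaIndex API.
    """
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(embedding)}")
    return {
        "id": str(uuid.uuid4()),           # UUID primary key
        "embedding": embedding,             # the vector you already have
        "content": content,                 # the original text chunk that was embedded
        "metadata": json.dumps(metadata),   # free-form JSON describing the chunk
    }


# Example values are invented for this sketch.
row = make_row(
    content="Refunds are processed within 14 days of the request.",
    embedding=[0.0] * EMBED_DIM,
    metadata={"file_name": "refund_policy.pdf", "page_label": "3"},
)
```

The point of the sketch is that content holds the source text each embedding was computed from, and metadata holds whatever descriptive fields you want to keep alongside it. An embedding-only table has neither, so the text would need to be stored next to each vector.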

Additionally, here's an example of setting up a knowledge base table using SQLAlchemy and pgvector:

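from pgvector.sqlalchemy import Vector
from sqlalchemy import Integer, String, create_engine, insert, text
from sqlalchemy.orm import declarative_base, mapped_column

# Placeholder connection string; replace it with your own database URL.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")

# The pgvector extension must exist before a Vector column can be created.
with engine.begin() as connection:
    connection.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))

Base = declarative_base()

class SECTextChunk(Base):
    __tablename__ = "sec_text_chunk"

    id = mapped_column(Integer, primary_key=True)
    page_label = mapped_column(Integer)
    file_name = mapped_column(String)
    text = mapped_column(String)
    embedding = mapped_column(Vector(384))

# Warning: drop_all deletes any existing sec_text_chunk table and its data.
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

# Insert into Database
# `nodes` is assumed to be a list of LlamaIndex nodes that already carry
# embeddings and whose metadata keys match the columns defined above.
with engine.begin() as connection:  # one transaction, committed on exit
    for node in nodes:
        row_dict = {
            "text": node.get_content(),
            "embedding": node.embedding,
            **node.metadata,
        }
        connection.execute(insert(SECTextChunk).values(**row_dict))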

This example demonstrates how to define a table schema and insert data with embeddings into a PostgreSQL database with pgvector [2].
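To make the row-building step in that example more concrete, the sketch below shows how a node's text, embedding, and metadata are flattened into one row, and why the metadata keys have to match column names. StubNode and node_to_row are hypothetical stand-ins written for this illustration; they are not LlamaIndex classes, and the sample values are invented.

```python
from dataclasses import dataclass, field

# Column names of the sec_text_chunk table from the example.
COLUMNS = {"id", "page_label", "file_name", "text", "embedding"}


@dataclass
class StubNode:
    """Hypothetical stand-in for a LlamaIndex node (illustration only)."""
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)

    def get_content(self):
        return self.text


def node_to_row(node):
    """Flatten a node into a dict keyed by column name."""
    row = {
        "text": node.get_content(),
        "embedding": node.embedding,
        **node.metadata,  # each metadata key becomes its own column
    }
    unknown = set(row) - COLUMNS
    if unknown:
        # An insert would fail on keys that have no matching column.
        raise KeyError(f"no matching column for: {sorted(unknown)}")
    return row


node = StubNode(
    text="Revenue increased year over year.",
    embedding=[0.1] * 384,
    metadata={"page_label": 12, "file_name": "10k.pdf"},
)
row = node_to_row(node)
```

In this layout the metadata is spread across dedicated columns (page_label, file_name) rather than stored in a single JSON column, so any extra metadata key needs either a new column or to be dropped before the insert.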

@logan-markewich
Collaborator

The db structure is here

I haven't taken the time to add options for people to "bring their own" table in the Postgres vector store class.

@naveenr-btc
Author

OK, thanks for pointing it out. Can you tell me what these missing columns should hold?
