This application fixes the problems of the forked version (reported in SonicWarrior1#7 (comment)).
**Significantly reworked** the forked version to properly support **Retrieval-Augmented Generation (RAG)**.
Successfully deployed on an 8 GB Mac Mini. Access the deployed version here: https://mac-mini.boga-vector.ts.net/mistral
Read more about how to deploy in this Medium post.
- Retrieval-Augmented Generation (RAG) architecture: Retrieves the most relevant document passages to ground each answer
- Vector Store: Uses a FAISS database to implement the retrieval step of the RAG architecture
- Multiple Users: Each user session has its own workspace
- Multiple Files: Several PDFs can be uploaded and queried together
The system supports ingestion of multiple documents, and each browser reload starts a private session, so users interact only with their own uploaded documents.
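A minimal sketch of how this per-session isolation can be done in Streamlit (illustrative only: the `files` folder name matches the test notes below, while the variable names and layout are assumptions, not the app's actual code):

```python
import os
import uuid

import streamlit as st

# Each browser reload creates a fresh Streamlit session, so keying the
# workspace on a session-scoped id keeps every user's uploads private.
if "session_id" not in st.session_state:
    st.session_state.session_id = uuid.uuid4().hex

workspace = os.path.join("files", st.session_state.session_id)
os.makedirs(workspace, exist_ok=True)

uploads = st.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)
for uploaded in uploads or []:
    # Persist each uploaded PDF into this session's private folder.
    with open(os.path.join(workspace, uploaded.name), "wb") as f:
        f.write(uploaded.getbuffer())
```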
This app enables you to upload email PDFs and interactively extract key details such as sender information, dates, attachments, and discussion points. Whether you're summarizing meeting notes, tracking follow-ups, or searching for approvals, the app leverages Retrieval-Augmented Generation (RAG) to provide precise, context-aware answers from one or multiple email threads.
❤️ I am using this to find discount promotions in my inbox ❤️
- FAISS Vector Database: Enables fast and efficient semantic search for document content.
- Retrieval-Augmented Generation (RAG): Combines the LLM’s generative capabilities with relevant information retrieval to deliver precise, document-grounded answers.
- Mistral 7B via Ollama: Runs the lightweight, high-performance Mistral 7B model locally for inference.
- Streamlit Interface: Provides an intuitive, interactive frontend for seamless user interaction.
- Document Ingestion: Users upload one or more PDF files.
- Vectorization: Document content is embedded and stored in a FAISS vector database.
- Semantic Search: User queries trigger a semantic search within the vector database to locate the most relevant document passages.
- Contextual Response Generation: The system feeds the retrieved passages to the Mistral 7B model to generate accurate, document-grounded responses (see the sketch below).
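As a rough sketch of these four steps using LangChain (an assumed composition: import paths vary across LangChain versions, and the file path, chunk sizes, model tag, and query are placeholders rather than the app's actual configuration):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Document ingestion: load an uploaded PDF and split it into chunks.
docs = PyPDFLoader("files/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 2. Vectorization: embed the chunks and index them in FAISS.
store = FAISS.from_documents(chunks, HuggingFaceEmbeddings())

# 3 + 4. Semantic search and generation: retrieve the top passages and
# let Mistral 7B (served by Ollama) answer from them.
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke("Who sent the email about the Q3 budget?")["result"])
```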
- Install Ollama by following the instructions in the [Ollama GitHub repository](https://github.com/ollama/ollama).
- Clone this repository:

  ```bash
  git clone https://github.com/25mb-git/pdfchat.git
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the application:

  ```bash
  streamlit run app.py
  ```
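Before the first run, the Mistral 7B weights also need to be available to Ollama; assuming the app uses Ollama's standard `mistral` tag, pull them with:

```bash
ollama pull mistral
```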
- setup_session: Ensures the files folder exists.
- pdf_bytes: Simulates a PDF file in memory using BytesIO.
- test_pdf_upload_and_storage: Verifies PDF uploads and file path creation.
- test_vector_store_creation: Tests the creation of the FAISS vector store with dummy PDFs.
- test_streamlit_ui_elements: Ensures session state is initialized.
- test_user_input_and_chat_flow: Simulates the full chat flow with user input and assistant responses.
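For orientation, here is a minimal sketch of how the first few of these pieces fit together (illustrative fixtures and assertions only; the real tests live in `test/test_app.py`):

```python
import os
from io import BytesIO

import pytest

@pytest.fixture
def setup_session(tmp_path, monkeypatch):
    # Run each test in an isolated temp dir containing a "files" folder.
    monkeypatch.chdir(tmp_path)
    os.makedirs("files", exist_ok=True)

@pytest.fixture
def pdf_bytes():
    # Simulate a PDF file in memory; real tests would embed valid PDF bytes.
    return BytesIO(b"%PDF-1.4 dummy content")

def test_pdf_upload_and_storage(setup_session, pdf_bytes):
    # Write the in-memory "upload" to disk and verify the file path exists.
    path = os.path.join("files", "dummy.pdf")
    with open(path, "wb") as f:
        f.write(pdf_bytes.getbuffer())
    assert os.path.exists(path)
```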
- Make sure pytest is installed:

  ```bash
  pip install pytest
  ```

- Run the test suite:

  ```bash
  pytest test/test_app.py
  ```
- Mistral 7B: Lightweight, open-weight LLM optimized for local deployment.
- Ollama: Simplifies LLM model deployment and inference.
- LangChain: Facilitates seamless integration of LLMs and external tools.
- FAISS: High-performance vector database for semantic search.
- Streamlit: User-friendly framework for creating interactive web applications.