25mb-git/pdfchat

RAG-based Local PDF Chatbot: Supports multiple PDFs and concurrent users. Powered by Mistral 7B LLM, LangChain, Ollama, FAISS vector store, and Streamlit for an interactive experience.

Contributions

This application fixes the problems of the forked version (reported in SonicWarrior1#7 (comment)).

It **significantly reworks** the fork to properly support **Retrieval-Augmented Generation (RAG)**.

Successfully deployed on an 8GB Mac mini. Access the deployed version here: https://mac-mini.boga-vector.ts.net/mistral

Read more about how to deploy in this Medium Post.

What has changed

  • Retrieval-Augmented Generation (RAG) architecture: retrieves the most relevant document passages before generating an answer
  • Vector Store: uses a FAISS database to index document embeddings for the RAG pipeline
  • Multiple Users: each user session gets its own isolated workspace
  • Multiple Files: several PDFs can be uploaded and queried together

The system supports the ingestion of multiple documents, and each browser reload initiates a private session, ensuring that users can interact exclusively with their specific uploaded documents.

Use Case: Retrieve Important Information from Email PDFs

This app enables you to upload email PDFs and interactively extract key details such as sender information, dates, attachments, and discussion points. Whether you're summarizing meeting notes, tracking follow-ups, or searching for approvals, the app leverages Retrieval-Augmented Generation (RAG) to provide precise, context-aware answers from one or multiple email threads.

❤️ I am using this to find the discount promotions in my inbox ❤️

Key Features

  • FAISS Vector Database: Enables fast and efficient semantic search for document content.
  • Retrieval-Augmented Generation (RAG): Combines the LLM’s generative capabilities with relevant information retrieval to deliver precise, document-grounded answers.
  • Mistral 7B via Ollama: Runs the lightweight, high-performance Mistral 7B model locally for inference.
  • Streamlit Interface: Provides an intuitive, interactive frontend for seamless user interaction.

How It Works

  1. Document Ingestion: Users upload one or more PDF files.
  2. Vectorization: Document content is embedded and stored in a FAISS vector database.
  3. Semantic Search: User queries trigger a semantic search within the vector database to locate the most relevant document passages.
  4. Contextual Response Generation: The system integrates retrieved information with the Mistral 7B model to generate highly accurate responses.
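The retrieval and prompting steps above can be illustrated with a small, dependency-free sketch. The bag-of-words similarity below is only a stand-in for the app's real neural embeddings and FAISS index, but the flow is the same: embed the query, rank passages by similarity, and ground the prompt in the top matches.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; the real app uses a neural
    embedding model and stores its vectors in FAISS."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Semantic search: rank document chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks):
    """Ground the LLM's answer in the retrieved passages (the RAG step)."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt produced by `build_prompt` is what gets handed to Mistral 7B, so the model answers from the uploaded documents rather than from its general training data.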

How to Deploy

Before starting

Follow the instructions in the Ollama GitHub Repository to install Ollama, then pull the Mistral 7B model with `ollama pull mistral`.

Instructions

  1. Clone this repository:

    git clone https://github.com/25mb-git/pdfchat.git
  2. Install dependencies:

     pip install -r requirements.txt
  3. Run the application:

     streamlit run app.py

Tests

Test cases

  1. setup_session: Ensures the files folder exists.
  2. pdf_bytes: Simulates a PDF file in memory using BytesIO.
  3. test_pdf_upload_and_storage: Verifies PDF uploads and file path creation.
  4. test_vector_store_creation: Tests the creation of the FAISS vector store with dummy PDFs.
  5. test_streamlit_ui_elements: Ensures session state is initialized.
  6. test_user_input_and_chat_flow: Simulates the full chat flow with user input and assistant responses.

Test execution

  1. Make sure pytest is installed:

    pip install pytest
  2. Run the tests:

     pytest test/test_app.py

Technologies Used

  • Mistral 7B: Lightweight, open-weight LLM optimized for local deployment.
  • Ollama: Simplifies LLM model deployment and inference.
  • LangChain: Facilitates seamless integration of LLMs and external tools.
  • FAISS: High-performance vector database for semantic search.
  • Streamlit: User-friendly framework for creating interactive web applications.
