This project implements a Retrieval-Augmented Generation (RAG) pipeline that lets users upload data files (CSV, JSON, PDF, DOCX), store their content in a Chroma vector store, and interact with it through a chatbot. The chatbot, powered by Gemini, OpenAI, or local models served through OLLAMA, retrieves the most relevant stored content and uses a Large Language Model (LLM) to ground its responses to user queries. The app also provides an experimental Graph RAG feature for visualizing connections within the data.
- Flexible File Upload – Supports uploading CSV, JSON, PDF, and DOCX files, allowing users to choose which columns or sections to index.
- Chroma-based Storage and Retrieval – Uses Chroma to store vector embeddings and perform efficient vector-based searches.
- Interactive Chatbot – Chatbot interaction is enhanced by Gemini, OpenAI, or local LLMs to generate context-aware responses.
- Customizable LLM Choices – Choose between cloud-based Gemini and OpenAI, or local LLMs served through OLLAMA, with support for various open-source models.
- Dynamic Chunking Options – Provides multiple chunking strategies: Recursive Token Chunking, Agentic Chunking, Semantic Chunking, or no chunking.
- Graph RAG Visualization – Experimental support for visualizing data relationships and connections using Graph RAG.
```bash
git clone https://github.com/bangoc123/drop-rag.git
cd drop-rag
pip install -r requirements.txt
streamlit run app.py
```
The app will be accessible at http://localhost:8501.
Upload a CSV, JSON, PDF, or DOCX file. You can specify which columns to index for vector-based search.
The data is stored in Chroma, and embeddings are generated using models like `all-MiniLM-L6-v2` (for English) or `keepitreal/vietnamese-sbert` (for Vietnamese).
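As an illustration, the storage step amounts to embedding the selected text and writing it to a Chroma collection. Below is a minimal sketch using the `chromadb` and `sentence-transformers` packages; the collection name, persistence path, and sample texts are assumptions for the example, not the app's actual internals.

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Embed the selected rows/sections with the same model the app uses for English.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["First indexed row...", "Second indexed row..."]  # e.g. chosen CSV columns
embeddings = model.encode(texts).tolist()

# Persist the vectors in a local Chroma collection (path is an assumption).
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("uploaded_docs")
collection.add(ids=[f"doc-{i}" for i in range(len(texts))],
               documents=texts,
               embeddings=embeddings)

# Vector-based search: embed the query and fetch the nearest documents.
results = collection.query(query_embeddings=model.encode(["my question"]).tolist(),
                           n_results=3)
print(results["documents"])
```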
Select from:
- Gemini API (requires a Gemini API key)
- OpenAI API (requires an OpenAI API key)
- Local LLMs via OLLAMA, supporting models like `llama`, `gpt-j`, and more.
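For the local option, OLLAMA serves an HTTP API on port 11434 once the server is running. Here is a minimal sketch of a completion call, assuming `llama3.2` has already been pulled with `ollama pull llama3.2`:

```python
import requests

# Ask a locally served model for a completion (non-streaming).
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2",
          "prompt": "Summarize retrieval-augmented generation in one sentence.",
          "stream": False},
)
print(response.json()["response"])
```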
Select a chunking method to organize the content:
- No Chunking: Use the entire document.
- Recursive Token Chunking: Divide text based on token count.
- Semantic Chunking: Group text semantically.
- Agentic Chunking: Use an LLM to dynamically manage text chunks (requires Gemini API).
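To make the trade-offs concrete, here is a minimal sketch of recursive token chunking; it uses whitespace tokens as a rough stand-in for model tokens and an assumed separator hierarchy, not the app's exact splitter:

```python
def recursive_chunk(text: str, max_tokens: int = 200,
                    separators: tuple = ("\n\n", "\n", ". ")) -> list[str]:
    """Split text on progressively finer separators until each chunk
    fits within max_tokens (whitespace tokens as a rough proxy)."""
    if len(text.split()) <= max_tokens or not separators:
        return [text]
    head, *rest = separators
    chunks = []
    for part in text.split(head):
        if len(part.split()) <= max_tokens:
            chunks.append(part)
        else:
            chunks.extend(recursive_chunk(part, max_tokens, tuple(rest)))
    return [c for c in chunks if c.strip()]
```

Semantic chunking replaces the fixed separators with embedding-similarity boundaries, while agentic chunking delegates the split decisions to an LLM.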
Start chatting with the bot, which will enhance responses using the retrieved content.
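Under the hood, each chat turn boils down to "retrieve, then prompt." Here is a sketch of that loop, reusing the Chroma collection and embedding model from the storage step; the prompt template is illustrative, and `llm` stands for any text-in/text-out callable (Gemini, OpenAI, or OLLAMA):

```python
def rag_answer(question: str, collection, model, llm) -> str:
    """Retrieve the top chunks for the question and ask the LLM
    to answer grounded in them."""
    hits = collection.query(
        query_embeddings=model.encode([question]).tolist(), n_results=3)
    context = "\n\n".join(hits["documents"][0])
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm(prompt)
```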
Use Graph RAG to visualize relationships and connections within the uploaded data:
- Ensure an online LLM is configured (Gemini or OpenAI).
- Click the "Extract Graph" button to generate and display the graph.
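Conceptually, the extraction step asks the LLM for (subject, relation, object) triples and renders them as a graph. A minimal sketch with `networkx` and hypothetical triples (the app's actual prompt and renderer may differ):

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical triples, as an LLM might extract them from the uploaded data.
triples = [("Alice", "works_at", "Acme"),
           ("Acme", "located_in", "Hanoi"),
           ("Alice", "manages", "Bob")]

# Build a directed graph with the relation stored as an edge label.
G = nx.DiGraph()
for subject, relation, obj in triples:
    G.add_edge(subject, obj, label=relation)

# Lay out and draw nodes, edges, and relation labels.
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1500)
nx.draw_networkx_edge_labels(G, pos,
                             edge_labels=nx.get_edge_attributes(G, "label"))
plt.show()
```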
Here is a list of models supported by OLLAMA:
| Model Name | Parameters (Download) | Identifier |
|---|---|---|
| Llama 3.2 (3B) | 3B (2.0 GB) | `llama3.2` |
| Phi 3 Medium (14B) | 14B (7.9 GB) | `phi3:medium` |
| Code Llama (7B) | 7B (3.8 GB) | `codellama` |
| Mistral (7B) | 7B (4.1 GB) | `mistral` |
| ... | ... | ... |
High-performance GGUF models are supported. Refer to Hugging Face for available models.
Experimental support for Graph RAG allows visualizing connections within the uploaded data. This feature requires an online LLM (Gemini or OpenAI).
Choose from:
- Vector Search: Based on vector similarity.
- Hyde Search: Uses an LLM to generate a hypothetical answer and searches with that answer's embedding (HyDE), which can improve retrieval accuracy.
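The difference between the two modes is where the query embedding comes from. Here is a sketch of Hyde Search under the same assumptions as the earlier snippets (`llm` is any text-in/text-out callable):

```python
def hyde_search(question: str, collection, model, llm, n_results: int = 3):
    """HyDE: embed a hypothetical answer instead of the raw question,
    which often lands closer to relevant passages in embedding space."""
    hypothetical = llm(f"Write a short passage that answers: {question}")
    return collection.query(
        query_embeddings=model.encode([hypothetical]).tolist(),
        n_results=n_results)
```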
- No Results? Ensure you've indexed the correct columns and stored embeddings.
- API Issues? Verify that your API key is valid (if using Gemini or OpenAI) and that your vector store is initialized.
- Gemini and OpenAI API Keys: Required for cloud-based LLMs. Obtain keys from their respective platforms.
- Local Models: Requires Docker for local model inference.
The app allows exporting configuration data for local LLMs to JSON for easy deployment.
Users can clear session state via a sidebar button to reset settings.
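Both conveniences map to a few lines of Streamlit. A minimal sketch, with hypothetical config keys standing in for the app's actual settings:

```python
import json
import streamlit as st

# Export the local-LLM configuration as a downloadable JSON file.
config = {"model": st.session_state.get("local_model", "llama3.2"),
          "endpoint": "http://localhost:11434"}  # hypothetical keys
st.sidebar.download_button("Export LLM config",
                           data=json.dumps(config, indent=2),
                           file_name="llm_config.json")

# Reset all settings by clearing Streamlit's session state.
if st.sidebar.button("Clear session"):
    st.session_state.clear()
    st.rerun()
```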