The Smart Search AI Agent is a powerful Streamlit application designed to streamline data processing. It allows users to upload CSV files or connect to Google Sheets, perform web searches for relevant information, and extract structured data using advanced language models. This tool is particularly useful for automating information retrieval and enhancing productivity.
- Data Integration:
- Upload CSV files or link Google Sheets for processing.
- Preview and select specific columns for data processing.
- Dynamic Querying:
- Create custom queries using placeholders like
{entity}
. - Automatically insert data from the selected column into queries.
- Create custom queries using placeholders like
- Automated Web Search:
- Perform searches using APIs like SerpAPI and gather structured results.
- LLM Integration:
- Use OpenAI GPT or Groq LLM to extract precise information from search results.
- Results Display and Export:
- View extracted data directly in the app and download it as a CSV file.
Before starting, ensure you have the following:
- Python 3.10 or later
- Git
- API Keys for:
Clone the project to your local machine:
git clone https://github.com/mehtachandrashekhar/smart-search-ai-agent.git
cd smart-search-ai-agent
Set up a virtual environment for managing dependencies:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install the required Python packages:
pip install -r requirements.txt
Create a .env
file in the project root to store sensitive API keys:
SERPAPI_KEY=your_serpapi_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_CREDENTIALS_PATH=config/credentials.json
GROQ_API_KEY=your_groq_api_key
GROQ_API_URL=https://api.groq.com/openai/v1/chat/completions
GROQ_MODEL=llama3-8b-8192
GROQ_MAX_TOKENS=100
GROQ_CONTEXT_MAX_TOKENS=4096
-
config/credentials.json
: Configure your Google Sheets service account:{ "type": "service_account", "project_id": "your-project-id", "private_key_id": "your-private-key-id", "private_key": "-----BEGIN PRIVATE KEY-----\nyour-private-key\n-----END PRIVATE KEY-----\n", "client_email": "your-client-email", "client_id": "your-client-id", "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token", "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-client-email" }
-
config/config.yaml
: Store API keys and settings:api: serpapi_key: your_serpapi_key openai_api_key: your_openai_api_key google_credentials_path: config/credentials.json groq_api_key: your_groq_api_key groq_api_url: https://api.groq.com/openai/v1/chat/completions groq_model: llama3-8b-8192 groq_max_tokens: 100 groq_context_max_tokens: 4096
- Open the Google Sheet you want to process.
- Click on the "Share" button.
- Add the service account email from
credentials.json
. - Set permissions to Editor and save.
Ensure sensitive files are ignored by Git:
# Environment variables
.env
# Configuration files
config/
Start the Streamlit app:
streamlit run app/main.py
- Upload CSV/Connect Google Sheets:
- Use the sidebar to upload a CSV file or enter a Google Sheets URL.
- Select Column:
- Choose the column containing entities for processing.
- Enter Query:
- Input a query with placeholders (e.g.,
Find the email of {entity}
).
- Input a query with placeholders (e.g.,
- Process Data:
- Click "Process Data" to search and extract results.
- View and Export:
- View the extracted results in the app and download them as a CSV file.
Contributions are always welcome! Here’s how you can contribute:
- Fork the Repository:
- Click the "Fork" button on GitHub.
- Create a Branch:
- Create a branch for your feature or bug fix:
git checkout -b feature-name
- Create a branch for your feature or bug fix:
- Make Changes:
- Add your feature or fix the bug.
- Push to GitHub:
- Push your branch:
git push origin feature-name
- Push your branch:
- Submit a Pull Request:
- Open a pull request on the original repository.
This project is licensed under the MIT License. See the LICENSE file for more details.