Welcome to the Comprehensive AI Toolkit project! This repository contains a suite of AI-powered tools designed to demonstrate proficiency with major AI platforms and frameworks. The toolkit includes tools for image processing, music generation, text-to-speech (TTS), speech-to-text (STT), various natural language processing (NLP) tasks, and more.
- SAM2 Image Segmentation: Interactive tool for segmenting images using Meta's Segment Anything 2 (SAM2) model
  - Click to add foreground (left click) and background (right click) points
  - Save segmented regions with transparency
  - Clear selections and start over
- More image tools coming soon...
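The click-based prompting described above can be sketched as follows. This is a minimal illustration, not the code in `tools/image_processing_SAM2.py`: the `PointPrompts` class and the button names are hypothetical. The one convention taken from SAM-style models is that foreground points carry label 1 and background points carry label 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FOREGROUND = 1  # SAM-style label: the point belongs to the object
BACKGROUND = 0  # SAM-style label: the point lies outside the object


@dataclass
class PointPrompts:
    """Accumulates click prompts (hypothetical helper, not from the repository)."""

    coords: List[Tuple[int, int]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)

    def add_click(self, x: int, y: int, button: str) -> None:
        # Assumed mapping: "left" marks foreground, "right" marks background.
        if button == "left":
            label = FOREGROUND
        elif button == "right":
            label = BACKGROUND
        else:
            raise ValueError(f"unsupported button: {button!r}")
        self.coords.append((x, y))
        self.labels.append(label)

    def clear(self) -> None:
        # Mirrors "Clear selections and start over".
        self.coords.clear()
        self.labels.clear()


prompts = PointPrompts()
prompts.add_click(120, 80, "left")
prompts.add_click(300, 40, "right")
print(prompts.coords, prompts.labels)
```

A SAM-style predictor would typically receive these two lists as arrays of point coordinates and point labels; how the tool wires that up is not shown here.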
A research assistant that combines user-provided sources with live web search results, powered by Llama 3.3 running on Groq and the pydantic_ai framework.
- Hybrid Search: Integrates user-supplied URLs with web search results for comprehensive research insights.
- AI-Synthesized Summaries: Uses Llama 3.3, served through Groq, to generate detailed research summaries.
- Interactive Q&A: Facilitates follow-up questions with context-aware answers using the pydantic_ai framework.
- Advanced Search Configuration: Customize search parameters including depth, time range, and domain filtering with Tavily integration.
- Intelligent Source Management: Differentiates between user-provided and searched sources for precise citations and references.
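As a rough illustration of the source management idea, the sketch below merges user-supplied URLs with searched URLs while remembering where each one came from. The `Source` class and `merge_sources` function are hypothetical names for this example, and the de-duplication rule (ignoring a trailing slash) is an assumption, not the tool's actual behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Source:
    url: str
    origin: str  # "user" or "search"


def merge_sources(user_urls: Iterable[str], search_urls: Iterable[str]) -> List[Source]:
    """Combine both URL lists, user-provided first, dropping duplicates."""
    seen = set()
    merged: List[Source] = []
    for origin, urls in (("user", user_urls), ("search", search_urls)):
        for url in urls:
            cleaned = url.strip()
            key = cleaned.rstrip("/")  # assumed rule: a trailing slash is ignored
            if key and key not in seen:
                seen.add(key)
                merged.append(Source(url=cleaned, origin=origin))
    return merged


sources = merge_sources(
    ["https://example.com/paper"],
    ["https://example.com/paper/", "https://example.org/news"],
)
for index, source in enumerate(sources, start=1):
    print(f"[{index}] ({source.origin}) {source.url}")
```

Keeping the origin on each entry is what lets a citation list say whether a reference was supplied by the user or found by search.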
- Music Generation: Create original music compositions across different genres
- Text-to-Speech & Speech-to-Text: Convert text to speech and transcribe spoken words into text
- NLP Tasks: Perform text generation, summarization, and sentiment analysis
- Python 3.8+
- CUDA-capable GPU (recommended) or CPU
- Git Bash (recommended for Windows users)
- Required Python packages listed in requirements.txt and tools/requirements_groq_llama_search.txt
You'll need to obtain free API keys from:
- Groq - For LLM inference
  - Sign up at: Groq Console
  - Get API key from your dashboard
- Tavily - For web search functionality
  - Sign up at: Tavily
  - Get API key from your dashboard
Create a .env file in your project root with:
GROQ_API_KEY=your_groq_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
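Before launching the tools, it can help to confirm that both keys are actually set. The following is a standard-library sketch of such a check; `parse_env` and `missing_keys` are hypothetical helpers, and the project itself may load the file differently (for example through a dotenv package).

```python
from typing import Dict, List

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Inline text stands in for the contents of a real .env file.
example_text = "GROQ_API_KEY=abc123\n# comment line\nTAVILY_API_KEY=\n"
print(missing_keys(parse_env(example_text)))
```

To check a real file, the text could be read with `pathlib.Path(".env").read_text()` and passed to `parse_env`.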
- Clone the repository with submodules:
    git clone --recursive https://github.com/dleon86/ai-toolkit.git
    cd ai-toolkit
- Create a virtual environment:
    python -m venv ai_env
    source ai_env/bin/activate  # On Windows using Git Bash: source ai_env/Scripts/activate
- Install PyTorch with CUDA support (if available):
    # For CUDA 11.8
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    # OR for CUDA 12.1
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
- Install other dependencies:
    pip install -r requirements.txt
    pip install -r tools/requirements_groq_llama_search.txt
- Install and set up SAM2:
    # Initialize and update SAM2 submodule
    git submodule update --init --recursive
    # Install SAM2 requirements
    cd external/sam2
    pip install -e .
    # Download SAM2 model checkpoints
    ./download_ckpts.sh  # Use Git Bash on Windows
    cd ../..
- Install and set up Groq Llama 3 Hybrid Search & Q&A:
    # Ensure environment variables are set in .env
    # No additional setup required beyond installing requirements
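Once the installation steps are done, a quick check like the one below can show whether PyTorch sees a CUDA GPU or will fall back to the CPU. It is a small sketch, not part of the repository: `pick_device` is a hypothetical helper, and it treats a missing PyTorch install as "cpu" only so that the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        # Assumption for this sketch: no PyTorch means CPU-only.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```

If this reports `cpu` on a machine with a GPU, the installed PyTorch wheel likely does not match the local CUDA version.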
- Navigate to the project root:
    cd path/to/ai-toolkit
- Run the segmentation tool:
    python tools/image_processing_SAM2.py
- Use the interactive interface:
  - Left click to mark foreground points
  - Right click to mark background points
  - Click "Clear" to reset selections
  - Click "Save" to export segmented regions with transparency
- Find saved segments in the saved_segments directory:
  - original_[timestamp].png: Original image
  - masked_[timestamp]_segment[n].png: Selected regions with transparency
  - unmasked_[timestamp]_segment[n].png: Unselected regions with transparency
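When many sessions have been saved, the naming scheme above makes it possible to group the files that belong together. The sketch below does this with a regular expression; the exact timestamp format is not documented here, so the pattern accepts any text in that position, and `group_by_timestamp` is a hypothetical helper rather than repository code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# kind _ timestamp [ _segment<n> ] .png ; the timestamp format is an assumption.
PATTERN = re.compile(r"(original|masked|unmasked)_(.+?)(?:_segment(\d+))?\.png")


def group_by_timestamp(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group saved files by the timestamp embedded in their names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = PATTERN.fullmatch(name)
        if match:  # files that do not follow the scheme are ignored
            groups[match.group(2)].append(name)
    return {timestamp: sorted(names) for timestamp, names in groups.items()}


# Example names using a made-up timestamp format.
example_files = [
    "original_20240101_120000.png",
    "masked_20240101_120000_segment1.png",
    "unmasked_20240101_120000_segment1.png",
    "notes.txt",
]
for timestamp, names in group_by_timestamp(example_files).items():
    print(timestamp, names)
```

For a real directory, the file names could come from `os.listdir("saved_segments")`.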
- Navigate to the project root:
    cd path/to/ai-toolkit
- Run the Streamlit app:
    streamlit run tools/groq_web_search_RAG.py
- Use the interactive interface to:
  - Enter a research query
  - Configure search parameters (depth, time range, content type, domains)
  - Add your own source URLs
  - Get AI-synthesized research summaries
  - Ask follow-up questions and view cited sources with full URLs
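The search parameters listed above can be pictured as a small configuration object that is turned into request options. The sketch below is illustrative only: `SearchConfig` is a hypothetical class, and the option names (`search_depth`, `time_range`, `include_domains`) are assumptions modeled on Tavily-style search options that should be checked against the Tavily documentation before use.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SearchConfig:
    """Hypothetical container for the options exposed in the interface."""

    query: str
    depth: str = "basic"               # assumed values: "basic" or "advanced"
    time_range: Optional[str] = None   # e.g. "week"; None means no restriction
    include_domains: List[str] = field(default_factory=list)

    def to_params(self) -> Dict[str, Any]:
        """Build a request dictionary, leaving out options that are unset."""
        if self.depth not in ("basic", "advanced"):
            raise ValueError(f"unknown search depth: {self.depth!r}")
        params: Dict[str, Any] = {"query": self.query, "search_depth": self.depth}
        if self.time_range:
            params["time_range"] = self.time_range
        if self.include_domains:
            params["include_domains"] = list(self.include_domains)
        return params


config = SearchConfig(
    query="image segmentation benchmarks",
    depth="advanced",
    include_domains=["arxiv.org"],
)
print(config.to_params())
```

Omitting unset options keeps the request minimal and leaves the search service's defaults in effect.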
Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.
This project is licensed under the MIT License.
Stay tuned for more updates and tutorials!