A robust end-to-end MLOps project that predicts whether a user will click on an advertisement based on various user behavioral and demographic features.
This project implements a complete MLOps pipeline for ad click prediction, following machine learning operations best practices: automated data ingestion, data validation, data transformation, model training, evaluation, and deployment. The system uses MongoDB for data storage and AWS for the model registry and deployment, supports data version control and performance tracking, and includes CI/CD pipelines.
- Age
- Gender
- Device Type
- Ad Position
- Browsing History
- Time of Day
The project follows a modular and scalable architecture with the following components:
- Data Ingestion 📥
  - MongoDB integration for data storage and retrieval
  - Automated data extraction and transformation pipeline
  - Data validation and quality checks
- Data Validation ✅
  - Schema validation using YAML configuration
  - Data drift detection
  - Automated validation reports
- Data Transformation 🔄
  - Feature engineering pipeline
  - Data preprocessing and standardization
  - Automated transformation artifacts
- Model Training 🧠
  - Automated model training pipeline
  - Hyperparameter optimization
  - Model performance logging
- Model Evaluation 📊
  - Automated performance metrics calculation
  - Model comparison with existing production model
  - AWS S3 integration for model registry
- Model Deployment 🚀
  - Containerized deployment using Docker
  - AWS ECR for container registry
  - CI/CD pipeline using GitHub Actions
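Two of the components above, schema validation and the comparison against the production model, can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `SCHEMA` dictionary stands in for whatever the YAML schema file contains once loaded, and the column names, `validate_rows`, `should_promote`, and the `min_gain` threshold are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical schema, shaped like the dict a YAML loader might return.
# The column names and types are assumptions, not the project's real schema.
SCHEMA = {
    "columns": {
        "age": "float",
        "gender": "str",
        "device_type": "str",
        "click": "int",
    }
}

# Map schema type names to the Python types accepted for them.
_TYPES = {"int": int, "float": (int, float), "str": str}


def validate_rows(rows: list[dict[str, Any]], schema: dict) -> list[str]:
    """Return human-readable schema violations (an empty list means valid)."""
    expected = schema["columns"]
    errors: list[str] = []
    for i, row in enumerate(rows):
        missing = sorted(set(expected) - set(row))
        extra = sorted(set(row) - set(expected))
        if missing:
            errors.append(f"row {i}: missing columns {missing}")
        if extra:
            errors.append(f"row {i}: unexpected columns {extra}")
        for name, type_name in expected.items():
            if name not in row or row[name] is None:
                continue  # absence is reported above; None is treated as "no value"
            if not isinstance(row[name], _TYPES[type_name]):
                errors.append(f"row {i}: {name!r} should be {type_name}")
    return errors


def should_promote(new_score: float,
                   prod_score: Optional[float],
                   min_gain: float = 0.02) -> bool:
    """Accept the new model if none is deployed yet, or if it beats the
    production model by at least `min_gain` (an assumed threshold)."""
    if prod_score is None:
        return True
    return new_score - prod_score >= min_gain
```

Returning a list of violations rather than raising on the first one makes it easy to write every problem into a validation report in a single pass.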
- Python 3.10
- MongoDB Atlas - Data Storage
- AWS Services:
  - S3 (Model Registry)
  - ECR (Container Registry)
  - EC2 (Deployment)
- Docker - Containerization
- GitHub Actions - CI/CD Pipeline
- FastAPI - Web Application
- Clone the repository:
    git clone https://github.com/bobinsingh/Ad-Click-Prediction-MLOps.git
- Create and activate a conda environment:
    conda create -n Ad python=3.10 -y
    conda activate Ad
- Install requirements:
    pip install -r requirements.txt
- Set up MongoDB connection:
    export MONGODB_URL="your_mongodb_connection_string"
- Set up AWS credentials:
    export AWS_ACCESS_KEY_ID="your_access_key"
    export AWS_SECRET_ACCESS_KEY="your_secret_key"
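Because the application depends on these three environment variables, it can help to check them once at startup and fail with a clear message. The sketch below shows one way to do that; `load_settings` and `REQUIRED_VARS` are hypothetical names and the project may read its configuration differently.

```python
import os
from typing import Mapping

# The variable names come from the setup steps above.
REQUIRED_VARS = ("MONGODB_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.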
├── artifacts/          # Training artifacts and model files
├── dataset/            # Local copy of the dataset used in this project
├── configs/            # Schema and model config files
├── src/
│   ├── cloud/          # AWS connection and storage
│   ├── components/     # Core pipeline components
│   ├── config/         # Database-related files
│   ├── constants/      # Central file for all constants
│   ├── data/           # Project data handler
│   ├── docs/           # Project documentation
│   ├── entities/       # Artifact, config, and model-related entities
│   ├── exceptions/     # Custom exception handling
│   ├── logging/        # Logging configuration
│   ├── pipelines/      # Training and prediction pipelines
│   ├── tests/          # Test pipeline
│   └── utils/          # Utility functions
├── static/             # Static files for the web application
├── templates/          # HTML templates
├── app.py              # FastAPI application
├── Dockerfile          # Docker configuration
├── requirements.txt    # Project dependencies
└── setup.py            # Project setup configuration
- Data Pipeline:
  - Automated data ingestion from MongoDB
  - Data validation and quality checks
  - Feature engineering and transformation
- Training Pipeline:
  - Model training with latest data
  - Performance evaluation
  - Model versioning and registry
- Deployment Pipeline:
  - Automated Docker image creation
  - Push to AWS ECR
  - Deployment to EC2 instance
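The data pipeline's three steps (ingest, validate, transform) can be pictured as functions chained in order, each consuming the previous step's output. The sketch below is only an illustration under simplifying assumptions: `ingest` returns hard-coded rows instead of reading from MongoDB, the `age` and `click` fields are assumed, and every function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable

Rows = list[dict]
Stage = Callable[[Rows], Rows]


def ingest(_: Rows) -> Rows:
    # Stand-in for a MongoDB query; the rows are made up for illustration.
    return [
        {"age": 20.0, "click": 0},
        {"age": 40.0, "click": 1},
        {"age": None, "click": 0},
    ]


def validate(rows: Rows) -> Rows:
    # Quality check: drop rows with a missing age.
    return [row for row in rows if row["age"] is not None]


def transform(rows: Rows) -> Rows:
    # Standardize age to zero mean and unit variance.
    ages = [row["age"] for row in rows]
    mu = mean(ages)
    sigma = pstdev(ages) or 1.0  # avoid dividing by zero for constant columns
    return [{**row, "age": (row["age"] - mu) / sigma} for row in rows]


def run_pipeline(stages: list[Stage]) -> Rows:
    """Run each stage in order, feeding it the previous stage's output."""
    rows: Rows = []
    for stage in stages:
        rows = stage(rows)
    return rows


result = run_pipeline([ingest, validate, transform])
```

Keeping every stage to the same signature means stages can be reordered, replaced, or tested in isolation without changing the runner.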
The project includes a web interface for:
- Real-time ad click predictions
- Triggering model training
- Performance monitoring
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.