This project implements a machine learning pipeline that classifies IMDB movie reviews as Positive or Negative. The pipeline covers data preprocessing, model training, evaluation, and a web-based interface for predictions. The project is fully containerized with Docker and uses Neptune.ai for experiment tracking and visualization.
- Data Preprocessing: Cleaning and vectorizing movie reviews using TF-IDF.
- Model Training: Logistic Regression with hyperparameter tuning using Random Search.
- Evaluation Metrics: Accuracy, F1-score, Confusion Matrix, and ROC-AUC curve.
- Neptune.ai Integration: Logs experiments, metrics, and visualizations.
- Web Interface: Simple frontend (HTML, CSS, JS) for users to input reviews and get predictions.
- Containerization: Backend and frontend are containerized with Docker and orchestrated using Docker Compose.
- Python 3.9: Main programming language.
- FastAPI: Backend framework for serving predictions.
- Scikit-learn: ML library for Logistic Regression and TF-IDF.
- Neptune.ai: For experiment tracking.
- Docker & Docker Compose: Containerization of the application.
- HTML, CSS, JavaScript: Frontend interface.
- Matplotlib & Seaborn: Visualization tools.
- Pandas & NumPy: Data handling and processing.
IMDB-Review-Classifier/
│
├── data/
│ ├── raw/ # Raw dataset (IMDB Dataset.csv)
│ └── processed/ # Processed TF-IDF data and labels
│   ├── X_train_tfidf.npz
│   ├── X_test_tfidf.npz
│   ├── y_train.csv
│   ├── y_test.csv
│   └── tfidf_vectorizer.pkl
│
├── model/
│ └── best_sentiment_model.pkl # Trained Logistic Regression model
│
├── backend/
│ ├── requirements.txt # Python dependencies
│ ├── api.py # FastAPI backend for predictions
│ ├── train_model.py # Model training with Random Search and Neptune logging
│ └── data_processing.py # Data cleaning and TF-IDF processing
│
├── frontend/
│ ├── index.html # Web interface
│ ├── style.css # Styling for the web interface
│ └── static/ # Static assets like images (snowflakes, icons)
│
├── docker/
│ ├── Dockerfile.backend # Dockerfile for the backend
│ ├── Dockerfile.frontend # Dockerfile for the frontend
│ └── docker-compose.yml # Docker Compose configuration
│
└── README.md # Project documentation
- Dataset: IMDB Dataset of 50K Movie Reviews
- Size: 50,000 rows (Positive and Negative reviews).
- Format: CSV
git clone https://github.com/himarygr/IMDB-Review-Classifier.git
cd IMDB-Review-Classifier
pip install -r backend/requirements.txt
No installation is required for the static frontend.
Run the following script to clean and vectorize data:
python backend/data_processing.py
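The cleaning and TF-IDF step can be sketched roughly as follows. This is a minimal illustration, not the contents of `data_processing.py`: the `clean_review` helper, the sample reviews, and the `max_features` and `stop_words` settings are all assumptions chosen for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def clean_review(text: str) -> str:
    """Hypothetical cleaner: strip HTML tags, lowercase, keep letters only."""
    text = re.sub(r"<[^>]+>", " ", text)           # IMDB reviews often contain <br /> tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # replace digits and punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace


# Two made-up reviews standing in for the raw CSV rows.
reviews = [
    "The movie was absolutely fantastic!<br />Great acting.",
    "Terrible plot, and the pacing was awful...",
]
cleaned = [clean_review(r) for r in reviews]

# Assumed settings; the real script may use different ones.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(cleaned)  # sparse matrix, one row per review

print(cleaned[0])
print(X.shape)
```

The resulting sparse matrix and fitted vectorizer correspond to the kind of artifacts listed under `data/processed/` (`.npz` matrices and `tfidf_vectorizer.pkl`); saving them is left out of the sketch.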
Run model training with Random Search and log metrics to Neptune.ai:
python backend/train_model.py
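As a rough sketch of what Random Search over Logistic Regression looks like in scikit-learn, the example below tunes `C`, `solver`, and `max_iter` with `RandomizedSearchCV`. The synthetic data, search ranges, `n_iter`, and scoring choice are assumptions for illustration and may differ from what `train_model.py` actually does.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the TF-IDF features and sentiment labels.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed search space over the three hyperparameters the project logs.
param_distributions = {
    "C": loguniform(1e-3, 1e2),        # regularization strength, sampled on a log scale
    "solver": ["liblinear", "lbfgs"],
    "max_iter": [200, 500, 1000],
}

search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions=param_distributions,
    n_iter=10,          # number of random configurations to try
    scoring="f1",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out split.
y_pred = search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(search.best_params_)
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

Unlike Grid Search, this samples a fixed number of configurations (`n_iter`) from the distributions rather than trying every combination, which keeps the cost bounded when `C` is searched over a continuous range.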
To build and run the project using Docker:
cd docker
docker-compose up --build
- Backend will run on: http://localhost:8000
- Frontend will run on: http://localhost:8501
Method | Endpoint | Description |
---|---|---|
POST | /predict/ | Predict sentiment of a review |
Example Request:
{
"review": "The movie was absolutely fantastic! Great acting and direction."
}
Example Response:
{
"sentiment": "positive"
}
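For calling the endpoint from a script, a minimal client using only the Python standard library might look like this. The helper names `build_predict_request` and `predict_sentiment` are hypothetical, and the sketch assumes the backend is reachable at the local address given above and returns the `sentiment` field shown in the example response.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict/"  # backend address when run via Docker Compose


def build_predict_request(review: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the review as a JSON body."""
    payload = json.dumps({"review": review}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_sentiment(review: str, url: str = API_URL) -> str:
    """Send the request and return the 'sentiment' field of the JSON response."""
    request = build_predict_request(review, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["sentiment"]


# Building the request does not need a running server.
request = build_predict_request(
    "The movie was absolutely fantastic! Great acting and direction."
)
print(request.get_method(), request.full_url)

# With the backend running, the prediction could be fetched with:
# print(predict_sentiment("The movie was absolutely fantastic!"))
```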
The frontend provides a simple interface where users can:
- Enter a movie review.
- Click the "Analyze Sentiment" button.
- See whether the review is classified as Positive 😊 or Negative 😞.
All experiments, metrics, and visualizations are logged to Neptune.ai.
- Hyperparameters: C, solver, max_iter.
- Metrics: Accuracy, F1-score.
- Confusion Matrix: Uploaded as an image.
- ROC-AUC Curve: Uploaded as an image.
- CPU & Memory Usage: System resource monitoring.
- Confusion Matrix
- ROC-AUC Curve
- Accuracy and F1-Score
- Hyperparameter values
- CPU/Memory usage during training
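One way to organize this kind of logging is a small helper that writes hyperparameters and metrics under namespaced keys. The sketch below is an assumption about structure, not the project's actual code: `log_experiment` and the key names are hypothetical, and a plain dict stands in for the Neptune run so the example works offline. A Neptune run object accepts the same `run["some/path"] = value` assignments.

```python
from typing import Any, Dict


def log_experiment(run: Any, params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Write hyperparameters and metrics under 'parameters/' and 'metrics/' keys."""
    for name, value in params.items():
        run[f"parameters/{name}"] = value
    for name, value in metrics.items():
        run[f"metrics/{name}"] = value


# A plain dict stands in for the Neptune run in this offline sketch.
run = {}
log_experiment(
    run,
    params={"C": 1.0, "solver": "liblinear", "max_iter": 500},  # example values only
    metrics={"accuracy": 0.0, "f1_score": 0.0},                 # placeholders, not real results
)
print(sorted(run))

# With the neptune package installed and credentials configured, a real run
# could be used instead (project name and file name are placeholders):
#   import neptune
#   run = neptune.init_run(project="<workspace>/<project>")
#   log_experiment(run, params, metrics)
#   run["plots/confusion_matrix"].upload("confusion_matrix.png")
#   run.stop()
```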
- Add more classifiers (e.g., SVM, Random Forest) for comparison.
- Integrate Grid Search for exhaustive hyperparameter tuning.
- Deploy the project to a cloud service (AWS, GCP, etc.).
- Enhance the frontend with a modern framework (React or Vue.js).
Feel free to fork the repository, create a branch, and submit pull requests for new features or bug fixes!
This project is licensed under the MIT License.
For any questions or suggestions:
- Email: lilley@ya.ru
- GitHub: himarygr