IMDB Sentiment Analysis Project

🎯 Project Overview

This project implements a machine learning pipeline to classify IMDB movie reviews into Positive or Negative sentiments. The pipeline includes data preprocessing, model training, evaluation, and a web-based interface for predictions. The project is fully containerized using Docker and uses Neptune.ai for experiment tracking and visualization.

Example of negative prediction

Example of positive prediction

🚀 Features

Data Preprocessing: Cleaning and vectorizing movie reviews using TF-IDF.
Model Training: Logistic Regression with hyperparameter tuning using Random Search.
Evaluation Metrics: Accuracy, F1-score, Confusion Matrix, and ROC-AUC curve.
Neptune.ai Integration: Logs experiments, metrics, and visualizations.
Web Interface: Simple frontend (HTML, CSS, JS) for users to input reviews and get predictions.
Containerization: Backend and frontend are containerized with Docker and orchestrated using Docker Compose.

🛠️ Technologies Used

Python 3.9: Main programming language.
FastAPI: Backend framework for serving predictions.
Scikit-learn: ML library for Logistic Regression and TF-IDF.
Neptune.ai: For experiment tracking.
Docker & Docker Compose: Containerization of the application.
HTML, CSS, JavaScript: Frontend interface.
Matplotlib & Seaborn: Visualization tools.
Pandas & NumPy: Data handling and processing.

📂 Project Structure

IMDB-Review-Classifier/
│
├── data/
│   ├── raw/                 # Raw dataset (IMDB Dataset.csv)
│   └── processed/           # Processed TF-IDF data and labels
│       ├── X_train_tfidf.npz
│       ├── X_test_tfidf.npz
│       ├── y_train.csv
│       ├── y_test.csv
│       └── tfidf_vectorizer.pkl
│
├── model/
│   └── best_sentiment_model.pkl   # Trained Logistic Regression model
│
├── backend/
│   ├── requirements.txt       # Python dependencies
│   ├── api.py                 # FastAPI backend for predictions
│   ├── train_model.py         # Model training with Random Search and Neptune logging
│   └── data_processing.py     # Data cleaning and TF-IDF processing
│
├── frontend/
│   ├── index.html             # Web interface
│   ├── style.css              # Styling for the web interface
│   └── static/                # Static assets like images (snowflakes, icons)
│
├── docker/
│   ├── Dockerfile.backend     # Dockerfile for the backend
│   ├── Dockerfile.frontend    # Dockerfile for the frontend
│   └── docker-compose.yml     # Docker Compose configuration
│
└── README.md                  # Project documentation

📊 Dataset

Dataset: IMDB Dataset of 50K Movie Reviews
Size: 50,000 rows (Positive and Negative reviews).
Format: CSV

⚙️ Setup Instructions

1. Clone the repository

git clone https://github.com/himarygr/IMDB-Review-Classifier.git
cd IMDB-Review-Classifier

2. Install dependencies (for local development)

Backend

cd backend
pip install -r requirements.txt

Frontend

No installation is required for the static frontend.

3. Data Preprocessing

Run the following script to clean and vectorize data:

python backend/data_processing.py

4. Model Training

Run model training with Random Search and log metrics to Neptune.ai:

python backend/train_model.py

5. Run with Docker Compose

To build and run the project using Docker:

cd docker
docker-compose up --build

Backend will run on: http://localhost:8000
Frontend will run on: http://localhost:8501

🔗 Endpoints (Backend API)

Method	Endpoint	Description
POST	`/predict/`	Predict sentiment of a review

Example Request:

{
  "review": "The movie was absolutely fantastic! Great acting and direction."
}

Example Response:

{
  "sentiment": "positive"
}

🖥️ Web Interface

The frontend provides a simple interface where users can:

Enter a movie review.
Click the "Analyze Sentiment" button.
See whether the review is classified as Positive 😊 or Negative 😞.

🧪 Experiment Tracking

All experiments, metrics, and visualizations are logged to Neptune.ai.

Logged Items:

Hyperparameters: C, solver, max_iter.
Metrics: Accuracy, F1-score.
Confusion Matrix: Uploaded as an image.
ROC-AUC Curve: Uploaded as an image.
CPU & Memory Usage: System resource monitoring.

🎨 Visualizations in Neptune.ai

Confusion Matrix
ROC-AUC Curve
Accuracy and F1-Score
Hyperparameter values
CPU/Memory usage during training

Neptune dashboard

🔮 Future Improvements

Add more classifiers (e.g., SVM, Random Forest) for comparison.
Integrate Grid Search for exhaustive hyperparameter tuning.
Deploy the project to a cloud service (AWS, GCP, etc.).
Enhance the frontend with a modern framework (React or Vue.js).

🤝 Contributing

Feel free to fork the repository, create a branch, and submit pull requests for new features or bug fixes!

📜 License

This project is licensed under the MIT License.

📞 Contact

For any questions or suggestions:

Email: lilley@ya.ru
GitHub: himarygr

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

IMDB Sentiment Analysis Project

🎯 Project Overview

Example of negative prediction

Example of positive prediction

🚀 Features

🛠️ Technologies Used

📂 Project Structure

📊 Dataset

⚙️ Setup Instructions

1. Clone the repository

2. Install dependencies (for local development)

Backend

Frontend

3. Data Preprocessing

4. Model Training

5. Run with Docker Compose

🔗 Endpoints (Backend API)

🖥️ Web Interface

🧪 Experiment Tracking

Logged Items:

🎨 Visualizations in Neptune.ai

Neptune dashboard

🔮 Future Improvements

🤝 Contributing

📜 License

📞 Contact

Files

README.md

Latest commit

History

README.md

File metadata and controls

IMDB Sentiment Analysis Project

🎯 Project Overview

Example of negative prediction

Example of positive prediction

🚀 Features

🛠️ Technologies Used

📂 Project Structure

📊 Dataset

⚙️ Setup Instructions

1. Clone the repository

2. Install dependencies (for local development)

Backend

Frontend

3. Data Preprocessing

4. Model Training

5. Run with Docker Compose

🔗 Endpoints (Backend API)

🖥️ Web Interface

🧪 Experiment Tracking

Logged Items:

🎨 Visualizations in Neptune.ai

Neptune dashboard

🔮 Future Improvements

🤝 Contributing

📜 License

📞 Contact