Spam Comments Detection with Machine Learning

Overview

This project implements a simple spam comment detection system using machine learning. It classifies YouTube comments as either Spam or Not Spam based on their text. The model is a Bernoulli Naive Bayes classifier trained on a labeled dataset.

Features

Preprocesses and vectorizes text data using CountVectorizer.
Trains a Bernoulli Naive Bayes model for binary classification.
Provides predictions for new sample comments.
Evaluates the model by measuring its accuracy on a held-out test set.

Dataset

The dataset used for this project is Youtube01-Psy.csv. It contains YouTube comments with the following columns:

CONTENT: The text of the comment.
CLASS: A binary label where 0 represents Not Spam and 1 represents Spam Comment.
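
As a rough illustration of working with these two columns, the sketch below reads a CSV with pandas and keeps only CONTENT and CLASS. The in-memory rows, the extra COMMENT_ID and AUTHOR columns, and the helper name `load_comments` are assumptions made for the example; with the real file you would pass the path to Youtube01-Psy.csv instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for Youtube01-Psy.csv; these rows are made up.
SAMPLE_CSV = io.StringIO(
    "COMMENT_ID,AUTHOR,CONTENT,CLASS\n"
    "1,user_a,Great song!,0\n"
    "2,user_b,Subscribe to my channel for free gifts,1\n"
)


def load_comments(source):
    """Read the CSV and keep only the two columns the project uses."""
    data = pd.read_csv(source)
    return data[["CONTENT", "CLASS"]]


comments = load_comments(SAMPLE_CSV)
print(comments)
```

Calling `load_comments("Youtube01-Psy.csv")` from the project directory should behave the same way, provided the file has the CONTENT and CLASS columns described above.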

Requirements

Make sure you have the following Python libraries installed:

pandas
numpy
scikit-learn

You can install the dependencies using pip:

pip install pandas numpy scikit-learn

Usage

Step 1: Clone the Repository

Clone this repository to your local machine:

git clone https://github.com/gawadx1/spam-detection-ml.git

Step 2: Navigate to the Project Directory

cd spam-detection-ml

Step 3: Run the Script

Run the Python script to use the model and test sample comments:

python spam_comments_detection_gui.py

Step 4: Test Sample Comments

The script includes example comments to test the model. You can modify these or add your own samples to see how the model performs.
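
To give a sense of what testing your own comments might look like, here is a minimal sketch. It trains on a handful of made-up comments so that it runs on its own; the training texts and the helper `predict_comment` are hypothetical and are not taken from spam_comments_detection_gui.py.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Made-up training comments, only so the example is self-contained.
train_texts = [
    "subscribe to my channel",
    "check out my channel",
    "free gift cards click here",
    "great song",
    "love this video",
    "best music ever",
]
train_labels = ["Spam Comment"] * 3 + ["Not Spam"] * 3

vectorizer = CountVectorizer()
model = BernoulliNB().fit(vectorizer.fit_transform(train_texts), train_labels)


def predict_comment(text):
    """Vectorize one comment with the fitted vocabulary and classify it."""
    features = vectorizer.transform([text])  # transform, not fit_transform
    return model.predict(features)[0]


samples = ["please subscribe to my channel", "Lack of information!"]
for sample in samples:
    print(f'Sample Comment: "{sample}"')
    print(f"Prediction: {predict_comment(sample)}")
```

New comments go through `transform` rather than `fit_transform`, so they are encoded with the vocabulary learned during training. Words the model has never seen are ignored.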

GUI Preview

Code Explanation

Main Steps:

Data Loading:
- Reads the dataset and filters the relevant columns (CONTENT and CLASS).
Preprocessing:
- Maps the numeric labels to human-readable values (0 becomes Not Spam, 1 becomes Spam Comment).
- Converts text data to a bag-of-words representation using CountVectorizer.
Model Training and Evaluation:
- Splits the data into training and testing sets.
- Trains a Bernoulli Naive Bayes classifier.
- Evaluates the model’s accuracy on the test set.
Sample Prediction:
- Tests the model with predefined sample comments and predicts whether they are spam or not.
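
The steps above can be sketched roughly as follows. This is not the project's script: the toy comments, the 25% test split, and the random seed are assumptions chosen so the example runs without the dataset file.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Made-up toy data standing in for the CONTENT and CLASS columns.
texts = [
    "subscribe to my channel",
    "check out my channel for free gifts",
    "click here to win free gift cards",
    "visit my page and subscribe now",
    "great song",
    "love this video",
    "best music ever",
    "this never gets old",
]
classes = [1, 1, 1, 1, 0, 0, 0, 0]

# Preprocessing: readable labels, then a bag-of-words matrix.
label_names = {0: "Not Spam", 1: "Spam Comment"}
y = np.array([label_names[c] for c in classes])
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Model training and evaluation on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = BernoulliNB()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")
```

The sketch vectorizes before splitting, following the order of the steps listed above. Fitting the vectorizer on the training split only is a common alternative that keeps test-set vocabulary out of training.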

Output:

Accuracy of the model on the test data.
Predictions for sample comments.

Example Output

Model Accuracy: 0.95
Sample Comment: "Check this out: https://thecleverprogrammer.com/"
Prediction: Spam Comment

Sample Comment: "Lack of information!"
Prediction: Not Spam

Enhancements

Add additional features (e.g., comment length, special characters).
Experiment with other machine learning models.
Use a larger and more diverse dataset for better generalization.
Implement a user interface for real-time predictions.
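
As one possible starting point for the first two ideas, the sketch below appends comment length and a special-character count to the bag-of-words matrix. Everything here is an assumption for illustration: the helper `extra_features`, the toy comments, and the switch to LogisticRegression. The switch is made because BernoulliNB binarizes its inputs by default, which would reduce a length feature to a constant.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def extra_features(texts):
    """Return [length, special-character count] for each comment."""
    rows = []
    for text in texts:
        specials = sum(
            1 for ch in text if not ch.isalnum() and not ch.isspace()
        )
        rows.append([len(text), specials])
    return np.array(rows, dtype=float)


# Made-up toy comments; 1 = spam, 0 = not spam.
texts = [
    "Subscribe to my channel!!!",
    "FREE gift cards >>> click here $$$",
    "Check out my page: win prizes now!!!",
    "great song",
    "love this video",
    "best music ever",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(texts)

# Stack the handcrafted columns next to the word-count columns.
X = hstack([bow, csr_matrix(extra_features(texts))]).tocsr()

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)
print(X.shape)
```

Whether these features help on the real dataset is something to measure, for example by comparing test accuracy with and without the extra columns.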

License

This project is licensed under the MIT License. See the LICENSE file for more details.
