- Introduction
- Installation
- Usage
- Exploratory Data Analysis
- Vectorization
- Model Comparison
- Contributing
- Acknowledgments
The project identifies online toxic comments and computes a probability score for each of six categories: `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, and `identity_hate`.
To install and run this project, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/username/Toxic-Comment-Classification-Challenge.git
cd Toxic-Comment-Classification-Challenge
```
Load the training and test data files using the `read_csv` function of the pandas library.
The training data has 7 columns: `comment_text` and the six label columns.
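A minimal loading sketch, assuming the Kaggle CSV files are saved as `train.csv` and `test.csv` in the working directory (file names are an assumption, not fixed by this README):

```python
import pandas as pd

# Assumed file names for the downloaded Kaggle data.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# The six label columns described above.
label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
print(train[["comment_text"] + label_cols].head())
```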
The data is cleaned using the `re` library.
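One possible cleaning function built on `re`; the exact regular-expression patterns used in the project may differ:

```python
import re

def clean_text(text):
    """Lowercase a comment and strip noisy characters; patterns are illustrative."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    text = re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace
    return text

train["comment_text"] = train["comment_text"].apply(clean_text)
test["comment_text"] = test["comment_text"].apply(clean_text)
```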
The `comment_text` column is vectorized using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm.
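A sketch of the TF-IDF step with scikit-learn's `TfidfVectorizer`; the n-gram range and vocabulary size shown here are assumptions, not the project's exact settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Example settings; tune ngram_range and max_features for your own runs.
vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    strip_accents="unicode",
    ngram_range=(1, 2),
    max_features=50000,
)
X_train = vectorizer.fit_transform(train["comment_text"])
X_test = vectorizer.transform(test["comment_text"])
```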
The logistic regression model outperformed Support Vector Machines, a Multi-layer Perceptron, and a MiniLM model from Sentence Transformers; a per-label training sketch follows the table below.
| Model | Leaderboard Score |
|---|---|
| Logistic Regression | 0.97461 |
| Support Vector Machines | 0.8242 |
| Multi-layer Perceptron | 0.9092 |
| Sentence Transformers | 0.9596 |
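A sketch of how one binary logistic regression per label can produce the submitted probability scores; the solver and regularization strength are assumptions, not the project's exact configuration:

```python
from sklearn.linear_model import LogisticRegression

# Train one binary classifier per label and keep the positive-class probability.
predictions = {}
for label in label_cols:
    clf = LogisticRegression(C=1.0, solver="liblinear")
    clf.fit(X_train, train[label])
    predictions[label] = clf.predict_proba(X_test)[:, 1]
```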
- The Scikit-Learn library and its contributors
- Kaggle for providing the Toxic Comment Classification Challenge
- The open-source community for their invaluable contributions