Amnaikram1/Toxic-Comment-Classification-Challenge

Toxic Comment Classification Challenge with TF-IDF

Introduction

The project scores online comments for toxicity. For each comment it predicts a probability for each of six categories: 'toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'.

Installation

To install and run this project, follow these steps:

  1. Clone the repository:
    git clone https://github.com/username/Toxic-Comment-Classification-Challenge.git
    cd Toxic-Comment-Classification-Challenge

Usage

Load the training and test sets with the read_csv function of the pandas library.
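As a minimal sketch of this step, the snippet below reads a CSV into a DataFrame with pandas. The helper name load_comments and the two in-memory sample rows are illustrative assumptions, not part of the repository; the sample stands in for the real training file so the code runs without any download.

```python
import io

import pandas as pd

# The six label columns named in the introduction.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(source):
    """Read a CSV of comments; `source` may be a file path or a file-like object.

    Hypothetical helper for illustration only.
    """
    return pd.read_csv(source)


# Tiny in-memory stand-in for the training file (made-up rows).
sample_csv = io.StringIO(
    "comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    '"Thanks for the helpful edit!",0,0,0,0,0,0\n'
    '"You are an idiot.",1,0,0,0,1,0\n'
)

train = load_comments(sample_csv)
print(train.shape)          # rows and columns of the loaded frame
print(train[LABELS].sum())  # number of positive examples per label
```

With the real data, the same helper would be called with the paths of the downloaded training and test files.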

Exploratory Data Analysis

The training data has 7 columns: comment_text and six binary label columns. The comment text is cleaned with regular expressions using Python's re module.
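The exact cleaning rules are not listed here, so the following is only a sketch of what regex-based cleaning with re might look like. The function name clean_text and the specific rules (lowercasing, dropping URLs, keeping only letters and apostrophes, collapsing whitespace) are assumptions chosen for illustration.

```python
import re


def clean_text(text):
    """Normalize a raw comment (illustrative rules, not the project's exact ones)."""
    text = text.lower()
    # Replace URLs with a space so they do not become tokens.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep lowercase letters, apostrophes, and whitespace; blank out the rest.
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Check THIS out!!! http://example.com\n\nIt's   great :)"))
```

Applied to a DataFrame, this could be used as train["comment_text"].map(clean_text).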

Vectorization

The comment_text column is converted to numeric feature vectors using Term Frequency-Inverse Document Frequency (TF-IDF) weighting.
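A minimal sketch of TF-IDF vectorization with scikit-learn's TfidfVectorizer follows. The toy comments and the default vectorizer settings are assumptions; the settings actually used in the project are not stated. The vectorizer is fitted on the training text only and then reused for the test text, so both matrices share the same columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy comments standing in for the cleaned comment_text column.
train_comments = ["you are great", "you are awful", "thanks for the edit"]
test_comments = ["you are awful and rude"]

vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights from the training text only.
X_train = vectorizer.fit_transform(train_comments)

# Reuse the fitted vocabulary; words unseen in training are ignored.
X_test = vectorizer.transform(test_comments)

print(X_train.shape, X_test.shape)
print(sorted(vectorizer.vocabulary_))
```

Both results are sparse matrices with one row per comment and one column per vocabulary term.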

Model Comparison

The logistic regression model achieved the highest leaderboard score, outperforming the Support Vector Machine, Multi-layer Perceptron, and Sentence Transformers (MiniLM) models.

Model                      Leaderboard Score
Logistic Regression        0.97461
Support Vector Machines    0.8242
Multi-layer Perceptron     0.9092
Sentence Transformers      0.9596
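One common way to produce per-category probabilities with logistic regression on TF-IDF features is to fit an independent binary classifier for each label. The sketch below assumes that setup; the helper names fit_per_label and predict_scores, the toy comments, the use of only two of the six labels, and the hyperparameters are all illustrative and not taken from the repository.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def fit_per_label(X, targets, labels):
    """Fit one independent binary logistic regression per label column."""
    models = {}
    for label in labels:
        model = LogisticRegression(max_iter=1000)
        model.fit(X, targets[label])
        models[label] = model
    return models


def predict_scores(models, X):
    """Return a DataFrame with one positive-class probability column per label."""
    return pd.DataFrame(
        {label: model.predict_proba(X)[:, 1] for label, model in models.items()}
    )


# Made-up training rows; only two labels are shown to keep the example short.
train = pd.DataFrame({
    "comment_text": [
        "you are an idiot", "thanks for your help", "shut up you fool",
        "great article, well done", "what an awful idiot", "nice work on the edit",
    ],
    "toxic":  [1, 0, 1, 0, 1, 0],
    "insult": [1, 0, 0, 0, 1, 0],
})
labels = ["toxic", "insult"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train["comment_text"])
models = fit_per_label(X_train, train, labels)

X_new = vectorizer.transform(["you idiot", "thanks, nice work"])
scores = predict_scores(models, X_new)
print(scores)
```

Each row of scores corresponds to one comment and each column to one label, which matches the one-probability-per-category output described in the introduction.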

Acknowledgments

  • The Scikit-Learn library and its contributors
  • Kaggle for providing the Toxic Comment Classification Challenge
  • The open-source community for their invaluable contributions
