This project explores and leverages advanced language models to enhance sentiment prediction beyond BERT's capabilities, focusing on two architectures:
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
We study both architectures, investigate their training and optimization techniques, and apply them to classify human emotions into distinct categories.
The dataset, named "Emotion," comprises English Twitter messages annotated with six basic emotions: anger, fear, joy, love, sadness, and surprise. Sourced from the Hugging Face library, it consists of three splits:
- Train: 16,000 rows, 2 columns
- Validation: 2,000 rows, 2 columns
- Test: 2,000 rows, 2 columns
The two columns represent labels and text, with labels corresponding to different emotions (0: sadness, 1: joy, 2: love, 3: anger, 4: fear, 5: surprise).
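The label-to-emotion mapping and the DataFrame conversion can be sketched as follows (a minimal sketch: the `ID2LABEL` name, the `emotion` column, and the `to_dataframe` helper are illustrative, not part of the project code):

```python
import pandas as pd

# Label ids used by the "Emotion" dataset on the Hugging Face hub.
ID2LABEL = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def to_dataframe(rows):
    """Convert a split of {'text', 'label'} records to a DataFrame and
    add a readable emotion name as a new feature column."""
    df = pd.DataFrame(rows)
    df["emotion"] = df["label"].map(ID2LABEL)
    return df

# Loading the real splits requires network access to the Hugging Face hub:
#   from datasets import load_dataset
#   ds = load_dataset("emotion")           # train / validation / test
#   train_df = to_dataframe(ds["train"])
sample = [{"text": "i feel great", "label": 1},
          {"text": "i am so scared", "label": 4}]
print(to_dataframe(sample)["emotion"].tolist())  # → ['joy', 'fear']
```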
The project aims to build and evaluate two emotion classification models: RoBERTa and XLNet.
- Language: Python
- Libraries: datasets, numpy, pandas, matplotlib, seaborn, ktrain, transformers, tensorflow, sklearn
- Jupyter Notebook
- Google Colab Pro (Recommended)
- Install Required Libraries
- Load 'Emotion' Dataset
- Read Dataset Across Categories
- Convert Dataset to Dataframe and Create a New Feature
- Data Visualization
- Histogram Plots
- RoBERTa Model
- Create RoBERTa model instance
- Split train and validation data
- Perform Data Pre-processing
- Wrap RoBERTa in a ktrain learner object
- Find optimal learning rate
- Fine-tune RoBERTa on the dataset
- Evaluate performance metrics
- Save RoBERTa model
- Apply RoBERTa on test data and assess performance
- Understanding Autoregressive and Autoencoder Models
- XLNet Model
- Load required libraries
- Create XLNet model instance
- Split train and validation data
- Perform Data Pre-processing
- Wrap XLNet in a ktrain learner object
- Find optimal learning rate
- Fine-tune XLNet on the dataset
- Evaluate performance metrics
- Save XLNet model
- Apply XLNet on test data and assess performance
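The RoBERTa and XLNet steps above follow the same ktrain workflow, differing only in the Hugging Face model name. A minimal sketch is shown below; the function name, hyperparameter defaults, and save path are illustrative assumptions, and imports are deferred inside the function because ktrain and TensorFlow are heavy dependencies:

```python
def train_emotion_classifier(x_train, y_train, x_val, y_val, class_names,
                             model_name="roberta-base", maxlen=128,
                             batch_size=16, lr=3e-5, epochs=3):
    """Fine-tune a transformer on the Emotion data with ktrain.
    For XLNet, pass model_name="xlnet-base-cased" instead."""
    import ktrain
    from ktrain import text

    # Create the model instance and pre-process the train/validation splits.
    t = text.Transformer(model_name, maxlen=maxlen, class_names=class_names)
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_val, y_val)

    # Wrap the classifier in a ktrain learner object.
    learner = ktrain.get_learner(t.get_classifier(),
                                 train_data=trn, val_data=val,
                                 batch_size=batch_size)

    # learner.lr_find(show_plot=True)  # optional: locate a good learning rate
    learner.fit_onecycle(lr, epochs)           # fine-tune on the dataset
    learner.validate(class_names=class_names)  # performance metrics

    # Save a predictor that can be reloaded later without retraining.
    predictor = ktrain.get_predictor(learner.model, preproc=t)
    predictor.save(f"models/{model_name}-emotion")
    return predictor
```

A saved predictor can later be restored with `ktrain.load_predictor(...)` and applied to the held-out test texts via `predictor.predict(texts)`.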
Src Folder
- Engine.py
- ML_Pipeline Folder
ML_Pipeline Folder
- Contains one appropriately named Python file per step; the functions they define are called from Engine.py.
Output Folder
- Contains the best-fitted model trained on this data. This model can be loaded for future use without retraining. Note: the saved model is built on a subset of the data; running Engine.py with the full dataset retrains the models.
Lib Folder
- Contains the original IPython notebooks.