Sentiment-analysis-in-Persian-text-using-deep-learning

We developed a text classification model for Persian using the ArmanEmo dataset and pretrained models from the Hugging Face platform.

Dataset

The ArmanEmo dataset has been collected from various sources, including Persian tweets from Twitter, users' comments on Instagram, and customers' comments on Digikala (an online shopping platform). These sources were chosen to capture a diverse range of textual data that reflects individuals' emotions and opinions on social and political topics. The dataset aims to be representative and encompass different online platforms commonly used in Iran, where individuals express their ideas and emotions.

The ArmanEmo dataset consists of seven classes with the following distribution:

[Figure: class distribution of the ArmanEmo dataset]
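The per-class counts can also be recomputed directly from the dataset files. The following is a minimal sketch, assuming each file is tab-separated with the text first and the label in the last column; that layout, the function name `class_distribution`, and the placeholder labels are assumptions for illustration, not taken from the repository.

```python
import csv
import io
from collections import Counter


def class_distribution(tsv_file):
    """Count examples per label in a tab-separated file of (text, label) rows."""
    counts = Counter()
    # QUOTE_NONE keeps quotation marks inside tweets and comments as plain text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        counts[row[-1].strip()] += 1
    return counts


# Tiny in-memory stand-in for a dataset file; the labels are placeholders.
sample = io.StringIO("text one\tLABEL_A\ntext two\tLABEL_B\ntext three\tLABEL_A\n")
distribution = class_distribution(sample)
total = sum(distribution.values())
for label, count in distribution.most_common():
    print(f"{label}: {count} ({count / total:.1%})")
```

For a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")` instead of the in-memory sample.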

Model

We used the Hugging Face platform, which provides access to a wide range of pretrained models, together with its supporting libraries for tasks such as tokenization. We experimented with several models, including ParsBert, ALBERT, roberta_facebook, and roberta-base-ft-udpos28. The highest accuracy on the test data was achieved with the persian_xlm_roberta_large model.
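Loading any of these checkpoints for a seven-class task follows the same pattern with the `transformers` Auto classes. The sketch below is an illustration, not the project's actual code: `checkpoint` stands for whichever Hub identifier or local path is being tried, and `build_label_maps`, `load_classifier`, and the `CLASS_i` names are hypothetical placeholders.

```python
def build_label_maps(labels):
    """Return (id2label, label2id) dictionaries for a list of class names."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_classifier(checkpoint, labels):
    """Load a tokenizer and a sequence-classification model for `checkpoint`."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),  # size of the new classification head
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


# Placeholder class names; substitute the seven ArmanEmo labels.
LABELS = [f"CLASS_{i}" for i in range(7)]
id2label, label2id = build_label_maps(LABELS)
```

Keeping the checkpoint as a parameter makes it straightforward to run the same training and evaluation loop over each candidate model and compare the results.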

We went through the following four steps:

1. Preprocessed the train and test datasets.
2. Tested different models and selected the best one.
3. Saved the best model.
4. Evaluated performance using F1 score, precision, recall, and accuracy.
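As an illustration of what the preprocessing step can involve for Persian social media text, here is a minimal sketch. It is an assumption about typical cleaning (mapping Arabic letter variants to their Persian forms, removing URLs and mentions, collapsing whitespace), not a description of the notebook in the "preprocess" folder; `normalize_persian` is a hypothetical helper name.

```python
import re

# Arabic Yeh and Kaf are commonly mapped to Persian Yeh and Keheh.
ARABIC_TO_PERSIAN = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_persian(text):
    """Unify character variants, drop URLs and mentions, tidy whitespace."""
    text = text.translate(ARABIC_TO_PERSIAN)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


# The sample uses Arabic Kaf and Yeh, which the function rewrites.
cleaned = normalize_persian("@user  كتاب عالي بود https://example.com")
print(cleaned)
```

Applying the same function to the train, test, and any new data keeps the model's inputs consistent across training and prediction.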

Results

The evaluation results on the test data were as follows:

Precision: 0.732781028240936
Recall: 0.7115551694178974
F1 score: 0.7094612321075684
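For a multi-class task, single precision, recall, and F1 values are averages over the classes. The source does not state which averaging scheme produced the numbers above, so the sketch below uses macro averaging purely as an assumption; the function names and the toy labels `a`, `b`, `c` are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def macro_scores(matrix):
    """Return macro-averaged (precision, recall, f1) from a confusion matrix."""
    n = len(matrix)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column sum
        actual = sum(matrix[k])  # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with three placeholder classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
matrix = confusion_matrix(y_true, y_pred, ["a", "b", "c"])
precision, recall, f1 = macro_scores(matrix)
print(matrix)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because the macro F1 is the mean of the per-class F1 scores rather than the harmonic mean of the averaged precision and recall, it can be lower than both, as in the reported results.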

The confusion matrix was as follows:

[Figure: confusion matrix on the test data]

For detailed implementation steps and a comprehensive guide to executing the code, see the accompanying PDF document (IntroDL_Project_SadeghPoulaei_FatemehAskari.pdf).

How to run

1. Download the ArmanEmo dataset using the provided link.
2. Run the notebook in the "preprocess" folder.
3. Run the notebook in the "main_model" folder.
4. Optionally, to classify your own dataset, run the notebook in the "predict_evaluate" folder. First apply the preprocessing from step 2 to your data, then use the saved best model in the "save_best_model" folder to classify it.
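Outside the notebooks, classifying your own text could look roughly like the sketch below. It assumes the "save_best_model" folder holds a model and tokenizer written with `save_pretrained`, which the source does not confirm; `classify`, `top_label`, and `softmax` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities, shifted for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most probable label and its probability for one example."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts, model_dir="save_best_model"):
    """Classify preprocessed texts with the model stored in `model_dir`."""
    # Lazy imports: torch and transformers are only needed for real inference.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [top_label(row, model.config.id2label) for row in logits.tolist()]


# The post-processing can be checked without a model; labels are placeholders.
label, prob = top_label([0.0, 2.0, 1.0], {0: "A", 1: "B", 2: "C"})
print(label, round(prob, 4))
```

Texts passed to `classify` should go through the same preprocessing as the training data, as described in step 4.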

References

https://huggingface.co/docs/transformers/en/tasks/sequence_classification

https://arxiv.org/pdf/2207.11808.pdf
