DS202 / Data science thesis 1

About

This is a college course project about collecting social data from the Internet for an NLP emotion classification task.
Techniques applied:

Data collection and cleaning

Annotation guidelines

Word tokenization and deep learning models

Data source

Raw data
Clean data

Experiment pipelines

Data details

Label distribution

Word count

Comment length distribution over labels
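As a rough illustration of how the label distribution and per-label comment length statistics could be computed, here is a minimal sketch. The function names, the toy comments, and the labels are hypothetical and not taken from this repo. Splitting on whitespace is also an assumption; the project's own word tokenization step may count words differently.

```python
from collections import Counter, defaultdict


def label_distribution(labels):
    """Return the share of each label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def mean_length_by_label(comments, labels):
    """Return the mean whitespace-token count of comments for each label."""
    lengths = defaultdict(list)
    for comment, label in zip(comments, labels):
        # Assumption: whitespace splitting stands in for real word tokenization.
        lengths[label].append(len(comment.split()))
    return {label: sum(values) / len(values) for label, values in lengths.items()}


# Hypothetical toy data, for illustration only.
comments = [
    "great color",
    "too dry on my lips",
    "love it",
    "not worth the price at all",
]
labels = ["positive", "negative", "positive", "negative"]

distribution = label_distribution(labels)
mean_lengths = mean_length_by_label(comments, labels)
```

With real data, `comments` and `labels` would be read from the cleaned dataset instead of being hard-coded.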

Annotation agreement results over 5 rounds
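This README does not state which agreement measure was used. As one common option, the sketch below computes Cohen's kappa between two annotators. The function name `cohen_kappa` and the example labels are hypothetical, and the report may use a different measure (for example, one suited to more than two annotators).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(labels_a) | set(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations for one round, for illustration only.
annotator_1 = ["joy", "joy", "anger", "anger"]
annotator_2 = ["joy", "anger", "anger", "anger"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Running this once per annotation round would give one agreement score per round, which can then be compared across the 5 rounds.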

Code

Feature extraction, model training, and the related steps in this repo are implemented in Google Colab.
All code is organized in name.ipynb files.

Presentation slides and Report

Report slides
Report
Guidelines

References

All references are cited in the report file.

Cite us

@INPROCEEDINGS{9997964,
  author={Van Duong, Binh and Nguyen, An Trong and Ha, Chien Nhu and Duong, Hong-Hanh Thi and Tran, My-Linh Thi and Do, Trong-Hop},
  booktitle={2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)}, 
  title={UIT-VLFC: Vietnamese Lipstick Feedbacks Corpus}, 
  year={2022},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/O-COCOSDA202257103.2022.9997964}}
