DS202 / Data science thesis 1

About

  • This is a college course project on collecting social data from the Internet for an NLP emotion classification task.
  • Techniques applied:
      • Data collection and cleaning
      • Annotation guidelines
      • Word tokenization and deep learning models

Table of contents

Data source

Experiment pipelines

Data details

  • Label distribution

  • Word count

  • Comment length distribution over labels

  • Annotation agreement results over 5 rounds
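As a rough illustration of how the statistics listed above could be computed, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy comments, the label names, the whitespace tokenizer (Vietnamese text normally needs a proper word segmenter), and the choice of Cohen's kappa as the agreement measure. The README does not state which metric or tooling the project actually used; see the notebooks and the report for that.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (comment, label) pairs, NOT taken from the corpus.
SAMPLES = [
    ("great color", "happy"),
    ("too dry and sticky", "sad"),
    ("love this color", "happy"),
    ("it is fine", "neutral"),
]


def tokenize(text):
    """Whitespace split; a stand-in for a real word tokenizer."""
    return text.split()


def label_distribution(samples):
    """Count how many comments carry each label."""
    return Counter(label for _, label in samples)


def word_counts(samples):
    """Count token occurrences over all comments."""
    counts = Counter()
    for text, _ in samples:
        counts.update(tokenize(text))
    return counts


def mean_length_by_label(samples):
    """Average comment length (in tokens) for each label."""
    lengths = defaultdict(list)
    for text, label in samples:
        lengths[label].append(len(tokenize(text)))
    return {label: sum(v) / len(v) for label, v in lengths.items()}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumed metric for illustration only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(label_distribution(SAMPLES))
    print(word_counts(SAMPLES).most_common(3))
    print(mean_length_by_label(SAMPLES))
    # Hypothetical annotations from two annotators in one round.
    round_a = ["happy", "sad", "happy", "neutral"]
    round_b = ["happy", "sad", "neutral", "neutral"]
    print(cohen_kappa(round_a, round_b))
```

Running the agreement function once per annotation round would give one score per round, which is one way a table of results over 5 rounds could be produced.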

Code

  • Feature extraction, model training, and the other steps in this repo are implemented in Google Colab.
  • All code is organized in name.ipynb files.

Presentation slides and Report

References

  • All references are cited in the report file.

Cite us

@INPROCEEDINGS{9997964,
  author={Van Duong, Binh and Nguyen, An Trong and Ha, Chien Nhu and Duong, Hong-Hanh Thi and Tran, My-Linh Thi and Do, Trong-Hop},
  booktitle={2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)}, 
  title={UIT-VLFC: Vietnamese Lipstick Feedbacks Corpus}, 
  year={2022},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/O-COCOSDA202257103.2022.9997964}}