
EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports emotion classification tasks and advances Tamil language processing.


aaivu/EmoTa


EmoTa

The Tamil Emotional Speech Dataset (EmoTa) is a collection of recordings in Sri Lankan Tamil that represents the distinct dialects spoken in the northern, eastern, western, and central provinces. It aims to capture the linguistic and emotional diversity of these regions for use in speech and emotion recognition research.

License: EmoTa Academic-Commercial License


EmoTa is the first emotional speech dataset in Tamil, designed to reflect the linguistic diversity of Sri Lankan Tamil speakers. It includes 936 utterances from 22 native Tamil speakers (11 male, 11 female), each articulating 19 semantically neutral sentences across five primary emotions: Anger, Happiness, Sadness, Fear, and Neutrality.

Speaker Distribution

Key Features:

  • Speakers: 22 native Tamil speakers (11 male, 11 female)
  • Emotions: Anger, Happiness, Sadness, Fear, Neutrality
  • Sentences: 19 semantically neutral sentences to reduce lexical bias
  • Recording Quality: Captured in a controlled, soundproof environment with professional equipment
  • Total Duration: Approx. 48 minutes of speech

Dataset Structure:

The dataset is organized into one folder per emotion. Each recording is named by speaker ID, sentence ID, and the first three letters of the emotion label (emo[:3]):

EmoTa/
    ├── happy/
    ├── sad/
    ├── angry/
    ├── fear/
    └── neutral/
        └── <spkID>_<senID>_<emo[:3]>.wav
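
The layout above can be turned into a labeled file index with a few lines of Python. This is a minimal sketch, not an official loader: the helper names `parse_name` and `index_dataset` are hypothetical, and it assumes that neither the speaker ID nor the sentence ID contains an underscore and that every emotion folder holds its `.wav` files directly.

```python
from pathlib import Path

# Folder names as listed in the dataset tree.
EMOTIONS = ("happy", "sad", "angry", "fear", "neutral")


def parse_name(path):
    """Split '<spkID>_<senID>_<emo[:3]>.wav' into (spk_id, sen_id, emo_code).

    Assumption: neither ID contains an underscore, so the stem has
    exactly three underscore-separated fields.
    """
    spk_id, sen_id, emo_code = Path(path).stem.split("_")
    return spk_id, sen_id, emo_code


def index_dataset(root):
    """Return one record per .wav file, labeled by its parent folder."""
    records = []
    for emotion in EMOTIONS:
        # Sort for a reproducible order within each emotion folder.
        for wav in sorted((Path(root) / emotion).glob("*.wav")):
            spk_id, sen_id, emo_code = parse_name(wav)
            records.append({
                "path": wav,
                "speaker": spk_id,
                "sentence": sen_id,
                "emotion": emotion,    # label taken from the folder name
                "emo_code": emo_code,  # three-letter code from the file name
            })
    return records
```

Calling `index_dataset("EmoTa")` on a local copy would return a list of dictionaries that can be grouped by `speaker` when building speaker-independent train and test splits.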

Purpose:

EmoTa aims to facilitate research in Speech Emotion Recognition (SER) for the Tamil language, offering a balanced and diverse representation of emotional expressions from native Tamil speakers. It is released as open-access to support further exploration of Tamil language processing.


Contact

Name                        Email
Jubeerathan Thevakumar      jubeerathan.20@cse.mrt.ac.lk
Luxshan Thavarasa           luxshan.20@cse.mrt.ac.lk
Thanikan Sivatheepan        thanikan.20@cse.mrt.ac.lk
Uthayasanker Thayasivam*    rtuthaya@cse.mrt.ac.lk

Dataset Access
