Automatic-Music-Genre-Classification

A comparison of the performance of different machine learning techniques on an automatic music genre classification task. Each song was classified as the most prominent genre it exhibits, out of 10 possible ones. The two performance metrics used in the evaluation phase were log loss and accuracy.

Project Description

The project revolves around the task of identifying the music genre of songs. This is useful as a way to group music into categories that can later be used for recommendation or discovery. The problem of music genre classification is difficult: while some genre distinctions are fairly straightforward (e.g. heavy metal vs classical), others are fuzzier (e.g. rock vs blues). The task was to construct a predictor h(x) for each genre Y, which takes the features x and maps them to the probability that the song belongs to that genre (e.g. "rock") or not. The song is then classified as the most prominent genre, i.e. the one with the largest predicted probability.
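
As a minimal sketch of the decision rule described above, the snippet below picks the genre with the largest probability. The function name, the ordering of the genres and the example probabilities are assumptions made for illustration; they are not taken from the project's code.

```python
# Genre labels as listed under "The Data"; the ordering is an assumption.
GENRES = ["Pop_Rock", "Electronic", "Rap", "Jazz", "Latin",
          "RnB", "International", "Country", "Reggae", "Blues"]


def most_prominent_genre(probabilities):
    """Return the genre whose predictor h(x) gave the largest probability."""
    if len(probabilities) != len(GENRES):
        raise ValueError("expected one probability per genre")
    best_index = max(range(len(GENRES)), key=lambda i: probabilities[i])
    return GENRES[best_index]


# Hypothetical per-genre outputs h(x) for a single song (made-up numbers).
example_scores = [0.10, 0.05, 0.02, 0.40, 0.03, 0.05, 0.05, 0.10, 0.05, 0.15]
predicted = most_prominent_genre(example_scores)
print(predicted)
```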

This work explores logistic regression, support vector machines and gradient boosting as classifiers, paired with optimization techniques such as feature scaling and grid search.
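
One way such a setup might look in scikit-learn is sketched below: feature scaling and a classifier are chained in a pipeline, and grid search picks the regularization strength. The synthetic data, the step names and the grid values are assumptions for illustration only and do not reproduce the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the song features (not the project's dataset).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Scaling is fitted inside each cross-validation fold, together with the model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid over the inverse regularization strength C.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_log_loss")
search.fit(X, y)

print(search.best_params_)
probabilities = search.predict_proba(X)  # one column per class
```

The "clf" step could be swapped for sklearn.svm.SVC(probability=True) or sklearn.ensemble.GradientBoostingClassifier to compare the other two classifier families under the same scaling and search procedure.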

The Data

The dataset, along with the songs' correct classifications, was fetched from AllMusic.com. It contained preprocessed audio information: in particular, the raw audio signals had previously been transformed into carefully chosen features.

The labels from the dataset were taken from the following list:

'Pop_Rock'
'Electronic'
'Rap'
'Jazz'
'Latin'
'RnB'
'International'
'Country'
'Reggae'
'Blues'

Evaluation

Evaluation was performed on the Kaggle online platform, as part of a competition among all the students of the course. The two metrics evaluated on Kaggle were accuracy and log loss.

The accuracy competition can be found here. The log loss competition can be found here.
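
For reference, the two metrics can be sketched in plain Python as follows. This uses the standard definitions (fraction of correct predictions, and mean negative log probability assigned to the true class); the clipping constant and the example values are assumptions, and the exact scoring performed by Kaggle may differ in such details.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def log_loss(true_labels, predicted_probabilities, eps=1e-15):
    """Mean negative log probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probabilities):
        # Clip to avoid log(0); eps is an assumed constant.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Made-up example with three samples and three classes (indices 0..2).
true_labels = [0, 2, 1]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.5, 0.4, 0.1]]
# Hard predictions are the arg max of each probability row.
predicted = [max(range(3), key=lambda k: row[k]) for row in probs]

print(accuracy(true_labels, predicted))
print(log_loss(true_labels, probs))
```

Accuracy only looks at the arg max, whereas log loss also penalizes confident wrong probabilities, which is why the two can rank models differently.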

In addition, confusion matrices were generated for every technique in order to analyse its performance in more detail.
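
A confusion matrix counts, for each true genre, how often each genre was predicted. The sketch below builds one in plain Python; the helper name, the reduced label set and the sample predictions are made up for illustration and are not results from the project.

```python
def confusion_matrix(true_labels, predicted_labels, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for five songs, restricted to three genres for brevity.
labels = ["Pop_Rock", "Jazz", "Blues"]
true = ["Pop_Rock", "Pop_Rock", "Jazz", "Blues", "Blues"]
pred = ["Pop_Rock", "Blues", "Jazz", "Pop_Rock", "Blues"]

matrix = confusion_matrix(true, pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal entries show which genres get confused with each other, such as the fuzzier pairs mentioned in the project description.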

Output

The output file for the accuracy competition had the form of Sample id followed by Sample Genre. An example can be seen below:

The output file for the log loss metric paired each Sample id with a probability value for every genre. Again, an example can be seen below.
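
The following is a rough sketch of how files with these two layouts could be written using Python's csv module. The header names, the use of genre names rather than numeric labels, the column order and the sample values are all assumptions; the exact format required by the competitions is not reproduced here.

```python
import csv
import io

# Genre labels as listed under "The Data"; the column order is an assumption.
GENRES = ["Pop_Rock", "Electronic", "Rap", "Jazz", "Latin",
          "RnB", "International", "Country", "Reggae", "Blues"]


def write_accuracy_file(handle, sample_ids, predicted_genres):
    """Write one row per sample: id followed by the predicted genre."""
    writer = csv.writer(handle)
    writer.writerow(["Sample_id", "Sample_genre"])  # hypothetical header names
    for sample_id, genre in zip(sample_ids, predicted_genres):
        writer.writerow([sample_id, genre])


def write_log_loss_file(handle, sample_ids, probability_rows):
    """Write one row per sample: id followed by one probability per genre."""
    writer = csv.writer(handle)
    writer.writerow(["Sample_id"] + GENRES)  # hypothetical header names
    for sample_id, row in zip(sample_ids, probability_rows):
        writer.writerow([sample_id] + list(row))


# In-memory buffers stand in for real files; the values are made up.
accuracy_buffer = io.StringIO()
write_accuracy_file(accuracy_buffer, [1, 2], ["Jazz", "Blues"])
accuracy_text = accuracy_buffer.getvalue()

log_loss_buffer = io.StringIO()
write_log_loss_file(log_loss_buffer, [1], [[0.1] * 10])
log_loss_text = log_loss_buffer.getvalue()

print(accuracy_text)
print(log_loss_text)
```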

Built With

Python 3.6.3 - Python version used
scikit-learn - Machine learning library used to implement the methods
Jupyter Notebook - Jupyter notebooks were used in development

Authors

Filipa Ramos - Initial work - FilipaRamos
Pedro Pontes - Initial work - pmpontes
