Genre Classification using Machine Learning

Introduction

As audiophiles, we have always been curious about how the way a piece of music 'reads' shapes its style, and whether that reading has any quantitative aspect. We began this project to understand how music genres come to be, and whether stylistic choices can be quantified as features of an image.

Dataset

For this project, we used the GTZAN Music Genre Dataset. It takes the original GTZAN dataset from 2002 and extends it with a CSV file of extracted audio features, as well as mel-spectrogram images of all the audio files, effectively giving three datasets for the same task: raw audio, features, and images. We explored all three across the models we built.

Contents

  • ./data contains the dataset.
  • ./models contains the notebooks for each approach we tried.

Findings

In our first attempt, we built a simple CNN on top of the VGG16 model using transfer learning, only for it to give us rather erratic results, either because VGG16 does not suit the data (it is meant for feature extraction from natural images, not spectrogram plots), or because of issues with the data itself.
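For reference, here is a minimal sketch of the kind of transfer-learning setup described above; the layer sizes, input shape, and training settings are illustrative rather than the exact values from our notebook.

```python
# Minimal transfer-learning sketch: frozen VGG16 base with a small classifier head.
# Layer sizes, input shape, and optimizer settings are illustrative.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional base

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # GTZAN has 10 genres
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```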

Results - Transfer Learning

Moving forward, we decided to forgo transfer learning and build a model from scratch. An alternative attempt with the Xception model is possible, and is something we will probably implement in the future. Our results using the provided images were still odd, which strengthened our belief that something was wrong with the image dataset.
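A rough sketch of what "from scratch" means here: a small convolutional stack trained directly on the spectrogram images. The architecture below is illustrative, not the exact one in the notebook.

```python
# Illustrative from-scratch CNN for spectrogram images (architecture and sizes are assumptions).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```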

Results - Image

We then took a more rudimentary approach: we took the dataset of 30-second tracks, converted them to mel-spectrograms ourselves, and fed those spectrograms into the CNN. The results were much clearer and commensurate with the kind of performance other CNN solutions to this problem have been offering all this time.
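The conversion step looks roughly like the following, using librosa; the file path and spectrogram parameters here are illustrative, not the exact values from our notebook.

```python
# Sketch: turn a 30-second GTZAN clip into a mel-spectrogram image.
# The file path and parameters (n_mels, figure size) are illustrative.
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load("data/genres_original/blues/blues.00000.wav", duration=30)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # convert power to decibels

plt.figure(figsize=(4, 3))
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel")
plt.axis("off")
plt.savefig("blues.00000.png", bbox_inches="tight", pad_inches=0)
plt.close()
```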

Results - Mel-Spec

This fetched us an accuracy of 68%, which, while a fair number to stop at, is about as much as any CNN solution we found in papers or on the web. To push for better results, we picked up the features CSV file and worked on that. A simple KerasClassifier with standard scaling and KFold cross-validation fetched us an accuracy of ~75%, markedly higher than all the CNN solutions we had explored, without being badly overfit to the dataset.
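A sketch of that setup, assuming the features CSV from the Kaggle GTZAN release; the file name, column names, network size, and hyperparameters below are illustrative rather than the exact notebook values.

```python
# Sketch: StandardScaler + small Keras classifier on the features CSV, scored with KFold CV.
# File name, column names, architecture, and hyperparameters are assumptions.
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow.keras import layers, models
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # legacy wrapper; newer TF needs scikeras

df = pd.read_csv("data/features_30_sec.csv")
X = df.drop(columns=["filename", "label"]).values
y = LabelEncoder().fit_transform(df["label"])

def build_model():
    model = models.Sequential([
        layers.Dense(256, activation="relu", input_shape=(X.shape[1],)),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", KerasClassifier(build_fn=build_model, epochs=100, batch_size=32, verbose=0)),
])
scores = cross_val_score(pipeline, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())
```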

What we learnt is that genre classification, while an interesting topic to get into, isn't something that can be perfectly quantified from features or spectrograms alone. Above all, it taught us that you have to understand the data and its implications for the result before you can expect the model to. We also learnt that GridSearchCV is rather convenient.
