This repository provides code used for musical genre recognition with artificial neural networks used in Buchmüller, A., Gerloff C. (2020). Music Genre Classification using Artificial Neural Networks. The paper has been published in Saefken, B., Silbersdorff, A., & Weisser, C. (Eds.). (2020). Learning deep. Universitätsverlag Göttingen, Göttingen, 2020. doi:10.17875/gup2020-1338 as part of a volume on Deep Learning.
Abstract from the paper:
Music genre recognition is a promising field of research in the area of music information retrieval (MIR). Genre classifiers have many real world applications, e.g. as a way to automatically tag large data sets suited as inputs to recommender systems. In this paper we propose a way to sample song data with the Spotify API and create a music genre classifier using artificial neural networks. We compare different feature sets to each other and evaluate their performance and accuracy using confusion matrices and more sophisticated metrics like F1 scores. We show that convolutional neural networks using timbre values perform well on this task and also propose ways to handle class imbalance.
Abstract from the volume:
Artificial intelligence is considered to be one of the most decisive topics in the 21th century. Deep learning algorithms, which are the basis of artificial intelligence applications, are of central interest for researchers but also for students that strive to build up academic knowledge and practical competences in this field. The Deep Learning Seminar at the University of Göttingen follows the central notion of the Humboldtian model of higher education and offers graduate students of applied statistics the opportunity to conduct their own research. The quality of the results motivated us to publish the most promising seminar papers in this volume. For the selected papers a full peer review process was conducted. The presented contributions cover a broad range of deep learning topics. The articles in the first part of this volume may serve the reader as introduction to deep learning algorithms. Subsequently, research applications allow the reader to gain deep insights into some of the latest developments in the field of artificial intelligence.
- Python 3.7.5
- Tensorflow - 1.14.0
- Keras - 2.3.1
- NumPy - 1.16.2
- Pandas - 0.25.3
- matplotlib - 3.0.3
- seaborn 0.9.0
- scikit-learn - 0.21.3
- spotipy - 2.4.4
The data used in our models was aquired from Spotify. It contains audio features provided by Echo Nest. To make authorized calls to the Spotify Web API you will need access to a Spotify developer account. A guide on how to access can be found here. The approach and used playlists are detailed in the data_acquisition notebook. Note however that the playlists change constantly so you will probably be unable to pull exactly the same data.
This repository contains the following notebooks:
- data_acquisition: shows how to sample your own song data from Spotify with spotipy.
- data_prep: pre-processing of the data for our neural networks. Used to extract timbre values and pitch for each track.
- c1m1: 1D Convolutional neural network approach using timbre vectors.
- c1m2: 1D Convolutional neural network approach using pitch vectors.
- c2m1: 2D Convolutional neural network approach using matrices combining timbre and pitch vectors.
Building a genre classifier with Keras can be broken down into three steps.
This is done in the data_acquisition and data_prep notebooks. You may skip the last step of the data_prep notebook (reducing raw data to Numpy arrays) but note that your raw data can get very large quickly. Our sample of approximately 10.000 tracks has a size of 4GB. By extracting only timbre and pitch vectors and compressing into a .npz file, we were able to cut the size down to 600MB.
This is done in the c1m1, c1m2 and c1m2 notebooks. All models are multi-layered CNN's. You may choose your own architecture, add more layers or filters and choose a differenz optimizer.
Also done in the c1m1, c1m2 and c1m2 notebooks. You'll need seaborn and scikit-learn to create confusion matrices similarly to ours. Below an example of our timbre model: