-
Notifications
You must be signed in to change notification settings - Fork 1
mattmcvicar/lyric_database
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
LabROSA Lyrics / Audio Repository ================================= This is the documentation for LYRICAL: the Labrosa repositorY of Recorded Information Concerning Audio and Lyrics. Contained within are the lyrics to popular music tracks, annotated at the structure level (verse, chorus etc.) with accompanying acapella and polyphonic audio. The database is intended to be used for providing a common dataset for Automatic Lyric Transcription (ALT). Also contained within are therefore scripts for reproducing our baseline ALT performance using the Sphinx4 Automatic Speech Recognition system. Installation ============ Lyrics ------ The easiest way of getting the lyrics is to say: git clone https://github.com/mattmcvicar/lyric_database.git Audio ----- For queries relating to audio, please email mattjamesmcvicar@gmail.com Sphinx4 ------- If you want to run Reproduce_baseline.py, you will need to set up a Sphinx4 project. The following list of conjures was found to be effective on Mac OSX 10.6.8, revision revision 12489. If the following doesn't work for you, it may be that the sphinx4 structure has changed, in which case email mattjamesmcvicar@gmail.com and I'll update this README. 1) Download latest stable Sphinx4 *source* release from: http://cmusphinx.sourceforge.net/wiki/download/ 2) unzip and cd into repo jar xvf sphinx4-5prealpha-src.zip cd sphinx4-5prealpha 3) build Sphinx4 and all demos ant 4) We're going to adapt one of the demos for our own use. Open up /src/apps/edu/cmu/sphinx/demo/transcriber/Transcriber.java in an editor. First, note that Transcriber currently spits out quite a bit of info, so comment out everything in the while loop (lines 51-63) and replace it with the following: System.out.format("%s\n", result.getHypothesis()); so that it just echoes the best hypothesis for each detected utterance to stdout 5) re-complile the demo: ant 6) Run the demo, with 512mb starting heap memory, up to 1GB. Run the following: java -Xms512m -Xmx1024m -jar ./bin/Transcriber.jar It's had a pretty good go at transcribing some digits. You should see: what zero is zero zero one nine oh two one oh cyril one eight zero three 7) The demo currently reads the audio file stored in src/apps/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav and transcribes it using a fairly general-purpose model. Let's adapt it to take an audio file as an additional input. Set the line: recognizer.startRecognition(new URL("file:src/apps/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav").openStream()); to be: recognizer.startRecognition(new URL("file:" + args[0] ).openStream()); 8) re-compile the demo for the last time: ant 9) Now try running on your own audio: java -Xms512m -Xmx1024m -jar ./bin/Transcriber.jar 10) That's it! Note you can also change the acoustic model, language model and dictionary in Transcriber.java too. Just remember to re-compile after each change. To reproduce our baseline results, see the file resources/Reproduce_baseline.py Contents ======== Lyrics ------ The database is split into two disjoint subsets: "Train" and "Test", and two genres: "Rap" and "Sing". The "Train" set is to be used for training or adapting acoustic or language models, the test set reserved for evaluation. We believe the acoustic and language models for rapped and sung audio to be quite different, so have kept them distinct. Each file is written in plain text, with each line being one of: - annotation line - lyric line - empty line Annotation lines are enclosed within square brackets and contain structural information. lyric lines contain the lyrics to the song, made up of the lower and upper-case letters of the latin alphabet, together with the following punctuation set: ,!?'- (commas, exclamation marks, questions marks, single quotes and hyphens). Empty lines are used purely for aesthetic purposes. Example file snippet (first verse of "30_Seconds_to_Mars_-_The_Kill.lyrics"): ----------------------------------- [VERSE] What if I wanted to break Laugh it all off in your face What would you do? Oh, oh oh oh What if I fell to the floor Couldn't take all this anymore What would you do do, do do, do do? ----------------------------------- Resources --------- Since the database contains a considerable amount of slang, we also include a file resources/Train_Test_Holdout.dict which contains pronounciations from the CMU set of 39 unstressed phonemes for each word in the train, test, and holdout sets. These are not used in our baseline experiments (many will most likely not be in standard language models) except in evaluation, but will be essential in training new language models. Reproduce_baseline.py --------------------- Also contained within the repository is a simple python script "Reproduce_baseline.py", used to reproduce the results reported in the accompanying paper (see below) The script assumes you are familiar with Sphinx4. If you require assistance in installing and setting up Sphinx4, please see: http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4 Attribution =========== If you make use of this database in your research, please consider citing the following paper: Contact ======= For all queries, please contact mattjamesmcvicar@gmail.com
About
Scripts and Data for the LabROSA Aligned Lyric Database
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published