Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval Systems: a Survey

arXiv preprint

This repository contains the tables accompanying the paper "Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval Systems: a Survey" by Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller and Dorien Herremans.

Representations

Event-based tokenization

Elementary tokens

Tokenization Score-based / Performance-based Alphabet Grouping Vocab. size Data
ABC notation Score Text alphabet Bar patching N/A Monophonic
Synchronized Multi-Track ABC Notation (2024) (MuPT) Score Text alphabet BPE N/A Multi-track
MIDI-like (2018) Performance
<Note-ON> (MIDI value)
<Note-OFF> (MIDI value)
<Time-shift> (absolute time)
<Velocity> (integer)
BPE, BPE Unigram 388 Piano
LakhNES (2019) Performance
<Note-ON-[Trk]> (MIDI value)
<Note-OFF-[Trk]> (MIDI value)
<Time-shift> (absolute time)
- 630 Multi-track
REMI (2020) Score
<Pitch> (MIDI value)
<Duration> (music time)
<Velocity> (integer)
<Bar>
<Position> (music time)
<Chord> (class)
BPE, BPE, BPE Unigram 332 Piano
REMI+ (2022) Score
REMI alphabet + features:
<Instrument> (class)
<Time-Signature> (class)
<Tempo> (integer)
- N/A Multi-track
Lee & al. (2022) (ComMU) Score
REMI alphabet + metadata:
<BPM> (integer)
<Key> (class)
<Instrument> (class)
<Time-Signature> (class)
<Pitch-range> (class)
<Number-of-measures> (number)
<Min-velocity> (integer)
<Max-velocity> (integer)
<Rhythm> (class)
- 728 Multi-track
MusIAC (2022) Score
REMI alphabet + control info:
<Tensile-strain> (class)
<Cloud diameter> (class)
<Density> (class)
<Polyphony> (class)
<Occupation> (class)
- 360 Multi-track
Gover & al. (2022) Score
<Pitch> (MIDI value)
<Duration> (music time)
<Position> (music time)
<Bar>
<Hand> (class)
- N/A Piano
Wu & Yang (2023) (MuseMorphose) Score
<Pitch-[Trk]> (MIDI value)
<Duration-[Trk]> (music time)
<Velocity-[Trk]> (integer)
<Bar>
<Position> (music time)
<Tempo> (integer)
- 3440 Multi-track
MultiTrack (2020) (MMM) Performance
<Start-piece>
<Start-track>/<End-track>
<Start-bar>/<End-bar>
<Start-fill>/<End-fill>
<Note-ON> (MIDI value)
<Note-OFF> (MIDI value)
<Time-shift> (absolute time)
<Instrument> (class)
<Density level> (integer)
- 440 Multi-track
MMR (2022) (SymphonyNet) Score
<Start-score>/<End-score>
<Start-bar>/<End-bar>
<Chord> (class)
<Change-track>
<Position> (integer)
<Pitch> (MIDI value)
<Duration> (music time)
BPE N/A Multi-track
TSD (2023) Performance
<Pitch> (MIDI value)
<Velocity> (integer)
<Duration> (absolute time)
<Time-shift> (absolute time)
<Rest> (absolute time)
<Program> (class)
BPE 249 Multi-track
Structured (2021) Performance
<Pitch> (MIDI value)
<Velocity> (integer)
<Duration> (absolute time)
<Time-shift> (absolute time)
- 428 Piano
Chen & al. (2020) Score (tabs)
<Pitch> (MIDI value)
<Duration> (music time)
<Velocity> (integer)
<Position> (music time)
<Bar> (integer)
<String> (integer)
<Fret> (integer)
<Technique> (class)
<Grooving> (class)
- 231 Guitar
Li & al. (2023) Score
<Pitch-class> (class)
<Octave> (integer)
<Duration> (music time)
<Bar> (integer)
<Position> (music time)
<Velocity> (integer)
- N/A Monophonic
DadaGP (2021) Score (tabs)
<Start>/<End>
<Instrument:note> (class)
<String> (integer)
<Fret> (integer)
<Drums:note> (MIDI value)
<Effect> (class)
<Wait> (integer)
BPE Unigram 2140 Guitar
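
To make the difference between a performance-based and a score-based alphabet concrete, here is a minimal Python sketch that serializes the same three notes with a MIDI-like alphabet and with a REMI-like alphabet. It is an illustration only, not the reference implementation of either tokenization. The `Note` type, the function names, the sixteenth-note grid, the `Name_value` token spelling and the token ordering are all assumptions made for this example, and time shifts are counted in grid steps rather than in absolute time.

```python
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note record used only for this sketch."""
    pitch: int     # MIDI pitch number
    start: int     # onset, in sixteenth-note steps (assumed grid)
    duration: int  # length, in sixteenth-note steps
    velocity: int  # MIDI velocity


def midi_like_tokens(notes):
    """Performance-style stream of Velocity / Note-ON / Note-OFF / Time-shift tokens."""
    events = []
    for n in notes:
        events.append((n.start, 1, n.pitch, n.velocity))      # 1 marks a note-on
        events.append((n.start + n.duration, 0, n.pitch, 0))  # 0 marks a note-off
    events.sort()  # by time; at equal times note-offs come before note-ons
    tokens, now = [], 0
    for time, is_on, pitch, velocity in events:
        if time > now:
            tokens.append(f"Time-shift_{time - now}")
            now = time
        if is_on:
            tokens.append(f"Velocity_{velocity}")
            tokens.append(f"Note-ON_{pitch}")
        else:
            tokens.append(f"Note-OFF_{pitch}")
    return tokens


def remi_like_tokens(notes, steps_per_bar=16):
    """Score-style stream of Bar / Position / Pitch / Velocity / Duration tokens."""
    tokens, current_bar = [], -1
    for n in sorted(notes, key=lambda n: (n.start, n.pitch)):
        bar, position = divmod(n.start, steps_per_bar)
        while current_bar < bar:  # one Bar token per bar line crossed
            tokens.append("Bar")
            current_bar += 1
        tokens.append(f"Position_{position}")
        tokens.append(f"Pitch_{n.pitch}")
        tokens.append(f"Velocity_{n.velocity}")
        tokens.append(f"Duration_{n.duration}")
    return tokens


notes = [Note(60, 0, 4, 80), Note(64, 4, 4, 80), Note(67, 16, 8, 90)]
print(midi_like_tokens(notes))
print(remi_like_tokens(notes))
```

In the MIDI-like stream a note's length is implicit in the distance between its `Note-ON` and `Note-OFF` tokens, whereas the REMI-like stream states it explicitly with a `Duration` token and locates the note metrically with `Bar` and `Position`.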

Composite tokens

Tokenization Musical features Embedded object Data
Luo & al. (2020) (MG-VAE)
<Pitch> (class)
<Interval> (number)
<Rhythm> (class)
3-long vector Monophonic
Zhang (2020)
<Program> (class)
<Pitch> (integer)
<Velocity> (integer)
3-long vector + Time-shift Multi-track
PiRhDy (2020)
<Chroma> (class)
<Octave> (integer)
<Inter-onset-interval> (music time)
<Note-state> (class)
<Velocity> (integer)
5-long vector Multi-track
Zixun & al. (2021)
<Pitch> (one-hot)
<Duration> (one-hot)
<Current-chord> (one-hot)
<Next-chord> (one-hot)
<Bar> (one-hot)
246-long vector Lead sheet
Octuple (2021) (MusicBERT)
<Time-signature> (class)
<Tempo> (integer)
<Bar> (integer)
<Position> (music time)
<Instrument> (class)
<Pitch> (MIDI value)
<Duration> (music time)
<Velocity> (integer)
8-long vector Multi-track
Dong & al. (2023) (MMT)
<Type> (class)
<Beat> (integer)
<Position> (music time)
<Pitch> (MIDI value)
<Duration> (music time)
<Instrument> (class)
6-long vector Multi-track
Dalmazzo & al. (2023) (Chordinator)
<Chord-root> (class)
<Chord-nature> (class)
<Chord-extensions> (class)
<MIDI-array> (multi-hot)
<Slash-chord> (boolean)
8-long vector Chord sequences
Wang & al. (2021) (MuseBERT)
<Onset> (music time)
<Pitch> (MIDI value)
<Duration> (music time)
+ factorized properties
Matrices of factorized attributes and relations Multi-track
MuMIDI (2020)
<Bar>
<Position> (music time)
<Tempo> (integer)
<Track> (class)
<Chord> (class)
<Pitch> (MIDI value)
<Drum> (MIDI value)
<Velocity> (integer)
<Duration> (music time)
Note / Event grouping Multi-track
Compound Word (2021)
<Family> (class)
<Time-signature> (class)
<Bar> (integer)
<Beat> (music time)
<Chord> (class)
<Tempo> (integer)
<Pitch> (MIDI value)
<Duration> (music time)
<Velocity> (integer)
Note / Event grouping Piano
Di & al. (2021)
<Type> (class)
<Bar/beat> (integer)
<Density> (class)
<Strength> (class)
<Instrument> (integer)
<Pitch> (MIDI value)
<Duration> (music time)
Note / Event grouping Multi-track
Makris & al. (2022)
Encoder input:
<Onset> (number)
<Group> (class)
<Type> (class)
<Duration> (music time or none)
<Value> (any - depends on type)
Decoder output:
<Onset> (number)
<Drums> (integer)
Note / Event grouping Encoder: Multi-track
Decoder: Drums
Unsupervised Compound Word (2024)
<Family> (class)
<Time-signature> (class)
<Bar> (integer)
<Beat> (music time)
<Chord> (class)
<Tempo> (integer)
<Pitch> (MIDI value)
<Duration> (music time)
<Velocity> (integer)
Note / Event grouping + learning (BPE) Piano
REMI_Track (2024)
<Instrument> (class)
<Bar>
<Position> (music time)
<BPE> or <Pitch> (MIDI value) or <Velocity> (integer) or <Duration> (music time)
5-long vector + learning (BPE) Multi-track
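
As a rough illustration of what a composite token is, the sketch below packs each note into a single 8-field tuple using the feature list given for Octuple above. It is a simplified example under stated assumptions, not the MusicBERT implementation. The `OctupleLikeToken` type, the `to_octuple_like` function, the input tuple layout, the sixteenth-note grid and the constant time signature and tempo are all hypothetical choices made for this example.

```python
from typing import NamedTuple


class OctupleLikeToken(NamedTuple):
    """One composite token: eight features describing a single note."""
    time_signature: str
    tempo: int
    bar: int
    position: int
    instrument: int
    pitch: int
    duration: int
    velocity: int


def to_octuple_like(notes, steps_per_bar=16, time_signature="4/4", tempo=120):
    """Turn (instrument, pitch, start, duration, velocity) tuples into composite tokens.

    `start` and `duration` are counted in sixteenth-note steps (assumed grid);
    time signature and tempo are held constant for simplicity.
    """
    tokens = []
    # Order notes by onset, then instrument, then pitch.
    for instrument, pitch, start, duration, velocity in sorted(
        notes, key=lambda n: (n[2], n[0], n[1])
    ):
        bar, position = divmod(start, steps_per_bar)
        tokens.append(
            OctupleLikeToken(
                time_signature, tempo, bar, position,
                instrument, pitch, duration, velocity,
            )
        )
    return tokens


notes = [(0, 60, 0, 4, 80), (32, 36, 16, 8, 90)]
for token in to_octuple_like(notes):
    print(token)
```

The resulting sequence has exactly one element per note, whereas an elementary-token alphabet spends several tokens on each note. A model consuming such tuples would typically embed each field separately and combine the embeddings, which is not shown here.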

Models

Recurrent models

RNN

| Model | Recurrent unit | Architecture | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| RNN-RBM (2012) | Vanilla RNN | RBM + RNN | Multi-track | Time-slice (piano roll) | Free generation |
| RNN-DBN (2014) | Vanilla RNN | RBM + DBN + RNN | Multi-track | Time-slice (piano roll) | Free generation |

LSTM

| Model | Recurrent unit | Architecture | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| Folk-RNN (2016) (Code) | LSTM | LSTM | Monophonic | ABC notation | Free generation |
| C-RNN-GAN (2016) (Code) | LSTM | GAN + Bi-LSTM | Multi-track | Pitch + duration + time-shift + velocity (composite tokens) | Free generation |
| Song from Pi (2016) | LSTM | Hierarchical + LSTM | Multi-track | Custom features (composite tokens) | Free generation (melody, chord, drum generation) |
| Melody / Attention-RNN (2016) (Code) | LSTM | LSTM (+ Attention) | Monophonic | Note-ON / Note-OFF | Priming |
| DeepBach (2017) (Code) | LSTM | Bi-LSTM | 4-part chorales | Time-slice-based | Harmonization; Free generation |
| Anticipation-RNN (2017) (Code) | LSTM | LSTM | Monophonic | Pitch + duration (time-slice-based) | Infilling |
| JamBot (2017) (Code) | LSTM | LSTM | Multi-track | Time-slice (piano roll) | Chord generation; Chord-conditioned generation |
| Note-RNN / RL Tuner (2017) (Code) | LSTM | LSTM (+ Reinforcement Learning) | Monophonic | Note-ON / Note-OFF | Free generation |
| PerformanceRNN (2018) (Code) | LSTM | LSTM | Piano | MIDI-like | Expressive performance generation |
| Chen & al. (2018) (Code) | LSTM | Bi-LSTM | Piano | Time-slice (piano roll) | Roman Numeral Analysis |
| StructureNet (2018) | LSTM | LSTM | Monophonic | Custom features (composite tokens) | Free generation |
| Music-VAE (2018) (Code) | LSTM | VAE + LSTM | Monophonic | MIDI-like | Samples interpolation; Free generation |
| JazzGAN (2018) | LSTM | GAN + LSTM | Lead sheet | Pitch + duration + chord (event-based) | Chord-conditioned generation |
| DeepJ (2018) (Code) | LSTM | Biaxial LSTM | Piano | Time-slice (piano roll) | Free generation; Style embedding analysis |
| Chen & al. (2019) | LSTM | Bi-LSTM | Lead sheet | Time-slice (piano roll) | Chord-conditioned generation |
| Makris & al. (2019) | LSTM | LSTM (Drums) / Feed-forward (Context) | Multi-track | Drums: event-based; Context: time-slice | Drums accompaniment generation |
| MahlerNet (2019) (Code) | LSTM | VAE + Bi-LSTM | Multi-track | Event-based | Samples interpolation |
| GrooVAE (2019) (Code) | LSTM | VAE + Bi-LSTM | Drums | Time-slice (drumroll) | Drum Infilling; Tap2Drum; Humanization |
| Wu & al. (2019) | LSTM | Hierarchical + Bi-LSTM | Monophonic | Note-ON / Note-OFF | Structure-conditioned generation |
| VirtuosoNet (2019) (Code) | LSTM | Hierarchical + VAE + Bi-LSTM + Attention | Piano | Custom features (composite tokens) | Expressive performance generation |
| Amadeus (2019) | LSTM | Hierarchical + Reinforcement Learning | Piano | Pitch + duration (event-based) | Free generation |
| MuseAE (2020) (Code) | LSTM | Adversarial Auto-encoder + LSTM | Multi-track | Time-slice (piano roll) | Samples interpolation; Embedding analysis |
| Jin & al. (2020) | LSTM | LSTM + Reinforcement Learning | Multi-track | Time-slice (piano roll) | Free generation |
| GGA-MG (2020) (Code) | LSTM | Bi-LSTM + Genetic Algorithm | Monophonic | ABC notation | Free generation |
| Yu & al. (2021) (Code) | LSTM | GAN + LSTM | Monophonic | Pitch + duration (event-based) | Lyrics-conditioned generation |
| CM-HRNN (2021) (Code) | LSTM | Hierarchical + LSTM | Lead sheet | Pitch + duration + chord + bar (composite tokens) | Chord-conditioned generation |
| Keerti & al. (2022) | LSTM | Bi-LSTM + Attention | Monophonic | Pitch + duration (event-based) | Sequence reconstruction |
| LStoM (2022) (Code) | LSTM | Bi-LSTM | Multi-track | Custom features (event-based) | Melody extraction |
| Turker & al. (2022) | LSTM | VAE + LSTM | Piano | Note-ON / Note-OFF | Sequence reconstruction; Latent space analysis |

GRU

| Model | Recurrent unit | Architecture | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| MIDI-VAE (2018) (Code) | GRU | VAE + GRU | Multi-track | Time-slice (piano roll) | Style transfer; Samples interpolation |
| XiaoIce Band (2018) | GRU | GRU + Attention | Multi-track | Pitch + duration + chord (event-based) | Chord-conditioned generation; Arrangement generation |
| Songwriter (2019) | GRU | GRU + Attention | Monophonic | Pitch + duration (event-based) | Lyrics-conditioned generation |
| Yang & al. (2019) (Code) | GRU | VAE + bi-GRU | Lead sheet | Time-slice (piano roll) + chords (chromagram) | Melody contour-conditioned generation; Chord-conditioned generation |
| BUTTER (2020) (Code) | GRU | VAE + GRU | Monophonic | Time-slice (piano roll) | Text-based query; Music captioning; Text-conditioned generation |
| Kong & al. (2020) (Code) | GRU | Bi-GRU | Piano | Time-slice (piano roll) | Composer classification |
| MG-VAE (2020) | GRU | VAE + Bi-GRU | Monophonic | Pitch + interval + duration (event-based) | Free generation |
| PianoTree-VAE (2020) (Code) | GRU | VAE + bi-GRU | Piano / Multi-track | Time-slice (pianoroll / MIDI-like) | Samples interpolation; Free generation; Embedding analysis |
| Su & al. (2022) | GRU | Bi-GRU + CNN + Attention | Monophonic | Pitch + duration (time-slice-based) | Free generation |

Attention-based models

End-to-end models

Transformer decoder-only architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| Music Transformer (2018) (Code) | Transformer decoder | Relative attention | Piano / Choral | MIDI-like | Priming; Harmonization |
| Chen & al. (2020) | Transformer-XL | - | Guitar tabs | REMI-derived (tabs) | Free tabs generation |
| Pop Music Transformer (2020) (Code) | Transformer-XL | - | Piano | REMI | Priming; Free generation |
| Jazz Transformer (2020) (Code) | Transformer-XL | - | Lead sheet | REMI-derived (chords) | Free generation |
| PopMAG (2020) | Transformer-XL | - | Multi-track | MuMIDI | Accompaniment generation |
| Wu & al. (2020) | Transformer-XL | - | Piano | MIDI-like-derived (composite tokens) | Free generation |
| Di & al. (2020) (Code) | Transformer decoder | - | Multi-track | Compound-word-derived (rhythm family) | Video-to-music |
| Chang & al. (2021) (Code) | XLNet | Relative bar encoding | Piano | Compound Word | Infilling |
| Compound Word Transformer (2021) (Code) | Linear Transformer decoder | - | Piano | Compound Word | Priming; Free generation |
| Sarmento & al. (2021) (Code) | Transformer-XL | - | Guitar tabs + multi-track | DadaGP | Metadata-conditioned generation |
| Sulun & al. (2022) (Code) | Music Transformer | - | Multi-track | MIDI-like | Emotion-conditioned generation |
| ComMU (2022) (Code) | Transformer-XL | - | Multi-track | REMI + metadata | Metadata-conditioned generation; Multi-track combination |
| SymphonyNet (2022) (Code) | Linear Transformer | 3-D positional encoding | Orchestral | MMR | Chord-conditioned generation; Priming; Free generation |
| Li & al. (2023) | Transformer-XL | - | Lead sheet | REMI-derived (pitch class) | Free generation |
| Multitrack Music Transformer (2023) (Code) | Transformer decoder | - | Orchestral | MMT | Free generation; Instrument-conditioned generation; Priming |
| GTR-CTRL (2023) | Transformer-XL | - | Guitar tabs + multi-track | DadaGP | Instrument-conditioned generation; Genre-conditioned generation |
| ShredGP (2023) | Transformer-XL | - | Guitar tabs | DadaGP | Style-conditioned generation |
| Choir Transformer (2023) (Code) | Transformer decoder | Relative attention | 4-part chorales | Chord + pitch (event-based) | Harmonization |
| Guo & al. (2023) (Code) | Transformer decoder with custom attention | Fundamental music embedding; RIPO attention | Monophonic | FME | Priming |
| Compose & Embellish (2023) (Code) | Transformer decoder | - | Multi-track | REMI | Lead sheet priming; Accompaniment refinement |
| RHEPP-Transformer (2023) (Code) | Transformer decoder | - | Piano | Octuple | Expressive performance generation |
| Angioni & al. (2023) (Code) | Transformer decoder | - | Multi-track | TSD-like | Style classification |
| Chordinator (2023) (Code) | minGPT (no pre-training) | - | Chords | Custom chord features (+ MIDI array) | Chord generation |
| MMT-I/-G/-GI (2023) | Transformer decoder | - | Multi-track | REMI+ (+ genre, instrument) | Genre-conditioned generation; Instrument-conditioned generation |
| Agarwal & al. (2024) | Transformer decoder | Structure-informed Positional Encoding | Multi-track | Pianoroll time-slices | Free generation; Accompaniment generation |

Transformer encoder-only architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| MTBert (2023) | BERT (no-pre-training) | - | 4-part chorales | Interval + duration (event-based) | Fugue form analysis |

Transformer encoder-decoder architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| Transformer-VAE (2020) | Transformer encoder-decoder | - | Monophonic | Pitch + duration (time-slice-based) | Priming |
| Harmony Transformer (2021) (Code) | Transformer encoder-decoder | - | Piano | Pianoroll time-slices | Roman Numeral Analysis |
| Makris & al. (2021) (Code) | Transformer encoder-decoder | - | Lead sheet | Encoder: bar features / Decoder: chord + pitch + duration | Emotion-conditioned generation |
| Liutkus & al. (2021) (Code) | Performer | Stochastic positional encoding | Multi-track | REMI / MIDI-like-derived (multi-track) | Free generation; Groove continuation |
| Gover & al. (2022) | BART | - | Piano | REMI-derived (hands token) | Arrangement generation |
| Museformer (2022) (Code) | Transformer encoder-decoder with custom attention | Fine-/coarse-grained attention; Bar selection | Multi-track | REMI | Free generation |
| Theme Transformer (2022) (Code) | Transformer encoder-decoder | Theme-aligned positional encoding | Multi-track | REMI-derived (theme tokens) | Theme-conditioned generation |
| FIGARO (2022) (Code) | Transformer encoder-decoder | - | Multi-track | REMI+ | Controllable generation |
| MuseMorphose (2023) (Code) | Transformer encoder + Transformer-XL | In-attention conditioning | Piano | REMI-derived (multi-track) | Style transfer; Controllable generation |
| Accomontage 3 (2023) (Code) | Transformer encoder-decoder | Instrument embedding | Multi-track | Pianoroll time-slices | Accompaniment generation |
| TeleMelody (2023) (Code) | Transformer encoder-decoder | - | Monophonic | Bar + position + pitch + duration (event-based) | Lyrics-to-melody |
| MuseCoco (2023) (Code) | Text2Attr: BERT; Attr2Music: Linear Transformer | - | Multi-track | REMI | Text-to-MIDI |
| Multi-view MidiVAE (2024) | Transformer encoder-decoder | - | Multi-track | Octuple | Free generation |
| MelodyT5 (2024) (Code) | T5 | - | Monophonic | ABC Notation | Melody generation; Melody harmonization; Melody segmentation |
| Composer's Assistant 2 (2024) | T5 | - | Multi-track | REMI+ -derived | Infilling; Controllable generation |
| BandControlNet (2024) (Code) | Transformer encoder-decoder | Structure enhanced self-attention | Multi-track | REMI_Track | Controllable generation |

Model combinations

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| Zhang (2020) | Generator: Transformer decoder; Discriminator: Transformer encoder | - | Multi-track | MIDI-like-derived (composite tokens) | Free generation |
| Transformer-GAN (2021) (Code) | Generator: Transformer-XL; Discriminator: BERT | - | Piano | MIDI-like | Free generation |
| Dai & al. (2021) | Generator: Transformer encoder; Discriminator: LSTM | - | Multi-track | Pitch + rhythm (event-based) | Structure-conditioned generation; Chord-conditioned generation |
| Choi & al. (2021) (Code) | Chord encoder: Bi-LSTM; Rhythm decoder: Transformer decoder; Pitch decoder: Transformer decoder | - | Lead sheet | Pitch + rhythm + chord (time-slice-based) | Chord-conditioned generation |
| Makris & al. (2022) (Code) | Bi-LSTM encoder / Transformer encoder | - | Multi-track | Compound-word-derived | Drums accompaniment generation |
| Neves & al. (2022) (Code) | Generator: Linear Transformer; Discriminator: Linear Transformer | Local prediction map | Piano | REMI | Emotion-conditioned generation |
| Q&A (2023) (Code) | PianoTree-VAE; Transformer decoder | Instrument embedding | Multi-track | Piano roll time-slices | Accompaniment generation |
| Duan & al. (2023) | Generator: Transformer encoder; Discriminator: LSTM | - | Monophonic | Pitch + duration + rest (event-based) | Lyrics-to-melody |
| Video2Music (2023) (Code) | GRU + Transformer encoder-decoder | - | Multi-track | MIDI-like | Video-to-music |

Pre-trained models

Transformer encoder-only architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| MuseBERT (2021) (Code) | BERT | Generalized relative positional encoding | Multi-track | MuseBERT representation | Controllable generation; Chord analysis; Accompaniment refinement |
| MidiBERT-Piano (2021) (Code) | BERT | - | Piano | REMI / Compound Word | Melody extraction; Velocity prediction; Composer classification; Emotion classification |
| MusicBERT (2021) (Code) | RoBERTa | Bar-level masking | Multi-track | Octuple | Melody completion; Accompaniment suggestion; Genre classification; Style classification |
| DBTMPE (2021) | Transformer encoder | - | Multi-track | Pitch combinations + durations (event-based) | Style classification |
| MRBERT (2023) | BERT | Melody/rhythm cross-attention | Lead sheet | Pitch + duration (event-based) | Free generation; Infilling; Chord analysis |
| SoloGPBERT (2023) | BERT | - | Guitar tabs | DadaGP | Guitar player classification |
| Shen & al. (2023) | MidiBERT-Piano | Pre-training tasks (quad-attribute masking / key prediction) | Multi-track | Compound Word simplified | Melody extraction; Velocity prediction; Composer classification; Emotion classification |
| CLaMP (2023) (Code) | Text encoder: DistilRoBERTa; Music encoder: BERT | - | Lead sheet | ABC notation-derived | Text-based semantic music search; Music recommendation; Music classification |

Transformer decoder-only architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| LakhNES (2019) (Code) | Transformer-XL | - | Multi-track | MIDI-like | Free generation |
| Musenet (2019) | GPT-2 | Timing embedding / Structural embedding | Multi-track | MIDI-like | Priming |
| MMM (2020) (Code) | GPT-2 | - | Multi-track | MultiTrack representation | Free generation; Priming; Inpainting; Controllable generation |
| Angioni & al. (2023) (Code) | GPT-2 | - | Multi-track | TSD-like | Priming |
| Zhang & al. (2023) (Code) | GPT-3 | - | Drums | Drumroll time-slices | Priming |
| Bubeck & al. (2023) | GPT-4 | - | Text / Mono-track | ABC notation | Text-to-ABC |
| ChatMusician (2024) (*Code) | Llama-2 | - | Text / Mono-track | ABC notation | Text-to-ABC |
| ComposerX (2024) (*Code) | GPT-4 | - | Text / Multi-track | ABC notation | Text-to-ABC |
| MuPT (2024) | Transformer decoder | - | Multi-track | ABC notation-derived | Free generation |
| MuseBarControl (2024) | Linear Transformer | Auxiliary task pre-adaptation | Piano | REMI | Controllable generation; Chord-conditioned generation |

Transformer encoder-decoder architecture

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| MusIAC (2022) (Code) | Transformer encoder-decoder | - | Multi-track | REMI | Infilling; Controllable generation |
| Li & al. (2023) | Transformer encoder-decoder | - | Lead sheet | Pitch + duration (event-based) | Harmony analysis; Chord generation |
| Fu & al. (2023) | MusicBERT + Music Transformer | - | Multi-track | Octuple | Melody completion; Accompaniment suggestion; Genre classification; Style classification |
| Multi-MMLG (2023) | XLNet + MuseBERT | - | Multi-track | Compound-word-derived | Melody extraction |
| PianoBART (2024) | BART | Multi-level object masking | Piano | Octuple | Priming; Melody extraction; Velocity prediction; Composer classification; Emotion classification |

Comparative studies

| Model | Base model | MIR mechanism | Data | Representation | Tasks |
| --- | --- | --- | --- | --- | --- |
| Ferreira & al. (2023) (Code) | GRU / Performance-RNN / GPT-2 / Music Transformer / MuseNet | - | Piano | MIDI-like | Free generation |
| Wu & al. (2023) (Code) | BERT / GPT-2 / BART | - | Lead sheet | ABC notation | Text-to-ABC |

Cite

If you find this useful, please cite our paper.

@misc{le2024surveymirnlp,
    title={Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey}, 
    author={Dinh-Viet-Toan Le and Louis Bigo and Mikaela Keller and Dorien Herremans},
    year={2024},
    eprint={2402.17467},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}