I will try to use several different models to do singing voice separtion.
In 'singing voice separtion' task, the input to the model is mixed signal, the output from the model are two signals, one for vocal and the other for backgound music.
In the very early stage, now, I have just implemented the basic DNN model to do this task.
This DNN model is trained on 'MIR-1K' dataset collected by Chao-Ling Hsu and Prof. Jyh-Shing Roger Jang from National Taiwan University. And my final results are approximated to:
Metrics | Music | Vocal |
---|---|---|
GNSDR | 5.766 | 5.738 |
GSIR | 11.614 | 13.232 |
GSAR | 8.051 | 7.455 |
If you have ideas to improve this results, please let me know. The results in paper [1] is better than this one, but I have no idea how to improve it.
And I am trying to use GAN to do this task now. (Well, unfortunately, it failed :(, the model cannot converge.)
-
Singing Voice Separation and Pitch Extraction from Monaural Polyphonic Audio Music Via DNN and Adaptive Pitch Tracking
-
SVSGAN: SINGING VOICE SEPARATION VIA GENERATIVE ADVERSARIAL NETWORK