Suggestions on how to best separate speaker identities #923

Rahul-Brito · 2022-03-08T17:04:59Z


Hello! Your work is awesome. I have been experimenting with your pipeline for a few months (several months last year, and now again this year).

We have a population of speakers who are all reading the same passage. We want to determine the relative distances between speakers, i.e., figure out which speakers have similar voices and which have different ones. Ideally there is some kind of gradient: speaker 1 might be most similar to speaker 0, speaker 2 less so, and speaker 3 far away (and of course speakers 1, 2, and 3 would have their own interrelationships).
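One way to express this kind of gradient is a matrix of pairwise cosine distances between one embedding per speaker, with each row sorted from nearest to farthest. The sketch below assumes you already have an `(n_speakers, dim)` array of speaker embeddings; `cosine_distance_matrix` and `rank_neighbours` are hypothetical helper names, not part of pyannote, and the 2-D toy vectors only stand in for real embeddings.

```python
import numpy as np


def cosine_distance_matrix(embeddings):
    """Pairwise cosine distances between the rows of an (n_speakers, dim) array."""
    X = np.asarray(embeddings, dtype=float)
    # L2-normalise each speaker embedding so the dot product is the cosine similarity.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - X @ X.T
    # Clean up floating-point noise: exact zeros on the diagonal, no negative distances.
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 2.0)


def rank_neighbours(D, speaker):
    """Indices of the other speakers, ordered from most to least similar to `speaker`."""
    order = np.argsort(D[speaker], kind="stable")
    return [int(j) for j in order if j != speaker]


# Hypothetical toy "embeddings"; real speaker embeddings have far more dimensions.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.2]])
D = cosine_distance_matrix(toy)
print(rank_neighbours(D, 0))  # [1, 2, 3]
```

Keeping the full matrix `D` (rather than only cluster labels) preserves the graded relationships between every pair of speakers, which is what a "relative distance" question needs.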

I was curious whether you have a suggestion on the best way to do this with your pretrained pipelines/models. They seem effective enough so far, and we don't have a big training dataset, so we have not done any retraining.

What I have tried so far:

I implemented your older tutorial and was able to get nice cluster-like point clouds. I experimented with different t-SNE parameters and with UMAP as well. With this, I could do a few things: look at metrics that tell me how well the high-dimensional embeddings are preserved in the low-dimensional space, run correlation analyses against known feature sets, etc. How would you recommend approaching this?
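As one example of a "how well is the high-dimensional structure preserved" check, the sketch below computes the Spearman rank correlation between all pairwise distances before and after projection. This particular metric is an assumption chosen for illustration (scikit-learn's `sklearn.manifold.trustworthiness` is a common neighbourhood-based alternative), and the helper names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (fine for a few hundred speakers)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def _ranks(values):
    """Rank transform (0 = smallest); assumes no exact ties, as with continuous distances."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def distance_preservation(high_dim, low_dim):
    """Spearman correlation between pairwise distances before and after projection.

    1.0 means the 2-D layout keeps the same ordering of speaker-to-speaker
    distances as the original embeddings; values near 0 mean that ordering is lost.
    """
    iu = np.triu_indices(len(high_dim), k=1)  # each unordered pair counted once
    r_high = _ranks(pairwise_distances(high_dim)[iu])
    r_low = _ranks(pairwise_distances(low_dim)[iu])
    return float(np.corrcoef(r_high, r_low)[0, 1])
```

t-SNE in particular is designed to preserve local neighbourhoods rather than global distances, so a low score here does not mean the plot is useless; it does suggest computing the speaker-to-speaker distances themselves in the original embedding space rather than reading them off the 2-D plot.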

hbredin (Maintainer) · 2022-03-09T13:51:24Z

I recommend using the SpeechBrain pretrained ECAPA-TDNN model for that purpose.
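A minimal sketch of extracting such embeddings, assuming the `speechbrain/spkrec-ecapa-voxceleb` checkpoint is the pretrained ECAPA-TDNN model meant here, and assuming that averaging the embeddings of several recordings is an acceptable way to get one vector per speaker. `load_ecapa`, `embed_file`, `speaker_embedding`, and the `recordings` mapping in the usage comment are hypothetical names.

```python
import numpy as np


def load_ecapa():
    """Load SpeechBrain's ECAPA-TDNN speaker encoder trained on VoxCeleb."""
    # Imported lazily so the numpy helper below works without SpeechBrain installed.
    # SpeechBrain >= 1.0 also exposes this class as speechbrain.inference.speaker.EncoderClassifier.
    from speechbrain.pretrained import EncoderClassifier
    return EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")


def embed_file(classifier, path):
    """Return one ECAPA embedding (192 dimensions for this checkpoint) for a whole file."""
    import torchaudio

    signal, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
    signal = signal.mean(dim=0, keepdim=True)  # down-mix to mono, keep a batch dim of 1
    if sample_rate != 16000:  # the VoxCeleb model expects 16 kHz audio
        signal = torchaudio.functional.resample(signal, sample_rate, 16000)
    embedding = classifier.encode_batch(signal)  # shape: (1, 1, 192)
    return embedding.squeeze().detach().cpu().numpy()


def speaker_embedding(utterance_embeddings):
    """Average several L2-normalised embeddings of one speaker into a single unit vector."""
    X = np.asarray(utterance_embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0)
    return mean / np.linalg.norm(mean)


# Usage (hypothetical `recordings`: speaker id -> list of wav paths):
# classifier = load_ecapa()
# per_speaker = {
#     spk: speaker_embedding([embed_file(classifier, p) for p in paths])
#     for spk, paths in recordings.items()
# }
```

Since every speaker reads the same passage, a single recording per speaker may already be enough; the resulting unit vectors can then be compared with cosine distance, the same similarity SpeechBrain's own verification interface uses.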
