Speaker Count #568
-
pyannote.audio does speaker diarization, not speaker counting. Of course, one can hope that the diarization output will be perfect and contain the correct number of speakers, but that is seldom the case (especially for a large number of speakers). One reason is that the speaker diarization pipeline is optimized for diarization error rate, not for speaker count. If speaker count is really what you are looking for (and you do not care about speaker diarization), I'd suggest you train a model to do just that. Unfortunately, this is not currently implemented in pyannote.audio. The upcoming v2.0 might change that by making it very easy to design and train models for new tasks.
-
Thanks for the answer. However, diarization answers the question "who speaks when", and speaker counting is just the "who". I can understand that there may be better-trained models that focus only on speaker counting (especially for a large number of speakers), but I think solving the speaker count problem should also be possible through diarization.
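To make the "just the who" idea concrete, here is a minimal sketch of reading a speaker count off a diarization result. It assumes the output has already been flattened into `(start, end, label)` tuples; the function name `count_speakers` and the `min_total_speech` filter are hypothetical helpers for illustration, not part of pyannote.audio. The count is only as good as the diarization itself, so a spurious extra cluster shows up as an extra speaker unless it is filtered out.

```python
from collections import defaultdict


def count_speakers(segments, min_total_speech=0.0):
    """Count distinct speaker labels in a diarization result.

    segments: iterable of (start_sec, end_sec, label) tuples (assumed format).
    min_total_speech: labels whose summed speech time is below this many
        seconds are ignored (hypothetical heuristic for spurious clusters).
    """
    total = defaultdict(float)
    for start, end, label in segments:
        total[label] += max(0.0, end - start)
    return sum(1 for duration in total.values() if duration >= min_total_speech)


# Toy diarization output: label "C" only speaks for about 0.3 seconds.
segments = [
    (0.0, 4.2, "A"),
    (4.2, 9.0, "B"),
    (9.0, 9.3, "C"),
    (9.3, 15.0, "A"),
]

print(count_speakers(segments))                        # every label counts
print(count_speakers(segments, min_total_speech=1.0))  # very short labels dropped
```

If the result is a `pyannote.core.Annotation`, the tuples can presumably be built from `itertracks(yield_label=True)`, and `labels()` gives the unfiltered list of speakers directly.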
-
Is it correct that the number of speakers in an audio file is determined only by the speaker embedding model?
Following this tutorial for speaker embeddings, t-SNE shows 4 clusters, but the ground truth is 3 speakers.
https://github.com/pyannote/pyannote-audio/tree/master/tutorials/pretrained/model
Is this just bit error rate?
Does anyone know whether the number of speaker embedding clusters equals the number of speakers in an audio file?
Are there recommendations for better clustering algorithms (k-means, etc.)?
Thanks for any answer.
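One way to look at the cluster-count question: the number of clusters is not fixed by the embeddings alone, it also depends on the clustering algorithm and its stopping criterion, and a t-SNE plot is a visualization that can split or merge groups on its own. The sketch below is a toy, pure-Python average-linkage agglomerative clustering with a cosine distance threshold. It is not the pyannote.audio pipeline; the function names, the threshold values, and the 3-dimensional "embeddings" are all made up for illustration. Unlike k-means, this approach does not need the number of speakers in advance, but the count it returns moves with the threshold.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, embeddings):
    """Mean pairwise cosine distance between two clusters of indices."""
    total = sum(
        cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
    )
    return total / (len(c1) * len(c2))


def estimate_num_speakers(embeddings, threshold):
    """Merge the closest clusters until the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], embeddings)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return len(clusters)


# Made-up embeddings: three well-separated pairs of vectors.
embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],
]

for threshold in (0.001, 0.5, 2.0):
    print(threshold, estimate_num_speakers(embeddings, threshold))
```

On this toy data, a very strict threshold leaves every embedding in its own cluster, a moderate one recovers the three groups, and a very loose one merges everything, which is one plausible way to end up with 4 clusters for 3 speakers.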