Audio clustering - how to aggregate embeddings from NNFP #2639
claudiolaas asked this question in Q&A · Unanswered
Replies: 1 comment
-
You can simply concatenate the ten consecutive one-second clip embeddings into one larger embedding (dim 1280), or try other operators that generate a single embedding per audio input.
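A minimal sketch of the concatenation approach suggested above, assuming the NNFP operator returns a `(n_segments, 128)` NumPy array of per-second embeddings for each clip (the function name is illustrative, not part of the Towhee API):

```python
import numpy as np

def aggregate_segments(seg_embeddings: np.ndarray) -> np.ndarray:
    """Flatten per-second segment embeddings into one clip embedding.

    seg_embeddings: array of shape (n_segments, 128), one row per
    1-second segment produced by the NNFP operator.
    """
    return seg_embeddings.reshape(-1)  # shape (n_segments * 128,)

# e.g. 10 one-second segments of dim 128 -> one vector of dim 1280
clip = aggregate_segments(np.random.rand(10, 128))
print(clip.shape)  # (1280,)
```

Note that concatenation only yields same-size vectors when every clip produces the same number of segments, so shorter clips would need padding or truncation before clustering.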
-
Hi all,
I have audio clips of variable length (<10 sec). I want to create embeddings and use some form of unsupervised clustering to group the clips into buckets. I have had decent success with Resemblyzer + k-means, but I thought a different approach might yield better results. I also tried speaker diarization from pyannote.audio, but that took too long.
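A sketch of the embeddings-plus-k-means pipeline described above, using scikit-learn; random vectors stand in for real clip embeddings (from Resemblyzer or NNFP), and the cluster count is an illustrative guess you would tune:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for real clip embeddings: one fixed-length vector per clip.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 256))  # 100 clips, dim 256

# Group clips into buckets with k-means; n_clusters would be tuned
# in practice (e.g. via silhouette score).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)  # one cluster id per clip
```

`labels` then assigns each clip to one of the five buckets.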
NNFP (https://towhee.io/audio-embedding/nnfp) does indeed give me embeddings for my clips, but it splits each clip into 1-second segments and produces an embedding for each of those segments. How should I aggregate these per-segment embeddings into a single embedding per clip?
cheers
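Since the clips are variable length, concatenating per-segment embeddings would give vectors of different sizes. One simple alternative (a sketch, assuming each clip yields an `(n_segments, 128)` array from NNFP) is mean-pooling, which produces a fixed dim-128 vector regardless of clip length:

```python
import numpy as np

def mean_pool(seg_embeddings: np.ndarray) -> np.ndarray:
    """Average per-second segment embeddings into one fixed-size
    clip embedding (dim 128), independent of clip length."""
    return seg_embeddings.mean(axis=0)

# Clips of 3 and 9 seconds both map to dim-128 vectors:
short_clip = mean_pool(np.random.rand(3, 128))
long_clip = mean_pool(np.random.rand(9, 128))
print(short_clip.shape, long_clip.shape)  # (128,) (128,)
```

Fixed-size vectors like these can then be fed directly to a clustering algorithm such as k-means.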