Add compatibility with pyannote 3.0 embedding wrappers #188
Conversation
Thank you for this PR! The core logic looks good. Changes are needed to simplify the API before merging.

@sorgfresser if that's ok with you, let's open a different PR for that.

Hey @juanmc2005
@sorgfresser Thank you for this revised version! It looks way better than before.
I just want to address a couple of concerns before merging:
- I want to support all pyannote embedding models and not just wespeaker (this can be done without much effort)
- There are some small errors here and there and potential improvements that would greatly benefit the diart API

Also, I noticed you forgot to replace `SegmentationModel` with the new API. Remember it also inherits from `LazyModel`, so we need to replace `forward` with `__call__` and make sure that type hints still match. This is very important.
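A rough sketch of the requested change (assuming `LazyModel` provides a `__call__` that lazily loads the wrapped model before delegating, as suggested below in this thread):

```python
import torch

class SegmentationModel(LazyModel):
    def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
        # super().__call__ triggers the lazy load, so neither forward()
        # nor a manual self.load() is needed anymore.
        return super().__call__(waveform)
```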
Thank you again for this enormous contribution! I can't wait to get this merged, it will be huge for many people wanting to improve real-time diarization performance!
src/diart/models.py (Outdated)

```python
weights = weights.to("cpu")
# Move to cpu for numpy conversion
waveform = waveform.to("cpu")
return torch.from_numpy(self.model(waveform, weights))
```
Do `waveform` and `weights` have to be numpy? Have you checked that this works with `pyannote/embedding`? Also, please check that this way we can run both models on GPU. If you don't have a GPU, I can check this myself during testing. Please let me know.

Also, please use `super().__call__(waveform, weights)` here as well, so you don't have to call `self.load()` manually.
Yeah, sadly they have to be. It will be executed on the GPU, but this `.numpy()` requires the `waveform.to("cpu")`. Additionally, this requires the `weights.to("cpu")`, as we otherwise fail with:

```
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
```

I can verify that the GPU is utilized; you can see the ONNX provider set with our call to `.to("cuda")` because of this definition. Nonetheless, the tensors have to be converted to numpy for ONNX use, and as such the errors pointed out above occur if they are not moved to the CPU beforehand.
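To make the constraint concrete, here is a minimal sketch of the round-trip (names are illustrative, not the actual diart code): ONNX Runtime consumes numpy arrays, and `.numpy()` only works on CPU tensors, so both inputs must live on CPU even when the ONNX execution provider runs on CUDA.

```python
import torch

def embed(model, waveform: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # numpy conversion requires CPU tensors, regardless of which
    # execution provider (CPU or CUDA) runs the ONNX session.
    waveform = waveform.to("cpu")
    weights = weights.to("cpu")
    # the pyannote wrapper converts to numpy internally and returns a numpy array
    return torch.from_numpy(model(waveform, weights))
```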
Do all types of embedding models use onnx? Maybe @hbredin can give us a hint here.

If we absolutely need this `to("cpu")`, we could create a wrapper around `PretrainedEmbeddingModel`, let's say `PipelineInputFormatter`. This way `PyannoteLoader` would return a `PipelineInputFormatter` instead of a `PretrainedEmbeddingModel`:

```python
import numpy as np

class PipelineInputFormatter:
    def __init__(self, model: PretrainedEmbeddingModel):
        self.model = model

    def __call__(self, audio, masks) -> np.ndarray:
        # Move inputs to CPU and convert to numpy before forwarding
        return self.model(audio.cpu().numpy(), masks.cpu().numpy())
```

I don't really like the name though, I'm open to suggestions :)
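For illustration, usage would then look something like this (assuming `PyannoteLoader` already holds a `PretrainedEmbeddingModel` instance named `pretrained_embedding_model`):

```python
formatter = PipelineInputFormatter(pretrained_embedding_model)
embeddings = formatter(audio, masks)  # numpy output; inputs are moved to CPU first
```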
It may also be a matter of needing to specify the device upon instantiation, contrary to `nn.Module`. In that case we may want to refactor diart models to work in the same way.
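For reference, pyannote 3.0's embedding wrapper already takes the device at construction time; a minimal sketch of that pattern (checkpoint name used for illustration):

```python
import torch
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding

embedding = PretrainedSpeakerEmbedding(
    "pyannote/wespeaker-voxceleb-resnet34-LM",
    device=torch.device("cuda"),
)
```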
> Do all types of embedding models use onnx?

No. Only WeSpeaker ones.
Ok I just took a look at the links sent by @sorgfresser and it does look like a pyannote bug.

@hbredin I opened a PR to fix this: pyannote/pyannote-audio#1518
Could you take a look?
Also, if it's not too much to ask, it would be great to have a release with this fix so we don't have to do any weird workarounds here.

@sorgfresser I just released version 0.8. Please make sure to rebase your branch against develop so we're able to merge:

```shell
git checkout pyannote-3.0
git rebase <diart remote>/develop
# Once successful and without conflicts
git push --force origin pyannote-3.0
```
Force-pushed the branch from `68ca2ba` to `679dee4`.
Ok I think the code is pretty solid! I want to test this but I would prefer to wait for the pyannote fix to be merged and hopefully released by @hbredin 🙏🏻 If the pyannote fix takes a long time to get into a release, I would prefer to do the required changes here anyway. In this case, WeSpeaker embeddings wouldn't work (on GPU) temporarily, but I prefer the code to be clean if it's going to be part of the next release.
@sorgfresser could you please rebase on top of `develop`?
Thanks for rebasing @juanmc2005 and sorry for the late reply. I added the docstrings and removed the CPU moving. Is there anything else I can add / modify?
Hey @sorgfresser thanks for the new changes. I think we're good here. Now that the code is looking good, I'll pull the branch locally and do some tests to see if the feature is working correctly, in particular WeSpeaker and some other model like ECAPA-TDNN. If my tests look good I'll go ahead and merge. I'll probably wait for the pyannote fix to release v0.9 though.
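For context, both candidates can be loaded through the same diart wrapper; a plausible test setup (checkpoint identifiers assumed from the Hugging Face hub):

```python
from diart.models import EmbeddingModel

# WeSpeaker (ONNX-based) and ECAPA-TDNN (speechbrain) embedding models
wespeaker = EmbeddingModel.from_pyannote("pyannote/wespeaker-voxceleb-resnet34-LM")
ecapa = EmbeddingModel.from_pyannote("speechbrain/spkrec-ecapa-voxceleb")
```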
@sorgfresser quick update after some tests. It looks like normalizing weights does affect the embeddings. DER on AMI goes from 27.3 to 29.8, which is pretty bad. When I remove the normalization it goes down to 27.5, so there's something else affecting performance negatively. I think we should add a parameter somewhere to specify if weights should be normalized. I'll keep investigating and get back.
@hbredin looks like the difference between the 27.5 and 27.3 was because of pytorch 2.1.0. Maybe it's related to the automatic conversion of the segmentation and embedding models. I keep getting these warnings:
I don't think this is a deal-breaker so I won't change the requirements to force
@sorgfresser could you move the weight normalization code to `OverlappedSpeechPenalty`? Something like:

```python
def __call__(self, segmentation: TemporalFeatures) -> TemporalFeatures:
    weights = self.formatter.cast(segmentation)  # shape (batch, frames, speakers)
    with torch.no_grad():
        probs = torch.softmax(self.beta * weights, dim=-1)
        weights = torch.pow(weights, self.gamma) * torch.pow(probs, self.gamma)
        weights[weights < 1e-8] = 1e-8
        if self.normalize:
            min_values = weights.min(dim=1, keepdim=True).values
            max_values = weights.max(dim=1, keepdim=True).values
            weights = (weights - min_values) / (max_values - min_values)
            weights.nan_to_num_(1e-8)
    return self.formatter.restore_type(weights)
```

where `self.normalize` is a new boolean attribute of `OverlappedSpeechPenalty`. This way, users can decide whether they want to do this normalization or not directly in the pipeline config. I could do these changes myself but I'm not sure I have write access to your fork. I would also like to have this as a CLI argument.
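To illustrate, the block could then be instantiated with the flag directly (import path assumed; `gamma` and `beta` defaults taken from the existing class):

```python
from diart.blocks import OverlappedSpeechPenalty  # import path assumed

osp = OverlappedSpeechPenalty(gamma=3, beta=10, normalize=True)
weights = osp(segmentation)  # min-max normalizes weights per chunk when enabled
```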
Otherwise, I was able to run
I added the boolean to the CLI and moved the normalization to `OverlappedSpeechPenalty`. Is that the way you'd like the CLI to behave?
@sorgfresser thank you for the swift reply and commit! I'll re-run the tests as soon as I can and get back with updates.
I did some new tests and everything works well! My benchmark on AMI with WeSpeaker embeddings gave DER=28.9, but it may be a matter of tuning the hyper-parameters. Also it may be worth benchmarking without normalizing weights.
I can't seem to get ONNX to run on my GPU but I think it might be a problem with my CUDA drivers. @sorgfresser can you run them on GPU?
I will commit some suggestions here and there, and I just have a couple of formatting things that I'd like to improve. Once those are fixed I'm good to merge.
src/diart/models.py (Outdated)

```python
def __call__(self, waveform: torch.Tensor) -> torch.Tensor:
    """
    Call the forward pass of the segmentation model.

    Parameters
    ----------
    waveform: torch.Tensor, shape (batch, channels, samples)

    Returns
    -------
    speaker_segmentation: torch.Tensor, shape (batch, frames, speakers)
    """
    return super().__call__(waveform)
```
Move this to `SegmentationModel`.
Update: AMI benchmark with WeSpeaker embeddings and no weight normalization gives DER=30.8.
@sorgfresser huge thanks for this feature! Stay tuned for v0.9! I hope we can get it released as soon as possible.
* bump pyannote to 3.0
* add wespeaker inference
* add weights normalization, cpu for numpy conversion
* unify api
* remove try catch
* always normalize
* use PretrainedSpeakerEmbedding in Loader
* Fix min-max normalization equation
* fix: remove imports
* Change embedding model return type to Callable (Co-authored-by: Simon <80467011+sorgfresser@users.noreply.github.com>)
* fix: remove type checking
* remove from active if NaN embeddings
* Fix wrong typing of model in `LazyModel`
* add docstrings
* Simplify EmbeddingModel.__call__()
* Add numpy import
* add normalize boolean
* Update requirements.txt
* Update setup.cfg
* Apply suggestions from code review
* Fix wrong kwarg name
* add abstract __call__
* move __call__ to parent class

Co-authored-by: Juan Coria <juanmc2005@hotmail.com>
Adds initial support for the embedding model used in pyannote/speaker-diarization-3.0.
Usage:
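A minimal sketch of the intended usage (class names as in diart v0.9's API; checkpoint name assumed from the Hugging Face hub):

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.models import EmbeddingModel, SegmentationModel

segmentation = SegmentationModel.from_pyannote("pyannote/segmentation")
embedding = EmbeddingModel.from_pyannote("pyannote/wespeaker-voxceleb-resnet34-LM")
config = SpeakerDiarizationConfig(segmentation=segmentation, embedding=embedding)
pipeline = SpeakerDiarization(config)
```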
I am still lacking support for `pyannote/segmentation-3.0` as of now, and I am not 100% sure why... I thought it should be a drop-in replacement for `pyannote/segmentation`, but it does not seem to work. Any hints here would be greatly appreciated.