diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
new file mode 100644
index 000000000..6f024c73b
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,56 @@
+name: Bug report
+description: Report a bug in pyannote.audio
+body:
+
+- type: markdown
+ attributes:
+ value: |
+ When reporting bugs, please follow the guidelines in this template. This helps identify the problem precisely and thus enables contributors to fix it faster.
+ - Write a descriptive issue title above.
+ - The golden rule is to **always open *one* issue for *one* bug**. If you notice several bugs and want to report them, make sure to create one new issue for each of them.
+ - Search [open](https://github.com/pyannote/pyannote-audio/issues) and [closed](https://github.com/pyannote/pyannote-audio/issues?q=is%3Aissue+is%3Aclosed) issues to ensure it has not already been reported. If you don't find a relevant match or if you're unsure, don't hesitate to **open a new issue**. The maintainers will handle it from there if it turns out to be a duplicate.
+ - Please always check if your issue is reproducible in the latest version – it may already have been fixed!
+ - If you use a custom build, please test if your issue is reproducible in official releases too.
+
+- type: textarea
+ attributes:
+ label: Tested versions
+ description: |
+ To properly fix a bug, we need to identify whether it was recently introduced in pyannote.audio or has always been present.
+ - Please specify the pyannote.audio version you found the issue in, including the **Git commit hash** if using a development build.
+ - If you can, **please test earlier pyannote.audio versions** and, if applicable, newer versions (development branch). Mention whether the bug is reproducible or not in the versions you tested.
+ - The aim is for us to identify whether a bug is a **regression**, i.e. an issue that didn't exist in a previous version but was introduced later on, breaking existing functionality. For example, if a bug is reproducible in 3.2 but not in 3.0, we would like you to test the intermediate 3.1 release to find the first version in which the issue can be reproduced.
+ placeholder: |
+ - Reproducible in: 3.1, 3.2, and later
+ - Not reproducible in: 3.0
+ validations:
+ required: true
+
+- type: input
+ attributes:
+ label: System information
+ description: |
+ - Specify the OS version and, when relevant, hardware information.
+ - For issues that are likely OS-specific and/or GPU-related, please specify the GPU model and architecture.
+ - **Bug reports not including the required information may be closed at the maintainers' discretion.** If in doubt, always include all the requested information; it's better to include too much information than not enough information.
+ placeholder: macOS 13.6 - pyannote.audio 3.1.1 - M1 Pro
+ validations:
+ required: true
+
+- type: textarea
+ attributes:
+ label: Issue description
+ description: |
+ Describe your issue briefly. What doesn't work, and how do you expect it to work instead?
+ You can include audio, images or videos with drag and drop, and format code blocks or logs with ``` tags.
+ validations:
+ required: true
+
+- type: input
+ attributes:
+ label: Minimal reproduction example (MRE)
+ description: |
+ Having reproducible issues is a prerequisite for contributors to be able to solve them.
+ Include a link to a minimal reproduction example, using [this Google Colab notebook](https://colab.research.google.com/github/pyannote/pyannote-audio/blob/develop/tutorials/MRE_template.ipynb) as a starting point.
+ validations:
+ required: true
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 000000000..84f6ea55a
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,15 @@
+blank_issues_enabled: false
+
+contact_links:
+
+ - name: Feature request
+ url: https://github.com/pyannote/pyannote-audio/discussions
+ about: Suggest an idea for this project.
+
+ - name: Consulting
+ url: https://herve.niderb.fr/consulting
+ about: Using pyannote.audio in production? Make the most of it thanks to our consulting services.
+
+ - name: Premium models
+ url: https://forms.office.com/e/GdqwVgkZ5C
+ about: We are considering selling premium models, extensions, or services around pyannote.audio.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
deleted file mode 100644
index 4ead48053..000000000
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ /dev/null
@@ -1,20 +0,0 @@
----
-name: Feature request
-about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
-
----
-
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
-
-**Describe alternatives you've considered**
-A clear and concise description of any alternative solutions or features you've considered.
-
-**Additional context**
-Add any other context about the feature request here.
diff --git a/.github/workflows/new_issue.yml b/.github/workflows/new_issue.yml
deleted file mode 100644
index b8477dc16..000000000
--- a/.github/workflows/new_issue.yml
+++ /dev/null
@@ -1,29 +0,0 @@
-name: issues
-on:
- issues:
- types: [opened]
-jobs:
- add-comment:
- runs-on: ubuntu-latest
- permissions:
- issues: write
- steps:
- - uses: actions/checkout@v3
- with:
- ref: develop
- - name: Install FAQtory
- run: pip install FAQtory
- - name: Run Suggest
- env:
- TITLE: ${{ github.event.issue.title }}
- run: faqtory suggest "$TITLE" > suggest.md
- - name: Read suggest.md
- id: suggest
- uses: juliangruber/read-file-action@v1
- with:
- path: ./suggest.md
- - name: Suggest FAQ
- uses: peter-evans/create-or-update-comment@a35cf36e5301d70b76f316e867e7788a55a31dae
- with:
- issue-number: ${{ github.event.issue.number }}
- body: ${{ steps.suggest.outputs.content }}
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 90a4302c6..e649027ae 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -30,4 +30,4 @@ jobs:
pip install -e .[dev,testing]
- name: Test with pytest
run: |
- pytest
+ pytest -k "not test_cli.py"
diff --git a/.github/workflows/test_cli.yml b/.github/workflows/test_cli.yml
new file mode 100644
index 000000000..731a4b5b6
--- /dev/null
+++ b/.github/workflows/test_cli.yml
@@ -0,0 +1,33 @@
+name: CLI tests
+
+on:
+ push:
+ branches: [develop]
+ pull_request:
+ branches: [develop]
+
+jobs:
+ build:
+ timeout-minutes: 20
+ runs-on: ${{ matrix.os }}
+ strategy:
+ matrix:
+ os: [ubuntu-latest]
+ python-version: ["3.10"]
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Install libsndfile
+ if: matrix.os == 'ubuntu-latest'
+ run: |
+ sudo apt-get update
+ sudo apt-get install -y libsndfile1
+ - name: Install pyannote.audio
+ run: |
+ pip install -e .[dev,testing,cli]
+ - name: Test with pytest
+ run: |
+ pytest tests/test_cli.py
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 549e46ad0..92c952bdc 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -14,7 +14,7 @@ repos:
# Sort imports
- repo: https://github.com/PyCQA/isort
- rev: 5.10.1
+ rev: 5.12.0
hooks:
- id: isort
args: ["--profile", "black"]
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 777f41f38..ad88762c2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,40 @@
# Changelog
+## Version 3.2.0 (2024-05-08)
+
+### New features
+
+- feat(task): add option to cache task training metadata to speed up training (with [@clement-pages](https://github.com/clement-pages/))
+- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
+- feat(model): add `fbank_only` property to `WeSpeaker` models
+- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with [@FrenchKrab](https://github.com/FrenchKrab))
+- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
+- feat(metric): add `reduce` option to `diarization_error_rate` metric (with [@Bilal-Rahou](https://github.com/Bilal-Rahou))
+- feat(pipeline): add `Waveform` and `SampleRate` preprocessors
+
+### Fixes
+
+- fix(task): fix random generators and their reproducibility (with [@FrenchKrab](https://github.com/FrenchKrab))
+- fix(task): fix estimation of training set size (with [@FrenchKrab](https://github.com/FrenchKrab))
+- fix(hook): fix `torch.Tensor` support in `ArtifactHook`
+- fix(doc): fix typo in `Powerset` docstring (with [@lukasstorck](https://github.com/lukasstorck))
+
+### Improvements
+
+- improve(metric): add support for mismatched numbers of speakers in `diarization_error_rate` metric
+- improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
+- improve(io): switch to `torchaudio >= 2.2.0`
+- improve(doc): update tutorials (with [@clement-pages](https://github.com/clement-pages/))
+
+### Breaking changes
+
+- BREAKING(model): get rid of `Model.example_output` in favor of `num_frames` method, `receptive_field` property, and `dimension` property
+- BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)
+
+### Community contributions
+
+- community: add tutorial for offline use of `pyannote/speaker-diarization-3.1` (by [@simonottenhauskenbun](https://github.com/simonottenhauskenbun))
+
## Version 3.1.1 (2023-12-01)
### TL;DR
diff --git a/MANIFEST.in b/MANIFEST.in
index 16909925f..45ad7d6af 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,4 +1,6 @@
recursive-include pyannote *.py
recursive-include pyannote *.yaml
+recursive-include pyannote *.wav
+recursive-include pyannote *.rttm
global-exclude *.pyc
global-exclude __pycache__
diff --git a/README.md b/README.md
index 49b976a1f..e1326816a 100644
--- a/README.md
+++ b/README.md
@@ -70,26 +70,30 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
- Videos
- [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
- [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
- - [First releaase of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
+ - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
+- Community contributions (not maintained by the core team)
+ - 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
## Benchmark
Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.
Those numbers are diarization error rates (in %):
-| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.office.com/e/GdqwVgkZ5C) |
-| ---------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ---------------------------------------------- |
-| AISHELL-4 | 14.1 | 12.3 | 11.9 |
-| AliMeeting (channel 1) | 27.4 | 24.5 | 22.5 |
-| AMI (IHM) | 18.9 | 18.8 | 16.6 |
-| AMI (SDM) | 27.1 | 22.6 | 20.9 |
-| AVA-AVD | 66.3 | 50.0 | 39.8 |
-| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
-| DIHARD 3 (full) | 26.9 | 21.4 | 17.2 |
-| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
-| MSDWild | 32.8 | 25.4 | 19.8 |
-| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
-| VoxConverse (v0.3) | 11.2 | 11.2 | 9.4 |
+| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.office.com/e/GdqwVgkZ5C) |
+| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------ |
+| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
+| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
+| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
+| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
+| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
+| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
+| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
+| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
+| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
+| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |
[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
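To put "much better" in numbers, the relative DER reduction from v2.1 to v3.1 can be computed directly from the updated table. This is a small sketch: the values are copied from the new table rows, and `relative_reduction` is a helper name introduced here for illustration. The gain is not uniform: AMI (IHM) and RAMC barely move, and VoxConverse is marginally worse (11.2 to 11.3).

```python
# Diarization error rates (in %) copied from the updated README table.
# benchmark: (v2.1, v3.1, premium)
DER = {
    "AISHELL-4": (14.1, 12.2, 11.9),
    "AliMeeting (channel 1)": (27.4, 24.4, 22.5),
    "AMI (IHM)": (18.9, 18.8, 16.6),
    "AMI (SDM)": (27.1, 22.4, 20.9),
    "AVA-AVD": (66.3, 50.0, 39.8),
    "CALLHOME (part 2)": (31.6, 28.4, 22.2),
    "DIHARD 3 (full)": (26.9, 21.7, 17.2),
    "Earnings21": (17.0, 9.4, 9.0),
    "Ego4D (dev.)": (61.5, 51.2, 43.8),
    "MSDWild": (32.8, 25.3, 19.8),
    "RAMC": (22.5, 22.2, 18.4),
    "REPERE (phase2)": (8.2, 7.8, 7.6),
    "VoxConverse (v0.3)": (11.2, 11.3, 9.4),
}


def relative_reduction(before: float, after: float) -> float:
    """Relative DER reduction: positive means fewer errors after the change."""
    return (before - after) / before


for name, (v21, v31, _premium) in DER.items():
    print(f"{name:<24} {100 * relative_reduction(v21, v31):6.1f}%")
```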
diff --git a/pyannote/audio/augmentation/mix.py b/pyannote/audio/augmentation/mix.py
index a6fff49c0..c6e811280 100644
--- a/pyannote/audio/augmentation/mix.py
+++ b/pyannote/audio/augmentation/mix.py
@@ -60,10 +60,10 @@ def __init__(
max_snr_in_db: float = 5.0,
mode: str = "per_example",
p: float = 0.5,
- p_mode: str = None,
- sample_rate: int = None,
- target_rate: int = None,
- max_num_speakers: int = None,
+ p_mode: Optional[str] = None,
+ sample_rate: Optional[int] = None,
+ target_rate: Optional[int] = None,
+ max_num_speakers: Optional[int] = None,
output_type: str = "tensor",
):
super().__init__(
@@ -80,7 +80,7 @@ def __init__(
def randomize_parameters(
self,
- samples: Tensor = None,
+ samples: Optional[Tensor] = None,
sample_rate: Optional[int] = None,
targets: Optional[Tensor] = None,
target_rate: Optional[int] = None,
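This hunk, like several later ones (the LR schedulers, `callback.py`, `inference.py` and `io.py`), replaces `x: int = None` with `x: Optional[int] = None`. PEP 484 no longer recommends treating a `None` default as an implicit `Optional`, and recent mypy releases reject the implicit form by default. The sketch below uses a hypothetical `resample` helper (not part of pyannote.audio) to show the explicit annotation and how it is exposed at runtime through `typing.get_type_hints`.

```python
from typing import Optional, get_type_hints


def resample(num_samples: int, sample_rate: Optional[int] = None) -> int:
    """Hypothetical helper: number of samples after resampling from 16 kHz.

    `sample_rate=None` means "keep 16 kHz"; the explicit Optional[int]
    tells readers and type checkers that None is an accepted value.
    """
    rate = 16000 if sample_rate is None else sample_rate
    return num_samples * rate // 16000


print(get_type_hints(resample)["sample_rate"])
```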
diff --git a/pyannote/audio/cli/evaluate.py b/pyannote/audio/cli/evaluate.py
index a5ab682c5..3eec5e3cc 100644
--- a/pyannote/audio/cli/evaluate.py
+++ b/pyannote/audio/cli/evaluate.py
@@ -25,7 +25,7 @@
import hydra
from omegaconf import DictConfig
-from pyannote.database import FileFinder, ProtocolFile, get_protocol
+from pyannote.database import FileFinder, ProtocolFile, registry
from rich.progress import Progress
from pyannote.audio import Inference, Model
@@ -41,8 +41,16 @@ def evaluate(cfg: DictConfig) -> Optional[float]:
(device,) = get_devices(needs=1)
model = Model.from_pretrained(cfg.model, device=device)
+ # load databases into registry if it was specified
+ if "registry" in cfg:
+ for database_yml in cfg.registry.split(","):
+ registry.load_database(database_yml)
+
# load evaluation files
- protocol = get_protocol(cfg.protocol, preprocessors={"audio": FileFinder()})
+ protocol = registry.get_protocol(
+ cfg.protocol, preprocessors={"audio": FileFinder()}
+ )
+
files = list(getattr(protocol, cfg.subset)())
# load evaluation metric
@@ -53,7 +61,7 @@ def evaluate(cfg: DictConfig) -> Optional[float]:
main_task = progress.add_task(protocol.name, total=len(files))
file_task = progress.add_task("Processing", total=1.0)
- def progress_hook(completed: int = None, total: int = None):
+ def progress_hook(completed: Optional[int] = None, total: Optional[int] = None):
progress.update(file_task, completed=completed / total)
inference = Inference(model, device=device)
@@ -65,8 +73,6 @@ def hypothesis(file: ProtocolFile):
warm_up=(warm_up, warm_up),
)
- metric = DiscreteDiarizationErrorRate()
-
for file in files:
progress.update(file_task, description=file["uri"])
reference = file["annotation"]
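With this change, `registry` becomes an optional, comma-separated list of database YAML files, for `pyannote-audio-eval` here and for `pyannote-audio-train` further down. The sketch below isolates that parsing logic; `database_files` is a hypothetical helper, a plain `dict` stands in for Hydra's `DictConfig`, and the file and protocol names are made up.

```python
from typing import List, Mapping


def database_files(cfg: Mapping[str, str]) -> List[str]:
    """Database YAML files to pass to `registry.load_database`, one per entry.

    Mirrors the hunk: `registry` is optional and may list several files
    separated by commas; when it is absent, nothing is loaded explicitly.
    """
    if "registry" not in cfg:
        return []
    return cfg["registry"].split(",")


# Hypothetical configuration; a plain dict stands in for DictConfig.
cfg = {"registry": "db1.yml,db2.yml", "protocol": "X.SpeakerDiarization.Y"}
for path in database_files(cfg):
    print(f"registry.load_database({path!r})")
```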
diff --git a/pyannote/audio/cli/evaluate_config/hydra/default.yaml b/pyannote/audio/cli/evaluate_config/hydra/default.yaml
index 95979432b..1aaebd1b9 100644
--- a/pyannote/audio/cli/evaluate_config/hydra/default.yaml
+++ b/pyannote/audio/cli/evaluate_config/hydra/default.yaml
@@ -22,7 +22,8 @@ help:
template: |-
${hydra.help.header}
- pyannote-audio-eval protocol={protocol_name}
+ pyannote-audio-eval registry={path_to_database.yml}
+ protocol={protocol_name}
subset={test | development | train}
model={path_to_pretrained_model}
warm_up={warm_up_duration_in_seconds}
diff --git a/pyannote/audio/cli/lr_schedulers/CosineAnnealingWarmRestarts.py b/pyannote/audio/cli/lr_schedulers/CosineAnnealingWarmRestarts.py
index d8e5f4b3c..3c270eba1 100644
--- a/pyannote/audio/cli/lr_schedulers/CosineAnnealingWarmRestarts.py
+++ b/pyannote/audio/cli/lr_schedulers/CosineAnnealingWarmRestarts.py
@@ -20,6 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
+from typing import Optional
from torch.optim import Optimizer
from torch.optim.lr_scheduler import (
@@ -32,7 +33,7 @@ def CosineAnnealingWarmRestarts(
min_lr: float = 1e-8,
max_lr: float = 1e-3,
patience: int = 1,
- num_batches_per_epoch: int = None,
+ num_batches_per_epoch: Optional[int] = None,
**kwargs,
):
"""Wrapper around CosineAnnealingWarmRestarts
diff --git a/pyannote/audio/cli/lr_schedulers/CyclicLR.py b/pyannote/audio/cli/lr_schedulers/CyclicLR.py
index cd7a7b730..cca4420b0 100644
--- a/pyannote/audio/cli/lr_schedulers/CyclicLR.py
+++ b/pyannote/audio/cli/lr_schedulers/CyclicLR.py
@@ -20,6 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
+from typing import Optional
from torch.optim import Optimizer
from torch.optim.lr_scheduler import CyclicLR as _CyclicLR
@@ -31,7 +32,7 @@ def CyclicLR(
max_lr: float = 1e-3,
mode: str = "triangular2",
patience: int = 50,
- num_batches_per_epoch: int = None,
+ num_batches_per_epoch: Optional[int] = None,
**kwargs,
):
"""Wrapper around CyclicLR learning rate scheduler
diff --git a/pyannote/audio/cli/train.py b/pyannote/audio/cli/train.py
index 74041554b..f052cd5b3 100644
--- a/pyannote/audio/cli/train.py
+++ b/pyannote/audio/cli/train.py
@@ -51,8 +51,9 @@ def train(cfg: DictConfig) -> Optional[float]:
seed_everything(seed=seed)
# load databases into registry
- for database_yml in cfg.registry.split(","):
- registry.load_database(database_yml)
+ if "registry" in cfg:
+ for database_yml in cfg.registry.split(","):
+ registry.load_database(database_yml)
# instantiate training protocol with optional preprocessors
preprocessors = {"audio": FileFinder(), "torchaudio.info": get_torchaudio_info}
@@ -75,9 +76,9 @@ def train(cfg: DictConfig) -> Optional[float]:
# instantiate model
fine_tuning = cfg.model["_target_"] == "pyannote.audio.cli.pretrained"
- model = instantiate(cfg.model)
- model.task = task
- model.setup(stage="fit")
+ model = instantiate(cfg.model, task=task)
+ model.prepare_data()
+ model.setup()
# validation metric to monitor (and its direction: min or max)
monitor, direction = task.val_monitor
@@ -146,7 +147,6 @@ def configure_optimizers(self):
# in case of fine-tuning, validate the initial model to make sure
# that we actually improve over the initial performance
if fine_tuning:
- model.setup(stage="fit")
trainer.validate(model)
# train the model
@@ -157,7 +157,9 @@ def configure_optimizers(self):
# return the best validation score
# this can be used for hyper-parameter optimization with Hydra sweepers
- if monitor is not None:
+ # this can only be done if the trainer is not in fast dev run mode, as
+ # checkpointing is disabled in this mode
+ if monitor is not None and not trainer.fast_dev_run:
best_monitor = float(checkpoint.best_model_score)
if direction == "min":
return best_monitor
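The last `train.py` hunk returns the best validation score so that Hydra sweepers can use it as an objective, and now skips this under `fast_dev_run`, where checkpointing (and therefore `best_model_score`) is disabled. The hunk is cut off after the `min` branch, so the sketch below assumes that `max` metrics are negated so a sweeper can always minimize. `sweeper_objective` is a hypothetical name; its arguments stand in for `checkpoint.best_model_score`, `monitor`, `direction` and `trainer.fast_dev_run`.

```python
from typing import Optional


def sweeper_objective(
    best_model_score: Optional[float],
    monitor: Optional[str],
    direction: str,
    fast_dev_run: bool = False,
) -> Optional[float]:
    # No monitored metric, or fast_dev_run (no checkpoint): nothing to report.
    if monitor is None or fast_dev_run:
        return None
    best = float(best_model_score)
    # Assumption: "max" metrics are negated so the sweeper can always minimize.
    return best if direction == "min" else -best


print(sweeper_objective(0.25, "loss/val", "min"))
```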
diff --git a/pyannote/audio/cli/train_config/config.yaml b/pyannote/audio/cli/train_config/config.yaml
index d5b761cc9..d7ecd547a 100644
--- a/pyannote/audio/cli/train_config/config.yaml
+++ b/pyannote/audio/cli/train_config/config.yaml
@@ -1,4 +1,3 @@
-registry: ???
protocol: ???
defaults:
diff --git a/pyannote/audio/cli/train_config/trainer/default.yaml b/pyannote/audio/cli/train_config/trainer/default.yaml
index ac3a60ff4..f32031af1 100644
--- a/pyannote/audio/cli/train_config/trainer/default.yaml
+++ b/pyannote/audio/cli/train_config/trainer/default.yaml
@@ -30,6 +30,6 @@ precision: 32
profiler: null
reload_dataloaders_every_n_epochs: 0
use_distributed_sampler: True # TODO: check what this does exactly
-strategy: null
+strategy: auto
sync_batchnorm: False
val_check_interval: 1.0
diff --git a/pyannote/audio/core/callback.py b/pyannote/audio/core/callback.py
index 5ce522d57..0cc46845b 100644
--- a/pyannote/audio/core/callback.py
+++ b/pyannote/audio/core/callback.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-from typing import List, Mapping, Text, Union
+from typing import List, Mapping, Optional, Text, Union
from pytorch_lightning import Callback, Trainer
from pytorch_lightning.utilities.model_summary import ModelSummary
@@ -67,7 +67,7 @@ class GraduallyUnfreeze(Callback):
def __init__(
self,
schedule: Union[Mapping[Text, int], List[Union[List[Text], Text]]] = None,
- epochs_per_stage: int = None,
+ epochs_per_stage: Optional[int] = None,
):
super().__init__()
diff --git a/pyannote/audio/core/inference.py b/pyannote/audio/core/inference.py
index dcf21868d..0c3e9b212 100644
--- a/pyannote/audio/core/inference.py
+++ b/pyannote/audio/core/inference.py
@@ -20,7 +20,6 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-import math
import warnings
from pathlib import Path
from typing import Callable, List, Optional, Text, Tuple, Union
@@ -37,7 +36,6 @@
from pyannote.audio.core.model import Model, Specifications
from pyannote.audio.core.task import Resolution
from pyannote.audio.utils.multi_task import map_with_specifications
-from pyannote.audio.utils.permutation import mae_cost_func, permutate
from pyannote.audio.utils.powerset import Powerset
from pyannote.audio.utils.reproducibility import fix_reproducibility
@@ -86,12 +84,12 @@ def __init__(
self,
model: Union[Model, Text, Path],
window: Text = "sliding",
- duration: float = None,
- step: float = None,
+ duration: Optional[float] = None,
+ step: Optional[float] = None,
pre_aggregation_hook: Callable[[np.ndarray], np.ndarray] = None,
skip_aggregation: bool = False,
skip_conversion: bool = False,
- device: torch.device = None,
+ device: Optional[torch.device] = None,
batch_size: int = 32,
use_auth_token: Union[Text, None] = None,
):
@@ -263,16 +261,14 @@ def slide(
_, num_samples = waveform.shape
def __frames(
- example_output, specifications: Optional[Specifications] = None
+ receptive_field, specifications: Optional[Specifications] = None
) -> SlidingWindow:
if specifications.resolution == Resolution.CHUNK:
return SlidingWindow(start=0.0, duration=self.duration, step=self.step)
- return example_output.frames
+ return receptive_field
frames: Union[SlidingWindow, Tuple[SlidingWindow]] = map_with_specifications(
- self.model.specifications,
- __frames,
- self.model.example_output,
+ self.model.specifications, __frames, self.model.receptive_field
)
# prepare complete chunks
@@ -373,7 +369,7 @@ def __aggregate(
outputs,
SlidingWindow(start=0.0, duration=self.duration, step=self.step),
),
- frames=frames,
+ frames,
warm_up=self.warm_up,
hamming=True,
missing=0.0,
@@ -526,7 +522,7 @@ def __first_sample(outputs: np.ndarray, **kwargs) -> np.ndarray:
@staticmethod
def aggregate(
scores: SlidingWindowFeature,
- frames: SlidingWindow = None,
+ frames: SlidingWindow,
warm_up: Tuple[float, float] = (0.0, 0.0),
epsilon: float = 1e-12,
hamming: bool = False,
@@ -539,10 +535,8 @@ def aggregate(
----------
scores : SlidingWindowFeature
Raw (unaggregated) scores. Shape is (num_chunks, num_frames_per_chunk, num_classes).
- frames : SlidingWindow, optional
- Frames resolution. Defaults to estimate it automatically based on `scores` shape
- and chunk size. Providing the exact frame resolution (when known) leads to better
- temporal precision.
+ frames : SlidingWindow
+ Frames resolution.
warm_up : (float, float) tuple, optional
Left/right warm up duration (in seconds).
missing : float, optional
@@ -559,15 +553,11 @@ def aggregate(
num_chunks, num_frames_per_chunk, num_classes = scores.data.shape
chunks = scores.sliding_window
- if frames is None:
- duration = step = chunks.duration / num_frames_per_chunk
- frames = SlidingWindow(start=chunks.start, duration=duration, step=step)
- else:
- frames = SlidingWindow(
- start=chunks.start,
- duration=frames.duration,
- step=frames.step,
- )
+ frames = SlidingWindow(
+ start=chunks.start,
+ duration=frames.duration,
+ step=frames.step,
+ )
masks = 1 - np.isnan(scores)
scores.data = np.nan_to_num(scores.data, copy=True, nan=0.0)
@@ -602,6 +592,7 @@ def aggregate(
scores.sliding_window.start
+ scores.sliding_window.duration
+ (num_chunks - 1) * scores.sliding_window.step
+ + 0.5 * frames.duration
)
+ 1
)
@@ -627,7 +618,8 @@ def aggregate(
# score ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
# mask ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
- start_frame = frames.closest_frame(chunk.start)
+ start_frame = frames.closest_frame(chunk.start + 0.5 * frames.duration)
+
aggregated_output[start_frame : start_frame + num_frames_per_chunk] += (
score * mask * hamming_window * warm_up_window
)
@@ -698,134 +690,3 @@ def trim(
)
return SlidingWindowFeature(new_data, new_chunks)
-
- @staticmethod
- def stitch(
- activations: SlidingWindowFeature,
- frames: SlidingWindow = None,
- lookahead: Optional[Tuple[int, int]] = None,
- cost_func: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None,
- match_func: Callable[[np.ndarray, np.ndarray, float], bool] = None,
- ) -> SlidingWindowFeature:
- """
-
- Parameters
- ----------
- activations : SlidingWindowFeature
- (num_chunks, num_frames, num_classes)-shaped scores.
- frames : SlidingWindow, optional
- Frames resolution. Defaults to estimate it automatically based on `activations`
- shape and chunk size. Providing the exact frame resolution (when known) leads to better
- temporal precision.
- lookahead : (int, int) tuple
- Number of past and future adjacent chunks to use for stitching.
- Defaults to (k, k) with k = chunk_duration / chunk_step - 1
- cost_func : callable
- Cost function used to find the optimal mapping between two chunks.
- Expects two (num_frames, num_classes) torch.tensor as input
- and returns cost as a (num_classes, ) torch.tensor
- Defaults to mean absolute error (utils.permutations.mae_cost_func)
- match_func : callable
- Function used to decide whether two speakers mapped by the optimal
- mapping actually are a match.
- Expects two (num_frames, ) np.ndarray and the cost (from cost_func)
- and returns a boolean. Defaults to always returning True.
- """
-
- num_chunks, num_frames, num_classes = activations.data.shape
-
- chunks: SlidingWindow = activations.sliding_window
-
- if frames is None:
- duration = step = chunks.duration / num_frames
- frames = SlidingWindow(start=chunks.start, duration=duration, step=step)
- else:
- frames = SlidingWindow(
- start=chunks.start,
- duration=frames.duration,
- step=frames.step,
- )
-
- max_lookahead = math.floor(chunks.duration / chunks.step - 1)
- if lookahead is None:
- lookahead = 2 * (max_lookahead,)
-
- assert all(L <= max_lookahead for L in lookahead)
-
- if cost_func is None:
- cost_func = mae_cost_func
-
- if match_func is None:
-
- def always_match(this: np.ndarray, that: np.ndarray, cost: float):
- return True
-
- match_func = always_match
-
- stitches = []
- for C, (chunk, activation) in enumerate(activations):
- local_stitch = np.NAN * np.zeros(
- (sum(lookahead) + 1, num_frames, num_classes)
- )
-
- for c in range(
- max(0, C - lookahead[0]), min(num_chunks, C + lookahead[1] + 1)
- ):
- # extract common temporal support
- shift = round((C - c) * num_frames * chunks.step / chunks.duration)
-
- if shift < 0:
- shift = -shift
- this_activations = activation[shift:]
- that_activations = activations[c, : num_frames - shift]
- else:
- this_activations = activation[: num_frames - shift]
- that_activations = activations[c, shift:]
-
- # find the optimal one-to-one mapping
- _, (permutation,), (cost,) = permutate(
- this_activations[np.newaxis],
- that_activations,
- cost_func=cost_func,
- return_cost=True,
- )
-
- for this, that in enumerate(permutation):
- # only stitch under certain condiditions
- matching = (c == C) or (
- match_func(
- this_activations[:, this],
- that_activations[:, that],
- cost[this, that],
- )
- )
-
- if matching:
- local_stitch[c - C + lookahead[0], :, this] = activations[
- c, :, that
- ]
-
- # TODO: do not lookahead further once a mismatch is found
-
- stitched_chunks = SlidingWindow(
- start=chunk.start - lookahead[0] * chunks.step,
- duration=chunks.duration,
- step=chunks.step,
- )
-
- local_stitch = Inference.aggregate(
- SlidingWindowFeature(local_stitch, stitched_chunks),
- frames=frames,
- hamming=True,
- )
-
- stitches.append(local_stitch.data)
-
- stitches = np.stack(stitches)
- stitched_chunks = SlidingWindow(
- start=chunks.start - lookahead[0] * chunks.step,
- duration=chunks.duration + sum(lookahead) * chunks.step,
- step=chunks.step,
- )
-
- return SlidingWindowFeature(stitches, stitched_chunks)
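`Inference.aggregate` now requires `frames`. The default that guessed the frame resolution from `chunks.duration / num_frames_per_chunk` is gone, since `slide()` now passes the model's `receptive_field`. Both each chunk's start frame and the total frame count are computed with `closest_frame(t + 0.5 * frames.duration)`. Assuming pyannote.core's `closest_frame(t)` returns the frame whose *center* is nearest to `t` (i.e. it subtracts half a frame duration), the added half-duration cancels that shift. This matters once a frame's duration is its receptive field and is much larger than its step. The numpy sketch below reproduces only the Hamming-weighted overlap-add; NaN masking and warm-up weighting are left out, and `Frames` and `overlap_add` are hypothetical names.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Frames:
    """Minimal stand-in for pyannote.core.SlidingWindow (assumed semantics)."""

    start: float
    duration: float
    step: float

    def closest_frame(self, t: float) -> int:
        # Index of the frame whose *center* is closest to time t.
        return int(np.rint((t - self.start - 0.5 * self.duration) / self.step))


def overlap_add(
    scores: np.ndarray,  # (num_chunks, num_frames_per_chunk, num_classes)
    chunk_start: float,
    chunk_duration: float,
    chunk_step: float,
    frames: Frames,
    epsilon: float = 1e-12,
) -> np.ndarray:
    num_chunks, num_frames_per_chunk, num_classes = scores.shape
    # Frame grid anchored at the first chunk, as in the hunk above.
    frames = Frames(chunk_start, frames.duration, frames.step)
    hamming_window = np.hamming(num_frames_per_chunk).reshape(-1, 1)

    # Total number of frames, with the "+ 0.5 * frames.duration" correction.
    end = chunk_start + chunk_duration + (num_chunks - 1) * chunk_step
    num_frames = frames.closest_frame(end + 0.5 * frames.duration) + 1
    total = np.zeros((num_frames, num_classes))
    weight = np.zeros((num_frames, num_classes))

    for c in range(num_chunks):
        chunk_t = chunk_start + c * chunk_step
        # Same correction: map the chunk start to the frame that starts there.
        start_frame = frames.closest_frame(chunk_t + 0.5 * frames.duration)
        stop_frame = start_frame + num_frames_per_chunk
        total[start_frame:stop_frame] += scores[c] * hamming_window
        weight[start_frame:stop_frame] += hamming_window

    # Weighted average; frames that no chunk covers stay at 0.
    return total / np.maximum(weight, epsilon)
```

With a 0.5 s receptive field and a 0.1 s step, `Frames(0.0, 0.5, 0.1).closest_frame(0.5)` lands on frame 2, whereas a chunk starting at 0.5 s should map to frame 5. Adding half the duration first gives 5.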
diff --git a/pyannote/audio/core/io.py b/pyannote/audio/core/io.py
index 0a44e75ea..8fafe69d3 100644
--- a/pyannote/audio/core/io.py
+++ b/pyannote/audio/core/io.py
@@ -40,8 +40,6 @@
from pyannote.core import Segment
from torch import Tensor
-torchaudio.set_audio_backend("soundfile")
-
AudioFile = Union[Text, Path, IOBase, Mapping]
AudioFileDocString = """
@@ -253,7 +251,9 @@ def get_duration(self, file: AudioFile) -> float:
return frames / sample_rate
- def get_num_samples(self, duration: float, sample_rate: int = None) -> int:
+ def get_num_samples(
+ self, duration: float, sample_rate: Optional[int] = None
+ ) -> int:
"""Deterministic number of samples from duration and sample rate"""
sample_rate = sample_rate or self.sample_rate
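The `model.py` hunk that follows replaces `example_output` with the `receptive_field` property that `Inference.slide()` now uses as its frame resolution. That property is built from two model methods, `receptive_field_size(num_frames)` and `receptive_field_center(frame)`, whose implementations are not part of this excerpt. The sketch below assumes a plain stack of unpadded, undilated 1D convolutions given as hypothetical `(kernel_size, stride)` pairs. It reuses the hunk's arithmetic: duration from one frame, step from the size difference between two frames, and start from the center of frame 0.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical (kernel_size, stride) of one Conv1d, without padding or dilation.
Layer = Tuple[int, int]


def receptive_field_size(layers: List[Layer], num_frames: int = 1) -> int:
    """Number of input samples seen by `num_frames` consecutive output frames."""
    size = num_frames
    for kernel_size, stride in reversed(layers):
        size = (size - 1) * stride + kernel_size
    return size


def receptive_field_center(layers: List[Layer], frame: int = 0) -> float:
    """Input sample position at the center of output frame `frame`'s receptive field."""
    center = float(frame)
    for kernel_size, stride in reversed(layers):
        center = center * stride + (kernel_size - 1) / 2
    return center


@dataclass
class Window:
    start: float
    duration: float
    step: float


def receptive_field(layers: List[Layer], sample_rate: int) -> Window:
    # Same arithmetic as Model.receptive_field in the hunk, converted to seconds.
    size = receptive_field_size(layers, num_frames=1)
    step = receptive_field_size(layers, num_frames=2) - size
    start = receptive_field_center(layers, frame=0) - (size - 1) / 2
    return Window(start / sample_rate, size / sample_rate, step / sample_rate)


print(receptive_field([(400, 160), (3, 1)], sample_rate=16000))
```

At 16 kHz, a single 400-sample kernel with a 160-sample stride yields a 25 ms duration and a 10 ms step. Stacking a kernel-3 layer on top widens the duration to 45 ms without changing the step.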
diff --git a/pyannote/audio/core/model.py b/pyannote/audio/core/model.py
index bedb7f6c4..8af802293 100644
--- a/pyannote/audio/core/model.py
+++ b/pyannote/audio/core/model.py
@@ -46,7 +46,6 @@
from pyannote.audio.core.io import Audio
from pyannote.audio.core.task import (
Problem,
- Resolution,
Specifications,
Task,
UnknownSpecificationsError,
@@ -112,10 +111,6 @@ def task(self) -> Task:
def task(self, task: Task):
# reset (cached) properties when task changes
del self.specifications
- try:
- del self.example_output
- except AttributeError:
- pass
self._task = task
def build(self):
@@ -136,15 +131,7 @@ def specifications(self) -> Union[Specifications, Tuple[Specifications]]:
) from e
else:
- try:
- specifications = self.task.specifications
-
- except AttributeError as e:
- raise UnknownSpecificationsError(
- "Task specifications are not available. This is most likely because they depend on "
- "the content of the training subset. Use `model.task.setup()` to go over the training "
- "subset and fix this, or let lightning trainer do that for you in `trainer.fit(model)`."
- ) from e
+ specifications = self.task.specifications
return specifications
@@ -188,38 +175,35 @@ def example_input_array(self) -> torch.Tensor:
return self.__example_input_array()
@cached_property
- def example_output(self) -> Union[Output, Tuple[Output]]:
- """Example output"""
- example_input_array = self.__example_input_array()
- with torch.inference_mode():
- example_output = self(example_input_array)
-
- def __example_output(
- example_output: torch.Tensor,
- specifications: Specifications = None,
- ) -> Output:
- if specifications.resolution == Resolution.FRAME:
- _, num_frames, dimension = example_output.shape
- frame_duration = specifications.duration / num_frames
- frames = SlidingWindow(step=frame_duration, duration=frame_duration)
- else:
- _, dimension = example_output.shape
- num_frames = None
- frames = None
-
- return Output(
- num_frames=num_frames,
- dimension=dimension,
- frames=frames,
- )
+ def receptive_field(self) -> SlidingWindow:
+ """(Internal) frames: receptive field of each output frame, in seconds"""
- return map_with_specifications(
- self.specifications, __example_output, example_output
+ receptive_field_size = self.receptive_field_size(num_frames=1)
+ receptive_field_step = (
+ self.receptive_field_size(num_frames=2) - receptive_field_size
+ )
+ receptive_field_start = (
+ self.receptive_field_center(frame=0) - (receptive_field_size - 1) / 2
+ )
+ return SlidingWindow(
+ start=receptive_field_start / self.hparams.sample_rate,
+ duration=receptive_field_size / self.hparams.sample_rate,
+ step=receptive_field_step / self.hparams.sample_rate,
)
+ def prepare_data(self):
+ self.task.prepare_data()
+
def setup(self, stage=None):
if stage == "fit":
- self.task.setup_metadata()
+ # let the task know about the trainer (e.g. for broadcasting the
+ # cache path between multi-GPU training processes).
+ self.task.trainer = self.trainer
+
+ # setup the task if defined (only on training and validation stages,
+ # but not for basic inference)
+ if self.task:
+ self.task.setup(stage)
# list of layers before adding task-dependent layers
before = set((name, id(module)) for name, module in self.named_modules())
@@ -252,7 +236,7 @@ def setup(self, stage=None):
module.to(self.device)
# add (trainable) loss function (e.g. ArcFace has its own set of trainable weights)
- if stage == "fit":
+ if self.task:
# let task know about the model
self.task.model = self
# setup custom loss function
@@ -260,9 +244,6 @@ def setup(self, stage=None):
# setup custom validation metrics
self.task.setup_validation_metric()
- # cache for later (and to avoid later CUDA error with multiprocessing)
- _ = self.example_output
-
# list of layers after adding task-dependent layers
after = set((name, id(module)) for name, module in self.named_modules())
@@ -331,7 +312,9 @@ def default_activation(self) -> Union[nn.Module, Tuple[nn.Module]]:
Activation.
"""
- def __default_activation(specifications: Specifications = None) -> nn.Module:
+ def __default_activation(
+ specifications: Optional[Specifications] = None,
+ ) -> nn.Module:
if specifications.problem == Problem.BINARY_CLASSIFICATION:
return nn.Sigmoid()
@@ -468,9 +451,8 @@ def __by_name(
if isinstance(modules, str):
modules = [modules]
- for name, module in ModelSummary(self, max_depth=-1).named_modules:
- if name not in modules:
- continue
+ for name in modules:
+ module = getattr(self, name)
for parameter in module.parameters(recurse=True):
parameter.requires_grad = requires_grad
diff --git a/pyannote/audio/core/pipeline.py b/pyannote/audio/core/pipeline.py
index f844d584f..24e266abf 100644
--- a/pyannote/audio/core/pipeline.py
+++ b/pyannote/audio/core/pipeline.py
@@ -29,6 +29,7 @@
from typing import Callable, Dict, List, Optional, Text, Union
import torch
+import torch.nn as nn
import yaml
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import RepositoryNotFoundError
@@ -232,7 +233,7 @@ def remove_from(*dicts):
_models = self.__dict__.get("_models")
_inferences = self.__dict__.get("_inferences")
- if isinstance(value, Model):
+ if isinstance(value, nn.Module):
if _models is None:
msg = "cannot assign models before Pipeline.__init__() call"
raise AttributeError(msg)
diff --git a/pyannote/audio/core/task.py b/pyannote/audio/core/task.py
index 1edfbc35c..0a61e2a6f 100644
--- a/pyannote/audio/core/task.py
+++ b/pyannote/audio/core/task.py
@@ -23,19 +23,25 @@
from __future__ import annotations
+import itertools
import multiprocessing
import sys
import warnings
+from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from functools import cached_property, partial
from numbers import Number
+from pathlib import Path
+from tempfile import mkstemp
from typing import Dict, List, Literal, Optional, Sequence, Text, Tuple, Union
+import numpy as np
import pytorch_lightning as pl
import scipy.special
import torch
from pyannote.database import Protocol
+from pyannote.database.protocol.protocol import Scope, Subset
from torch.utils.data import DataLoader, Dataset, IterableDataset
from torch_audiomentations import Identity
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
@@ -44,6 +50,9 @@
from pyannote.audio.utils.loss import binary_cross_entropy, nll_loss
from pyannote.audio.utils.protocol import check_protocol
+Subsets = list(Subset.__args__)
+Scopes = list(Scope.__args__)
+
# Type of machine learning problem
class Problem(Enum):
@@ -151,6 +160,31 @@ def __len__(self):
return self.task.val__len__()
+def get_dtype(value: int) -> str:
+ """Return the smallest signed integer type able to store
+ `value` in memory.
+
+ Parameters
+ ----------
+ value: int
+ integer value to be stored
+
+ Returns
+ -------
+ str:
+ numpy formatted type
+ (see https://numpy.org/doc/stable/reference/arrays.dtypes.html)
+ """
+ # signed byte (8 bits), signed short (16 bits), signed int (32 bits):
+ types_list = [(127, "b"), (32_768, "i2"), (2_147_483_648, "i")]
+ filtered_list = [
+ (max_val, type) for max_val, type in types_list if max_val > abs(value)
+ ]
+ if not filtered_list:
+ return "i8" # signed long (64 bits)
+ return filtered_list[0][1]
+
+
class Task(pl.LightningDataModule):
"""Base task class
@@ -169,6 +203,13 @@ class Task(pl.LightningDataModule):
----------
protocol : Protocol
pyannote.database protocol
+ cache : str, optional
+ As (meta-)data preparation might take a very long time for large datasets,
+ it can be cached to disk for later (and faster!) re-use.
+ When `cache` does not exist, `Task.prepare_data()` generates training
+ and validation metadata from `protocol` and saves them to disk.
+ When `cache` exists and is not empty, `Task.prepare_data()` is skipped
+ and (meta-)data are loaded from disk. Defaults to a temporary path.
duration : float, optional
Chunks duration in seconds. Defaults to two seconds (2.).
min_duration : float, optional
@@ -201,18 +242,20 @@ class Task(pl.LightningDataModule):
----------
specifications : Specifications or tuple of Specifications
Task specifications (available after `Task.setup` has been called.)
+
"""
def __init__(
self,
protocol: Protocol,
+ cache: Optional[str] = None,
duration: float = 2.0,
- min_duration: float = None,
+ min_duration: Optional[float] = None,
warm_up: Union[float, Tuple[float, float]] = 0.0,
batch_size: int = 32,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
):
super().__init__()
@@ -221,8 +264,16 @@ def __init__(
self.protocol, checks = check_protocol(protocol)
self.has_validation = checks["has_validation"]
self.has_scope = checks["has_scope"]
+ if not self.has_scope:
+ raise ValueError(
+ "Protocol must provide 'scope' information (e.g. 'file', 'database', or 'global')."
+ )
+
self.has_classes = checks["has_classes"]
+ # metadata cache
+ self.cache = Path(cache) if cache else cache
+
# batching
self.duration = duration
self.min_duration = duration if min_duration is None else min_duration
@@ -255,24 +306,351 @@ def __init__(
self._metric = metric
def prepare_data(self):
- """Use this to download and prepare data
-
- This is where we might end up downloading datasets
- and transform them so that they are ready to be used
- with pyannote.database. but for now, the API assume
- that we directly provide a pyannote.database.Protocol.
+ """Use this to prepare data from task protocol
Notes
-----
- Called only once.
+ Called only once, on the main process (global_rank 0).
+
+ After this method is called, the cache file contains a `prepared_data`
+ dictionary (loaded into `Task.prepared_data` by `Task.setup()`) with the
+ following structure:
+
+ prepared_data = {
+ 'protocol': name of the protocol
+ 'audio-path': array of N paths to audio files
+ 'audio-metadata': array of N audio metadata records (subset, scope, database, ...)
+ 'audio-info': array of N audio info records (from torchaudio.info)
+ 'audio-encoding': array of N audio encodings
+ 'audio-annotated': array of N annotated durations (usually equal to the file duration, but shorter if the file is not fully annotated or has annotated regions shorter than `duration`)
+ 'annotations-regions': array of M annotated regions
+ 'annotations-segments': array of M' annotated segments
+ 'metadata-values': dict of lists of possible values for subset, scope, database, ...
+ 'metadata-`database-name`-labels': array of `database-name` labels. Each database with "database"-scope labels has its own array.
+ 'metadata-labels': array of global-scope labels
+ }
+
+ """
+
+ if self.cache:
+ # check if cache exists and is not empty:
+ if self.cache.exists() and self.cache.stat().st_size > 0:
+ # data was already created, nothing to do
+ return
+ # create parent directory if needed
+ self.cache.parent.mkdir(parents=True, exist_ok=True)
+ else:
+ # if no cache was provided by user, create a temporary file
+ # in system directory used for temp files
+ self.cache = Path(mkstemp()[1])
+
+ # list of possible values for each metadata key
+ # (will become prepared_data["metadata-values"])
+ metadata_unique_values = defaultdict(list)
+ metadata_unique_values["subset"] = Subsets
+ metadata_unique_values["scope"] = Scopes
+
+ audios = list() # list of path to audio files
+ audio_infos = list()
+ audio_encodings = list()
+ metadata = list() # list of metadata
+
+ annotated_duration = list() # total duration of annotated regions (per file)
+ annotated_regions = list() # annotated regions
+ annotations = list() # actual annotations
+ unique_labels = list()
+ database_unique_labels = {}
+
+ if self.has_validation:
+ files_iter = itertools.chain(
+ self.protocol.train(), self.protocol.development()
+ )
+ else:
+ files_iter = self.protocol.train()
+
+ for file_id, file in enumerate(files_iter):
+ # gather metadata and update metadata_unique_values so that each metadatum
+ # (e.g. source database or label) is represented by an integer.
+ metadatum = dict()
+
+ # keep track of source database and subset (train, development, or test)
+ if file["database"] not in metadata_unique_values["database"]:
+ metadata_unique_values["database"].append(file["database"])
+ metadatum["database"] = metadata_unique_values["database"].index(
+ file["database"]
+ )
+ metadatum["subset"] = Subsets.index(file["subset"])
+
+ # keep track of label scope (file, database, or global)
+ metadatum["scope"] = Scopes.index(file["scope"])
+
+ remaining_metadata_keys = set(file) - set(
+ [
+ "uri",
+ "database",
+ "subset",
+ "audio",
+ "torchaudio.info",
+ "scope",
+ "classes",
+ "annotation",
+ "annotated",
+ ]
+ )
+
+ # keep track of any other (integer or string) metadata provided by the protocol
+ # (e.g. a "domain" key for domain-adversarial training)
+ for key in remaining_metadata_keys:
+ value = file[key]
+
+ if isinstance(value, str):
+ if value not in metadata_unique_values[key]:
+ metadata_unique_values[key].append(value)
+ metadatum[key] = metadata_unique_values[key].index(value)
+
+ elif isinstance(value, int):
+ metadatum[key] = value
+
+ else:
+ warnings.warn(
+ f"Ignoring '{key}' metadata because of its type ({type(value)}). Only str and int are supported for now.",
+ category=UserWarning,
+ )
+
+ metadata.append(metadatum)
+
+ # reset list of file-scoped labels
+ file_unique_labels = list()
+
+ # path to audio file
+ audios.append(str(file["audio"]))
+
+ # audio info
+ audio_info = file["torchaudio.info"]
+ audio_infos.append(
+ (
+ audio_info.sample_rate, # sample rate
+ audio_info.num_frames, # number of frames
+ audio_info.num_channels, # number of channels
+ audio_info.bits_per_sample, # bits per sample
+ )
+ )
+ audio_encodings.append(audio_info.encoding) # encoding
+
+ # annotated regions and duration
+ _annotated_duration = 0.0
+ for segment in file["annotated"]:
+ # skip annotated regions that are shorter than training chunk duration
+ if segment.duration < self.duration:
+ continue
+
+ # append annotated region
+ annotated_region = (
+ file_id,
+ segment.duration,
+ segment.start,
+ )
+ annotated_regions.append(annotated_region)
+
+ # increment annotated duration
+ _annotated_duration += segment.duration
+
+ # append annotated duration
+ annotated_duration.append(_annotated_duration)
+
+ # annotations
+ for segment, _, label in file["annotation"].itertracks(yield_label=True):
+ # "scope" is provided by speaker diarization protocols to indicate
+ # whether speaker labels are local to the file ('file'), consistent across
+ # all files in a database ('database'), or globally consistent ('global')
+
+ # 0 = 'file' / 1 = 'database' / 2 = 'global'
+ scope = Scopes.index(file["scope"])
+
+ # update list of file-scope labels
+ if label not in file_unique_labels:
+ file_unique_labels.append(label)
+ # and convert label to its (file-scope) index
+ file_label_idx = file_unique_labels.index(label)
+
+ database_label_idx = global_label_idx = -1
+
+ if scope > 0: # 'database' or 'global'
+ # update list of database-scope labels
+ database = file["database"]
+ if database not in database_unique_labels:
+ database_unique_labels[database] = []
+ if label not in database_unique_labels[database]:
+ database_unique_labels[database].append(label)
+
+ # and convert label to its (database-scope) index
+ database_label_idx = database_unique_labels[database].index(label)
+
+ if scope > 1: # 'global'
+ # update list of global-scope labels
+ if label not in unique_labels:
+ unique_labels.append(label)
+ # and convert label to its (global-scope) index
+ global_label_idx = unique_labels.index(label)
+
+ annotations.append(
+ (
+ file_id, # index of file
+ segment.start, # start time
+ segment.end, # end time
+ file_label_idx, # file-scope label index
+ database_label_idx, # database-scope label index
+ global_label_idx, # global-scope index
+ )
+ )
+
+ # since not all metadata keys are present in all files, fallback to -1 when a key is missing
+ metadata = [
+ tuple(metadatum.get(key, -1) for key in metadata_unique_values)
+ for metadatum in metadata
+ ]
+ metadata_dtype = [
+ (key, get_dtype(max(m[i] for m in metadata)))
+ for i, key in enumerate(metadata_unique_values)
+ ]
+
+ # turn list of files metadata into a single numpy array
+ # TODO: improve using https://github.com/pytorch/pytorch/issues/13246#issuecomment-617140519
+ info_dtype = [
+ (
+ "sample_rate",
+ get_dtype(max(ai[0] for ai in audio_infos)),
+ ),
+ (
+ "num_frames",
+ get_dtype(max(ai[1] for ai in audio_infos)),
+ ),
+ ("num_channels", "B"),
+ ("bits_per_sample", "B"),
+ ]
+
+ # turn list of annotated regions into a single numpy array
+ region_dtype = [
+ (
+ "file_id",
+ get_dtype(max(ar[0] for ar in annotated_regions)),
+ ),
+ ("duration", "f"),
+ ("start", "f"),
+ ]
+
+ # turn list of annotations into a single numpy array
+ segment_dtype = [
+ (
+ "file_id",
+ get_dtype(max(a[0] for a in annotations)),
+ ),
+ ("start", "f"),
+ ("end", "f"),
+ ("file_label_idx", get_dtype(max(a[3] for a in annotations))),
+ ("database_label_idx", get_dtype(max(a[4] for a in annotations))),
+ ("global_label_idx", get_dtype(max(a[5] for a in annotations))),
+ ]
+
+ # save all protocol data in a dict
+ prepared_data = {}
+
+ # keep track of protocol name
+ prepared_data["protocol"] = self.protocol.name
+
+ prepared_data["audio-path"] = np.array(audios, dtype=np.str_)
+ audios.clear()
+
+ prepared_data["audio-metadata"] = np.array(metadata, dtype=metadata_dtype)
+ metadata.clear()
+
+ prepared_data["audio-info"] = np.array(audio_infos, dtype=info_dtype)
+ audio_infos.clear()
+
+ prepared_data["audio-encoding"] = np.array(audio_encodings, dtype=np.str_)
+ audio_encodings.clear()
+
+ prepared_data["audio-annotated"] = np.array(annotated_duration)
+ annotated_duration.clear()
+
+ prepared_data["annotations-regions"] = np.array(
+ annotated_regions, dtype=region_dtype
+ )
+ annotated_regions.clear()
+
+ prepared_data["annotations-segments"] = np.array(
+ annotations, dtype=segment_dtype
+ )
+ annotations.clear()
+
+ prepared_data["metadata-values"] = metadata_unique_values
+
+ for database, labels in database_unique_labels.items():
+ prepared_data[f"metadata-{database}-labels"] = np.array(
+ labels, dtype=np.str_
+ )
+ database_unique_labels.clear()
+
+ prepared_data["metadata-labels"] = np.array(unique_labels, dtype=np.str_)
+ unique_labels.clear()
+
+ self.prepare_validation(prepared_data)
+ self.post_prepare_data(prepared_data)
+
+ # save prepared data on the disk
+ with open(self.cache, "wb") as cache_file:
+ np.savez_compressed(cache_file, **prepared_data)
+
+ def post_prepare_data(self, prepared_data: Dict):
+ """Method for completing `prepared_data` with task-specific data.
+ For instance, for a classification task, this could be a list of
+ possible classes.
+
+ Parameters
+ ----------
+ prepared_data: dict
+ dictionary containing protocol data prepared by
+ `prepare_data()`
+
+ Notes
+ -----
+ This method does not return anything: implementations must modify
+ `prepared_data` in place for their updates to be taken into account.
"""
pass
+ def setup(self, stage=None):
+ """Load data cached by `prepare_data()` into the task, on each process"""
+
+ # send cache path on all processes used for the training,
+ # allowing them to access the cache generated by prepare_data
+ if stage == "fit":
+ self.cache = self.trainer.strategy.broadcast(self.cache)
+
+ try:
+ with open(self.cache, "rb") as cache_file:
+ self.prepared_data = dict(np.load(cache_file, allow_pickle=True))
+ except FileNotFoundError:
+ print(
+ "Cached data for protocol not found. Ensure that prepare_data() was called"
+ " and executed correctly and/or that the path to the task cache is correct."
+ )
+ raise
+
+ # checks that the task current protocol matches the cached protocol
+ if self.protocol.name != self.prepared_data["protocol"]:
+ raise ValueError(
+ f"Protocol specified for the task ({self.protocol.name}) "
+ f"does not correspond to the cached one ({self.prepared_data['protocol']})"
+ )
+
@property
def specifications(self) -> Union[Specifications, Tuple[Specifications]]:
# setup metadata on-demand the first time specifications are requested and missing
if not hasattr(self, "_specifications"):
- self.setup_metadata()
+ raise UnknownSpecificationsError(
+ "Task specifications are not available. This is most likely because they depend on "
+ "the content of the training subset. Use `task.prepare_data()` and `task.setup()` "
+ "to go over the training subset and fix this, or let lightning trainer do that for you in `trainer.fit(model)`."
+ )
return self._specifications
@specifications.setter
@@ -281,29 +659,6 @@ def specifications(
):
self._specifications = specifications
- @property
- def has_setup_metadata(self):
- return getattr(self, "_has_setup_metadata", False)
-
- @has_setup_metadata.setter
- def has_setup_metadata(self, value: bool):
- self._has_setup_metadata = value
-
- def setup_metadata(self):
- """Called at the beginning of training at the very beginning of Model.setup(stage="fit")
-
- Notes
- -----
- This hook is called on every process when using DDP.
-
- If `specifications` attribute has not been set in `__init__`,
- `setup` is your last chance to set it.
- """
-
- if not self.has_setup_metadata:
- self.setup()
- self.has_setup_metadata = True
-
def setup_loss_func(self):
pass
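The new caching scheme stores each table as a NumPy structured array whose integer fields use the narrowest dtype picked by `get_dtype`, then round-trips everything through `np.savez_compressed` and `np.load`. Below is a minimal, self-contained sketch of that round trip, assuming an in-memory `io.BytesIO` buffer stands in for the cache file and using made-up annotation tuples and a hypothetical protocol name; it is an illustration, not the task's actual code path. Because `metadata-values` is a Python dict, NumPy stores it as a 0-d object array, which is why loading needs `allow_pickle=True` and `.item()` recovers the dict.

```python
import io

import numpy as np


def get_dtype(value: int) -> str:
    """Smallest signed integer dtype for `value` (same thresholds as in the diff)."""
    for max_val, dtype in [(127, "b"), (32_768, "i2"), (2_147_483_648, "i")]:
        if max_val > abs(value):
            return dtype
    return "i8"  # signed 64-bit fallback


# made-up annotations: (file_id, start, end, file_label_idx,
#                       database_label_idx, global_label_idx)
annotations = [(0, 0.0, 1.5, 0, -1, -1), (1, 2.0, 3.25, 1, -1, -1)]
segment_dtype = [
    ("file_id", get_dtype(max(a[0] for a in annotations))),
    ("start", "f"),
    ("end", "f"),
    ("file_label_idx", get_dtype(max(a[3] for a in annotations))),
    ("database_label_idx", get_dtype(max(a[4] for a in annotations))),
    ("global_label_idx", get_dtype(max(a[5] for a in annotations))),
]

prepared_data = {
    "protocol": "MyDatabase.SpeakerDiarization.MyProtocol",  # hypothetical name
    "annotations-segments": np.array(annotations, dtype=segment_dtype),
    "metadata-values": {"subset": ["train", "development", "test"]},
}

# "prepare_data": write everything to the (in-memory) cache
cache = io.BytesIO()
np.savez_compressed(cache, **prepared_data)

# "setup": read it back; the dict entry was pickled, hence allow_pickle=True
cache.seek(0)
loaded = dict(np.load(cache, allow_pickle=True))

segments = loaded["annotations-segments"]
print(segments.dtype["file_id"], segments["end"])
print(loaded["metadata-values"].item()["subset"])
```

Choosing per-field dtypes from the observed maxima keeps the cache small for large protocols, at the cost of scanning each column once before building the array.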
diff --git a/pyannote/audio/models/blocks/pooling.py b/pyannote/audio/models/blocks/pooling.py
index 22d736a03..dc31bea8e 100644
--- a/pyannote/audio/models/blocks/pooling.py
+++ b/pyannote/audio/models/blocks/pooling.py
@@ -26,53 +26,53 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
-from einops import rearrange
-class StatsPool(nn.Module):
- """Statistics pooling
+def _pool(sequences: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
+ """Helper function to compute statistics pooling
- Compute temporal mean and (unbiased) standard deviation
- and returns their concatenation.
+ Assumes that weights are already interpolated to match the number of frames
+ in sequences and that they encode the activation of only one speaker.
- Reference
- ---------
- https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
+ Parameters
+ ----------
+ sequences : (batch, features, frames) torch.Tensor
+ Sequences of features.
+ weights : (batch, frames) torch.Tensor
+ (Already interpolated) weights.
+ Returns
+ -------
+ output : (batch, 2 * features) torch.Tensor
+ Concatenation of mean and (unbiased) standard deviation.
"""
- def _pool(self, sequences: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
- """Helper function to compute statistics pooling
+ weights = weights.unsqueeze(dim=1)
+ # (batch, 1, frames)
- Assumes that weights are already interpolated to match the number of frames
- in sequences and that they encode the activation of only one speaker.
+ v1 = weights.sum(dim=2) + 1e-8
+ mean = torch.sum(sequences * weights, dim=2) / v1
- Parameters
- ----------
- sequences : (batch, features, frames) torch.Tensor
- Sequences of features.
- weights : (batch, frames) torch.Tensor
- (Already interpolated) weights.
+ dx2 = torch.square(sequences - mean.unsqueeze(2))
+ v2 = torch.square(weights).sum(dim=2)
- Returns
- -------
- output : (batch, 2 * features) torch.Tensor
- Concatenation of mean and (unbiased) standard deviation.
- """
+ var = torch.sum(dx2 * weights, dim=2) / (v1 - v2 / v1 + 1e-8)
+ std = torch.sqrt(var)
- weights = weights.unsqueeze(dim=1)
- # (batch, 1, frames)
+ return torch.cat([mean, std], dim=1)
- v1 = weights.sum(dim=2) + 1e-8
- mean = torch.sum(sequences * weights, dim=2) / v1
- dx2 = torch.square(sequences - mean.unsqueeze(2))
- v2 = torch.square(weights).sum(dim=2)
+class StatsPool(nn.Module):
+ """Statistics pooling
- var = torch.sum(dx2 * weights, dim=2) / (v1 - v2 / v1 + 1e-8)
- std = torch.sqrt(var)
+ Compute temporal mean and (unbiased) standard deviation
+ and returns their concatenation.
- return torch.cat([mean, std], dim=1)
+ Reference
+ ---------
+ https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
+
+ """
def forward(
self, sequences: torch.Tensor, weights: Optional[torch.Tensor] = None
@@ -112,17 +112,20 @@ def forward(
has_speaker_dimension = True
# interpolate weights if needed
- _, _, num_frames = sequences.shape
- _, _, num_weights = weights.shape
+ _, _, num_frames = sequences.size()
+ _, num_speakers, num_weights = weights.size()
if num_frames != num_weights:
warnings.warn(
f"Mismatch between frames ({num_frames}) and weights ({num_weights}) numbers."
)
weights = F.interpolate(weights, size=num_frames, mode="nearest")
- output = rearrange(
- torch.vmap(self._pool, in_dims=(None, 1))(sequences, weights),
- "speakers batch features -> batch speakers features",
+ output = torch.stack(
+ [
+ _pool(sequences, weights[:, speaker, :])
+ for speaker in range(num_speakers)
+ ],
+ dim=1,
)
if not has_speaker_dimension:
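The refactor replaces `torch.vmap` plus `einops.rearrange` with a Python loop over speakers whose results are stacked with `torch.stack(..., dim=1)`. To sanity-check the weighted statistics computed by `_pool`, here is a NumPy transcription (an assumption made so the sketch runs without PyTorch; it mirrors the tensor operations one-to-one). With binary weights, the weighted mean and the `v1 - v2 / v1` denominator reduce to the ordinary mean and the unbiased (n - 1) standard deviation of the selected frames.

```python
import numpy as np


def pool(sequences: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """NumPy transcription of `_pool`.

    sequences: (batch, features, frames); weights: (batch, frames)
    returns: (batch, 2 * features), concatenated weighted mean and std
    """
    weights = weights[:, np.newaxis, :]  # (batch, 1, frames)
    v1 = weights.sum(axis=2) + 1e-8  # (batch, 1)
    mean = np.sum(sequences * weights, axis=2) / v1  # (batch, features)
    dx2 = np.square(sequences - mean[:, :, np.newaxis])
    v2 = np.square(weights).sum(axis=2)
    var = np.sum(dx2 * weights, axis=2) / (v1 - v2 / v1 + 1e-8)
    return np.concatenate([mean, np.sqrt(var)], axis=1)


rng = np.random.default_rng(0)
batch, features, frames, speakers = 2, 4, 50, 3
sequences = rng.normal(size=(batch, features, frames))
# binary speaker activations: (batch, speakers, frames)
weights = (rng.random(size=(batch, speakers, frames)) > 0.5).astype(float)

# per-speaker pooling stacked on axis 1, as in the new StatsPool.forward
output = np.stack([pool(sequences, weights[:, s, :]) for s in range(speakers)], axis=1)
print(output.shape)  # (batch, speakers, 2 * features) -> (2, 3, 8)
```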
diff --git a/pyannote/audio/models/blocks/sincnet.py b/pyannote/audio/models/blocks/sincnet.py
index 65bd6e57f..2a085201c 100644
--- a/pyannote/audio/models/blocks/sincnet.py
+++ b/pyannote/audio/models/blocks/sincnet.py
@@ -1,6 +1,6 @@
# The MIT License (MIT)
#
-# Copyright (c) 2019-2020 CNRS
+# Copyright (c) 2019- CNRS
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -23,22 +23,30 @@
# AUTHOR
# Hervé Bredin - http://herve.niderb.fr
+from functools import lru_cache
import torch
import torch.nn as nn
import torch.nn.functional as F
from asteroid_filterbanks import Encoder, ParamSincFB
+from pyannote.audio.utils.receptive_field import (
+ multi_conv_num_frames,
+ multi_conv_receptive_field_center,
+ multi_conv_receptive_field_size,
+)
+
class SincNet(nn.Module):
def __init__(self, sample_rate: int = 16000, stride: int = 1):
super().__init__()
if sample_rate != 16000:
- raise NotImplementedError("PyanNet only supports 16kHz audio for now.")
+ raise NotImplementedError("SincNet only supports 16kHz audio for now.")
# TODO: add support for other sample rate. it should be enough to multiply
# kernel_size by (sample_rate / 16000). but this needs to be double-checked.
+ self.sample_rate = sample_rate
self.stride = stride
self.wav_norm1d = nn.InstanceNorm1d(1, affine=True)
@@ -70,6 +78,88 @@ def __init__(self, sample_rate: int = 16000, stride: int = 1):
self.pool1d.append(nn.MaxPool1d(3, stride=3, padding=0, dilation=1))
self.norm1d.append(nn.InstanceNorm1d(60, affine=True))
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+
+ kernel_size = [251, 3, 5, 3, 5, 3]
+ stride = [self.stride, 3, 1, 3, 1, 3]
+ padding = [0, 0, 0, 0, 0, 0]
+ dilation = [1, 1, 1, 1, 1, 1]
+
+ return multi_conv_num_frames(
+ num_samples,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ )
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ kernel_size = [251, 3, 5, 3, 5, 3]
+ stride = [self.stride, 3, 1, 3, 1, 3]
+ padding = [0, 0, 0, 0, 0, 0]
+ dilation = [1, 1, 1, 1, 1, 1]
+
+ return multi_conv_receptive_field_size(
+ num_frames,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ )
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ kernel_size = [251, 3, 5, 3, 5, 3]
+ stride = [self.stride, 3, 1, 3, 1, 3]
+ padding = [0, 0, 0, 0, 0, 0]
+ dilation = [1, 1, 1, 1, 1, 1]
+
+ return multi_conv_receptive_field_center(
+ frame,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ )
+
def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
"""Pass forward
@@ -83,7 +173,6 @@ def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
for c, (conv1d, pool1d, norm1d) in enumerate(
zip(self.conv1d, self.pool1d, self.norm1d)
):
-
outputs = conv1d(outputs)
# https://github.com/mravanelli/SincNet/issues/4
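`SincNet.receptive_field_*` and the new `Model.receptive_field` delegate to helpers in `pyannote.audio.utils.receptive_field`, whose implementation is not part of this excerpt. The sketch below re-derives the same quantities from standard 1D convolution arithmetic, using hypothetical helper names (`conv_num_frames`, `conv_rf_size`, `conv_rf_center`) and the SincNet layer stack listed above with `stride=1`. It then assembles start, duration and step in seconds the way `Model.receptive_field` does, returning a plain tuple instead of a `pyannote.core.SlidingWindow`.

```python
from typing import List, Tuple

# (kernel_size, stride, padding, dilation) for each SincNet conv/pool layer,
# taken from the lists in the diff, with stride=1
SINCNET_LAYERS: List[Tuple[int, int, int, int]] = [
    (251, 1, 0, 1), (3, 3, 0, 1), (5, 1, 0, 1),
    (3, 3, 0, 1), (5, 1, 0, 1), (3, 3, 0, 1),
]


def conv_num_frames(num_samples: int, layers) -> int:
    """Number of output frames after applying every layer in order."""
    for kernel, stride, padding, dilation in layers:
        num_samples = 1 + (num_samples + 2 * padding - dilation * (kernel - 1) - 1) // stride
    return num_samples


def conv_rf_size(num_frames: int, layers) -> int:
    """Number of input samples seen by `num_frames` consecutive output frames."""
    for kernel, stride, _, dilation in reversed(layers):
        num_frames = (num_frames - 1) * stride + dilation * (kernel - 1) + 1
    return num_frames


def conv_rf_center(frame: int, layers) -> int:
    """Input sample at the center of output `frame`'s receptive field (odd kernels)."""
    for kernel, stride, padding, dilation in reversed(layers):
        frame = frame * stride - padding + dilation * (kernel - 1) // 2
    return frame


def receptive_field(layers, sample_rate: int = 16000) -> Tuple[float, float, float]:
    """(start, duration, step) in seconds, mirroring Model.receptive_field."""
    size = conv_rf_size(1, layers)
    step = conv_rf_size(2, layers) - size
    start = conv_rf_center(0, layers) - (size - 1) / 2
    return start / sample_rate, size / sample_rate, step / sample_rate


print(conv_num_frames(16000, SINCNET_LAYERS))  # frames for 1 s of 16 kHz audio
print(receptive_field(SINCNET_LAYERS))
```

The step equals the product of all strides (1 x 3 x 1 x 3 x 1 x 3 = 27 samples), which is a quick consistency check on the size computation.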
diff --git a/pyannote/audio/models/embedding/debug.py b/pyannote/audio/models/embedding/debug.py
index 11b3def30..b09283908 100644
--- a/pyannote/audio/models/embedding/debug.py
+++ b/pyannote/audio/models/embedding/debug.py
@@ -1,6 +1,6 @@
# MIT License
#
-# Copyright (c) 2020 CNRS
+# Copyright (c) 2020- CNRS
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -21,6 +21,7 @@
# SOFTWARE.
+from functools import lru_cache
from typing import Optional
import torch
@@ -39,7 +40,6 @@ def __init__(
num_channels: int = 1,
task: Optional[Task] = None,
):
-
super().__init__(sample_rate=sample_rate, num_channels=num_channels, task=task)
self.mfcc = MFCC(
@@ -58,6 +58,81 @@ def __init__(
bidirectional=True,
)
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames for a given number of input samples
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames
+
+ Source
+ ------
+ https://pytorch.org/docs/stable/generated/torch.stft.html#torch.stft
+
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ return 1 + num_samples // hop_length
+ else:
+ return 1 + (num_samples - n_fft) // hop_length
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ return n_fft + (num_frames - 1) * hop_length
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ return frame * hop_length
+ else:
+ return frame * hop_length + n_fft // 2
+
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+ return 64
+
def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
"""
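The debug model's `num_frames` and `receptive_field_center` follow from how an STFT frames its input: with `center=True`, the signal is padded by `n_fft // 2` samples on both sides before framing. The sketch below checks both closed forms against explicit window enumeration. `N_FFT = 400` and `HOP = 160` are illustrative values (the actual `MFCC` configuration of the debug model is not shown here), and an even `n_fft` is assumed, since with an odd `n_fft` the centered count becomes `1 + (n - 1) // hop`.

```python
from typing import List


def stft_num_frames(num_samples: int, n_fft: int, hop_length: int, center: bool) -> int:
    """Closed-form frame count, as in the debug model's `num_frames`."""
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length


def stft_rf_center(frame: int, n_fft: int, hop_length: int, center: bool) -> int:
    """Mirrors the debug model's `receptive_field_center`."""
    return frame * hop_length if center else frame * hop_length + n_fft // 2


def enumerate_frame_starts(num_samples: int, n_fft: int, hop_length: int, center: bool) -> List[int]:
    """Start index (in original-signal coordinates) of every full analysis window."""
    pad = n_fft // 2 if center else 0  # centered framing pads n_fft // 2 on each side
    padded_length = num_samples + 2 * pad
    return [start - pad for start in range(0, padded_length - n_fft + 1, hop_length)]


N_FFT, HOP = 400, 160  # illustrative (even) window size and hop length

for center in (True, False):
    starts = enumerate_frame_starts(16000, N_FFT, HOP, center)
    print(center, len(starts), stft_num_frames(16000, N_FFT, HOP, center))
```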
diff --git a/pyannote/audio/models/embedding/wespeaker/__init__.py b/pyannote/audio/models/embedding/wespeaker/__init__.py
index 603a88c64..be51196c1 100644
--- a/pyannote/audio/models/embedding/wespeaker/__init__.py
+++ b/pyannote/audio/models/embedding/wespeaker/__init__.py
@@ -21,29 +21,52 @@
# SOFTWARE.
-from functools import partial
+from functools import lru_cache, partial
from typing import Optional
import torch
+import torch.nn.functional as F
import torchaudio.compliance.kaldi as kaldi
from pyannote.audio.core.model import Model
from pyannote.audio.core.task import Task
+from pyannote.audio.utils.receptive_field import (
+ conv1d_num_frames,
+ conv1d_receptive_field_center,
+ conv1d_receptive_field_size,
+)
from .resnet import ResNet34, ResNet152, ResNet221, ResNet293
class BaseWeSpeakerResNet(Model):
+ """Base class for WeSpeaker's ResNet models
+
+ Parameters
+ ----------
+ fbank_centering_span : float, optional
+ Span of the fbank centering window (in seconds).
+ Defaults to None, i.e. features are centered using the whole input.
+
+ See also
+ --------
+ torchaudio.compliance.kaldi.fbank
+
+ """
+
def __init__(
self,
sample_rate: int = 16000,
num_channels: int = 1,
num_mel_bins: int = 80,
- frame_length: int = 25,
- frame_shift: int = 10,
+ frame_length: float = 25.0, # in milliseconds
+ frame_shift: float = 10.0, # in milliseconds
+ round_to_power_of_two: bool = True,
+ snip_edges: bool = True,
dither: float = 0.0,
window_type: str = "hamming",
use_energy: bool = False,
+ fbank_centering_span: Optional[float] = None,
task: Optional[Task] = None,
):
super().__init__(sample_rate=sample_rate, num_channels=num_channels, task=task)
@@ -55,21 +78,38 @@ def __init__(
"frame_length",
"frame_shift",
"dither",
+ "round_to_power_of_two",
+ "snip_edges",
"window_type",
"use_energy",
+ "fbank_centering_span",
)
self._fbank = partial(
kaldi.fbank,
num_mel_bins=self.hparams.num_mel_bins,
frame_length=self.hparams.frame_length,
+ round_to_power_of_two=self.hparams.round_to_power_of_two,
frame_shift=self.hparams.frame_shift,
+ snip_edges=self.hparams.snip_edges,
dither=self.hparams.dither,
sample_frequency=self.hparams.sample_rate,
window_type=self.hparams.window_type,
use_energy=self.hparams.use_energy,
)
+ @property
+ def fbank_only(self) -> bool:
+ """Whether to only extract fbank features"""
+ return getattr(self, "_fbank_only", False)
+
+ @fbank_only.setter
+ def fbank_only(self, value: bool):
+ if hasattr(self, "receptive_field"):
+ del self.receptive_field
+
+ self._fbank_only = value
+
def compute_fbank(self, waveforms: torch.Tensor) -> torch.Tensor:
"""Extract fbank features
@@ -80,6 +120,7 @@ def compute_fbank(self, waveforms: torch.Tensor) -> torch.Tensor:
Returns
-------
fbank : (batch_size, num_frames, num_mel_bins)
+ fbank features
Source: https://github.com/wenet-e2e/wespeaker/blob/45941e7cba2c3ea99e232d02bedf617fc71b0dad/wespeaker/bin/infer_onnx.py#L30C1-L50
"""
@@ -93,19 +134,209 @@ def compute_fbank(self, waveforms: torch.Tensor) -> torch.Tensor:
features = torch.vmap(self._fbank)(waveforms.to(fft_device)).to(device)
- return features - torch.mean(features, dim=1, keepdim=True)
+ # center features with global average
+ if self.hparams.fbank_centering_span is None:
+ return features - torch.mean(features, dim=1, keepdim=True)
+
+ # center features with running average
+ window_size = int(self.hparams.sample_rate * self.hparams.frame_length * 0.001)
+ step_size = int(self.hparams.sample_rate * self.hparams.frame_shift * 0.001)
+ kernel_size = conv1d_num_frames(
+ num_samples=int(
+ self.hparams.fbank_centering_span * self.hparams.sample_rate
+ ),
+ kernel_size=window_size,
+ stride=step_size,
+ padding=0,
+ dilation=1,
+ )
+ return features - F.avg_pool1d(
+ features.transpose(1, 2),
+ kernel_size=2 * (kernel_size // 2) + 1,
+ stride=1,
+ padding=kernel_size // 2,
+ count_include_pad=False,
+ ).transpose(1, 2)
+
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+
+ if self.fbank_only:
+ return self.hparams.num_mel_bins
+
+ return self.resnet.embed_dim
+
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+ window_size = int(self.hparams.sample_rate * self.hparams.frame_length * 0.001)
+ step_size = int(self.hparams.sample_rate * self.hparams.frame_shift * 0.001)
+
+ # TODO: take snip_edges into account (round_to_power_of_two only changes the FFT size, not the number of frames)
+
+ num_frames = conv1d_num_frames(
+ num_samples=num_samples,
+ kernel_size=window_size,
+ stride=step_size,
+ padding=0,
+ dilation=1,
+ )
+
+ if self.fbank_only:
+ return num_frames
+
+ return self.resnet.num_frames(num_frames)
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ receptive_field_size = num_frames
+
+ if not self.fbank_only:
+ receptive_field_size = self.resnet.receptive_field_size(
+ receptive_field_size
+ )
+
+ window_size = int(self.hparams.sample_rate * self.hparams.frame_length * 0.001)
+ step_size = int(self.hparams.sample_rate * self.hparams.frame_shift * 0.001)
+
+ return conv1d_receptive_field_size(
+ num_frames=receptive_field_size,
+ kernel_size=window_size,
+ stride=step_size,
+ padding=0,
+ dilation=1,
+ )
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+ receptive_field_center = frame
+
+ if not self.fbank_only:
+ receptive_field_center = self.resnet.receptive_field_center(
+ frame=receptive_field_center
+ )
+
+ window_size = int(self.hparams.sample_rate * self.hparams.frame_length * 0.001)
+ step_size = int(self.hparams.sample_rate * self.hparams.frame_shift * 0.001)
+ return conv1d_receptive_field_center(
+ frame=receptive_field_center,
+ kernel_size=window_size,
+ stride=step_size,
+ padding=0,
+ dilation=1,
+ )
def forward(
- self, waveforms: torch.Tensor, weights: torch.Tensor = None
+ self, waveforms: torch.Tensor, weights: Optional[torch.Tensor] = None
) -> torch.Tensor:
+ """Extract speaker embeddings
+
+ Parameters
+ ----------
+ waveforms : torch.Tensor
+ Batch of waveforms with shape (batch, channel, sample)
+ weights : (batch, frames) or (batch, speakers, frames) torch.Tensor, optional
+ Batch of weights passed to statistics pooling layer.
+
+ Returns
+ -------
+ embeddings : (batch, dimension) or (batch, speakers, dimension) torch.Tensor
+ Batch of embeddings.
"""
+ fbank = self.compute_fbank(waveforms)
+ if self.fbank_only:
+ return fbank
+
+ return self.resnet(fbank, weights=weights)[1]
+
+ def forward_frames(self, waveforms: torch.Tensor) -> torch.Tensor:
+ """Extract frame-wise embeddings
+
Parameters
----------
waveforms : torch.Tensor
Batch of waveforms with shape (batch, channel, sample)
- weights : torch.Tensor, optional
- Batch of weights with shape (batch, frame).
+
+ Returns
+ -------
+ embeddings : (batch, ..., embedding_frames) torch.Tensor
+ Batch of frame-wise embeddings.
+ """
+ fbank = self.compute_fbank(waveforms)
+ return self.resnet.forward_frames(fbank)
+
+ def forward_embedding(
+ self, frames: torch.Tensor, weights: Optional[torch.Tensor] = None
+ ) -> torch.Tensor:
+ """Extract speaker embeddings from frame-wise embeddings
+
+ Parameters
+ ----------
+ frames : torch.Tensor
+ Batch of frames with shape (batch, ..., embedding_frames).
+ weights : (batch, frames) or (batch, speakers, frames) torch.Tensor, optional
+ Batch of weights passed to statistics pooling layer.
+
+ Returns
+ -------
+ embeddings : (batch, dimension) or (batch, speakers, dimension) torch.Tensor
+ Batch of embeddings.
+
+ """
+ return self.resnet.forward_embedding(frames, weights=weights)[1]
+
+ def forward(
+ self, waveforms: torch.Tensor, weights: Optional[torch.Tensor] = None
+ ) -> torch.Tensor:
+ """Extract speaker embeddings
+
+ Parameters
+ ----------
+ waveforms : torch.Tensor
+ Batch of waveforms with shape (batch, channel, sample)
+ weights : (batch, frames) or (batch, speakers, frames) torch.Tensor, optional
+ Batch of weights passed to statistics pooling layer.
+
+ Returns
+ -------
+ embeddings : (batch, dimension) or (batch, speakers, dimension) torch.Tensor
+ Batch of embeddings.
"""
fbank = self.compute_fbank(waveforms)
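The running-average branch of `compute_fbank` converts `fbank_centering_span` from seconds into a number of fbank frames, then subtracts a centered moving average whose window is forced to an odd length and truncated at the edges (`count_include_pad=False`). The pure-Python sketch below mirrors that arithmetic on a single (frames, bins) list instead of a batched tensor. `frames_in_span` and `center_with_running_mean` are hypothetical helper names, and the sketch assumes `snip_edges=True` framing and a span at least one analysis window long.

```python
from typing import List


def frames_in_span(span_s: float, sample_rate: int, frame_length_ms: float, frame_shift_ms: float) -> int:
    # Same conversion as compute_fbank: seconds -> samples -> fbank frames,
    # assuming snip_edges=True framing and span_s covering at least one window.
    window = int(sample_rate * frame_length_ms * 0.001)
    step = int(sample_rate * frame_shift_ms * 0.001)
    num_samples = int(span_s * sample_rate)
    return 1 + (num_samples - window) // step


def center_with_running_mean(features: List[List[float]], span_frames: int) -> List[List[float]]:
    """Subtract a centered moving average over time from (num_frames, num_bins) features."""
    half = span_frames // 2  # the window is 2 * half + 1 frames long, so always odd
    num_frames = len(features)
    centered = []
    for t, frame in enumerate(features):
        # Truncate the window at the edges and average only real frames,
        # like avg_pool1d(..., count_include_pad=False).
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        mean = [sum(features[u][b] for u in range(lo, hi)) / (hi - lo) for b in range(len(frame))]
        centered.append([x - m for x, m in zip(frame, mean)])
    return centered


span = frames_in_span(3.0, 16000, 25.0, 10.0)
print(span, 2 * (span // 2) + 1)
```

For example, a 3 s span at 16 kHz with 25 ms windows and 10 ms shifts covers 298 frames, which becomes an odd 299-frame averaging window.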
diff --git a/pyannote/audio/models/embedding/wespeaker/resnet.py b/pyannote/audio/models/embedding/wespeaker/resnet.py
index 54f95fa8b..4c9d5a5f0 100644
--- a/pyannote/audio/models/embedding/wespeaker/resnet.py
+++ b/pyannote/audio/models/embedding/wespeaker/resnet.py
@@ -15,12 +15,23 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+from functools import lru_cache
+from typing import Optional
+
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from pyannote.audio.models.blocks.pooling import StatsPool
+from pyannote.audio.utils.receptive_field import (
+ conv1d_num_frames,
+ conv1d_receptive_field_center,
+ conv1d_receptive_field_size,
+ multi_conv_num_frames,
+ multi_conv_receptive_field_center,
+ multi_conv_receptive_field_size,
+)
class TSTP(nn.Module):
@@ -35,7 +46,7 @@ def __init__(self, in_dim=0, **kwargs):
self.in_dim = in_dim
self.stats_pool = StatsPool()
- def forward(self, features, weights: torch.Tensor = None):
+ def forward(self, features, weights: Optional[torch.Tensor] = None):
"""
Parameters
@@ -75,6 +86,7 @@ class BasicBlock(nn.Module):
def __init__(self, in_planes, planes, stride=1):
super(BasicBlock, self).__init__()
+ self.stride = stride
self.conv1 = nn.Conv2d(
in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False
)
@@ -97,6 +109,34 @@ def __init__(self, in_planes, planes, stride=1):
nn.BatchNorm2d(self.expansion * planes),
)
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ return multi_conv_num_frames(
+ num_samples,
+ kernel_size=[3, 3],
+ stride=[self.stride, 1],
+ padding=[1, 1],
+ dilation=[1, 1],
+ )
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ return multi_conv_receptive_field_size(
+ num_frames,
+ kernel_size=[3, 3],
+ stride=[self.stride, 1],
+ padding=[1, 1],
+ dilation=[1, 1],
+ )
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ return multi_conv_receptive_field_center(
+ frame,
+ kernel_size=[3, 3],
+ stride=[self.stride, 1],
+ padding=[1, 1],
+ dilation=[1, 1],
+ )
+
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
@@ -110,6 +150,7 @@ class Bottleneck(nn.Module):
def __init__(self, in_planes, planes, stride=1):
super(Bottleneck, self).__init__()
+ self.stride = stride
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(
@@ -134,6 +175,34 @@ def __init__(self, in_planes, planes, stride=1):
nn.BatchNorm2d(self.expansion * planes),
)
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ return multi_conv_num_frames(
+ num_samples,
+ kernel_size=[1, 3, 1],
+ stride=[1, self.stride, 1],
+ padding=[0, 1, 0],
+ dilation=[1, 1, 1],
+ )
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ return multi_conv_receptive_field_size(
+ num_frames,
+ kernel_size=[1, 3, 1],
+ stride=[1, self.stride, 1],
+ padding=[0, 1, 0],
+ dilation=[1, 1, 1],
+ )
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ return multi_conv_receptive_field_center(
+ frame,
+ kernel_size=[1, 3, 1],
+ stride=[1, self.stride, 1],
+ padding=[0, 1, 0],
+ dilation=[1, 1, 1],
+ )
+
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = F.relu(self.bn2(self.conv2(out)))
@@ -190,12 +259,149 @@ def _make_layer(self, block, planes, num_blocks, stride):
self.in_planes = planes * block.expansion
return nn.Sequential(*layers)
- def forward(self, x: torch.Tensor, weights: torch.Tensor = None):
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+
+ num_frames = num_samples
+ num_frames = conv1d_num_frames(
+ num_frames, kernel_size=3, stride=1, padding=1, dilation=1
+ )
+ for layers in [self.layer1, self.layer2, self.layer3, self.layer4]:
+ for layer in layers:
+ num_frames = layer.num_frames(num_frames)
+
+ return num_frames
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
"""
+ receptive_field_size = num_frames
+ for layers in reversed([self.layer1, self.layer2, self.layer3, self.layer4]):
+ for layer in reversed(layers):
+ receptive_field_size = layer.receptive_field_size(receptive_field_size)
+
+ receptive_field_size = conv1d_receptive_field_size(
+ num_frames=receptive_field_size,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ dilation=1,
+ )
+
+ return receptive_field_size
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
Parameters
----------
- x : (batch, frames, features) torch.Tensor
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ receptive_field_center = frame
+ for layers in reversed([self.layer1, self.layer2, self.layer3, self.layer4]):
+ for layer in reversed(layers):
+ receptive_field_center = layer.receptive_field_center(
+ frame=receptive_field_center
+ )
+
+ receptive_field_center = conv1d_receptive_field_center(
+ frame=receptive_field_center,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ dilation=1,
+ )
+
+ return receptive_field_center
+
+ def forward_frames(self, fbank: torch.Tensor) -> torch.Tensor:
+ """Extract frame-wise embeddings
+
+ Parameters
+ ----------
+ fbank : (batch, frames, features) torch.Tensor
+ Batch of fbank features
+
+ Returns
+ -------
+ embeddings : (batch, ..., embedding_frames) torch.Tensor
+ Batch of frame-wise embeddings.
+
+ """
+ fbank = fbank.permute(0, 2, 1) # (B,T,F) => (B,F,T)
+ fbank = fbank.unsqueeze_(1)
+ out = F.relu(self.bn1(self.conv1(fbank)))
+ out = self.layer1(out)
+ out = self.layer2(out)
+ out = self.layer3(out)
+ out = self.layer4(out)
+ return out
+
+ def forward_embedding(
+ self, frames: torch.Tensor, weights: Optional[torch.Tensor] = None
+ ) -> torch.Tensor:
+ """Extract speaker embeddings
+
+ Parameters
+ ----------
+ frames : torch.Tensor
+ Batch of frames with shape (batch, ..., embedding_frames).
+ weights : (batch, frames) or (batch, speakers, frames) torch.Tensor, optional
+ Batch of weights passed to statistics pooling layer.
+
+ Returns
+ -------
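+ embeddings : tuple of two torch.Tensor
+ Second item holds the speaker embeddings, (batch, dimension) or (batch, speakers, dimension); first item is the intermediate embedding when two_emb_layer is True, a dummy 0.0 tensor otherwise.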
+ """
+
+ stats = self.pool(frames, weights=weights)
+
+ embed_a = self.seg_1(stats)
+ if self.two_emb_layer:
+ out = F.relu(embed_a)
+ out = self.seg_bn_1(out)
+ embed_b = self.seg_2(out)
+ return embed_a, embed_b
+ else:
+ return torch.tensor(0.0), embed_a
+
+ def forward(self, fbank: torch.Tensor, weights: Optional[torch.Tensor] = None):
+ """Extract speaker embeddings
+
+ Parameters
+ ----------
+ fbank : (batch, frames, features) torch.Tensor
Batch of features
weights : (batch, frames) torch.Tensor, optional
Batch of weights
@@ -204,10 +410,9 @@ def forward(self, x: torch.Tensor, weights: torch.Tensor = None):
-------
embedding : (batch, embedding_dim) torch.Tensor
"""
- x = x.permute(0, 2, 1) # (B,T,F) => (B,F,T)
-
- x = x.unsqueeze_(1)
- out = F.relu(self.bn1(self.conv1(x)))
+ fbank = fbank.permute(0, 2, 1) # (B,T,F) => (B,F,T)
+ fbank = fbank.unsqueeze_(1)
+ out = F.relu(self.bn1(self.conv1(fbank)))
out = self.layer1(out)
out = self.layer2(out)
out = self.layer3(out)
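Each `num_frames`, `receptive_field_size` and `receptive_field_center` method above chains a per-layer rule along the time axis: forward through the layers for frame counts, backward from the output for receptive fields. The sketch below uses the textbook 1D convolution formulas (the output-length one matches the `torch.nn.Conv1d` documentation); the helper names are hypothetical, and `pyannote.audio.utils.receptive_field` may differ in details such as rounding. The example is the time axis of a `BasicBlock` with `stride=2`.

```python
from typing import Sequence


def conv_num_frames(n: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    # Output length of a 1D convolution (torch.nn.Conv1d formula).
    return 1 + (n + 2 * padding - dilation * (kernel - 1) - 1) // stride


def conv_rf_size(num_frames: int, kernel: int, stride: int = 1, dilation: int = 1) -> int:
    # Input frames covered by `num_frames` consecutive output frames (padding does not matter).
    return (num_frames - 1) * stride + dilation * (kernel - 1) + 1


def conv_rf_center(frame: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    # Input index at the middle of the window producing output `frame`;
    # exact when dilation * (kernel - 1) is even, e.g. for odd kernels.
    return frame * stride - padding + dilation * (kernel - 1) // 2


def stack_num_frames(n: int, kernels: Sequence[int], strides: Sequence[int],
                     paddings: Sequence[int], dilations: Sequence[int]) -> int:
    # Frame counts propagate forward, from the first layer to the last.
    for k, s, p, d in zip(kernels, strides, paddings, dilations):
        n = conv_num_frames(n, k, s, p, d)
    return n


def stack_rf_size(num_frames: int, kernels: Sequence[int], strides: Sequence[int],
                  dilations: Sequence[int]) -> int:
    # Receptive fields propagate backward, from the last layer to the first.
    for k, s, d in reversed(list(zip(kernels, strides, dilations))):
        num_frames = conv_rf_size(num_frames, k, s, d)
    return num_frames


def stack_rf_center(frame: int, kernels: Sequence[int], strides: Sequence[int],
                    paddings: Sequence[int], dilations: Sequence[int]) -> int:
    for k, s, p, d in reversed(list(zip(kernels, strides, paddings, dilations))):
        frame = conv_rf_center(frame, k, s, p, d)
    return frame


# Time axis of a BasicBlock with stride=2: two 3x3 convolutions with padding 1.
BLOCK = dict(kernels=[3, 3], strides=[2, 1], paddings=[1, 1], dilations=[1, 1])

print(stack_num_frames(80, **BLOCK))
print(stack_rf_size(1, BLOCK["kernels"], BLOCK["strides"], BLOCK["dilations"]))
print(stack_rf_center(5, **BLOCK))
```

For that block, 80 input frames become 40, one output frame sees 7 input frames, and output frame 5 is centered on input frame 10.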
diff --git a/pyannote/audio/models/embedding/xvector.py b/pyannote/audio/models/embedding/xvector.py
index 975f0a991..3161876e3 100644
--- a/pyannote/audio/models/embedding/xvector.py
+++ b/pyannote/audio/models/embedding/xvector.py
@@ -20,6 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
+from functools import lru_cache
from typing import Optional
import torch
@@ -31,17 +32,21 @@
from pyannote.audio.models.blocks.pooling import StatsPool
from pyannote.audio.models.blocks.sincnet import SincNet
from pyannote.audio.utils.params import merge_dict
+from pyannote.audio.utils.receptive_field import (
+ multi_conv_num_frames,
+ multi_conv_receptive_field_center,
+ multi_conv_receptive_field_size,
+)
class XVectorMFCC(Model):
-
MFCC_DEFAULTS = {"n_mfcc": 40, "dct_type": 2, "norm": "ortho", "log_mels": False}
def __init__(
self,
sample_rate: int = 16000,
num_channels: int = 1,
- mfcc: dict = None,
+ mfcc: Optional[dict] = None,
dimension: int = 512,
task: Optional[Task] = None,
):
@@ -57,11 +62,13 @@ def __init__(
self.tdnns = nn.ModuleList()
in_channel = self.hparams.mfcc["n_mfcc"]
out_channels = [512, 512, 512, 512, 1500]
- kernel_sizes = [5, 3, 3, 1, 1]
- dilations = [1, 2, 3, 1, 1]
+ self.kernel_size = [5, 3, 3, 1, 1]
+ self.dilation = [1, 2, 3, 1, 1]
+ self.padding = [0, 0, 0, 0, 0]
+ self.stride = [1, 1, 1, 1, 1]
for out_channel, kernel_size, dilation in zip(
- out_channels, kernel_sizes, dilations
+ out_channels, self.kernel_size, self.dilation
):
self.tdnns.extend(
[
@@ -81,8 +88,102 @@ def __init__(
self.embedding = nn.Linear(in_channel * 2, self.hparams.dimension)
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+ return self.hparams.dimension
+
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ num_frames = 1 + num_samples // hop_length
+ else:
+ num_frames = 1 + (num_samples - n_fft) // hop_length
+
+ return multi_conv_num_frames(
+ num_frames,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ receptive_field_size = multi_conv_receptive_field_size(
+ num_frames,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ return n_fft + (receptive_field_size - 1) * hop_length
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ receptive_field_center = multi_conv_receptive_field_center(
+ frame,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ return receptive_field_center * hop_length
+ else:
+ return receptive_field_center * hop_length + n_fft // 2
+
def forward(
- self, waveforms: torch.Tensor, weights: torch.Tensor = None
+ self, waveforms: torch.Tensor, weights: Optional[torch.Tensor] = None
) -> torch.Tensor:
"""
@@ -102,14 +203,13 @@ def forward(
class XVectorSincNet(Model):
-
SINCNET_DEFAULTS = {"stride": 10}
def __init__(
self,
sample_rate: int = 16000,
num_channels: int = 1,
- sincnet: dict = None,
+ sincnet: Optional[dict] = None,
dimension: int = 512,
task: Optional[Task] = None,
):
@@ -125,11 +225,13 @@ def __init__(
self.tdnns = nn.ModuleList()
out_channels = [512, 512, 512, 512, 1500]
- kernel_sizes = [5, 3, 3, 1, 1]
- dilations = [1, 2, 3, 1, 1]
+ self.kernel_size = [5, 3, 3, 1, 1]
+ self.dilation = [1, 2, 3, 1, 1]
+ self.padding = [0, 0, 0, 0, 0]
+ self.stride = [1, 1, 1, 1, 1]
for out_channel, kernel_size, dilation in zip(
- out_channels, kernel_sizes, dilations
+ out_channels, self.kernel_size, self.dilation
):
self.tdnns.extend(
[
@@ -149,8 +251,86 @@ def __init__(
self.embedding = nn.Linear(in_channel * 2, self.hparams.dimension)
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+ return self.hparams.dimension
+
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+
+ num_frames = self.sincnet.num_frames(num_samples)
+
+ return multi_conv_num_frames(
+ num_frames,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ receptive_field_size = multi_conv_receptive_field_size(
+ num_frames,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ return self.sincnet.receptive_field_size(num_frames=receptive_field_size)
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ receptive_field_center = multi_conv_receptive_field_center(
+ frame,
+ kernel_size=self.kernel_size,
+ stride=self.stride,
+ padding=self.padding,
+ dilation=self.dilation,
+ )
+
+ return self.sincnet.receptive_field_center(frame=receptive_field_center)
+
def forward(
- self, waveforms: torch.Tensor, weights: torch.Tensor = None
+ self, waveforms: torch.Tensor, weights: Optional[torch.Tensor] = None
) -> torch.Tensor:
"""
diff --git a/pyannote/audio/models/segmentation/PyanNet.py b/pyannote/audio/models/segmentation/PyanNet.py
index 5af3734b1..3481b40c6 100644
--- a/pyannote/audio/models/segmentation/PyanNet.py
+++ b/pyannote/audio/models/segmentation/PyanNet.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-
+from functools import lru_cache
from typing import Optional
import torch
@@ -73,9 +73,9 @@ class PyanNet(Model):
def __init__(
self,
- sincnet: dict = None,
- lstm: dict = None,
- linear: dict = None,
+ sincnet: Optional[dict] = None,
+ lstm: Optional[dict] = None,
+ linear: Optional[dict] = None,
sample_rate: int = 16000,
num_channels: int = 1,
task: Optional[Task] = None,
@@ -138,6 +138,17 @@ def __init__(
]
)
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+ if isinstance(self.specifications, tuple):
+ raise ValueError("PyanNet does not support multi-tasking.")
+
+ if self.specifications.powerset:
+ return self.specifications.num_powerset_classes
+ else:
+ return len(self.specifications.classes)
+
def build(self):
if self.hparams.linear["num_layers"] > 0:
in_features = self.hparams.linear["hidden_size"]
@@ -146,16 +157,56 @@ def build(self):
2 if self.hparams.lstm["bidirectional"] else 1
)
- if isinstance(self.specifications, tuple):
- raise ValueError("PyanNet does not support multi-tasking.")
+ self.classifier = nn.Linear(in_features, self.dimension)
+ self.activation = self.default_activation()
- if self.specifications.powerset:
- out_features = self.specifications.num_powerset_classes
- else:
- out_features = len(self.specifications.classes)
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames for a given number of input samples
- self.classifier = nn.Linear(in_features, out_features)
- self.activation = self.default_activation()
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames
+ """
+
+ return self.sincnet.num_frames(num_samples)
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+ return self.sincnet.receptive_field_size(num_frames=num_frames)
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ return self.sincnet.receptive_field_center(frame=frame)
def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
"""Pass forward
diff --git a/pyannote/audio/models/segmentation/SSeRiouSS.py b/pyannote/audio/models/segmentation/SSeRiouSS.py
index 7cd545177..b96464ab3 100644
--- a/pyannote/audio/models/segmentation/SSeRiouSS.py
+++ b/pyannote/audio/models/segmentation/SSeRiouSS.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-
+from functools import lru_cache
from typing import Optional, Union
import torch
@@ -32,6 +32,11 @@
from pyannote.audio.core.model import Model
from pyannote.audio.core.task import Task
from pyannote.audio.utils.params import merge_dict
+from pyannote.audio.utils.receptive_field import (
+ conv1d_num_frames,
+ conv1d_receptive_field_center,
+ conv1d_receptive_field_size,
+)
class SSeRiouSS(Model):
@@ -77,8 +82,8 @@ def __init__(
self,
wav2vec: Union[dict, str] = None,
wav2vec_layer: int = -1,
- lstm: dict = None,
- linear: dict = None,
+ lstm: Optional[dict] = None,
+ linear: Optional[dict] = None,
sample_rate: int = 16000,
num_channels: int = 1,
task: Optional[Task] = None,
@@ -144,9 +149,12 @@ def __init__(
self.lstm = nn.ModuleList(
[
nn.LSTM(
- wav2vec_dim
- if i == 0
- else lstm["hidden_size"] * (2 if lstm["bidirectional"] else 1),
+ (
+ wav2vec_dim
+ if i == 0
+ else lstm["hidden_size"]
+ * (2 if lstm["bidirectional"] else 1)
+ ),
**one_layer_lstm,
)
for i in range(num_layers)
@@ -172,6 +180,17 @@ def __init__(
]
)
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
+ if isinstance(self.specifications, tuple):
+ raise ValueError("SSeRiouSS does not support multi-tasking.")
+
+ if self.specifications.powerset:
+ return self.specifications.num_powerset_classes
+ else:
+ return len(self.specifications.classes)
+
def build(self):
if self.hparams.linear["num_layers"] > 0:
in_features = self.hparams.linear["hidden_size"]
@@ -180,16 +199,84 @@ def build(self):
2 if self.hparams.lstm["bidirectional"] else 1
)
- if isinstance(self.specifications, tuple):
- raise ValueError("SSeRiouSS model does not support multi-tasking.")
+ self.classifier = nn.Linear(in_features, self.dimension)
+ self.activation = self.default_activation()
- if self.specifications.powerset:
- out_features = self.specifications.num_powerset_classes
- else:
- out_features = len(self.specifications.classes)
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames
- self.classifier = nn.Linear(in_features, out_features)
- self.activation = self.default_activation()
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples.
+
+ Returns
+ -------
+ num_frames : int
+ Number of output frames.
+ """
+
+ num_frames = num_samples
+ for conv_layer in self.wav2vec.feature_extractor.conv_layers:
+ num_frames = conv1d_num_frames(
+ num_frames,
+ kernel_size=conv_layer.kernel_size,
+ stride=conv_layer.stride,
+ padding=conv_layer.conv.padding[0],
+ dilation=conv_layer.conv.dilation[0],
+ )
+
+ return num_frames
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ receptive_field_size = num_frames
+ for conv_layer in reversed(self.wav2vec.feature_extractor.conv_layers):
+ receptive_field_size = conv1d_receptive_field_size(
+ num_frames=receptive_field_size,
+ kernel_size=conv_layer.kernel_size,
+ stride=conv_layer.stride,
+ padding=conv_layer.conv.padding[0],
+ dilation=conv_layer.conv.dilation[0],
+ )
+ return receptive_field_size
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+ receptive_field_center = frame
+ for conv_layer in reversed(self.wav2vec.feature_extractor.conv_layers):
+ receptive_field_center = conv1d_receptive_field_center(
+ receptive_field_center,
+ kernel_size=conv_layer.kernel_size,
+ stride=conv_layer.stride,
+ padding=conv_layer.conv.padding[0],
+ dilation=conv_layer.conv.dilation[0],
+ )
+ return receptive_field_center
def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
"""Pass forward
diff --git a/pyannote/audio/models/segmentation/debug.py b/pyannote/audio/models/segmentation/debug.py
index 89512320c..ccac612a9 100644
--- a/pyannote/audio/models/segmentation/debug.py
+++ b/pyannote/audio/models/segmentation/debug.py
@@ -21,6 +21,7 @@
# SOFTWARE.
+from functools import lru_cache
from typing import Optional
import torch
@@ -57,18 +58,91 @@ def __init__(
bidirectional=True,
)
- def build(self):
- # define task-dependent layers
+ @lru_cache
+ def num_frames(self, num_samples: int) -> int:
+ """Compute number of output frames for a given number of input samples
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of input samples
+ Returns
+ -------
+ num_frames : int
+ Number of output frames
+
+ Source
+ ------
+ https://pytorch.org/docs/stable/generated/torch.stft.html#torch.stft
+
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ return 1 + num_samples // hop_length
+ else:
+ return 1 + (num_samples - n_fft) // hop_length
+
+ def receptive_field_size(self, num_frames: int = 1) -> int:
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+
+ Returns
+ -------
+ receptive_field_size : int
+ Receptive field size.
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ return n_fft + (num_frames - 1) * hop_length
+
+ def receptive_field_center(self, frame: int = 0) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int, optional
+ Frame index
+
+ Returns
+ -------
+ receptive_field_center : int
+ Index of receptive field center.
+ """
+
+ hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length
+ n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft
+ center = self.mfcc.MelSpectrogram.spectrogram.center
+
+ if center:
+ return frame * hop_length
+ else:
+ return frame * hop_length + n_fft // 2
+
+ @property
+ def dimension(self) -> int:
+ """Dimension of output"""
if isinstance(self.specifications, tuple):
raise ValueError("SimpleSegmentationModel does not support multi-tasking.")
if self.specifications.powerset:
- out_features = self.specifications.num_powerset_classes
+ return self.specifications.num_powerset_classes
else:
- out_features = len(self.specifications.classes)
+ return len(self.specifications.classes)
+
+ def build(self):
+ # define task-dependent layers
- self.classifier = nn.Linear(32 * 2, out_features)
+ self.classifier = nn.Linear(32 * 2, self.dimension)
self.activation = self.default_activation()
def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
diff --git a/pyannote/audio/pipelines/clustering.py b/pyannote/audio/pipelines/clustering.py
index 80098ea24..cd4b38935 100644
--- a/pyannote/audio/pipelines/clustering.py
+++ b/pyannote/audio/pipelines/clustering.py
@@ -25,7 +25,7 @@
import random
from enum import Enum
-from typing import Tuple
+from typing import Optional, Tuple
import numpy as np
from einops import rearrange
@@ -56,9 +56,9 @@ def __init__(
def set_num_clusters(
self,
num_embeddings: int,
- num_clusters: int = None,
- min_clusters: int = None,
- max_clusters: int = None,
+ num_clusters: Optional[int] = None,
+ min_clusters: Optional[int] = None,
+ max_clusters: Optional[int] = None,
):
min_clusters = num_clusters or min_clusters or 1
min_clusters = max(1, min(num_embeddings, min_clusters))
@@ -79,7 +79,7 @@ def set_num_clusters(
def filter_embeddings(
self,
embeddings: np.ndarray,
- segmentations: SlidingWindowFeature = None,
+ segmentations: Optional[SlidingWindowFeature] = None,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Filter NaN embeddings and downsample embeddings
@@ -205,10 +205,10 @@ def assign_embeddings(
def __call__(
self,
embeddings: np.ndarray,
- segmentations: SlidingWindowFeature = None,
- num_clusters: int = None,
- min_clusters: int = None,
- max_clusters: int = None,
+ segmentations: Optional[SlidingWindowFeature] = None,
+ num_clusters: Optional[int] = None,
+ min_clusters: Optional[int] = None,
+ max_clusters: Optional[int] = None,
**kwargs,
) -> np.ndarray:
"""Apply clustering
@@ -323,7 +323,7 @@ def cluster(
embeddings: np.ndarray,
min_clusters: int,
max_clusters: int,
- num_clusters: int = None,
+ num_clusters: Optional[int] = None,
):
"""
@@ -476,10 +476,10 @@ class OracleClustering(BaseClustering):
def __call__(
self,
- embeddings: np.ndarray = None,
- segmentations: SlidingWindowFeature = None,
- file: AudioFile = None,
- frames: SlidingWindow = None,
+ embeddings: Optional[np.ndarray] = None,
+ segmentations: Optional[SlidingWindowFeature] = None,
+ file: Optional[AudioFile] = None,
+ frames: Optional[SlidingWindow] = None,
**kwargs,
) -> np.ndarray:
"""Apply oracle clustering
diff --git a/pyannote/audio/pipelines/multilabel.py b/pyannote/audio/pipelines/multilabel.py
index 18693f14c..b35ebee7c 100644
--- a/pyannote/audio/pipelines/multilabel.py
+++ b/pyannote/audio/pipelines/multilabel.py
@@ -75,7 +75,7 @@ class MultiLabelSegmentation(Pipeline):
def __init__(
self,
- segmentation: PipelineModel = None,
+ segmentation: Optional[PipelineModel] = None,
fscore: bool = False,
share_min_duration: bool = False,
use_auth_token: Union[Text, None] = None,
diff --git a/pyannote/audio/pipelines/overlapped_speech_detection.py b/pyannote/audio/pipelines/overlapped_speech_detection.py
index 1c9790feb..1429e4299 100644
--- a/pyannote/audio/pipelines/overlapped_speech_detection.py
+++ b/pyannote/audio/pipelines/overlapped_speech_detection.py
@@ -128,7 +128,7 @@ def __init__(
# load model
model = get_model(segmentation, use_auth_token=use_auth_token)
- if model.example_output.dimension > 1:
+ if model.dimension > 1:
inference_kwargs["pre_aggregation_hook"] = lambda scores: np.partition(
scores, -2, axis=-1
)[:, :, -2, np.newaxis]
@@ -255,7 +255,7 @@ def compute_components(
_self,
reference: Annotation,
hypothesis: Annotation,
- uem: Timeline = None,
+ uem: Optional[Timeline] = None,
**kwargs,
) -> dict:
return super().compute_components(
diff --git a/pyannote/audio/pipelines/resegmentation.py b/pyannote/audio/pipelines/resegmentation.py
index d01e5d65f..85492f774 100644
--- a/pyannote/audio/pipelines/resegmentation.py
+++ b/pyannote/audio/pipelines/resegmentation.py
@@ -86,7 +86,7 @@ def __init__(
self,
segmentation: PipelineModel = "pyannote/segmentation",
diarization: Text = "diarization",
- der_variant: dict = None,
+ der_variant: Optional[dict] = None,
use_auth_token: Union[Text, None] = None,
):
super().__init__()
@@ -96,7 +96,6 @@ def __init__(
model: Model = get_model(segmentation, use_auth_token=use_auth_token)
self._segmentation = Inference(model)
- self._frames = self._segmentation.model.example_output.frames
self._audio = model.audio
@@ -137,7 +136,7 @@ def classes(self):
def apply(
self,
file: AudioFile,
- diarization: Annotation = None,
+ diarization: Optional[Annotation] = None,
hook: Optional[Callable] = None,
) -> Annotation:
"""Apply speaker diarization
@@ -193,8 +192,8 @@ def apply(
# estimate frame-level number of instantaneous speakers
count = self.speaker_count(
binarized_segmentations,
+ self._segmentation.model.receptive_field,
warm_up=(self.warm_up, self.warm_up),
- frames=self._frames,
)
hook("speaker_counting", count)
@@ -205,7 +204,7 @@ def apply(
support=Segment(
0.0, self._audio.get_duration(file) + self._segmentation.step
),
- resolution=self._frames,
+ resolution=self._segmentation.model.receptive_field,
)
hook("@resegmentation/original", diarization)
diff --git a/pyannote/audio/pipelines/speaker_diarization.py b/pyannote/audio/pipelines/speaker_diarization.py
index 354f6be7e..737cd1cb2 100644
--- a/pyannote/audio/pipelines/speaker_diarization.py
+++ b/pyannote/audio/pipelines/speaker_diarization.py
@@ -32,7 +32,7 @@
import numpy as np
import torch
from einops import rearrange
-from pyannote.core import Annotation, SlidingWindow, SlidingWindowFeature
+from pyannote.core import Annotation, SlidingWindowFeature
from pyannote.metrics.diarization import GreedyDiarizationErrorRate
from pyannote.pipeline.parameter import ParamDict, Uniform
@@ -121,7 +121,7 @@ def __init__(
clustering: str = "AgglomerativeClustering",
embedding_batch_size: int = 1,
segmentation_batch_size: int = 1,
- der_variant: dict = None,
+ der_variant: Optional[dict] = None,
use_auth_token: Union[Text, None] = None,
):
super().__init__()
@@ -147,7 +147,6 @@ def __init__(
skip_aggregation=True,
batch_size=segmentation_batch_size,
)
- self._frames: SlidingWindow = self._segmentation.model.example_output.frames
if self._segmentation.model.specifications.powerset:
self.segmentation = ParamDict(
@@ -428,9 +427,9 @@ def reconstruct(
def apply(
self,
file: AudioFile,
- num_speakers: int = None,
- min_speakers: int = None,
- max_speakers: int = None,
+ num_speakers: Optional[int] = None,
+ min_speakers: Optional[int] = None,
+ max_speakers: Optional[int] = None,
return_embeddings: bool = False,
hook: Optional[Callable] = None,
) -> Annotation:
@@ -493,7 +492,7 @@ def apply(
# estimate frame-level number of instantaneous speakers
count = self.speaker_count(
binarized_segmentations,
- frames=self._frames,
+ self._segmentation.model.receptive_field,
warm_up=(0.0, 0.0),
)
hook("speaker_counting", count)
@@ -527,7 +526,7 @@ def apply(
min_clusters=min_speakers,
max_clusters=max_speakers,
file=file, # <== for oracle clustering
- frames=self._frames, # <== for oracle clustering
+ frames=self._segmentation.model.receptive_field, # <== for oracle clustering
)
# hard_clusters: (num_chunks, num_speakers)
# centroids: (num_speakers, dimension)
@@ -538,15 +537,20 @@ def apply(
# detected number of speakers can still be out of bounds
# (specifically, lower than `min_speakers`), since there could be too few embeddings
# to make enough clusters with a given minimum cluster size.
- if num_different_speakers < min_speakers or num_different_speakers > max_speakers:
- warnings.warn(textwrap.dedent(
- f"""
+ if (
+ num_different_speakers < min_speakers
+ or num_different_speakers > max_speakers
+ ):
+ warnings.warn(
+ textwrap.dedent(
+ f"""
The detected number of speakers ({num_different_speakers}) is outside
the given bounds [{min_speakers}, {max_speakers}]. This can happen if the
given audio file is too short to contain {min_speakers} or more speakers.
Try to lower the desired minimal number of speakers.
"""
- ))
+ )
+ )
# during counting, we could possibly overcount the number of instantaneous
# speakers due to segmentation errors, so we cap the maximum instantaneous number
@@ -618,7 +622,9 @@ def apply(
# of clusters obtained from `clustering`. In this case, we append zero embeddings
# for extra speakers
if len(diarization.labels()) > centroids.shape[0]:
- centroids = np.pad(centroids, ((0, len(diarization.labels()) - centroids.shape[0]), (0, 0)))
+ centroids = np.pad(
+ centroids, ((0, len(diarization.labels()) - centroids.shape[0]), (0, 0))
+ )
# re-order centroids so that they match
# the order given by diarization.labels()
diff --git a/pyannote/audio/pipelines/speaker_verification.py b/pyannote/audio/pipelines/speaker_verification.py
index c870ea622..8c4139b6f 100644
--- a/pyannote/audio/pipelines/speaker_verification.py
+++ b/pyannote/audio/pipelines/speaker_verification.py
@@ -23,12 +23,11 @@
import warnings
from functools import cached_property
from pathlib import Path
-from typing import Text, Union
+from typing import Optional, Text, Union
import numpy as np
import torch
import torch.nn.functional as F
-import torchaudio
import torchaudio.compliance.kaldi as kaldi
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import RepositoryNotFoundError
@@ -40,7 +39,6 @@
from pyannote.audio.core.model import CACHE_DIR
from pyannote.audio.pipelines.utils import PipelineModel, get_model
-backend = torchaudio.get_audio_backend()
try:
from speechbrain.pretrained import (
EncoderClassifier as SpeechBrain_EncoderClassifier,
@@ -49,8 +47,6 @@
SPEECHBRAIN_IS_AVAILABLE = True
except ImportError:
SPEECHBRAIN_IS_AVAILABLE = False
-finally:
- torchaudio.set_audio_backend(backend)
try:
from nemo.collections.asr.models import (
@@ -73,7 +69,7 @@ class NeMoPretrainedSpeakerEmbedding(BaseInference):
def __init__(
self,
embedding: Text = "nvidia/speakerverification_en_titanet_large",
- device: torch.device = None,
+ device: Optional[torch.device] = None,
):
if not NEMO_IS_AVAILABLE:
raise ImportError(
@@ -139,7 +135,7 @@ def min_num_samples(self) -> int:
return upper
def __call__(
- self, waveforms: torch.Tensor, masks: torch.Tensor = None
+ self, waveforms: torch.Tensor, masks: Optional[torch.Tensor] = None
) -> np.ndarray:
"""
@@ -238,7 +234,7 @@ class SpeechBrainPretrainedSpeakerEmbedding(BaseInference):
def __init__(
self,
embedding: Text = "speechbrain/spkrec-ecapa-voxceleb",
- device: torch.device = None,
+ device: Optional[torch.device] = None,
use_auth_token: Union[Text, None] = None,
):
if not SPEECHBRAIN_IS_AVAILABLE:
@@ -314,7 +310,7 @@ def min_num_samples(self) -> int:
return upper
def __call__(
- self, waveforms: torch.Tensor, masks: torch.Tensor = None
+ self, waveforms: torch.Tensor, masks: Optional[torch.Tensor] = None
) -> np.ndarray:
"""
@@ -414,7 +410,7 @@ class ONNXWeSpeakerPretrainedSpeakerEmbedding(BaseInference):
def __init__(
self,
embedding: Text = "hbredin/wespeaker-voxceleb-resnet34-LM",
- device: torch.device = None,
+ device: Optional[torch.device] = None,
):
if not ONNX_IS_AVAILABLE:
raise ImportError(
@@ -560,7 +556,7 @@ def compute_fbank(
return features - torch.mean(features, dim=1, keepdim=True)
def __call__(
- self, waveforms: torch.Tensor, masks: torch.Tensor = None
+ self, waveforms: torch.Tensor, masks: Optional[torch.Tensor] = None
) -> np.ndarray:
"""
@@ -645,7 +641,7 @@ class PyannoteAudioPretrainedSpeakerEmbedding(BaseInference):
def __init__(
self,
embedding: PipelineModel = "pyannote/embedding",
- device: torch.device = None,
+ device: Optional[torch.device] = None,
use_auth_token: Union[Text, None] = None,
):
super().__init__()
@@ -672,7 +668,7 @@ def sample_rate(self) -> int:
@cached_property
def dimension(self) -> int:
- return self.model_.example_output.dimension
+ return self.model_.dimension
@cached_property
def metric(self) -> str:
@@ -695,7 +691,7 @@ def min_num_samples(self) -> int:
return upper
def __call__(
- self, waveforms: torch.Tensor, masks: torch.Tensor = None
+ self, waveforms: torch.Tensor, masks: Optional[torch.Tensor] = None
) -> np.ndarray:
with torch.inference_mode():
if masks is None:
@@ -711,7 +707,7 @@ def __call__(
def PretrainedSpeakerEmbedding(
embedding: PipelineModel,
- device: torch.device = None,
+ device: Optional[torch.device] = None,
use_auth_token: Union[Text, None] = None,
):
"""Pretrained speaker embedding
@@ -801,7 +797,7 @@ class SpeakerEmbedding(Pipeline):
def __init__(
self,
embedding: PipelineModel = "pyannote/embedding",
- segmentation: PipelineModel = None,
+ segmentation: Optional[PipelineModel] = None,
use_auth_token: Union[Text, None] = None,
):
super().__init__()
@@ -848,7 +844,7 @@ def main(
protocol: str = "VoxCeleb.SpeakerVerification.VoxCeleb1",
subset: str = "test",
embedding: str = "pyannote/embedding",
- segmentation: str = None,
+ segmentation: Optional[str] = None,
):
import typer
from pyannote.database import FileFinder, get_protocol
diff --git a/pyannote/audio/pipelines/utils/diarization.py b/pyannote/audio/pipelines/utils/diarization.py
index 4a35f7049..5a0f8f675 100644
--- a/pyannote/audio/pipelines/utils/diarization.py
+++ b/pyannote/audio/pipelines/utils/diarization.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-from typing import Dict, Mapping, Tuple, Union
+from typing import Dict, Mapping, Optional, Tuple, Union
import numpy as np
from pyannote.core import Annotation, SlidingWindow, SlidingWindowFeature
@@ -28,7 +28,7 @@
from pyannote.metrics.diarization import DiarizationErrorRate
from pyannote.audio.core.inference import Inference
-from pyannote.audio.utils.signal import Binarize, binarize
+from pyannote.audio.utils.signal import Binarize
# TODO: move to dedicated module
@@ -37,9 +37,9 @@ class SpeakerDiarizationMixin:
@staticmethod
def set_num_speakers(
- num_speakers: int = None,
- min_speakers: int = None,
- max_speakers: int = None,
+ num_speakers: Optional[int] = None,
+ min_speakers: Optional[int] = None,
+ max_speakers: Optional[int] = None,
):
"""Validate number of speakers
@@ -121,8 +121,8 @@ def optimal_mapping(
@staticmethod
def speaker_count(
binarized_segmentations: SlidingWindowFeature,
+ frames: SlidingWindow,
warm_up: Tuple[float, float] = (0.1, 0.1),
- frames: SlidingWindow = None,
) -> SlidingWindowFeature:
"""Estimate frame-level number of instantaneous speakers
@@ -133,7 +133,7 @@ def speaker_count(
warm_up : (float, float) tuple, optional
Left/right warm up ratio of chunk duration.
Defaults to (0.1, 0.1), i.e. 10% on both sides.
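- frames : SlidingWindow, optional
- Frames resolution. Defaults to estimate it automatically based on
- `segmentations` shape and chunk size. Providing the exact frame
- resolution (when known) leads to better temporal precision.
+ frames : SlidingWindow
+ Frames resolution (e.g. the segmentation model
+ `receptive_field`). Providing the exact frame
+ resolution leads to better temporal precision.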
@@ -147,7 +147,7 @@ def speaker_count(
trimmed = Inference.trim(binarized_segmentations, warm_up=warm_up)
count = Inference.aggregate(
np.sum(trimmed, axis=-1, keepdims=True),
- frames=frames,
+ frames,
hamming=False,
missing=0.0,
skip_average=False,
@@ -212,7 +212,7 @@ def to_diarization(
# TODO: investigate alternative aggregation
activations = Inference.aggregate(
segmentations,
- frames=count.sliding_window,
+ count.sliding_window,
hamming=False,
missing=0.0,
skip_average=True,
diff --git a/pyannote/audio/pipelines/utils/getter.py b/pyannote/audio/pipelines/utils/getter.py
index 4c589ad05..51040d1c4 100644
--- a/pyannote/audio/pipelines/utils/getter.py
+++ b/pyannote/audio/pipelines/utils/getter.py
@@ -21,7 +21,7 @@
# SOFTWARE.
import itertools
-from typing import Mapping, Text, Union
+from typing import Mapping, Optional, Text, Union
import torch
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
@@ -171,7 +171,7 @@ def get_augmentation(augmentation: PipelineAugmentation) -> BaseWaveformTransfor
)
-def get_devices(needs: int = None):
+def get_devices(needs: Optional[int] = None):
"""Get devices that can be used by the pipeline
Parameters
diff --git a/pyannote/audio/pipelines/utils/hook.py b/pyannote/audio/pipelines/utils/hook.py
index 2a675d1c9..db6972e2e 100644
--- a/pyannote/audio/pipelines/utils/hook.py
+++ b/pyannote/audio/pipelines/utils/hook.py
@@ -24,6 +24,7 @@
from copy import deepcopy
from typing import Any, Mapping, Optional, Text
+import torch
from rich.progress import (
BarColumn,
Progress,
@@ -75,6 +76,9 @@ def __call__(
):
return
+ if isinstance(step_artifact, torch.Tensor):
+ step_artifact = step_artifact.numpy(force=True)
+
file.setdefault(self.file_key, dict())[step_name] = deepcopy(step_artifact)
diff --git a/pyannote/audio/pipelines/utils/oracle.py b/pyannote/audio/pipelines/utils/oracle.py
index 44b4ded61..24401f752 100644
--- a/pyannote/audio/pipelines/utils/oracle.py
+++ b/pyannote/audio/pipelines/utils/oracle.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-from typing import Union
+from typing import Optional, Union
import numpy as np
from pyannote.core import Annotation, Segment, SlidingWindow, SlidingWindowFeature
@@ -32,14 +32,14 @@ def oracle_segmentation(
file: AudioFile,
window: SlidingWindow,
frames: Union[SlidingWindow, float],
- num_speakers: int = None,
+ num_speakers: Optional[int] = None,
) -> SlidingWindowFeature:
"""Oracle speaker segmentation
Simulates inference based on an (imaginary) oracle segmentation model:
>>> oracle = Model.from_pretrained("oracle")
- >>> assert frames == oracle.example_output.frames
+ >>> assert frames == oracle.receptive_field
>>> inference = Inference(oracle, duration=window.duration, step=window.step, skip_aggregation=True)
>>> oracle_segmentation = inference(file)
diff --git a/pyannote/audio/pipelines/voice_activity_detection.py b/pyannote/audio/pipelines/voice_activity_detection.py
index f67489b64..39e529d89 100644
--- a/pyannote/audio/pipelines/voice_activity_detection.py
+++ b/pyannote/audio/pipelines/voice_activity_detection.py
@@ -284,7 +284,7 @@ class AdaptiveVoiceActivityDetection(Pipeline):
def __init__(
self,
segmentation: PipelineInference = "hbredin/VoiceActivityDetection-PyanNet-DIHARD",
- augmentation: PipelineAugmentation = None,
+ augmentation: Optional[PipelineAugmentation] = None,
fscore: bool = False,
):
super().__init__()
diff --git a/pyannote/audio/sample/__init__.py b/pyannote/audio/sample/__init__.py
new file mode 100644
index 000000000..85399af66
--- /dev/null
+++ b/pyannote/audio/sample/__init__.py
@@ -0,0 +1,56 @@
+# MIT License
+#
+# Copyright (c) 2024- CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+from pathlib import Path
+
+from pyannote.core import Annotation, Segment, Timeline
+from pyannote.database.util import load_rttm
+
+from pyannote.audio.core.io import Audio, AudioFile
+
+
+def _sample() -> AudioFile:
+ sample_wav = Path(__file__).parent / "sample.wav"
+ uri = "sample"
+
+ audio = Audio()
+ waveform, sample_rate = audio(sample_wav)
+
+ sample_rttm = Path(__file__).parent / "sample.rttm"
+
+ annotation: Annotation = load_rttm(sample_rttm)[uri]
+ duration = audio.get_duration(sample_wav)
+
+ annotated: Timeline = Timeline([Segment(0.0, duration)], uri=uri)
+
+ return {
+ "audio": sample_wav,
+ "uri": uri,
+ "waveform": waveform,
+ "sample_rate": sample_rate,
+ "annotation": annotation,
+ "annotated": annotated,
+ }
+
+
+SAMPLE_FILE = _sample()
diff --git a/pyannote/audio/sample/sample.rttm b/pyannote/audio/sample/sample.rttm
new file mode 100644
index 000000000..7c6b378fe
--- /dev/null
+++ b/pyannote/audio/sample/sample.rttm
@@ -0,0 +1,10 @@
+SPEAKER sample 1 6.690 0.430 speaker90
+SPEAKER sample 1 7.550 0.800 speaker91
+SPEAKER sample 1 8.320 1.700 speaker90
+SPEAKER sample 1 9.920 1.110 speaker91
+SPEAKER sample 1 10.570 4.130 speaker90
+SPEAKER sample 1 14.490 3.430 speaker91
+SPEAKER sample 1 18.050 3.440 speaker90
+SPEAKER sample 1 18.150 0.440 speaker91
+SPEAKER sample 1 21.780 6.720 speaker91
+SPEAKER sample 1 27.850 2.150 speaker90
diff --git a/pyannote/audio/sample/sample.wav b/pyannote/audio/sample/sample.wav
new file mode 100644
index 000000000..150d49a69
Binary files /dev/null and b/pyannote/audio/sample/sample.wav differ
diff --git a/pyannote/audio/tasks/embedding/arcface.py b/pyannote/audio/tasks/embedding/arcface.py
index bb2cb1f6c..cb6401e2b 100644
--- a/pyannote/audio/tasks/embedding/arcface.py
+++ b/pyannote/audio/tasks/embedding/arcface.py
@@ -23,7 +23,7 @@
from __future__ import annotations
-from typing import Dict, Sequence, Union
+from typing import Dict, Optional, Sequence, Union
import pytorch_metric_learning.losses
from pyannote.database import Protocol
@@ -82,15 +82,15 @@ class SupervisedRepresentationLearningWithArcFace(
def __init__(
self,
protocol: Protocol,
- min_duration: float = None,
+ min_duration: Optional[float] = None,
duration: float = 2.0,
num_classes_per_batch: int = 32,
num_chunks_per_class: int = 1,
margin: float = 28.6,
scale: float = 64.0,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
):
diff --git a/pyannote/audio/tasks/embedding/mixins.py b/pyannote/audio/tasks/embedding/mixins.py
index da164f04e..9b404f9cf 100644
--- a/pyannote/audio/tasks/embedding/mixins.py
+++ b/pyannote/audio/tasks/embedding/mixins.py
@@ -75,7 +75,7 @@ def batch_size(self) -> int:
def batch_size(self, batch_size: int):
self.batch_size_ = batch_size
- def setup(self):
+ def setup(self, stage=None):
# loop over the training set, remove annotated regions shorter than
# chunk duration, and keep track of the reference annotations, per class.
@@ -119,12 +119,6 @@ def setup(self):
classes=sorted(self._train),
)
- if not self.has_validation:
- return
-
- if isinstance(self.protocol, SpeakerVerificationProtocol):
- self._validation = list(self.protocol.development_trial())
-
def default_metric(
self,
) -> Union[Metric, Sequence[Metric], Dict[str, Metric]]:
@@ -145,7 +139,7 @@ def train__iter__(self):
"""
# create worker-specific random number generator
- rng = create_rng_for_worker(self.model.current_epoch)
+ rng = create_rng_for_worker(self.model)
classes = list(self.specifications.classes)
@@ -250,9 +244,13 @@ def training_step(self, batch, batch_idx: int):
return {"loss": loss}
+ def prepare_validation(self, prepared_dict: Dict):
+ if isinstance(self.protocol, SpeakerVerificationProtocol):
+ prepared_dict["validation"] = list(self.protocol.development_trial())
+
def val__getitem__(self, idx):
if isinstance(self.protocol, SpeakerVerificationProtocol):
- trial = self._validation[idx]
+ trial = self.prepared_data["validation"][idx]
data = dict()
for idx in [1, 2]:
@@ -281,7 +279,7 @@ def val__getitem__(self, idx):
def val__len__(self):
if isinstance(self.protocol, SpeakerVerificationProtocol):
- return len(self._validation)
+ return len(self.prepared_data["validation"])
elif isinstance(self.protocol, SpeakerDiarizationProtocol):
return 0
diff --git a/pyannote/audio/tasks/segmentation/mixins.py b/pyannote/audio/tasks/segmentation/mixins.py
index 018e8db70..be30828f0 100644
--- a/pyannote/audio/tasks/segmentation/mixins.py
+++ b/pyannote/audio/tasks/segmentation/mixins.py
@@ -23,44 +23,40 @@
import itertools
import math
import random
-import warnings
-from collections import defaultdict
from typing import Dict, Sequence, Union
import matplotlib.pyplot as plt
import numpy as np
import torch
-from pyannote.database.protocol import SegmentationProtocol, SpeakerDiarizationProtocol
from pyannote.database.protocol.protocol import Scope, Subset
from pytorch_lightning.loggers import MLFlowLogger, TensorBoardLogger
from torch.utils.data._utils.collate import default_collate
-from torchaudio.backend.common import AudioMetaData
+from torchaudio import AudioMetaData
from torchmetrics import Metric
from torchmetrics.classification import BinaryAUROC, MulticlassAUROC, MultilabelAUROC
-from pyannote.audio.core.task import Problem
+from pyannote.audio.core.task import Problem, Task, get_dtype
from pyannote.audio.utils.random import create_rng_for_worker
Subsets = list(Subset.__args__)
Scopes = list(Scope.__args__)
-class SegmentationTaskMixin:
+class SegmentationTask(Task):
"""Methods common to most segmentation tasks"""
def get_file(self, file_id):
file = dict()
- file["audio"] = str(self.audios[file_id], encoding="utf-8")
+ file["audio"] = self.prepared_data["audio-path"][file_id]
- _audio_info = self.audio_infos[file_id]
- _encoding = self.audio_encodings[file_id]
+ _audio_info = self.prepared_data["audio-info"][file_id]
+ encoding = self.prepared_data["audio-encoding"][file_id]
sample_rate = _audio_info["sample_rate"]
num_frames = _audio_info["num_frames"]
num_channels = _audio_info["num_channels"]
bits_per_sample = _audio_info["bits_per_sample"]
- encoding = str(_encoding, encoding="utf-8")
file["torchaudio.info"] = AudioMetaData(
sample_rate=sample_rate,
num_frames=num_frames,
@@ -71,319 +67,6 @@ def get_file(self, file_id):
return file
- def setup(self):
- """Setup"""
-
- # duration of training chunks
- # TODO: handle variable duration case
- duration = getattr(self, "duration", 0.0)
-
- # list of possible values for each metadata key
- metadata_unique_values = defaultdict(list)
-
- metadata_unique_values["subset"] = Subsets
-
- if isinstance(self.protocol, SpeakerDiarizationProtocol):
- metadata_unique_values["scope"] = Scopes
-
- elif isinstance(self.protocol, SegmentationProtocol):
- classes = getattr(self, "classes", list())
-
- # make sure classes attribute exists (and set to None if it did not exist)
- self.classes = getattr(self, "classes", None)
- if self.classes is None:
- classes = list()
- # metadata_unique_values["classes"] = list(classes)
-
- audios = list() # list of path to audio files
- audio_infos = list()
- audio_encodings = list()
- metadata = list() # list of metadata
-
- annotated_duration = list() # total duration of annotated regions (per file)
- annotated_regions = list() # annotated regions
- annotations = list() # actual annotations
- annotated_classes = list() # list of annotated classes (per file)
- unique_labels = list()
-
- if self.has_validation:
- files_iter = itertools.chain(
- self.protocol.train(), self.protocol.development()
- )
- else:
- files_iter = self.protocol.train()
-
- for file_id, file in enumerate(files_iter):
- # gather metadata and update metadata_unique_values so that each metadatum
- # (e.g. source database or label) is represented by an integer.
- metadatum = dict()
-
- # keep track of source database and subset (train, development, or test)
- if file["database"] not in metadata_unique_values["database"]:
- metadata_unique_values["database"].append(file["database"])
- metadatum["database"] = metadata_unique_values["database"].index(
- file["database"]
- )
- metadatum["subset"] = Subsets.index(file["subset"])
-
- # keep track of speaker label scope (file, database, or global) for speaker diarization protocols
- if isinstance(self.protocol, SpeakerDiarizationProtocol):
- metadatum["scope"] = Scopes.index(file["scope"])
-
- # keep track of list of classes for regular segmentation protocols
- # Different files may be annotated using a different set of classes
- # (e.g. one database for speech/music/noise, and another one for male/female/child)
- if isinstance(self.protocol, SegmentationProtocol):
- if "classes" in file:
- local_classes = file["classes"]
- else:
- local_classes = file["annotation"].labels()
-
- # if task was not initialized with a fixed list of classes,
- # we build it as the union of all classes found in files
- if self.classes is None:
- for klass in local_classes:
- if klass not in classes:
- classes.append(klass)
- annotated_classes.append(
- [classes.index(klass) for klass in local_classes]
- )
-
- # if task was initialized with a fixed list of classes,
- # we make sure that all files use a subset of these classes
- # if they don't, we issue a warning and ignore the extra classes
- else:
- extra_classes = set(local_classes) - set(self.classes)
- if extra_classes:
- warnings.warn(
- f"Ignoring extra classes ({', '.join(extra_classes)}) found for file {file['uri']} ({file['database']}). "
- )
- annotated_classes.append(
- [
- self.classes.index(klass)
- for klass in set(local_classes) & set(self.classes)
- ]
- )
-
- remaining_metadata_keys = set(file) - set(
- [
- "uri",
- "database",
- "subset",
- "audio",
- "torchaudio.info",
- "scope",
- "classes",
- "annotation",
- "annotated",
- ]
- )
-
- # keep track of any other (integer or string) metadata provided by the protocol
- # (e.g. a "domain" key for domain-adversarial training)
- for key in remaining_metadata_keys:
- value = file[key]
-
- if isinstance(value, str):
- if value not in metadata_unique_values[key]:
- metadata_unique_values[key].append(value)
- metadatum[key] = metadata_unique_values[key].index(value)
-
- elif isinstance(value, int):
- metadatum[key] = value
-
- else:
- warnings.warn(
- f"Ignoring '{key}' metadata because of its type ({type(value)}). Only str and int are supported for now.",
- category=UserWarning,
- )
-
- metadata.append(metadatum)
-
- database_unique_labels = list()
-
- # reset list of file-scoped labels
- file_unique_labels = list()
-
- # path to audio file
- audios.append(str(file["audio"]))
-
- # audio info
- audio_info = file["torchaudio.info"]
- audio_infos.append(
- (
- audio_info.sample_rate, # sample rate
- audio_info.num_frames, # number of frames
- audio_info.num_channels, # number of channels
- audio_info.bits_per_sample, # bits per sample
- )
- )
- audio_encodings.append(audio_info.encoding) # encoding
-
- # annotated regions and duration
- _annotated_duration = 0.0
- for segment in file["annotated"]:
- # skip annotated regions that are shorter than training chunk duration
- if segment.duration < duration:
- continue
-
- # append annotated region
- annotated_region = (
- file_id,
- segment.duration,
- segment.start,
- segment.end,
- )
- annotated_regions.append(annotated_region)
-
- # increment annotated duration
- _annotated_duration += segment.duration
-
- # append annotated duration
- annotated_duration.append(_annotated_duration)
-
- # annotations
- for segment, _, label in file["annotation"].itertracks(yield_label=True):
- # "scope" is provided by speaker diarization protocols to indicate
- # whether speaker labels are local to the file ('file'), consistent across
- # all files in a database ('database'), or globally consistent ('global')
-
- if "scope" in file:
- # 0 = 'file'
- # 1 = 'database'
- # 2 = 'global'
- scope = Scopes.index(file["scope"])
-
- # update list of file-scope labels
- if label not in file_unique_labels:
- file_unique_labels.append(label)
- # and convert label to its (file-scope) index
- file_label_idx = file_unique_labels.index(label)
-
- database_label_idx = global_label_idx = -1
-
- if scope > 0: # 'database' or 'global'
- # update list of database-scope labels
- if label not in database_unique_labels:
- database_unique_labels.append(label)
-
- # and convert label to its (database-scope) index
- database_label_idx = database_unique_labels.index(label)
-
- if scope > 1: # 'global'
- # update list of global-scope labels
- if label not in unique_labels:
- unique_labels.append(label)
- # and convert label to its (global-scope) index
- global_label_idx = unique_labels.index(label)
-
- # basic segmentation protocols do not provide "scope" information
- # as classes are global by definition
-
- else:
- try:
- file_label_idx = (
- database_label_idx
- ) = global_label_idx = classes.index(label)
- except ValueError:
- # skip labels that are not in the list of classes
- continue
-
- annotations.append(
- (
- file_id, # index of file
- segment.start, # start time
- segment.end, # end time
- file_label_idx, # file-scope label index
- database_label_idx, # database-scope label index
- global_label_idx, # global-scope index
- )
- )
-
- # since not all metadata keys are present in all files, fallback to -1 when a key is missing
- metadata = [
- tuple(metadatum.get(key, -1) for key in metadata_unique_values)
- for metadatum in metadata
- ]
- dtype = [(key, "i") for key in metadata_unique_values]
- self.metadata = np.array(metadata, dtype=dtype)
-
- # NOTE: read with str(self.audios[file_id], encoding='utf-8')
- self.audios = np.array(audios, dtype=np.string_)
-
- # turn list of files metadata into a single numpy array
- # TODO: improve using https://github.com/pytorch/pytorch/issues/13246#issuecomment-617140519
-
- dtype = [
- ("sample_rate", "i"),
- ("num_frames", "i"),
- ("num_channels", "i"),
- ("bits_per_sample", "i"),
- ]
- self.audio_infos = np.array(audio_infos, dtype=dtype)
- self.audio_encodings = np.array(audio_encodings, dtype=np.string_)
-
- self.annotated_duration = np.array(annotated_duration)
-
- # turn list of annotated regions into a single numpy array
- dtype = [("file_id", "i"), ("duration", "f"), ("start", "f"), ("end", "f")]
- self.annotated_regions = np.array(annotated_regions, dtype=dtype)
-
- # convert annotated_classes (which is a list of list of classes, one list of classes per file)
- # into a single (num_files x num_classes) numpy array:
- # * True indicates that this particular class was annotated for this particular file (though it may not be active in this file)
- # * False indicates that this particular class was not even annotated (i.e. its absence does not imply that it is not active in this file)
- if isinstance(self.protocol, SegmentationProtocol) and self.classes is None:
- self.classes = classes
- self.annotated_classes = np.zeros(
- (len(annotated_classes), len(self.classes)), dtype=np.bool_
- )
- for file_id, classes in enumerate(annotated_classes):
- self.annotated_classes[file_id, classes] = True
-
- # turn list of annotations into a single numpy array
- dtype = [
- ("file_id", "i"),
- ("start", "f"),
- ("end", "f"),
- ("file_label_idx", "i"),
- ("database_label_idx", "i"),
- ("global_label_idx", "i"),
- ]
- self.annotations = np.array(annotations, dtype=dtype)
-
- self.metadata_unique_values = metadata_unique_values
-
- if not self.has_validation:
- return
-
- validation_chunks = list()
-
- # obtain indexes of files in the validation subset
- validation_file_ids = np.where(
- self.metadata["subset"] == Subsets.index("development")
- )[0]
-
- # iterate over files in the validation subset
- for file_id in validation_file_ids:
- # get annotated regions in file
- annotated_regions = self.annotated_regions[
- self.annotated_regions["file_id"] == file_id
- ]
-
- # iterate over annotated regions
- for annotated_region in annotated_regions:
- # number of chunks in annotated region
- num_chunks = round(annotated_region["duration"] // duration)
-
- # iterate over chunks
- for c in range(num_chunks):
- start_time = annotated_region["start"] + c * duration
- validation_chunks.append((file_id, start_time, duration))
-
- dtype = [("file_id", "i"), ("start", "f"), ("duration", "f")]
- self.validation_chunks = np.array(validation_chunks, dtype=dtype)
-
def default_metric(
self,
) -> Union[Metric, Sequence[Metric], Dict[str, Metric]]:
@@ -419,14 +102,20 @@ def train__iter__helper(self, rng: random.Random, **filters):
"""
# indices of training files that matches domain filters
- training = self.metadata["subset"] == Subsets.index("train")
+ training = self.prepared_data["audio-metadata"]["subset"] == Subsets.index(
+ "train"
+ )
for key, value in filters.items():
- training &= self.metadata[key] == self.metadata_unique_values[key].index(value)
+ training &= self.prepared_data["audio-metadata"][key] == self.prepared_data[
+ "metadata"
+ ][key].index(value)
file_ids = np.where(training)[0]
# turn annotated duration into a probability distribution
- annotated_duration = self.annotated_duration[file_ids]
- prob_annotated_duration = annotated_duration / np.sum(annotated_duration)
+ annotated_duration = self.prepared_data["audio-annotated"][file_ids]
+ cum_prob_annotated_duration = np.cumsum(
+ annotated_duration / np.sum(annotated_duration)
+ )
duration = self.duration
@@ -434,28 +123,38 @@ def train__iter__helper(self, rng: random.Random, **filters):
while True:
# select one file at random (with probability proportional to its annotated duration)
- file_id = np.random.choice(file_ids, p=prob_annotated_duration)
+ file_id = file_ids[cum_prob_annotated_duration.searchsorted(rng.random())]
# generate `num_chunks_per_file` chunks from this file
for _ in range(num_chunks_per_file):
# find indices of annotated regions in this file
annotated_region_indices = np.where(
- self.annotated_regions["file_id"] == file_id
+ self.prepared_data["annotations-regions"]["file_id"] == file_id
)[0]
# turn annotated regions duration into a probability distribution
- prob_annotated_regions_duration = self.annotated_regions["duration"][
- annotated_region_indices
- ] / np.sum(self.annotated_regions["duration"][annotated_region_indices])
- # selected one annotated region at random (with probability proportional to its duration)
- annotated_region_index = np.random.choice(
- annotated_region_indices, p=prob_annotated_regions_duration
+ cum_prob_annotated_regions_duration = np.cumsum(
+ self.prepared_data["annotations-regions"]["duration"][
+ annotated_region_indices
+ ]
+ / np.sum(
+ self.prepared_data["annotations-regions"]["duration"][
+ annotated_region_indices
+ ]
+ )
)
+ # select one annotated region at random (with probability proportional to its duration)
+ annotated_region_index = annotated_region_indices[
+ cum_prob_annotated_regions_duration.searchsorted(rng.random())
+ ]
+
# select one chunk at random in this annotated region
- _, _, start, end = self.annotated_regions[annotated_region_index]
- start_time = rng.uniform(start, end - duration)
+ _, region_duration, start = self.prepared_data["annotations-regions"][
+ annotated_region_index
+ ]
+ start_time = rng.uniform(start, start + region_duration - duration)
yield self.prepare_chunk(file_id, start_time, duration)
@@ -475,7 +174,7 @@ def train__iter__(self):
"""
# create worker-specific random number generator
- rng = create_rng_for_worker(self.model.current_epoch)
+ rng = create_rng_for_worker(self.model)
balance = getattr(self, "balance", None)
if balance is None:
@@ -485,7 +184,7 @@ def train__iter__(self):
# create a subchunk generator for each combination of "balance" keys
subchunks = dict()
for product in itertools.product(
- *[self.metadata_unique_values[key] for key in balance]
+ *[self.prepared_data["metadata"][key] for key in balance]
):
# we iterate on the cartesian product of the values in metadata_unique_values
# eg: for balance=["database", "split"], with 2 databases and 2 splits:
@@ -556,12 +255,52 @@ def collate_fn(self, batch, stage="train"):
def train__len__(self):
# Number of training samples in one epoch
+ train_file_ids = np.where(
+ self.prepared_data["audio-metadata"]["subset"] == Subsets.index("train")
+ )[0]
- duration = np.sum(self.annotated_duration)
+ duration = np.sum(self.prepared_data["audio-annotated"][train_file_ids])
return max(self.batch_size, math.ceil(duration / self.duration))
+ def prepare_validation(self, prepared_data: Dict):
+ validation_chunks = list()
+
+ # obtain indexes of files in the validation subset
+ validation_file_ids = np.where(
+ prepared_data["audio-metadata"]["subset"] == Subsets.index("development")
+ )[0]
+
+ # iterate over files in the validation subset
+ for file_id in validation_file_ids:
+ # get annotated regions in file
+ annotated_regions = prepared_data["annotations-regions"][
+ prepared_data["annotations-regions"]["file_id"] == file_id
+ ]
+
+ # iterate over annotated regions
+ for annotated_region in annotated_regions:
+ # number of chunks in annotated region
+ num_chunks = round(annotated_region["duration"] // self.duration)
+
+ # iterate over chunks
+ for c in range(num_chunks):
+ start_time = annotated_region["start"] + c * self.duration
+ validation_chunks.append((file_id, start_time, self.duration))
+
+ dtype = [
+ (
+ "file_id",
+ get_dtype(max(v[0] for v in validation_chunks)),
+ ),
+ ("start", "f"),
+ ("duration", "f"),
+ ]
+
+ prepared_data["validation"] = np.array(validation_chunks, dtype=dtype)
+ validation_chunks.clear()
+
def val__getitem__(self, idx):
- validation_chunk = self.validation_chunks[idx]
+ validation_chunk = self.prepared_data["validation"][idx]
return self.prepare_chunk(
validation_chunk["file_id"],
validation_chunk["start"],
@@ -569,7 +308,7 @@ def val__getitem__(self, idx):
)
def val__len__(self):
- return len(self.validation_chunks)
+ return len(self.prepared_data["validation"])
def validation_step(self, batch, batch_idx: int):
"""Compute validation area under the ROC curve
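The rewritten `train__iter__helper` drops `np.random.choice(..., p=...)` in favor of a cumulative-probability lookup (`np.cumsum` followed by `searchsorted(rng.random())`). As a result, file and region selection depend only on the worker-specific `rng` rather than on NumPy's global random state. Below is a minimal standalone sketch of that inverse-CDF draw. The helper name `weighted_index` is hypothetical, and two details are assumptions that the diff does not make. First, the clamp on the returned index guards against floating-point round-off leaving the last cumulative probability slightly below 1.0. Second, `side="right"` (the diff uses the default `side="left"`) ensures that zero-weight entries can never be drawn.

```python
import random

import numpy as np


def weighted_index(weights: np.ndarray, rng: random.Random) -> int:
    """Return an index drawn with probability proportional to `weights`."""
    # cumulative distribution, e.g. [30, 0, 90] -> [0.25, 0.25, 1.0]
    cum_prob = np.cumsum(weights / np.sum(weights))
    # inverse-CDF lookup: first bin whose cumulative probability exceeds u
    u = rng.random()  # uniform in [0, 1)
    idx = int(cum_prob.searchsorted(u, side="right"))
    # guard against round-off leaving cum_prob[-1] slightly below 1.0
    return min(idx, len(weights) - 1)


# usage: pick a file id with probability proportional to its annotated duration
file_ids = np.array([10, 11, 12])
annotated_duration = np.array([30.0, 0.0, 90.0])
rng = random.Random(0)
file_id = file_ids[weighted_index(annotated_duration, rng)]
```

Seeding the `random.Random` instance per worker, as `create_rng_for_worker` does, makes the sequence of drawn chunks reproducible for a given worker and epoch, without sharing state across dataloader workers.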
diff --git a/pyannote/audio/tasks/segmentation/multilabel.py b/pyannote/audio/tasks/segmentation/multilabel.py
index c1d58431a..9184121c4 100644
--- a/pyannote/audio/tasks/segmentation/multilabel.py
+++ b/pyannote/audio/tasks/segmentation/multilabel.py
@@ -20,6 +20,8 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
+import itertools
+import textwrap
from typing import Dict, List, Optional, Sequence, Text, Tuple, Union
import numpy as np
@@ -31,11 +33,11 @@
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
from torchmetrics import Metric
-from pyannote.audio.core.task import Problem, Resolution, Specifications, Task
-from pyannote.audio.tasks.segmentation.mixins import SegmentationTaskMixin
+from pyannote.audio.core.task import Problem, Resolution, Specifications
+from pyannote.audio.tasks.segmentation.mixins import SegmentationTask
-class MultiLabelSegmentation(SegmentationTaskMixin, Task):
+class MultiLabelSegmentation(SegmentationTask):
"""Generic multi-label segmentation
Multi-label segmentation is the process of detecting temporal intervals
@@ -47,7 +49,13 @@ class MultiLabelSegmentation(SegmentationTaskMixin, Task):
Parameters
----------
protocol : Protocol
- pyannote.database protocol
+ cache : str, optional
+ As (meta-)data preparation might take a very long time for large datasets,
+ it can be cached to disk for later (and faster!) re-use.
+ When `cache` does not exist, `Task.prepare_data()` generates training
+ and validation metadata from `protocol` and saves them to disk.
+ When `cache` exists, `Task.prepare_data()` is skipped and (meta-)data
+ are loaded from disk. Defaults to a temporary path.
classes : List[str], optional
List of classes. Defaults to the list of classes available in the training set.
duration : float, optional
@@ -84,15 +92,16 @@ class MultiLabelSegmentation(SegmentationTaskMixin, Task):
def __init__(
self,
protocol: Protocol,
+ cache: Optional[str] = None,
classes: Optional[List[str]] = None,
duration: float = 2.0,
warm_up: Union[float, Tuple[float, float]] = 0.0,
- balance: Sequence[Text] = None,
- weight: Text = None,
+ balance: Optional[Sequence[Text]] = None,
+ weight: Optional[Text] = None,
batch_size: int = 32,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
):
if not isinstance(protocol, SegmentationProtocol):
@@ -109,6 +118,7 @@ def __init__(
pin_memory=pin_memory,
augmentation=augmentation,
metric=metric,
+ cache=cache,
)
self.balance = balance
@@ -119,11 +129,114 @@ def __init__(
# classes should be detected. therefore, we postpone the definition of
# specifications to setup()
- def setup(self):
- super().setup()
+ def post_prepare_data(self, prepared_data: Dict):
+ # as different files may be annotated using a different set of classes
+ # (e.g. one database for speech/music/noise, and another one for
+ # male/female/child), we keep track of this information. this is used
+ # to know whether a missing class is considered a negative example (0) or
+ # simple an unknown example (-1)
+
+ if self.classes is None and not self.has_classes:
+ msg = textwrap.dedent(
+ """
+ Could not infer list of classes. Either provide a list of classes when
+ instantiating the task, or make sure that the training protocol provides
+ a 'classes' entry. See https://github.com/pyannote/pyannote-database#segmentation
+ for more details.
+ """
+ )
+
+ if self.has_validation:
+ files_iter = itertools.chain(
+ self.protocol.train(), self.protocol.development()
+ )
+ else:
+ files_iter = self.protocol.train()
+
+ if self.classes is None:
+ classes = list() # overall list of classes
+ annotated_classes = list() # list of annotated classes (per file)
+
+ for file in files_iter:
+ file_classes = file.get("classes", None)
+
+ if not file_classes:
+ msg = textwrap.dedent(
+ f"""
+ File "{file['uri']}" (from {file['database']} database) does not
+ provide a 'classes' entry. Please make sure the corresponding
+ training protocol provides a 'classes' entry for all files. See
+ https://github.com/pyannote/pyannote-database#segmentation for more
+ details.
+ """
+ )
+ raise ValueError(msg)
+
+ for klass in file_classes:
+ if klass not in classes:
+ classes.append(klass)
+ annotated_classes.append(
+ [classes.index(klass) for klass in file_classes]
+ )
+
+ prepared_data["classes-list"] = np.array(classes, dtype=np.str_)
+ self.classes = classes
+
+ else:
+ annotated_classes = list() # list of annotated classes (per file)
+ for file in files_iter:
+ file_classes = file.get("classes", None)
+
+ if not file_classes:
+ msg = textwrap.dedent(
+ f"""
+ File "{file['uri']}" (from {file['database']} database) does not
+ provide a 'classes' entry. Please make sure the corresponding
+ training protocol provides a 'classes' entry for all files. See
+ https://github.com/pyannote/pyannote-database#segmentation for more
+ details.
+ """
+ )
+ raise ValueError(msg)
+
+ extra_classes = set(file_classes) - set(self.classes)
+ if extra_classes:
+ msg = textwrap.dedent(
+ f"""
+ File "{file['uri']}" (from {file['database']} database) provides
+ extra classes ({', '.join(extra_classes)}) that are ignored.
+ """
+ )
+ print(msg)
+
+ annotated_classes.append(
+ [
+ self.classes.index(klass)
+ for klass in set(file_classes) & set(self.classes)
+ ]
+ )
+
+ prepared_data["classes-list"] = np.array(self.classes, dtype=np.str_)
+
+ # convert annotated_classes (which is a list of list of classes, one list of classes per file)
+ # into a single (num_files x num_classes) numpy array:
+ # * True indicates that this particular class was annotated for this particular file
+ # (though it may not be active in this file)
+ # * False indicates that this particular class was not even annotated (i.e. its absence
+ # does not imply that it is not active in this file)
+ annotated_classes_array = np.zeros(
+ (len(annotated_classes), len(self.classes)), dtype=np.bool_
+ )
+ for file_id, classes in enumerate(annotated_classes):
+ annotated_classes_array[file_id, classes] = True
+ prepared_data["classes-annotated"] = annotated_classes_array
+ annotated_classes.clear()
+
+ def setup(self, stage=None):
+ super().setup(stage)
self.specifications = Specifications(
- classes=self.classes,
+ classes=self.prepared_data["classes-list"],
problem=Problem.MULTI_LABEL_CLASSIFICATION,
resolution=Resolution.FRAME,
duration=self.duration,
@@ -169,7 +282,9 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
sample = dict()
sample["X"], _ = self.model.audio.crop(file, chunk, duration=duration)
# gather all annotations of current file
- annotations = self.annotations[self.annotations["file_id"] == file_id]
+ annotations = self.prepared_data["annotations-segments"][
+ self.prepared_data["annotations-segments"]["file_id"] == file_id
+ ]
# gather all annotations with non-empty intersection with current chunk
chunk_annotations = annotations[
@@ -177,26 +292,37 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
]
# discretize chunk annotations at model output resolution
- start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start
- start_idx = np.floor(start / self.model.example_output.frames.step).astype(int)
- end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start
- end_idx = np.ceil(end / self.model.example_output.frames.step).astype(int)
+ step = self.model.receptive_field.step
+ half = 0.5 * self.model.receptive_field.duration
+
+ start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start - half
+ start_idx = np.maximum(0, np.round(start / step)).astype(int)
+
+ end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start - half
+ end_idx = np.round(end / step).astype(int)
# frame-level targets (-1 for un-annotated classes)
+ num_frames = self.model.num_frames(
+ round(duration * self.model.hparams.sample_rate)
+ )
y = -np.ones(
- (self.model.example_output.num_frames, len(self.classes)), dtype=np.int8
+ (
+ num_frames,
+ len(self.prepared_data["classes-list"]),
+ ),
+ dtype=np.int8,
)
- y[:, self.annotated_classes[file_id]] = 0
+ y[:, self.prepared_data["classes-annotated"][file_id]] = 0
for start, end, label in zip(
start_idx, end_idx, chunk_annotations["global_label_idx"]
):
- y[start:end, label] = 1
+ y[start : end + 1, label] = 1
sample["y"] = SlidingWindowFeature(
- y, self.model.example_output.frames, labels=self.classes
+ y, self.model.receptive_field, labels=self.classes
)
- metadata = self.metadata[file_id]
+ metadata = self.prepared_data["audio-metadata"][file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
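In `MultiLabelSegmentation`, `post_prepare_data` stores a boolean `(num_files, num_classes)` array under `prepared_data["classes-annotated"]`, and `prepare_chunk` uses that file's row to initialize the targets. Classes never annotated in the file get -1, annotated classes get 0, and frames where a segment is active get 1. The sketch below reproduces that three-valued encoding in isolation. The function names are hypothetical, and frame indices are passed in directly instead of being computed from a model's receptive field.

```python
import numpy as np


def annotated_mask(annotated_classes: list, num_classes: int) -> np.ndarray:
    """Turn per-file lists of class indices into a (num_files, num_classes) mask."""
    mask = np.zeros((len(annotated_classes), num_classes), dtype=np.bool_)
    for file_id, classes in enumerate(annotated_classes):
        mask[file_id, classes] = True
    return mask


def multilabel_targets(num_frames: int, annotated: np.ndarray, segments: list) -> np.ndarray:
    """Build frame-level targets for one chunk.

    `annotated` is one row of the mask above; `segments` holds
    (start_idx, end_idx, class_idx) tuples with an inclusive end frame.
    """
    # -1: class not annotated for this file, so its absence says nothing
    y = -np.ones((num_frames, annotated.shape[0]), dtype=np.int8)
    # 0: class annotated for this file, inactive unless a segment says otherwise
    y[:, annotated] = 0
    # 1: class active (end frame included, as in `y[start : end + 1, label]`)
    for start, end, label in segments:
        y[start : end + 1, label] = 1
    return y


# file 0 was annotated for classes 0 and 2, file 1 for class 1 only
mask = annotated_mask([[0, 2], [1]], num_classes=3)
y = multilabel_targets(5, mask[0], [(1, 2, 0)])
```

Keeping -1 distinct from 0 lets the loss ignore classes a database never labeled, rather than treating them as confirmed negatives.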
diff --git a/pyannote/audio/tasks/segmentation/overlapped_speech_detection.py b/pyannote/audio/tasks/segmentation/overlapped_speech_detection.py
index 0b7209c5c..89d299a8d 100644
--- a/pyannote/audio/tasks/segmentation/overlapped_speech_detection.py
+++ b/pyannote/audio/tasks/segmentation/overlapped_speech_detection.py
@@ -21,7 +21,7 @@
# SOFTWARE.
-from typing import Dict, Sequence, Text, Tuple, Union
+from typing import Dict, Optional, Sequence, Text, Tuple, Union
import numpy as np
from pyannote.core import Segment, SlidingWindowFeature
@@ -29,11 +29,11 @@
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
from torchmetrics import Metric
-from pyannote.audio.core.task import Problem, Resolution, Specifications, Task
-from pyannote.audio.tasks.segmentation.mixins import SegmentationTaskMixin
+from pyannote.audio.core.task import Problem, Resolution, Specifications
+from pyannote.audio.tasks.segmentation.mixins import SegmentationTask
-class OverlappedSpeechDetection(SegmentationTaskMixin, Task):
+class OverlappedSpeechDetection(SegmentationTask):
"""Overlapped speech detection
Overlapped speech detection is the task of detecting regions where at least
@@ -51,6 +51,13 @@ class OverlappedSpeechDetection(SegmentationTaskMixin, Task):
----------
protocol : Protocol
pyannote.database protocol
+ cache : str, optional
+ As (meta-)data preparation might take a very long time for large datasets,
+ it can be cached to disk for later (and faster!) re-use.
+ When `cache` does not exist, `Task.prepare_data()` generates training
+ and validation metadata from `protocol` and saves them to disk.
+ When `cache` exists, `Task.prepare_data()` is skipped and (meta-)data
+ are loaded from disk. Defaults to a temporary path.
duration : float, optional
Chunks duration. Defaults to 2s.
warm_up : float or (float, float), optional
@@ -66,11 +73,12 @@ class OverlappedSpeechDetection(SegmentationTaskMixin, Task):
overlap: dict, optional
Controls how artificial chunks with overlapping speech are generated:
- "probability" key is the probability of artificial overlapping chunks. Setting
- "probability" to 0.6 means that, on average, 40% of training chunks are "real"
- chunks, while 60% are artifical chunks made out of the (weighted) sum of two
- chunks. Defaults to 0.5.
+ "probability" to 0.6 means that, on average, 40% of training chunks are "real"
+ chunks, while 60% are artificial chunks made out of the (weighted) sum of two
+ chunks. Defaults to 0.5.
- "snr_min" and "snr_max" keys control the minimum and maximum signal-to-noise
- ratio between summed chunks, in dB. Default to 0.0 and 10.
+ ratio between summed chunks, in dB. Defaults to 0.0 and 10.
+
weight: str, optional
When provided, use this key to as frame-wise weight in loss function.
batch_size : int, optional
@@ -98,13 +106,14 @@ def __init__(
duration: float = 2.0,
warm_up: Union[float, Tuple[float, float]] = 0.0,
overlap: dict = OVERLAP_DEFAULTS,
- balance: Sequence[Text] = None,
- weight: Text = None,
+ balance: Optional[Sequence[Text]] = None,
+ weight: Optional[Text] = None,
batch_size: int = 32,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
+ cache: Optional[str] = None,
):
super().__init__(
protocol,
@@ -115,6 +124,7 @@ def __init__(
pin_memory=pin_memory,
augmentation=augmentation,
metric=metric,
+ cache=cache,
)
self.specifications = Specifications(
@@ -163,7 +173,9 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
sample["X"], _ = self.model.audio.crop(file, chunk, duration=duration)
# gather all annotations of current file
- annotations = self.annotations[self.annotations["file_id"] == file_id]
+ annotations = self.prepared_data["annotations-segments"][
+ self.prepared_data["annotations-segments"]["file_id"] == file_id
+ ]
# gather all annotations with non-empty intersection with current chunk
chunk_annotations = annotations[
@@ -171,22 +183,29 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
]
# discretize chunk annotations at model output resolution
- start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start
- start_idx = np.floor(start / self.model.example_output.frames.step).astype(int)
- end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start
- end_idx = np.ceil(end / self.model.example_output.frames.step).astype(int)
+ step = self.model.receptive_field.step
+ half = 0.5 * self.model.receptive_field.duration
+
+ start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start - half
+ start_idx = np.maximum(0, np.round(start / step)).astype(int)
+
+ end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start - half
+ end_idx = np.round(end / step).astype(int)
# frame-level targets
- y = np.zeros((self.model.example_output.num_frames, 1), dtype=np.uint8)
+ num_frames = self.model.num_frames(
+ round(duration * self.model.hparams.sample_rate)
+ )
+ y = np.zeros((num_frames, 1), dtype=np.uint8)
for start, end in zip(start_idx, end_idx):
- y[start:end, 0] += 1
+ y[start : end + 1, 0] += 1
y = 1 * (y > 1)
sample["y"] = SlidingWindowFeature(
- y, self.model.example_output.frames, labels=["speech"]
+ y, self.model.receptive_field, labels=["speech"]
)
- metadata = self.metadata[file_id]
+ metadata = self.prepared_data["audio-metadata"][file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
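The same discretization change appears in every `prepare_chunk` in this diff. Segment boundaries are no longer mapped with `floor`/`ceil` on `example_output.frames.step`. Instead, they are shifted by half the receptive field duration and rounded to the nearest frame, and the end frame is now included. The sketch below shows that mapping, followed by the overlapped-speech target, which marks frames where at least two segments are active. `step` and `rf_duration` stand in for `self.model.receptive_field.step` and `.duration`. The numeric values are made up for illustration and do not come from any pyannote model.

```python
import numpy as np


def discretize(starts, ends, chunk_start, chunk_end, step, rf_duration):
    """Map segment boundaries (in seconds) to inclusive frame indices within a chunk."""
    half = 0.5 * rf_duration
    # assuming frame i covers [i * step, i * step + rf_duration], its center is i * step + half
    start = np.maximum(starts, chunk_start) - chunk_start - half
    start_idx = np.maximum(0, np.round(start / step)).astype(int)
    end = np.minimum(ends, chunk_end) - chunk_start - half
    end_idx = np.round(end / step).astype(int)
    return start_idx, end_idx


def overlap_targets(start_idx, end_idx, num_frames):
    """Frame-level overlapped speech targets: 1 where two or more segments are active."""
    y = np.zeros((num_frames, 1), dtype=np.uint8)
    for start, end in zip(start_idx, end_idx):
        y[start : end + 1, 0] += 1
    return (y > 1).astype(np.uint8)


# two speech turns in a 5s chunk: [0s, 3s] and [2s, 5s]
starts, ends = np.array([0.0, 2.0]), np.array([3.0, 5.0])
start_idx, end_idx = discretize(starts, ends, 0.0, 5.0, step=0.5, rf_duration=0.6)
y = overlap_targets(start_idx, end_idx, num_frames=10)
```

Because `end_idx` is now inclusive, the slice is `start : end + 1`. With the old `ceil`-based `end_idx`, the slice was `start:end`.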
diff --git a/pyannote/audio/tasks/segmentation/speaker_diarization.py b/pyannote/audio/tasks/segmentation/speaker_diarization.py
index 1094672ed..8a091b1f7 100644
--- a/pyannote/audio/tasks/segmentation/speaker_diarization.py
+++ b/pyannote/audio/tasks/segmentation/speaker_diarization.py
@@ -23,7 +23,7 @@
import math
import warnings
from collections import Counter
-from typing import Dict, Literal, Sequence, Text, Tuple, Union
+from typing import Dict, Literal, Optional, Sequence, Text, Tuple, Union
import numpy as np
import torch
@@ -37,8 +37,8 @@
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
from torchmetrics import Metric
-from pyannote.audio.core.task import Problem, Resolution, Specifications, Task
-from pyannote.audio.tasks.segmentation.mixins import SegmentationTaskMixin
+from pyannote.audio.core.task import Problem, Resolution, Specifications
+from pyannote.audio.tasks.segmentation.mixins import SegmentationTask
from pyannote.audio.torchmetrics import (
DiarizationErrorRate,
FalseAlarmRate,
@@ -58,13 +58,20 @@
Scopes = list(Scope.__args__)
-class SpeakerDiarization(SegmentationTaskMixin, Task):
+class SpeakerDiarization(SegmentationTask):
"""Speaker diarization
Parameters
----------
protocol : SpeakerDiarizationProtocol
pyannote.database protocol
+ cache : str, optional
+ As (meta-)data preparation might take a very long time for large datasets,
+ it can be cached to disk for later (and faster!) re-use.
+ When `cache` does not exist, `Task.prepare_data()` generates training
+ and validation metadata from `protocol` and saves them to disk.
+ When `cache` exists, `Task.prepare_data()` is skipped and (meta-)data
+ are loaded from disk. Defaults to a temporary path.
duration : float, optional
Chunks duration. Defaults to 2s.
max_speakers_per_chunk : int, optional
@@ -127,20 +134,23 @@ class SpeakerDiarization(SegmentationTaskMixin, Task):
def __init__(
self,
protocol: SpeakerDiarizationProtocol,
+ cache: Optional[str] = None,
duration: float = 2.0,
- max_speakers_per_chunk: int = None,
- max_speakers_per_frame: int = None,
+ max_speakers_per_chunk: Optional[int] = None,
+ max_speakers_per_frame: Optional[int] = None,
weigh_by_cardinality: bool = False,
warm_up: Union[float, Tuple[float, float]] = 0.0,
- balance: Sequence[Text] = None,
- weight: Text = None,
+ balance: Optional[Sequence[Text]] = None,
+ weight: Optional[Text] = None,
batch_size: int = 32,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
vad_loss: Literal["bce", "mse"] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
- max_num_speakers: int = None, # deprecated in favor of `max_speakers_per_chunk``
+ max_num_speakers: Optional[
+ int
+ ] = None, # deprecated in favor of `max_speakers_per_chunk`
loss: Literal["bce", "mse"] = None, # deprecated
):
super().__init__(
@@ -152,6 +162,7 @@ def __init__(
pin_memory=pin_memory,
augmentation=augmentation,
metric=metric,
+ cache=cache,
)
if not isinstance(protocol, SpeakerDiarizationProtocol):
@@ -186,28 +197,34 @@ def __init__(
self.weight = weight
self.vad_loss = vad_loss
- def setup(self):
- super().setup()
+ def setup(self, stage=None):
+ super().setup(stage)
# estimate maximum number of speakers per chunk when not provided
if self.max_speakers_per_chunk is None:
- training = self.metadata["subset"] == Subsets.index("train")
+ training = self.prepared_data["audio-metadata"]["subset"] == Subsets.index(
+ "train"
+ )
num_unique_speakers = []
progress_description = f"Estimating maximum number of speakers per {self.duration:g}s chunk in the training set"
for file_id in track(
np.where(training)[0], description=progress_description
):
- annotations = self.annotations[
- np.where(self.annotations["file_id"] == file_id)[0]
+ annotations = self.prepared_data["annotations-segments"][
+ np.where(
+ self.prepared_data["annotations-segments"]["file_id"] == file_id
+ )[0]
]
- annotated_regions = self.annotated_regions[
- np.where(self.annotated_regions["file_id"] == file_id)[0]
+ annotated_regions = self.prepared_data["annotations-regions"][
+ np.where(
+ self.prepared_data["annotations-regions"]["file_id"] == file_id
+ )[0]
]
for region in annotated_regions:
# find annotations within current region
region_start = region["start"]
- region_end = region["end"]
+ region_end = region["start"] + region["duration"]
region_annotations = annotations[
np.where(
(annotations["start"] >= region_start)
@@ -318,7 +335,7 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
file = self.get_file(file_id)
# get label scope
- label_scope = Scopes[self.metadata[file_id]["scope"]]
+ label_scope = Scopes[self.prepared_data["audio-metadata"][file_id]["scope"]]
label_scope_key = f"{label_scope}_label_idx"
#
@@ -328,7 +345,9 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
sample["X"], _ = self.model.audio.crop(file, chunk, duration=duration)
# gather all annotations of current file
- annotations = self.annotations[self.annotations["file_id"] == file_id]
+ annotations = self.prepared_data["annotations-segments"][
+ self.prepared_data["annotations-segments"]["file_id"] == file_id
+ ]
# gather all annotations with non-empty intersection with current chunk
chunk_annotations = annotations[
@@ -336,10 +355,14 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
]
# discretize chunk annotations at model output resolution
- start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start
- start_idx = np.floor(start / self.model.example_output.frames.step).astype(int)
- end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start
- end_idx = np.ceil(end / self.model.example_output.frames.step).astype(int)
+ step = self.model.receptive_field.step
+ half = 0.5 * self.model.receptive_field.duration
+
+ start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start - half
+ start_idx = np.maximum(0, np.round(start / step)).astype(int)
+
+ end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start - half
+ end_idx = np.round(end / step).astype(int)
# get list and number of labels for current scope
labels = list(np.unique(chunk_annotations[label_scope_key]))
@@ -349,7 +372,10 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
pass
# initial frame-level targets
- y = np.zeros((self.model.example_output.num_frames, num_labels), dtype=np.uint8)
+ num_frames = self.model.num_frames(
+ round(duration * self.model.hparams.sample_rate)
+ )
+ y = np.zeros((num_frames, num_labels), dtype=np.uint8)
# map labels to indices
mapping = {label: idx for idx, label in enumerate(labels)}
@@ -358,13 +384,11 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
start_idx, end_idx, chunk_annotations[label_scope_key]
):
mapped_label = mapping[label]
- y[start:end, mapped_label] = 1
+ y[start : end + 1, mapped_label] = 1
- sample["y"] = SlidingWindowFeature(
- y, self.model.example_output.frames, labels=labels
- )
+ sample["y"] = SlidingWindowFeature(y, self.model.receptive_field, labels=labels)
- metadata = self.metadata[file_id]
+ metadata = self.prepared_data["audio-metadata"][file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
@@ -420,7 +444,7 @@ def segmentation_loss(
self,
permutated_prediction: torch.Tensor,
target: torch.Tensor,
- weight: torch.Tensor = None,
+ weight: Optional[torch.Tensor] = None,
) -> torch.Tensor:
"""Permutation-invariant segmentation loss
@@ -463,7 +487,7 @@ def voice_activity_detection_loss(
self,
permutated_prediction: torch.Tensor,
target: torch.Tensor,
- weight: torch.Tensor = None,
+ weight: Optional[torch.Tensor] = None,
) -> torch.Tensor:
"""Voice activity detection loss
@@ -861,7 +885,7 @@ def main(protocol: str, subset: str = "test", model: str = "pyannote/segmentatio
main_task = progress.add_task(protocol.name, total=len(files))
file_task = progress.add_task("Processing", total=1.0)
- def progress_hook(completed: int = None, total: int = None):
+ def progress_hook(completed: Optional[int] = None, total: Optional[int] = None):
progress.update(file_task, completed=completed / total)
inference = Inference(model, device=device)
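In `SpeakerDiarization.prepare_chunk`, speaker labels are not global. The labels present in the chunk (at the file's `scope`, e.g. `file_label_idx`) are collected with `np.unique` and remapped to consecutive columns. The number of target columns therefore equals the number of speakers active in that chunk. The sketch below shows that remapping with frame indices supplied directly. The function name is hypothetical. The sketch does not cover chunks with more than `max_speakers_per_chunk` speakers, because that part of `prepare_chunk` is not shown in this diff.

```python
import numpy as np


def diarization_targets(start_idx, end_idx, scope_labels, num_frames):
    """Frame-level speaker activity with chunk-local speaker columns."""
    # speakers active in this chunk, sorted; their position becomes the column index
    labels = list(np.unique(scope_labels))
    mapping = {label: idx for idx, label in enumerate(labels)}
    y = np.zeros((num_frames, len(labels)), dtype=np.uint8)
    for start, end, label in zip(start_idx, end_idx, scope_labels):
        y[start : end + 1, mapping[label]] = 1
    return y, labels


# three turns from two speakers whose file-scope indices are 7 and 3
y, labels = diarization_targets(
    start_idx=np.array([0, 2, 6]),
    end_idx=np.array([3, 5, 8]),
    scope_labels=np.array([7, 3, 7]),
    num_frames=10,
)
```

Column order is arbitrary with respect to speaker identity, which is why the task trains with a permutation-invariant `segmentation_loss`.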
diff --git a/pyannote/audio/tasks/segmentation/voice_activity_detection.py b/pyannote/audio/tasks/segmentation/voice_activity_detection.py
index fd9eb8e75..e52613aeb 100644
--- a/pyannote/audio/tasks/segmentation/voice_activity_detection.py
+++ b/pyannote/audio/tasks/segmentation/voice_activity_detection.py
@@ -20,7 +20,7 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
-from typing import Dict, Sequence, Text, Tuple, Union
+from typing import Dict, Optional, Sequence, Text, Tuple, Union
import numpy as np
from pyannote.core import Segment, SlidingWindowFeature
@@ -28,11 +28,11 @@
from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
from torchmetrics import Metric
-from pyannote.audio.core.task import Problem, Resolution, Specifications, Task
-from pyannote.audio.tasks.segmentation.mixins import SegmentationTaskMixin
+from pyannote.audio.core.task import Problem, Resolution, Specifications
+from pyannote.audio.tasks.segmentation.mixins import SegmentationTask
-class VoiceActivityDetection(SegmentationTaskMixin, Task):
+class VoiceActivityDetection(SegmentationTask):
"""Voice activity detection
Voice activity detection (or VAD) is the task of detecting speech regions
@@ -45,6 +45,13 @@ class VoiceActivityDetection(SegmentationTaskMixin, Task):
----------
protocol : Protocol
pyannote.database protocol
+ cache : str, optional
+ As (meta-)data preparation might take a very long time for large datasets,
+ it can be cached to disk for later (and faster!) re-use.
+ When `cache` does not exist, `Task.prepare_data()` generates training
+ and validation metadata from `protocol` and saves them to disk.
+ When `cache` exists, `Task.prepare_data()` is skipped and (meta-)data
+ are loaded from disk. Defaults to a temporary path.
duration : float, optional
Chunks duration. Defaults to 2s.
warm_up : float or (float, float), optional
@@ -79,14 +86,15 @@ class VoiceActivityDetection(SegmentationTaskMixin, Task):
def __init__(
self,
protocol: Protocol,
+ cache: Optional[str] = None,
duration: float = 2.0,
warm_up: Union[float, Tuple[float, float]] = 0.0,
- balance: Sequence[Text] = None,
- weight: Text = None,
+ balance: Optional[Sequence[Text]] = None,
+ weight: Optional[Text] = None,
batch_size: int = 32,
- num_workers: int = None,
+ num_workers: Optional[int] = None,
pin_memory: bool = False,
- augmentation: BaseWaveformTransform = None,
+ augmentation: Optional[BaseWaveformTransform] = None,
metric: Union[Metric, Sequence[Metric], Dict[str, Metric]] = None,
):
super().__init__(
@@ -98,6 +106,7 @@ def __init__(
pin_memory=pin_memory,
augmentation=augmentation,
metric=metric,
+ cache=cache,
)
self.balance = balance
@@ -145,7 +154,9 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
sample["X"], _ = self.model.audio.crop(file, chunk, duration=duration)
# gather all annotations of current file
- annotations = self.annotations[self.annotations["file_id"] == file_id]
+ annotations = self.prepared_data["annotations-segments"][
+ self.prepared_data["annotations-segments"]["file_id"] == file_id
+ ]
# gather all annotations with non-empty intersection with current chunk
chunk_annotations = annotations[
@@ -153,21 +164,28 @@ def prepare_chunk(self, file_id: int, start_time: float, duration: float):
]
# discretize chunk annotations at model output resolution
- start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start
- start_idx = np.floor(start / self.model.example_output.frames.step).astype(int)
- end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start
- end_idx = np.ceil(end / self.model.example_output.frames.step).astype(int)
+ step = self.model.receptive_field.step
+ half = 0.5 * self.model.receptive_field.duration
+
+ start = np.maximum(chunk_annotations["start"], chunk.start) - chunk.start - half
+ start_idx = np.maximum(0, np.round(start / step)).astype(int)
+
+ end = np.minimum(chunk_annotations["end"], chunk.end) - chunk.start - half
+ end_idx = np.round(end / step).astype(int)
# frame-level targets
- y = np.zeros((self.model.example_output.num_frames, 1), dtype=np.uint8)
+ num_frames = self.model.num_frames(
+ round(duration * self.model.hparams.sample_rate)
+ )
+ y = np.zeros((num_frames, 1), dtype=np.uint8)
for start, end in zip(start_idx, end_idx):
- y[start:end, 0] = 1
+ y[start : max(start, end + 1), 0] = 1  # guard against negative end
sample["y"] = SlidingWindowFeature(
- y, self.model.example_output.frames, labels=["speech"]
+ y, self.model.receptive_field, labels=["speech"]
)
- metadata = self.metadata[file_id]
+ metadata = self.prepared_data["audio-metadata"][file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
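The new discretization in `prepare_chunk` assumes that output frame `i` has a receptive field centered at `i * step + duration / 2` (seconds from the chunk start), so an annotation boundary maps to the frame whose center is nearest: subtract half the receptive field, divide by the step, round. The sketch below reproduces that arithmetic with plain lists for one chunk; `discretize` is a hypothetical helper, not part of pyannote.audio. Using `range(start_idx, end_idx + 1)` mirrors the inclusive `y[start : end + 1]` slice while silently skipping intervals whose end index falls before their start (for example, an annotation that ends within the first half receptive field of the chunk).

```python
def discretize(segments, chunk_start, chunk_end, step, rf_duration, num_frames):
    """Hypothetical helper: frame-level 0/1 speech targets for one chunk.

    Assumes frame i covers a receptive field centered at
    i * step + rf_duration / 2 seconds after `chunk_start`.
    """
    half = 0.5 * rf_duration
    y = [0] * num_frames
    for seg_start, seg_end in segments:
        # clip the segment to the chunk, then shift by half a receptive field
        start = max(seg_start, chunk_start) - chunk_start - half
        end = min(seg_end, chunk_end) - chunk_start - half
        # nearest frame center (round-half-to-even, like np.round)
        start_idx = max(0, round(start / step))
        end_idx = min(num_frames - 1, round(end / step))
        # inclusive end index; the range is empty when end_idx < start_idx
        for i in range(start_idx, end_idx + 1):
            y[i] = 1
    return y
```

For a 2 s chunk with a 0.1 s step and a 0.2 s receptive field (19 frames), a speech turn from 0.5 s to 1.0 s activates frames 4 to 9.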
diff --git a/pyannote/audio/torchmetrics/functional/audio/diarization_error_rate.py b/pyannote/audio/torchmetrics/functional/audio/diarization_error_rate.py
index 9502a527e..c401b70dd 100644
--- a/pyannote/audio/torchmetrics/functional/audio/diarization_error_rate.py
+++ b/pyannote/audio/torchmetrics/functional/audio/diarization_error_rate.py
@@ -25,6 +25,7 @@
from typing import Optional, Tuple, Union
import torch
+import torch.nn.functional as F
from pyannote.audio.utils.permutation import permutate
@@ -33,6 +34,7 @@ def _der_update(
preds: torch.Tensor,
target: torch.Tensor,
threshold: Union[torch.Tensor, float] = 0.5,
+ reduce: str = "batch",
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
"""Compute components of diarization error rate
@@ -44,16 +46,41 @@ def _der_update(
(batch_size, num_speakers, num_frames)-shaped (0 or 1) targets.
threshold : float or torch.Tensor, optional
Threshold(s) used to binarize predictions. Defaults to 0.5.
+ reduce : {'batch', 'chunk', 'frame'}, optional
+ Reduction method. Defaults to 'batch'.
Returns
-------
- false_alarm : (num_thresholds, )-shaped torch.Tensor
- missed_detection : (num_thresholds, )-shaped torch.Tensor
- speaker_confusion : (num_thresholds, )-shaped torch.Tensor
- speech_total : torch.Tensor
- Diarization error rate components accumulated over the whole batch.
+ false_alarm : torch.Tensor
+ missed_detection : torch.Tensor
+ speaker_confusion : torch.Tensor
+ If `reduce` is 'batch', returns (num_thresholds, )-shaped tensors.
+ If `reduce` is 'chunk', returns (batch_size, num_thresholds)-shaped tensors.
+ If `reduce` is 'frame', returns (batch_size, num_frames, num_thresholds)-shaped tensors.
+ In case `threshold` is a float, the last dimension is removed from the output tensors.
+ speech_total : torch.Tensor
+ If `reduce` is 'batch', returns a scalar.
+ If `reduce` is 'chunk', returns (batch_size,)-shaped tensor.
+ If `reduce` is 'frame', returns (batch_size, num_frames)-shaped tensor.
"""
+ prd_batch_size, prd_num_speakers, prd_num_frames = preds.shape
+ tgt_batch_size, tgt_num_speakers, tgt_num_frames = target.shape
+
+ if prd_batch_size != tgt_batch_size:
+ raise ValueError(f"Batch size mismatch: {prd_batch_size} != {tgt_batch_size}.")
+
+ if prd_num_frames != tgt_num_frames:
+ raise ValueError(
+ f"Number of frames mismatch: {prd_num_frames} != {tgt_num_frames}."
+ )
+
+ # pad number of speakers if necessary
+ if prd_num_speakers > tgt_num_speakers:
+ target = F.pad(target, (0, 0, 0, prd_num_speakers - tgt_num_speakers))
+ elif prd_num_speakers < tgt_num_speakers:
+ preds = F.pad(preds, (0, 0, 0, tgt_num_speakers - prd_num_speakers))
+
# make threshold a (num_thresholds,) tensor
scalar_threshold = isinstance(threshold, Number)
if scalar_threshold:
@@ -70,6 +97,9 @@ def _der_update(
hypothesis = (permutated_preds.unsqueeze(-1) > threshold).float()
# (batch_size, num_speakers, num_frames, num_thresholds)
+ speech_total = 1.0 * torch.sum(target, 1)
+ # (batch_size, num_frames)
+
target = target.unsqueeze(-1)
# (batch_size, num_speakers, num_frames, 1)
@@ -87,17 +117,47 @@ def _der_update(
speaker_confusion = torch.sum((hypothesis != target) * hypothesis, 1) - false_alarm
# (batch_size, num_frames, num_thresholds)
- false_alarm = torch.sum(torch.sum(false_alarm, 1), 0)
- missed_detection = torch.sum(torch.sum(missed_detection, 1), 0)
- speaker_confusion = torch.sum(torch.sum(speaker_confusion, 1), 0)
+ if reduce == "frame":
+ if scalar_threshold:
+ return (
+ false_alarm[:, :, 0],
+ missed_detection[:, :, 0],
+ speaker_confusion[:, :, 0],
+ speech_total,
+ )
+ return false_alarm, missed_detection, speaker_confusion, torch.sum(target, 1)
+
+ speech_total = torch.sum(speech_total, 1)
+ # (batch_size, )
+ false_alarm = torch.sum(false_alarm, 1)
+ missed_detection = torch.sum(missed_detection, 1)
+ speaker_confusion = torch.sum(speaker_confusion, 1)
+ # (batch_size, num_thresholds)
+
+ if reduce == "chunk":
+ if scalar_threshold:
+ return (
+ false_alarm[:, 0],
+ missed_detection[:, 0],
+ speaker_confusion[:, 0],
+ speech_total,
+ )
+ return false_alarm, missed_detection, speaker_confusion, speech_total
+
+ speech_total = torch.sum(speech_total, 0)
+ # scalar
+ false_alarm = torch.sum(false_alarm, 0)
+ missed_detection = torch.sum(missed_detection, 0)
+ speaker_confusion = torch.sum(speaker_confusion, 0)
# (num_thresholds, )
- speech_total = 1.0 * torch.sum(target)
-
if scalar_threshold:
- false_alarm = false_alarm[0]
- missed_detection = missed_detection[0]
- speaker_confusion = speaker_confusion[0]
+ return (
+ false_alarm[0],
+ missed_detection[0],
+ speaker_confusion[0],
+ speech_total,
+ )
return false_alarm, missed_detection, speaker_confusion, speech_total
@@ -131,6 +191,8 @@ def diarization_error_rate(
preds: torch.Tensor,
target: torch.Tensor,
threshold: Union[torch.Tensor, float] = 0.5,
+ reduce: str = "batch",
+ return_components: bool = False,
) -> torch.Tensor:
"""Compute diarization error rate
@@ -142,16 +204,32 @@ def diarization_error_rate(
(batch_size, num_speakers, num_frames)-shaped (0 or 1) targets.
threshold : float or torch.Tensor, optional
Threshold(s) used to binarize predictions. Defaults to 0.5.
+ reduce : {'batch', 'chunk', 'frame'}, optional
+ Reduction method. Defaults to 'batch'.
+ return_components : bool, optional
+ Return diarization error rate components as an additional tuple.
+ Defaults to False.
+
Returns
-------
- der : (num_thresholds, )-shaped torch.Tensor
- Aggregated diarization error rate
+ der : torch.Tensor
+ If `reduce` is 'batch', returns a (num_thresholds, )-shaped tensor.
+ If `reduce` is 'chunk', returns a (batch_size, num_thresholds)-shaped tensor.
+ If `reduce` is 'frame', returns a (batch_size, num_frames, num_thresholds)-shaped tensor.
+ In case `threshold` is a float, the last dimension is removed from the output tensor.
+ components : (false_alarm, missed_detection, speaker_confusion, speech_total) tuple, optional
+ Shaped as documented in `_der_update` for the same `reduce`. Only returned when `return_components` is True.
+
"""
false_alarm, missed_detection, speaker_confusion, speech_total = _der_update(
- preds, target, threshold=threshold
+ preds, target, threshold=threshold, reduce=reduce
)
- return _der_compute(false_alarm, missed_detection, speaker_confusion, speech_total)
+
+ der = _der_compute(false_alarm, missed_detection, speaker_confusion, speech_total)
+ if return_components:
+ return der, (false_alarm, missed_detection, speaker_confusion, speech_total)
+ return der
def optimal_diarization_error_rate(
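The three `reduce` modes only differ in how far the per-frame error counts are summed before dividing by the amount of reference speech. The pure-Python sketch below illustrates this for a single threshold. It assumes predictions are already optimally permutated (the real code calls `permutate`), that `preds` and `target` have the same number of speakers (the real code pads with `F.pad`), the usual per-frame definitions (false alarm = max(0, #hyp - #ref), missed detection = max(0, #ref - #hyp), confusion as on the `speaker_confusion` line above), and a small epsilon in the denominator. `frame_components` and `reduce_der` are hypothetical names.

```python
def frame_components(preds, target, threshold=0.5):
    """Per-frame (false_alarm, missed, confusion, speech) for one chunk.

    preds, target: num_speakers x num_frames nested lists (already permutated).
    """
    num_speakers, num_frames = len(target), len(target[0])
    rows = []
    for f in range(num_frames):
        hyp = [1 if preds[s][f] > threshold else 0 for s in range(num_speakers)]
        ref = [target[s][f] for s in range(num_speakers)]
        detection_error = sum(hyp) - sum(ref)
        false_alarm = max(detection_error, 0)
        missed = max(-detection_error, 0)
        # active in hypothesis but not in reference, minus false alarm
        confusion = sum(h for h, r in zip(hyp, ref) if h != r) - false_alarm
        rows.append((false_alarm, missed, confusion, sum(ref)))
    return rows


def reduce_der(batch_preds, batch_target, reduce="batch", threshold=0.5, eps=1e-8):
    per_chunk = [
        frame_components(p, t, threshold) for p, t in zip(batch_preds, batch_target)
    ]

    def rate(rows):
        fa, md, sc, speech = (sum(column) for column in zip(*rows))
        return (fa + md + sc) / (speech + eps)

    if reduce == "frame":
        return [[rate([row]) for row in chunk] for chunk in per_chunk]
    if reduce == "chunk":
        return [rate(chunk) for chunk in per_chunk]
    if reduce == "batch":
        return rate([row for chunk in per_chunk for row in chunk])
    raise ValueError(f"Unknown reduction: {reduce}")
```

Note that 'batch' is not the mean of the 'chunk' values: errors and speech are summed over the whole batch before dividing, so chunks with more speech weigh more.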
diff --git a/pyannote/audio/utils/loss.py b/pyannote/audio/utils/loss.py
index 2c55b26f3..55121a678 100644
--- a/pyannote/audio/utils/loss.py
+++ b/pyannote/audio/utils/loss.py
@@ -23,11 +23,13 @@
"""Frame-weighted versions of common loss functions"""
+from typing import Optional
+
import torch
import torch.nn.functional as F
-def interpolate(target: torch.Tensor, weight: torch.Tensor = None):
+def interpolate(target: torch.Tensor, weight: Optional[torch.Tensor] = None):
"""Interpolate weight to match target frame resolution
Parameters
@@ -55,7 +57,9 @@ def interpolate(target: torch.Tensor, weight: torch.Tensor = None):
def binary_cross_entropy(
- prediction: torch.Tensor, target: torch.Tensor, weight: torch.Tensor = None
+ prediction: torch.Tensor,
+ target: torch.Tensor,
+ weight: Optional[torch.Tensor] = None,
) -> torch.Tensor:
"""Frame-weighted binary cross entropy
@@ -91,7 +95,9 @@ def binary_cross_entropy(
def mse_loss(
- prediction: torch.Tensor, target: torch.Tensor, weight: torch.Tensor = None
+ prediction: torch.Tensor,
+ target: torch.Tensor,
+ weight: Optional[torch.Tensor] = None,
) -> torch.Tensor:
"""Frame-weighted mean-squared error loss
@@ -131,8 +137,8 @@ def mse_loss(
def nll_loss(
prediction: torch.Tensor,
target: torch.Tensor,
- class_weight: torch.Tensor = None,
- weight: torch.Tensor = None,
+ class_weight: Optional[torch.Tensor] = None,
+ weight: Optional[torch.Tensor] = None,
) -> torch.Tensor:
"""Frame-weighted negative log-likelihood loss
diff --git a/pyannote/audio/utils/params.py b/pyannote/audio/utils/params.py
index 685e01653..f4ed42bcc 100644
--- a/pyannote/audio/utils/params.py
+++ b/pyannote/audio/utils/params.py
@@ -1,8 +1,10 @@
# TODO - make it depth-recursive
# TODO - switch to Omegaconf maybe?
+from typing import Optional
-def merge_dict(defaults: dict, custom: dict = None):
+
+def merge_dict(defaults: dict, custom: Optional[dict] = None):
params = dict(defaults)
if custom is not None:
params.update(custom)
diff --git a/pyannote/audio/utils/powerset.py b/pyannote/audio/utils/powerset.py
index b75221e48..23f921569 100644
--- a/pyannote/audio/utils/powerset.py
+++ b/pyannote/audio/utils/powerset.py
@@ -25,7 +25,8 @@
# Alexis PLAQUET
from functools import cached_property
-from itertools import combinations
+from itertools import combinations, permutations
+from typing import Dict, Tuple
import scipy.special
import torch
@@ -65,6 +66,27 @@ def num_powerset_classes(self) -> int:
)
def build_mapping(self) -> torch.Tensor:
+ """Compute powerset to regular mapping
+
+ Returns
+ -------
+ mapping : (num_powerset_classes, num_classes) torch.Tensor
+ mapping[i, j] == 1 if jth regular class is a member of ith powerset class
+ mapping[i, j] == 0 otherwise
+
+ Example
+ -------
+ With num_classes == 3 and max_set_size == 2, returns
+
+ [0, 0, 0] # none
+ [1, 0, 0] # class #1
+ [0, 1, 0] # class #2
+ [0, 0, 1] # class #3
+ [1, 1, 0] # classes #1 and #2
+ [1, 0, 1] # classes #1 and #3
+ [0, 1, 1] # classes #2 and #3
+
+ """
mapping = torch.zeros(self.num_powerset_classes, self.num_classes)
powerset_k = 0
for set_size in range(0, self.max_set_size + 1):
@@ -76,13 +98,7 @@ def build_mapping(self) -> torch.Tensor:
def build_cardinality(self) -> torch.Tensor:
"""Compute size of each powerset class"""
- cardinality = torch.zeros(self.num_powerset_classes)
- powerset_k = 0
- for set_size in range(0, self.max_set_size + 1):
- for _ in combinations(range(self.num_classes), set_size):
- cardinality[powerset_k] = set_size
- powerset_k += 1
- return cardinality
+ return torch.sum(self.mapping, dim=1)
def to_multilabel(self, powerset: torch.Tensor, soft: bool = False) -> torch.Tensor:
"""Convert predictions from powerset to multi-label
@@ -93,7 +109,7 @@ def to_multilabel(self, powerset: torch.Tensor, soft: bool = False) -> torch.Ten
Soft predictions in "powerset" space.
soft : bool, optional
Return soft multi-label predictions. Defaults to False (i.e. hard predictions)
- Assumes that `powerset` are "logits" (not "probabilities").
+ Assumes that `powerset` are "log probabilities".
Returns
-------
@@ -138,3 +154,76 @@ def to_powerset(self, multilabel: torch.Tensor) -> torch.Tensor:
torch.argmax(torch.matmul(multilabel, self.mapping.T), dim=-1),
num_classes=self.num_powerset_classes,
)
+
+ def _permutation_powerset(
+ self, multilabel_permutation: Tuple[int, ...]
+ ) -> Tuple[int, ...]:
+ """Helper function for `permutation_mapping` property
+
+ Takes a (num_classes,)-shaped permutation in multilabel space and returns
+ the corresponding (num_powerset_classes,)-shaped permutation in powerset space.
+ It does not cache anything and handles one permutation at a time.
+
+ Parameters
+ ----------
+ multilabel_permutation : tuple of int
+ Permutation in multilabel space.
+
+ Returns
+ -------
+ powerset_permutation : tuple of int
+ Permutation in powerset space.
+
+ Example
+ -------
+ >>> powerset = Powerset(3, 2)
+ >>> powerset._permutation_powerset((1, 0, 2))
+ (0, 2, 1, 3, 4, 6, 5)
+
+ """
+
+ permutated_mapping: torch.Tensor = self.mapping[:, multilabel_permutation]
+
+ arange = torch.arange(
+ self.num_classes, device=self.mapping.device, dtype=torch.int
+ )
+ powers_of_two = (2**arange).tile((self.num_powerset_classes, 1))
+
+ # compute the encoding of the powerset classes in this 2**N space, before and after
+ # permutation of the columns (mapping cols=labels, mapping rows=powerset classes)
+ before = torch.sum(self.mapping * powers_of_two, dim=-1)
+ after = torch.sum(permutated_mapping * powers_of_two, dim=-1)
+
+ # find before-to-after permutation
+ powerset_permutation = (before[None] == after[:, None]).int().argmax(dim=0)
+
+ # return as tuple of indices
+ return tuple(powerset_permutation.tolist())
+
+ @cached_property
+ def permutation_mapping(self) -> Dict[Tuple[int, ...], Tuple[int, ...]]:
+ """Mapping between multilabel and powerset permutations
+
+ Example
+ -------
+ With num_classes == 3 and max_set_size == 2, returns
+
+ {
+ (0, 1, 2): (0, 1, 2, 3, 4, 5, 6),
+ (0, 2, 1): (0, 1, 3, 2, 5, 4, 6),
+ (1, 0, 2): (0, 2, 1, 3, 4, 6, 5),
+ (1, 2, 0): (0, 2, 3, 1, 6, 4, 5),
+ (2, 0, 1): (0, 3, 1, 2, 5, 6, 4),
+ (2, 1, 0): (0, 3, 2, 1, 6, 5, 4)
+ }
+ """
+ permutation_mapping = {}
+
+ for multilabel_permutation in permutations(
+ range(self.num_classes), self.num_classes
+ ):
+ permutation_mapping[
+ tuple(multilabel_permutation)
+ ] = self._permutation_powerset(multilabel_permutation)
+
+ return permutation_mapping
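`_permutation_powerset` encodes each powerset class as an integer whose bit `j` is set when regular class `j` belongs to it, then matches encodings before and after permuting the columns of `mapping`. The sketch below reproduces `build_mapping`, the new `build_cardinality` (row sums of the mapping) and that bit-encoding trick with plain lists instead of tensors; the functions are standalone hypothetical helpers that mirror the method names in the diff.

```python
from itertools import combinations, permutations


def build_mapping(num_classes, max_set_size):
    """Row k is the multi-hot vector of the k-th powerset class."""
    rows = []
    for set_size in range(max_set_size + 1):
        for subset in combinations(range(num_classes), set_size):
            rows.append([1 if c in subset else 0 for c in range(num_classes)])
    return rows


def build_cardinality(mapping):
    # size of each powerset class = number of regular classes it contains
    return [sum(row) for row in mapping]


def permutation_powerset(mapping, multilabel_permutation):
    def encode(row):
        # bit j is set iff regular class j is a member
        return sum(bit << j for j, bit in enumerate(row))

    before = [encode(row) for row in mapping]
    after = [encode([row[k] for k in multilabel_permutation]) for row in mapping]
    # for each original powerset class, find the row with the same set after permutation
    return tuple(after.index(code) for code in before)


def permutation_mapping(num_classes, max_set_size):
    mapping = build_mapping(num_classes, max_set_size)
    return {
        p: permutation_powerset(mapping, p)
        for p in permutations(range(num_classes), num_classes)
    }
```

With 3 classes and sets of at most 2, swapping classes 0 and 1 yields `(0, 2, 1, 3, 4, 6, 5)`, matching the docstring example above.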
diff --git a/pyannote/audio/utils/preprocessors.py b/pyannote/audio/utils/preprocessors.py
index ce4685d1f..b26553bf9 100644
--- a/pyannote/audio/utils/preprocessors.py
+++ b/pyannote/audio/utils/preprocessors.py
@@ -27,12 +27,13 @@
# Hervé BREDIN - http://herve.niderb.fr
from functools import reduce
-from itertools import chain
from typing import Dict, List, Optional, Set
from pyannote.core import Annotation, Segment
from pyannote.database import ProtocolFile
+from pyannote.audio.core.io import Audio, get_torchaudio_info
+
class LowerTemporalResolution:
"""Artificially degrade temporal resolution of reference annotation
@@ -50,7 +51,6 @@ def __init__(self, resolution: float = 0.1):
self.resolution = resolution
def __call__(self, current_file: ProtocolFile) -> Annotation:
-
annotation = current_file["annotation"]
new_annotation = annotation.empty()
@@ -128,3 +128,17 @@ def __call__(self, current_file: ProtocolFile) -> Annotation:
derived[seg] = intersect_label
return derived
+
+
+class Waveform:
+ def __init__(self):
+ self._audio = Audio()
+
+ def __call__(self, file: ProtocolFile):
+ waveform, _ = self._audio(file)
+ return waveform
+
+
+class SampleRate:
+ def __call__(self, file: ProtocolFile):
+ return get_torchaudio_info(file).sample_rate
diff --git a/pyannote/audio/utils/preview.py b/pyannote/audio/utils/preview.py
index fcdf4d124..1a5ace08c 100644
--- a/pyannote/audio/utils/preview.py
+++ b/pyannote/audio/utils/preview.py
@@ -47,7 +47,7 @@
MOVIEPY_INSTALLED = False
-from typing import Mapping
+from typing import Mapping, Optional
import torch
from pyannote.core import (
@@ -64,7 +64,7 @@
from pyannote.audio.utils.signal import Binarize
-def listen(audio_file: AudioFile, segment: Segment = None) -> None:
+def listen(audio_file: AudioFile, segment: Optional[Segment] = None) -> None:
"""listen to audio
Allows playing of audio files. It will play the whole thing unless
@@ -91,7 +91,7 @@ def listen(audio_file: AudioFile, segment: Segment = None) -> None:
def preview(
audio_file: AudioFile,
- segment: Segment = None,
+ segment: Optional[Segment] = None,
zoom: float = 10.0,
video_fps: int = 5,
video_ext: str = "webm",
diff --git a/pyannote/audio/utils/random.py b/pyannote/audio/utils/random.py
index 97d50c362..0006ad612 100644
--- a/pyannote/audio/utils/random.py
+++ b/pyannote/audio/utils/random.py
@@ -22,12 +22,13 @@
import os
+import zlib
from random import Random
import torch
-def create_rng_for_worker(epoch: int) -> Random:
+def create_rng_for_worker(model) -> Random:
"""Create worker-specific random number generator
This makes sure that
@@ -43,19 +44,23 @@ def create_rng_for_worker(epoch: int) -> Random:
# create random number generator
rng = Random()
- # create seed as a combination of PL_GLOBAL_SEED (set by pl.seed_everything())
- # and other PL multi-processing variables
- global_seed = int(os.environ.get("PL_GLOBAL_SEED", "0"))
- local_rank = int(os.environ.get("LOCAL_RANK", "0"))
- node_rank = int(os.environ.get("NODE_RANK", "0"))
-
+ global_seed = os.environ.get("PL_GLOBAL_SEED", "unset")
worker_info = torch.utils.data.get_worker_info()
if worker_info is None:
- worker_id = 0
+ worker_id = None
else:
worker_id = worker_info.id
- rng.seed(hash((global_seed, worker_id, local_rank, node_rank, epoch)))
+ seed_tuple = (
+ global_seed,
+ worker_id,
+ model.local_rank,
+ model.global_rank,
+ model.current_epoch,
+ )
+ # use adler32 because python's `hash` is not deterministic.
+ seed = zlib.adler32(str(seed_tuple).encode())
+ rng.seed(seed)
return rng
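The seed tuple now contains strings (such as `"unset"`), and Python salts `hash()` of `str` per interpreter process (see `PYTHONHASHSEED`), so two runs could draw different seeds. Hashing `str(seed_tuple)` with `zlib.adler32` gives the same integer in every process. The sketch below isolates that idea; unlike the real function, it takes plain values instead of reading `local_rank`, `global_rank` and `current_epoch` from the model, and `worker_seed`/`worker_rng` are hypothetical names.

```python
import zlib
from random import Random


def worker_seed(global_seed, worker_id, local_rank, global_rank, epoch):
    """Deterministic 32-bit seed from the same fields as `create_rng_for_worker`."""
    seed_tuple = (global_seed, worker_id, local_rank, global_rank, epoch)
    # adler32 is a plain checksum: identical input bytes give an identical seed
    return zlib.adler32(str(seed_tuple).encode())


def worker_rng(global_seed, worker_id, local_rank, global_rank, epoch):
    rng = Random()
    rng.seed(worker_seed(global_seed, worker_id, local_rank, global_rank, epoch))
    return rng
```

Because the epoch is part of the tuple, each epoch draws a different (but reproducible) sequence of chunks.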
diff --git a/pyannote/audio/utils/receptive_field.py b/pyannote/audio/utils/receptive_field.py
new file mode 100644
index 000000000..420a62de0
--- /dev/null
+++ b/pyannote/audio/utils/receptive_field.py
@@ -0,0 +1,165 @@
+# MIT License
+#
+# Copyright (c) 2023 CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from typing import List
+
+
+def conv1d_num_frames(
+ num_samples, kernel_size=5, stride=1, padding=0, dilation=1
+) -> int:
+ """Compute expected number of frames after 1D convolution
+
+ Parameters
+ ----------
+ num_samples : int
+ Number of samples in the input signal
+ kernel_size : int
+ Kernel size
+ stride : int
+ Stride
+ padding : int
+ Padding
+ dilation : int
+ Dilation
+
+ Returns
+ -------
+ num_frames : int
+ Number of frames in the output signal
+
+ Source
+ ------
+ https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d
+ """
+ return 1 + (num_samples + 2 * padding - dilation * (kernel_size - 1) - 1) // stride
+
+
+def multi_conv_num_frames(
+ num_samples: int,
+ kernel_size: List[int] = None,
+ stride: List[int] = None,
+ padding: List[int] = None,
+ dilation: List[int] = None,
+) -> int:
+ num_frames = num_samples
+ for k, s, p, d in zip(kernel_size, stride, padding, dilation):
+ num_frames = conv1d_num_frames(
+ num_frames, kernel_size=k, stride=s, padding=p, dilation=d
+ )
+
+ return num_frames
+
+
+def conv1d_receptive_field_size(
+ num_frames=1, kernel_size=5, stride=1, padding=0, dilation=1
+):
+ """Compute size of receptive field
+
+ Parameters
+ ----------
+ num_frames : int, optional
+ Number of frames in the output signal
+ kernel_size : int
+ Kernel size
+ stride : int
+ Stride
+ padding : int
+ Padding
+ dilation : int
+ Dilation
+
+ Returns
+ -------
+ size : int
+ Receptive field size
+ """
+
+ effective_kernel_size = 1 + (kernel_size - 1) * dilation
+ return effective_kernel_size + (num_frames - 1) * stride - 2 * padding
+
+
+def multi_conv_receptive_field_size(
+ num_frames: int,
+ kernel_size: List[int] = None,
+ stride: List[int] = None,
+ padding: List[int] = None,
+ dilation: List[int] = None,
+) -> int:
+ receptive_field_size = num_frames
+
+ for k, s, p, d in reversed(list(zip(kernel_size, stride, padding, dilation))):
+ receptive_field_size = conv1d_receptive_field_size(
+ num_frames=receptive_field_size,
+ kernel_size=k,
+ stride=s,
+ padding=p,
+ dilation=d,
+ )
+ return receptive_field_size
+
+
+def conv1d_receptive_field_center(
+ frame=0, kernel_size=5, stride=1, padding=0, dilation=1
+) -> int:
+ """Compute center of receptive field
+
+ Parameters
+ ----------
+ frame : int
+ Frame index
+ kernel_size : int
+ Kernel size
+ stride : int
+ Stride
+ padding : int
+ Padding
+ dilation : int
+ Dilation
+
+ Returns
+ -------
+ center : int
+ Index of receptive field center
+ """
+
+ effective_kernel_size = 1 + (kernel_size - 1) * dilation
+ return frame * stride + (effective_kernel_size - 1) // 2 - padding
+
+
+def multi_conv_receptive_field_center(
+ frame: int,
+ kernel_size: List[int] = None,
+ stride: List[int] = None,
+ padding: List[int] = None,
+ dilation: List[int] = None,
+) -> int:
+ receptive_field_center = frame
+ for k, s, p, d in reversed(list(zip(kernel_size, stride, padding, dilation))):
+ receptive_field_center = conv1d_receptive_field_center(
+ frame=receptive_field_center,
+ kernel_size=k,
+ stride=s,
+ padding=p,
+ dilation=d,
+ )
+
+ return receptive_field_center
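As a worked example of these formulas, the sketch below applies them to a stack of alternating convolution and max-pooling layers (max-pooling follows the same length formula as `Conv1d` with default settings). The layer configuration, kernel sizes `[251, 3, 5, 3, 5, 3]` and strides `[10, 3, 1, 3, 1, 3]` with no padding or dilation, is an assumption modeled on a SincNet-like front end, not something read from this diff; the single-layer formulas are copied so the snippet runs on its own.

```python
def conv1d_num_frames(num_samples, kernel_size, stride=1, padding=0, dilation=1):
    return 1 + (num_samples + 2 * padding - dilation * (kernel_size - 1) - 1) // stride


def conv1d_receptive_field_size(num_frames, kernel_size, stride=1, padding=0, dilation=1):
    effective_kernel_size = 1 + (kernel_size - 1) * dilation
    return effective_kernel_size + (num_frames - 1) * stride - 2 * padding


def conv1d_receptive_field_center(frame, kernel_size, stride=1, padding=0, dilation=1):
    effective_kernel_size = 1 + (kernel_size - 1) * dilation
    return frame * stride + (effective_kernel_size - 1) // 2 - padding


# Assumed layer stack (conv, pool, conv, pool, conv, pool)
KERNEL_SIZE = [251, 3, 5, 3, 5, 3]
STRIDE = [10, 3, 1, 3, 1, 3]


def num_frames(num_samples):
    # forward pass: input samples -> output frames
    for k, s in zip(KERNEL_SIZE, STRIDE):
        num_samples = conv1d_num_frames(num_samples, k, s)
    return num_samples


def receptive_field_size(frames=1):
    # backward pass: output frames -> input samples they depend on
    for k, s in reversed(list(zip(KERNEL_SIZE, STRIDE))):
        frames = conv1d_receptive_field_size(frames, k, s)
    return frames


def receptive_field_center(frame=0):
    for k, s in reversed(list(zip(KERNEL_SIZE, STRIDE))):
        frame = conv1d_receptive_field_center(frame, k, s)
    return frame
```

Under these assumptions, 10 s of 16 kHz audio (160000 samples) yields 589 frames; each frame sees 991 samples (about 62 ms), consecutive frames are 270 samples apart (16.875 ms, the product of the strides), and frame 0 is centered on sample 495.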
diff --git a/requirements.txt b/requirements.txt
index 7e71fe024..3a0aa74dc 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -15,5 +15,5 @@ speechbrain >= 0.5.14
tensorboardX >= 2.6
torch >= 2.0.0
torch_audiomentations >= 0.11.0
-torchaudio >= 2.0.0
+torchaudio >= 2.2.0
torchmetrics >= 0.11.0
diff --git a/tests/data/database.yml b/tests/data/database.yml
index 608bf40b4..10d3fb084 100644
--- a/tests/data/database.yml
+++ b/tests/data/database.yml
@@ -2,6 +2,7 @@ Protocols:
Debug:
SpeakerDiarization:
Debug:
+ scope: database
train:
uri: debug.train.lst
annotation: debug.train.rttm
diff --git a/tests/data/debug.train.lst b/tests/data/debug.train.lst
index 16be824f4..471bf03d4 100644
--- a/tests/data/debug.train.lst
+++ b/tests/data/debug.train.lst
@@ -1,4 +1,4 @@
-trn00
+trñ00
trn01
trn02
trn03
diff --git a/tests/data/debug.train.rttm b/tests/data/debug.train.rttm
index 004a3f2eb..be89e2a4c 100644
--- a/tests/data/debug.train.rttm
+++ b/tests/data/debug.train.rttm
@@ -1,26 +1,26 @@
-SPEAKER trn00 1 3.168 0.800 MEO069
-SPEAKER trn00 1 5.463 0.640 MEO069
+SPEAKER trn00 1 3.168 0.800 MÉO069
+SPEAKER trn00 1 5.463 0.640 MÉO069
SPEAKER trn00 1 5.496 0.574 MEE068
-SPEAKER trn00 1 10.454 0.499 MEO069
+SPEAKER trn00 1 10.454 0.499 MÉO069
SPEAKER trn00 1 11.040 4.592 MEE068
-SPEAKER trn00 1 16.736 1.410 MEO069
+SPEAKER trn00 1 16.736 1.410 MÉO069
SPEAKER trn00 1 16.980 2.778 MEE067
SPEAKER trn00 1 18.883 0.490 MEE068
-SPEAKER trn00 1 18.985 1.831 MEO069
+SPEAKER trn00 1 18.985 1.831 MÉO069
SPEAKER trn00 1 20.944 0.447 MEE067
SPEAKER trn00 1 21.392 4.465 MEE068
-SPEAKER trn00 1 22.928 0.384 MEO069
-SPEAKER trn00 1 25.001 2.471 MEO069
+SPEAKER trn00 1 22.928 0.384 MÉO069
+SPEAKER trn00 1 25.001 2.471 MÉO069
SPEAKER trn00 1 28.033 1.967 MEE068
SPEAKER trn01 1 2.977 0.391 FEO066
SPEAKER trn01 1 18.705 0.964 MEE068
SPEAKER trn01 1 22.269 0.457 FEO065
-SPEAKER trn01 1 28.474 1.526 MEO069
+SPEAKER trn01 1 28.474 1.526 MÉO069
SPEAKER trn01 1 28.593 1.407 FEO066
SPEAKER trn01 1 28.993 1.007 FEO065
SPEAKER trn02 1 20.704 0.688 FEO066
SPEAKER trn03 1 0.000 1.184 MEE067
-SPEAKER trn03 1 1.104 28.896 MEO069
+SPEAKER trn03 1 1.104 28.896 MÉO069
SPEAKER trn04 1 14.032 1.744 MEE076
SPEAKER trn04 1 14.345 2.471 MEO074
SPEAKER trn04 1 16.736 7.216 MEE075
diff --git a/tests/data/trn00.wav "b/tests/data/tr\303\26100.wav"
similarity index 100%
rename from tests/data/trn00.wav
rename to "tests/data/tr\303\26100.wav"
diff --git a/tests/inference_test.py b/tests/inference_test.py
index bd5040394..da28b79d2 100644
--- a/tests/inference_test.py
+++ b/tests/inference_test.py
@@ -9,7 +9,7 @@
from pyannote.audio.models.segmentation.debug import SimpleSegmentationModel
from pyannote.audio.tasks import VoiceActivityDetection
-HF_SAMPLE_MODEL_ID = "pyannote/TestModelForContinuousIntegration"
+HF_SAMPLE_MODEL_ID = "pyannote/ci-segmentation"
def test_hf_download_inference():
diff --git a/tests/tasks/test_reproducibility.py b/tests/tasks/test_reproducibility.py
index a7307e0cf..88f2b2d8a 100644
--- a/tests/tasks/test_reproducibility.py
+++ b/tests/tasks/test_reproducibility.py
@@ -3,7 +3,7 @@
from pyannote.database import FileFinder, get_protocol
from pyannote.audio.models.segmentation.debug import SimpleSegmentationModel
-from pyannote.audio.tasks import MultiLabelSegmentation, VoiceActivityDetection
+from pyannote.audio.tasks import VoiceActivityDetection
def setup_tasks(task):
@@ -16,7 +16,8 @@ def setup_tasks(task):
def create_dl(model, task):
m = model(task=task)
- m.setup("fit")
+ m.prepare_data()
+ m.setup()
return task.train_dataloader()
@@ -31,35 +32,32 @@ def get_next5(dl):
def test_seeding_ensures_data_loaders():
"Setting a global seed for the dataloaders ensures that we get data back in the same order"
- for task in [VoiceActivityDetection, MultiLabelSegmentation]:
+ seed_everything(1)
+ protocol, vad = setup_tasks(VoiceActivityDetection)
+ dl = create_dl(SimpleSegmentationModel, vad)
+ last5a = get_next5(dl)
- seed_everything(1)
- protocol, vad = setup_tasks(task)
- dl = create_dl(SimpleSegmentationModel, vad)
- last5a = get_next5(dl)
+ seed_everything(1)
+ protocol, vad = setup_tasks(VoiceActivityDetection)
+ dl = create_dl(SimpleSegmentationModel, vad)
+ last5b = get_next5(dl)
- seed_everything(1)
- protocol, vad = setup_tasks(task)
- dl = create_dl(SimpleSegmentationModel, vad)
- last5b = get_next5(dl)
-
- for i in range(len(last5b)):
- assert torch.equal(last5a[i]["X"], last5b[i]["X"])
+ for i in range(len(last5b)):
+ assert torch.equal(last5a[i]["X"], last5b[i]["X"])
def test_different_seeds():
"Changing the global seed will change the order of the data that loads"
- for task in [VoiceActivityDetection, MultiLabelSegmentation]:
- protocol, vad = setup_tasks(task)
- seed_everything(4)
- dl = create_dl(SimpleSegmentationModel, vad)
- last5a = get_next5(dl)
+ protocol, vad = setup_tasks(VoiceActivityDetection)
+ seed_everything(4)
+ dl = create_dl(SimpleSegmentationModel, vad)
+ last5a = get_next5(dl)
- protocol, vad = setup_tasks(task)
- seed_everything(5)
- dl = create_dl(SimpleSegmentationModel, vad)
- last5b = get_next5(dl)
+ protocol, vad = setup_tasks(VoiceActivityDetection)
+ seed_everything(5)
+ dl = create_dl(SimpleSegmentationModel, vad)
+ last5b = get_next5(dl)
- for i in range(5):
- assert not torch.equal(last5a[i]["X"], last5b[i]["X"])
+ for i in range(5):
+ assert not torch.equal(last5a[i]["X"], last5b[i]["X"])
diff --git a/tests/tasks/test_specifications.py b/tests/tasks/test_specifications.py
new file mode 100644
index 000000000..32b816155
--- /dev/null
+++ b/tests/tasks/test_specifications.py
@@ -0,0 +1,27 @@
+import pytest
+from pyannote.database import FileFinder, get_protocol
+
+from pyannote.audio.core.model import Model
+from pyannote.audio.core.task import UnknownSpecificationsError
+from pyannote.audio.tasks import SpeakerDiarization
+
+
+@pytest.fixture()
+def protocol():
+ return get_protocol(
+ "Debug.SpeakerDiarization.Debug", preprocessors={"audio": FileFinder()}
+ )
+
+
+def test_unknown_specifications_error_raised_on_non_setup_task(protocol):
+ task = SpeakerDiarization(protocol=protocol)
+ with pytest.raises(UnknownSpecificationsError):
+ _ = task.specifications
+
+
+def test_unknown_specifications_error_raised_on_non_setup_model_task(protocol):
+ task = SpeakerDiarization(protocol=protocol)
+ model = Model.from_pretrained("pyannote/ci-segmentation")
+ model.task = task
+ with pytest.raises(UnknownSpecificationsError):
+ _ = model.specifications
diff --git a/tests/test_cli.py b/tests/test_cli.py
new file mode 100644
index 000000000..f62cc5260
--- /dev/null
+++ b/tests/test_cli.py
@@ -0,0 +1,149 @@
+# The MIT License (MIT)
+#
+# Copyright (c) 2024- CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import subprocess
+
+import pytest
+from pyannote.database import FileFinder, get_protocol
+
+
+@pytest.fixture()
+def protocol():
+ return get_protocol(
+ "Debug.SpeakerDiarization.Debug", preprocessors={"audio": FileFinder()}
+ )
+
+
+@pytest.fixture()
+def database():
+ return "./tests/data/database.yml"
+
+
+@pytest.fixture()
+def model():
+ return "pyannote/ci-segmentation"
+
+
+def test_cli_train_vad(database, protocol):
+ res = subprocess.run(
+ [
+ "pyannote-audio-train",
+ "model=DebugSegmentation",
+ "task=VoiceActivityDetection",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "trainer=fast_dev_run",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
+
+
+def test_cli_train_segmentation(database, protocol):
+ res = subprocess.run(
+ [
+ "pyannote-audio-train",
+ "model=DebugSegmentation",
+ "task=SpeakerDiarization",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "trainer=fast_dev_run",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
+
+
+def test_cli_train_osd(database, protocol):
+ res = subprocess.run(
+ [
+ "pyannote-audio-train",
+ "model=DebugSegmentation",
+ "task=OverlappedSpeechDetection",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "trainer=fast_dev_run",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
+
+
+def test_cli_train_supervised_representation_with_arcface(database, protocol):
+ res = subprocess.run(
+ [
+ "pyannote-audio-train",
+ "model=DebugEmbedding",
+ "task=SpeakerEmbedding",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "trainer=fast_dev_run",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
+
+
+def test_cli_train_segmentation_with_pyannet(database, protocol):
+ res = subprocess.run(
+ [
+ "pyannote-audio-train",
+ "model=PyanNet",
+ "task=SpeakerDiarization",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "trainer=fast_dev_run",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
+
+
+def test_cli_eval_segmentation_model(database, protocol, model):
+ res = subprocess.run(
+ [
+ "pyannote-audio-eval",
+ f"model={model}",
+ f"+registry={database}",
+ f"protocol={protocol.name}",
+ "hydra.run.dir=.", # run hydra app in current directory
+ "hydra.output_subdir=null", # disable hydra outputs
+ "hydra/hydra_logging=disabled",
+ "hydra/job_logging=disabled",
+ ]
+ )
+ assert res.returncode == 0
diff --git a/tests/test_metrics.py b/tests/test_metrics.py
new file mode 100644
index 000000000..c366dc091
--- /dev/null
+++ b/tests/test_metrics.py
@@ -0,0 +1,142 @@
+# MIT License
+#
+# Copyright (c) 2024- CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import pytest
+import torch
+
+from pyannote.audio.torchmetrics.functional.audio.diarization_error_rate import (
+ _der_update,
+ diarization_error_rate,
+)
+
+
+@pytest.fixture
+def target():
+ chunk1 = [[0, 0], [1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1]]
+ chunk2 = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 0]]
+ return torch.tensor([chunk1, chunk2], dtype=torch.float32).transpose(2, 1)
+
+
+@pytest.fixture
+def prediction():
+ chunk1 = [[0, 0], [1, 0], [0, 0], [1, 1], [0, 1], [1, 1], [1, 0]]
+ chunk2 = [[0, 0], [0, 0], [0, 1], [0, 1], [0, 1], [1, 1], [1, 0]]
+ return torch.tensor([chunk1, chunk2], dtype=torch.float32).transpose(2, 1)
+
+
+def test_frame_reduction(target, prediction):
+ false_alarm, missed_detection, speaker_confusion, speech_total = _der_update(
+ prediction, target, reduce="frame"
+ )
+
+ torch.testing.assert_close(
+ false_alarm,
+ torch.Tensor(
+ [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
+ ),
+ )
+
+ torch.testing.assert_close(
+ missed_detection,
+ torch.Tensor(
+ [
+ [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0],
+ [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
+ ]
+ ),
+ )
+
+ torch.testing.assert_close(
+ speaker_confusion,
+ torch.Tensor(
+ [[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
+ ),
+ )
+
+ torch.testing.assert_close(
+ speech_total,
+ torch.Tensor(
+ [[0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]]
+ ),
+ )
+
+
+def test_chunk_reduction(target, prediction):
+ false_alarm, missed_detection, speaker_confusion, speech_total = _der_update(
+ prediction, target, reduce="chunk"
+ )
+
+ torch.testing.assert_close(
+ false_alarm,
+ torch.Tensor([1.0, 2.0]),
+ )
+
+ torch.testing.assert_close(
+ missed_detection,
+ torch.Tensor([2.0, 0.0]),
+ )
+
+ torch.testing.assert_close(
+ speaker_confusion,
+ torch.Tensor([1.0, 0.0]),
+ )
+
+ torch.testing.assert_close(
+ speech_total,
+ torch.Tensor([8.0, 4.0]),
+ )
+
+
+def test_batch_reduction(target, prediction):
+ false_alarm, missed_detection, speaker_confusion, speech_total = _der_update(
+ prediction, target, reduce="batch"
+ )
+ torch.testing.assert_close(false_alarm.item(), 3.0)
+ torch.testing.assert_close(missed_detection.item(), 2.0)
+ torch.testing.assert_close(speaker_confusion.item(), 1.0)
+ torch.testing.assert_close(speech_total.item(), 12.0)
+
+
+def test_batch_der(target, prediction):
+ der = diarization_error_rate(prediction, target, reduce="batch")
+ torch.testing.assert_close(der.item(), (3.0 + 2.0 + 1.0) / 12.0)
+
+
+def test_batch_der_with_components(target, prediction):
+ der, (
+ false_alarm,
+ missed_detection,
+ speaker_confusion,
+ speech_total,
+ ) = diarization_error_rate(
+ prediction, target, reduce="batch", return_components=True
+ )
+ torch.testing.assert_close(der.item(), (3.0 + 2.0 + 1.0) / 12.0)
+ torch.testing.assert_close(false_alarm.item(), 3.0)
+ torch.testing.assert_close(missed_detection.item(), 2.0)
+ torch.testing.assert_close(speaker_confusion.item(), 1.0)
+ torch.testing.assert_close(speech_total.item(), 12.0)
+
+
+def test_chunk_der(target, prediction):
+ der = diarization_error_rate(prediction, target, reduce="chunk")
+ torch.testing.assert_close(der, torch.Tensor([4.0 / 8.0, 2.0 / 4.0]))
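The expected values in `tests/test_metrics.py` can be cross-checked by hand. Below is a minimal brute-force sketch in plain Python (no torch). It assumes the usual frame-level DER definition and assumes the prediction speakers are optimally permuted within each chunk. `chunk_components` is a hypothetical helper, not pyannote's `_der_update`. The data are the `target` and `prediction` fixtures laid out as `[frame][speaker]`, i.e. before the `.transpose(2, 1)`.

```python
from itertools import permutations

# [chunk][frame][speaker], same values as the fixtures before transposition
TARGET = [
    [[0, 0], [1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1]],
    [[0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 0]],
]
PREDICTION = [
    [[0, 0], [1, 0], [0, 0], [1, 1], [0, 1], [1, 1], [1, 0]],
    [[0, 0], [0, 0], [0, 1], [0, 1], [0, 1], [1, 1], [1, 0]],
]


def chunk_components(target, prediction):
    """Return (false_alarm, missed_detection, speaker_confusion, speech_total)
    for one chunk, mapping reference speaker s to predicted speaker best[s],
    where best maximizes the number of correctly attributed speaker-frames."""
    num_speakers = len(target[0])

    def correct(perm):
        return sum(
            min(ref[s], hyp[perm[s]])
            for ref, hyp in zip(target, prediction)
            for s in range(num_speakers)
        )

    # brute force over all speaker mappings (fine for a handful of speakers)
    best = max(permutations(range(num_speakers)), key=correct)

    false_alarm = missed = confusion = total = 0
    for ref, hyp in zip(target, prediction):
        n_ref, n_hyp = sum(ref), sum(hyp)
        n_correct = sum(min(ref[s], hyp[best[s]]) for s in range(num_speakers))
        false_alarm += max(0, n_hyp - n_ref)
        missed += max(0, n_ref - n_hyp)
        confusion += min(n_ref, n_hyp) - n_correct
        total += n_ref
    return false_alarm, missed, confusion, total


chunks = [chunk_components(t, p) for t, p in zip(TARGET, PREDICTION)]
# "chunk" reduction: one DER per chunk
chunk_der = [(fa + miss + conf) / total for fa, miss, conf, total in chunks]
# "batch" reduction: sum the components over chunks first, then divide
fa, miss, conf, total = [sum(values) for values in zip(*chunks)]
batch_der = (fa + miss + conf) / total
```

This gives per-chunk components (1, 2, 1, 8) and (2, 0, 0, 4), matching `test_chunk_reduction`, and a DER of 0.5 for both reductions. In the first chunk, both speaker mappings yield the same number of correct speaker-frames. The chunk totals therefore do not depend on which mapping is picked. The frame at which confusion is counted does depend on it, which matters for the `reduce="frame"` expectations.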
diff --git a/tests/test_sample.py b/tests/test_sample.py
new file mode 100644
index 000000000..d47dc9613
--- /dev/null
+++ b/tests/test_sample.py
@@ -0,0 +1,28 @@
+# The MIT License (MIT)
+#
+# Copyright (c) 2024- CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+def test_sample():
+ from pyannote.audio.sample import SAMPLE_FILE
+
+ assert "annotation" in SAMPLE_FILE
+ assert "annotated" in SAMPLE_FILE
diff --git a/tests/test_train.py b/tests/test_train.py
index 7a7bfe338..6a7a6c69b 100644
--- a/tests/test_train.py
+++ b/tests/test_train.py
@@ -1,11 +1,39 @@
+# The MIT License (MIT)
+#
+# Copyright (c) 2024- CNRS
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+from tempfile import mkstemp
+
import pytest
from pyannote.database import FileFinder, get_protocol
from pytorch_lightning import Trainer
+from pyannote.audio.models.embedding.debug import SimpleEmbeddingModel
from pyannote.audio.models.segmentation.debug import SimpleSegmentationModel
from pyannote.audio.tasks import (
+ MultiLabelSegmentation,
OverlappedSpeechDetection,
SpeakerDiarization,
+ SupervisedRepresentationLearningWithArcFace,
VoiceActivityDetection,
)
@@ -17,6 +45,31 @@ def protocol():
)
+@pytest.fixture()
+def cache():
+ return mkstemp()[1]
+
+
+@pytest.fixture()
+def gender_protocol():
+ def to_gender(file):
+ annotation = file["annotation"]
+ mapping = {label: label[0] for label in annotation.labels()}
+ return annotation.rename_labels(mapping)
+
+ def classes(file):
+ return ["M", "F"]
+
+ return get_protocol(
+ "Debug.SpeakerDiarization.Debug",
+ preprocessors={
+ "audio": FileFinder(),
+ "annotation": to_gender,
+ "classes": classes,
+ },
+ )
+
+
def test_train_segmentation(protocol):
segmentation = SpeakerDiarization(protocol)
model = SimpleSegmentationModel(task=segmentation)
@@ -24,6 +77,48 @@ def test_train_segmentation(protocol):
trainer.fit(model)
+def test_train_segmentation_with_cached_data_mono_device(protocol, cache):
+ first_task = SpeakerDiarization(protocol, cache=cache)
+ first_model = SimpleSegmentationModel(task=first_task)
+ first_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ first_trainer.fit(first_model)
+
+ second_task = SpeakerDiarization(protocol, cache=cache)
+ second_model = SimpleSegmentationModel(task=second_task)
+ second_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ second_trainer.fit(second_model)
+
+
+def test_train_multilabel_segmentation(gender_protocol):
+ multilabel_segmentation = MultiLabelSegmentation(gender_protocol)
+ model = SimpleSegmentationModel(task=multilabel_segmentation)
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+
+def test_train_multilabel_segmentation_with_cached_data_mono_device(
+ gender_protocol, cache
+):
+ first_task = MultiLabelSegmentation(gender_protocol, cache=cache)
+ first_model = SimpleSegmentationModel(task=first_task)
+ first_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ first_trainer.fit(first_model)
+
+ second_task = MultiLabelSegmentation(gender_protocol, cache=cache)
+ second_model = SimpleSegmentationModel(task=second_task)
+ second_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ second_trainer.fit(second_model)
+
+
+def test_train_supervised_representation_with_arcface(protocol):
+    supervised_representation_with_arcface = (
+        SupervisedRepresentationLearningWithArcFace(protocol)
+    )
+    model = SimpleEmbeddingModel(task=supervised_representation_with_arcface)
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+
def test_train_voice_activity_detection(protocol):
voice_activity_detection = VoiceActivityDetection(protocol)
model = SimpleSegmentationModel(task=voice_activity_detection)
@@ -31,6 +126,18 @@ def test_train_voice_activity_detection(protocol):
trainer.fit(model)
+def test_train_voice_activity_detection_with_cached_data_mono_device(protocol, cache):
+ first_task = VoiceActivityDetection(protocol, cache=cache)
+ first_model = SimpleSegmentationModel(task=first_task)
+ first_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ first_trainer.fit(first_model)
+
+ second_task = VoiceActivityDetection(protocol, cache=cache)
+ second_model = SimpleSegmentationModel(task=second_task)
+ second_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ second_trainer.fit(second_model)
+
+
def test_train_overlapped_speech_detection(protocol):
overlapped_speech_detection = OverlappedSpeechDetection(protocol)
model = SimpleSegmentationModel(task=overlapped_speech_detection)
@@ -38,6 +145,20 @@ def test_train_overlapped_speech_detection(protocol):
trainer.fit(model)
+def test_train_overlapped_speech_detection_with_cached_data_mono_device(
+ protocol, cache
+):
+ first_task = OverlappedSpeechDetection(protocol, cache=cache)
+ first_model = SimpleSegmentationModel(task=first_task)
+ first_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ first_trainer.fit(first_model)
+
+ second_task = OverlappedSpeechDetection(protocol, cache=cache)
+ second_model = SimpleSegmentationModel(task=second_task)
+ second_trainer = Trainer(fast_dev_run=True, accelerator="cpu", devices=1)
+ second_trainer.fit(second_model)
+
+
def test_finetune_with_task_that_does_not_need_setup_for_specs(protocol):
voice_activity_detection = VoiceActivityDetection(protocol)
model = SimpleSegmentationModel(task=voice_activity_detection)
@@ -62,6 +183,18 @@ def test_finetune_with_task_that_needs_setup_for_specs(protocol):
trainer.fit(model)
+def test_finetune_with_task_that_needs_setup_for_specs_and_with_cache(protocol, cache):
+ segmentation = SpeakerDiarization(protocol, cache=cache)
+ model = SimpleSegmentationModel(task=segmentation)
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+ segmentation = SpeakerDiarization(protocol, cache=cache)
+ model.task = segmentation
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+
def test_transfer_with_task_that_does_not_need_setup_for_specs(protocol):
segmentation = SpeakerDiarization(protocol)
model = SimpleSegmentationModel(task=segmentation)
@@ -94,7 +227,22 @@ def test_finetune_freeze_with_task_that_needs_setup_for_specs(protocol):
segmentation = SpeakerDiarization(protocol)
model.task = segmentation
- model.freeze_up_to("mfcc")
+ model.freeze_by_name("mfcc")
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+
+def test_finetune_freeze_with_task_that_needs_setup_for_specs_and_with_cache(
+ protocol, cache
+):
+ segmentation = SpeakerDiarization(protocol, cache=cache)
+ model = SimpleSegmentationModel(task=segmentation)
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+ segmentation = SpeakerDiarization(protocol, cache=cache)
+ model.task = segmentation
+ model.freeze_by_name("mfcc")
trainer = Trainer(fast_dev_run=True, accelerator="cpu")
trainer.fit(model)
@@ -107,7 +255,23 @@ def test_finetune_freeze_with_task_that_does_not_need_setup_for_specs(protocol):
vad = VoiceActivityDetection(protocol)
model.task = vad
- model.freeze_up_to("mfcc")
+ model.freeze_by_name("mfcc")
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+
+def test_finetune_freeze_with_task_that_does_not_need_setup_for_specs_and_with_cache(
+ protocol,
+ cache,
+):
+ vad = VoiceActivityDetection(protocol, cache=cache)
+ model = SimpleSegmentationModel(task=vad)
+ trainer = Trainer(fast_dev_run=True, accelerator="cpu")
+ trainer.fit(model)
+
+ vad = VoiceActivityDetection(protocol, cache=cache)
+ model.task = vad
+ model.freeze_by_name("mfcc")
trainer = Trainer(fast_dev_run=True, accelerator="cpu")
trainer.fit(model)
@@ -120,7 +284,7 @@ def test_transfer_freeze_with_task_that_does_not_need_setup_for_specs(protocol):
voice_activity_detection = VoiceActivityDetection(protocol)
model.task = voice_activity_detection
- model.freeze_up_to("mfcc")
+ model.freeze_by_name("mfcc")
trainer = Trainer(fast_dev_run=True, accelerator="cpu")
trainer.fit(model)
@@ -133,6 +297,6 @@ def test_transfer_freeze_with_task_that_needs_setup_for_specs(protocol):
segmentation = SpeakerDiarization(protocol)
model.task = segmentation
- model.freeze_up_to("mfcc")
+ model.freeze_by_name("mfcc")
trainer = Trainer(fast_dev_run=True, accelerator="cpu")
trainer.fit(model)
diff --git a/tests/utils/test_powerset.py b/tests/utils/test_powerset.py
index dd12eed41..e04480753 100644
--- a/tests/utils/test_powerset.py
+++ b/tests/utils/test_powerset.py
@@ -27,10 +27,8 @@
def test_roundtrip():
-
for num_classes in range(2, 5):
for max_set_size in range(1, num_classes + 1):
-
powerset = Powerset(num_classes, max_set_size)
# simulate a sequence where each frame is assigned to a different powerset class
@@ -51,3 +49,28 @@ def test_roundtrip():
reconstruction = powerset.to_powerset(batch_multilabel)
assert torch.equal(batch_powerset, reconstruction)
+
+
+def test_permutate_powerset():
+ for num_classes in range(1, 6):
+ for max_set_size in range(1, num_classes + 1):
+ powerset = Powerset(num_classes, max_set_size)
+
+ # create (num_powerset_class, num_powerset_class)-shaped tensor, where each frame is assigned to a different powerset class
+ # and convert it to its multi-label equivalent
+ t1 = torch.nn.functional.one_hot(
+ torch.arange(powerset.num_powerset_classes),
+ powerset.num_powerset_classes,
+ )
+ t1_ml = powerset.to_multilabel(t1)
+
+        # then permute the powerset classes in powerset space AND the multi-label equivalent in its native space,
+        # and check that both give the same result.
+        # (perm is converted to a tuple so that it can be used to index permutation_mapping)
+ perm = tuple(torch.randperm(num_classes).tolist())
+ t1_ml_perm = t1_ml[:, perm]
+ perm_ps = powerset.permutation_mapping[perm]
+ t1_ps_perm = t1[..., perm_ps]
+ t1_ps_perm_ml = powerset.to_multilabel(t1_ps_perm)
+
+ assert t1_ml_perm.equal(t1_ps_perm_ml)
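The property checked by `test_permutate_powerset` can be reproduced without torch. The sketch below is hypothetical. It assumes powerset classes are enumerated as the empty set followed by subsets of increasing size in `itertools.combinations` order. Its `permutation_mapping` is a plain function, not pyannote's `Powerset.permutation_mapping`. For a class permutation `perm`, powerset class `k` (subset `S_k`) is sent to the index of `{perm[j] for j in S_k}`. Indexing a one-hot powerset row with that mapping then agrees with indexing the multi-label row with `perm`, exactly as the test asserts.

```python
from itertools import combinations, permutations


def powerset_classes(num_classes, max_set_size):
    """Enumerate powerset classes as frozensets: the empty set first,
    then all subsets of size 1, 2, ..., max_set_size (assumed ordering)."""
    return [
        frozenset(subset)
        for size in range(max_set_size + 1)
        for subset in combinations(range(num_classes), size)
    ]


def permutation_mapping(num_classes, max_set_size):
    """Map each class permutation (a tuple) to the induced permutation of
    powerset classes: class k goes to the index of {perm[j] for j in S_k}."""
    classes = powerset_classes(num_classes, max_set_size)
    index = {subset: k for k, subset in enumerate(classes)}
    return {
        perm: [index[frozenset(perm[j] for j in subset)] for subset in classes]
        for perm in permutations(range(num_classes))
    }


def to_multilabel(one_hot_row, classes, num_classes):
    """Convert a one-hot powerset row into its multi-label row."""
    active = classes[one_hot_row.index(1)]
    return [int(c in active) for c in range(num_classes)]


def check(num_classes, max_set_size):
    """Mirror the test: permute in both spaces and compare."""
    classes = powerset_classes(num_classes, max_set_size)
    n = len(classes)
    # one row per powerset class, like one_hot(arange(n), n)
    rows = [[int(i == k) for k in range(n)] for i in range(n)]
    for perm, perm_ps in permutation_mapping(num_classes, max_set_size).items():
        for row in rows:
            ml = to_multilabel(row, classes, num_classes)
            ml_perm = [ml[p] for p in perm]  # like t1_ml[:, perm]
            ps_perm = [row[q] for q in perm_ps]  # like t1[..., perm_ps]
            if to_multilabel(ps_perm, classes, num_classes) != ml_perm:
                return False
    return True
```

For example, with two classes and `max_set_size=2`, the classes are `[{}, {0}, {1}, {0, 1}]`. Swapping the two classes maps them to `[0, 2, 1, 3]`. The check holds for any enumeration order, because the mapping is built from the subsets themselves rather than from their positions.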
diff --git a/tutorials/MRE_template.ipynb b/tutorials/MRE_template.ipynb
new file mode 100644
index 000000000..70ffacbc2
--- /dev/null
+++ b/tutorials/MRE_template.ipynb
@@ -0,0 +1,2220 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "gpuType": "T4",
+ "authorship_tag": "ABX9TyNUZLZoYLpzG6gIYECEOuiV",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "accelerator": "GPU",
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "3d0fe95350234ab599497683ae6d4ce6": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_239beda16fde4b1d9e6abfb47f036040",
+ "IPY_MODEL_2d6f5fe6f9ab4a8b853e7b023aaee35f",
+ "IPY_MODEL_771629a0fbab4b0e9ad6425b5affe380"
+ ],
+ "layout": "IPY_MODEL_8763b8e4c8104b879d4324257a810c0f"
+ }
+ },
+ "239beda16fde4b1d9e6abfb47f036040": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_22536aef3997408dbf5ea58241f01e3e",
+ "placeholder": "",
+ "style": "IPY_MODEL_2a6a5f33d7c34b6b996ab7a5b7c2f11d",
+ "value": "config.yaml: 100%"
+ }
+ },
+ "2d6f5fe6f9ab4a8b853e7b023aaee35f": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_c348b2f57c6c437fa8a9a2688bf17f4f",
+ "max": 469,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_7526127b349a4467a6d3083d92bef5ec",
+ "value": 469
+ }
+ },
+ "771629a0fbab4b0e9ad6425b5affe380": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_30efc0ed76fb496483d4bb425a930636",
+ "placeholder": "",
+ "style": "IPY_MODEL_e760e231084f4889bb489bc550b7a816",
+ "value": " 469/469 [00:00<00:00, 21.8kB/s]"
+ }
+ },
+ "8763b8e4c8104b879d4324257a810c0f": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "22536aef3997408dbf5ea58241f01e3e": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "2a6a5f33d7c34b6b996ab7a5b7c2f11d": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "c348b2f57c6c437fa8a9a2688bf17f4f": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "7526127b349a4467a6d3083d92bef5ec": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "30efc0ed76fb496483d4bb425a930636": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "e760e231084f4889bb489bc550b7a816": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "9951dce8a6c947dd9a2ed6106cb68f30": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_d68c24e62487484b83bbb06271a6981c",
+ "IPY_MODEL_467f3c9f15184784aa913cb3ae46700f",
+ "IPY_MODEL_c87055ce8cef498f999a20ff2b34e1b8"
+ ],
+ "layout": "IPY_MODEL_63d957e24276476b9bf1be150675217c"
+ }
+ },
+ "d68c24e62487484b83bbb06271a6981c": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_afe35fd2ea654d7b861046e847f82e51",
+ "placeholder": "",
+ "style": "IPY_MODEL_c5f4f5f5fbad4d14973dc21eae2358d4",
+ "value": "pytorch_model.bin: 100%"
+ }
+ },
+ "467f3c9f15184784aa913cb3ae46700f": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_b4faa4c7fd01498abe07fef6189de4d4",
+ "max": 5905440,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_35ab2642d8a5481780c43eaef7044f3c",
+ "value": 5905440
+ }
+ },
+ "c87055ce8cef498f999a20ff2b34e1b8": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_37de7e413b25451ca5288ea5f43dd2d5",
+ "placeholder": "",
+ "style": "IPY_MODEL_47012c0ff9394d77b8d857a933b4082e",
+ "value": " 5.91M/5.91M [00:00<00:00, 66.3MB/s]"
+ }
+ },
+ "63d957e24276476b9bf1be150675217c": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "afe35fd2ea654d7b861046e847f82e51": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "c5f4f5f5fbad4d14973dc21eae2358d4": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "b4faa4c7fd01498abe07fef6189de4d4": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "35ab2642d8a5481780c43eaef7044f3c": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "37de7e413b25451ca5288ea5f43dd2d5": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "47012c0ff9394d77b8d857a933b4082e": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "d3ca3e02944c49f88d2b232295b19293": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_392a6347341f4dac94cad19e1f0bf02b",
+ "IPY_MODEL_ba0102afa62c443eb9de89412301bb46",
+ "IPY_MODEL_b71a286118004001a9a65ece5ba352d2"
+ ],
+ "layout": "IPY_MODEL_29b969a7441d41059969962f338d0def"
+ }
+ },
+ "392a6347341f4dac94cad19e1f0bf02b": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_3d0be590ca374ef1a768b8d3577d3105",
+ "placeholder": "",
+ "style": "IPY_MODEL_c0ac81654e1c4904a87a2c777b520c09",
+ "value": "config.yaml: 100%"
+ }
+ },
+ "ba0102afa62c443eb9de89412301bb46": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_36f8eb5005864f779aa8185acf55e2c5",
+ "max": 399,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_5cbb5cf6a37d451c87215851ee8b0b65",
+ "value": 399
+ }
+ },
+ "b71a286118004001a9a65ece5ba352d2": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_ca3ee9d356bc464e82ee061375cc4ef2",
+ "placeholder": "",
+ "style": "IPY_MODEL_67e4c4a2768b40eca34e8add6265c40c",
+ "value": " 399/399 [00:00<00:00, 28.2kB/s]"
+ }
+ },
+ "29b969a7441d41059969962f338d0def": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "3d0be590ca374ef1a768b8d3577d3105": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "c0ac81654e1c4904a87a2c777b520c09": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "36f8eb5005864f779aa8185acf55e2c5": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "5cbb5cf6a37d451c87215851ee8b0b65": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "ca3ee9d356bc464e82ee061375cc4ef2": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "67e4c4a2768b40eca34e8add6265c40c": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "2149e37d14e94f82a77386682ba29195": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_501eb1f4c1a8458a86c917d7dac5eeb5",
+ "IPY_MODEL_7a115cf3065f4acdb0ae13577447a333",
+ "IPY_MODEL_532d713a67c94b0d8cf0d61be72c73e7"
+ ],
+ "layout": "IPY_MODEL_3f36703cc5bc4fbcbeb97ab722e573cc"
+ }
+ },
+ "501eb1f4c1a8458a86c917d7dac5eeb5": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_8088c86a3d394f9e88267196b369dfb9",
+ "placeholder": "",
+ "style": "IPY_MODEL_46c5a63bcc3a472284869bcb11407bed",
+ "value": "pytorch_model.bin: 100%"
+ }
+ },
+ "7a115cf3065f4acdb0ae13577447a333": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_754c1729ba074d518557bd53c01e3787",
+ "max": 26645418,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_bbe45325b7b74832bf6fe4a9769e0565",
+ "value": 26645418
+ }
+ },
+ "532d713a67c94b0d8cf0d61be72c73e7": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_9261da2864ce4fd9b0f05f2a2efc54a3",
+ "placeholder": "",
+ "style": "IPY_MODEL_940ed223c43d40e5b85eadf3a5ef0382",
+ "value": " 26.6M/26.6M [00:00<00:00, 172MB/s]"
+ }
+ },
+ "3f36703cc5bc4fbcbeb97ab722e573cc": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "8088c86a3d394f9e88267196b369dfb9": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "46c5a63bcc3a472284869bcb11407bed": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "754c1729ba074d518557bd53c01e3787": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "bbe45325b7b74832bf6fe4a9769e0565": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "9261da2864ce4fd9b0f05f2a2efc54a3": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "940ed223c43d40e5b85eadf3a5ef0382": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "ef97ca55c62b40f7bb233120f8efba62": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HBoxModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HBoxModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HBoxView",
+ "box_style": "",
+ "children": [
+ "IPY_MODEL_2da74c40ebd44b61adaf840b9e7ce343",
+ "IPY_MODEL_c9517e2631c247b6ba083da05cfd0399",
+ "IPY_MODEL_29b6d0dc6624409c8c8a8e0565c2bc08"
+ ],
+ "layout": "IPY_MODEL_7b6f8f9b2eee485f8defe25d2a9afc4a"
+ }
+ },
+ "2da74c40ebd44b61adaf840b9e7ce343": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_fbaa859f9a8d4d5fb15667ede53a2cfb",
+ "placeholder": "",
+ "style": "IPY_MODEL_1840bc4e8cb54d3d89d5b1f199065c77",
+ "value": "config.yaml: 100%"
+ }
+ },
+ "c9517e2631c247b6ba083da05cfd0399": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "FloatProgressModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "FloatProgressModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "ProgressView",
+ "bar_style": "success",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_3f955f5b7421415bb79319058d0d094d",
+ "max": 221,
+ "min": 0,
+ "orientation": "horizontal",
+ "style": "IPY_MODEL_849cf4fb4ac249368fb8a5c837f8430a",
+ "value": 221
+ }
+ },
+ "29b6d0dc6624409c8c8a8e0565c2bc08": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "HTMLModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_dom_classes": [],
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "HTMLModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/controls",
+ "_view_module_version": "1.5.0",
+ "_view_name": "HTMLView",
+ "description": "",
+ "description_tooltip": null,
+ "layout": "IPY_MODEL_0411eb97e38b4fa081e05f87076ba129",
+ "placeholder": "",
+ "style": "IPY_MODEL_34842bb9a73048caace624929cfc2b03",
+ "value": " 221/221 [00:00<00:00, 10.8kB/s]"
+ }
+ },
+ "7b6f8f9b2eee485f8defe25d2a9afc4a": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "fbaa859f9a8d4d5fb15667ede53a2cfb": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "1840bc4e8cb54d3d89d5b1f199065c77": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ },
+ "3f955f5b7421415bb79319058d0d094d": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "849cf4fb4ac249368fb8a5c837f8430a": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "ProgressStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "ProgressStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "bar_color": null,
+ "description_width": ""
+ }
+ },
+ "0411eb97e38b4fa081e05f87076ba129": {
+ "model_module": "@jupyter-widgets/base",
+ "model_name": "LayoutModel",
+ "model_module_version": "1.2.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/base",
+ "_model_module_version": "1.2.0",
+ "_model_name": "LayoutModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "LayoutView",
+ "align_content": null,
+ "align_items": null,
+ "align_self": null,
+ "border": null,
+ "bottom": null,
+ "display": null,
+ "flex": null,
+ "flex_flow": null,
+ "grid_area": null,
+ "grid_auto_columns": null,
+ "grid_auto_flow": null,
+ "grid_auto_rows": null,
+ "grid_column": null,
+ "grid_gap": null,
+ "grid_row": null,
+ "grid_template_areas": null,
+ "grid_template_columns": null,
+ "grid_template_rows": null,
+ "height": null,
+ "justify_content": null,
+ "justify_items": null,
+ "left": null,
+ "margin": null,
+ "max_height": null,
+ "max_width": null,
+ "min_height": null,
+ "min_width": null,
+ "object_fit": null,
+ "object_position": null,
+ "order": null,
+ "overflow": null,
+ "overflow_x": null,
+ "overflow_y": null,
+ "padding": null,
+ "right": null,
+ "top": null,
+ "visibility": null,
+ "width": null
+ }
+ },
+ "34842bb9a73048caace624929cfc2b03": {
+ "model_module": "@jupyter-widgets/controls",
+ "model_name": "DescriptionStyleModel",
+ "model_module_version": "1.5.0",
+ "state": {
+ "_model_module": "@jupyter-widgets/controls",
+ "_model_module_version": "1.5.0",
+ "_model_name": "DescriptionStyleModel",
+ "_view_count": null,
+ "_view_module": "@jupyter-widgets/base",
+ "_view_module_version": "1.2.0",
+ "_view_name": "StyleView",
+ "description_width": ""
+ }
+ }
+ }
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Sharing a minimal reproducible example (MRE) is a prerequisite for `pyannote.audio` contributors to be able to fix the bug you are reporting.\n",
+ "\n",
+ "An MRE lets contributors reproduce the bug exactly as you are experiencing it. When testing a potential fix, contributors will also use the MRE to confirm that the fix works as intended.\n",
+ "\n",
+ "This notebook provides a template that should help you create such an MRE.\n",
+ "\n",
+ "Duplicate it, edit it, and share it as a link within your `pyannote.audio` bug report."
+ ],
+ "metadata": {
+ "id": "SWidE_E7ol-U"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Setup\n",
+ "\n",
+ "Before anything else, make sure to run this section."
+ ],
+ "metadata": {
+ "id": "k1vex_KZTDFm"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Specify the `pyannote.audio` version you found the issue in (including the Git commit hash if using a non-released version)."
+ ],
+ "metadata": {
+ "id": "XRNSJ2omranm"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!pip install -qqq pyannote.audio==3.1.1"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "THKj6xjdSv9k",
+ "outputId": "719baaf8-2028-4b8e-8e46-e6a813aaf6f8"
+ },
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m208.7/208.7 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.6/44.6 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m15.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m118.6/118.6 kB\u001b[0m \u001b[31m16.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m630.6/630.6 kB\u001b[0m \u001b[31m21.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m101.7/101.7 kB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m47.9/47.9 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m806.1/806.1 kB\u001b[0m \u001b[31m25.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m777.7/777.7 kB\u001b[0m \u001b[31m29.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m18.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m413.4/413.4 kB\u001b[0m \u001b[31m29.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m39.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.6/59.6 kB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m233.4/233.4 kB\u001b[0m \u001b[31m27.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.4/116.4 kB\u001b[0m \u001b[31m14.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.6/78.6 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m526.7/526.7 kB\u001b[0m \u001b[31m45.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Building wheel for julius (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Declare your [Hugging Face token](https://huggingface.co/settings/tokens) as a Colab secret named `HF_TOKEN` by clicking the 🔑 icon on the left:\n",
+ "\n",
+ "* **Name**: `HF_TOKEN`\n",
+ "* **Value**: your Hugging Face token (e.g. `hf_ABCdzRFTkglhlcalBAPGHSQvxLmQs`)"
+ ],
+ "metadata": {
+ "id": "ZLhE8e7iTpTu"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Check that you can load the pretrained pipeline."
+ ],
+ "metadata": {
+ "id": "Mogy5_qYUoXs"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# access your HF token\n",
+ "from google.colab import userdata\n",
+ "hf_token = userdata.get('HF_TOKEN')\n",
+ "\n",
+ "# load the pretrained pipeline\n",
+ "from pyannote.audio import Pipeline\n",
+ "pipeline = Pipeline.from_pretrained(\n",
+ " \"pyannote/speaker-diarization-3.1\",\n",
+ " use_auth_token=hf_token)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 307,
+ "referenced_widgets": [
+ "3d0fe95350234ab599497683ae6d4ce6",
+ "239beda16fde4b1d9e6abfb47f036040",
+ "2d6f5fe6f9ab4a8b853e7b023aaee35f",
+ "771629a0fbab4b0e9ad6425b5affe380",
+ "8763b8e4c8104b879d4324257a810c0f",
+ "22536aef3997408dbf5ea58241f01e3e",
+ "2a6a5f33d7c34b6b996ab7a5b7c2f11d",
+ "c348b2f57c6c437fa8a9a2688bf17f4f",
+ "7526127b349a4467a6d3083d92bef5ec",
+ "30efc0ed76fb496483d4bb425a930636",
+ "e760e231084f4889bb489bc550b7a816",
+ "9951dce8a6c947dd9a2ed6106cb68f30",
+ "d68c24e62487484b83bbb06271a6981c",
+ "467f3c9f15184784aa913cb3ae46700f",
+ "c87055ce8cef498f999a20ff2b34e1b8",
+ "63d957e24276476b9bf1be150675217c",
+ "afe35fd2ea654d7b861046e847f82e51",
+ "c5f4f5f5fbad4d14973dc21eae2358d4",
+ "b4faa4c7fd01498abe07fef6189de4d4",
+ "35ab2642d8a5481780c43eaef7044f3c",
+ "37de7e413b25451ca5288ea5f43dd2d5",
+ "47012c0ff9394d77b8d857a933b4082e",
+ "d3ca3e02944c49f88d2b232295b19293",
+ "392a6347341f4dac94cad19e1f0bf02b",
+ "ba0102afa62c443eb9de89412301bb46",
+ "b71a286118004001a9a65ece5ba352d2",
+ "29b969a7441d41059969962f338d0def",
+ "3d0be590ca374ef1a768b8d3577d3105",
+ "c0ac81654e1c4904a87a2c777b520c09",
+ "36f8eb5005864f779aa8185acf55e2c5",
+ "5cbb5cf6a37d451c87215851ee8b0b65",
+ "ca3ee9d356bc464e82ee061375cc4ef2",
+ "67e4c4a2768b40eca34e8add6265c40c",
+ "2149e37d14e94f82a77386682ba29195",
+ "501eb1f4c1a8458a86c917d7dac5eeb5",
+ "7a115cf3065f4acdb0ae13577447a333",
+ "532d713a67c94b0d8cf0d61be72c73e7",
+ "3f36703cc5bc4fbcbeb97ab722e573cc",
+ "8088c86a3d394f9e88267196b369dfb9",
+ "46c5a63bcc3a472284869bcb11407bed",
+ "754c1729ba074d518557bd53c01e3787",
+ "bbe45325b7b74832bf6fe4a9769e0565",
+ "9261da2864ce4fd9b0f05f2a2efc54a3",
+ "940ed223c43d40e5b85eadf3a5ef0382",
+ "ef97ca55c62b40f7bb233120f8efba62",
+ "2da74c40ebd44b61adaf840b9e7ce343",
+ "c9517e2631c247b6ba083da05cfd0399",
+ "29b6d0dc6624409c8c8a8e0565c2bc08",
+ "7b6f8f9b2eee485f8defe25d2a9afc4a",
+ "fbaa859f9a8d4d5fb15667ede53a2cfb",
+ "1840bc4e8cb54d3d89d5b1f199065c77",
+ "3f955f5b7421415bb79319058d0d094d",
+ "849cf4fb4ac249368fb8a5c837f8430a",
+ "0411eb97e38b4fa081e05f87076ba129",
+ "34842bb9a73048caace624929cfc2b03"
+ ]
+ },
+ "id": "i8rs3XylTTys",
+ "outputId": "b1b7445e-6f29-4ce0-c111-f5ffcb488c47"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "config.yaml: 0%| | 0.00/469 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "3d0fe95350234ab599497683ae6d4ce6"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/pyannote/audio/pipelines/speaker_verification.py:43: UserWarning: torchaudio._backend.get_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.\n",
+ " backend = torchaudio.get_audio_backend()\n",
+ "/usr/local/lib/python3.10/dist-packages/pyannote/audio/pipelines/speaker_verification.py:53: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.\n",
+ " torchaudio.set_audio_backend(backend)\n",
+ "/usr/local/lib/python3.10/dist-packages/pyannote/audio/tasks/segmentation/mixins.py:37: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.\n",
+ " from torchaudio.backend.common import AudioMetaData\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "pytorch_model.bin: 0%| | 0.00/5.91M [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "9951dce8a6c947dd9a2ed6106cb68f30"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "config.yaml: 0%| | 0.00/399 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "d3ca3e02944c49f88d2b232295b19293"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "pytorch_model.bin: 0%| | 0.00/26.6M [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "2149e37d14e94f82a77386682ba29195"
+ }
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "config.yaml: 0%| | 0.00/221 [00:00, ?B/s]"
+ ],
+ "application/vnd.jupyter.widget-view+json": {
+ "version_major": 2,
+ "version_minor": 0,
+ "model_id": "ef97ca55c62b40f7bb233120f8efba62"
+ }
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Check that a GPU is available and send the pipeline to it."
+ ],
+ "metadata": {
+ "id": "SSGSKkLdXheL"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "if torch.cuda.is_available():\n",
+ " gpu = torch.device(\"cuda\")\n",
+ " pipeline.to(gpu)\n",
+ "else:\n",
+ " print(\"Please switch to (free) T4 GPU runtime.\")"
+ ],
+ "metadata": {
+ "id": "vxMDKwA0XcKi"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Download a sample audio file (make sure the download link is public, or nobody will be able to reproduce your bug report)."
+ ],
+ "metadata": {
+ "id": "ISngZk15Uzhb"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!wget https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "jYfZIys7Te0z",
+ "outputId": "ba64b747-c973-4274-feb4-3b1b4293e529"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "--2024-01-08 15:31:41-- https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav\n",
+ "Resolving github.com (github.com)... 140.82.113.4\n",
+ "Connecting to github.com (github.com)|140.82.113.4|:443... connected.\n",
+ "HTTP request sent, awaiting response... 302 Found\n",
+ "Location: https://raw.githubusercontent.com/pyannote/pyannote-audio/develop/tutorials/assets/sample.wav [following]\n",
+ "--2024-01-08 15:31:41-- https://raw.githubusercontent.com/pyannote/pyannote-audio/develop/tutorials/assets/sample.wav\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 960104 (938K) [audio/wav]\n",
+ "Saving to: ‘sample.wav’\n",
+ "\n",
+ "sample.wav 100%[===================>] 937.60K --.-KB/s in 0.04s \n",
+ "\n",
+ "2024-01-08 15:31:42 (25.7 MB/s) - ‘sample.wav’ saved [960104/960104]\n",
+ "\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Apply the pretrained pipeline and visualize the output."
+ ],
+ "metadata": {
+ "id": "RH6AEClgaAgU"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "diarization = pipeline(\"sample.wav\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AoCal-3UXK0z",
+ "outputId": "59428f0f-af42-4e78-fc04-abd6324c73db"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/pyannote/audio/utils/reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.\n",
+ "It can be re-enabled by calling\n",
+ " >>> import torch\n",
+ " >>> torch.backends.cuda.matmul.allow_tf32 = True\n",
+ " >>> torch.backends.cudnn.allow_tf32 = True\n",
+ "See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.\n",
+ "\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "diarization"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 259
+ },
+ "id": "iIKvKZoZXN_M",
+ "outputId": "e7dcd7e1-0512-425c-b46a-6dc18d7a67b1"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAABiIAAADyCAYAAADAzN2uAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAe8UlEQVR4nO3de5RV5X038O/hKnEuCjgzoCPiJV4SiMamikmMQQWV5RKlpsalkWhkhYV0qWnkjUWNsZqG9aY2qdrc8NIgxmWjJjG1uViwJqBGG0MxKY0UoykyIMhw0QGEef/wZepkCJyR2XMG5vNZa9Zi9n7Os3/n8Jxn79nfc/Yutba2tgYAAAAAAKAAfSpdAAAAAAAAsPcSRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIURRAAAAAAAAIXZ64OIVatWZerUqTn44IMzcODANDQ0ZPz48fn5z3+eJDnkkENSKpVSKpWy77775v3vf38eeOCBtsd//vOfb1v/9p+jjjqqw7buu+++9O3bN9OmTeuwbv78+SmVSlm7dm3bsuXLl2fUqFE5+eST09zc3NZmRz8rVqzoUE/fvn3T2NiYKVOmZM2aNWW/Ji0tLZk2bVqGDBmSqqqqTJo0KU1NTe3avPTSS5kwYULe9a53pa6uLp/97Gfz5ptvlr2N3sY466iccfYXf/EXOf744zNw4MAce+yxZffdWxlnHe1qnP3qV7/Kxz/+8TQ2NmbQoEE5+uij85WvfKXs/gEAAADYff12t4PmTc1dUUdZagfWdvoxkyZNyubNm3PPPffk0EMPTVNTUx577LGsXr26rc0XvvCFXH755Vm3bl2+/OUv58///M9z4IEH5qSTTkqSvOc978lPf/rTdv3269fxpZs9e3auueaafP3rX8+Xv/zl7LPPPn+0rqVLl+b000/PMccckwceeCCDBg1qW7dkyZLU1NS0a19XV9f27+31bN26Nb/5zW9y6aWXprm5Offff39Zr8lVV12VH/7wh3nggQdSW1ubK664Iuedd17bycytW7dmwoQJaWhoyIIFC/LKK6/kE5/4RPr3759bbrmlrG10pa1v+7/qDn2HDOn0Y4yzjnY1zra79NJL89RTT2XRokVl9Vuk1zZu7rZt7b/vgE4/xjjraFfj7Nlnn01dXV3mzJmTxsbGLFiwIFOmTEnfvn1zxRVXlLUNAAAAAHbPbgcRFz96YVfUUZbvT/xhp9qvXbs2TzzxRObPn5+PfOQjSZIRI0bkT//0T9u1q66uTkNDQxoaGnL77bdnzpw5+cEPftB24q5fv35paGjY6baWLVuWBQsW5Lvf/W7mzZuXBx98MBdeuOPXZtGiRRk/fnzGjh2be+65p8NJwLq6uuy3335/dFtvr+fAAw/M+eefn7vuumun9W3X3Nyc2bNnZ+7cuRk7dmyS5K677srRRx+dJ598MieeeGJ+/OMf59e//nV++tOfpr6+Pscee2xuuummzJgxI5///OczYEDnT6DujhWjj+3W7R34Py93qr1x1lE54yxJvvrVryZ565P+PSGIOHPWvG7b1pM3ju9Ue+Oso3LG2aWXXtruMYceemgWLlyYBx98UBABAAAA0E326kszVVVVpaqqKg8//HA2bdpU1mP69euX/v37Z/Pmzn0y+q677sqECRNSW1ubiy66KLNnz95huwULFuQjH/lIJk2alDlz5uzwk8id8eKLL+ZHP/pR2eHAs88+my1btuS0005rW3bUUUfl4IMPzsKFC5MkCxcuzKhRo1JfX9/WZvz48Vm3bl2ef/753ap3b2ScdVTOOKNzjLOO3uk4a25uzuDBg3erVgAAAADKt1cHEf369cvdd9+d
e+65J/vtt18++MEP5tprr/2jn7zevHlzvvjFL6a5ubnt07VJ8h//8R9tJwG3/3z6059uW79t27bcfffdueiii5IkF1xwQX72s59l2bJlHbZx7rnn5uyzz85tt92WUqm0wzoOOuigdtt6z3ve02799noGDRqUkSNH5vnnn8+MGTPKek1WrFiRAQMGdPiEcn19fdt121esWNEuhNi+fvs62jPOOipnnNE5xllH72ScLViwIPfff3+mTJlS1jYAAAAA2H27fWmmnm7SpEmZMGFCnnjiiTz55JN59NFHM2vWrHzrW9/K5MmTkyQzZszIzJkz09LSkqqqqvzN3/xNJkyY0NbHkUceme9///vt+n37Nc9/8pOfZOPGjTnrrLOSJEOHDs3pp5+eO++8MzfddFO7x51zzjl56KGH8sQTT+TDH/7wDmt+4oknUl1d3fZ7//79263fXk9LS0vmzJmT5557LtOnT+/8i0OXMc7oDsbZ7lm8eHHOOeec3HDDDRk3blwh2wAAAACgo90OIr595tyuqKNQ++yzT04//fScfvrpue666/KpT30qN9xwQ9uJu89+9rOZPHlyqqqqUl9f3+GTvQMGDMjhhx/+R/ufPXt21qxZ0+4Grdu2bcuiRYty4403pk+f//3iyde//vVcc801OfPMM/PP//zPOfnkkzv0N3LkyJ1eU/3t9Ww/yXjjjTd2OEm4Iw0NDdm8eXPWrl3bbhtNTU1t12lvaGjI008/3e5xTU1Nbeu6W8Oi57p9m++Ecfa/yhlnPdGj13y00iXsknH2vzozzn7961/n1FNPzZQpUzJz5sxd9g0AAABA19ntIKJ2YG1X1NGtjjnmmDz88MNtvw8dOnSnJ+Z2ZvXq1fne976X73znO+0uObJ169Z86EMfyo9//OOcccYZbctLpVK+8Y1vpE+fPjnrrLPywx/+sO3Gs+/UzJkzM3bs2EydOjXDhw/fadvjjz8+/fv3z2OPPZZJkyYlSZYsWZKXXnopY8aMSZKMGTMmN998c1auXJm6urokb31KuqamJsccc8xu1fpO9B0ypNu32RWMs52Ps55o/32790bsXcE42/U4e/755zN27Nhccsklufnmm3erPgAAAAA6b6++NNPq1atz/vnn59JLL83o0aNTXV2dZ555JrNmzco555xTdj9vvvlmh+uNl0ql1NfX59vf/naGDBmSj33sYx0+eXzWWWdl9uzZ7U7cbX/s1772tfTt27ft5N0pp5zStn7lypVpaWlp95ghQ4Z0uKTJdmPGjMno0aNzyy235Lbbbtvpc6mtrc1ll12Wq6++OoMHD05NTU2mT5+eMWPG5MQTT0ySjBs3Lsccc0wuvvjizJo1KytWrMjMmTMzbdq0DBw4cKf990bGWUfljLMkeeGFF7Jhw4asWLEib7zxRp577rkkb51cL/eGxb2FcdZROeNs8eLFGTt2bMaPH5+rr7667bn37ds3BxxwwE77BwAAAKBr7NVBRFVVVU444YTceuutWbp0abZs2ZLGxsZcfvnlufbaa8vu5/nnn8+wYcPaLRs4cGBaWlpy55135txzz93hjVonTZqUiy++OK+++mqHdaVSKbfffnv69OmTCRMm5JFHHmnr48gjj+zQfuHChe1O4P6hq666KpMnT86MGTPS2Ni40+dz6623pk+fPpk0aVI2bdqU8ePH54477mhb37dv3zzyyCOZOnVqxowZk3333TeXXHJJvvCFL+y0397KONuxXY2zJPnUpz6Vxx9/vO334447LkmybNmyHHLIITvtv7cxznZsV+Psn/7pn7Jq1arMmTMnc+bMaVs+YsSIvPjiizvtGwAAAICuUWptbW2tdBEAAAAAAMDeqc+umwAAAAAAALwzgoi9zL333puqqqod/rz95rOwO4wzuoNxBgAAALB3cGmmvcz69evT1NS0w3X9+/fPiBEjurki9kbGGd3BOAMAAADYOwgiAAAAAACAwrg0EwAAAAAAUBhBBAAAAAAA
UJh+5TTatm1bli9fnurq6pRKpaJrAgAAAAAAerDW1tasX78+w4cPT58+O//OQ1lBxPLly9PY2NglxQEAAAAAAHuHl19+OQcddNBO25QVRFRXV7d1WFNTs/uVAQAAAAAAe6x169alsbGxLT/YmbKCiO2XY6qpqRFEAAAAAAAASVLW7RzcrBoAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAAChMp4KIrStXFlVHB03//fvcdvO385vFy/LNeS/k1fWbCt/m1qamrPvy32ZrU1Ph2yrC9tes6b9/X+lS2IVVS1/O/CtmZtXSlytdSo+x6uUluevOK/Lb//5F5v7m3qxpWVPpkqBXWNOyxnsO9kDlvHdfXb+p246jgc7pqven/TgAvZn94J6lc0HEqlVF1dHByt83Zc7muix9cWVmz1/aPUHEypVZ/7e3dmvg0pW2v2Yrf79nBim9yWu/+58c8dA9ee13/1PpUnqM1U3L8tDgZfndit/kO0vm5jU7EegWr7Ws8Z6DPVA5791X12/qtuNooHO66v1pPw5Ab2Y/uGdxaSYAAAAAAKAwgggAAAAAAKAwgggAAAAAAKAw/TrTeFvzumxdvbqoWtppXb8hSfL6m9u6ZXtvt21tc7c9z660/TXbsHlbXtu4ucLVsDOvt7yZfZOU1u2ZY60I2/7/+H1jW0uFK4HeacPmDWne1FzpMoAybdi8oey269/Y4tgQepj1b2zp0v7sxwHojTpzTEzldSqIWPPJS7OlT/d8iaJ5yMHJudfnb3+5rlu293arL/h4t2+zK2x/za56Yk3yxLxKl8NOjHz1d/m/Sd417VNZUelieojmxkHJ/zki33z1e5UuBXql6xb8VaVLAAoy/R+fqXQJQMHsxwGAns6lmQAAAAAAgMIIIgAAAAAAgMIIIgAAAAAAgMJ06h4Rg++6M0M+8CdF1dLO6icXJ0+35Orjarr9PhFDvnNf+h9zdLdusytsf81u/fDgHDXmfZUuh5343eNPJw8nr9/+rRz64e55T/V0zYvmJc135fKh57hPBFTATSfdnENqR1a6DKBMLzYvK/ua8H//iT/J4Q3VBVcEdMYLK9Z36f1b7McB6I06c0xM5XUqiOhTW5O+Q4YUVUs7peqqJC15V7/u/9JGn/1qu+15dqXtr1nVgD7Zf98BlS6HnVi1z1tvvdaaPXOsFaFPdVXSnAzqs0+lS4FeqWpAVWoH1la6DKBMVQOqym5bPai/Y0PoYaoH9e/S/uzHAeiNOnNMTOW5NBMAAAAAAFAYQQQAAAAAAFAYQQQAAAAAAFAYQQQAAAAAAFCYTgURfQ84oKg6Oqg7qD4XDViZww6py2WnHJah1QML32bfurpUX31V+tbVFb6tImx/zeoOqq90KezC/iMOzG/PvST7jziw0qX0GEPqR+bcNSMzouHoXHDkhdl/n8GVLgl6hf33Gew9B3ugct67Q6sHdttxNNA5XfX+tB8HoDezH9yzlFpbW1t31WjdunWpra1Nc3NzampquqMuAAAAAACgh+pMbuDSTAAAAAAAQGEEEQAAAAAAQGEEEQAA
AAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEUCS5NX1m/LNeS/k1fWbKl0K0AuYc6BnWNOyJnN/c2/WtKzpUX0BAFAZ/lajM1Z3YpwIIoAkb+1oZs9fakcDdAtzDvQMr7WsyXeWzM1rXRAedGVfAABUhr/V6IzVGwQRAAAAAABADyCIAAAAAAAACtOv0gUAPcv6N7bktY2bK10GsJdb/8aWSpcAvM2GzRvSvKl5t/sAAGDv4PwQ5Vj/xptltxVEAO1M/8dnKl0CANDNrlvwV5UuAQCAHsT5Icrx5qaNZbd1aSYAAAAAAKAwgggAAAAAAKAwgggAAAAAAKAw7hEBtPP3n/iTHN5QXekygL3cCyvWu+Yo9CA3nXRzDqkduVt9vNi8zL0mAAD2Es4PUY7nfrs8Y79UXltBBNBO9aD+2X/fAZUuA9jLVQ/qX+kSgLepGlCV2oG1u90HAAB7B+eHKEf1oPLjBZdmAgAAAAAACiOIAAAAAAAACiOIAAAAAAAACiOIAAAAAAAACiOIAJIkQ6sH5rJTDsvQ6oGVLgXoBcw50DPsv8/gXHDkhdl/n8E9qi8AACrD32p0xpCq8sdJqbW1tXVXjdatW5fa2to0NzenpqZmt4oDAAAAAAD2bJ3JDXwjAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKIwgAgAAAAAAKEy/chq1trYmSdatW1doMQAAAAAAQM+3PS/Ynh/sTFlBxPr165MkjY2Nu1EWAAAAAACwN1m/fn1qa2t32qbUWkZcsW3btixfvjzV1dUplUpdViDQ3rp169LY2JiXX345NTU1lS4HoEczZwKUz5wJUD5zJkB5Wltbs379+gwfPjx9+uz8LhBlfSOiT58+Oeigg7qkOGDXampqHOwAlMmcCVA+cyZA+cyZALu2q29CbOdm1QAAAAAAQGEEEQAAAAAAQGEEEdCDDBw4MDfccEMGDhxY6VIAejxzJkD5zJkA5TNnAnS9sm5WDQAAAAAA8E74RgQAAAAAAFAYQQQAAAAAAFAYQQQAAAAAAFAYQQQAAAAAAFAYQQRUwL/927/l7LPPzvDhw1MqlfLwww+3W9/a2prrr78+w4YNy6BBg3Laaaflt7/9bWWKBaigXc2XkydPTqlUavdzxhlnVKZYgAr74he/mA984AOprq5OXV1dJk6cmCVLlrRr09LSkmnTpmXIkCGpqqrKpEmT
0tTUVKGKASqnnDnzlFNO6XCs+elPf7pCFQPs2QQRUAEbN27M+973vtx+++07XD9r1qx89atfzde+9rU89dRT2XfffTN+/Pi0tLR0c6UAlbWr+TJJzjjjjLzyyittP/fdd183VgjQczz++OOZNm1annzyyfzkJz/Jli1bMm7cuGzcuLGtzVVXXZUf/OAHeeCBB/L4449n+fLlOe+88ypYNUBllDNnJsnll1/e7lhz1qxZFaoYYM9Wam1tba10EdCblUqlPPTQQ5k4cWKSt74NMXz48HzmM5/JX/7lXyZJmpubU19fn7vvvjsXXHBBBasFqJw/nC+Tt74RsXbt2g7flAAgWbVqVerq6vL444/n5JNPTnNzcw444IDMnTs3f/Znf5Yk+c///M8cffTRWbhwYU488cQKVwxQOX84ZyZvfSPi2GOPzd/93d9VtjiAvYBvREAPs2zZsqxYsSKnnXZa27La2tqccMIJWbhwYQUrA+iZ5s+fn7q6uhx55JGZOnVqVq9eXemSAHqE5ubmJMngwYOTJM8++2y2bNnS7jjzqKOOysEHH+w4E+j1/nDO3O7ee+/N0KFD8973vjef+9zn8vrrr1eiPIA9Xr9KFwC0t2LFiiRJfX19u+X19fVt6wB4yxlnnJHzzjsvI0eOzNKlS3PttdfmzDPPzMKFC9O3b99KlwdQMdu2bcuVV16ZD37wg3nve9+b5K3jzAEDBmS//fZr19ZxJtDb7WjOTJILL7wwI0aMyPDhw7No0aLMmDEjS5YsyYMPPljBagH2TIIIAGCP9fbL1Y0aNSqjR4/OYYcdlvnz5+fUU0+tYGUAlTVt2rQsXrw4P/vZzypdCkCP98fmzClTprT9e9SoURk2bFhOPfXULF26NIcddlh3lwmwR3NpJuhhGhoakiRNTU3tljc1NbWtA2DHDj300AwdOjQvvPBCpUsBqJgrrrgijzzySObNm5eDDjqobXlDQ0M2b96ctWvXtmvvOBPozf7YnLkjJ5xwQpI41gR4BwQR0MOMHDkyDQ0Neeyxx9qWrVu3Lk899VTGjBlTwcoAer7f//73Wb16dYYNG1bpUgC6XWtra6644oo89NBD+dd//deMHDmy3frjjz8+/fv3b3ecuWTJkrz00kuOM4FeZ1dz5o4899xzSeJYE+AdcGkmqIANGza0+wTFsmXL8txzz2Xw4ME5+OCDc+WVV+av//qvc8QRR2TkyJG57rrrMnz48EycOLFyRQNUwM7my8GDB+fGG2/MpEmT0tDQkKVLl+aaa67J4YcfnvHjx1ewaoDKmDZtWubOnZvvfe97qa6ubrvvQ21tbQYNGpTa2tpcdtllufrqqzN48ODU1NRk+vTpGTNmTE488cQKVw/QvXY1Zy5dujRz587NWWedlSFDhmTRokW56qqrcvLJJ2f06NEVrh5gz1NqbW1trXQR0NvMnz8/H/3oRzssv+SSS3L33XentbU1N9xwQ77xjW9k7dq1+dCHPpQ77rgj7373uytQLUDl7Gy+/Id/+IdMnDgxv/zlL7N27doMHz4848aNy0033ZT6+voKVAtQWaVSaYfL77rrrkyePDlJ0tLSks985jO57777smnTpowfPz533HGHSzMBvc6u5syXX345F110URYvXpyNGzemsbEx5557bmbOnJmamppurhZgzyeIAAAAAAAACuMeEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAAAAAQGEEEQAAQDuTJ0/OxIkTK10GAACwl+hX6QIAAIDuUyqVdrr+hhtuyFe+8pW0trZ2U0UAAMDeThABAAC9yCuvvNL27/vvvz/XX399lixZ0rasqqoqVVVVlSgNAADYS7k0EwAA9CINDQ1tP7W1tSmVSu2WVVVVdbg00ymnnJLp06fnyiuvzP7775/6+vp885vfzMaNG/PJT34y1dXVOfzww/Poo4+2
29bixYtz5plnpqqqKvX19bn44ovz6quvdvMzBgAAKk0QAQAA7NI999yToUOH5umnn8706dMzderUnH/++TnppJPy7//+7xk3blwuvvjivP7660mStWvXZuzYsTnuuOPyzDPP5F/+5V/S1NSUj33sYxV+JgAAQHcTRAAAALv0vve9LzNnzswRRxyRz33uc9lnn30ydOjQXH755TniiCNy/fXXZ/Xq1Vm0aFGS5Lbbbstxxx2XW265JUcddVSOO+643HnnnZk3b17+67/+q8LPBgAA6E7uEQEAAOzS6NGj2/7dt2/fDBkyJKNGjWpbVl9fnyRZuXJlkuRXv/pV5s2bt8P7TSxdujTvfve7C64YAADoKQQRAADALvXv37/d76VSqd2yUqmUJNm2bVuSZMOGDTn77LPzpS99qUNfw4YNK7BSAACgpxFEAAAAXe79739/vvvd7+aQQw5Jv37+7AAAgN7MPSIAAIAuN23atKxZsyYf//jH84tf/CJLly7Nj370o3zyk5/M1q1bK10eAADQjQQRAABAlxs+fHh+/vOfZ+vWrRk3blxGjRqVK6+8Mvvtt1/69PFnCAAA9Cal1tbW1koXAQAAAAAA7J18FAkAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACiMIAIAAAAAACjM/wM8DotLF49/uQAAAABJRU5ErkJggg==\n"
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# MRE\n",
+ "\n",
+ "Now that everything is set up, edit the following cells with the code that reproduces the bug you are reporting.\n"
+ ],
+ "metadata": {
+ "id": "qHxFJZDxr5O1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from pyannote.audio import Model\n",
+ "model = Model.from_pretrained(\n",
+ " \"pyannote/speaker-diarization-3.1\",\n",
+ " use_auth_token=hf_token)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 499
+ },
+ "id": "gVrDtBcusDbK",
+ "outputId": "25823c18-bff7-43b5-ef30-f5e8b2e43e2b"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "EntryNotFoundError",
+ "evalue": "404 Client Error. (Request ID: Root=1-659c1570-229842ad49cdd505022bd7b3;3b3426ec-0f8e-49a4-8783-f5740d92a8ed)\n\nEntry Not Found for url: https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/pytorch_model.bin.",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py\u001b[0m in \u001b[0;36mhf_raise_for_status\u001b[0;34m(response, endpoint_name)\u001b[0m\n\u001b[1;32m 285\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 286\u001b[0;31m \u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 287\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mHTTPError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/requests/models.py\u001b[0m in \u001b[0;36mraise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1021\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mHTTPError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhttp_error_msg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1022\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mHTTPError\u001b[0m: 404 Client Error: Not Found for url: https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/pytorch_model.bin",
+ "\nThe above exception was the direct cause of the following exception:\n",
+ "\u001b[0;31mEntryNotFoundError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mpyannote\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0maudio\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mModel\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m model = Model.from_pretrained(\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\"pyannote/speaker-diarization-3.1\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m use_auth_token=hf_token)\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pyannote/audio/core/model.py\u001b[0m in \u001b[0;36mfrom_pretrained\u001b[0;34m(cls, checkpoint, map_location, hparams_file, strict, use_auth_token, cache_dir, **kwargs)\u001b[0m\n\u001b[1;32m 622\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 623\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 624\u001b[0;31m path_for_pl = hf_hub_download(\n\u001b[0m\u001b[1;32m 625\u001b[0m \u001b[0mmodel_id\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 626\u001b[0m \u001b[0mHF_PYTORCH_WEIGHTS_NAME\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py\u001b[0m in \u001b[0;36m_inner_fn\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 116\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msmoothly_deprecate_use_auth_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfn_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhas_token\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mhas_token\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 118\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 119\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 120\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_inner_fn\u001b[0m \u001b[0;31m# type: ignore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py\u001b[0m in \u001b[0;36mhf_hub_download\u001b[0;34m(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)\u001b[0m\n\u001b[1;32m 1236\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1237\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1238\u001b[0;31m metadata = get_hf_file_metadata(\n\u001b[0m\u001b[1;32m 1239\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1240\u001b[0m \u001b[0mtoken\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtoken\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py\u001b[0m in \u001b[0;36m_inner_fn\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 116\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msmoothly_deprecate_use_auth_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfn_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhas_token\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mhas_token\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 118\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 119\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 120\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_inner_fn\u001b[0m \u001b[0;31m# type: ignore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py\u001b[0m in \u001b[0;36mget_hf_file_metadata\u001b[0;34m(url, token, proxies, timeout, library_name, library_version, user_agent)\u001b[0m\n\u001b[1;32m 1629\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1630\u001b[0m \u001b[0;31m# Retrieve metadata\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1631\u001b[0;31m r = _request_wrapper(\n\u001b[0m\u001b[1;32m 1632\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"HEAD\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1633\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py\u001b[0m in \u001b[0;36m_request_wrapper\u001b[0;34m(method, url, follow_relative_redirects, **params)\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;31m# Recursively follow relative redirects\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 384\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfollow_relative_redirects\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 385\u001b[0;31m response = _request_wrapper(\n\u001b[0m\u001b[1;32m 386\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 387\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py\u001b[0m in \u001b[0;36m_request_wrapper\u001b[0;34m(method, url, follow_relative_redirects, **params)\u001b[0m\n\u001b[1;32m 407\u001b[0m \u001b[0;31m# Perform request and return if status_code is not in the retry list.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 408\u001b[0m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_session\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrequest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 409\u001b[0;31m \u001b[0mhf_raise_for_status\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 410\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 411\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py\u001b[0m in \u001b[0;36mhf_raise_for_status\u001b[0;34m(response, endpoint_name)\u001b[0m\n\u001b[1;32m 294\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0merror_code\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"EntryNotFound\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 295\u001b[0m \u001b[0mmessage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34mf\"{response.status_code} Client Error.\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m\"\\n\\n\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34mf\"Entry Not Found for url: {response.url}.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 296\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mEntryNotFoundError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 297\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 298\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0merror_code\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"GatedRepo\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mEntryNotFoundError\u001b[0m: 404 Client Error. (Request ID: Root=1-659c1570-229842ad49cdd505022bd7b3;3b3426ec-0f8e-49a4-8783-f5740d92a8ed)\n\nEntry Not Found for url: https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/pytorch_model.bin."
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# this does not work because `pyannote/speaker-diarization-3.1` is not a `Model` but a `Pipeline`."
+ ],
+ "metadata": {
+ "id": "e4GWU8Sbsy9u"
+ },
+ "execution_count": 9,
+ "outputs": []
+ }
+ ]
+}
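The traceback above fails because `Model.from_pretrained` asks the Hub for `pytorch_model.bin`, which the `pyannote/speaker-diarization-3.1` pipeline repository does not contain. To our understanding, pipelines are loaded with `Pipeline.from_pretrained`, which reads a `config.yaml` instead. The sketch below encodes that rule of thumb as a hypothetical helper, `choose_loader`, that works on a list of repository filenames. The file listings in the usage lines are illustrative assumptions and are not fetched from the Hub.

```python
from typing import Iterable

# Filenames each loader expects to find in a Hugging Face repository.
# MODEL_WEIGHTS matches the file requested in the traceback above;
# PIPELINE_CONFIG is assumed to be what Pipeline.from_pretrained reads.
MODEL_WEIGHTS = "pytorch_model.bin"
PIPELINE_CONFIG = "config.yaml"


def choose_loader(filenames: Iterable[str]) -> str:
    """Hypothetical helper: name the pyannote.audio loader suited to a repo.

    Model repositories may also contain a config.yaml, so the presence of
    model weights is checked first.
    """
    names = set(filenames)
    if MODEL_WEIGHTS in names:
        return "Model.from_pretrained"
    if PIPELINE_CONFIG in names:
        return "Pipeline.from_pretrained"
    raise ValueError("no model weights and no pipeline config found")


# Illustrative file listings (assumed, not downloaded).
print(choose_loader(["config.yaml", "README.md"]))          # Pipeline.from_pretrained
print(choose_loader(["config.yaml", "pytorch_model.bin"]))  # Model.from_pretrained
```

In the notebook itself, the fix is to load this repository with `Pipeline.from_pretrained`, passing your Hugging Face access token through `use_auth_token`, rather than with `Model.from_pretrained`.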
diff --git a/tutorials/adapting_pretrained_pipeline.ipynb b/tutorials/adapting_pretrained_pipeline.ipynb
index 06d318809..ee749fd80 100644
--- a/tutorials/adapting_pretrained_pipeline.ipynb
+++ b/tutorials/adapting_pretrained_pipeline.ipynb
@@ -43,9 +43,7 @@
"id": "CZjbjOBBDrdm"
},
"source": [
- "## Installation\n",
- "\n",
- "Let's start by installing `pyannote.audio` 2.1.1 (and `rich` for pretty progress bars)."
+ "## Installation\n"
]
},
{
@@ -66,7 +64,7 @@
"id": "ndQ10VIf2W1c"
},
"source": [
- "⚠ Restart the runtime (Runtime > Restart runtime). \n",
+ "⚠ Restart the runtime (Runtime > Restart session). \n",
"If you don't, `pyannote.database` will throw an error below."
]
},
@@ -154,10 +152,10 @@
},
"outputs": [],
"source": [
- "import os\n",
- "os.environ[\"PYANNOTE_DATABASE_CONFIG\"] = \"/content/AMI-diarization-setup/pyannote/database.yml\"\n",
- "from pyannote.database import get_protocol, FileFinder\n",
- "dataset = get_protocol(\"AMI-SDM.SpeakerDiarization.mini\", {\"audio\": FileFinder()})"
+ "from pyannote.database import registry, FileFinder\n",
+ "\n",
+ "registry.load_database(\"AMI-diarization-setup/pyannote/database.yml\")\n",
+ "dataset = registry.get_protocol(\"AMI-SDM.SpeakerDiarization.mini\", {\"audio\": FileFinder()})"
]
},
{
@@ -344,7 +342,8 @@
" loss=\"bce\", \n",
" vad_loss=\"bce\")\n",
"model.task = task\n",
- "model.setup(stage=\"fit\")"
+ "model.prepare_data()\n",
+ "model.setup()"
]
},
{
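The updated tutorial calls `model.prepare_data()` before `model.setup()` instead of `model.setup(stage="fit")`. This follows the PyTorch Lightning convention: `prepare_data()` does one-off work such as building and caching dataset metadata, and `setup()` loads the result. Our understanding is that recent `pyannote.audio` tasks rely on this split, so calling `setup()` alone can fail. The toy class below is a hypothetical stand-in, not pyannote's `Task`. It reproduces the ordering constraint with a JSON cache file, and its metadata is made up.

```python
import json
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a task whose setup() needs prepare_data()."""

    def __init__(self, cache_path: str):
        self.cache_path = cache_path
        self.metadata = None

    def prepare_data(self) -> None:
        # One-off step: gather (here, made-up) protocol metadata and cache it.
        metadata = {"files": ["file1", "file2"], "classes": ["speech"]}
        with open(self.cache_path, "w") as f:
            json.dump(metadata, f)

    def setup(self) -> None:
        # Per-process step: only loads what prepare_data() cached.
        if not os.path.exists(self.cache_path):
            raise RuntimeError("prepare_data() must run before setup()")
        with open(self.cache_path) as f:
            self.metadata = json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    task = ToyTask(os.path.join(tmp, "cache.json"))
    try:
        task.setup()  # fails: nothing has been cached yet
    except RuntimeError as error:
        print("setup() alone:", error)
    task.prepare_data()
    task.setup()
    print("classes:", task.metadata["classes"])
```

Separating the two steps means the expensive metadata pass can run once, while `setup()` stays cheap enough to run in every training process.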
diff --git a/tutorials/add_your_own_model.ipynb b/tutorials/add_your_own_model.ipynb
index 30020f8a9..487932588 100644
--- a/tutorials/add_your_own_model.ipynb
+++ b/tutorials/add_your_own_model.ipynb
@@ -1,284 +1 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Defining a custom model\n",
- "\n",
- "A collection of models is readily available in `pyannote.audio.models` but you will eventually want to try your own architecture. \n",
- "\n",
- "This tutorial explains how to define (and then use) your own model. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from typing import Optional\n",
- "import torch\n",
- "import torch.nn as nn\n",
- "from pyannote.audio import Model\n",
- "from pyannote.audio.core.task import Task, Resolution\n",
- "from torchaudio.transforms import MFCC\n",
- "\n",
- "# Your custom model must be a subclass of `pyannote.audio.Model`,\n",
- "# which is a subclass of `pytorch_lightning.LightningModule`, \n",
- "# which is a subclass of `torch.nn.Module`.\n",
- "class MyCustomModel(Model):\n",
- " \"\"\"My custom model\"\"\"\n",
- "\n",
- "\n",
- " def __init__(\n",
- " self,\n",
- " sample_rate: int = 16000, \n",
- " num_channels: int = 1, \n",
- " task: Optional[Task] = None,\n",
- " param1: int = 32,\n",
- " param2: int = 16,\n",
- " ):\n",
- "\n",
- " # First three parameters (sample_rate, num_channels, and task)\n",
- " # must be there and passed to super().__init__()\n",
- " super().__init__(sample_rate=sample_rate, \n",
- " num_channels=num_channels, \n",
- " task=task)\n",
- "\n",
- " # Mark param1 and param2 as hyper-parameters.\n",
- " self.save_hyperparameters(\"param1\", \"param2\")\n",
- "\n",
- " # They will be saved automatically into checkpoints.\n",
- " # They are now also available in self.hparams:\n",
- " # - param1 == self.hparams.param1\n",
- " # - param2 == self.hparams.param2\n",
- "\n",
- " # Layers that do not depend on the addressed task should be defined in '__init__'.\n",
- " self.mfcc = MFCC()\n",
- " self.linear1 = nn.Linear(self.mfcc.n_mfcc, self.hparams.param1)\n",
- " self.linear2 = nn.Linear(self.hparams.param1, self.hparams.param2)\n",
- "\n",
- " def build(self):\n",
- " # Add layers that depend on the specifications of the task addressed \n",
- " # by this model.\n",
- "\n",
- " # For instance, this simple model could be used for \"speech vs. non-speech\"\n",
- " # or \"speech vs. music vs. other\" classification and the only difference\n",
- " # would lie in the number of classes (2 or 3) in the final classifier.\n",
- " \n",
- " # Since task specifications are not available at the time '__init__' is called,\n",
- " # task-dependent layers can only be added a 'build' time (where task specifications\n",
- " # are available in 'specifications' attribute)\n",
- " \n",
- " num_classes = len(self.specifications.classes)\n",
- " self.classifier = nn.Linear(self.hparams.param2, num_classes)\n",
- "\n",
- " # 'specifications' has several attributes describing what the task is:\n",
- " # - classes: the list of classes\n",
- " # - problem: the type of machine learning problem (e.g. binary \n",
- " # classification or representation learning) \n",
- " # - duration: the duration of input audio chunks, in seconds\n",
- " # - resolution: the resolution of the output (e.g. frame-wise scores\n",
- " # for voice activity detection or chunk-wise vector for speaker \n",
- " # embedding)\n",
- " # - permutation_invariant : whether classes are permutation-invariant\n",
- " # (e.g. in the case of speaker diarization)\n",
- "\n",
- " # Depending on the type of 'problem', 'default_activation' can be used\n",
- " # to automatically guess what the final activation should be (e.g. softmax\n",
- " # for multi-class classification or sigmoid for multi-label classification).\n",
- " self.activation = self.default_activation()\n",
- "\n",
- " # You obviously do not _have_ to use 'default_activation' and can choose to\n",
- " # use any activation you see fit (or even not use any activation layer). But\n",
- " # note that pyannote.audio tasks also define default loss functions that are\n",
- " # consistent with `default_activation` (e.g. binary cross entropy with softmax\n",
- " # for binary classification tasks)\n",
- " \n",
- " def forward(self, waveforms: torch.Tensor) -> torch.Tensor:\n",
- "\n",
- " # Models are expected to work on batches of audio chunks provided as tensors\n",
- " # with shape (batch_size, num_channels, num_samples) and using the sample rate \n",
- " # passed to __init__. Resampling will be done automatically for you so you do \n",
- " # not have to bother about that when preparing the data.\n",
- "\n",
- " # Extract sequence of MFCCs and passed them through two linear layers\n",
- " mfcc = self.mfcc(waveforms).squeeze(dim=1).transpose(1, 2)\n",
- " output = self.linear1(mfcc)\n",
- " output = self.linear2(output)\n",
- "\n",
- " # Apply temporal pooling for tasks which need an output at chunk-level. \n",
- " if self.specifications.resolution == Resolution.CHUNK:\n",
- " output = torch.mean(output, dim=-1)\n",
- " # Keep 'mfcc' frame resolution for frame-level tasks.\n",
- " elif self.specifications.resolution == Resolution.FRAME:\n",
- " pass\n",
- " \n",
- " # Apply final classifier and activation function\n",
- " output = self.classifier(output)\n",
- " return self.activation(output) "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Using your model with `pyannote.audio` API \n",
- "\n",
- "Your model can now be used like any other builtin model."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# initialize your experimental protocol\n",
- "from pyannote.database import get_protocol\n",
- "protocol = get_protocol('Debug.SpeakerDiarization.Debug')\n",
- "\n",
- "# initialize the task you want to address\n",
- "from pyannote.audio.tasks import VoiceActivityDetection\n",
- "task = VoiceActivityDetection(protocol)\n",
- "\n",
- "# initialize the model\n",
- "model = MyCustomModel(task=task)\n",
- "\n",
- "# train the model\n",
- "from pytorch_lightning import Trainer\n",
- "trainer = Trainer(max_epochs=1)\n",
- "trainer.fit(model)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Using your model with `pyannote-audio-train` CLI\n",
- "\n",
- "1. Define your model in a proper Python package:\n",
- "\n",
- "```\n",
- "/your/favorite/directory/\n",
- " your_package_name/\n",
- " __init__.py # needs to be here but can be empty\n",
- " custom_model.py # contains the above definition of your model \n",
- "```\n",
- "\n",
- "2. Add the package to your `PYTHONPATH`:\n",
- "\n",
- "```bash\n",
- "$ export PYTHONPATH=/your/favorite/directory\n",
- "```\n",
- "\n",
- "3. Check that you can import it from Python:\n",
- "\n",
- "```python\n",
- ">>> from your_package_name.custom_model import MyCustomModel\n",
- "```\n",
- "\n",
- "4. Tell `Hydra` (on which `pyannote-audio-train` is based) about this new model:\n",
- "\n",
- "```\n",
- "/your/favorite/directory/\n",
- " custom_config/\n",
- " model/\n",
- " MyCustomModel.yaml\n",
- "```\n",
- "\n",
- "where the content of `MyCustomModel.yaml` is as follows:\n",
- "\n",
- "```yaml\n",
- "# @package _group_\n",
- "_target_: your_package_name.custom_model.MyCustomModel\n",
- "param1: 32\n",
- "param2: 16\n",
- "```\n",
- "\n",
- "5. Enjoy\n",
- "\n",
- "```bash\n",
- "$ pyannote-audio-train --config-dir=/your/favorite/directory/custom_config \\\n",
- " protocol=Debug.SpeakerDiarization.Debug \\\n",
- " task=VoiceActivityDetection \\\n",
- " model=MyCustomModel \\\n",
- " model.param2=12 \n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Contributing your model to `pyannote-audio` \n",
- "\n",
- "1. Add your model in `pyannote.audio.models`.\n",
- "\n",
- "```\n",
- "pyannote/\n",
- " audio/\n",
- " models/\n",
- " custom_model.py \n",
- "```\n",
- "\n",
- "2. Check that you can import it from Python:\n",
- "\n",
- "```python\n",
- ">>> from pyannote.audio.models.custom_model import MyCustomModel\n",
- "```\n",
- "\n",
- "3. Add the corresponding `Hydra` configuration file:\n",
- "\n",
- "```\n",
- "pyannote/\n",
- " audio/\n",
- " cli/\n",
- " train_config/\n",
- " model/\n",
- " MyCustomModel.yaml\n",
- "```\n",
- "\n",
- "where the content of `MyCustomModel.yaml` is as follows:\n",
- "\n",
- "```yaml\n",
- "# @package _group_\n",
- "_target_: pyannote.audio.models.custom_model.MyCustomModel\n",
- "param1: 32\n",
- "param2: 16\n",
- "```\n",
- "\n",
- "4. Enjoy\n",
- "\n",
- "```bash\n",
- "$ pyannote-audio-train protocol=Debug.SpeakerDiarization.Debug \\\n",
- " task=VoiceActivityDetection \\\n",
- " model=MyCustomModel \\\n",
- " model.param2=12 \n",
- "```"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+{"cells":[{"cell_type":"markdown","metadata":{"id":"kY1p-wCLHw92"},"source":["# Add your own model"]},{"cell_type":"markdown","metadata":{"id":"iD_DNGmmHs9v"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"hhBTSvk6H_JC"},"source":["## Tutorial setup"]},{"cell_type":"markdown","metadata":{"id":"r-ocA5Z8PqNl"},"source":["### `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"I7lc6ctfIBv-"},"source":["If you are running this tutorial on `Colab`, execute the following commands to set up the `Colab` environment. These commands will install `pyannote.audio` and download a mini version of the `AMI` corpus."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"l07Xq_UAIUFE"},"outputs":[],"source":["!pip install -qq pyannote.audio==3.1.1\n","!pip install -qq ipython==7.34.0\n","!git clone https://github.com/pyannote/AMI-diarization-setup.git\n","%cd ./AMI-diarization-setup/pyannote/\n","!bash ./download_ami_mini.sh\n","%cd /content"]},{"cell_type":"markdown","metadata":{"id":"3rjw5hATOv_c"},"source":["⚠ Restart the runtime (Runtime > Restart session)."]},{"cell_type":"markdown","metadata":{},"source":["### Non `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"VdMVQD-9QAto"},"source":["If you are not using `Colab`, this tutorial assumes that\n","* `pyannote.audio` has been installed\n","* the [AMI corpus](https://groups.inf.ed.ac.uk/ami/corpus/) has already been [set up for use with `pyannote`](https://github.com/pyannote/AMI-diarization-setup/tree/main/pyannote)"]},{"cell_type":"markdown","metadata":{"id":"kuemd4PWHeqh"},"source":["## Defining a custom model\n","\n","A collection of models is readily available in `pyannote.audio.models` but you will eventually want to try your own architecture.\n","\n","This tutorial explains how to define (and then use) your own model. 
"]},{"cell_type":"code","execution_count":18,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":12960,"status":"ok","timestamp":1704802939163,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"kNwQfnTOHeqm","outputId":"5f71bde3-5f13-4431-918f-6e1ca3ddd518"},"outputs":[],"source":["from typing import Optional\n","import torch\n","import torch.nn as nn\n","from pyannote.audio import Model\n","from pyannote.core import SlidingWindow\n","from pyannote.audio.core.task import Task, Resolution\n","from torchaudio.transforms import MFCC\n","\n","# Your custom model must be a subclass of `pyannote.audio.Model`,\n","# which is a subclass of `pytorch_lightning.LightningModule`,\n","# which is a subclass of `torch.nn.Module`.\n","class MyCustomModel(Model):\n"," \"\"\"My custom model\"\"\"\n","\n","\n"," def __init__(\n"," self,\n"," sample_rate: int = 16000,\n"," num_channels: int = 1,\n"," task: Optional[Task] = None,\n"," param1: int = 32,\n"," param2: int = 16,\n"," ):\n","\n"," # First three parameters (sample_rate, num_channels, and task)\n"," # must be there and passed to super().__init__()\n"," super().__init__(sample_rate=sample_rate,\n"," num_channels=num_channels,\n"," task=task)\n","\n"," # Mark param1 and param2 as hyper-parameters.\n"," self.save_hyperparameters(\"param1\", \"param2\")\n","\n"," # They will be saved automatically into checkpoints.\n"," # They are now also available in self.hparams:\n"," # - param1 == self.hparams.param1\n"," # - param2 == self.hparams.param2\n","\n"," # Layers that do not depend on the addressed task should be defined in '__init__'.\n"," self.mfcc = MFCC()\n"," self.linear1 = nn.Linear(self.mfcc.n_mfcc, self.hparams.param1)\n"," self.linear2 = nn.Linear(self.hparams.param1, self.hparams.param2)\n","\n"," def num_frames(self, num_samples: int) -> int:\n"," # Compute number of output frames for a given number of input samples\n"," hop_length = 
self.mfcc.MelSpectrogram.spectrogram.hop_length\n"," n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft\n"," center = self.mfcc.MelSpectrogram.spectrogram.center\n"," return (\n"," 1 + num_samples // hop_length\n"," if center\n"," else 1 + (num_samples - n_fft) // hop_length\n"," )\n","\n"," def receptive_field_size(self, num_frames: int = 1) -> int:\n"," # Compute receptive field size\n"," hop_length = self.mfcc.MelSpectrogram.spectrogram.hop_length\n"," n_fft = self.mfcc.MelSpectrogram.spectrogram.n_fft\n"," center = self.mfcc.MelSpectrogram.spectrogram.center\n","\n"," if center:\n"," return (num_frames - 1) * hop_length\n"," else:\n"," return (num_frames - 1) * hop_length + n_fft\n","\n"," def receptive_field(self) -> SlidingWindow:\n"," # Compute receptive field\n","\n"," # duration of the receptive field of each output frame\n"," duration = (\n"," self.mfcc.MelSpectrogram.spectrogram.win_length / self.hparams.sample_rate\n"," )\n","\n"," # step between the receptive field region of two consecutive output frames\n"," step = (\n"," self.mfcc.MelSpectrogram.spectrogram.hop_length / self.hparams.sample_rate\n"," )\n","\n"," return SlidingWindow(start=0.0, duration=duration, step=step)\n","\n"," def build(self):\n"," # Add layers that depend on the specifications of the task addressed\n"," # by this model.\n","\n"," # For instance, this simple model could be used for \"speech vs. non-speech\"\n"," # or \"speech vs. music vs. 
other\" classification and the only difference\n"," # would lie in the number of classes (2 or 3) in the final classifier.\n","\n"," # Since task specifications are not available at the time '__init__' is called,\n"," # task-dependent layers can only be added at 'build' time (where task specifications\n"," # are available in the 'specifications' attribute)\n","\n"," num_classes = len(self.specifications.classes)\n"," self.classifier = nn.Linear(self.hparams.param2, num_classes)\n","\n"," # 'specifications' has several attributes describing what the task is:\n"," # - classes: the list of classes\n"," # - problem: the type of machine learning problem (e.g. binary\n"," # classification or representation learning)\n"," # - duration: the duration of input audio chunks, in seconds\n"," # - resolution: the resolution of the output (e.g. frame-wise scores\n"," # for voice activity detection or chunk-wise vector for speaker\n"," # embedding)\n"," # - permutation_invariant : whether classes are permutation-invariant\n"," # (e.g. in the case of speaker diarization)\n","\n"," # Depending on the type of 'problem', 'default_activation' can be used\n"," # to automatically guess what the final activation should be (e.g. softmax\n"," # for multi-class classification or sigmoid for multi-label classification).\n"," self.activation = self.default_activation()\n","\n"," # You obviously do not _have_ to use 'default_activation' and can choose to\n"," # use any activation you see fit (or even not use any activation layer). But\n"," # note that pyannote.audio tasks also define default loss functions that are\n"," # consistent with `default_activation` (e.g. binary cross entropy with sigmoid\n"," # for binary classification tasks)\n","\n"," def forward(self, waveforms: torch.Tensor) -> torch.Tensor:\n","\n"," # Models are expected to work on batches of audio chunks provided as tensors\n"," # with shape (batch_size, num_channels, num_samples) and using the sample rate\n"," # passed to __init__. 
Resampling will be done automatically for you so you do\n"," # not have to bother about that when preparing the data.\n","\n"," # Extract a sequence of MFCCs and pass them through two linear layers\n"," mfcc = self.mfcc(waveforms).squeeze(dim=1).transpose(1, 2)\n"," output = self.linear1(mfcc)\n"," output = self.linear2(output)\n","\n"," # Apply temporal pooling (average over the frame axis, dim=1) for tasks which need a chunk-level output.\n"," if self.specifications.resolution == Resolution.CHUNK:\n"," output = torch.mean(output, dim=1)\n"," # Keep 'mfcc' frame resolution for frame-level tasks.\n"," elif self.specifications.resolution == Resolution.FRAME:\n"," pass\n","\n"," # Apply final classifier and activation function\n"," output = self.classifier(output)\n"," return self.activation(output)"]},{"cell_type":"markdown","metadata":{"id":"BuieqViJHeqp"},"source":["## Using your model with `pyannote.audio` API\n","\n","Your model can now be used like any other built-in model."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qwTjuGuvHeqr"},"outputs":[],"source":["# initialize your experimental protocol\n","from pyannote.database import registry, FileFinder\n","\n","registry.load_database(\"./AMI-diarization-setup/pyannote/database.yml\")\n","protocol = registry.get_protocol('AMI.SpeakerDiarization.mini', preprocessors={\"audio\": FileFinder()})\n","\n","# initialize the task you want to address\n","from pyannote.audio.tasks import VoiceActivityDetection\n","task = VoiceActivityDetection(protocol)\n","\n","# initialize the model\n","model = MyCustomModel(task=task)\n","\n","# train the model\n","from pytorch_lightning import Trainer\n","trainer = Trainer(max_epochs=1)\n","trainer.fit(model)"]},{"cell_type":"markdown","metadata":{"id":"4qidmGQyHeqt"},"source":["## Using your model with `pyannote-audio-train` CLI\n","\n","1. 
Define your model in a proper Python package:\n","\n","```\n","/your/favorite/directory/\n"," your_package_name/\n"," __init__.py # needs to be here but can be empty\n"," custom_model.py # contains the above definition of your model\n","```\n","\n","2. Add the package to your `PYTHONPATH`:\n","\n","```bash\n","$ export PYTHONPATH=/your/favorite/directory\n","```\n","\n","3. Check that you can import it from Python:\n","\n","```python\n",">>> from your_package_name.custom_model import MyCustomModel\n","```\n","\n","4. Tell `Hydra` (on which `pyannote-audio-train` is based) about this new model:\n","\n","```\n","/your/favorite/directory/\n"," custom_config/\n"," model/\n"," MyCustomModel.yaml\n","```\n","\n","where the content of `MyCustomModel.yaml` is as follows:\n","\n","```yaml\n","# @package _group_\n","_target_: your_package_name.custom_model.MyCustomModel\n","param1: 32\n","param2: 16\n","```\n","\n","5. Enjoy\n","\n","```bash\n","$ pyannote-audio-train --config-dir=/your/favorite/directory/custom_config \\\n"," protocol=Debug.SpeakerDiarization.Debug \\\n"," task=VoiceActivityDetection \\\n"," model=MyCustomModel \\\n"," model.param2=12\n","```"]},{"cell_type":"markdown","metadata":{"id":"8W2UT1IpHequ"},"source":["## Contributing your model to `pyannote-audio`\n","\n","1. Add your model in `pyannote.audio.models`.\n","\n","```\n","pyannote/\n"," audio/\n"," models/\n"," custom_model.py \n","```\n","\n","2. Check that you can import it from Python:\n","\n","```python\n",">>> from pyannote.audio.models.custom_model import MyCustomModel\n","```\n","\n","3. Add the corresponding `Hydra` configuration file:\n","\n","```\n","pyannote/\n"," audio/\n"," cli/\n"," train_config/\n"," model/\n"," MyCustomModel.yaml\n","```\n","\n","where the content of `MyCustomModel.yaml` is as follows:\n","\n","```yaml\n","# @package _group_\n","_target_: pyannote.audio.models.custom_model.MyCustomModel\n","param1: 32\n","param2: 16\n","```\n","\n","4. 
Enjoy\n","\n","```bash\n","$ pyannote-audio-train protocol=Debug.SpeakerDiarization.Debug \\\n"," task=VoiceActivityDetection \\\n"," model=MyCustomModel \\\n"," model.param2=12\n","```"]}],"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"}},"nbformat":4,"nbformat_minor":0}
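To see what the `num_frames` and `receptive_field` methods added to `MyCustomModel` compute, the sketch below re-implements the same arithmetic in plain Python. The default values (`n_fft=400`, `win_length=400`, `hop_length=200`, `center=True`, 16 kHz audio) are assumptions based on torchaudio's `MFCC`/`MelSpectrogram` defaults. Check `model.mfcc.MelSpectrogram.spectrogram` if you pass custom `melkwargs`.

```python
from typing import Tuple


def num_frames(num_samples: int, hop_length: int = 200, n_fft: int = 400,
               center: bool = True) -> int:
    """Number of STFT frames, mirroring MyCustomModel.num_frames."""
    if center:
        # The signal is padded by n_fft // 2 on each side, so frame t is
        # centred on sample t * hop_length.
        return 1 + num_samples // hop_length
    # Without padding, every frame must fit entirely inside the signal.
    return 1 + (num_samples - n_fft) // hop_length


def receptive_field(win_length: int = 400, hop_length: int = 200,
                    sample_rate: int = 16000) -> Tuple[float, float]:
    """(duration, step) in seconds, mirroring MyCustomModel.receptive_field."""
    return win_length / sample_rate, hop_length / sample_rate


duration, step = receptive_field()
print(num_frames(16000))                # 81 frames for 1 s of audio
print(num_frames(16000, center=False))  # 79 frames without centering
print(duration, step)                   # 0.025 0.0125
# Start time (in seconds) of the first three output frames:
print([round(i * step, 4) for i in range(3)])  # [0.0, 0.0125, 0.025]
```

With these assumed defaults, a 5-second chunk (80,000 samples) maps to 401 output frames.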
diff --git a/tutorials/add_your_own_task.ipynb b/tutorials/add_your_own_task.ipynb
index 251846957..b572e3d28 100644
--- a/tutorials/add_your_own_task.ipynb
+++ b/tutorials/add_your_own_task.ipynb
@@ -1,350 +1 @@
-{
- "cells": [
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Defining a custom task\n",
- "\n",
- "In `pyannote.audio`, a *task* is a combination of a **_problem_** that needs to be addressed and an **experimental protocol**.\n",
- "\n",
- "For example, one can address **_voice activity detection_** following the **AMI only_words** experimental protocol, by instantiating the following *task*:\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# this assumes that the AMI corpus has been setup for diarization\n",
- "# according to https://github.com/pyannote/AMI-diarization-setup\n",
- "import os\n",
- "os.environ['PYANNOTE_DATABASE_CONFIG'] = '/Users/bredin/Development/pyannote/pyannote-db/AMI-diarization-setup/pyannote/database.yml'\n",
- "\n",
- "from pyannote.database import get_protocol, FileFinder\n",
- "ami = get_protocol('AMI.SpeakerDiarization.only_words', \n",
- " preprocessors={'audio': FileFinder()})\n",
- "\n",
- "# address voice activity detection\n",
- "from pyannote.audio.tasks import VoiceActivityDetection\n",
- "task = VoiceActivityDetection(ami)"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A growing collection of tasks is readily available in `pyannote.audio.tasks`..."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyannote.audio.tasks import __all__ as TASKS; print('\\n'.join(TASKS))"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "... but you will eventually want to use `pyannote.audio` to address a different task. \n",
- "In this example, we will add a new task addressing the **sound event detection** problem.\n",
- "\n"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Problem specification\n",
- "\n",
- "A problem is expected to be solved by a model $f$ that takes an audio chunk $X$ as input and returns its predicted solution $\\hat{y} = f(X)$. \n",
- "\n",
- "### Resolution\n",
- "\n",
- "Depending on the addressed problem, you might expect the model to output just one prediction for the whole audio chunk (`Resolution.CHUNK`) or a temporal sequence of predictions (`Resolution.FRAME`).\n",
- "\n",
- "In our particular case, we would like the model to provide one decision for the whole chunk:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyannote.audio.core.task import Resolution\n",
- "resolution = Resolution.CHUNK"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Type of problem\n",
- "\n",
- "Similarly, the type of your problem may fall into one of these generic machine learning categories:\n",
- "* `Problem.BINARY_CLASSIFICATION` for binary classification\n",
- "* `Problem.MONO_LABEL_CLASSIFICATION` for multi-class classification \n",
- "* `Problem.MULTI_LABEL_CLASSIFICATION` for multi-label classification\n",
- "* `Problem.REGRESSION` for regression\n",
- "* `Problem.REPRESENTATION` for representation learning\n",
- "\n",
- "In our particular case, we would like the model to do multi-label classification because one audio chunk may contain multiple sound events:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyannote.audio.core.task import Problem\n",
- "problem = Problem.MULTI_LABEL_CLASSIFICATION"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyannote.audio.core.task import Specifications\n",
- "specifications = Specifications(\n",
- " problem=problem,\n",
- " resolution=resolution,\n",
- " duration=5.0,\n",
- " classes=[\"Speech\", \"Dog\", \"Cat\", \"Alarm_bell_ringing\", \"Dishes\", \n",
- " \"Frying\", \"Blender\", \"Running_water\", \"Vacuum_cleaner\", \n",
- " \"Electric_shaver_toothbrush\"],\n",
- ")"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A task is expected to be solved by a model $f$ that (usually) takes an audio chunk $X$ as input and returns its predicted solution $\\hat{y} = f(X)$. \n",
- "\n",
- "To help training the model $f$, the task $\\mathcal{T}$ is in charge of \n",
- "- generating $(X, y)$ training samples using the **dataset**\n",
- "- defining the loss function $\\mathcal{L}(y, \\hat{y})$\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from typing import Optional\n",
- "import torch\n",
- "import torch.nn as nn\n",
- "import numpy as np\n",
- "from pyannote.core import Annotation\n",
- "from pyannote.audio import Model\n",
- "from pyannote.audio.core.task import Task, Resolution\n",
- "\n",
- "# Your custom task must be a subclass of `pyannote.audio.core.task.Task`\n",
- "class SoundEventDetection(Task):\n",
- " \"\"\"Sound event detection\"\"\"\n",
- "\n",
- " def __init__(\n",
- " self,\n",
- " protocol: Protocol,\n",
- " duration: float = 5.0,\n",
- " warm_up: Union[float, Tuple[float, float]] = 0.0,\n",
- " batch_size: int = 32,\n",
- " num_workers: int = None,\n",
- " pin_memory: bool = False,\n",
- " augmentation: BaseWaveformTransform = None,\n",
- " **other_params,\n",
- " ):\n",
- "\n",
- " super().__init__(\n",
- " protocol,\n",
- " duration=duration,\n",
- " min_duration=min_duration,\n",
- " warm_up=warm_up,\n",
- " batch_size=batch_size,\n",
- " num_workers=num_workers,\n",
- " pin_memory=pin_memory,\n",
- " augmentation=augmentation,\n",
- " )\n",
- "\n",
- " def setup(self):\n",
- "\n",
- " # load metadata for training subset\n",
- " self.train_metadata_ = list()\n",
- " for training_file in self.protocol.train():\n",
- " self.training_metadata_.append({\n",
- " # path to audio file (str)\n",
- " \"audio\": training_file[\"audio\"],\n",
- " # duration of audio file (float)\n",
- " \"duration\": training_file[\"duration\"],\n",
- " # reference annotation (pyannote.core.Annotation)\n",
- " \"annotation\": training_file[\"annotation\"],\n",
- " })\n",
- "\n",
- " # gather the list of classes\n",
- " classes = set()\n",
- " for training_file in self.train_metadata_:\n",
- " classes.update(training_file[\"reference\"].labels())\n",
- " classes = sorted(classes)\n",
- "\n",
- " # specify the addressed problem\n",
- " self.specifications = Specifications(\n",
- " # it is a multi-label classification problem\n",
- " problem=Problem.MULTI_LABEL_CLASSIFICATION,\n",
- " # we expect the model to output one prediction \n",
- " # for the whole chunk\n",
- " resolution=Resolution.CHUNK,\n",
- " # the model will ingest chunks with that duration (in seconds)\n",
- " duration=self.duration,\n",
- " # human-readable names of classes\n",
- " classes=classes)\n",
- "\n",
- " # `has_validation` is True iff protocol defines a development set\n",
- " if not self.has_validation:\n",
- " return\n",
- "\n",
- " # load metadata for validation subset\n",
- " self.validation_metadata_ = list()\n",
- " for validation_file in self.protocol.development():\n",
- " self.validation_metadata_.append({\n",
- " \"audio\": validation_file[\"audio\"],\n",
- " \"num_samples\": math.floor(validation_file[\"duration\"] / self.duration),\n",
- " \"annotation\": validation_file[\"annotation\"],\n",
- " })\n",
- " \n",
- " \n",
- "\n",
- " def train__iter__(self):\n",
- " # this method generates training samples, one at a time, \"ad infinitum\". each worker \n",
- " # of the dataloader will run it, independently from other workers. pyannote.audio and\n",
- " # pytorch-lightning will take care of making batches out of it.\n",
- "\n",
- " # create worker-specific random number generator (RNG) to avoid this common bug:\n",
- " # tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects/\n",
- " rng = create_rng_for_worker(self.model.current_epoch)\n",
- "\n",
- " # load list and number of classes\n",
- " classes = self.specifications.classes\n",
- " num_classes = len(classes)\n",
- "\n",
- " # yield training samples \"ad infinitum\"\n",
- " while True:\n",
- "\n",
- " # select training file at random\n",
- " random_training_file, *_ = rng.choices(self.train_metadata_, k=1)\n",
- "\n",
- " # select one chunk at random \n",
- " random_start_time = rng.uniform(0, random_training_file[\"duration\"] - self.duration)\n",
- " random_chunk = Segment(random_start_time, random_start_time + self.duration)\n",
- "\n",
- " # load audio excerpt corresponding to random chunk\n",
- " X = self.model.audio.crop(random_training_file[\"audio\"], \n",
- " random_chunk, \n",
- " fixed=self.duration)\n",
- " \n",
- " # load labels corresponding to random chunk as {0|1} numpy array\n",
- " # y[k] = 1 means that kth class is active\n",
- " y = np.zeros((num_classes,))\n",
- " active_classes = random_training_file[\"annotation\"].crop(random_chunk).labels()\n",
- " for active_class in active_classes:\n",
- " y[classes.index(active_class)] = 1\n",
- " \n",
- " # yield training samples as a dict (use 'X' for input and 'y' for target)\n",
- " yield {'X': X, 'y': y}\n",
- "\n",
- " def train__len__(self):\n",
- " # since train__iter__ runs \"ad infinitum\", we need a way to define what an epoch is.\n",
- " # this is the purpose of this method. it outputs the number of training samples that\n",
- " # make an epoch.\n",
- "\n",
- " # we compute this number as the total duration of the training set divided by \n",
- " # duration of training chunks. we make sure that an epoch is at least one batch long,\n",
- " # or pytorch-lightning will complain\n",
- " train_duration = sum(training_file[\"duration\"] for training_file in self.train_metadata_)\n",
- " return max(self.batch_size, math.ceil(train_duration / self.duration))\n",
- "\n",
- " def val__getitem__(self, sample_idx):\n",
- "\n",
- " # load list and number of classes\n",
- " classes = self.specifications.classes\n",
- " num_classes = len(classes)\n",
- "\n",
- "\n",
- " # find which part of the validation set corresponds to sample_idx\n",
- " num_samples = np.cumsum([\n",
- " validation_file[\"num_samples\"] for validation_file in self.validation_metadata_])\n",
- " file_idx = np.where(num_samples < sample_idx)[0][0]\n",
- " validation_file = self.validation_metadata_[file_idx]\n",
- " idx = sample_idx - (num_samples[file_idx] - validation_file[\"num_samples\"]) \n",
- " chunk = SlidingWindow(start=0., duration=self.duration, step=self.duration)[idx]\n",
- "\n",
- " # load audio excerpt corresponding to current chunk\n",
- " X = self.model.audio.crop(validation_file[\"audio\"], chunk, fixed=self.duration)\n",
- "\n",
- " # load labels corresponding to random chunk as {0|1} numpy array\n",
- " # y[k] = 1 means that kth class is active\n",
- " y = np.zeros((num_classes,))\n",
- " active_classes = validaiton_file[\"annotation\"].crop(chunk).labels()\n",
- " for active_class in active_classes:\n",
- " y[classes.index(active_class)] = 1\n",
- "\n",
- " return {'X': X, 'y': y}\n",
- "\n",
- " def val__len__(self):\n",
- " return sum(validation_file[\"num_samples\"] \n",
- " for validation_file in self.validation_metadata_)\n",
- "\n",
- " # `pyannote.audio.core.task.Task` base class provides a `LightningModule.training_step` and \n",
- " # `LightningModule.validation_step` methods that rely on self.specifications to guess which \n",
- " # loss and metrics should be used. you can obviously choose to customize them. \n",
- " # More details can be found in pytorch-lightning documentation and in \n",
- " # pyannote.audio.core.task.Task source code. \n",
- "\n",
- " # def training_step(self, batch, batch_idx: int):\n",
- " # return loss\n",
- "\n",
- " # def validation_step(self, batch, batch_idx: int):\n",
- " # return metric\n",
- "\n",
- " # pyannote.audio.tasks.segmentation.mixin also provides a convenient mixin\n",
- " # for \"segmentation\" tasks (ie. with Resolution.FRAME) that already defines\n",
- " # a bunch of useful methods. \n"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3.8.5 64-bit ('pyannote-audio-v2': conda)",
- "name": "python385jvsc74a57bd0af55542e943232842f746a64555e4e006c72c98a3a863e85e6cbaf12772fa219"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+{"cells":[{"cell_type":"markdown","metadata":{"id":"W7BMj2EZlWqU"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"HG6OvaE4lWqZ"},"source":["# Defining a custom task"]},{"cell_type":"markdown","metadata":{"id":"c6LwrLYVlWqZ"},"source":["## Tutorial setup"]},{"cell_type":"markdown","metadata":{"id":"6lR9bgJBlWqb"},"source":["### `Google Colab` setup"]},{"cell_type":"markdown","metadata":{},"source":["If you are running this tutorial on `Colab`, execute the following commands to set up the `Colab` environment. These commands will install `pyannote.audio` and download a mini version of the `AMI` corpus."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":127254,"status":"ok","timestamp":1704809957597,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"6LoOS-PjlWqd","outputId":"cd92e2d4-83cc-4bb0-ad5c-824cb2ca11ac"},"outputs":[],"source":["!pip install -qq pyannote.audio==3.1.1\n","!pip install -qq ipython==7.34.0\n","!git clone https://github.com/pyannote/AMI-diarization-setup.git\n","%cd ./AMI-diarization-setup/pyannote/\n","!bash ./download_ami_mini.sh\n","%cd /content"]},{"cell_type":"markdown","metadata":{"id":"LsZTSX-ulWqf"},"source":["⚠ Restart the runtime (Runtime > Restart session)."]},{"cell_type":"markdown","metadata":{"id":"904hVjv8lWqg"},"source":["### Non `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"serWAfFxlWqh"},"source":["If you are not using `Colab`, this tutorial assumes that\n","* `pyannote.audio` has been installed\n","* the [AMI corpus](https://groups.inf.ed.ac.uk/ami/corpus/) has already been [set up for use with `pyannote`](https://github.com/pyannote/AMI-diarization-setup/tree/main/pyannote)"]},{"cell_type":"markdown","metadata":{"id":"uWyNce9FlkA3"},"source":["## Task in `pyannote.audio`"]},{"cell_type":"markdown","metadata":{"id":"BK4hbdq6lWqj"},"source":["\n","In `pyannote.audio`, a *task* 
is a combination of a **_problem_** that needs to be addressed and an **experimental protocol**.\n","\n","For example, one can address **_voice activity detection_** following the **AMI mini** experimental protocol, by instantiating the following *task*:\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"-4B8nLDmlWql"},"outputs":[],"source":["# this assumes that the AMI corpus has been set up for diarization\n","# according to https://github.com/pyannote/AMI-diarization-setup\n","\n","from pyannote.database import registry, FileFinder\n","registry.load_database(\"AMI-diarization-setup/pyannote/database.yml\")\n","ami = registry.get_protocol('AMI.SpeakerDiarization.mini',\n"," preprocessors={'audio': FileFinder()})\n","\n","# address voice activity detection\n","from pyannote.audio.tasks import VoiceActivityDetection\n","task = VoiceActivityDetection(ami)"]},{"cell_type":"markdown","metadata":{"id":"A9nxwDQGlWqn"},"source":["A growing collection of tasks is readily available in `pyannote.audio.tasks`..."]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":232,"status":"ok","timestamp":1704810010556,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"qbbDA2P5lWqp","outputId":"bf1988fb-9140-4d6a-8a2c-970578331f35"},"outputs":[{"name":"stdout","output_type":"stream","text":["SpeakerDiarization\n","VoiceActivityDetection\n","OverlappedSpeechDetection\n","MultiLabelSegmentation\n","SpeakerEmbedding\n","Segmentation\n"]}],"source":["from pyannote.audio.tasks import __all__ as TASKS; print('\\n'.join(TASKS))"]},{"cell_type":"markdown","metadata":{"id":"hihPu4iElWqr"},"source":["... but you will eventually want to use `pyannote.audio` to address a different task. 
\n","In this example, we will add a new task addressing the **sound event detection** problem.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"RZOs3C4HlWqr"},"source":["## Problem specification\n","\n","A problem is expected to be solved by a model $f$ that takes an audio chunk $X$ as input and returns its predicted solution $\\hat{y} = f(X)$.\n","\n","### Resolution\n","\n","Depending on the addressed problem, you might expect the model to output just one prediction for the whole audio chunk (`Resolution.CHUNK`) or a temporal sequence of predictions (`Resolution.FRAME`).\n","\n","In our particular case, we would like the model to provide one decision for the whole chunk:"]},{"cell_type":"code","execution_count":3,"metadata":{"executionInfo":{"elapsed":234,"status":"ok","timestamp":1704810016464,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"G96Mz8vPlWqs"},"outputs":[],"source":["from pyannote.audio.core.task import Resolution\n","resolution = Resolution.CHUNK"]},{"cell_type":"markdown","metadata":{"id":"_Efd28eclWqt"},"source":["### Type of problem\n","\n","Similarly, the type of your problem may fall into one of these generic machine learning categories:\n","* `Problem.BINARY_CLASSIFICATION` for binary classification\n","* `Problem.MONO_LABEL_CLASSIFICATION` for multi-class classification\n","* `Problem.MULTI_LABEL_CLASSIFICATION` for multi-label classification\n","* `Problem.REGRESSION` for regression\n","* `Problem.REPRESENTATION` for representation learning\n","\n","In our particular case, we would like the model to do multi-label classification because one audio chunk may contain multiple sound events:"]},{"cell_type":"code","execution_count":4,"metadata":{"executionInfo":{"elapsed":315,"status":"ok","timestamp":1704810020230,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Cl0VqB5jlWqu"},"outputs":[],"source":["from pyannote.audio.core.task import Problem\n","problem 
= Problem.MULTI_LABEL_CLASSIFICATION"]},{"cell_type":"code","execution_count":5,"metadata":{"executionInfo":{"elapsed":251,"status":"ok","timestamp":1704810021646,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Hz_B7FCplWqv"},"outputs":[],"source":["from pyannote.audio.core.task import Specifications\n","specifications = Specifications(\n"," problem=problem,\n"," resolution=resolution,\n"," duration=5.0,\n"," classes=[\"Speech\", \"Dog\", \"Cat\", \"Alarm_bell_ringing\", \"Dishes\",\n"," \"Frying\", \"Blender\", \"Running_water\", \"Vacuum_cleaner\",\n"," \"Electric_shaver_toothbrush\"],\n",")"]},{"cell_type":"markdown","metadata":{"id":"5N72ksU7lWqv"},"source":["A task is expected to be solved by a model $f$ that (usually) takes an audio chunk $X$ as input and returns its predicted solution $\\hat{y} = f(X)$.\n","\n","To help train the model $f$, the task $\\mathcal{T}$ is in charge of\n","- generating $(X, y)$ training samples using the **dataset**\n","- defining the loss function $\\mathcal{L}(y, \\hat{y})$\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"lrTD1RwUlWqw"},"outputs":[],"source":["from math import ceil\n","from typing import Dict, Optional, Tuple, Union\n","import numpy as np\n","from pyannote.core import Segment, SlidingWindow\n","from pyannote.audio.utils.random import create_rng_for_worker\n","from pyannote.audio.core.task import Task, Resolution, Problem, Specifications\n","from pyannote.database import Protocol\n","from torchmetrics.classification import MultilabelAUROC\n","\n","# Your custom task must be a subclass of `pyannote.audio.core.task.Task`\n","class SoundEventDetection(Task):\n"," \"\"\"Sound event detection\"\"\"\n","\n"," def __init__(\n"," self,\n"," protocol: Protocol,\n"," duration: float = 5.0,\n"," min_duration: float = 5.0,\n"," warm_up: Union[float, Tuple[float, float]] = 0.0,\n"," batch_size: int = 32,\n"," num_workers: Optional[int] = None,\n"," pin_memory: bool = False,\n"," augmentation = 
None,\n"," cache: Optional[Union[str, None]] = None,\n"," **other_params,\n"," ):\n","\n"," super().__init__(\n"," protocol,\n"," duration=duration,\n"," min_duration=min_duration,\n"," warm_up=warm_up,\n"," batch_size=batch_size,\n"," num_workers=num_workers,\n"," pin_memory=pin_memory,\n"," augmentation=augmentation,\n"," cache=cache,\n"," )\n"," \n"," def prepare_data(self):\n"," # this method is called to prepare data from the specified protocol. \n"," # For most tasks, calling Task.prepare_data() is sufficient. If you \n"," # need to prepare task-specific data, define a post_prepare_data method for your task.\n"," super().prepare_data()\n","\n"," def post_prepare_data(self, prepared_data: Dict):\n"," # this method is called at the end of Task.prepare_data() \n"," # to complete data preparation with task-specific data, here \n"," # the list of classes and some training metadata\n","\n"," # load metadata for training subset\n"," prepared_data[\"train_metadata\"] = list()\n"," for training_file in self.protocol.train():\n"," prepared_data[\"train_metadata\"].append({\n"," # path to audio file (str)\n"," \"audio\": training_file[\"audio\"],\n"," # duration of audio file (float)\n"," \"duration\": training_file[\"torchaudio.info\"].num_frames / training_file[\"torchaudio.info\"].sample_rate,\n"," # reference annotation (pyannote.core.Annotation)\n"," \"annotation\": training_file[\"annotation\"],\n"," })\n","\n"," # gather the list of classes\n"," classes = set()\n"," for training_file in prepared_data[\"train_metadata\"]:\n"," classes.update(training_file[\"annotation\"].labels())\n"," prepared_data[\"classes\"] = sorted(classes)\n","\n"," # `has_validation` is True if protocol defines a development set\n"," if not self.has_validation:\n"," return\n"," \n"," def prepare_validation(self, prepared_data : Dict):\n"," # this method is called at the end of Task.prepare_data(), to complete data preparation\n"," # with task validation elements\n"," \n"," # load metadata 
for validation subset\n"," prepared_data[\"validation\"] = list()\n"," for validation_file in self.protocol.development():\n"," info = validation_file[\"torchaudio.info\"]\n"," prepared_data[\"validation\"].append({\n"," \"audio\": validation_file[\"audio\"],\n"," # number of non-overlapping chunks of duration self.duration in the file\n"," \"num_samples\": int(info.num_frames / info.sample_rate // self.duration),\n"," \"annotation\": validation_file[\"annotation\"],\n"," })\n"," \n"," \n"," def setup(self, stage: Optional[Union[str, None]] = None):\n"," # this method assigns prepared data from task.prepare_data() to the task\n"," # and declares the task specifications\n","\n"," super().setup(stage)\n"," \n"," # specify the addressed problem\n"," self.specifications = Specifications(\n"," # it is a multi-label classification problem\n"," problem=Problem.MULTI_LABEL_CLASSIFICATION,\n"," # we expect the model to output one prediction \n"," # for the whole chunk\n"," resolution=Resolution.CHUNK,\n"," # the model will ingest chunks with that duration (in seconds)\n"," duration=self.duration,\n"," # human-readable names of classes\n"," classes=self.prepared_data[\"classes\"])\n"," \n"," def default_metric(self):\n"," # this method defines the default metrics used to evaluate the model during\n"," # training\n"," num_classes = len(self.specifications.classes)\n"," return MultilabelAUROC(num_classes, average=\"macro\", compute_on_cpu=True)\n","\n"," def train__iter__(self):\n"," # this method generates training samples, one at a time, \"ad infinitum\". each worker \n"," # of the dataloader will run it, independently from other workers.
pyannote.audio and\n"," # pytorch-lightning will take care of making batches out of it.\n","\n"," # create worker-specific random number generator (RNG) to avoid this common bug:\n"," # tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects/\n"," rng = create_rng_for_worker(self.model)\n","\n"," # load list and number of classes\n"," classes = self.specifications.classes\n"," num_classes = len(classes)\n","\n"," # yield training samples \"ad infinitum\"\n"," while True:\n","\n"," # select training file at random\n"," random_training_file, *_ = rng.choices(self.prepared_data[\"train_metadata\"], k=1)\n","\n"," # select one chunk at random \n"," random_start_time = rng.uniform(0, random_training_file[\"duration\"] - self.duration)\n"," random_chunk = Segment(random_start_time, random_start_time + self.duration)\n","\n"," # load audio excerpt corresponding to random chunk\n"," X = self.model.audio.crop(random_training_file[\"audio\"], \n"," random_chunk, \n"," fixed=self.duration)\n"," \n"," # load labels corresponding to random chunk as {0|1} numpy array\n"," # y[k] = 1 means that kth class is active\n"," y = np.zeros((num_classes,))\n"," active_classes = random_training_file[\"annotation\"].crop(random_chunk).labels()\n"," for active_class in active_classes:\n"," y[classes.index(active_class)] = 1\n"," \n"," # yield training samples as a dict (use 'X' for input and 'y' for target)\n"," yield {'X': X, 'y': y}\n","\n"," def train__len__(self):\n"," # since train__iter__ runs \"ad infinitum\", we need a way to define what an epoch is.\n"," # this is the purpose of this method. it outputs the number of training samples that\n"," # make an epoch.\n","\n"," # we compute this number as the total duration of the training set divided by \n"," # duration of training chunks. 
we make sure that an epoch is at least one batch long,\n"," # or pytorch-lightning will complain\n"," train_duration = sum(training_file[\"duration\"] for training_file in self.prepared_data[\"train_metadata\"])\n"," return max(self.batch_size, ceil(train_duration / self.duration))\n","\n"," def val__getitem__(self, sample_idx):\n","\n"," # load list and number of classes\n"," classes = self.specifications.classes\n"," num_classes = len(classes)\n","\n","\n"," # find which part of the validation set corresponds to sample_idx:\n"," # first file whose cumulative number of chunks exceeds sample_idx\n"," num_samples = np.cumsum([\n"," validation_file[\"num_samples\"] for validation_file in self.prepared_data[\"validation\"]])\n"," file_idx = np.where(num_samples > sample_idx)[0][0]\n"," validation_file = self.prepared_data[\"validation\"][file_idx]\n"," idx = sample_idx - (num_samples[file_idx] - validation_file[\"num_samples\"]) \n"," chunk = SlidingWindow(start=0., duration=self.duration, step=self.duration)[idx]\n","\n"," # load audio excerpt corresponding to current chunk\n"," X = self.model.audio.crop(validation_file[\"audio\"], chunk, fixed=self.duration)\n","\n"," # load labels corresponding to current chunk as {0|1} numpy array\n"," # y[k] = 1 means that kth class is active\n"," y = np.zeros((num_classes,))\n"," active_classes = validation_file[\"annotation\"].crop(chunk).labels()\n"," for active_class in active_classes:\n"," y[classes.index(active_class)] = 1\n","\n"," return {'X': X, 'y': y}\n","\n"," def val__len__(self):\n"," return sum(validation_file[\"num_samples\"] \n"," for validation_file in self.prepared_data[\"validation\"])\n","\n"," # `pyannote.audio.core.task.Task` base class provides `LightningModule.training_step` and \n"," # `LightningModule.validation_step` methods that rely on self.specifications to guess which \n"," # loss and metrics should be used. you can obviously choose to customize them. 
\n"," # More details can be found in pytorch-lightning documentation and in \n"," # pyannote.audio.core.task.Task source code. \n","\n"," # def training_step(self, batch, batch_idx: int):\n"," # return loss\n","\n"," # def validation_step(self, batch, batch_idx: int):\n"," # return metric\n","\n"," # pyannote.audio.tasks.segmentation.mixins also provides a convenient mixin\n"," # for \"segmentation\" tasks (i.e. with Resolution.FRAME) that already defines\n"," # a bunch of useful methods. You can use it by inheriting your task from\n"," # pyannote.audio.tasks.segmentation.mixins.SegmentationTask"]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"}},"nbformat":4,"nbformat_minor":0}
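The validation bookkeeping in `val__getitem__` / `val__len__` expects each file's `num_samples` to count the non-overlapping `self.duration`-long chunks it contains. It then maps a global sample index back to a (file, chunk) pair through cumulative counts. The standalone sketch below reproduces only that arithmetic with NumPy, so it can be checked outside a training run. `chunks_per_file` and `locate_chunk` are hypothetical helper names, not part of `pyannote.audio`. `np.searchsorted(..., side="right")` is used as an equivalent of picking the first file whose cumulative count exceeds the index.

```python
import numpy as np


def chunks_per_file(num_frames, sample_rate, chunk_duration):
    """Number of full, non-overlapping `chunk_duration`-long chunks in each file.

    `num_frames` and `sample_rate` mirror the attributes of torchaudio's
    AudioMetaData; here they are plain numbers or arrays (assumption).
    """
    durations = np.asarray(num_frames, dtype=float) / np.asarray(sample_rate, dtype=float)
    return np.floor(durations / chunk_duration).astype(int)


def locate_chunk(sample_idx, num_samples, chunk_duration):
    """Map a global validation index to (file index, chunk index, start, end).

    `num_samples[i]` is the number of chunks in file i; start/end are in seconds.
    """
    num_samples = np.asarray(num_samples)
    cumulative = np.cumsum(num_samples)
    total = int(cumulative[-1])
    if not 0 <= sample_idx < total:
        raise IndexError(f"sample_idx {sample_idx} not in [0, {total})")
    # first file whose cumulative chunk count is strictly greater than sample_idx
    # (same result as np.where(cumulative > sample_idx)[0][0]); files with zero
    # chunks are skipped automatically
    file_idx = int(np.searchsorted(cumulative, sample_idx, side="right"))
    # number of chunks contained in all files before file_idx
    offset = int(cumulative[file_idx] - num_samples[file_idx])
    chunk_idx = sample_idx - offset
    start = chunk_idx * chunk_duration
    return file_idx, chunk_idx, start, start + chunk_duration


# three 16 kHz files lasting 15 s, 4 s and 10 s, cut into 5 s chunks
n = chunks_per_file([240_000, 64_000, 160_000], 16_000, 5.0)
print(n)                        # [3 0 2]
print(locate_chunk(4, n, 5.0))  # (2, 1, 5.0, 10.0)
```

In this example the 4 s file is too short to hold a single chunk. It contributes nothing to the total, so global index 4 lands on the second chunk (5 s to 10 s) of the third file.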
diff --git a/tutorials/applying_a_model.ipynb b/tutorials/applying_a_model.ipynb
index e035d0654..68524eb57 100644
--- a/tutorials/applying_a_model.ipynb
+++ b/tutorials/applying_a_model.ipynb
@@ -1,437 +1 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "# preparing notebook for visualization purposes\n",
- "# (only show outputs between t=0s and t=30s)\n",
- "from pyannote.core import notebook, Segment\n",
- "notebook.crop = Segment(0, 30)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Applying a pretrained model\n",
- "\n",
- "In this tutorial, you will learn how to apply `pyannote.audio` models on an audio file, whose manual annotation is depicted below"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "# clone pyannote-audio Github repository and update ROOT_DIR accordingly\n",
- "ROOT_DIR = \"/Users/hbredin/Development/pyannote/pyannote-audio\"\n",
- "AUDIO_FILE = f\"{ROOT_DIR}/tutorials/assets/sample.wav\""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from pyannote.database.util import load_rttm\n",
- "REFERENCE = f\"{ROOT_DIR}/tutorials/assets/sample.rttm\"\n",
- "reference = load_rttm(REFERENCE)[\"sample\"]\n",
- "reference"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Loading models from 🤗 hub\n",
- "\n",
- "A bunch of pretrained models are available on [🤗 Huggingface model hub](https://hf.co/models?other=pyannote-audio-model) and can be listed by looking for the [`pyannote-audio-model`](https://hf.co/models?other=pyannote-audio-model) tag."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['pyannote/Segmentation-PyanNet-DIHARD',\n",
- " 'pyannote/TestModelForContinuousIntegration',\n",
- " 'pyannote/embedding',\n",
- " 'pyannote/segmentation',\n",
- " 'pyannote/brouhaha']"
- ]
- },
- "execution_count": 16,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from huggingface_hub import HfApi\n",
- "available_models = [m.modelId for m in HfApi().list_models(filter=\"pyannote-audio-model\")]\n",
- "list(filter(lambda p: p.startswith(\"pyannote/\"), available_models))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Official [pyannote.audio](https://github.com/pyannote/pyannote-audio) models (i.e. those under the [`pyannote` organization](https://hf.co/pyannote) umbrella) are open-source, but gated. It means that you have to first accept users conditions on their respective Huggingface page to access the pretrained weights and hyper-parameters. Despite this initial process, those models can perfectly be downloaded for later offline use: keep reading this tutorial until the end to learn how to do that.\n",
- "\n",
- "For instance, to load the speaker segmentation model used in this tutorial, you have to visit [hf.co/pyannote/segmentation](https://hf.co/pyannote/segmentation), accept the terms, and log in using `notebook_login` below:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "1b7ac613e9e841c8903dc4932e183006",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "VBox(children=(HTML(value=' , resolution=, duration=5.0, warm_up=(0.0, 0.0), classes=['speaker#1', 'speaker#2', 'speaker#3'], permutation_invariant=True)"
- ]
- },
- "execution_count": 8,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "specs = model.specifications\n",
- "specs"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "... which can be understood like that:\n",
- "\n",
- "* `duration = 5.0`: the model ingests 5s-long audio chunks\n",
- "* `Resolution.FRAME` and `len(classes) == 3`: the model output a sequence of frame-wise 3-dimensoinal scores\n",
- "* `Problem.MULTI_LABEL_CLASSIFICATION` for each frame, more than one speaker can be active at once"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To apply the model on the audio file, we wrap it into an `Inference` instance:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 9,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from pyannote.audio import Inference\n",
- "inference = Inference(model, step=2.5)\n",
- "output = inference(AUDIO_FILE)\n",
- "output"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For each of the 11 positions of the 5s window, the model outputs a 3-dimensional vector every 16ms (293 frames for 5 seconds), corresponding to the probabilities that each of (up to) 3 speakers is active. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "(11, 293, 3)"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "output.data.shape"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Processing a file from memory\n",
- "\n",
- "In case the audio file is not stored on disk, pipelines can also process audio provided as a `{\"waveform\": ..., \"sample_rate\": ...}` dictionary. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "type(waveform)=\n",
- "waveform.shape=torch.Size([1, 480000])\n",
- "waveform.dtype=torch.float32\n"
- ]
- }
- ],
- "source": [
- "import torchaudio\n",
- "waveform, sample_rate = torchaudio.load(AUDIO_FILE)\n",
- "\n",
- "print(f\"{type(waveform)=}\")\n",
- "print(f\"{waveform.shape=}\")\n",
- "print(f\"{waveform.dtype=}\")\n",
- "\n",
- "audio_in_memory = {\"waveform\": waveform, \"sample_rate\": sample_rate}"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 12,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "output = inference(audio_in_memory)\n",
- "output"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Processing part of a file\n",
- "\n",
- "If needed, `Inference` can be used to process only part of a file:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 13,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from pyannote.core import Segment\n",
- "output = inference.crop(AUDIO_FILE, Segment(10, 20))\n",
- "output"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Offline use\n",
- "\n",
- "Gating models allows [me](https://herve.niderb.fr) to know a bit more about the `pyannote.audio` user base and eventually helps me write grant proposals to make `pyannote.audio` even better. Please fill in this form as precisely as possible. \n",
- "\n",
- "For instance, before gating `pyannote/segmentation`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time-consuming.\n",
- "\n",
- "That being said: this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production).\n",
- "\n",
- "* Step 1: download the `pytorch_model.bin` model\n",
- "\n",
- "![](assets/download-model.png)\n",
- "\n",
- "* Step 2: load the model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "metadata": {},
- "outputs": [],
- "source": [
- "# look ma: no hands! \n",
- "offline_model = Model.from_pretrained(\"pytorch_model.bin\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [],
- "source": [
- "# just checking weights are the same...\n",
- "import torch\n",
- "for weights, offline_weights in zip(model.parameters(), offline_model.parameters()):\n",
- " assert torch.equal(weights, offline_weights)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3.9.13 ('pyannote-mps')",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.13"
- },
- "vscode": {
- "interpreter": {
- "hash": "36a3a48a52702f18671693adf589423ec3f7db45d50f6ee539f1b0696bb58d43"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+{"cells":[{"cell_type":"markdown","metadata":{"id":"rkz0m90MTNdU"},"source":["****"]},{"cell_type":"markdown","metadata":{"id":"vDBFFNeGWF0v"},"source":["# Applying a pretrained model\n","\n","In this tutorial, you will learn how to apply `pyannote.audio` models on an audio file, whose manual annotation is depicted below"]},{"cell_type":"markdown","metadata":{"id":"DcXUj3lVTkWz"},"source":["## Tutorial setup"]},{"cell_type":"code","execution_count":3,"metadata":{"executionInfo":{"elapsed":221,"status":"ok","timestamp":1704806538788,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"p7uwJtF1W0Od"},"outputs":[],"source":["# preparing notebook for visualization purposes\n","# (only show outputs between t=0s and t=30s)\n","from pyannote.core import notebook, Segment\n","notebook.crop = Segment(0, 30)"]},{"cell_type":"markdown","metadata":{"id":"yr4ONuV9Tlgj"},"source":["### `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"OaXXRLf5Tp2D"},"source":["If you are running this tutorial on `Colab`, execute the following commands to set up the `Colab` environment. 
These commands will install `pyannote.audio` and download the resources used in this tutorial."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"wQub0z0VTzpU"},"outputs":[],"source":["!pip install -qq pyannote.audio==3.1.1\n","!pip install -qq ipython==7.34.0\n","!wget -q \"https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav\"\n","!wget -q \"https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.rttm\"\n","!wget -q -P ./assets/ \"https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/download-model.png\""]},{"cell_type":"markdown","metadata":{"id":"4Qy1iMvGVaHF"},"source":["⚠ Restart the runtime (Runtime > Restart session)."]},{"cell_type":"code","execution_count":18,"metadata":{"executionInfo":{"elapsed":203,"status":"ok","timestamp":1704807063824,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"vHwJOO-sUdF2"},"outputs":[],"source":["AUDIO_FILE = \"sample.wav\"\n","REFERENCE = \"sample.rttm\""]},{"cell_type":"markdown","metadata":{"id":"CUmGkiY-V-wI"},"source":["### Non `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"_E4buYUQWXE5"},"source":["If you are not using Colab, clone the `pyannote.audio` [GitHub repository](https://github.com/pyannote/pyannote-audio) and update `ROOT_DIR` accordingly."]},{"cell_type":"code","execution_count":2,"metadata":{"executionInfo":{"elapsed":232,"status":"ok","timestamp":1704806051725,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Wii7pyvSTIa1"},"outputs":[],"source":["# clone pyannote-audio GitHub repository and update ROOT_DIR accordingly\n","ROOT_DIR = \"/pyannote-audio\"\n","AUDIO_FILE = f\"{ROOT_DIR}/tutorials/assets/sample.wav\"\n","REFERENCE = f\"{ROOT_DIR}/tutorials/assets/sample.rttm\""]},{"cell_type":"markdown","metadata":{"id":"o01jf1VTXC8u"},"source":["## References\n","\n","First, let's take a look at the reference annotation used in this 
tutorial. It can be accessed as follows:"]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":259},"executionInfo":{"elapsed":542,"status":"ok","timestamp":1704807074571,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Q4hYAMEiTIa3","outputId":"9fecf73e-2448-4a4e-93e4-f35d2ec4da8b"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":4,"metadata":{},"output_type":"execute_result"}],"source":["from pyannote.database.util import load_rttm\n","\n","reference = load_rttm(REFERENCE)[\"sample\"]\n","reference"]},{"cell_type":"markdown","metadata":{"id":"3cpR2brxTIa6"},"source":["## Loading models from 🤗 hub\n","\n","A bunch of pretrained models are available on the [🤗 Hugging Face model hub](https://hf.co/models?other=pyannote-audio-model) and can be listed by looking for the [`pyannote-audio-model`](https://hf.co/models?other=pyannote-audio-model) tag."]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":1201,"status":"ok","timestamp":1704807077972,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"OXNju-sFTIa7","outputId":"6c587fd6-f14f-4162-81ba-0609153aa7f2"},"outputs":[{"data":{"text/plain":["['pyannote/TestModelForContinuousIntegration',\n"," 'pyannote/embedding',\n"," 'pyannote/segmentation',\n"," 'pyannote/brouhaha',\n"," 'pyannote/segmentation-3.0',\n"," 'pyannote/wespeaker-voxceleb-resnet34-LM']"]},"execution_count":5,"metadata":{},"output_type":"execute_result"}],"source":["from huggingface_hub import HfApi\n","available_models = [m.modelId for m in HfApi().list_models(filter=\"pyannote-audio-model\")]\n","list(filter(lambda p: p.startswith(\"pyannote/\"), available_models))"]},{"cell_type":"markdown","metadata":{"id":"LrQ2ykU4TIa-"},"source":["Official [pyannote.audio](https://github.com/pyannote/pyannote-audio) models (i.e. 
those under the [`pyannote` organization](https://hf.co/pyannote) umbrella) are open-source, but gated. This means that you first have to accept the user conditions on their respective Hugging Face page to access the pretrained weights and hyper-parameters. Despite this initial step, those models can still be downloaded for later offline use: keep reading this tutorial until the end to learn how to do that.\n","\n","For instance, to load the speaker segmentation model used in this tutorial, you have to visit [hf.co/pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0), accept the terms, and log in using `notebook_login` below:"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":145,"referenced_widgets":["a73cecb8608f491e8f996c5114df4d0f","bc10c6804346449b8e349728a1acb8d2","73915caf6d1042ca9a27853986217ae2","2d1965df902b4b7eb6e4e28e44d6e32b","b2e03b10050d46548932f429bcc18d81","0c537fef94bd4a3d8c42d981a70ea35b","79d55946b3764666a0b57a72722f6c19","988c83c85f7d44d0be949de6eedcf977","53919e13e42441d9b8cf5eb4f2effe96","47da80658de14604a7cc5268b951c172","c7fbfffbbc304f0ba025e03beb8f638a","416ffb7703b1404ab1c19c6b1ecafb4c","5be7421c575d41128d0675dc9abbc196","8b3353af24074ce5b5d2d9b162c61062","cd40e31c191044b7a0f8fa4694ad6380","ecbb880cf0044404ab9bfc5995656807","79287ffea49a4e4a80191b1157de15cc","0e04a45510f346b8ab3c465b10ccdfd2","bf5c88fe90f0462b95aa8a81467c1e9d","09ee64808a5c431b928066a4cb287a78","52a4a1aab39741f2a0499428f07a4c25","f23cc133109c4ecb9b383f118bcc71aa","2b91f4b11d5848c0b6795b649bc2d7bc","b437981acb894f97ad45daaebe2734c7","01a79756e1104be0bcbe1d233677d605","990375a830eb4cd79c69ad78b9ac685b","49cb317b141a43b7b8ce342e2d62ab03","abc23089f4ef4d2a9f8a104fb1dba14a","57cf64e9c70e4057a76c2f3fb1e9b191","5e90e4818f1f43f2aa4714e0e40dc15d","fb027db34fd740c2a4758d650ae3f73c","1a30ac8b519948bbad72886049b53d55"]},"executionInfo":{"elapsed":215,"status":"ok","timestamp":1704807086866,"user":{"displayName":"Clé
ment PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"x9iQwHjfTIa_","outputId":"b5b3c597-4c2e-4306-e6a5-9ccc9ed19d08"},"outputs":[],"source":["from huggingface_hub import notebook_login\n","notebook_login()"]},{"cell_type":"markdown","metadata":{"id":"fUvB3kwwTIbA"},"source":["Once authenticated, you can load the model..."]},{"cell_type":"code","execution_count":null,"metadata":{"executionInfo":{"elapsed":3254,"status":"ok","timestamp":1704807117456,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"IzYUeS99TIbE"},"outputs":[],"source":["from pyannote.audio import Model\n","model = Model.from_pretrained(\"pyannote/segmentation-3.0\", use_auth_token=True)"]},{"cell_type":"markdown","metadata":{"id":"UxuGtfPoTIbF"},"source":["... which consists of SincNet feature extraction (`sincnet`), LSTM sequence modeling (`lstm`), a few feed-forward layers (`linear`), and a final multi-class `classifier`:"]},{"cell_type":"code","execution_count":8,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":223,"status":"ok","timestamp":1704807124564,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"BQ2RDdqUTIbH","outputId":"6caf82ff-7a19-401d-ad79-8350315ec9de"},"outputs":[{"data":{"text/plain":[" | Name | Type | Params | In sizes | Out sizes \n","---------------------------------------------------------------------------------------------------------\n","0 | sincnet | SincNet | 42.6 K | [1, 1, 160000] | [1, 60, 589] \n","1 | lstm | LSTM | 1.4 M | [1, 589, 60] | [[1, 589, 256], [[8, 1, 128], [8, 1, 128]]]\n","2 | linear | ModuleList | 49.4 K | ? | ? 
\n","3 | classifier | Linear | 903 | [1, 589, 128] | [1, 589, 7] \n","4 | activation | LogSoftmax | 0 | [1, 589, 7] | [1, 589, 7] \n","---------------------------------------------------------------------------------------------------------\n","1.5 M Trainable params\n","0 Non-trainable params\n","1.5 M Total params\n","5.893 Total estimated model params size (MB)"]},"execution_count":8,"metadata":{},"output_type":"execute_result"}],"source":["from pytorch_lightning.utilities.model_summary import summarize\n","\n","summarize(model)"]},{"cell_type":"markdown","metadata":{"id":"ahfL4saFTIbI"},"source":["More details about the model are provided by its specifications..."]},{"cell_type":"code","execution_count":9,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":195,"status":"ok","timestamp":1704807127275,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"B0OFti5ITIbJ","outputId":"c2d74d99-7727-4f36-9a0f-4ec2aa76bd25"},"outputs":[{"data":{"text/plain":["Specifications(problem=, resolution=, duration=10.0, min_duration=None, warm_up=(0.0, 0.0), classes=['speaker#1', 'speaker#2', 'speaker#3'], powerset_max_classes=2, permutation_invariant=True)"]},"execution_count":9,"metadata":{},"output_type":"execute_result"}],"source":["specs = model.specifications\n","specs"]},{"cell_type":"markdown","metadata":{"id":"kP5sRrrdTIbK"},"source":["... 
which can be understood as follows:\n","\n","* `duration = 10.0`: the model ingests 10s-long audio chunks\n","* `Resolution.FRAME`: the model outputs a sequence of frame-wise scores\n","* `len(classes) = 3`: the model handles chunks with up to 3 speakers\n","* `powerset_max_classes = 2`: at most 2 speakers can talk at the same time (overlapped speech)\n","The previous two specifications give the classes that the model can predict: {no speech}, {spk1}, {spk2}, {spk3}, {spk1, spk2}, {spk1, spk3}, {spk2, spk3}, so a total of 7 classes.\n","More details about powerset classification can be found in the article `A. Plaquet and H. Bredin, “Powerset multi-class cross entropy loss for neural speaker diarization,” 2023.`, available [here](https://arxiv.org/abs/2310.13025).\n","* `Problem.MONO_LABEL_CLASSIFICATION`: the model assigns exactly one class to each time frame"]},{"cell_type":"markdown","metadata":{"id":"wTY4-nT2TIbM"},"source":["To apply the model to the audio file, we wrap it in an `Inference` instance:"]},{"cell_type":"code","execution_count":10,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":657},"executionInfo":{"elapsed":1614,"status":"ok","timestamp":1704807131091,"user":{"displayName":"Clément 
PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"vKcaevu8TIbN","outputId":"e020fdb9-3c92-4f76-fcec-5053c1842d19"},"outputs":[{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAABi0AAAKACAYAAADgsjvAAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB4q0lEQVR4nO3de7xkWV0Y+t+uqu6enuk+PdPzpGea91NgAFFeCiIij0QUFSOYGOAaNAQxKCaKVxm9mphoEh9XwahXMTGgJgqJ5qJR5HGJ+ODNgLwGcBp6BpgZpk+/prurat8/Tu3ddbrPOd1VtVftVXW+389Hpzl9zt6ru3ftvfb6rd/vV5RlWQYAAAAAAEDLOm0PAAAAAAAAIELQAgAAAAAAyISgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFnoTfNDw+EwDh8+HHv37o2iKJoeEwAAAAAAsEDKsoyjR4/GgQMHotOZPl9iqqDF4cOH4+DBg1OfFAAAAAAAWD6HDh2KG264YeqfnyposXfv3vrkKysrU58cAAAAAABYfKurq3Hw4ME6fjCtqYIWVUmolZUVQQsAAAAAACAiYuaWEhpxAwAAAAAAWRC0AAAAAAAAsiBoAQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALLQa3sAAGzg5j+I+PQ72h4FXJyn/WjEZVe1PQpa8lN/+VMxLIdtD2N7uetTEUdvT3LoTkR8w87r4tE7Lk9yfLhoRRHxZd8Ucf+nrvvy//zU/4z3fP49EYPTEbd/KKJ/Kv1Y9lwTceUD05+HbadbdONbHvQt8bArH3bxPzQ4E/HOn49Y/VyycU3t3k+MeNS3tz0KAJaAoAVAbgZnIt74PWsv47AIvuqfC1psY7//8d+Pftlvexg06MOrn4o3HP5828OAiE+9LeL73lf/z3v698SPvvNH53/POfW5iDvfd+HvgykcOnYofuXpv3LxP/CZd0a89afSDWgW5VDQAoBGCFoA5GZw+mzA4in/IqK7s93xwIXsvrztEdCilz76pTIt5unMPRHv/Pdrv77vkyOK5qq9Hu4fizceuyWO77k64mv/SWPHhYkd+0LE3/xaxKlj6758anCqDli8dP9XROeWP4u47JqIaybYpT6pT78jIsqIJ/3ziF170p2HbedTd38q3vyZN8eJMycm+8HTo8/Fyg0Rj31h8wObxXU3tj0CAJaEoAVAboaDs79+8g9G7LikvbEAXMB33/jdbQ9hezl+Z8Qf3bT262/87YhOc0GL937+vfHGP35hDHdfEfE1/7Kx48LEvvC3a0GLcrDuy+MB0n+6/zHRec8fRNz72RHf9KvpxvJ/XRUxPBPxkBdE7Ls+3XnYdv781j+PN3/mzTE45zq/oOpd4fJ7u1cDsLQ04gbIzXCs5EGn2944AMjPumdEs1P5zihroz9U7ouWFaP5zznX4vjibqf6dZF4rtTZeCwwq+7o2h0MJw1ajK5F7wkALDFBC4DcjJdZSf0iDsBiqRZqO80nTPdGx1Tui9bVgYL112K1uNsremd3m6deuK0+a5PuhocL6I6u3YnvudX3C1oAsMQELQByU++2KhrfRQvAghum211eZVpMXKoEmlYtxm5SHqpTdOa3cFtsHECBWU19z034HACAXFgNA8hNOaedgwAsnoTPiKpUiUwLWlcHCtYv5laLu91Od34Lt9UGEsE8Gjb1Pde7AgDbgKAFQG7sngJgM3PItBC0oHUXlWkxp4XbTQIoMCuZFgCwO
UELgNzYPQXAZuqSOM1P4+umsHaU07YLZFp0is4cMy02DqDArGRaAMDmBC0AcmP3FACbGfbX/puyp4Ud5bStXowt1/WSqK7NbtE9+1mYW6ZFP+152Haqe25/0murflewnAPA8vKUA8hN9SKiCTcA5xqm72kh04LWjS/Gjl2PdU+Lons26yj1wm1HI27SmDrTon4O9BoeEQDkw4oYQG5KLyIAbCLhM6Lb0YibTIxf32OZP9W1uZZpMaf5kvJQJFLdcycOFCsPBcA2IGgBkBvloQDYzBwaccu0oHXji7Hl+UGLTkcjbhbfzJkW3hUAWGKCFgC5sXsKgM3MoRG3TAtaN74YO9ykPJRG3Cy4KlCsETcAnE/QAiA3dk8BsJk5ZFoMy2GUZdn48eGiXSjTopBpweKbuo+QRtwAbAOecgC50YgbgM0kXKjtjgVCZFvQqnWZFmevxf6wHxHnZlrMqRG3TAsaVgeKJ23yPpRpAcDysyIGkJtSpgUAmxgt2ibJtBgLlutrQavGN25U13xs1og7dabFaCwyLWhYFSjul/0LfOc56uB14ib0ANAiQQuA3Ay9iACwiYTPiF5x9piCFrSuusbL83tarG/EnXi+VB1f0IKGdTsacQPAZgQtAHKjuR4AmynTlRDsjJXZUR6K1m3QS2LDTAuNuFlQ1T134iCxdwUAtgFBC4Dc2D0FwGaq2ucJnhHjPS1kWtC6DYIFdaaFRtwsgeqeO32mheUcAJaXpxxAbhLuogVgwSVcqF2XaTFpY1ho2kVnWmjEzWKqG3GXwyjL8uJ/sApyyLQAYIlZEQPIjUwLADaT8BkxHrSQaUHrqs0bY7vQB8PxTIs5LdxqxE0i49ltE2VbVM3pvSsAsMQELQByM1SnFoBNVItVCZ4RRVFMX2MdmlZnWvTrL1XX5VqmxZwWbjvKQ5FGpzNloHg4pyb0ANAiQQuA3JQyLQDYROJnxHi5EmjVBsGCujxUpzu/TR6F8lCkMXUfIY24AdgGBC0AcmP3FACbGaYtidMr1p49Mi1oXTUPumAj7sTzper4Mi1o2PTloWxwAmD5CVoA5MbuKQA2k/gZUWdaaMRN2y66EfecykMJ5NGw2TMtLOcAsLw85QByUy0UFW7RAJwj8UJttYgm04LWbdSIu9yoEXfi+ZJG3CTSGZvrTxQort8VbHACYHlZEQPITcImqwAsuNSZFh09LcjEBpkWg+F4I+55Z1r4TNCs8aDFZI24vSsAsPwELQByoxE3AJuRacF2sUFZpuq67Bbd+ZXT3CB4Ak0oiuJsSb5JgmLz6ucCAC0StADIzVBPCwA2Ue+wTTONrxbQBC1oXR0s6NdfqntadLpnvz6vTIuxcUBTprrnasQNwDYgaAGQG7unANhMXcc/zTNCpgXZqK7x4fmZFp2iM7bJI/F8qTq+zwQJ9Iq160sjbgBYz1MOIDf17im3aADOMafyUBM1hYUU6kbcZxdzq0yLtUbcc1q4VR6KhOryUBpxA8A6VsQAclPvovUiAsA5UjfiVh6KXNTBgrOLuXV5qKI7v4XbDYIn0JSpstvm1c8FAFokaAGQm3nVaAZg8aTOtBgtgk3UFBZS2KARd380R1qfaTGvRtw+EzSv05miEbd3BQC2AUELgNxoxA3AZhKXxJFpQTY2KMtULez2Or35NSPeIHgCTZkq02Je/VwAoEWCFgC5Kef0Eg7A4klcEqfuaSHTgrZtECxY14h77pkWghY0r+5pMck9V3koALYBQQuA3Azn1FgSgMVTlQVJ3dPCAi1tq4MF/fpL63tazKlETuf8cUBTqntuv5zg+qqD194VAFhennIAuakbcUv5BuAcZdqyIFOVKoEUOuf3kliXaVF9PfVuc+WhSKhXrN3Lh5P0TJFpAcA2IGgBkJt51WgGYPGkbsStPBS52CBYsC7TQiNulsBUfYS8KwCwDQhaAOQmcekPABZY4oXaTkcjbjKxQS+JqmzZWqaFRtwsvm5nikCxdwUAtgFBC4Dca
MQNwGbqhdo003iZFmRji0bc3U4bmRaCFjRvqkyLxGUCASAHghYAuRmqUwvAJsq0dfynWkCDFDYIFqxvxC3TgsU3VaBYI24AtgFPOYDclF5EANhEVRYkcU+LgV3ltG1Uqqy+5uPcRtzzzrTob/19MIU6UDzJPVcjbgC2AStiALkZSvkGYBOJnxF10MKuctpWXeNjO9BbacTd0YibdKa652rEDcA2IGgBkBu7pwDYzJwacetpQes2asS9UaaF8lAssKnKQ3lXAGAbELQAyE3i0h8ALDCNuNkuNmrEPbr+55ppoRE3CVWB4skyLbwrALD8BC0AcqMRNwCb0Yib7WKLTItuRyNulsNMjbiVkgVgiQlaAOSmTLuLFoAFlnihVqYF2agacY8FC+qeFtGJiHL0fTItWFxTBYrrLCPvCgAsL085gNwM0+6iBWCBpe5pIdOCXBTnN8Cue1pEMfZ9iV9pNwieQFOmy7TQiBuA5SdoAZCb0osIAJtIXMu8WkAb2FVO26rAXHXNx9mF3c5G35eKTAsSqgLF/bHr/II04gZgGxC0AMhN3dNCnVoAzpG471F3dFyZFrSumgdtVB6q2OD7Uo9D0IIEqnuuTAsAWE/QAiA31U4ru6cAOFfiRtx6WpCNDTIcqt3onbI4//tS0YibhGYqD+VdAYAlJmgBkBuNuAHYTOIdtnpakI0NggXVwm6vKM7/vlSUhyKh2RpxC1oAsLysiAHkRiNuADaTeLFKpgXZqDZvjAULNm7ELdOCxaURNwBsTNACIDcacQOwGZkWbBd1sODsYm7d02I8aNFJ/EpbB08E8mieTAsA2JigBUBu1KkFYDOJ+x7VC2hK4dC2uixTv/7S2UyLcv33pNQ5fxzQlCrTYqJ7rkwLALYBQQuA3NS7p3rtjgOA/CRuxN0bPXuUh6J11TxoeH5Pi+653zOPccg+IoHu6F5+0ZkWZSnTAoBtQdACIDfVTj67pwA4l/JQbBcb9JKodqPXPS3msWirETcJTdzTYvz7vCsAsMQELQByUzfidosG4BwacbNdbNGIu77651keSiCPBCYOFI8Hz2RaALDErIgB5EYjbgA2I9OC7eJiGnHPY4OHRtwkNHmmhaAFANuDoAVAbjTiBmAzMi3YLjYoy1Q34i7P+Z6UZFqQ0EyZFjY4AbDEBC0AciPTAoDNzCvTQv1+2rZBsOD8Rtx6WrDYqnuuTAsAWE/QAiA3VSNuLyIAnKvOxkszja8yLZSHonV1sKBff6k/+nUnyvXfk1Ln/HFAU+p77sUGxWRaALBNCFoA5KZuxO1FBIBz1OWhekkO3+0oD0UmOudnONSZFlV5qESfg/XjGJ1DII8EqnuuRtwAsJ6gBUBulIcCYDMacbNdZNOIuwqeCOTRvKkbcRediKJINCoAaJ+gBUBuNOIGYDMacbNdbNGIuzvX8lCjV2aBPBKYuhG3zU0ALDlBC4DcyLQAYDMyLdgutmjE3Tn3e1LSiJuEps60sLkJgCUnaAGQG5kWAGymWthK3Ih7qBQObdsq06JsoRG3QB4J1IHiSRtx29wEwJITtADIjZcRADYz7K/9N3GmRb/sJzk+XLS6EffZa7Fa2O3Uwbt5Zlr4TNC8KlA8cXkom5sAWHKCFgC5kfYNwGbqBateksP3RsfV04LWbdCI+2xPi3O+J+k4Rp812Uck0O0oDwUAG5npbefI2342yssuaWosAERE79SRKIoi7jl9KsoTR9seDlzQFZdcFp1EpWrIW1mWcebt/37doiJp7Th9LIqIODmIKE83v/O7P1oPu+XuT8Vr3vcfGz8+XKzii38bvX0rUZ78TAzf9J0REXHi1JGIiCg//qcRETGITpxK8DlYN45BGbsjouyfjDNv/Zmk52L7KY/cHBERH/7MW+NXvnDLhX/g9PHo7luJcseu6Gd4j37QFQ+Op9/nqVEURdtDAWDBFWVZFQS9eKurq7Fv37542GsfFt3dIvwAsJ297dv+Iq68dG/bw6AFJ073Y
8e/uiZ2FGq9z9tX3fML8bm4uvHj9lbeH7uv/53GjwtN+l+3fi7uNRjEXw4fFs8//WNJz3VVHIl3X/LSpOdg+3r93j3x01ftb3sYjTn9pcfF+/7Zf4xLd6bJBgQgf1Xc4MiRI7GysjL1cTxJAACY2n8dfE10QqbFPH20vHeSgEVERP/ow+LUHV8bRU+mH+3qRBlfVvxd7I5T675+7amd8fZT949hdOL3Bk9NPo47Yl/8n2f+j3hk8ank52L7OXVkEE/o3B0nu5M9Rz9bXh23RX7BjsHJ+7Q9BACWxEyZFp++7bOxd4aICQCw+JSH2r7KsoyTZ2RZAABrdu/oKg8FsI1lkWmx/9K9saIcBADAtlQUhRIQAAAANMq2SAAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALghYAAAAAAEAWetP8UFmWERGxurra6GAAAAAAAIDFU8ULqvjBtKYKWtx5550REXHw4MGZTg4AAAAAACyPo0ePxr59+6b++amCFvv374+IiFtvvXWmkwOLa3V1NQ4ePBiHDh2KlZWVtocDtMS9AHAfANwHAPcBoLoPfOQjH4kDBw7MdKypghadzlorjH379rkRwTa3srLiPgC4FwDuA4D7AOA+AMT1119fxw+mpRE3AAAAAACQBUELAAAAAAAgC1MFLXbt2hU33XRT7Nq1q+nxAAvCfQCIcC8A3AcA9wHAfQBo9j5QlGVZNjAmAAAAAACAmSgPBQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALvWl+aDgcxuHDh2Pv3r1RFEXTYwIAAAAAABZIWZZx9OjROHDgQHQ60+dLTBW0OHz4cBw8eHDqkwIAAAAAAMvn0KFDccMNN0z981MFLfbu3VuffGVlZeqTAwAAAAAAi291dTUOHjxYxw+mNVXQoioJtbKyImgBAAAAAABERMzcUkIjbgAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALvbYHAMAmztwTceKOdMfvXRJx2VXNHe/U0Yh7jjR3vKbsvCy+WPZjUA7aHslS2NndGfsHZUT/ZNLznBqcji+dmvf1VETsvTaiKOZ83sV37aXXRuHvbXsa9COO3d72KFhGe66L6J7/unpmeCbuPHlnCwOC5nWLblx96dWT/+CpYxH33N34eGBmu1YiLllpexTAEhC0AMjR6eMRv/iYiGOfT3ue5/xCxGNfNPtx7vhkxK98dfKF7Gn83P4r4jf27W17GEvlR+64K15w9Fiy458oivj7NxyIO3rdZOegWR/4xx+IIgQttqXjX4j4uYe3PQqW0XU3RnzPO9YFkoflML7tf3xb3HLklhYHBs160cNfFK/8ilde/A/c9emI1z4p4syJdIOCaT3txyKe8oNtjwJYAoIWADm6+9DZgEV3Z/PHH/YjymHEZ9/dTNDi8zefDVikGO+0BmfiAzt3RMTaTrZOoSriLAblIIblMD54ya54wbETEZ0004jDO3p1wGJHWSY5x+aKiO6OOZ8TFlxO930WX1lGDM9E3P7BiMGZiN7Z6+vYmWN1wKLX6QmWstCG5TAG5SA++MUPTvaDn//w2YCF+y+56dh0BDRD0AIgR1Upo8uujvgXn2z++O/8uYg/+/G1wEUTqvHe98kRL/qjZo7ZhP/xfTG87c0REfHvvubfxdPv8/SWB7TY/tOH/1P87Lt/NgYREU/9kYiv+RdJzjO462MRf/i8uPKSK+Nt3/62JOc4z+feE/FrT4vYdzDi+987n
3PCMlg5EPFjX2x7FCyTU0cjfvqGtV+fU9pxODw7b3n3P3x3dC2OscDecutb4hVvfcXkJUyr77/3EyP+jz9ufmAAkAFbTgFyNBy9jBSJXsar4w4b6vNQLSLklsnQ6cZgtAtTlsXsqsWhYUREJ93f53AUTOumuv430vRnAoDpjN/7z7knjy/ueq6z6Kp5znDSTUSp3xMAIANmegA5ql7KU+0grI7bVHPq1OOdVtGN4ahyxFwXwJdUtUA0LIqkL8rVy3snYWDkPE1/JgCYzvhc4txMi+r5UHSiKJSGYrFV86rJMy1GQY55zpMAYM485QByVGcuLEqmRaY7vmRaNKoK/Awikgaoqpd3mRYA29BFZFp4prMMZFoAwObM9gByNOyv/TfVDqpqwbk6z6zq8Wb28lR0o1rukGkxu3pHYOJMi1YWpZr+TAAwnc6Fgxae6SyDap7Tn3Tukeu8GwAaJGgBkKO63FIvzfHrUjgNN+JONd5pdcbKQ3mxm9n6TIt0/9aDYQuLUtWfp6nPBADTKYqzPbI2acQtaMEy6I3mHhNnWuQ67waABglaAORo4RpxV+PN7LGiPFSjlrsR9+jPozwUQPs2mafItGCZ1L3ClIcCgPNYwQHI0cI14q4aAmb28qQRd6PmXh5KI26A7WmTe3LdiFsDYpZAncE6cSPu6j3B5wCA5eUpB5Cjhc20yCww0Dnb00KmxezqhpERSQNU7WRaaMQNkA2ZFmwD02dajL7f5wCAJWYFByBHdeZC4kbcjWVaJM4MmVbRjeGoPJQFjtmdzbSI5W3ELdMCoH2b9N6qMy1sRGAJzJ5pYW4LwPIy2wPIUfJMi4br92ebadFZW2APpSSacDbToljeTItyGFGW8zsvAOer5yn9dV/ul2v/W9CCZVBnWgwnzbQYfS5ym3cDQIPM9gByVL2MpO5p0VjQohpvZo+V4mwjbpkWs6t3BEakzbQYtphpEaFEFEDbNpmnVIu7vaI37xFB46p5ThWMu2jV56LjcwDA8spsdQmAiBhL+070MlIdt/FG3Jm9PHV6GnE3qDtaRBoUkTTTopWa5eN/HiWiANq1yTyllfKBkEhvdJ1P3NNCI24AtgFPOYAcacTdjLFG3IIWs6vLGMyrPNQ8azUXMi0AsrHJPKWV5wMkUvcKm3SzhEbcAGwDghYAOdKIuxljjbjtypzdtmjEHSHTAqBtm8xTZFqwTOpeYVNnWmQ27waABpntAeRIpkUzOt26EbdMi9mdbcQdy9uIO0KmBUDb6kbc6xdzW3k+QCJ1BuukQYtc590A0CBBC4Acpd5BVWVwTPqStJlcd3wVnaj+hB11f2dWZ1pEseSZFg19LgCYTt2Ie32D4sFQpgXLowq+DSbdLFF9LnKbdwNAg8z2AHJUvYwkz7Tob/19F6ve8ZXZY6XTjX6xlmphV+bsqoaRa4240/1bVy/v8820GPvzNPW5AGA6xdbloTzTWQZT97SoNwv1Gh4RAOQjs9UlACLibBAgWaZFovJQue34KrpnMy1yC6gsoHWNuJct06IoxsqRKA8F0KpN5inKQ7FMqobykzfiznSzEAA0yFMOIEfJy0P11p9nVpnu+CqLbgxlWjSmLmNQRNJ/62pRqjfv66npzwUA09nkflwHtZV8ZAmMz00n6mtRfW9um4UAoEFmewA5qhpPasQ9k+HYooagxezOZlpE0hflVjItIpr/XAAwHY242QbG5zkTZVtkOu8GgCYJWgDkSCPuRgyjqH9tV+bs6kyLxOWhqkWpuQctOhvXUAdgzja5H7cW1IYEps+0yHPeDQBNMtsDyFHqHVTJMi3yeqwMOmeDFnZlzq7OtCgi6Ytyaztp689FQ8E8AKZT34/76748GGrEzfJYl2kxyZy8+lz4HACwxPJaXQJgTfJMi4Z3lGdaW3ddpkVmAZVFVDeMjEgaoGptJ22dgSTTAqBVmzTirp4PghYsg25nykyLoUwLAJafFRyAHKXeQbXJDsapZbrjq1/ItGjS2
UbcRdqeFm3tpG36cwHAdIqNN1fU5QOVfGQJTN3TItPNQgDQJLM9gBwNE7+MdHqj8zRcHqo6biaGhUbcTVrfiDvdv3W9k3beL+NNfy4AmM4mmRYacbNMxq9jjbgBYD1BC4Acza081HI34h4oD9WoeTfinvuilEbcAHnYZJ6iETfLpFN0ohjNVTXiBoD1zPYAcpS8EXdn/XlmlWkj7uFoOJ0yohgrFcV05tWIu7VFKY24AfJQyLRge6g3hEzUiFumBQDLL6/VJQDWaMTdiCrTwsOuGWczLSJpgKq9TAuNuAGy0Nm4x1B/9L9lWrAs6g0hEzXiHn0uMpt3A0CTzPYAcpQ802LjHYxTy3TH13CUXZHXqBZX1WOiLIr67zaF9jMtBC0AWnWBRty9Iq8eWjCtam6lETcArCdoAZCjegdVotv0JjsYp5bpjq/q9c/DrhnrGkZGwqDFKGjQWk+Lpj4XAExnk0bcdVA71fwI5qzaoKERNwCsZ7YHkKNqB1XqTIsoI8py9uOVeb48DWRaNGo88yFlpkW1k7a1TAvloQDaVd3/zymZo6cFy2aqoIVG3ABsA4IWADmqdlB1EpU/GH/JaaIUTurxTqkuD9VEYIb1mRZzKA/VnffLeHX9Kg8F0K5N7setlQ+ERKpSZ8PhJD0t8twsBABNMtsDyNG8GnGPn2sWdW3dvB4r1Z/MK10zOmOxn2HC8lDtN+KeYOEAgOZ1tu5pIdOCZSHTAgA2ltfqEgBr5tWIe/xcs8h0x1e1sN6RaNGI8YyVQbqYhUbcANtdsXGPoarnkUwLlkUVgBtOsmGinnf7HACwvDzlAHJU76BK3Ih7/FyzyHTH19lG3KIWTeiOBSpS5iK0l2mhpwVAFi7QiFumBcuiaio/VdAis7KsANAkQQuAHA3n1Yg7ljvTQiPuRnXGMy0Snqe1nbQyLQDyUDfi3qQ8VGabJGBaVQBOeSgAWE/QAiBHVTmEefS0aCRokXi8U+qPMiw6GnE3oxzUJaKSBi3a2knb2bgcCQBzVt+P1+8+14ibZTNVT4tMNwsBQJPM9gByVCZO+x5/2V/i8lBVT4uemEUzhoN64lBlsaRQBy3mfT3V5aE04gZoVTX/0YibJVdnWkyyiSjTeTcANEnQAiBHyRtxF82WwkldzmpKVbNoPS0aUg7nkmlRLUopDwWwTW1yP5ZpwbKZrRF3XvNuAGiS2R5Ajuaxg6rJpsOZ7viqXv+6ykM1Y9g/m2mRMBCkETfANrdJub5qN7pMC5ZF1Yh7qvJQHcs5ACwvTzmAHNU7qBLephvNtMhzx1f1J5Np0ZDhILqjv8qJXq4npBE3wDa3SSPu1soHQiJTZVqkLiMLABkQtADIUfXiItNiJmczLSJCtsXsykF0RwGgiV6uJyTTAmCb26QRd2vlAyERjbgBYGNmewA5msfLSLHxgsBUMn15Go73tNBceXZjjbiTZlq0VbO8Op9MC4B2FRsHkTXiZtnMlmnhcwDA8hK0AMhRVcM5aaZFtUDb3/r7LkamtXX71eJGGc38Obe78UbcCRf2Wyv/0VEeCiALm9yPNeJm2dSZFpPMPTLdLAQATTLbA8jRPGrVVsdutDxUXrV1h7GWatGN0kJ0E8YyLVKWh6qDFnMvD9XgZwKA6W1yP5ZpwbKpruWJMljrMrKWcwBYXp5yADnSiLsRg2ItK6BThoXoJgz7c2nE3dqilEbcAHmo78frsyT7o/8t04JlUWWVTrQZpPpcZDbvBoAmme0B5Egj7kYMRwvs3QgL0U2YUyPuqkTC3BelNOIGyENdwnLjRty9zDI7YVozNeL2OQBgiQlaAORII+5GDEYL7BpxN2Q4kGkBQHqbNOLW04JloxE3AGzMbA8gR/N4Gal2MTaSaTGHzJApDEdBi7VG3BaiZ1bOp6dFdez5Z1o0+JkAYHqbNOLW04JlM1Omhc8BAEtM0AIgR/OoVbtJveip1OPN67EyqBa/I5r5c253w+Fa1kqcrSueQr9cO3Z33
kEwmRYAeZBpwTZRN+KeZO6R6WYhAGiS2R5AjuZRq7Y6dpONuDOrrVstbnTL0u75JpRny0PNI9Ni7jtpm/xMADC9Te7HMi1YNnXQYqpMC8s5ACwvTzmAHNU7qBLeprdFI+7R4kaEhegmDPv1xCFlT4vWdtJqxA2Qh7oR9/qsvmo3ukwLlkWVVTrRZpDqc5HZvBsAmmS2B5CjuTbibjDTIrOdj/Xid4SF6CYMB9Er11ItkmZaDDXiBtjW6vJQ6581dQalxVqWxFQ9Lco8M5wBoEmCFgA5mmsj7hkXn8syYtTnILcdX3WmRVlGDNMtsm8bY424lzPTQiNugCxoxM02Uc11Jsu0yHOzEAA0SdACIEeLlGkx/vOZlWuoy0hEWIhuwnAY3ZhDpkW1KNVaI24BLoBWacTNNlEF4C56XpXxZiEAaJLZHkCO5pJp0VD9/vGfz+zl6WymRSj504RyEJ3Re/JyZlroaQGQBZkWbBMTl4fKeLMQADTJUw4gR3PNtOhv/X0XMv7zmS0i9Mu1sXWinP3PScRwENW/8CBhEKiuWd5aTwvXCkCrZFqwTVRznYueV2W8WQgAmmS2B5Cj4TwyLXrrzzWt8Z/PrCFgtSOzF2H3fBOG/bUAUCxpI+6mPhMAzGaTTItqYVemBcuiKoV58ZkW+W4WAoAmCVoA5GihGnHnu+Or3pGpEXczysFaqa1Y1vJQGnEDZOFC5aEym2/AtCbuaZHxZiEAaJKgBUCOFqoR99hLVmY7vqoXQI24GzJWHmoujbhbKw8lwAXQKuWh2CYm7mmR8WYhAGiS2R5Ajha2EXdej5W6jIRG3M0oh2tZKzGnTIt5X08acQPkQSNutonJMy3y3SwEAE3Ka3UJgDXVC8lCZFrMIStkSmczLUoL0U0Yz7RImI3QfqaFawWgVTIt2CZmy7TwOQBgeXnKAeSoarKX8mWk3sXY3/r7LqQea35Bi+oFsBsx+5+TiHJQZ1r0y3R/n+31tGjoMwHAbKr5zzkBcpkWLJvqWh5c7IaJjDcLAUCTBC0AclSXh0rYYK/p8lAZNgOsgxZlafd8E4b9ufS0qF7ce8Wcr6nqGpaVA9Cu6n58ThC5P/rfMi1YFlVT+YsvD5XvZiEAaJLZHkCO5tqIe8bF54x3fNU7MiMiEi6ybxtzbsQ990Wp6nwCXADt2qQ8VPV86GW4UQKmMXV5KJ8BAJacoAVAjhaqEfdo8TrDurrVjv2ORtzNmHMj7u68dxHWnwkBLoBWbdKIW08Lls3kjbjz3SwEAE0y2wPITVmeXTTViHsmZzMtNOJuxBwyLcqyjDLWAiPzz7TQiBsgCxfItNDTgmUxeaZFvpuFAKBJcgqZWFmWcfKMBR1IZjiIS0e/PNEvI06naQq8M4roRcTp/pnoz3CO4vTp2B0RZdGNk4nGOq1Tg1Ht64g4dfp0DDIb36Lp9c+sZa1ExD1nzsSJBH+f/bH65afOlHGiM79/s+4wYldEDAb9OOVamcjuHd0oiqLtYdAC80JS6AzKuCQihsNB3DN2P64yKE/3yyTPIJi34XDt2Xm637+oa7o4dSrbeTdEmBMCzSnKclTnYQKrq6uxb9++OHLkSKysrKQYFxk7cbof3/Pap8UH9t3V9lBg6ZVRRBlpJn3F6OhNGmaWwFcUa3++f37X3fFPjqzGsDSBnkWnKOMnr7wifm9lb0RElAn+Pqt/s4iIox/78YjhJY2fYzPP7bwzfn7nayIiXCsTuudHvhCX7trZ9jBowWfuPhzf8KZntT0Mlsxmc5RytBD2xkO3xf3PWLBl8f3Gvr3xC1deHhERxYRLM7nNuyEi4p896mXxzx7zPW0PA2hRU3EDmRZMqaxfGoC0mg4sjGs6IJJyrNPqlEU84tSptV8X+Y1v0TzqntPxe3sjolgfYGja4J7rIobzXQS/ubxvnCh3xaXFKdcKTCDlvYDta7M5yrX9flw/6LtPsxQedfpU7CjLOFMUE79f5zjvBoCmyLRgYmVZxhe/9
Nk4dWq17aHActt9Rfo+EadWIwanGzhQEXHp/rX/ZuaS3iVx2ZlTEUM7MhvR2xXHOp041ch1s7nLd10+/0bcERFnTkScOjb/8y643ZdfF4X62ttSf9CP247d2fYwWEbD0xH3nP++sbJjT+zo7GhhQJDGif7JODm4Z4KfyHfeDft374nLdl7W9jCAFsm0oDVFUcQ1+w+2PQygEde2PQAW0KUX/pbFtXMl4jIbMuBi9bq9OLjPswRgWks9rwKAKdkSBwAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGRB0AIAAAAAAMhCb5ofKssyIiJWV1cbHQwAAAAAALB4qnhBFT+Y1lRBizvvvDMiIg4ePDjTyQEAAAAAgOVx9OjR2Ldv39Q/P1XQYv/+/RERceutt850cmBxra6uxsGDB+PQoUOxsrLS9nCAFrgPAO4DQIR7AeA+AJy9D3zkIx+JAwcOzHSsqYIWnc5aK4x9+/a5EcE2t7Ky4j4A25z7AOA+AES4FwDuA0DE9ddfX8cPpqURNwAAAAAAkAVBCwAAAAAAIAtTBS127doVN910U+zatavp8QALwn0AcB8A3AeACPcCwH0AaPY+UJRlWTYwJgAAAAAAgJkoDwUAAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALghYAAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkQdACAAAAAADIgqAFAAAAAACQBUELAAAAAAAgC71pfmg4HMbhw4dj7969URRF02MCAAAAAAAWSFmWcfTo0Thw4EB0OtPnS0wVtDh8+HAcPHhw6pMCAAAAAADL59ChQ3HDDTdM/fNTBS327t1bn3xlZWXqkwMAAAAAAItvdXU1Dh48WMcPpjVV0KIqCbWysiJoAQAAAAAARETM3FJCI24AAAAAACALghYAAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkQdACAAAAAADIgqAFAAAAAACQBUELAAAAAAAgC4IWAAAAAABAFnptDwCAjf3Wh38rPv6ljyc59nMe8Jx4wr2e0Ogxj5w6Er/ygV+J1dOrjR53ViunT8ZLjhyN/dFteygL6X/t7MTbVy6PKIp2B3L8jog7PxkRZaOHvbLYGS/ddTB2F64PFs+nBifiP58+HKdj2NARi4irHhRx6ZUT/dTKzpX4ocf9UENjYOHc8cmI/+/fzX6cXXsjnvzKiL3XnfdbH//Sx+P1f/v6ODM8E1GWEV/424hTR2Y/50W4b2d3/JOdN0TR9nOQ5fOAr4u48dsm+pGyLOM3P/ybccvdtyQaFMBsvu3B3xaPvubRbQ+DJSBoAZChzx37XPy7dzewALCJj9710fj9b/z9Ro/55k+/OX77b3+70WM25fo7vxT/aPVo28NYSD9+7xvi6B3LnZh546H3x9NPnGx7GDCx37pqf/zB3j3NHvS2L0z8I9dddp2gxXZ24o6ID7yhmWPtuSbiKf/ivC//2gd/Lf74M3/czDmm8DW3/FU8+MyZ1s7Pkrr5DyIe+byJNoZ8ZvUz8XPv+bmEgwKYzZMOPEnQgkYIWgBk6OSZtQXU3b3d8c8e9c8aO+7h44fjDR99Q5zsN79AWx3zy678snj2fZ/d+PGn8b/+7n/Fh+74UJwsiogbnx9x7cPbHtJi+Yv/O0521l6kv/vG7469O/a2N5a/em3Ekc9FXP/lEZdd08ghf3/1o/GZM0fi5CO/LWLlQY0cE+bp5O1/HnHslnjqZfeJx15y/u70iRy9LeK2D0Tsv1/EV3zXRD966Y5LZzs3i+3ye0d8/U/OdoyPvTni1r+IOH1iw9+u5hjPuM8z4pG7rox412siepdE3P+ps533An79S++PI8NTcfIJ3xNxybVJz8U2cuZkx
Nv+dcTgVEQ5jJgg27P6LOzZsSe+58bvSTVCgKk9bP/D2h4CS0LQAiBDg3IQERGX9i6NFz3iRY0d94Nf/GC84aNviMFw0NgxK9WYH3zFgxsd8yz+7ujfxYfu+FD0i4h42DdEPOw5bQ9psXzgd6JfrGWoPP8hz4+rL726vbG88/+JWD0a8ZyXRzzo6xs55F/92UvjM597Z/Qf+LSIB31zI8eEeeq/7TMRx26JJz78O+I7HvYdsx3sw2+M+NiLIi7fE5HJPZwFsXIg4qu+b7ZjHPv8WtBi2N/wt/vl2teffMOT47m7D0b8yb+NWNkX8Q2/Ptt5L+C//sHfjyNHb43BI7414tovT3outpGTd68FLSLWrvnOxQctqjn83p17s5lvA0AKy13vAWBBVQGA7gQvMRejOl51/CZVL1HdjHoDVGMZRhHREaef1LBzdprQ9LU4sXJUs7/BcfSKtWtiWDbVDwDmq7p2e03c36pjJHg+wAVV9/ZN7sfD4drXu0U3Ytj882AzKedNbGPj9+wJNxLV7wgZzbcBIAVBC4AMVQtRTb+Q1Iv4CRZpU415FtVYBkVMlHrPmsHYglDr/67VS32D4+gUa9Mgi1Esqurara7lmVSfrQSZeHBBF7j+1s0xqnv2PIIWCedNbGPj1+6Ec5D6s9D2ZhIASEzQAiBDjS5EjUm5SJtqzLOoxjKMiOjkM65FMRwLELT+75pgkap64bcYxaJqNFhc73QXtKAFF7j+6jlGp5MkiL3psAS3SWH82p0y06L1eRkAJOZJB5Chhc60yGjn17ryUG1nCiygwXh5qLb//mRawHlkWrA0ZFqwnazLtJjs2soxsxkAUhC0AMhQ1R9CpsVsOqNF90ERc1ncWDbLnmlRZ+JYjGJBVXX+G/l8VkFKnwfacLGZFkU7mRaeEzRq/J4t0wIANuRJB5Chhc60yGjn19lMi5BpMYW8Mi1G12yD47CDlkXXaEPWeqd7f/ZjwaSqBdiJMi3Sv8rWvbFkINGkojh7zU/b06LteRkAJCZoAZChfrm2aNRp+IW8zrRI8PLdHy105bTzqxpLvyhkWkxhMPZv2XrZr2ohtcHPRH19WKRlQdVBiyY+n53e2n8tztKGztblodbNMfS0YBlMGShOlY0NALnxpAPIULWLqlf0Gj1ub7QoleLlO8edX9Xf31oj7mb/LreDPMtDNffvWH0eZFqwqDTiZmlU9/ZNrr9186Jh88+DzVQBQUELGjdloLjRYDUAZEzQAiBD1ct5qp4WKRZpc3yJqndIRrG+fjAXpSoP1Y2i5ZGERtywAY24WRoXuP7qa73T0Yib5TBloDjHTUIAkIIVHIAMVanfTb+QpFykTRVomUUVQBlqxD2V4ejfMot/0wSLVBajWHRVI+5mMi004qZFF1jAXbdQqzwUy6AO1E12z9WIG4DtwpMOIEOpAgDjC1tNL9TmuPPrbKZFaMQ9hcFoESmPTIvmG3FbjGLRybRgaVxkI+5OIdOCJVEHimVaAMBGBC0AMpRqF9X48ZpeqM1x51e92KAR91SGo2BFp8ggaFEvUjV3fVmMYtHVZfma7GmhMT1tqDMtNr4fr7vW6yB2+vlGHdwWzKNpUwaKc5xvA0AKnnQAGUrVH2J8YavpF/BUJa1mIdNiNv26p0UG04VqITVFpoXFKBZUveO2iWfFBRohQ1LF1kGz6j7dKTpnv2eOmRYy8mjclIHiHOfbAJBCBqsQAJyr0d2zY8YXtpreXZ5qzLOoFxsiZFpMoeppkUd5qCrTotfYIavPg8UoFpXyUCyN6t5+gUbc3U53LPOuuefBZureWDLyaNqUgeJGg9UAkDFBC4AMpapXuy7TouGF2hxfopSHms2gbsSdQdBCI244T6PPiguU54GkNOJmu9GIGwC25EkHkKF1ZRAaNH68VJkWOb1EdTrKQ81iWJeHallZnl1I1Ygbao0+Ky7QCBmSukCmz7o5hkbcLAONuAFgS/msLAFQW+hMi4xeomRazKZux
N12eajxxSKZFlBLk2khaEEL6gXcje/HG2daaMTNAtOIGwC25EkHkKFULyRFUUQxWoDeFpkWGnHPZNBZu1Za72kx/kLf4PUl04JF1y/XGrhWWWUzuUAjZEhqokyL0fxFpgWLbMpAcY6bhAAghXxWlgCopewPUTenbnjXYI4vUdVi+zBCpsUUhqNpQrftnhbjL/QyLaBWXbu9ooGGxBdohAxJ6WnBdjNtpkWiErIAkBtPOoAMpcxaSPUCnuNLVFXWaFAUcykjsWz6RS6ZFmM7vxP0tOjbWc6CqhZyG7nv1gHBcq2PDMzTBTJ91s0xqu+ZR6bF6ByCFjSuM112W3Ut5rRJCABSsIIDkKHq5TzFC0mqF/D6JSqjjIZqsX0QcXYXMRetasTd+mRhfBdig/+OvdGxZFqwqBpdvBoPfMi2YN4ukOlTXeu9Tm+sEXf653qdnSpoQdPq7KLJ5iA5zrcBIIXW1yEAOF/KUkupSuJkWR5qlCAwLEJ5qCkMlrwRt7IfLLrhMEWmRWjGzfzVjbi3Lg+1lmkx/0bcgts0bsryUDnOtwEgBUELgAwtZHmoHBtxl1WmRaER9xSGdXmoliVuxG0xikXVbKbF2DFkWjBv9QLuxvfjthtxN90HDKZtxJ3jfBsAUvCkA8jQPBpxDzdZGJhWjju/6kyLCJkWUxgUVaZFy8qxXbUNNgXXiJtFVy9edRrOtNDnhXnbYgF3/B7dViNuzwkaN2OmhaAFAMvOkw4gQzItmlFnWhQh02IKVaZFr+3yUIkWqOrPgh20LKhGMy3G+wMoD8W8bbGAOz5fWcu0qHpazC/TQtCCxk2baZGw7x0A5CSflSUAao0uRJ0jVVPJHF+iulFGxKg8VBM7kbeZsz0tWlbt+m54gUqDVRZdoxlu68pDWaBlzqr7+wZZPuOB5bVMi9H3zCPTYjR36Jeyj2hYtclnwo0TOW4SAoAUPOkAMpQyAFCVnErWiDujMkzdqDItWs4UWFCD0Qtx6/+i9a7a3tbfN6HqWhW0YBGVZdlsmZDxwK7PBPNW3d8vVB6qM1Yeag7zjV7RO28M0Ijqmp+yPFSv4TkRAORG0AIgQynr1W6n8lDVcsZQ0GIqw9FfW6dsdxz1ru+Gg3jKfrDIzqvz34Qpa6zDzOpd5+ffj8fnK92iO9dG3MoIkoxG3ACwJU86gAylbGqdaqE2x0bc1UPOkvR0qvJQrf+L1pkWzU5bUgXwYB7G7+GNNOKOmHoRDWZ2kY24O0Vnro24BbdJZsZG3DnNtwEgBUELgAzVu6gS9GFItWswx51f3VGGgPJQ06kzLdodRrIFKotRLLLxGvvNZ1qo38+cXWQj7rVMi/mVhxLcJpkZG3HnNN8GgBQ86QAyJNOiGWczLQQtppFfpkWzI1H2g0WWpDzUlDXWYWYXkWlRRBFFUcw30yJRHzCYthF3jvNtAEhB0AIgQymzFqrsjcZ7WmS486t6nRuIWUyl6gXSLVtuaiHTAs5z3u7zJlTZfT4TzNtWmRajr9XXuUwLlkEdqJvsfptjZjMApOBJB5Ch817QG1QdM1Uj7px2fnVGL4IDmRZT6cdasKL1ycIwcaaFxSgW0HB4Tp3/JmjETVs6Fy4PVV/nVfmyOSzappozwbTl+Or59hyCdgDQptbXIQA4X8oAQKrd5Tm+RHVHwQqZFtM5m2nR8kAS7aqtrlWLUSyi8eu2saCFRty0ZYtr77z5RRWwq8qZJVTPmYayj2jYlOX4lIcCYLsQtADIUP1CkiAAUO8abHgnbY4vUVVZI0sN06kyVFqfLCgPBeeprttO0Vmr898EmRa0ZYtr77z5hfJQLINpG3ErDwXANuFJB5ChpD0tRsdM1Yg7p5eoaiTKQ02nukK60XKqRepG3BajWEBJnhOd6cqVwMwuItOic27j4nk04hbcJpUpg8RV1k9Om4QAIIV8VpYAqKXMWki1UJtjT4s600LMYirV39vSN+JW9oMFVN1ze0WDJXKmbAwLM9sq0+LcR
dp5Zlp0BLdJZHRtybQAgI150gFkKOULSVVyqvFMi2G+mRaW36ZTvUZ3ZFpAdpLcc5WHoi31/b0827NiZPNMi/k14pZpQePq++1k11aO5VgBIIV8VpYAqFX9JlKWh+qXzZb/qI6X00tUd/Rip9DJdAajYEXrjbgTLVBpxM0iS5LdphE3bRm/v59z/Z13rQ/nmGmRaM4E05bjq67FKgsIAJaVJx1AhlLuokq1a7DuaZHRS1RHI+6Z1JkWuZSHSpRpYQctiyjJPVemBW0Zv7+fc/2dd62XLfS0UEaQpp1b7uwiybQAYLvIZ2UJgFq9qzDBLsLqJWfQ8KJUlj0tRv8d6GkxleGogXnrPS3q8lAN1u6Ps70AZFqwiGRasFTG7+8XnWnR7DNhI8pDkUx1/U44H89xvg0AKQhaAGRokTMtcnqJGl9st+AwuSrY03pPi0SNuOtMCztoWUBJeh8V05UrgZkVF8600IibpTJlI+4ce8gBQAqedAAZStmIO0Xz4bIsz5ZvyOglanyx3YLD5Kql/F42mRYacUMlaaaFQB7zNn5/PzfT4tw+X4kC2RuRaUEyGnEDwJbyWVkCoJY006LT/Av4+LFyeomSaTGbqhF3Nj0tmm7EbTGKBVbttlUeiqWwLtNi/T35/EyL0e/PoYeW4DbJTHm/rTc2ZdRDDgBS8KQDyNCiZVqML/rm9BI1vtjedA+P7aD6V22/p0W1QNVwpoWyHyywtOWhfCaYs/G5wyY9Ler5hUwLlsGU91uZFgBsF/msLAFQqxbYU7yQ1EGLBhel+uXZ+uc5vUR1xxY+LExPLp9Mi9H11fC1VTeld22wgOqFqyaDeTItaNMmPVXOb8Q9+v159LQYzZn6+rzQtM7G1/uFVHPunMqxAkAKnnQAGUqyGDXSK3rrztGEfMtDFfWv7ZKcXB20iJb/7qoAW6fX6GHtoGWRybRg6VT3+ItuxN3sM2EjKeZMEBFTB4mr0oDVtQkAy0rQAiBDi1YeavxYeQUtzi4y2E0/ueEowaLb9mJNokbcdaaFBVoWUJpG3OeU34F52mQR97zs0zmWh1JGkGSmbMSd8h0BAHLiSQeQoYVrxD32wpXTS1RRDqMYlTayS3JyVaZFt+2/ukSNuKtr1bXBIkrTiHu0c9cCLW3YJNOnukfX84s6kJ1+viEjj2SmzbTQ0wKAbSKflSUAaoucaZFT0CLKQVSvdHbTT25YBS3aLg+VqBF3FcCzg5ZFpDwUS6cKQpwTIDjvWq82Sswj00Jwm1SmvN+e15geAJaUJx1AhpJmWiTYNTi+C7Ioigt89xwNB9EZlTiy4DC5uqfFhKULGpeoFEi1GFVG6fpg4SR5TmjETZsukGnRPff6nEMj7rqMoM8ETZNpAQBbErQAyFB/2I+ItJkW1TmakG193WE/uqOF937Z3J93uxiMSmt1R/9tTXWtJuppEWFBisWTZLetTAvaVN3jz5mfVM/vs5kWo9+fY6aFbE0aV2x8vV9ItnNuAGiYJx1AhhYt0yJJQ9gmlMP6QWcn/eTqTIu2/+7KtJkWEa4PFk+aTItzegbAPBUb7zw/71ofyrRgCVT320nLQ53bmB4AlpSgBUCGziuF0KAk5aFSNIRtwnBQZwlYcJhc3dOi7QX9eoGq1+hh12Va2EXLgjlv93kTZFrQpuoef871d94ibZnmmbCRah4msE3jqut3wmtLeSgAtgtBC4AMpcxcqEqJpGjEnd0L1Fgj7mHbfRkW0LDMJGhRL1A1O20ZDwpakGLRVPe0XtHgwu0mi8YwF5s04j4/06JqxJ3+VbYuD2XjA03TiBsAtuRJB5Ch8cbWTUvaiDu3F6ixRtwWHCaXTyPuaoEqXXko1weLJkldc424adMmi7jnXestNOIW2KZxGnEDwJYyW10CICJxpkWCXYN5Z1qsLbxbcJhcNuWhEi1QjV+vrg8WTZJgsfJQtGmTRdzzSmZW1+c8G3EL5NG0WTMtNOIGYMl50gFkqCr7sXCZFrm9Q
A014p7FoMykEXeiBSqZFiwyjbhZOjIt2E5kWgDAljJbXQIgIlGD1ZHqmP1hv7FjphzvTIb96I7KQ1Vj5OLVGTTlMGIUwGhFda0mbEyvETeLJslu23N7BsA81ZkU65/X1f25vtar359npoVnBE2rr+cJMy3O/TwAwJLypAPIULWLqtdpsMHqSHXMRjMtUjSEbYLyUDOpe1qU5XmNUecq4a5au2hZVNXCVbOZFhsvGsNcXKA8VD3HqIJqc8i0qOZMsvFoXDXHn7I8VHZzbgBomKAFQIZS1qtN2dMiu11f44247ZKc2HCUXdGLaLfGfaJG3BFna6RbkGLRJOklVC2i+TzQhk0yfdosD1WdU2Cbxs1YHqrRfkYAkCFPOoAMpaxXm7KnRXcOCwgTkWkxk0GMXozLst1FzIQLVBakWFRJnhMacdMmjbjZTmZsxK2nBQDLTtACIEMpG1tvu0yL0S8tOEyuXiiKaDnTIt0ClQUpFlV9321yt+2UO3+hERk34o4Q3KZhs2Za5DbnBoCGedIBZChJrfKRFDvLU2aGzKQc1o24LTZMrioP1YlY2kwLPS1YVGkyLaZrDAuNuFCmxblBjTlmWkQIbtOw+n472fwj241CANAwTzqADKXcRZVikTbbF6jhIDqj8lAWGyY3qBaKylKmBWQmyX23XjQWxKMFF5NpMRxGjJ7rMi1YaFNkWoxfg9ltFAKAhmW2ugRARES/7EdEmh4R1QJXf9hv7JgpM0NmMuzXmRYacU+uXiiKyCNokaDpZHXNuj5YNEnuu/WicXPPB7honY0zfdZd6+MLvHPYKLEu08JzgiZNcb8dvwaz2ygEAA3zpAPITOpdVL1O77zzzCrf8lCDqEZkh+Tkqkbc3WwacfcaP3QVGHR9sGiS3Herz5jFWdpQXX9blYcavzYTPBPO1Rs7h4w8GjXF/Xb8GuzN4foHgDZ50gETK8sy/vuf/2jcevcn2x7KUhpWZQ8iYvj2/xBnOjsbPX554tMREXHLbe+JX/z9b2/kmJ/pH42IiGL1tjjzpz/ZyDGb0P3se6LTWfv7/B+3/FF88IsfbnlEi+WLJ74YEWuNuM+84z9E7Nzbyji6hz8QnYg4PSyif7rZHeDFaP/Gf/nbN8TVu69p9NiQ0nu/8O6IiBiWRZxo6HPRK4vYGRHDz747BhPcy8tLVmLHV39fFEXRyDhYLIePHo7f/dh/m/k4veFdUVy+L4Yf+I8Rn3xj/fV3n/r82i9u/cs4c9dq7Bh9/US/jGj4mXCu/li/gV9+36/Eru6upOdj+yhWD0fv8n0RnaMxvMj5+Jk4ez0O/vxn4ozABZCh3iOfG8V1j2x7GCyBoizL8sLftt7q6mrs27cvjhw5EisrKynGBWTsxOl+vPjXHxcfuexM20NZar2yjHf93Wfjkslv01v6s0t3x/dfe3Wjx6w86cTJ+I+f/2KSY0/rZddeHe+4dHfbw1hov/O52+Lhp9v/vP/QmZfE7w6+ttFjXnq/n4/uJbc3ekyYp1N3PDVOf/FZjRzrxd03x007/vPEP/e58sq44v/8eFy60wLadvSuz707vvvPXpz8PC+/6+747iOrERFxqtwRDz/1/0Q/+R68MvY85NVRdNp/BkJl57CMv/y7Q3UADyAnp77pP8auxzy/7WHQoqbiBt4sgKl0jz0wHn3mtraHsdRuOHlp/M6ZhzV+3P7RYXxV96443mt2d2KnLOIBq/eN1/Uf0+hxZ3XVFzsRe/bE6dxKVy2I+/fvib8+8ZD4m2h3B/XdsSf+5+DxjR/3ntu/OXasfCAimg0OwjyUw51x5ktPaux4vz94clxRHI2VODHRzx2Jy+IljY2CRXPV7qvi9F1PnPk4K8XxeHDx2ejE+eX6dg270b/7gfG6wdrr67uGXzaHgEVERBEnP/eC6F32iTmci+3mocWtsVIcn/jn7n3isvgv/ebfEQCa8Pz9D2h7CCwJmRbAxMqyjJNn1PUFANbs3tFVHmqbMi8EACrmh
Mi0AFpTFIUSEAAAmBcCANC4TtsDAAAAAAAAiBC0AAAAAAAAMiFoAQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALLQm+aHyrKMiIjV1dVGBwMAAAAAACyeKl5QxQ+mNVXQ4ujRoxERcfDgwZlODgAAAAAALI+jR4/Gvn37pv75opwi7DEcDuPw4cOxd+/eKIpi6pMDi2t1dTUOHjwYhw4dipWVlbaHA7TAfQBwHwDcB4AI9wLg7H3gIx/5SDzkIQ+JTmf6zhRTZVp0Op244YYbpj4psDxWVlZMSGCbcx8A3AcA9wEgwr0AiLj++utnClhEaMQNAAAAAABkQtACAAAAAADIgqAFMJVdu3bFTTfdFLt27Wp7KEBL3AcA9wHAfQCIcC8Amr0PTNWIGwAAAAAAoGkyLQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALghYAAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkoTfNDw2Hwzh8+HDs3bs3iqJoekwAAAAAAMACKcsyjh49GgcOHIhOZ/p8iamCFocPH46DBw9OfVIAAAAAAGD5HDp0KG644Yapf36qoMXevXvrk6+srEx9cgAAAAAAYPGtrq7GwYMH6/jBtKYKWlQloVZWVgQtAAAAAACAiIiZW0poxA0AAAAAAGRB0AIAAAAAAMiCoAUAAAAAAJAFQQsAAAAAACALghYAAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkodf2AABYUP1TEXcfansUWxruvjwODY5HWZZtD2WpXLVzJfYcvyvZ8e8+vRp3nz6a7PjnufxgRGEfB4uijFi9LWJwpvEjX7FrX+zbsafx47JNXHGfiO6OtkdBCwbDQRw6mvecMDf7du2LKy65YrofHvQjvvSZRscDkJPVM8fjrlN3tz2MNSsHzG8mdOng0kaOI2gBwHTu+lTEa57Q9ii29Mprro4/u2x328NYOpeWEW++9bOxfzhs/Ngf37Ejvv3666JfFI0fG9jajrKMN372trhPv9/2UFhEr/hQxOX3bnsUtOB4/3g8503PaXsYC6VbdOO3/95vxyOuesTkP/yfvjHi7/5384MCyMDt3W58ww33ilMdm8oW1Wu/+rWNHEfQAoDpFJ2IXfvaHsXmTh+Lj+xc2xGxu7c7eoVHXhOOnTkWJ4oybt3Ri/3lrsYzFD65e0f0iyK6ZRmXJk+QKSPKiCiKiF0rqU8GzThzfG2XbREx+n+NOF5EnCmKuOWyfXGfewQtmIZg83ZVRBF7d+xtexgL40T/RAzKQXziS5+YLmhx2wfW/rtzT0TRbXZwAC371K5enOp0olOWcVmrBRNG74oREZdkvO6RoU5DawRWcACYztUPiXjVrW2PYnP/+ZtjcPqjERHxm8/8zXj4VQ9veUDL4Rv+4O/H3x29NYZRRLzigxGX7m/0+INb/jDinT8SjzvwpPjVZ/xqo8c+zxc/HvHLXxlxyeURP3xz2nNBU377eRGf/tOIb3pNxGP+YWOH/cdv/sfxvi+8L4bf8qsR93l6Y8cFlt/enXvjL77jL9oexsJ4+VteHm/77NtiUA6mO8Bw9HMv/Yu1smwAS2T4uXdG/NlL4yFXfln83nN+r72BnLgr4mfut/brH/tARNcS+sVaXV1t5DhybQBYTkU3quJFTUX6Oft3OSgiotP87r5hufav1plHOnA1/rL5MleQTLXI1fDnr/5sT7uIBsBFqe63w2nnH4meAwA5qN8H236HH7/Hmh+3wioOAMup043BqC9C6xOeJdKtXrQjkpQkqCap3XmUO6iui6FJKAukul4b/oxUn7mpF9EAuCjd0ULYzJkWSkMBS2gwusfN5X1wK
+Pn977YCqs4ACynsUyL1ic8S6TejR1Fkh1+1Qv8XAJNdaaFSSgLpAoqNJyNJNMCYD5myrQoS5kWwFKTaUFF0AKA5dTprC2sx9kdbcyuO/o7HRSRZIffXHfWVOcYajrMAqmu10SZFgM7yQCSqu63/WnmH+OBDptygCXUL9fuja0HLdZlWnhfbIOgBQDLqdNbW1gPmRZNqkvIRER0mm9GVu3ynsu/WTV+i7Qskup6bfjzVwV3lYcCSGumcnzjcxabcoAlVN0bewneNScyfv6h+XEbBC0AWE4acSfRKapMi6Lx8jQRc+5pUb/sl2vlFmARaMQNsNBmut+WghbAcptrueCtjL/rmh+3wioOAMtprBG3TIvmdEdTh2GiSWQ9SU0QEDnP+J9BtgWLQiNugIU2U2bb+HzF/BZYQnPdxHYhdTlh74ptELQAYDnJtEjibKZFmr/TdjItwu4ZFkedaaERN8AikmkBsLmqv1oW7/DVfdb8uBUZXAEA0LyyKM5mWnipa0y3qDIt0vydzjUdeF1zNRNRFkRVU1emBcBCaqynRQ67kAEaJtOCiqAFAEtpfFE9iwnPkujEWiCon6h8U7WzZu6ZFsN++vNBE6prNVFPi77PAkBS1RxnMM0imEbcwJLLpqdFxNn7rPlxKzK4AgCgecOxF7ksJjxLos60SBS0mG95qN7ZX0v5ZVHU5aF6W3/fhHqj48m0AEirkfJQRSdilFEMsEyqe2MW1RLq8lDmx22wigPAUhqMLarLtGhOd5RpkbwR99zLQ5mIsiASNeLW0wJgPhopD2VuCywp5aGoCFoAsJSGY484mRbNmVsj7nnsrBnPFrFQy6KoMy30tABYRJ1OA5kWOexABkhAI24qGVwBANC8dZkWXuwas1SZFhF2z7B46kbczX5GZFoAzIdMC4DNybSgImgBwFIajtX5zWKXxpKoGnEnz7SY1yTV7hkWTaJdttV9UqYFQFp1kHiaRbDqHm1DDrCksmzE7V2xFRlcAQDQvPFF9Sx2aSwJmRbQskS7bKv75FSLaABctF7Ri4hZMy0s5QDLaa7lgi+kfle0qacNnnQALKXBKNOiiEx2aSyJqqdFv1Nc4DunUy2Yzj3TYtifz/lgVtW1mijTQnkogLRmut8megYA5KJfrt3nsniHr0pOe1dsRQZXAAA0bzBa9K4yA2hGb06ZFnPbWVOn/No9w4Koy0P1Gj1srzPDzl8ALlo1x5mtEXezzwCAXGTV06K619rU0wpBCwCW0nCUCdARtGjU2Z4Waf5eq0mq8lCwCY24ARbaTD2ENOIGltzcM++34l2xVYIWACylwWhxPYOpzlLpjmIVqTItNOKGC0jUiLv6zMm0AEir7iE0U6aFGS6wnOa+iW0r3hVblcEVAADNGxYyLVLolKNMi0R/rxpxwwUk2mVbZ1r4LAAkVWdaTNPYNVG2HUAusioP5V2xVZ50ACylwehlLoOpzlKpGnEPEzXilmkBFyDTAmChybQA2Fy9ia2TwZJ1NQbz41ZkcAUAQPOqngsacTerekVOlWnRH/YjQqYFbGr0GUmWaSGAB5DUTEGLRM8AgFxU98a8Mi367Y5jmxK0AGApVT0XPOia1S3X/jtI3NOi1+klOf55OoIWLJDxUiINf0a6nRkW0QC4aNXu4emCFlWmxZzmSQBzllUj7upe612xFdZyAFhK1bRCpkWzulV5qGJJelooD8UiGb9OG06ZVx4KYD5mut8qDwUsuax6WnhXbJWgBQBLqeq54EHXrE6dabEkPS2Uh2KRjF+nykMBLKSZ7rcacQNLbu6b2LbiXbFVGVwBANC8qudCBvszlkr19zlMlMAi0wK2sC7TIlEj7qFMC4CUZrrfyrQAlly9iS2H+5xG3K0StABgKQ0LQYsUqolDqkbcMi1gCzItABbeTOWhqudADmVTABKQaUElgysAAJpXLap70DXrbKbFsvS0sHuGBTKPTAufBYCkZmrELdMCWHJ6WlCxlgPAUhqM1tQzm
OosldSZFoPhnIMW9e6Z/nzOB7NImWkxWkTrlz4LAClVC3HT9bQY3aNzWMwDSGDu74Nb8a7YqgyuAABoXtUoumocTTO6c2rE3ev0khz/PNV5pPyyCOrrtDibJdSQXrH2WZBpAZDWbI24ZVoAy626N+aRaeFdsU2CFgAspaFG3EnU5aESHV8jbthCwrIgeloAzMdsjbhHPyNoASypvMpDVaWEzY/bIGgBwFIajhIBOqVUiybV5aHSJFpoxA1bSdiAtQpaTLWIBsBFayTTIofFPIAE6k1sDWcVT6V+VzQ/bkMGVwAANK/quTCnIkPbRjfWgkDDVD0tNOKGzSXMtNCIG2A+ZirHpxE3sOTyyrSQld8mQQsAllL1GuhB16xO3dMizfGrXd4yLWADKTMtOspDAczDTPdbmRbAkpv7JrateFdsVQZXAAA0rz/6r/JQzeqO/j5TTdta62kx7G/9fZCDugFr85+PKlAoaAGQ1kz322q+ItMCWFKDYU6NuL0rtknQAoClVPW0yGCqs1Sqv89B4vJQ3Xm9jHdGBcQs1LII6rIgzRe+E7QAmI+6h9BU5aE04gaWm/JQVAQtAFhK1bRCpkWzqkyLoUbcMH8Jy4LUPS00GgRIqg4STzP3UB4KWHIacVPJ4AoAgOZVjaKrxtE0o5o4LE95qKoRt6AFCyBhA9bqMyfTAiCt2TItNOIGlptMCyqCFgAspapRdFemRaOqzJVUe03ay7Swe4YFMI9Mi2kW0QC4aFUJTI24Ac6nETeVDK4AAGhetezWFbNoVJW5kmpZsyqVMPdG3HbPsAjqWubNfz6qFHyZFgBpzRQklmkBLDmZFlQELQBYSoPR4npHeahGVZkWqaZtelrAFmRaACy8mcrx1c8BSznAcpJpQSWDKwAAmlcFLZSHalZ3tKA5SBQM6pf9iJhj47Vq98ywP5/zwSyq61RPC4CFVTfiniVoIdMCWFJV5n0emRajd1Lviq0QtABgKVWNuDuCFo2qym2lzrToFb1EZziHlF8WSV0WpPnPR72IZicZQFJ1I+5p+mklfA4A5KAK6HZzCM5W91qZyK0QtABgKdXloUwwGtWdUyPuuaUDa8TNIlEeCmDhNZJpkcMOZIAEsuppoTxUqwQtAFhK1bJbT6ZFozqRtjxUvbNmXpNUmRYsknqHrUbcAIuq2j2sETfA+ea+iW0r3hVblcEVAADNq6YVykM1K3mmxbCtTAsTURZAlREk0wJgYTWTaWEpB1hOc9/EthXviq3ypANgKQ1Hy+rdRBkB21UVBJJpAS1IuMNWI26A+ah7WkyVaTH6GZkWwJLKK9NiNAbz41ZkcAUAQPP6VU8LvQoa1S3nUx6qk6D8zYbq3TP9+ZwPZlFdpwkzLQQtANIaDxKXk2YEJ3wOAOSgP7rPybRA0AKApVSFKmRaNKs7rMpDLVmmhYkoi6C6Tju9xg9dl4cS6AVIanyOM3G2RcLnAEAO6kbcOWSUVfda74qtELQAYCkNSpkWKZxtxJ1GPUmde3ko1wkLIGF5KJkWAPMxXvJk4qCFRtzAkqsz77MoD6WUcJsyuAIAoHnrMi0ELhrTHf1dDhM0OC/Lcv41TKX8skjqRtzNfz6qkmwacQOk1RvLkpg4UFw34ha0AJbT3DexbcW7YqsELQBYSmcbcYedEQ3qRLpG3OOLpRpxwwZkWgAsvGYyLSzlAMtJpgWVDK4AAGheXR6qDDsjGlRnWiQOWsy/EbdrhAWQcIdt9WIo0wIgrfGNGZNnWlQZdxnsQAZIIK9Mi9E7qXfFVghaALCUqkX1bpR2RjSoMwoGDRKUhxp/cZ9fpsVoKuQaYREkzLSoghaDchBlgs83AGv0tADY3NzLBW9F/8NWZXAFAEDz+jItkuiOXpYH0fzEbTxooacFbGDYX/tvgqDeeKBQtgVAOuP32351X79YCZ8DADmo7ot5ZFpU74oT3qtphKAFAEvpbKZFmGQ0KGUj7vGgRa/obfGdDaqaYQpasAiGCXtad
AQtAOahKIooooiIKe639XNgTvMkgDmry0PlkFHmXbFVghYALKWqfNFaeSgLcE3pjv4ukzTiHo71tJhXpoXmaiyS6l6WsBF3hGbcAKlVi3ET32814gaWnEbcVDK4AgCgecNR+aKu8lCN6pRVpoXyUDB3c2jEHSHTAiC1KlA8eaaFRtzA8hq/J+ZRHkoj7jYJWgCwlOodGhpxNypppsVY07WiKBo//obsnmGRJGzAKtMCYH6qQPH0mRYZLOYBNKyVTWxb0Yi7VRlcAQDQvLoWpkyLRnXm0NNirhNUu2dYJDItAJbC9JkW6Z4DAG3LL9NCVn6bBC0AWEqDYZVpEXbRN6gz+nsdxDDKhgMXVdBirhPUjokoC2TYX/tvglrm45+7fnUeAJKoMy0mnX/Uz4EMFvMAGjZ+T8wq08LcuBUZXAEA0Lx15aEsSDemOxYAano3dtWIe76ZFspDsUDKdLXMi6KIItbKssm0AEirChRPXR4qhx3IAA0bvyd2cwjOeldslaAFAEupWnTrKQ/VqO4wXdCimqT2il6jx91SZ3Qu1wiLoLpOO2k+I9XLoZ4WAGlV99upG3HnsJgH0LDsykN5V2yVoAUAS0kj7jS6YyWhml7YrBtxJyh9symNuFkkiRuwTl1jHYCJaMQNcL51mRZZBC1G76Xmxq0QtABgKWnEnUZnDpkWc52gasTNIkncgHXqRTQAJqIRN8D5qntiEUUURdHyaEIj7pYJWgCwlM5mWoRd9A0a72mRLNNinj0t6kwLu2dYADItAJaCTAuA81WNuLPIsoiQld+yORaNBmCZDIaDOHrqVNvD2NTp/pmIiOhGGfecPB7DE8dbHtFy6A369a+/dPJYDAbNTSjvvmft36iITpw43b/AdzejO4jYFRGD/uk45RohcztOn4odEXGmLOJMgs9ItYh298njsW/HicaPz3Jb2XXJfMv7kY1yOIyT95xsexgLpYi1HcRHjh+Juy/50kX/3CWDM9GJiHvO9GN40n0aWC7V+2CnmN/74FY6g4hLImI4OBP3eFe8aGVDmSlFWY4Vp75Iq6ursW/fvjhy5EisrKw0MhAAFsvNX/xYvOD/fV7bw7igf/OFO+LvH/dS15QyIm68372TnmN45vI4/skfTnqOytd33h2/tvM/zOVc0JTX9L8xfqb//MaPe9mDfjI6PS9kTOdN3/g/4wFXpH0+kKcTq3fGpf/h/m0PY6E85/p7xWd27mh7GABZKoc74tjHfrLtYcSXFx+PP9j1420PY+F84Zv/W1z76GfMHDewFQaApbVr0ImHnzrd9jCWShERjzmRdtdL//gDkx5/3M3D+8Xd5WVzOx/M6lTZi78ZPiTJsQdz/OwBbGdPPHlP20MAyNY83we38onyhri9vKLtYWxbMi0AmEp/0I+7Th5texhb2tW9JHb0T66lB9CYcsfuOD5I9LJdFLFnx540x97M4EzEGWUtWBDdHRE7dic7/LEzxyImfz2AuPLSleiqs78tlcNhnDx2d9vDWDjHz5yIMqboIdTbHdFV6RtYXpft2JNHI+6IiGE/4rTKDZM4Myjj8v1XzRw38KQDYCq9bi+u2bMIuw7SLe5tZ5ct1d9rL2L3Mv15YHqX7ry87SEAC6bodOLSlf1tD2PhXBr+zgDy14u45JK2B7FQVldXGzmO8lAAAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkQdACAAAAAADIgqAFAAAAAACQBUELAAAAAAAgC4IWAAAAAABAFgQtAAAAAACALPSm+aGyLCMiYnV1tdHBAAAAAAAAi6eKF1Txg2lNFbS48847IyLi4MGDM50cAAAAAABYHkePHo19+/ZN/fNTBS32798fERG33nrrTCcHFtfq6mocPHgwDh06FCsrK20PB2iB+wDgPgBEuBcA7gPA2fvARz7ykThw4MBMx5oqaNHprLXC2LdvnxsRbHMrKyvuA7DNuQ8A7gNAhHsB4D4AR
Fx//fV1/GBaGnEDAAAAAABZELQAAAAAAACyMFXQYteuXXHTTTfFrl27mh4PsCDcBwD3AcB9AIhwLwDcB4Bm7wNFWZZlA2MCAAAAAACYifJQAAAAAABAFgQtAAAAAACALAhaAAAAAAAAWRC0AAAAAAAAsiBoAQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALLQm+aHhsNhHD58OPbu3RtFUTQ9JgAAAAAAYIGUZRlHjx6NAwcORKczfb7EVEGLw4cPx8GDB6c+KQAAAAAAsHwOHToUN9xww9Q/P1XQYu/evfXJV1ZWpj45AAAAAACw+FZXV+PgwYN1/GBaUwUtqpJQKysrghYAAAAAAEBExMwtJTTiBgAAAAAAsiBoAQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAAAAAQBYELQAAAAAAgCwIWgAAAAAAAFkQtAAAAAAAALIgaAEAAAAAAGSh1/YAAIAFdfT2iDf/0FxO9Y7BkfgfgzujTHaGImLfDRF775XsDNCI/j0RX/xoxOB0slMcKHbG9/Wujx1FkewcsKVLViKe+qqIlQPn/dbNd9wc//kj/zn6w34LA2MrP/y4H46rL7267WEAAEtA0AIAmM7p4xEfedNcTvXvr79XfGrnjrQnuetLEXd9KO05YEE85dCH4ivvOdX2MNjOrrhvxJNfed6Xf/1Dvx5vufUt8x8PF/R9X/59bQ8BAFgSghYAwHQuvTLi7/27uZzq5Cd+M6J/NF68/8vjXjv2Nnvwe+6OuOWtEZfsi/i6H2v22NC0T/xZxMf/eG1B98BjGj/8b9z5nri9fyxOPu4lEXvu2/jx4YI+/MaIv/vfEWdObvjbJ/trX3/O/Z8Tj7z6kfMcGRdwxSVXtD0EAGBJCFoAANPZfXnE414yl1MN/u53IvpH45lP+pfx8Csf3uzBb/9QxPv/MGJ4acRDX9DssaFpt30m4uixiIc8LuJZP9f44f/HH70gbr/z5hg+6OsjDj618ePDBd35ybWgxXCw4W8PyrWvP+n6J8U33P8b5jkyAADmRCNuACB7w3IYERG9IsF+i87omOXGC2SQleo67aTZe9TtdCPi7MIwzN0F7slJnwcAAGRB0AIAyF61SNUpEkxdiu7oJBZpWQDVdVpdtw3rjo5bfeZg7qr7/GaZFqOvJ3keAACQBTM9ACB71a7vboqF2tHO8rBIyyKoMy3SBC2qhWCZFrTmAvfkKqCW5HkAAEAWBC0AgOwNhykzLbbe1QtZqTMt0kzj60yLoSAeLblA9lvSzDsAALJgpgcAZG8umRbDfvPHhqZVu89lWrCs6kyLrRtxdxN9BgAAaJ+gBQCQvWqRqtNJ2NPCIi2LoAquJSqNU33GBC1oTbF1ILl+Hsi0AABYWmZ6AED20mZa9Nb+qzwUi6C6TqvrtmG9Yu24GnHTms7W5aGSPg8AAMiCoAUAkL2kjVfrEiNlRFk2f3xokkbcLLsLNeIeasQNALDsBC0AgKyVZZm28er4MWVbkDuNuFl2F2jErTwUAMDyM9MDALI2XqYmbaZF6GtB/jTiZtldoBF3nXmnETcAwNIStAAAsjYetEjaiDti08avkI060yLNgm2daaGnBW2RaQEAsO2Z6QEAWRvf8Z0800J5KHKXuqdFR6YFLbvYTAs9LQAAlpagBQCQtfHF0zQ9LZSHYoFU2UCJMy0ELWhNdZ/fLNNiKNMCAGDZmekBAFkbXzztFb3mT9AZO6bmw+RumDbTQnkoWlfdky9QHkqmBQDA8hK0AACyNhwLJCTZWTveJ8PucnI3r0bcSqXRFuWhAAC2PUELACBryctDRVyw8StkQyNult3FNuLueJUFAFhWZnoAQNaqxdNO0YmiKNKcpNrZW/ULgFylbsRdaMRNy2RaAABse4IWAEDW6l21KZuuFlsvkkE2Umdad
GRa0LILNeKexzMBAIBWmekBAFmbS9PVjvJQLIgqGyhxpkW/lHVESy5wP676rci0AABYXoIWAEDWqkbccwla2F1O7hI34u4VvYg4+7mDueusXYPKQwEAbF+CFgBA1uaSaaERN4sicXkoPS1o3UU24u4mCtwBANA+QQsAIGt1I+5OwmnLBRq/Qjbm1IhbTwtas8X9uCzLKKNc+zY9LQAAlpaZHgCQtflmWqjjT+bm1IhbpgWt2SLTYvy6VB4KAGB5CVoAAFmrMy1S7qrViJtFIdOCZVdl1W1wDY5flzItAACWl5keAJC1amdt0gWqQiNuFkTqTIvRcQUtaI1MCwCAbU/QAgDI2mA4h/JQ1c5emRbkrrpGE/V4qYKDfaXSaEtn83J9g7F7tEwLAIDlZaYHAGRtLj0tOr21/6rjT+7q8lC9JIfvFWvHlWlBa7a4H6/LtEhUIg0AgPYJWgAAWasWT5MuUG1RjgSykrg8VLV7XSNuWrPF/Xg8mKY8FADA8hK0AACyNpeeFluUI4GsJG7EXQUHZVrQmi0acY8H05SHAgBYXmZ6AEDW6kyLlLtq60bcdpeTOZkWLLuLyLSQZQEAsNwELQCArM0n06JqxG13OZmrdp8nasRdLQbLtKA1nc2DyNV1KcsCAGC5me0BAFkbDOfQiFumBYuiKmGWOtNCfxfaUmxerq8/+ppMCwCA5SZoAQBkbS7lQDq90cks1JK56hqtrtmGVZ8z5aFozRb34/p5kKinCwAAeRC0AACyVpeHSlQOZ+3gMi1YEBpxs+zq+/HmjbiVhwIAWG5mewBA1ubaiHuDciSQFY24WXZVQEIjbgCAbUvQAgDImkbcMKZuxJ0o00Ijbtq2ReabTAsAgO3BbA8AyNpcMy3sLid3daZFmmm8TAtaV2e+ybQAANiuBC0AgKzNJ9Ni80UyyErqnhYyLWibTAsAgG3PbA8AyNpgFEiQaQFxtu9K6p4WAni0pRhrxF2W635rLs8DAABaJ2gBAGStLgeSaGd5RER0eqOTWaglc9U1Wl2zDeuNjqs8FK0Zv9efc0+ey/MAAIDWCVoAAFmbbyPufrpzwKzKMiJGO88TLdpWnzPloWjN+LV9TvCseh7ItAAAWG6CFgBA1ubbiNtCLRkb33WuETfLqrhwpoWeFgAAy81sDwDImkbcMDIeSNCIm2V1EZkWghYAAMvNbA8AyNp8My0ELcjYukyLxI24fRZoy1aZFsM5PA8AAGidoAUAkLX+qM+ETAu2vfGeK4kzLQY+C7Rli0bc/XIOzwMAAFpntgcAZK3KtOh1eulO0pFpwQJYVx4qzeehO/osyLSgNeMBiXLjnhZJnwcAALRO0AIAyNpcapgXMi1YAMOxPhOJyuPoaUHrimLTe7KeFgAA24PZHgCQtbn0tFAeikWwLtMizTReTwuysEn221yeBwAAtE7QAgDI2lwzLSzUkrMqqJZwwVamBVmQaQEAsK2Z7QEAWasWTzXiZturgmqJmnBHnP2cCVrQqs0yLYYyLQAAtgNBCwAga4NRICHpIpVMCxbBHDMtlIeiVXWmxfrgmUwLAIDtwWwPAMjaXBapqv4AMi3I2bC/9t85ZFoMfBZoU31P7q/7cv08SNTTBQCAPJjtAQBZq8rU9Dq9dCepjm2hlpxVJZsSBi26HZkWZKC6J2/SiLtXJHweAADQOkELACBrGnHDiEbcbBcacQMAbGtmewBA1qrF06Q9LTTiZhHMsRG3TAtapRE3AMC2JmgBAGStqq0v04JtT6YF24VG3AAA25rZHgCQtflkWmjEzQKYY6bFsBxGWZbJzgNbqu7Jm/S0kGkBALDcBC0AgKzNt6eF3eVkbI6ZFhFKRNGiOtOiv+7L9fOg4zUWAGCZme0BAFmrFqm6CXeXR6e39t9zFsggK8P0mRbjnzMlomhNfU/euBG3TAsAgOUmaAEAZE0jbhiZQ3komRZkYbNG3MpDAQBsC4IWAEDW5lseyiItGZtDeajxz5lMC1pTb
BxIHgw14gYA2A7M9gCArA2HMi0gImRasH3UjbjXB85kWgAAbA+CFgBA1uaTabHxAhlkZd6ZFkOfB1qyWabFPJ4HAAC0zmwPAMianhYwUgXVOumm8OOLwTItaM2FelokzDYCAKB9ghYAQNb6ZT8iIjoJF2rP7urtpzsHzKq6PhMG8IqiqAMXgha0ZpN7cv08kGkBALDUzPYAgKxVO2t7RS/dSTqjY1ukJWdVJlAn4WchzmY1acRNa6pr/Jzst6pkWdLnAQAArTPbAwCmUpZlnDyTfpH/zGBtZ21/UMaJ02kyIbrDiF0RMRj041Sic8CsumdOr12n0Ul6nXaKbkSciWOnTseJHT4PzN+uKKIbEafOnInB2LV+avQ8GAwj2fOA6e3e0Y2iKNoeBgCwBIqyLMtJf2h1dTX27dsXR44ciZWVlRTjAgAy97nVL8TTfv0nkp+nt/eD0dl5d5z83LdHf/UxSc7xnM5fxP+985fic+WV8UeDJyQ5B8zq/sVt8fXd98ZfDh8Wzz/9Y8nOs+fBr46iezpOf+krI4a7k50HNvPszl/GweKOuLW8Jr4Ue+qvf3b3sfjiJffEV9x1dTzhzutaHCEb+c5X/oe49Ar/LgCwnTUVN5BpAQBM5eiZo7HzynfM7XzlcFeyY6/GZRERcX1xZ3xP738mOw804Wh5adLjl8PdUXRPx84r/ibpeWAzb4mIiJWIuGf0f+s9JT4eL+69Z76D4oJOnropIgQtAIDZCVoAAFO55tIr4h899IVzOddVu6+K5/2Db48d3R1pTjB4Wpx+z74ojt2W5vjQlE43vvqRz4+PXPmgZKd43xeuind+7v9Ldny4oHvujs4X/zZig74qe4pefOPV94sznZ0tDIytXLL3yraHAAAsCeWhAAAAAACAmTQVN+g0OCYAAAAAAICpCVoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkQdACAAAAAADIgqAFAAAAAACQBUELAAAAAAAgC4IWAAAAAABAFgQtAAAAAACALAhaAAAAAAAAWRC0AAAAAAAAstCb5ofKsoyIiNXV1UYHAwAAAAAALJ4qXlDFD6Y1VdDizjvvjIiIgwcPznRyAAAAAABgeRw9ejT27ds39c9PFbTYv39/RETceuutM50cWFyrq6tx8ODBOHToUKysrLQ9HKAF7gOA+wAQ4V4AuA8AZ+8DH/nIR+LAgQMzHWuqoEWns9YKY9++fW5EsM2trKy4D8A25z4AuA8AEe4FgPsAEHH99dfX8YNpacQNAAAAAABkQdACAAAAAADIwlRBi127dsVNN90Uu3btano8wIJwHwDcBwD3ASDCvQBwHwCavQ8UZVmWDYwJAAAAAABgJspDAQAAAAAAWRC0AAAAAAAAsiBoAQAAAAAAZEHQAgAAAAAAyMJUQYtf/uVfjvve975xySWXxOMf//j467/+66bHBWTqx3/8x6MoinX/99CHPrTtYQEJveMd74jnPOc5ceDAgSiKIt70pjet+/2yLOPVr3513Ote94rdu3fH05/+9PjEJz7RzmCBJC50H3jRi1503vzgWc96VjuDBZL46Z/+6fjKr/zK2Lt3b1xzzTXx3Oc+Nz72sY+t+5577rknXvayl8WVV14Ze/bsiW/91m+Nz3/+8y2NGGjaxdwHnvrUp543J/in//SftjRioGmvfe1r48Ybb4yVlZVYWVmJJz7xifHmN7+5/v2m5gITBy1+93d/N37gB34gbrrppnjve98bj3rUo+KZz3xmfOELX5j45MBievjDHx633XZb/X/vfOc72x4SkNDx48fjUY96VPzyL//yhr//Mz/zM/GLv/iL8Su/8ivxV3/1V3HZZZfFM5/5zLjnnnvmPFIglQvdByIinvWsZ62bH7zhDW+Y4wiB1N7+9rfHy172svjLv/zL+NM//dM4c+ZMPOMZz4jjx4/X3/P93//98Yd/+IfxX//rf423v/3tcfjw4fiWb/mWFkcNNOli7gMRES95yUvWz
Ql+5md+pqURA0274YYb4t/8m38T73nPe+Ld7353PO1pT4tv+qZvig9/+MMR0dxcoCjLspzkBx7/+MfHV37lV8Yv/dIvRUTEcDiMgwcPxstf/vL44R/+4YkHACyWH//xH483velN8f73v7/toQAtKIoi3vjGN8Zzn/vciFjLsjhw4EC88pWvjB/8wR+MiIgjR47EtddeG6973evi+c9/foujBVI49z4QsZZpcffdd5+XgQEsry9+8YtxzTXXxNvf/vZ4ylOeEkeOHImrr746Xv/618fznve8iIj46Ec/Gg972MPiXe96VzzhCU9oecRA0869D0SsZVo8+tGPjp//+Z9vd3DA3Ozfvz9+9md/Np73vOc1NheYKNPi9OnT8Z73vCee/vSnnz1ApxNPf/rT413vetckhwIW2Cc+8Yk4cOBA3P/+949/+A//Ydx6661tDwloyac//em4/fbb180N9u3bF49//OPNDWCbedvb3hbXXHNNPOQhD4mXvvSlceedd7Y9JCChI0eORMTaQkVExHve8544c+bMujnBQx/60Lj3ve9tTgBL6tz7QOW//Jf/EldddVU84hGPiFe96lVx4sSJNoYHJDYYDOJ3fud34vjx4/HEJz6x0blAb5JvvuOOO2IwGMS111677uvXXnttfPSjH53oxMBievzjHx+ve93r4iEPeUjcdttt8RM/8RPx5Cc/OW6++ebYu3dv28MD5uz222+PiNhwblD9HrD8nvWsZ8W3fMu3xP3ud7+45ZZb4kd+5Efi2c9+drzrXe+Kbrfb9vCAhg2Hw3jFK14RX/VVXxWPeMQjImJtTrBz5864/PLL132vOQEsp43uAxER3/Ed3xH3uc994sCBA/HBD34wfuiHfig+9rGPxR/8wR+0OFqgSR/60IfiiU98Ytxzzz2xZ8+eeOMb3xhf9mVfFu9///sbmwtMFLQAePazn13/+sYbb4zHP/7xcZ/73Cd+7/d+L77ru76rxZEBAG0ZLwX3yEc+Mm688cZ4wAMeEG9729vi677u61ocGZDCy172srj55pv1toNtbLP7wHd/93fXv37kIx8Z97rXveLrvu7r4pZbbokHPOAB8x4mkMBDHvKQeP/73x9HjhyJ//bf/lu88IUvjLe//e2NnmOi8lBXXXVVdLvd8zp+f/7zn4/rrruu0YEBi+Hyyy+PBz/4wfHJT36y7aEALaie/+YGwLj73//+cdVVV5kfwBL63u/93vijP/qjeOtb3xo33HBD/fXrrrsuTp8+HXffffe67zcngOWz2X1gI49//OMjIswJYIns3LkzHvjAB8ZjH/vY+Omf/ul41KMeFb/wC7/Q6FxgoqDFzp0747GPfWy85S1vqb82HA7jLW95SzzxiU+c6MTAcjh27Fjccsstca973avtoQAtuN/97hfXXXfdurnB6upq/NVf/ZW5AWxjn/3sZ+POO+80P4AlUpZlfO/3fm+88Y1vjD//8z+P+93vfut+/7GPfWzs2LFj3ZzgYx/7WNx6663mBLAkLnQf2Mj73//+iAhzAlhiw+EwTp061ehcYOLyUD/wAz8QL3zhC+MrvuIr4nGPe1z8/M//fBw/fjxe/OIXT3ooYAH94A/+YDznOc+J+9znPnH48OG46aabotvtxgte8IK2hwYkcuzYsXU7oz796U/H+9///ti/f3/c+973jle84hXxUz/1U/GgBz0o7ne/+8WP/diPxYEDB+K5z31ue4MGGrXVfWD//v3xEz/xE/Gt3/qtcd1118Utt9wS//Jf/st44AMfGM985jNbHDXQpJe97GXx+te/Pv77f//vsXfv3ro29b59+2L37t2xb9+++K7v+q74gR/4gdi/f3+srKzEy1/+8njiE58YT3jCE1oePdCEC90Hbrnllnj9618ff+/v/b248sor44Mf/GB8//d/fzzlKU+JG2+8seXRA0141ateFc9+9rPj3ve+dxw9ejRe//rXx9ve9rb4kz/5k0bnAkVZluWkg
/ulX/ql+Nmf/dm4/fbb49GPfnT84i/+Yp3uBSy35z//+fGOd7wj7rzzzrj66qvjq7/6q+Nf/at/pTYlLLG3ve1t8bVf+7Xnff2FL3xhvO51r4uyLOOmm26KX/3VX4277747vvqrvzpe85rXxIMf/OAWRguksNV94LWvfW0897nPjfe9731x9913x4EDB+IZz3hG/ORP/mRce+21LYwWSKEoig2//pu/+Zvxohe9KCIi7rnnnnjlK18Zb3jDG+LUqVPxzGc+M17zmtcoDwVL4kL3gUOHDsU/+kf/KG6++eY4fvx4HDx4ML75m785fvRHfzRWVlbmPFoghe/6ru+Kt7zlLXHbbbfFvn374sYbb4wf+qEfiq//+q+PiObmAlMFLQAAAAAAAJo2UU8LAAAAAACAVAQtAAAAAACALAhaAAAAAAAAWRC0AAAAAAAAsiBoAQAAAAAAZEHQAgAAAAAAyIKgBQAAAAAAkAVBCwAAAAAAIAuCFgAAwJZe9KIXxXOf+9y2hwEAAGwDvbYHAAAAtKcoii1//6abbopf+IVfiLIs5zQiAABgOxO0AACAbey2226rf/27v/u78epXvzo+9rGP1V/bs2dP7Nmzp42hAQAA25DyUAAAsI1dd9119f/t27cviqJY97U9e/acVx7qqU99arz85S+PV7ziFXHFFVfEtddeG7/2a78Wx48fjxe/+MWxd+/eeOADHxhvfvOb153r5ptvjmc/+9mxZ8+euPbaa+M7v/M744477pjznxgAAMiZoAUAADCx3/qt34qrrroq/vqv/zpe/vKXx0tf+tL4tm/7tnjSk54U733ve+MZz3hGfOd3fmecOHEiIiLuvvvueNrTnhaPecxj4t3vfnf88R//cXz+85+Pf/AP/kHLfxIAACAnghYAAMDEHvWoR8WP/uiPxoMe9KB41ateFZdccklcddVV8ZKXvCQe9KAHxatf/eq4884744Mf/GBERPzSL/1SPOYxj4l//a//dTz0oQ+NxzzmMfEbv/Eb8da3vjU+/vGPt/ynAQAAcqGnBQAAMLEbb7yx/nW3240rr7wyHvnIR9Zfu/baayMi4gtf+EJERHzgAx+It771rRv2x7jlllviwQ9+cOIRAwAAi0DQAgAAmNiOHTvW/e+iKNZ9rSiKiIgYDocREXHs2LF4znOeE//23/7b8451r3vdK+FIAQCARSJoAQAAJPflX/7l8fu///tx3/veN3o9ryEAAMDG9LQAAACSe9nLXhZ33XVXvOAFL4i/+Zu/iVtuuSX+5E/+JF784hfHYDBoe3gAAEAmBC0AAIDkDhw4EP/7f//vGAwG8YxnPCMe+chHxite8Yq4/PLLo9PxWgIAAKwpyrIs2x4EAAAAAACALU0AAAAAAEAWBC0AAAAAAIAsCFoAAAAAAABZELQAAAAAAACyIGgBAAAAAABkQdACAAAAAADIgqAFAAAAAACQBUELAAAAAAAgC4IWAAAAAABAFgQtAAAAAACALAhaAAAAAAAAWfj/AS/m1uIjv92VAAAAAElFTkSuQmCC","text/plain":[""]},"execution_count":10,"metadata":{},"output_type":"execute_result"}],"source":["from pyannote.audio import Inference\n","inference = Inference(model, step=2.5)\n","output = inference(AUDIO_FILE)\n","output"]},{"cell_type":"markdown","metadata":{"id":"MoqfhoX_TIbO"},"source":["For each of the 9 positions of the 10s window, the model outputs a 3-dimensional vector every 17ms (589 frames for 10 seconds), corresponding to the 
probabilities that each of (up to) 3 speakers is active."]},{"cell_type":"code","execution_count":11,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":196,"status":"ok","timestamp":1704807141814,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"JObvduJMTIbO","outputId":"af45591c-c30d-401a-8906-db029140cea2"},"outputs":[{"data":{"text/plain":["(9, 589, 3)"]},"execution_count":11,"metadata":{},"output_type":"execute_result"}],"source":["output.data.shape"]},{"cell_type":"markdown","metadata":{"id":"Zdk-iqOaTIbQ"},"source":["## Processing a file from memory\n","\n","In case the audio file is not stored on disk, `Inference` (like pipelines) can also process audio provided as a `{\"waveform\": ..., \"sample_rate\": ...}` dictionary."]},{"cell_type":"code","execution_count":12,"metadata":{"executionInfo":{"elapsed":229,"status":"ok","timestamp":1704807145587,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"oFQrkl01TIbQ"},"outputs":[{"name":"stdout","output_type":"stream","text":["type(waveform)=\n","waveform.shape=torch.Size([1, 480000])\n","waveform.dtype=torch.float32\n"]}],"source":["import torchaudio\n","waveform, sample_rate = torchaudio.load(AUDIO_FILE)\n","\n","print(f\"{type(waveform)=}\")\n","print(f\"{waveform.shape=}\")\n","print(f\"{waveform.dtype=}\")\n","\n","audio_in_memory = {\"waveform\": waveform, \"sample_rate\": sample_rate}"]},{"cell_type":"code","execution_count":13,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":657},"executionInfo":{"elapsed":1904,"status":"ok","timestamp":1704807149946,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"j2zi1CzHTIbR","outputId":"fb8ac3d0-c9f0-4b0a-c01f-a10612bdf254"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":13,"metadata":{},"output_type":"execute_result"}],"source":["output = 
inference(audio_in_memory)\n","output"]},{"cell_type":"markdown","metadata":{"id":"aB5QNx7lTIbS"},"source":["## Processing part of a file\n","\n","If needed, `Inference` can be used to process only part of a file:"]},{"cell_type":"code","execution_count":19,"metadata":{"id":"E0Pydt0VTIbT"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":19,"metadata":{},"output_type":"execute_result"}],"source":["from pyannote.core import Segment\n","output = inference.crop(audio_in_memory, Segment(0, 20))\n","output"]},{"cell_type":"markdown","metadata":{"id":"K_Z-ciLaTIbU"},"source":["## Offline use\n","\n","Gating models allows [me](https://herve.niderb.fr) to know a bit more about the `pyannote.audio` user base and helps me write grant proposals to make `pyannote.audio` even better. Please fill in this form as precisely as possible.\n","\n","For instance, before gating `pyannote/segmentation`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time-consuming.\n","\n","That being said: this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. 
without going through the authentication process in every `docker run ...` or whatever you are using in production).\n","\n","* Step 1: download the `pytorch_model.bin` model\n","\n","![](assets/download-model.png)\n","\n","* Step 2: load the model"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"t_kZgFSSTIbV"},"outputs":[],"source":["# look ma: no hands!\n","offline_model = Model.from_pretrained(\"pytorch_model.bin\")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"wBkIFwR-TIbV"},"outputs":[],"source":["# just checking weights are the same...\n","import torch\n","for weights, offline_weights in zip(model.parameters(), offline_model.parameters()):\n"," assert torch.equal(weights, offline_weights)"]}],"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"},"vscode":{"interpreter":{"hash":"36a3a48a52702f18671693adf589423ec3f7db45d50f6ee539f1b0696bb58d43"}},"widgets":{"application/vnd.jupyter.widget-state+json":{"01a79756e1104be0bcbe1d233677d605":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":n
ull,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"09ee64808a5c431b928066a4cb287a78":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"0c537fef94bd4a3d8c42d981a70ea35b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_ecbb880cf0044404ab9bfc5995656807","placeholder":"","style":"IPY_MODEL_79287ffea49a4e4a80191b1157de15cc","value":"\nPro Tip: If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. 
"}},"0e04a45510f346b8ab3c465b10ccdfd2":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_bf5c88fe90f0462b95aa8a81467c1e9d","placeholder":"","style":"IPY_MODEL_09ee64808a5c431b928066a4cb287a78","value":"Connecting..."}},"1a30ac8b519948bbad72886049b53d55":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"2b91f4b11d5848c0b6795b649bc2d7bc":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_57cf64e9c70e4057a76c2f3fb1e9b191","placeholder":"","style":"IPY_MODEL_5e90e4818f1f43f2aa4714e0e40dc15d","value":"Your token has been saved to 
/root/.cache/huggingface/token"}},"2d1965df902b4b7eb6e4e28e44d6e32b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"CheckboxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"CheckboxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"CheckboxView","description":"Add token as git credential?","description_tooltip":null,"disabled":false,"indent":true,"layout":"IPY_MODEL_416ffb7703b1404ab1c19c6b1ecafb4c","style":"IPY_MODEL_5be7421c575d41128d0675dc9abbc196","value":true}},"416ffb7703b1404ab1c19c6b1ecafb4c":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"47da80658de14604a7cc5268b951c172":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_mod
ule_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"49cb317b141a43b7b8ce342e2d62ab03":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"52a4a1aab39741f2a0499428f07a4c25":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.
5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_01a79756e1104be0bcbe1d233677d605","placeholder":"","style":"IPY_MODEL_990375a830eb4cd79c69ad78b9ac685b","value":"Token is valid (permission: write)."}},"53919e13e42441d9b8cf5eb4f2effe96":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"57cf64e9c70e4057a76c2f3fb1e9b191":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"5be7421c575d41128d0675dc9abbc196":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widge
ts/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"5e90e4818f1f43f2aa4714e0e40dc15d":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"73915caf6d1042ca9a27853986217ae2":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"PasswordModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"PasswordModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"PasswordView","continuous_update":true,"description":"Token:","description_tooltip":null,"disabled":false,"layout":"IPY_MODEL_47da80658de14604a7cc5268b951c172","placeholder":"","style":"IPY_MODEL_c7fbfffbbc304f0ba025e03beb8f638a","value":""}},"79287ffea49a4e4a80191b1157de15cc":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"79d55946b3764666a0b57a72722f6c19":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_
name":"LayoutView","align_content":null,"align_items":"center","align_self":null,"border":null,"bottom":null,"display":"flex","flex":null,"flex_flow":"column","grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":"50%"}},"8b3353af24074ce5b5d2d9b162c61062":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"988c83c85f7d44d0be949de6eedcf977":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_co
unt":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"990375a830eb4cd79c69ad78b9ac685b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"a73cecb8608f491e8f996c5114df4d0f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"VBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"VBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"VBoxView","box_style":"","children":["IPY_MODEL_52a4a1aab39741f2a0499428f07a4c25","IPY_MODEL_f23cc133109c4ecb9b383f118bcc71aa","IPY_MODEL_2b91f4b11d5848c0b6795b649bc2d7bc","IPY_MODEL_b437981acb894f97ad45daaebe2734c7"],"layout":"IPY_MODEL_79d55946b3764666a0b57a72722f6c19"}},"abc23089f4ef4d2a9f8a104fb1dba14a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_m
odel_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b2e03b10050d46548932f429bcc18d81":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ButtonModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ButtonModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ButtonView","button_style":"","description":"Login","disabled":false,"icon":"","layout":"IPY_MODEL_8b3353af24074ce5b5d2d9b162c61062","style":"IPY_MODEL_cd40e31c191044b7a0f8fa4694ad6380","tooltip":""}},"b437981acb894f97ad45daaebe2734c7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_fb027db34fd740c2a4758d650ae3f73c","placeholder":"","style":"IPY_MODEL_1a30ac8b519948bbad72886049b53d55","value":"Login successful"}},"bc10c6804346449b8e349728a1acb8d2":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_988c83c85f7d44d0be949de6eedcf977","placeholder":"","style":"IPY_MODEL_53919e13e42441d9b8cf5eb4f2effe96","value":" Copy a token from your Hugging 
Face\ntokens page and paste it below. Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. "}},"bf5c88fe90f0462b95aa8a81467c1e9d":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c7fbfffbbc304f0ba025e03beb8f638a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"cd40e31c191044b7a0f8fa4694ad6380":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ButtonStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ButtonStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","button_color":null,"font_we
ight":""}},"ecbb880cf0044404ab9bfc5995656807":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"f23cc133109c4ecb9b383f118bcc71aa":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_49cb317b141a43b7b8ce342e2d62ab03","placeholder":"","style":"IPY_MODEL_abc23089f4ef4d2a9f8a104fb1dba14a","value":"Your token has been saved in your configured git credential helpers 
(store)."}},"fb027db34fd740c2a4758d650ae3f73c":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}}}}},"nbformat":4,"nbformat_minor":0}
diff --git a/tutorials/applying_a_pipeline.ipynb b/tutorials/applying_a_pipeline.ipynb
index c1080071d..945caab4e 100644
--- a/tutorials/applying_a_pipeline.ipynb
+++ b/tutorials/applying_a_pipeline.ipynb
@@ -1,400 +1 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Applying a pretrained pipeline\n",
- "\n",
- "In this tutorial, you will learn how to apply `pyannote.audio` pipelines on an audio file.\n",
- "\n",
- "A pipeline takes an audio file as input and returns a labeled temporal segmentation of the audio file. \n",
- "\n",
- "More precisely, it usually applies a pretrained model (= neural network) on the audio file, post-processes the output of the model, and returns its output as a [`pyannote.core.Annotation`](http://pyannote.github.io/pyannote-core/structure.html#annotation) instance. It should become clearer as you keep reading..."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Loading pipeline from 🤗 hub\n",
- "\n",
- "A bunch of pretrained pipelines are available on [🤗 Huggingface model hub](https://hf.co/models?other=pyannote-audio-pipeline) and can be listed by looking for the [`pyannote-audio-pipeline`](https://hf.co/models?other=pyannote-audio-pipeline) tag."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['pyannote/overlapped-speech-detection',\n",
- " 'pyannote/speaker-diarization',\n",
- " 'pyannote/speaker-segmentation',\n",
- " 'pyannote/voice-activity-detection']"
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from huggingface_hub import HfApi\n",
- "available_pipelines = [p.modelId for p in HfApi().list_models(filter=\"pyannote-audio-pipeline\")]\n",
- "list(filter(lambda p: p.startswith(\"pyannote/\"), available_pipelines))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Official [pyannote.audio](https://github.com/pyannote/pyannote-audio) pipelines (i.e. those under the [`pyannote` organization](https://hf.co/pyannote) umbrella) are open-source, but gated. It means that you have to first accept users conditions on their respective Huggingface page to access the pretrained weights and hyper-parameters. Despite this initial process, those pipelines can perfectly be downloaded for later offline use: keep reading this tutorial until the end to learn how to do that.\n",
- "\n",
- "For instance, to load the speaker diarization pipeline used in this tutorial, you have to visit [hf.co/pyannote/speaker-diarization](https://hf.co/pyannote/speaker-diarization), accept the terms, visit [hf.co/pyannote/segmentation](https://hf.co/pyannote/segmentation) (used internally by the speaker diarization pipeline), accept the terms, and log in using `notebook_login` below:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "9466934e67254fe6a7ff67b727a3f3ab",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "VBox(children=(HTML(value=' "
- ]
- },
- "execution_count": 9,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# we visualize [0, 30] time range\n",
- "from pyannote.core import notebook, Segment\n",
- "notebook.crop = Segment(0, 30)\n",
- "dia"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "When available, the reference annotation can be visualized too, for comparison:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "from pyannote.database.util import load_rttm\n",
- "REFERENCE = f\"{ROOT_DIR}/tutorials/assets/sample.rttm\"\n",
- "reference = load_rttm(REFERENCE)[\"sample\"]\n",
- "\n",
- "# map hypothesized and reference speakers for visualization purposes\n",
- "pipeline.optimal_mapping(dia, reference)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Processing a file from memory\n",
- "\n",
- "In case the audio file is not stored on disk, pipelines can also process audio provided as a `{\"waveform\": ..., \"sample_rate\": ...}` dictionary. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "type(waveform)=\n",
- "waveform.shape=torch.Size([1, 480000])\n",
- "waveform.dtype=torch.float32\n"
- ]
- }
- ],
- "source": [
- "import torchaudio\n",
- "waveform, sample_rate = torchaudio.load(AUDIO_FILE)\n",
- "\n",
- "print(f\"{type(waveform)=}\")\n",
- "print(f\"{waveform.shape=}\")\n",
- "print(f\"{waveform.dtype=}\")\n",
- "\n",
- "audio_in_memory = {\"waveform\": waveform, \"sample_rate\": sample_rate}"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 12,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "vad = Pipeline.from_pretrained(\"pyannote/voice-activity-detection\", use_auth_token=True)\n",
- "vad(audio_in_memory)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Offline use\n",
- "\n",
- "Gating models and pipelines allows [me](https://herve.niderb.fr) to know a bit more about `pyannote.audio` user base and eventually help me write grant proposals to make `pyannote.audio` even better. Please fill this form as precisely as possible. \n",
- "\n",
- "For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! maintaining open source libraries is time consuming.\n",
- "\n",
- "That being said: this whole authentication process does not prevent you from using official `pyannote.audio` models and pipelines offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production).\n",
- "\n",
- "* Step 1: download `config.yaml` of [`pyannote/voice-activity-detection`](https://hf.co/pyannote/voice-activity-detection) pipeline\n",
- "\n",
- "![](assets/download-pipeline.png)\n",
- "\n",
- "* Step 2: download the `pytorch_model.bin` model\n",
- "\n",
- "![](assets/download-model.png)\n",
- "\n",
- "* Step 3: edit `config.yaml` to point to the local model\n",
- "\n",
- "```diff\n",
- "pipeline:\n",
- " name: pyannote.audio.pipelines.VoiceActivityDetection\n",
- " params:\n",
- "- segmentation: pyannote/segmentation@Interspeech2021\n",
- "+ segmentation: pytorch_model.bin\n",
- "\n",
- "params:\n",
- " min_duration_off: 0.09791355693027545\n",
- " min_duration_on: 0.05537587440407595\n",
- " offset: 0.4806866463041527\n",
- " onset: 0.8104268538848918\n",
- "```\n",
- "\n",
- "* Step 4: load the pipeline"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 21,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 21,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# look ma: no hands!\n",
- "offline_vad = Pipeline.from_pretrained(\"config.yaml\")\n",
- "offline_vad(audio_in_memory)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 22,
- "metadata": {},
- "outputs": [],
- "source": [
- "# just checking output is the same\n",
- "assert (vad(audio_in_memory) == offline_vad(audio_in_memory))"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3.9.13 ('pyannote-mps')",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.13"
- },
- "vscode": {
- "interpreter": {
- "hash": "36a3a48a52702f18671693adf589423ec3f7db45d50f6ee539f1b0696bb58d43"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+{"cells":[{"cell_type":"markdown","metadata":{},"source":["****"]},{"cell_type":"markdown","metadata":{"id":"uiNbotxCeq4b"},"source":["# Applying a pretrained pipeline\n","\n","In this tutorial, you will learn how to apply `pyannote.audio` pipelines on an audio file.\n","\n","A pipeline takes an audio file as input and returns a labeled temporal segmentation of the audio file.\n","\n","More precisely, it usually applies a pretrained model (= neural network) on the audio file, post-processes the output of the model, and returns its output as a [`pyannote.core.Annotation`](http://pyannote.github.io/pyannote-core/structure.html#annotation) instance. It should become clearer as you keep reading..."]},{"cell_type":"markdown","metadata":{"id":"z_4VUpZxesS2"},"source":["## Tutorial setup"]},{"cell_type":"markdown","metadata":{"id":"gbonpHYte3FC"},"source":["### `Google Colab` setup\n"]},{"cell_type":"markdown","metadata":{"id":"hxhhcj8Fe7v7"},"source":["If you are running this tutorial on `Colab`, execute the following commands in order to setup `Colab` environment. 
These commands will install `pyannote.audio`, and download resources used in this tutorial."]},{"cell_type":"code","execution_count":4,"metadata":{"executionInfo":{"elapsed":17283,"status":"ok","timestamp":1704808539241,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"NFdANMKsfD1p"},"outputs":[],"source":["!pip install -qq pyannote.audio==3.1.1\n","!pip install -qq ipython==7.34.0\n","!wget -q \"https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.wav\"\n","!wget -q \"https://github.com/pyannote/pyannote-audio/raw/develop/tutorials/assets/sample.rttm\"\n","!wget -q -P ./assets/ \"https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-model.png\"\n","!wget -q -P ./assets/ \"https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-pipeline.png\""]},{"cell_type":"markdown","metadata":{"id":"PScZ6o_9gkfo"},"source":["⚠ Restart the runtime (Runtime > Restart session)."]},{"cell_type":"code","execution_count":13,"metadata":{"executionInfo":{"elapsed":458,"status":"ok","timestamp":1704809048938,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"ZU_QIx1UhcV6"},"outputs":[],"source":["AUDIO_FILE = \"sample.wav\"\n","REFERENCE = \"sample.rttm\""]},{"cell_type":"markdown","metadata":{"id":"d-qKbc9shI6o"},"source":["### Non `Google Colab` setup"]},{"cell_type":"markdown","metadata":{"id":"0KEh-IGqho6F"},"source":["If you are not using Colab, clone `pyannote.audio` [GitHub repository](https://github.com/pyannote/pyannote-audio) and update ROOT_DIR accordingly"]},{"cell_type":"code","execution_count":10,"metadata":{"id":"DuoohiL6hOiJ"},"outputs":[],"source":["ROOT_DIR = \"/pyannote-audio\"\n","AUDIO_FILE = f\"{ROOT_DIR}/tutorials/assets/sample.wav\"\n","REFERENCE = f\"{ROOT_DIR}/tutorials/assets/sample.rttm\""]},{"cell_type":"markdown","metadata":{"id":"yA9JsY84eq4e"},"source":["## Loading pipeline from 🤗 hub\n","\n","A 
bunch of pretrained pipelines are available on [🤗 Huggingface model hub](https://hf.co/models?other=pyannote-audio-pipeline) and can be listed by looking for the [`pyannote-audio-pipeline`](https://hf.co/models?other=pyannote-audio-pipeline) tag."]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":2323,"status":"ok","timestamp":1704808592263,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"eAXGgYt8eq4f","outputId":"86772355-3bf0-42e5-c478-2d35a80853e9"},"outputs":[{"data":{"text/plain":["['pyannote/overlapped-speech-detection',\n"," 'pyannote/speaker-diarization',\n"," 'pyannote/speaker-segmentation',\n"," 'pyannote/voice-activity-detection',\n"," 'pyannote/speaker-diarization-3.0',\n"," 'pyannote/speaker-diarization-3.1']"]},"execution_count":2,"metadata":{},"output_type":"execute_result"}],"source":["from huggingface_hub import HfApi\n","available_pipelines = [p.modelId for p in HfApi().list_models(filter=\"pyannote-audio-pipeline\")]\n","list(filter(lambda p: p.startswith(\"pyannote/\"), available_pipelines))"]},{"cell_type":"markdown","metadata":{"id":"fry3qMrJeq4h"},"source":["Official [pyannote.audio](https://github.com/pyannote/pyannote-audio) pipelines (i.e. those under the [`pyannote` organization](https://hf.co/pyannote) umbrella) are open-source, but gated. It means that you have to first accept users conditions on their respective Huggingface page to access the pretrained weights and hyper-parameters. 
Despite this initial process, those pipelines can perfectly be downloaded for later offline use: keep reading this tutorial until the end to learn how to do that.\n","\n","For instance, to load the speaker diarization pipeline used in this tutorial, you have to visit [hf.co/pyannote/speaker-diarization](https://hf.co/pyannote/speaker-diarization), accept the terms, visit [hf.co/pyannote/segmentation](https://hf.co/pyannote/segmentation) (used internally by the speaker diarization pipeline), accept the terms, and log in using `notebook_login` below:"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":145,"referenced_widgets":["6fe26270a8794dea85b9ac8dedde6353","f9969c7acd834359bd67e83c31b82c0b","ef6be001daaa465598bdf20c304bae0b","a8e61b300a324911b1d3babf6b1ebac1","45c2352ba41b46288c8ba74c661e1f85","78f428e1d8ac4c78bd84d208a78b1fe5","9d7b5e47af9448c98a2a434f02b54633","253408a6cab947c7b9ae40a14e255328","a435c85128f148cb80152d03825710e2","90533da4fc1341a28bc247bc9fbbbe9e","6556790d1997431a8476447297e3f595","5253489738b249f69b30f1c84884a95a","3f500b1182a945c3b8b546b1bd04f98e","c3b3069a93994e55b993cab04c066f17","f77186e2976d4a7daa798fd9f0f7b4cb","7ffa4930e0a7421e9464abff87bbde7f","1dd6bfb824ba4919ba392e0d4cd5ac82","7c23cd10c573438c840148ba1afaa5f3","bc4d3f45eaf04cccb7a655ef87d7a927","36f60aa99ea84b739f066c6ee216f159","5033834cc66f49efaa16b240245ec1d3","0b127e22a6de4339aecc1e76a6719d7f","8a7afaf126ac49dcaf24d64c2d918086","1b5f49211db54348924a05482ca7379d","1f3be36a57074214a5f92e08165e41be","275642719b694328ad3b0fb06b53e42a","ae378b480ed747438bcbe147a7e8d277","b22a1d21940e4995b72642b003ae1301","58e423b86bf349aa89fc81a2b0d0f697","a3ebe0f0e6e44522a2545db0ed47af62","dc9c065286e544df921e5bc4a51faf64","16244ab1dc76493d9fafad80a9b65700"]},"executionInfo":{"elapsed":418,"status":"ok","timestamp":1704808601143,"user":{"displayName":"Clément 
PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"IdXJumYveq4i","outputId":"cb87e106-b30f-4ec8-e726-d797ca98698b"},"outputs":[],"source":["from huggingface_hub import notebook_login\n","notebook_login()"]},{"cell_type":"markdown","metadata":{"id":"ym2671Fweq4j"},"source":["Once authenticated, you can load the pipeline (and the internal models)..."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":266,"referenced_widgets":["5a2a95f41f014663b5d131635844e3c8","f8df65e5ad6444648681230f415b8d08","17e14569a3f74cbf991b582c27fdf280","4e0ca7f084804d30a89cb9dc439b6cdc","d304f698c6f34f3692548267ca0aba47","3f69734124f7449da0efffbbfd1153fe","686b58e0ccee468f938ea5659f3a9623","e98b1f7ec5e04e748069340fa2fcec2d","616f35eccdd34bcdb5c9f700103ff538","185457147fe44751a1c6e7ae6ad23570","b02b253086e34249ae1d896a75a08960","0b191fb0d6db4f1a98a641dc50445f1e","4a589ea883de4409b7a8275127747fa7","b5f4829158cf4a5c8350966d8eacbf60","7cab72ad0ace45d58d6954021082ace7","5440ac4cffa0438c99c0daeca27a4087","0d0dfc2576784812b6abf673752fa812","84e416e01cfa4a79b0294a44cfd77f32","b33021db4313476ab8bcb85bb1b190d7","6d5431955c09424a96e825aa9bba1b62","54eacc5604f64f658617cde596a728dd","1749a0e7fc6847849ab40ac7f764b699","ff048516fdba44a48a69c51a82aa7aaa","775611277f1948a5ab95641c4e92ff2e","c955f8144b46451493474e02726b5deb","c9eeb26bd5e049f89205e494d54e3818","d4d428f0b0c44756a128adae747716d2","70d672feec1c452aa49a40a3b8c1a4f0","e8b3188ee4934ab4becbc431fc4ccb0b","754a843fd634446b8d246998eaded1fa","1e53e26668aa4f8dabdd6973b79b5b9f","cf5ae82d36384a728d8ca893cc27392b","5cc77ca8ba304ad88342ed6ced6c6a4a","6412ef268e11484f8bbcdb510f66ee21","a987408c2f994329a9a80208b22e3c52","9446f61936c4421d9610464984890353","a80c0fb6a1254bb9a00741c1346b9d9a","b55e95f31216480fa6c81bd1b57ad729","517a1b1faec045f7847ca84f0538f4e1","008a47e4d1174b188e75232fed325398","30864d7b4d074994ba643c73875a67b8","b852998433e24f7fad1fe27e249085f9","46ac2d0bfedc47c8836e8dde0fc916
62","58b140d0453a4a219bc64e0152cc5c5e","3bb60a8db7c0406f8debdc8dd88bcdb4","bf644ae3e2894787bb416740749200cd","0fccc53081a34d58a46912d1ceae0533","39cd6fe37b6f4476b6032c29f026d570","87a0e712c0b045449b98ee368c62fff2","2be7c1f5fe32462f8d1cd7ea57cfbf3a","d6135ce9cfaa4f8496a70cee4f1e89a7","00ac431507a54c0d9a41244013b1c0f3","6f318bc3e900434a9134a79e4057f666","7580595b498a4c50b748af9df67132e2","3f1d750804a54d78a6abff2623441360"]},"executionInfo":{"elapsed":30028,"status":"ok","timestamp":1704808674003,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"1NihzPUPeq4k","outputId":"04275648-23ef-4453-9bf4-8d8dfadd99e0"},"outputs":[],"source":["from pyannote.audio import Pipeline\n","pipeline = Pipeline.from_pretrained(\"pyannote/speaker-diarization-3.1\", use_auth_token=True)"]},{"cell_type":"markdown","metadata":{"id":"cZhtRXAHeq4l"},"source":["## Processing a file from disk"]},{"cell_type":"markdown","metadata":{"id":"X7hQRbzeeq4m"},"source":["... and apply it to an audio file. 
 \n","\n","Pipelines run on CPU by default: after `import torch`, send the pipeline to a GPU (when one is available) with `pipeline.to(torch.device(\"cuda\"))`.\n","On CPU, processing might take a long while (up to 10x RT)."]},{"cell_type":"code","execution_count":9,"metadata":{"executionInfo":{"elapsed":82100,"status":"ok","timestamp":1704808982782,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"digxFLaueq4n"},"outputs":[],"source":["dia = pipeline(AUDIO_FILE)"]},{"cell_type":"markdown","metadata":{"id":"9WTsQVjjeq4o"},"source":["## Visualizing the output\n","\n","Most pipelines return a [`pyannote.core.Annotation`](http://pyannote.github.io/pyannote-core/structure.html#annotation) instance..."]},{"cell_type":"code","execution_count":10,"metadata":{"executionInfo":{"elapsed":1079,"status":"ok","timestamp":1704809008347,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Ch7SaA4-eq4p"},"outputs":[],"source":["from pyannote.core import Annotation\n","assert isinstance(dia, Annotation)"]},{"cell_type":"markdown","metadata":{"id":"T4BsOhmXeq4p"},"source":["... 
whose [API](https://pyannote.github.io/pyannote-core/structure.html#annotation) you can use to print the result:"]},{"cell_type":"code","execution_count":11,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":610,"status":"ok","timestamp":1704809011691,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"OzEPn1hqeq4p","outputId":"e61015ad-c0df-4890-acc6-f65c87d7c30d"},"outputs":[{"name":"stdout","output_type":"stream","text":[" 6.7 7.2 SPEAKER_01\n"," 7.2 7.2 SPEAKER_02\n"," 7.6 8.3 SPEAKER_01\n"," 8.3 9.9 SPEAKER_02\n"," 9.9 10.9 SPEAKER_01\n","10.5 14.7 SPEAKER_02\n","10.9 11.0 SPEAKER_00\n","14.3 17.9 SPEAKER_00\n","18.0 21.5 SPEAKER_02\n","18.2 18.4 SPEAKER_00\n","21.8 28.5 SPEAKER_00\n","27.9 30.0 SPEAKER_02\n"]}],"source":["for speech_turn, track, speaker in dia.itertracks(yield_label=True):\n"," print(f\"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}\")"]},{"cell_type":"markdown","metadata":{"id":"PqiV2D2geq4q"},"source":["If you happen to be running this example in a _Jupyter notebook_, `dia` can be [visualized directly](http://pyannote.github.io/pyannote-core/visualization.html):"]},{"cell_type":"code","execution_count":12,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":220},"executionInfo":{"elapsed":841,"status":"ok","timestamp":1704809016036,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"80D-4Yhreq4r","outputId":"ee02c234-ebc6-40f3-94b1-0838fac05f81"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":12,"metadata":{},"output_type":"execute_result"}],"source":["# we visualize [0, 30] time range\n","from pyannote.core import notebook, Segment\n","notebook.crop = Segment(0, 30)\n","dia"]},{"cell_type":"markdown","metadata":{"id":"TC_ZiDYoeq4s"},"source":["When available, the reference annotation can be visualized too, for 
comparison:"]},{"cell_type":"code","execution_count":14,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":220},"executionInfo":{"elapsed":1085,"status":"ok","timestamp":1704809060416,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"xlcEFiHUeq4s","outputId":"d30ad6ba-a0c6-416d-eb8c-5c8e93f309da"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":14,"metadata":{},"output_type":"execute_result"}],"source":["from pyannote.database.util import load_rttm\n","\n","reference = load_rttm(REFERENCE)[\"sample\"]\n","\n","# map hypothesized and reference speakers for visualization purposes\n","pipeline.optimal_mapping(dia, reference)"]},{"cell_type":"markdown","metadata":{"id":"rfTV3nSSeq4t"},"source":["## Processing a file from memory\n","\n","In case the audio file is not stored on disk, pipelines can also process audio provided as a `{\"waveform\": ..., \"sample_rate\": ...}` dictionary."]},{"cell_type":"code","execution_count":11,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":410,"status":"ok","timestamp":1704809066383,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"Nha4sg76eq4u","outputId":"69192d82-c240-47d1-fe01-d373c2b4e500"},"outputs":[{"name":"stdout","output_type":"stream","text":["type(waveform)=<class 'torch.Tensor'>\n","waveform.shape=torch.Size([1, 480000])\n","waveform.dtype=torch.float32\n"]}],"source":["import torchaudio\n","waveform, sample_rate = torchaudio.load(AUDIO_FILE)\n","\n","print(f\"{type(waveform)=}\")\n","print(f\"{waveform.shape=}\")\n","print(f\"{waveform.dtype=}\")\n","\n","audio_in_memory = {\"waveform\": waveform, \"sample_rate\": 
sample_rate}"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":388,"referenced_widgets":["514552137d7442368b39aaddce056bb8","f744ac29ccdd4d9d91050fabaed13e16","f18c80869ccc4ce1b2a215293d493e19","2c0b1ed212544e14b4934e036e8736d7","9988b4433d6e4c58a819a2e1b54f57b2","b8861f8cb72c4e65afac6519b7004170","668a49e70c2b4c8a9dcb534cb30714d6","9025d2b6c7b14162a692b98c3e86ec8a","6c361855da5e4eb78e311defb2ca2f2b","3be13f7f05da47b1ae3795ead9f2546a","07bb94f0dcda45a9a5e2d553fd239106","57ef34701b594a60a9fd269943b49888","1d051eb2d87e4671aa08039761c7899a","9537f195677745cba18618764b2290ab","4f0d63f098b0491097c6b72f3baee04f","f7be373f08c64c8bb5b6e070a7373cca","be8b64ce7f8247499c8bbcb62c5dd902","491c3dcba4474a52b860080fa7a627e7","4abbe0d8e0fa4e1b9d712743013b2d4e","661c7494ef2d47c794e6236e5f2c8978","6358920ecd46403a87cca46509465076","4f25b89365b24056b803119d415ac056","1d97d11a99454cc990cc5e374d0f8197","425146d4e0fb453486629d9d6af0a999","8fad50e0475341d9801e9f86e04e15c7","ea03f9526a7440a0a9a281ee877892fd","d1863b2c9ebc4efa9cac2dad7dfaf53a","8bda58a4c0c6430ca1470282ddbe8222","5d38b6ceec1844068f48da7bedf471fa","2a72c7660c2f498089c63c78eaa69fc8","05283cf0054d412c9a1ebfcc44bf41cb","b9d9acd287c345debdff1e72a85641af","91ca95844e084619b9ae01fe8176f875"]},"executionInfo":{"elapsed":12810,"status":"ok","timestamp":1704809082573,"user":{"displayName":"Clément PAGES","userId":"11757386314069785178"},"user_tz":-60},"id":"LvboMeeYeq4u","outputId":"225dd352-6367-4793-ae11-f8125d930c2b"},"outputs":[],"source":["vad = Pipeline.from_pretrained(\"pyannote/voice-activity-detection\", 
use_auth_token=True)"]},{"cell_type":"code","execution_count":12,"metadata":{},"outputs":[{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAABiIAAADyCAYAAADAzN2uAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAWJUlEQVR4nO3dfZCVdf3/8dfhRrzZG2SRXTZWF2/QTDDtV941Rt4Aak4o2r2JFY2KOqilaSplTRYTYk3ajU2pk/ptHNOyGW3KwMxRy8rIvt/IdpwRQ9FodhcxhOD8/nBc20BYZD+cZXk8ZnaGvc6153ofZs41n+XJua5KtVqtBgAAAAAAoIAhtR4AAAAAAAAYvIQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKGfQh4oUXXsg555yTPffcMyNGjEhLS0umTp2ahx56KEnS3t6eSqWSSqWS3XbbLYceemjuuOOOnp//3Oc+1/P4f34dcMABPftMnjx5o/ucffbZvWZZuHBhTjzxxDQ1NWXXXXfNgQcemIsvvjh///vfkySLFi1KpVJJZ2fnBq+jvb091113Xf//BQEAAAAAQEHDtvYJ1q1Y0R9z9MnQpqYt/pkZM2ZkzZo1ufnmm7P33ntn+fLluf/++7PiP+a++uqrM2vWrHR3d2f+/Pl5//vfnze96U058sgjkyRvectb8otf/KLX8w4b1vuvbtasWbn66qt7bdt11117/vztb3875557bs4888zceeedaW9vz9NPP51bbrkl8+fPz7XXXrvFrw0AAAAAAAa6rQ4Rz016az+M0Tdv+vvSLdq/s7MzDz74YBYtWpR3vetdSZK99tor73jHO3rtV19fn5aWlrS0tOT666/PD37wg9xzzz09IWLYsGFpaWnZ5LF23XXX193nmWeeyQUXXJALLrggCxYs6Nne3t6eo48+eqOfgAAAAAAAgMFgUF+aqa6uLnV1dbn77rvz8ssv9+lnhg0bluHDh2fNmjX9Nscdd9yRNWvW5JJLLtno4yNHjuy3YwEAAAAAwEAyqEPEsGHDctNNN+Xmm2/OyJEjc9RRR+Xyyy/P4sWLN7r/mjVrcs0116SrqyvHHHNMz/Y//elPPVHj1a//vv/DDTfcsME+t956a5LkySefTENDQ8aOHdunuceNG7fBcz399NNv8G8BAAAAAABqZ6svzTTQzZgxIyeddFIefPDBPPLII7n33nszb968fPe7383MmTOTJJdeemmuuOKKrF69OnV1dfnyl7+ck046qec59t9///zkJz/p9bwNDQ29vv/whz+cz372s722NTc3J0mq1WoqlUqfZ37wwQdTX1/fa9vkyZP7/PMAAAAAADBQbHWIaFn8eD+MUdbOO++c448/Pscff3yuvPLKfOITn8jcuXN7QsSnP/3pzJw5M3V1dWlubt4gGuy0007Zd999N3mMxsbG191nwoQJ6erqyrPPPtunT0WMHz9+g8s1/ffNsQEAAAAAYHuw1f+6PbSpqT/m2KYOPPDA3H333T3fjx49erOhYWucdtpp+cxnPpN58+b1uln1qzo7O90nAgAAAACAQWlQ/zf7FStW5PTTT8/HPvaxTJo0KfX19Xnssccyb968vPe97+3z8/z73//Oc88912tbpVLpufRSkrz00ksb7DNixIjsvvvuaWtry4IFC3Leeeelu7s7H/3oR9Pe3p5nnnkmt9xyS+rq6jJ//vyte7EAAAAAADAADeoQUVdXl8MOOywL
FixIR0dH1q5dm7a2tsyaNSuXX355n5/nz3/+8waXVBoxYkRWr17d8/2NN96YG2+8sdc+U6dOzX333ZckOffcczNhwoR89atfzSmnnJJ//etfaW9vz3ve855cdNFFW/EqAQAAAABg4KpUq9VqrYcAAAAAAAAGpyG1HgAAAAAAABi8hAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoJhhfdlp/fr1WbZsWerr61OpVErPBAAAAAAADGDVajUrV65Ma2trhgzZ9Gce+hQili1blra2tn4ZDgAAAAAAGByWLl2acePGbXKfPoWI+vr6nidsaGjY+skAAAAAAIDtVnd3d9ra2nr6wab0KUS8ejmmhoYGIQIAAAAAAEiSPt3Owc2qAQAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKCYLQoR655/vtQcrx1j+fJ0z78265YvL36sWhwPAAD6y+bWsta6MHBt7fvT+xsAqLUt6QVbFiJeeGGLh9lS655/PiuvXbBNokctjgcAAP1lc2tZa10YuLb2/en9DQDU2pb0ApdmAgAAAAAAihEiAAAAAACAYoQIAAAAAACgmGFbsvP6ru6sW7Gi1CyvHKOzq+jzb+q4pV8bAAD0p76una11YeDpr999vb8BgFpZ39Xd5323KET886yPZe2QwfkhihUf+GCtRwAAgCKsdWHw8v4GAGpl5fr1fd53cFYFAAAAAABgQBAiAAAAAACAYoQIAAAAAACgmC26R8So738vTW//f6VmSZKs/d//q8k1Lpv+5/YMP/DN2/y4AADwRvV17WytCwNPf/3u6/0NANTK8N8+lpwwrU/7blGIGNLYkKFNTW9oqL5aN7Kx6PO/niEjG4u/NgAA6E99XTtb68LA01+/+3p/AwC1MqSxoe/7FpwDAAAAAADYwQkRAAAAAABAMUIEAAAAAABQzBaFiKF77FFqjteOMWZM6i+6MEPHjCl+rFocDwAA+svm1rLWujBwbe370/sbAKi1LekFlWq1Wt3cTt3d3WlsbExXV1caGvp+AwoAAAAAAGDw2ZJu4NJMAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAAAAAAFCNEAAAAAAAAxQgRAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAA
AEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAAAAAAFCNEAAAAAAAAxQgRAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAAAAAAFCNEAAAAAAAAxQgRAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAAAAAAFCNEAAAAAAAAxQgRAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAAAAAAFCNEAAAAAAAAxQgRAAAAAABAMUIEAAAAAABQjBABAAAAAAAUI0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQzrC87VavVJEl3d3fRYQAAAAAAgIHv1V7waj/YlD6FiJUrVyZJ2tratmIsAAAAAABgMFm5cmUaGxs3uU+l2odcsX79+ixbtiz19fWpVCr9NiDwmu7u7rS1tWXp0qVpaGio9TgAA5pzJkDfOWcC9J1zJkDfVavVrFy5Mq2trRkyZNN3gejTJyKGDBmScePG9ctwwKY1NDRY7AD0kXMmQN85ZwL0nXMmQN9s7pMQr3KzagAAAAAAoBghAgAAAAAAKEaIgAFixIgRmTt3bkaMGFHrUQAGPOdMgL5zzgToO+dMgDL6dLNqAAAAAACAN8InIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAihEiYBv71a9+lZNPPjmtra2pVCq5++67ez1erVZz1VVXZezYsdlll11y3HHH5cknn6zNsAA1trlz5syZM1OpVHp9TZs2rTbDAtTQNddck7e//e2pr6/PmDFjMn369CxZsqTXPqtXr87s2bPT1NSUurq6zJgxI8uXL6/RxAC11Zfz5uTJkzdYa5599tk1mhhg+yZEwDa2atWqHHzwwbn++us3+vi8efPy9a9/Pd/61rfy6KOPZrfddsvUqVOzevXqbTwpQO1t7pyZJNOmTcuzzz7b83X77bdvwwkBBoYHHnggs2fPziOPPJKf//znWbt2baZMmZJVq1b17HPhhRfmnnvuyR133JEHHnggy5Yty6mnnlrDqQFqpy/nzSSZNWtWr7XmvHnzajQxwPatUq1Wq7UeAnZUlUold911V6ZPn57klU9DtLa25uKLL86nPvWpJElXV1eam5tz00035QMf+EANpwWorf8+ZyavfCKis7Nzg09KAOzoXnjhhYwZMyYPPPBAjj766HR1dWWPPfbIbbfdltNOOy1J8pe//CVvfvOb8/DDD+fwww+v8cQAtfXf583klU9EvPWtb811111X2+EABgGfiIAB5Kmnnspzzz2X4447rmdbY2NjDjvssDz88MM1nAxg4Fq0aFHGjBmT/fffP+ecc05WrFhR65EAaq6rqytJMmrUqCTJ7373u6xdu7bXOvOAAw7InnvuaZ0JkA3Pm6+69dZbM3r06Bx00EG57LLL8tJLL9ViPIDt3rBaDwC85rnnnkuSNDc399re3Nzc8xgAr5k2bVpOPfXUjB8/Ph0dHbn88stzwgkn5OGHH87QoUNrPR5ATaxfvz5z5szJUUcdlYMOOijJK+vMnXbaKSNHjuy1r3UmwMbPm0nyoQ99KHvttVdaW1uzePHiXHrppVmyZEl+9KMf1XBagO2TEAEAbLf+85J1EydOzKRJk7LPPvtk0aJFOfbYY2s4GUDtzJ49O0888UR+/etf13oUgO3C6503P/nJT/b8eeLEiRk7dmyOPfbYdHR0ZJ999tnWYwJs
11yaCQaQlpaWJMny5ct7bV++fHnPYwC8vr333jujR4/O3/72t1qPAlAT5513Xn76059m4cKFGTduXM/2lpaWrFmzJp2dnb32t84EdnSvd97cmMMOOyxJrDUB3gAhAgaQ8ePHp6WlJffff3/Ptu7u7jz66KM54ogjajgZwPbhmWeeyYoVKzJ27NhajwKwTVWr1Zx33nm566678stf/jLjx4/v9fjb3va2DB8+vNc6c8mSJXn66aetM4Ed0ubOmxvz+OOPJ4m1JsAb4NJMsI29+OKLvf73xFNPPZXHH388o0aNyp577pk5c+bki1/8Yvbbb7+MHz8+V155ZVpbWzN9+vTaDQ1QI5s6Z44aNSqf//znM2PGjLS0tKSjoyOXXHJJ9t1330ydOrWGUwNse7Nnz85tt92WH//4x6mvr++570NjY2N22WWXNDY25uMf/3guuuiijBo1Kg0NDTn//PNzxBFH5PDDD6/x9ADb3ubOmx0dHbntttty4oknpqmpKYsXL86FF16Yo48+OpMmTarx9ADbn0q1Wq3WegjYkSxatCjvfve7N9h+5pln5qabbkq1Ws3cuXPzne98J52dnXnnO9+ZG264IRMmTKjBtAC1talz5je/+c1Mnz49f/jDH9LZ2ZnW1tZMmTIlX/jCF9Lc3FyDaQFqp1KpbHT797///cycOTNJsnr16lx88cW5/fbb8/LLL2fq1Km54YYbXJoJ2CFt7ry5dOnSfOQjH8kTTzyRVatWpa2tLaecckquuOKKNDQ0bONpAbZ/QgQAAAAAAFCMe0QAAAAAAADFCBEAAAAAAEAxQgQAAAAAAFCMEAEAAAAAABQjRAAAAAAAAMUIEQAAAAAAQDFCBAAAAAAAUIwQAQAA9DJz5sxMnz691mMAAACDxLBaDwAAAGw7lUplk4/PnTs3X/va11KtVrfRRAAAwGAnRAAAwA7k2Wef7fnzD3/4w1x11VVZsmRJz7a6urrU1dXVYjQAAGCQcmkmAADYgbS0tPR8NTY2plKp9NpWV1e3waWZJk+enPPPPz9z5szJ7rvvnubm5tx4441ZtWpVzjrrrNTX12fffffNvffe2+tYTzzxRE444YTU1dWlubk5Z5xxRv7xj39s41cMAADUmhABAABs1s0335zRo0fnN7/5Tc4///ycc845Of3003PkkUfm97//faZMmZIzzjgjL730UpKks7MzxxxzTA455JA89thjue+++7J8+fK8733vq/ErAQAAtjUhAgAA2KyDDz44V1xxRfbbb79cdtll2XnnnTN69OjMmjUr++23X6666qqsWLEiixcvTpJ84xvfyCGHHJIvfelLOeCAA3LIIYfke9/7XhYuXJi//vWvNX41AADAtuQeEQAAwGZNmjSp589Dhw5NU1NTJk6c2LOtubk5SfL8888nSf74xz9m4cKFG73fREdHRyZMmFB4YgAAYKAQIgAAgM0aPnx4r+8rlUqvbZVKJUmyfv36JMmLL76Yk08+OV/5ylc2eK6xY8cWnBQAABhohAgAAKDfHXroobnzzjvT3t6eYcP82gEAADsy94gAAAD63ezZs/PPf/4zH/zgB/Pb3/42HR0d+dnPfpazzjor69atq/V4AADANiREAAAA/a61tTUPPfRQ1q1blylTpmTixImZM2dORo4cmSFD/BoCAAA7kkq1Wq3WeggAAAAAAGBw8l+RAAAAAACAYoQIAAAAAACgGCECAAAAAAAoRogAAAAAAACKESIAAAAAAIBihAgAAAAAAKAYIQIAAAAAAChGiAAAAAAAAIoRIgAAAAAAgGKECAAAAAAAoBghAgAAAAAAKEaIAAAAAAAAivn/OkPb8btR7hwAAAAASUVORK5CYII=","text/plain":[""]},"execution_count":12,"metadata":{},"output_type":"execute_result"}],"source":["vad(audio_in_memory)"]},{"cell_t
ype":"markdown","metadata":{"id":"LllNe0s3eq4u"},"source":["## Offline use\n","\n","Gating models and pipelines allows [me](https://herve.niderb.fr) to learn a bit more about the `pyannote.audio` user base and, eventually, helps me write grant proposals to make `pyannote.audio` even better. Please fill in the form as precisely as possible.\n","\n","For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open-source libraries is time-consuming.\n","\n","That being said, this whole authentication process does not prevent you from using official `pyannote.audio` models and pipelines offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production).\n","\n","* Step 1: download the `config.yaml` file of the [`pyannote/voice-activity-detection`](https://hf.co/pyannote/voice-activity-detection) pipeline\n","\n","![](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-pipeline.png?raw=1)\n","\n","* Step 2: download the `pytorch_model.bin` model\n","\n","![](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/assets/download-model.png?raw=1)\n","\n","* Step 3: edit `config.yaml` to point to the local model\n","\n","```diff\n","pipeline:\n","  name: pyannote.audio.pipelines.VoiceActivityDetection\n","  params:\n","-    segmentation: pyannote/segmentation@Interspeech2021\n","+    segmentation: pytorch_model.bin\n","\n","params:\n","  min_duration_off: 0.09791355693027545\n","  min_duration_on: 0.05537587440407595\n","  offset: 0.4806866463041527\n","  onset: 0.8104268538848918\n","```\n","\n","* Step 4: load the pipeline"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"a1bZGOoEeq4v","outputId":"a8851164-100c-4cc1-afd2-d387a5fc4fd5"},"outputs":[{"data":{"image/png":"","text/plain":[""]},"execution_count":21,"metadata":{},"output_type":"execute_result"}],"source":["# look 
ma: no hands!\n","offline_vad = Pipeline.from_pretrained(\"config.yaml\")\n","offline_vad(audio_in_memory)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"eepjEec8eq4w"},"outputs":[],"source":["# just checking output is the same\n","assert (vad(audio_in_memory) == offline_vad(audio_in_memory))"]}],"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[{"file_id":"https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb","timestamp":1704808199718}]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"},"vscode":{"interpreter":{"hash":"36a3a48a52702f18671693adf589423ec3f7db45d50f6ee539f1b0696bb58d43"}},"widgets":{"application/vnd.jupyter.widget-state+json":{"008a47e4d1174b188e75232fed325398":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"00ac431507a54c0d9a41244013b1c0f3":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":
null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"05283cf0054d412c9a1ebfcc44bf41cb":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"07bb94f0dcda45a9a5e2d553fd239106":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"0b127e22a6de4339aecc1e76a6719d7f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_ae378b480ed747438bcbe147a7e8d277","placeholder":"","style":"IPY_MODEL_b22a1d21940e4995b72642b003ae1301","value":"Your token has been saved in your configured git credential helpers 
(store)."}},"0b191fb0d6db4f1a98a641dc50445f1e":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_4a589ea883de4409b7a8275127747fa7","IPY_MODEL_b5f4829158cf4a5c8350966d8eacbf60","IPY_MODEL_7cab72ad0ace45d58d6954021082ace7"],"layout":"IPY_MODEL_5440ac4cffa0438c99c0daeca27a4087"}},"0d0dfc2576784812b6abf673752fa812":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"0fccc53081a34d58a46912d1ceae0533":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view
_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_00ac431507a54c0d9a41244013b1c0f3","max":221,"min":0,"orientation":"horizontal","style":"IPY_MODEL_6f318bc3e900434a9134a79e4057f666","value":221}},"16244ab1dc76493d9fafad80a9b65700":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"1749a0e7fc6847849ab40ac7f764b699":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"17e14569a3f74cbf991b582c27fdf280":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_e98b1f7ec5e04e748069340fa2fcec2d","max":469,"min":0,"orientation":"horizontal","style":"IPY_MODEL_616f35eccdd34bcdb5c9f700103ff538","value":469}},"185457147fe44751a1c6e7ae6ad23570":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"
_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"1b5f49211db54348924a05482ca7379d":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_dc9c065286e544df921e5bc4a51faf64","placeholder":"","style":"IPY_MODEL_16244ab1dc76493d9fafad80a9b65700","value":"Login successful"}},"1d051eb2d87e4671aa08039761c7899a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_be8b64ce7f8247499c8bbcb62c5dd902","placeholder":"","style":"IPY_MODEL_491c3dcba4474a52b860080fa7a627e7","value":"pytorch_model.bin: 
100%"}},"1d97d11a99454cc990cc5e374d0f8197":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_425146d4e0fb453486629d9d6af0a999","IPY_MODEL_8fad50e0475341d9801e9f86e04e15c7","IPY_MODEL_ea03f9526a7440a0a9a281ee877892fd"],"layout":"IPY_MODEL_d1863b2c9ebc4efa9cac2dad7dfaf53a"}},"1dd6bfb824ba4919ba392e0d4cd5ac82":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"1e53e26668aa4f8dabdd6973b79b5b9f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"1f3be36a57074214a5f92e08165e41be":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_r
ows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"253408a6cab947c7b9ae40a14e255328":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"275642719b694328ad3b0fb06b53e42a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"2a72c7660c2f498089c63c78eaa69fc8":{"model_module":"@jupyter-widgets/base","mo
del_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"2be7c1f5fe32462f8d1cd7ea57cfbf3a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"2
c0b1ed212544e14b4934e036e8736d7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_3be13f7f05da47b1ae3795ead9f2546a","placeholder":"","style":"IPY_MODEL_07bb94f0dcda45a9a5e2d553fd239106","value":" 277/277 [00:00<00:00, 14.5kB/s]"}},"30864d7b4d074994ba643c73875a67b8":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"36f60aa99ea84b739f066c6ee216f159":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name
":"StyleView","description_width":""}},"39cd6fe37b6f4476b6032c29f026d570":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_7580595b498a4c50b748af9df67132e2","placeholder":"","style":"IPY_MODEL_3f1d750804a54d78a6abff2623441360","value":" 221/221 [00:00<00:00, 14.0kB/s]"}},"3bb60a8db7c0406f8debdc8dd88bcdb4":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_bf644ae3e2894787bb416740749200cd","IPY_MODEL_0fccc53081a34d58a46912d1ceae0533","IPY_MODEL_39cd6fe37b6f4476b6032c29f026d570"],"layout":"IPY_MODEL_87a0e712c0b045449b98ee368c62fff2"}},"3be13f7f05da47b1ae3795ead9f2546a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"le
ft":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3f1d750804a54d78a6abff2623441360":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"3f500b1182a945c3b8b546b1bd04f98e":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"3f69734124f7449da0efffbbfd1153fe":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":nu
ll,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"425146d4e0fb453486629d9d6af0a999":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_8bda58a4c0c6430ca1470282ddbe8222","placeholder":"","style":"IPY_MODEL_5d38b6ceec1844068f48da7bedf471fa","value":"config.yaml: 100%"}},"45c2352ba41b46288c8ba74c661e1f85":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ButtonModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ButtonModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ButtonView","button_style":"","description":"Login","disabled":false,"icon":"","layout":"IPY_MODEL_c3b3069a93994e55b993cab04c066f17","style":"IPY_MODEL_f77186e2976d4a7daa798fd9f0f7b4cb","tooltip":""}},"46ac2d0bfedc47c8836e8dde0fc91662":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":nul
l,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"491c3dcba4474a52b860080fa7a627e7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"4a589ea883de4409b7a8275127747fa7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_0d0dfc2576784812b6abf673752fa812","placeholder":"","style":"IPY_MODEL_84e416e01cfa4a79b0294a44cfd77f32","value":"pytorch_model.bin: 
100%"}},"4abbe0d8e0fa4e1b9d712743013b2d4e":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4e0ca7f084804d30a89cb9dc439b6cdc":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_185457147fe44751a1c6e7ae6ad23570","placeholder":"","style":"IPY_MODEL_b02b253086e34249ae1d896a75a08960","value":" 469/469 [00:00<00:00, 
34.3kB/s]"}},"4f0d63f098b0491097c6b72f3baee04f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_6358920ecd46403a87cca46509465076","placeholder":"","style":"IPY_MODEL_4f25b89365b24056b803119d415ac056","value":" 17.7M/17.7M [00:00<00:00, 66.5MB/s]"}},"4f25b89365b24056b803119d415ac056":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"5033834cc66f49efaa16b240245ec1d3":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_1f3be36a57074214a5f92e08165e41be","placeholder":"","style":"IPY_MODEL_275642719b694328ad3b0fb06b53e42a","value":"Token is valid (permission: 
write)."}},"514552137d7442368b39aaddce056bb8":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_f744ac29ccdd4d9d91050fabaed13e16","IPY_MODEL_f18c80869ccc4ce1b2a215293d493e19","IPY_MODEL_2c0b1ed212544e14b4934e036e8736d7"],"layout":"IPY_MODEL_9988b4433d6e4c58a819a2e1b54f57b2"}},"517a1b1faec045f7847ca84f0538f4e1":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"5253489738b249f69b30f1c84884a95a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"Layout
View","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"5440ac4cffa0438c99c0daeca27a4087":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"54eacc5604f64f658617cde596a728dd":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module"
:"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"57ef34701b594a60a9fd269943b49888":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_1d051eb2d87e4671aa08039761c7899a","IPY_MODEL_9537f195677745cba18618764b2290ab","IPY_MODEL_4f0d63f098b0491097c6b72f3baee04f"],"layout":"IPY_MODEL_f7be373f08c64c8bb5b6e070a7373cca"}},"58b140d0453a4a219bc64e0152cc5c5e":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"58e423b86bf349aa89fc81a2b0d0f697":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":
"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"5a2a95f41f014663b5d131635844e3c8":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_f8df65e5ad6444648681230f415b8d08","IPY_MODEL_17e14569a3f74cbf991b582c27fdf280","IPY_MODEL_4e0ca7f084804d30a89cb9dc439b6cdc"],"layout":"IPY_MODEL_d304f698c6f34f3692548267ca0aba47"}},"5cc77ca8ba304ad88342ed6ced6c6a4a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"5d38b6ceec1844068f48da7bedf471fa":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter
-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"616f35eccdd34bcdb5c9f700103ff538":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"6358920ecd46403a87cca46509465076":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"6412ef268e11484f8bbcdb510f66ee21":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widget
s/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_a987408c2f994329a9a80208b22e3c52","IPY_MODEL_9446f61936c4421d9610464984890353","IPY_MODEL_a80c0fb6a1254bb9a00741c1346b9d9a"],"layout":"IPY_MODEL_b55e95f31216480fa6c81bd1b57ad729"}},"6556790d1997431a8476447297e3f595":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"661c7494ef2d47c794e6236e5f2c8978":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"668a49e70c2b4c8a9dcb534cb30714d6":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"686b58e0ccee468f938ea5659f3a9623":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"6c361855da5e4eb78e311defb2ca2f2b":{"model_module":"@jupyter-widget
s/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"6d5431955c09424a96e825aa9bba1b62":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"6f318bc3e900434a9134a79e4057f666":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"6fe26270a8794dea85b9ac8dedde6353":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"VBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"VBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"VBoxView","box_style":"","children":["IPY_MODEL_5033834cc66f49efaa16b240245ec1d3","IPY_MODEL_0b127e22a6de4339aecc1e76a6719d7f","IPY_MODEL_8a7afaf126ac49dcaf24d64c2d918086","IPY_MODEL_1b5f49211db54348924a05482ca7379d"],"layout":"IPY_MODEL_9d7b5e47af9448c98a2a434f02b54633"}},"70d672feec1c452aa49a40a3b8c1a4f0":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@ju
pyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"754a843fd634446b8d246998eaded1fa":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"7580595b498a4c50b748af9df67132e2":{"model_module":"@jupyter-widgets/base","model_modu
le_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"775611277f1948a5ab95641c4e92ff2e":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_70d672feec1c452aa49a40a3b8c1a4f0","placeholder":"","style":"IPY_MODEL_e8b3188ee4934ab4becbc431fc4ccb0b","value":"config.yaml: 
100%"}},"78f428e1d8ac4c78bd84d208a78b1fe5":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_7ffa4930e0a7421e9464abff87bbde7f","placeholder":"","style":"IPY_MODEL_1dd6bfb824ba4919ba392e0d4cd5ac82","value":"\nPro Tip: If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. "}},"7c23cd10c573438c840148ba1afaa5f3":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_bc4d3f45eaf04cccb7a655ef87d7a927","placeholder":"","style":"IPY_MODEL_36f60aa99ea84b739f066c6ee216f159","value":"Connecting..."}},"7cab72ad0ace45d58d6954021082ace7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_54eacc5604f64f658617cde596a728dd","placeholder":"","style":"IPY_MODEL_1749a0e7fc6847849ab40ac7f764b699","value":" 5.91M/5.91M [00:00<00:00, 
88.7MB/s]"}},"7ffa4930e0a7421e9464abff87bbde7f":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"84e416e01cfa4a79b0294a44cfd77f32":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"87a0e712c0b045449b98ee368c62fff2":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_
flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"8a7afaf126ac49dcaf24d64c2d918086":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"LabelModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"LabelModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"LabelView","description":"","description_tooltip":null,"layout":"IPY_MODEL_58e423b86bf349aa89fc81a2b0d0f697","placeholder":"","style":"IPY_MODEL_a3ebe0f0e6e44522a2545db0ed47af62","value":"Your token has been saved to 
/root/.cache/huggingface/token"}},"8bda58a4c0c6430ca1470282ddbe8222":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"8fad50e0475341d9801e9f86e04e15c7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_2a72c7660c2f498089c63c78eaa69fc8","max":1980,"min":0,"orientation":"horizontal","style":"IPY_MODEL_05283cf0054d412c9a1ebfcc44bf41cb","value":1980}},"9025d2b6c7b14162a692b98c3e86ec8a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widge
ts/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"90533da4fc1341a28bc247bc9fbbbe9e":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"91ca95844e084619b9ae01fe8176f875":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_
version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"9446f61936c4421d9610464984890353":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_30864d7b4d074994ba643c73875a67b8","max":26645418,"min":0,"orientation":"horizontal","style":"IPY_MODEL_b852998433e24f7fad1fe27e249085f9","value":26645418}},"9537f195677745cba18618764b2290ab":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_4abbe0d8e0fa4e1b9d712743013b2d4e","max":17739960,"min":0,"orientation":"horizontal","style":"IPY_MODEL_661c7494ef2d47c794e6236e5f2c8978","value":17739960}},"9988b4433d6e4c58a819a2e1b54f57b2":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_c
olumns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"9d7b5e47af9448c98a2a434f02b54633":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":"center","align_self":null,"border":null,"bottom":null,"display":"flex","flex":null,"flex_flow":"column","grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":"50%"}},"a3ebe0f0e6e44522a2545db0ed47af62":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"a435c85128f148cb801
52d03825710e2":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"a80c0fb6a1254bb9a00741c1346b9d9a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_46ac2d0bfedc47c8836e8dde0fc91662","placeholder":"","style":"IPY_MODEL_58b140d0453a4a219bc64e0152cc5c5e","value":" 26.6M/26.6M [00:00<00:00, 104MB/s]"}},"a8e61b300a324911b1d3babf6b1ebac1":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"CheckboxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"CheckboxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"CheckboxView","description":"Add token as git 
credential?","description_tooltip":null,"disabled":false,"indent":true,"layout":"IPY_MODEL_5253489738b249f69b30f1c84884a95a","style":"IPY_MODEL_3f500b1182a945c3b8b546b1bd04f98e","value":true}},"a987408c2f994329a9a80208b22e3c52":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_517a1b1faec045f7847ca84f0538f4e1","placeholder":"","style":"IPY_MODEL_008a47e4d1174b188e75232fed325398","value":"pytorch_model.bin: 100%"}},"ae378b480ed747438bcbe147a7e8d277":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b02b253086e34249ae1d896a75a08960":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-wi
dgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b22a1d21940e4995b72642b003ae1301":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b33021db4313476ab8bcb85bb1b190d7":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b55e95f31216480fa6c81bd1b57ad729":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.
0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b5f4829158cf4a5c8350966d8eacbf60":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_b33021db4313476ab8bcb85bb1b190d7","max":5905440,"min":0,"orientation":"horizontal","style":"IPY_MODEL_6d5431955c09424a96e825aa9bba1b62","value":5905440}},"b852998433e24f7fad1fe27e249085f9":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"b8861f8cb72c4e65afac6519b7004170":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name"
:"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b9d9acd287c345debdff1e72a85641af":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"bc4d3f45eaf04cccb7a655ef87d7a927":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_m
odule":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"be8b64ce7f8247499c8bbcb62c5dd902":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"bf644ae3e2894787bb416740749200cd":{"model_module":"@jupyter-widgets/contro
ls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_2be7c1f5fe32462f8d1cd7ea57cfbf3a","placeholder":"","style":"IPY_MODEL_d6135ce9cfaa4f8496a70cee4f1e89a7","value":"config.yaml: 100%"}},"c3b3069a93994e55b993cab04c066f17":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c955f8144b46451493474e02726b5deb":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_too
ltip":null,"layout":"IPY_MODEL_754a843fd634446b8d246998eaded1fa","max":399,"min":0,"orientation":"horizontal","style":"IPY_MODEL_1e53e26668aa4f8dabdd6973b79b5b9f","value":399}},"c9eeb26bd5e049f89205e494d54e3818":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_cf5ae82d36384a728d8ca893cc27392b","placeholder":"","style":"IPY_MODEL_5cc77ca8ba304ad88342ed6ced6c6a4a","value":" 399/399 [00:00<00:00, 24.8kB/s]"}},"cf5ae82d36384a728d8ca893cc27392b":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d1863b2c9ebc4efa9cac2dad7dfaf53a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_m
odule_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d304f698c6f34f3692548267ca0aba47":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d4d428f0b0c44756a128adae747716d2":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_na
me":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d6135ce9cfaa4f8496a70cee4f1e89a7":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"dc9c065286e544df921e5bc4a51faf64":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_co
lumns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"e8b3188ee4934ab4becbc431fc4ccb0b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"e98b1f7ec5e04e748069340fa2fcec2d":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"ea03f9526a7440a0a9a281ee877892fd":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jup
yter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_b9d9acd287c345debdff1e72a85641af","placeholder":"","style":"IPY_MODEL_91ca95844e084619b9ae01fe8176f875","value":" 1.98k/1.98k [00:00<00:00, 122kB/s]"}},"ef6be001daaa465598bdf20c304bae0b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"PasswordModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"PasswordModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"PasswordView","continuous_update":true,"description":"Token:","description_tooltip":null,"disabled":false,"layout":"IPY_MODEL_90533da4fc1341a28bc247bc9fbbbe9e","placeholder":"","style":"IPY_MODEL_6556790d1997431a8476447297e3f595","value":""}},"f18c80869ccc4ce1b2a215293d493e19":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_9025d2b6c7b14162a692b98c3e86ec8a","max":277,"min":0,"orientation":"horizontal","style":"IPY_MODEL_6c361855da5e4eb78e311defb2ca2f2b","value":277}},"f744ac29ccdd4d9d91050fabaed13e16":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_v
iew_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_b8861f8cb72c4e65afac6519b7004170","placeholder":"","style":"IPY_MODEL_668a49e70c2b4c8a9dcb534cb30714d6","value":"config.yaml: 100%"}},"f77186e2976d4a7daa798fd9f0f7b4cb":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ButtonStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ButtonStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","button_color":null,"font_weight":""}},"f7be373f08c64c8bb5b6e070a7373cca":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"f8df65e5ad6444648681230f415b8d08":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"
@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_3f69734124f7449da0efffbbfd1153fe","placeholder":"","style":"IPY_MODEL_686b58e0ccee468f938ea5659f3a9623","value":"config.yaml: 100%"}},"f9969c7acd834359bd67e83c31b82c0b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_253408a6cab947c7b9ae40a14e255328","placeholder":"","style":"IPY_MODEL_a435c85128f148cb80152d03825710e2","value":" Copy a token from your Hugging Face\ntokens page and paste it below. Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. "}},"ff048516fdba44a48a69c51a82aa7aaa":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_775611277f1948a5ab95641c4e92ff2e","IPY_MODEL_c955f8144b46451493474e02726b5deb","IPY_MODEL_c9eeb26bd5e049f89205e494d54e3818"],"layout":"IPY_MODEL_d4d428f0b0c44756a128adae747716d2"}}}}},"nbformat":4,"nbformat_minor":0}
diff --git a/tutorials/assets/download-model.png b/tutorials/assets/download-model.png
index 4ca5350f5..810500178 100644
Binary files a/tutorials/assets/download-model.png and b/tutorials/assets/download-model.png differ
diff --git a/tutorials/community/offline_usage_speaker_diarization.ipynb b/tutorials/community/offline_usage_speaker_diarization.ipynb
new file mode 100644
index 000000000..932742628
--- /dev/null
+++ b/tutorials/community/offline_usage_speaker_diarization.ipynb
@@ -0,0 +1,172 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Offline Speaker Diarization (speaker-diarization-3.1)\n",
+ "\n",
+ "This notebook gives a short introduction to using the [speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline with local models.\n",
+ "\n",
+ "To use local models, you first need to download them from Hugging Face and place them in a local folder.\n",
+ "Then create a local config file, similar to the one on Hugging Face, but with local model paths.\n",
+ "\n",
+ "❗ **Naming of the model files is REALLY important! See end of notebook for details.** ❗\n",
+ "\n",
+ "## Get the models\n",
+ "\n",
+ "1. Install the `pyannote.audio` package: `!pip install pyannote.audio`\n",
+ "2. Create a Hugging Face account: https://huggingface.co/join\n",
+ "3. Accept the [pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0) user conditions\n",
+ "4. Create a local folder `models` and save the downloaded files there under the following names:\n",
+ "    1. [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin), saved as `models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin`\n",
+ "    2. [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0/blob/main/pytorch_model.bin), saved as `models/pyannote_model_segmentation-3.0.bin`\n",
+ "\n",
+ "Running `ls models` should list the following files (approximate sizes in parentheses):\n",
+ "```\n",
+ "pyannote_model_segmentation-3.0.bin (5.7 MB)\n",
+ "pyannote_model_wespeaker-voxceleb-resnet34-LM.bin (26 MB)\n",
+ "```\n",
+ "\n",
+ "❗ **make sure the 'wespeaker-voxceleb-resnet34-LM' model is named 'pyannote_model_wespeaker-voxceleb-resnet34-LM.bin'** ❗"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Config for local models\n",
+ "\n",
+ "Create a local config, similar to the one on Hugging Face ([speaker-diarization-3.1/blob/main/config.yaml](https://huggingface.co/pyannote/speaker-diarization-3.1/blob/main/config.yaml)), but with local model paths.\n",
+ "\n",
+ "Contents of `models/pyannote_diarization_config.yaml`:\n",
+ "\n",
+ "```yaml\n",
+ "version: 3.1.0\n",
+ "\n",
+ "pipeline:\n",
+ "  name: pyannote.audio.pipelines.SpeakerDiarization\n",
+ "  params:\n",
+ "    clustering: AgglomerativeClustering\n",
+ "    # embedding: pyannote/wespeaker-voxceleb-resnet34-LM # if you want to use the HF model\n",
+ "    embedding: models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin # if you want to use the local model\n",
+ "    embedding_batch_size: 32\n",
+ "    embedding_exclude_overlap: true\n",
+ "    # segmentation: pyannote/segmentation-3.0 # if you want to use the HF model\n",
+ "    segmentation: models/pyannote_model_segmentation-3.0.bin # if you want to use the local model\n",
+ "    segmentation_batch_size: 32\n",
+ "\n",
+ "params:\n",
+ "  clustering:\n",
+ "    method: centroid\n",
+ "    min_cluster_size: 12\n",
+ "    threshold: 0.7045654963945799\n",
+ "  segmentation:\n",
+ "    min_duration_off: 0.0\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Loading the local pipeline\n",
+ "\n",
+ "**Hint**: The model paths in the config are relative to the current working directory, not to the config file.\n",
+ "If you run your notebook or script from a different directory, you can temporarily switch directories with `os.chdir` to emulate config-relative paths, as the function below does.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from pathlib import Path\n",
+ "from pyannote.audio import Pipeline\n",
+ "\n",
+ "def load_pipeline_from_pretrained(path_to_config: str | Path) -> Pipeline:\n",
+ "    # resolve first: a relative path would break once the working directory changes\n",
+ "    path_to_config = Path(path_to_config).resolve()\n",
+ "    print(f\"Loading pyannote pipeline from {path_to_config}...\")\n",
+ "\n",
+ "    # the model paths in the config are relative to the current working directory,\n",
+ "    # so temporarily switch to the folder that contains the 'models' folder:\n",
+ "    # first .parent is the folder of the config ('models'), second .parent is its parent\n",
+ "    cwd = Path.cwd().resolve()  # store current working directory\n",
+ "    cd_to = path_to_config.parent.parent\n",
+ "\n",
+ "    print(f\"Changing working directory to {cd_to}\")\n",
+ "    os.chdir(cd_to)\n",
+ "    try:\n",
+ "        pipeline = Pipeline.from_pretrained(path_to_config)\n",
+ "    finally:\n",
+ "        # restore the original working directory, even if loading fails\n",
+ "        print(f\"Changing working directory back to {cwd}\")\n",
+ "        os.chdir(cwd)\n",
+ "\n",
+ "    return pipeline\n",
+ "\n",
+ "PATH_TO_CONFIG = \"path/to/your/models/pyannote_diarization_config.yaml\"\n",
+ "pipeline = load_pipeline_from_pretrained(PATH_TO_CONFIG)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Notes on file naming (pyannote.audio 3.1.1)\n",
+ "\n",
+ "pyannote.audio infers the type of an embedding model from its name.\n",
+ "\n",
+ "The function `PretrainedSpeakerEmbedding(...)` in [speaker_verification.py](https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_verification.py#L712) looks for substrings in the model's file path, in a fixed order, to decide which class to load:\n",
+ "\n",
+ "```python\n",
+ "def PretrainedSpeakerEmbedding(\n",
+ "    embedding: PipelineModel,\n",
+ "    device: torch.device = None,\n",
+ "    use_auth_token: Union[Text, None] = None,\n",
+ "):\n",
+ "    # ...\n",
+ "    if isinstance(embedding, str) and \"pyannote\" in embedding:\n",
+ "        return PyannoteAudioPretrainedSpeakerEmbedding(\n",
+ "            embedding, device=device, use_auth_token=use_auth_token\n",
+ "        )\n",
+ "\n",
+ "    elif isinstance(embedding, str) and \"speechbrain\" in embedding:\n",
+ "        return SpeechBrainPretrainedSpeakerEmbedding(\n",
+ "            embedding, device=device, use_auth_token=use_auth_token\n",
+ "        )\n",
+ "\n",
+ "    elif isinstance(embedding, str) and \"nvidia\" in embedding:\n",
+ "        return NeMoPretrainedSpeakerEmbedding(embedding, device=device)\n",
+ "\n",
+ "    elif isinstance(embedding, str) and \"wespeaker\" in embedding:\n",
+ "        return ONNXWeSpeakerPretrainedSpeakerEmbedding(embedding, device=device)  # <-- this is called, but the wespeaker-voxceleb-resnet34-LM model is not an ONNX model\n",
+ "\n",
+ "    else:\n",
+ "        # fallback to pyannote in case we are loading a local model\n",
+ "        return PyannoteAudioPretrainedSpeakerEmbedding(\n",
+ "            embedding, device=device, use_auth_token=use_auth_token\n",
+ "        )\n",
+ "```\n",
+ "\n",
+ "The [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin) model is not an ONNX model; it must be loaded with `PyannoteAudioPretrainedSpeakerEmbedding`. The checks above are plain substring tests on the whole path, so a file name that contains `wespeaker` but not `pyannote` is wrongly dispatched to `ONNXWeSpeakerPretrainedSpeakerEmbedding`. Because the `pyannote` check comes first, a path that contains `pyannote` anywhere, such as `models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin`, is inferred correctly."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
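The file-name dispatch described in the offline-usage notebook above can be sketched in plain Python. This is a hypothetical stand-in (`infer_embedding_backend` is not part of pyannote.audio): it reproduces only the substring checks quoted from `PretrainedSpeakerEmbedding` in pyannote.audio 3.1.1 and returns class names as strings instead of loading models, so it runs without pyannote.audio installed.

```python
# Hypothetical sketch of the dispatch order in pyannote.audio 3.1.1's
# PretrainedSpeakerEmbedding. Only the substring checks are reproduced;
# the returned strings stand in for the real embedding classes.
def infer_embedding_backend(embedding: str) -> str:
    """Return the name of the class pyannote.audio 3.1.1 would pick for `embedding`."""
    # the checks run on the whole string (directories included), in this order
    if "pyannote" in embedding:
        return "PyannoteAudioPretrainedSpeakerEmbedding"
    elif "speechbrain" in embedding:
        return "SpeechBrainPretrainedSpeakerEmbedding"
    elif "nvidia" in embedding:
        return "NeMoPretrainedSpeakerEmbedding"
    elif "wespeaker" in embedding:
        return "ONNXWeSpeakerPretrainedSpeakerEmbedding"
    # fallback used for local models whose path matches none of the above
    return "PyannoteAudioPretrainedSpeakerEmbedding"


if __name__ == "__main__":
    for path in [
        "models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin",  # recommended name
        "models/wespeaker-voxceleb-resnet34-LM.bin",  # misdetected as ONNX
        "models/pytorch_model.bin",  # falls back to pyannote
    ]:
        print(f"{path} -> {infer_embedding_backend(path)}")
```

Because the checks look at the whole string, a directory name counts too: under this logic, a path such as `pyannote-models/wespeaker.bin` would also be routed to `PyannoteAudioPretrainedSpeakerEmbedding`.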
diff --git a/tutorials/intro.ipynb b/tutorials/intro.ipynb
index 572ea2f6d..328ddceaf 100644
--- a/tutorials/intro.ipynb
+++ b/tutorials/intro.ipynb
@@ -10,15 +10,6 @@
""
]
},
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "9-KmdPlBYnp6"
- },
- "source": [
- ""
- ]
- },
{
"cell_type": "markdown",
"metadata": {
@@ -622,7 +613,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.5"
+ "version": "3.10.13"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
diff --git a/tutorials/overlapped_speech_detection.ipynb b/tutorials/overlapped_speech_detection.ipynb
index 1ad5d4090..9211f0626 100644
--- a/tutorials/overlapped_speech_detection.ipynb
+++ b/tutorials/overlapped_speech_detection.ipynb
@@ -1,13 +1,41 @@
{
"cells": [
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [],
"source": [
- "# TODO: switch to AMI\n",
- "PROTOCOL = 'Debug.SpeakerDiarization.Debug'"
+ ""
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Fine-tuning a segmentation model on an overlapped speech detection task with `pyannote.audio`\n",
+ "\n",
+ "Overlapped speech detection (OSD) is the task of detecting regions where at least two speakers are speaking at the same time. In this notebook, we will fine-tune a segmentation model on the OSD task, then evaluate an OSD pipeline on the AMI database."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Tutorial setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### `Google Colab` setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are running this tutorial on `Colab`, execute the following commands to set up the `Colab` environment. These commands will install `pyannote.audio` and download a mini version of the `AMI` corpus."
]
},
{
@@ -16,17 +44,49 @@
"metadata": {},
"outputs": [],
"source": [
- "# TODO: update this tutorial to do fine tuning of a model pretrained on DIHARD"
+ "!pip install -qq pyannote.audio==3.1.1\n",
+ "!pip install -qq ipython==7.34.0\n",
+ "!git clone https://github.com/pyannote/AMI-diarization-setup.git\n",
+ "%cd ./AMI-diarization-setup/pyannote/\n",
+ "!bash ./download_ami_mini.sh\n",
+ "%cd /content"
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Overlapped speech detection with `pyannote.audio`\n",
- "\n",
- "Overlapped speech detection (OSD) is the task of detecting regions where at least two speakers are speaking at the same time. In this notebook, we will train and evaluate an OSD pipeline on Debug database."
+ "⚠ Restart the runtime (Runtime > Restart session)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Non `Google Colab` setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are not using `Colab`, this tutorial assumes that\n",
+ "* `pyannote.audio` has been installed\n",
+ "* the [AMI corpus](https://groups.inf.ed.ac.uk/ami/corpus/) has already been [set up for use with `pyannote`](https://github.com/pyannote/AMI-diarization-setup/tree/main/pyannote)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Protocol"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "First, we define the protocol, here `AMI.SpeakerDiarization.mini`:"
]
},
{
@@ -35,8 +95,10 @@
"metadata": {},
"outputs": [],
"source": [
- "from pyannote.database import get_protocol, FileFinder\n",
- "protocol = get_protocol(PROTOCOL, preprocessors={\"audio\": FileFinder()})"
+ "from pyannote.database import registry, FileFinder\n",
+ "\n",
+ "registry.load_database(\"AMI-diarization-setup/pyannote/database.yml\")\n",
+ "protocol = registry.get_protocol(\"AMI.SpeakerDiarization.mini\", preprocessors={\"audio\": FileFinder()})"
]
},
{
@@ -60,7 +122,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
@@ -79,9 +141,21 @@
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"first_training_file['annotation']"
]
@@ -96,9 +170,21 @@
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ", , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , |