Commit 145d32d

Move things around. Simplify code and wording

juanmc2005 authored Nov 11, 2023
1 parent ab8f516 commit 145d32d
Showing 1 changed file with 42 additions and 44 deletions: README.md

<div align="center">
  <h4>
    <a href="#%EF%B8%8F-stream-audio">
      🎙️ Stream audio
    </a>
    <span> | </span>
    <a href="#-installation">
      💾 Installation
    </a>
    <span> | </span>
    <a href="#-models">
      🧠 Available models
    </a>
    <br />
    <a href="#-tune-hyper-parameters">
      📈 Tuning
    </a>
    <span> | </span>
    <a href="#-build-pipelines">
      🧠🔗 Pipelines
    </a>
    <span> | </span>
    <a href="#-websockets">
<img width="100%" src="/demo.gif" title="Real-time diarization example" />
</p>

## 🎙️ Stream audio

### From the command line
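
For example, to diarize speech from the default microphone (`diart.stream` is installed with the package and also accepts a path to an audio file):

```shell
diart.stream microphone
```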
### From python
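
A minimal sketch built from the classes used later in this README (`SpeakerDiarization`, `MicrophoneAudioSource` and `StreamingInference`), assuming the default pipeline configuration:

```python
from diart import SpeakerDiarization
from diart.inference import StreamingInference
from diart.sources import MicrophoneAudioSource

# Default diarization pipeline reading from the microphone
pipeline = SpeakerDiarization()
mic = MicrophoneAudioSource()
inference = StreamingInference(pipeline, mic)

# Blocks until the audio source is closed; returns the accumulated prediction
prediction = inference()
```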

For inference and evaluation on a dataset, we recommend using `Benchmark` (see notes on [reproducibility](#-reproducibility)).
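
As a sketch, assuming `Benchmark` takes the directory of audio files and the directory of reference RTTM annotations, and is called with a pipeline class and its configuration:

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import Benchmark

# Hypothetical dataset layout: a directory of audio files and a directory
# with the matching reference RTTM annotations
benchmark = Benchmark("/wav/dir", "/rttm/dir")

# Run the pipeline on every file and compute evaluation metrics
benchmark(SpeakerDiarization, SpeakerDiarizationConfig())
```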

## 💾 Installation

**1) Make sure your system has the following dependencies:**

```
ffmpeg < 4.4
portaudio == 19.6.X
libsndfile >= 1.2.2
```
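
On Debian/Ubuntu, for instance, these can usually be installed with apt; the package names below are our assumption, and you should check that the installed versions satisfy the constraints above:

```shell
sudo apt install ffmpeg portaudio19-dev libsndfile1
```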

Alternatively, we provide an `environment.yml` file for a pre-configured conda environment:

```shell
conda env create -f diart/environment.yml
conda activate diart
```

**2) Install the package:**
```shell
pip install diart
```

### Get access to 🎹 pyannote models

By default, diart is based on [pyannote.audio](https://github.com/pyannote/pyannote-audio) models stored in the [huggingface](https://huggingface.co/) hub.
To allow diart to use them, follow these steps:

1) [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
2) [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the newest `pyannote/segmentation-3.0` model
3) [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
4) Install [huggingface-cli](https://huggingface.co/docs/huggingface_hub/quick-start#install-the-hub-library) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with your user access token (or provide it manually in diart CLI or API), as shown below.
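
For example:

```shell
pip install huggingface_hub
huggingface-cli login  # paste your user access token when prompted
```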

## 🧠 Models

You can use other models with the `--segmentation` and `--embedding` arguments, for example:
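
```shell
# Model names here are the pyannote checkpoints mentioned in the
# installation steps above, used purely as an illustration
diart.stream microphone \
  --segmentation pyannote/segmentation-3.0 \
  --embedding pyannote/embedding
```
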
Or in python:

```python
import diart.models as m

segmentation = m.SegmentationModel.from_pretrained("model_name")
embedding = m.EmbeddingModel.from_pretrained("model_name")
```

### Available pre-trained models

Below is a list of all the models currently supported by diart:

| Model Name | Model Type | CPU Time* | GPU Time* |
| ---------- | ---------- | --------- | --------- |

The latency of embedding models is measured in a diarization pipeline using `pya

\* CPU: AMD Ryzen 9 - GPU: RTX 4060 Max-Q

### Custom models

Third-party models can be integrated by providing a loader function:

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.models import EmbeddingModel, SegmentationModel
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference


def segmentation_loader():
    # It should take a waveform and return a segmentation tensor
    return load_pretrained_model("my_model.ckpt")


def embedding_loader():
    # It should take (waveform, weights) and return per-speaker embeddings
    return load_pretrained_model("my_other_model.ckpt")


segmentation = SegmentationModel(segmentation_loader)
embedding = EmbeddingModel(embedding_loader)
config = SpeakerDiarizationConfig(
    segmentation=segmentation,
    embedding=embedding,
)
pipeline = SpeakerDiarization(config)
mic = MicrophoneAudioSource()
inference = StreamingInference(pipeline, mic)
prediction = inference()
```

If you have an ONNX model, you can use `from_onnx()`:
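
For example (a sketch only: the argument names and the ONNX input/output names below are assumptions, so check the `from_onnx()` signature in the API reference):

```python
from diart.models import EmbeddingModel

# Hypothetical ONNX checkpoint and tensor names
embedding = EmbeddingModel.from_onnx(
    "my_embedding_model.onnx",
    input_names=["waveform", "weights"],
    output_name="embedding",
)
```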
