diff --git a/README.md b/README.md
index 78082aea..6fe11b37 100644
--- a/README.md
+++ b/README.md
@@ -15,28 +15,24 @@

-
- 💾 Installation
-
- |
 🎙️ Stream audio
 |
-
- 🧠 Available models
+
+ 💾 Installation
 |
-
- 🤖 Add your model
+
+ 🧠 Available models
- 📈 Tune hyper-parameters
+ 📈 Tuning
 |
- 🧠🔗 Build pipelines
+ 🧠🔗 Pipelines
 |
@@ -59,30 +55,6 @@

-## 💾 Installation
-
-1) Create environment:
-
-```shell
-conda env create -f diart/environment.yml
-conda activate diart
-```
-
-2) Install the package:
-```shell
-pip install diart
-```
-
-### Get access to 🎹 pyannote models
-
-By default, diart is based on [pyannote.audio](https://github.com/pyannote/pyannote-audio) models stored in the [huggingface](https://huggingface.co/) hub.
-To allow diart to use them, you need to follow these steps:
-
-1) [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
-2) [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the newest `pyannote/segmentation-3.0` model
-3) [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
-4) Install [huggingface-cli](https://huggingface.co/docs/huggingface_hub/quick-start#install-the-hub-library) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with your user access token (or provide it manually in diart CLI or API).
-
 ## 🎙️ Stream audio
 
 ### From the command line
@@ -122,10 +94,41 @@
 prediction = inference()
 
 For inference and evaluation on a dataset we recommend to use `Benchmark` (see notes on [reproducibility](#-reproducibility)).
 
-## 🧠 Available models
+## 💾 Installation
+
+**1) Make sure your system has the following dependencies:**
+
+```
+ffmpeg < 4.4
+portaudio == 19.6.X
+libsndfile >= 1.2.2
+```
+
+Alternatively, we provide an `environment.yml` file for a pre-configured conda environment:
+
+```shell
+conda env create -f diart/environment.yml
+conda activate diart
+```
+
+**2) Install the package:**
+```shell
+pip install diart
+```
 
-You can use a different segmentation or embedding model with `--segmentation` and `--embedding`.
+### Get access to 🎹 pyannote models
+By default, diart is based on [pyannote.audio](https://github.com/pyannote/pyannote-audio) models stored in the [huggingface](https://huggingface.co/) hub.
+To allow diart to use them, you need to follow these steps:
+
+1) [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
+2) [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the newest `pyannote/segmentation-3.0` model
+3) [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
+4) Install [huggingface-cli](https://huggingface.co/docs/huggingface_hub/quick-start#install-the-hub-library) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with your user access token (or provide it manually in diart CLI or API).
+
+## 🧠 Models
+
+You can use other models with the `--segmentation` and `--embedding` arguments.
 
 Or in python:
 ```python
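As a concrete illustration of step 4 above, here is a minimal sketch of logging in to the Hugging Face hub from Python and then loading the default pyannote models through diart. The `login()` call comes from `huggingface_hub`; the `import diart.models as m` alias and `from_pretrained()` mirror the snippet later in this diff, and the specific model names are examples only:

```python
# Minimal sketch: authenticate with the Hugging Face hub, then load the
# pyannote models that diart uses by default. Assumes the user conditions
# for each model (see the steps above) have already been accepted.
from huggingface_hub import login

import diart.models as m

# Equivalent to `huggingface-cli login`; stores the access token locally.
# A token can also be passed directly: login(token="hf_...").
login()

# Model names are examples taken from the steps above.
segmentation = m.SegmentationModel.from_pretrained("pyannote/segmentation-3.0")
embedding = m.EmbeddingModel.from_pretrained("pyannote/embedding")
```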
@@ -135,6 +138,8 @@ segmentation = m.SegmentationModel.from_pretrained("model_name")
 embedding = m.EmbeddingModel.from_pretrained("model_name")
 ```
 
+### Available pre-trained models
+
 Below is a list of all the models currently supported by diart:
 
 | Model Name | Model Type | CPU Time* | GPU Time* |
@@ -155,16 +160,13 @@ The latency of embedding models is measured in a diarization pipeline using `pya
 
 \* CPU: AMD Ryzen 9 - GPU: RTX 4060 Max-Q
 
-## 🤖 Add your model
+### Custom models
 
 Third-party models can be integrated by providing a loader function:
 
 ```python
 from diart import SpeakerDiarization, SpeakerDiarizationConfig
 from diart.models import EmbeddingModel, SegmentationModel
-from diart.sources import MicrophoneAudioSource
-from diart.inference import StreamingInference
-
 
 def segmentation_loader():
     # It should take a waveform and return a segmentation tensor
@@ -174,7 +176,6 @@ def embedding_loader():
     # It should take (waveform, weights) and return per-speaker embeddings
     return load_pretrained_model("my_other_model.ckpt")
-
 
 segmentation = SegmentationModel(segmentation_loader)
 embedding = EmbeddingModel(embedding_loader)
 config = SpeakerDiarizationConfig(
@@ -182,9 +183,6 @@ config = SpeakerDiarizationConfig(
     embedding=embedding,
 )
 pipeline = SpeakerDiarization(config)
-mic = MicrophoneAudioSource()
-inference = StreamingInference(pipeline, mic)
-prediction = inference()
 ```
 
 If you have an ONNX model, you can use `from_onnx()`:
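Since the diff ends by pointing at `from_onnx()` without showing a call, here is a hedged sketch of what one could look like. Only the existence of `from_onnx()` is confirmed by this diff; the parameter names below are assumptions, so check the diart API reference for the actual signature:

```python
# Hypothetical sketch of loading an ONNX embedding model.
# `input_names` and `output_name` are assumed parameter names; the only
# thing this diff confirms is that a `from_onnx()` constructor exists.
from diart.models import EmbeddingModel

embedding = EmbeddingModel.from_onnx(
    "my_model.onnx",                      # path to the exported ONNX file
    input_names=["waveform", "weights"],  # assumed: names of the model inputs
    output_name="embedding",              # assumed: name of the model output
)
```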