Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

cvqluu, updated 2023-01-12 04:49:23




Made to be as simple as possible to go from an input audio file to diarized segments.

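```python
import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

WAV_FILE = "path/to/audio.wav"  # placeholder: path to your input audio
NUM_SPEAKERS = 2                # placeholder: number of speakers in the recording

diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

# Load the audio and plot the waveform coloured by speaker segment
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()
```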
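Once `segments` is returned, a common next step is summarising how long each speaker talked. The sketch below assumes each segment is a dict with `start` and `end` times in seconds plus a `label` key; that shape is an assumption, so print one element of your own `segments` to confirm it. `speaking_time` is a hypothetical helper, not part of simple_diarizer.

```python
from collections import defaultdict


def speaking_time(segments):
    """Sum segment durations per speaker label.

    Assumes each segment is a dict with 'start' and 'end' (seconds) and a
    'label' key; check one element of your own `segments` before relying on it.
    """
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


# Hypothetical segments shaped like the assumed output format
segments = [
    {"start": 0.0, "end": 4.5, "label": 0},
    {"start": 4.5, "end": 7.0, "label": 1},
    {"start": 7.0, "end": 12.0, "label": 0},
]
print(speaking_time(segments))  # {0: 9.5, 1: 2.5}
```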


Simplified diarization is available on PyPI:

pip install simple-diarizer

Source Video

"Some Quick Advice from Barack Obama!"


Pre-trained Models

The following pretrained models are used:


Open In Colab

The demo notebook at the link above will try to diarize any input YouTube URL.

Other References

Planned Features


Auto-detect number of speakers (clustering)

opened on 2023-03-06 14:56:16 by ADD-eNavarro

Hi there!

Nice job, I've just tested this on Windows with Python 3.10 and it works fine (once soundfile is installed). Just one query: it would be great if the tool could cluster the voice embeddings and guess how many speakers there are, since my intended use is to diarize conversations where the number of speakers is unknown beforehand.

Is such a feature in your road map?
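One common way to guess the speaker count is to cluster with a similarity threshold instead of a fixed number of clusters. The sketch below is a swapped-in technique, plain average-linkage agglomerative clustering on cosine similarity written in pure Python, and is not simple_diarizer's API. `estimate_num_speakers` and its `threshold=0.7` default are hypothetical; the threshold would need tuning on real x-vector or ECAPA embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def estimate_num_speakers(embeddings, threshold=0.7):
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the most similar pair of clusters until no pair has an
    average similarity of at least `threshold` (a hypothetical default).
    Returns (number_of_clusters, clusters_as_index_lists).
    """
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best, pair = -math.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise similarity between the two clusters
                s = sum(sim[p][q] for p in clusters[i] for q in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return len(clusters), clusters


# Toy 3-D "embeddings": two from speaker A, two from speaker B, one from C
embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0],
]
n, groups = estimate_num_speakers(embeddings)
print(n, groups)  # 3 [[0, 1], [2, 3], [4]]
```

scikit-learn's AgglomerativeClustering supports the same idea through `n_clusters=None` combined with `distance_threshold`; note that it thresholds a distance rather than a similarity, so the comparison is inverted.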

Add soundfile Python library to requirements.txt

opened on 2023-02-02 04:21:35 by RAHUL-KAD

I ran pip install simple-diarizer and then ran your demo code sample with my own audio file.

But then I got an error: ModuleNotFoundError: No module named 'soundfile'

The soundfile Python library is missing from your requirements.txt file. Please add it.
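Until soundfile is listed as a dependency, a quick pre-flight check can flag missing modules before the demo fails partway through. `missing_modules` is a hypothetical helper built on the standard library's `importlib.util.find_spec`; the module list simply mirrors the demo's imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the demo imports (import names, which can differ from pip package names)
missing = missing_modules(["simple_diarizer", "soundfile", "matplotlib"])
if missing:
    print("Missing modules:", ", ".join(missing))
```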

Add support for Apple M1 & M2 chips (MPS)

opened on 2023-01-30 14:49:01 by lucaintlx

From what I can see, only CUDA and CPU are supported. It would be nice to also support MPS (the Metal Performance Shaders backend on Apple M1 & M2 chips).
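For reference, choosing a PyTorch device with an MPS fallback usually looks like the sketch below. It only shows device detection with real PyTorch calls (`torch.cuda.is_available`, `torch.backends.mps.is_available`); `pick_device` is a hypothetical helper, and how the chosen device would be handed to simple_diarizer, or whether its models run correctly on MPS, is not covered here.

```python
def pick_device():
    """Prefer CUDA, then Apple's MPS backend, then CPU.

    Returns "cpu" if PyTorch is not installed at all.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is only present in PyTorch 1.12 and later
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```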

Does not cover the whole audio

opened on 2023-01-17 21:26:23 by codeHorasan

I tried it with a 2-minute audio file, but the output covered only 1 minute.
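One way to quantify such a gap is to compare the end of the last segment with the file's duration. This sketch assumes segments are dicts with `start` and `end` in seconds; `check_coverage` and its one-second `tolerance` are hypothetical. If the pipeline drops non-speech (as voice activity detection typically does), trailing silence will legitimately be missing, so only a large gap over speech is worth investigating.

```python
def check_coverage(segments, duration, tolerance=1.0):
    """Compare the diarized span with the audio duration (both in seconds).

    Assumes each segment is a dict with 'start' and 'end' keys in seconds.
    Returns (covered_until, fully_covered).
    """
    covered_until = max((seg["end"] for seg in segments), default=0.0)
    return covered_until, duration - covered_until <= tolerance


# Hypothetical result for a 120 s file where only the first minute was labelled
segments = [
    {"start": 0.0, "end": 30.0, "label": 0},
    {"start": 30.0, "end": 60.0, "label": 1},
]
print(check_coverage(segments, duration=120.0))  # (60.0, False)
```

With soundfile installed, the real duration can be read with `sf.info(WAV_FILE).duration`.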

The Google Colab kernel keeps crashing when I run the tutorial notebook.

opened on 2023-01-12 10:04:25 by takan1

Hi, thanks for the great work! When I run the tutorial notebook on Google Colab, it keeps crashing without any error message. Could someone help me figure out why?

Clustering kwargs exposed

opened on 2023-01-12 04:49:23 by andrewmackie

I have exposed the kwargs for all of the sklearn-based clustering algorithms so that they can be passed through cluster_SC(), cluster_AHC(), Diarizer.diarize() and the command line.

All kwargs available in the sklearn algorithms should be available. I noted that you have some default values for kwargs and have retained those.

I haven't done comprehensive testing. I won't be offended if you want to change the way it is implemented.

FYI, I did this because the 'arpack' eigen solver in sklearn.cluster.SpectralClustering fails when clustering a large number (>2k) of embeddings. Using the 'lobpcg' eigen solver appears to address this problem, but the eigen_solver kwarg could not be set from Diarizer.diarize(); now it can.
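The pattern this PR describes, forwarding caller kwargs to sklearn while keeping the wrapper's own defaults, can be sketched as below. `cluster_sc_sketch`, `rbf_affinity` and the values in `SC_DEFAULTS` are hypothetical stand-ins, not simple_diarizer's cluster_SC() or its actual defaults; only `SpectralClustering` and its `eigen_solver` parameter are real scikit-learn API.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical defaults kept unless the caller overrides them
# (not simple_diarizer's actual defaults)
SC_DEFAULTS = {"affinity": "precomputed", "random_state": 0}


def cluster_sc_sketch(affinity, n_clusters, **kwargs):
    """Run sklearn spectral clustering, forwarding any extra kwargs to it."""
    params = {**SC_DEFAULTS, **kwargs}  # caller-supplied values win
    model = SpectralClustering(n_clusters=n_clusters, **params)
    return model.fit_predict(affinity)


def rbf_affinity(x, sigma=4.0):
    """Gaussian (RBF) affinity matrix; non-negative, as 'precomputed' expects."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))


# Toy "embeddings": two well-separated groups of 20 vectors each
rng = np.random.default_rng(0)
embeds = np.vstack([
    rng.normal(3.0, 0.1, size=(20, 8)),
    rng.normal(-3.0, 0.1, size=(20, 8)),
])

# eigen_solver='lobpcg' is passed straight through to SpectralClustering
labels = cluster_sc_sketch(rbf_affinity(embeds), 2, eigen_solver="lobpcg")
print(labels)
```

Merging with `{**defaults, **kwargs}` keeps existing defaults intact while letting any SpectralClustering argument be overridden without the wrapper having to list them all.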


Fixed missing ipython dependency 2022-12-12 13:26:06

Reworked extra_info from .diarize() 2022-12-08 13:28:16

Setting extra_info to True will now return an additional dict containing cluster labels.

python >=3.7 support 2022-11-09 05:18:03

Removed YouTube-related dependencies to keep the repository slim. The YouTube helper functions are gone, but the core functionality should now work for Python >=3.7.

Fixed dependencies 2022-08-30 06:19:02

Allowed for a newer version of speechbrain, which should have fixed the issues with pulling from huggingface_hub

Version 0.0.9 2022-01-10 19:21:06

Version 0.0.8 2022-01-10 19:08:31


PhD student at the University of Edinburgh, CSTR


speech-to-text transcription diarization asr colab-notebook speaker-diarization