How does LIUM Speaker Diarization work?

https://stackoverflow.com//questions/21041819

21-12-2019

Question

In my project I am using a library called LIUM_SpkDiarization-4.7.jar, but I am not quite sure how it works. Could anyone please explain it a bit?

Also, I'm using it with Python.

The link to the library is: https://voiceid.googlecode.com/svn-history/r11/trunk/scripts/LIUM_SpkDiarization-4.7.jar

Thanks in advance.

Solution

I was not aware of this tool. It looks really cool. Have you checked their wiki? They have some papers on how the system works: http://lium3.univ-lemans.fr/diarization/doku.php

Basically, they compute Mel-frequency cepstral coefficients (MFCCs), which is a standard technique. This is the fundamental step: it generates a feature space to work with, and it is similar to computing an FFT over a sliding window in time. Segmentation and clustering are then performed on these time-sliced features using Bayesian Information Criterion (BIC) methods: first to segment the time-based feature space, then to cluster the segments and find consistent features for each speaker. HMMs, Viterbi decoding, EM, and sometimes GMMs can be used as well.
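To make the BIC idea a bit more concrete, here is a rough Python sketch of a BIC-based speaker change test. It is not LIUM's code or API. The function names (make_frames, delta_bic, find_change_point) are made up for illustration, the "frames" are synthetic random vectors standing in for real MFCCs, and each segment is modelled with a single diagonal-covariance Gaussian, which is a simplification chosen only to keep the example dependency-free. The idea is: fit one Gaussian to a window of frames, fit two Gaussians to its left and right halves, and say there is a speaker change if the two-model fit wins even after the BIC penalty for the extra parameters.

```python
import math
import random


def make_frames(mean, count, dim=4, seed=0):
    """Synthetic stand-in for MFCC frames: `count` vectors of length `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


def _n_log_det(frames):
    """N * log|Sigma| for a diagonal-covariance Gaussian fitted to `frames`."""
    n = len(frames)
    dim = len(frames[0])
    log_det = 0.0
    for d in range(dim):
        column = [frame[d] for frame in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        log_det += math.log(max(var, 1e-10))  # floor avoids log(0)
    return n * log_det


def delta_bic(left, right, penalty_weight=1.0):
    """Positive value: two separate Gaussians fit better than one (likely change)."""
    both = left + right
    n = len(both)
    dim = len(both[0])
    # Likelihood gain from modelling the two halves separately.
    gain = 0.5 * (_n_log_det(both) - _n_log_det(left) - _n_log_det(right))
    # Extra parameters of the second diagonal Gaussian: dim means + dim variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def find_change_point(frames, margin=20):
    """Return the split index with the highest positive delta BIC, or None."""
    best_index, best_score = None, 0.0
    for i in range(margin, len(frames) - margin):
        score = delta_bic(frames[:i], frames[i:])
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    speaker_a = make_frames(0.0, 200, seed=1)
    speaker_b = make_frames(3.0, 200, seed=3)
    print("estimated change point:", find_change_point(speaker_a + speaker_b))
```

The same delta BIC score can in principle be reused for the clustering step: two segments whose merged model scores below zero look like the same speaker and can be merged. A real system would run this on actual MFCC frames over a sliding window rather than on one fixed buffer.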

I don't know the algorithm well enough to explain it in detail, but this should also help: http://lium3.univ-lemans.fr/diarization/doku.php/overview

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow