Speaker Diarization of Known Speakers #1667
Replies: 3 comments 6 replies
-
Hey @tobiasschmidt89, I've just started working on this too (for podcasts). Digging into pyannote, I see it uses the SpeechBrain "speechbrain/spkrec-ecapa-voxceleb" embedding model: https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb I can share the approach I have in mind:
I'll see how that goes. Let me know how you get on with it. Feel free to hmu if you wanna collaborate a bit on this.
-
Hi @desicochrane, @tobiasschmidt89. I'm exploring ways to improve speaker diarization in recordings with multiple speakers, but my primary goal is to accurately detect one specific target speaker. I have additional enrollment samples for that speaker (which I suppose I can use to generate an embedding), and I'm wondering whether anyone here has achieved high precision for this particular use case.
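One way the enrollment idea could work, as a minimal sketch rather than a tested recipe: average the embeddings of the enrollment samples into a single reference vector, then score each segment embedding against it with cosine similarity. The embeddings here are plain lists of floats standing in for whatever the embedding model returns; the function names and the 0.7 threshold are hypothetical and would need tuning on real data.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so that samples contribute equally."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of the same dimension."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def enrollment_centroid(embeddings):
    """Average the L2-normalized enrollment embeddings into one reference."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    return [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]


def is_target(segment_embedding, centroid, threshold=0.7):
    """True if the segment is close enough to the enrolled speaker.

    The threshold is an assumed placeholder, not a recommended value.
    """
    return cosine_similarity(segment_embedding, centroid) >= threshold


# Toy 3-dimensional vectors; real speaker embeddings have hundreds of dimensions.
enrollment = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
centroid = enrollment_centroid(enrollment)
same_speaker = is_target([1.0, 0.05, 0.05], centroid)
other_speaker = is_target([0.0, 1.0, 0.0], centroid)
```

Raising the threshold trades recall for precision, which matters if false positives on the target speaker are the main concern.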
-
Hi, here is my gist that computes an embedding from a WAV file of a known speaker and compares it to the output of the diarisation. I am building an API that manages meetings and transcribes the audio from them.
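For readers without the gist at hand, the comparison step can be sketched roughly as follows. This is not the gist's code: it assumes you already have one embedding per anonymous diarization label and one per enrolled speaker, as plain lists of floats. The label strings, the `name_speakers` helper, and the 0.5 threshold are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_speakers(diarized, enrolled, threshold=0.5, unknown="Unknown"):
    """Map each anonymous label to the most similar enrolled name.

    Labels whose best score stays below the (assumed) threshold
    are mapped to `unknown`.
    """
    mapping = {}
    for label, embedding in diarized.items():
        best_name, best_score = unknown, threshold
        for name, reference in enrolled.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        mapping[label] = best_name
    return mapping


# Toy vectors for illustration only.
enrolled = {"Max": [1.0, 0.0, 0.0], "Maria": [0.0, 1.0, 0.0]}
diarized = {
    "SPEAKER_00": [0.9, 0.1, 0.0],
    "SPEAKER_01": [0.1, 0.95, 0.05],
    "SPEAKER_02": [0.0, 0.0, 1.0],
}
mapping = name_speakers(diarized, enrolled)
```

Note that this greedy matching can assign the same name to two diarized labels; if that is a problem, a one-to-one assignment would be needed instead.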
-
Hi,
I really enjoy using this library for speaker diarization to create labeled transcripts in combination with Whisper: Speaker 1: ..., Speaker 2: ..., Speaker 1: ...
Currently I then search and replace the anonymous speaker labels with the real names.
I have some meetings that always have the same speakers (D&D game sessions). I am therefore looking for a way to create "voice embeddings" of each speaker by recording them in isolation for a minute or so. I then want to run speaker diarization using these embeddings to get labels like: Max: ..., Maria: ..., Tobi: ..., Max: ..., Unknown: ...
I would be interested if someone has some pointers on how I could achieve this with Pyannote. I expect I need to do the following:
Crossed items I know how to do.
I am very comfortable with text embeddings. Audio embedding is a new topic for me.
I would really appreciate any pointers or example scripts.
Thank you
T.