Wraps a few open source models to transcribe, diarize and optionally tag a specific speaker.
Briefly,
- Transcribe: Extracts the audio from video (.mp4) files and converts it to text (speech-to-text).
- Diarize: Adds speaker tags to the transcript when there are multiple speakers. The tags are generic markers, e.g. SPEAKER_01, SPEAKER_02, etc.
- Speaker Identification: Replaces the generic speaker tag with a configured speaker name, using embeddings computed from available audio samples of that speaker.
- Fast Transcriber - transcription and diarization
- Pyannote - audio embedding model for speaker identification
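The speaker identification step can be illustrated with a minimal sketch. It assumes each generic tag has one representative embedding, and that matching uses cosine similarity against the configured speaker's reference embedding with a cut-off. The function names, the threshold of 0.7 and the 3-dimensional vectors are hypothetical; the matching rule this project actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def identify_speaker(tag_embeddings, reference_embedding, threshold=0.7):
    """Return the generic tag that best matches the reference embedding,
    or None if no tag reaches the (assumed) similarity threshold."""
    best_tag, best_score = None, threshold
    for tag, embedding in tag_embeddings.items():
        score = cosine_similarity(embedding, reference_embedding)
        if score >= best_score:
            best_tag, best_score = tag, score
    return best_tag


def rename_speaker(segments, generic_tag, name):
    """Replace one generic tag with the configured name; other tags are kept."""
    return [(name if speaker == generic_tag else speaker, text)
            for speaker, text in segments]


# Toy 3-dimensional embeddings; real embeddings have far more dimensions.
tags = {"SPEAKER_01": [1.0, 0.0, 0.0], "SPEAKER_02": [0.0, 1.0, 0.1]}
reference = [0.1, 0.9, 0.0]
segments = [("SPEAKER_01", "Hi"), ("SPEAKER_02", "Hello")]
match = identify_speaker(tags, reference)
if match is not None:
    segments = rename_speaker(segments, match, "Alice")
```

Speakers that do not match the reference keep their generic tags, which is consistent with tagging only one specific speaker.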
An F# Windows service that exposes a SignalR connection to process transcription requests. The service queues incoming 'jobs' and processes them serially. The client is notified when its job is done.
Transcription, diarization and speaker identification are compute-intensive, so the service is meant to run on a GPU-enabled machine. The intent is to increase utilization of the GPU infrastructure by making it easier to share.
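The queue-then-notify behaviour described above can be sketched as a single worker thread draining a FIFO queue, so that only one job uses the GPU at a time. This is an illustration in Python, not the service's F# code. `SerialJobQueue`, `process` and `notify` are hypothetical names, and the `notify` callback stands in for the SignalR notification.

```python
import queue
import threading


class SerialJobQueue:
    """Runs submitted jobs one at a time, in arrival order, on one worker thread."""

    def __init__(self, process, notify):
        self._jobs = queue.Queue()
        self._process = process  # does the actual work for one job
        self._notify = notify    # called with (job_id, result or exception)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job_id, payload):
        """Enqueue a job and return immediately."""
        self._jobs.put((job_id, payload))

    def _run(self):
        while True:
            job_id, payload = self._jobs.get()
            try:
                result = self._process(payload)
            except Exception as exc:  # report failures instead of killing the worker
                result = exc
            try:
                self._notify(job_id, result)
            finally:
                self._jobs.task_done()

    def wait_until_idle(self):
        """Block until every submitted job has been processed and notified."""
        self._jobs.join()


# Usage: two jobs are processed strictly in submission order.
done = []
jobs = SerialJobQueue(process=lambda name: name.upper(),
                      notify=lambda job_id, result: done.append((job_id, result)))
jobs.submit(1, "a.mp4")
jobs.submit(2, "b.mp4")
jobs.wait_until_idle()
```

Serial processing trades throughput for predictability: a long job delays the ones behind it, but no two jobs compete for GPU memory.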
An F# GUI application to submit jobs to the service. The client uploads the .mp4 files to the service and triggers the processing. When server processing is complete, the client downloads the transcript (.vtt) files.
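For orientation, here is a sketch of how diarized segments could be written out as a .vtt file. The timestamp layout and the `<v ...>` voice span come from the WebVTT format. Whether this project writes speaker tags as voice spans is an assumption, and `format_timestamp` and `to_vtt` are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(segments):
    """Render (start, end, speaker, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(f"<v {speaker}>{text}")  # voice span carries the speaker tag
        lines.append("")                      # blank line ends the cue
    return "\n".join(lines)


vtt_text = to_vtt([
    (0.0, 2.5, "SPEAKER_01", "Hello"),
    (2.5, 4.0, "Alice", "Hi there"),
])
```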
Client UI - built with Avalonia FuncUI
The client uses the SSH protocol for secure data exchange, and the service deletes uploaded files once the job is complete.
Contains core logic for transcription and diarization. Also contains the script ExtractSpeakerEmbeddings.fsx that can be used to extract the audio embeddings for speaker identification.
Common definitions shared between client and server
Older project that contains batch scripts that were used to develop and refine the transcription processing.
The TranscriptionClient is meant to connect to the remote service (running on a GPU box) via SSH and SCP. The SSH connection, with port forwarding, carries the SignalR traffic. SCP is used for file upload and download.
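To make the connection layout concrete, the sketch below builds the equivalent OpenSSH command lines: one tunnel for SignalR and SCP transfers in each direction. This is only an illustration. The client presumably opens these connections from its own F# code rather than by calling the `ssh` and `scp` programs, and the user, host, ports and paths shown are hypothetical.

```python
def tunnel_command(user, host, local_port, remote_port):
    """argv for an SSH session that only forwards a port (-N: no remote command).

    Connections to localhost:local_port on the client are forwarded to
    localhost:remote_port on the server, where the service would listen.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


def upload_command(user, host, local_path, remote_dir):
    """argv for copying a local file to a directory on the server."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


def download_command(user, host, remote_path, local_dir):
    """argv for copying a file from the server to a local directory."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


# Hypothetical values; substitute the ones for your own setup.
tunnel = tunnel_command("alice", "gpu-box", 5000, 5000)
upload = upload_command("alice", "gpu-box", "meeting.mp4", "uploads/")
download = download_command("alice", "gpu-box", "uploads/meeting.vtt", ".")
```

With the tunnel open, the client would point its SignalR connection at the local forwarded port, so the service's port does not have to be exposed on the network.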
The user id and password required for the SSH/SCP connection are stored in appsettings.json. For this reason, appsettings.json is excluded from the repo. Instead, appsettings.json.template is provided, which contains the 'schema' of the settings. Copy the template file to appsettings.json and configure it for your setup.