
A service (and corresponding gui client) to transcript and diarize video (.mp4) files.


fwaris/TranscriptionAndDiarization


Transcription and Diarization

Wraps a few open source models to transcribe, diarize and optionally tag a specific speaker.

Briefly,

  • Transcribe: Extracts the audio from video (.mp4) files and converts it to text (speech-to-text).

  • Diarize: Adds speaker tags to the transcript when multiple speakers are present. The tags are generic markers, e.g. SPEAKER_01, SPEAKER_02, etc.

  • Speaker Identification: Replaces a generic speaker tag with a configured speaker name, using embeddings computed from available audio samples of that speaker.
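The speaker-identification step can be pictured as a nearest-match search over embeddings. The sketch below assumes each diarized speaker (SPEAKER_01, ...) and each configured speaker is represented by a single embedding vector, and that cosine similarity with a hypothetical 0.7 threshold decides a match. The names `identify_speakers` and `relabel` and the threshold value are illustrative only, not the project's F# code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)

def identify_speakers(cluster_embeddings, reference_embeddings, threshold=0.7):
    """Map generic diarization tags (e.g. SPEAKER_01) to configured names.

    cluster_embeddings:   {generic_tag: embedding} for each diarized speaker.
    reference_embeddings: {speaker_name: embedding} built from known audio samples.
    A tag is renamed only if its best match reaches the (hypothetical) threshold;
    otherwise the generic tag is kept.
    """
    mapping = {}
    for tag, emb in cluster_embeddings.items():
        best_name, best_score = None, -1.0
        for name, ref in reference_embeddings.items():
            score = cosine_similarity(emb, ref)
            if score > best_score:
                best_name, best_score = name, score
        mapping[tag] = best_name if best_name is not None and best_score >= threshold else tag
    return mapping

def relabel(segments, mapping):
    """Replace generic tags in a list of (tag, text) transcript segments."""
    return [(mapping.get(tag, tag), text) for tag, text in segments]
```

For example, with clusters `{"SPEAKER_01": [1.0, 0.0], "SPEAKER_02": [0.0, 1.0]}` and a single reference `{"Alice": [0.9, 0.1]}`, SPEAKER_01 becomes Alice (similarity about 0.99) while SPEAKER_02 keeps its generic tag. Leaving unmatched tags generic avoids forcing a name onto a voice that has no sample.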

Models

Solution Projects

TranscriptionServiceHost

An F# windows service that exposes a SignalR connection to process transcription requests. The service queues incoming 'jobs' and processes them serially. The client is notified when the job is done.
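The queue-then-process-serially behaviour can be sketched as a single worker thread draining a FIFO queue. This is a Python illustration under assumptions, not the service's F# code: `process` stands in for the transcription pipeline and `notify` for the SignalR message sent to the client; both are hypothetical callbacks.

```python
import queue
import threading

class SerialJobQueue:
    """Jobs are queued and run one at a time on a single worker thread.
    A callback stands in for the 'job done' notification to the client."""

    def __init__(self, process, notify):
        self._jobs = queue.Queue()   # FIFO: jobs run in submission order
        self._process = process      # hypothetical: runs transcription for one job
        self._notify = notify        # hypothetical: informs the client of the outcome
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job_id, payload):
        """Enqueue a job; returns immediately."""
        self._jobs.put((job_id, payload))

    def _run(self):
        while True:
            job_id, payload = self._jobs.get()
            try:
                result = self._process(payload)
                self._notify(job_id, "done", result)
            except Exception as exc:  # report failures instead of killing the worker
                self._notify(job_id, "failed", str(exc))
            finally:
                self._jobs.task_done()

    def wait(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()
```

With a single worker, only one job occupies the GPU at a time; later submissions simply wait in the queue until earlier ones finish.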

Transcription, diarization and speaker identification are compute-intensive, so the service is meant to run on a GPU-enabled machine. The intent is to increase utilization of the GPU infrastructure by making it easier to share.

TranscriptionClient

An F# GUI application to submit jobs to the service. The client uploads the .mp4 files to the service and triggers the processing. When server processing is complete, the client downloads the transcript (.vtt) files.
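The repository does not document the exact layout of the .vtt files it produces. As an assumption-labelled sketch, diarized segments could be written as WebVTT cues that carry the speaker in a `<v>` voice span; `to_webvtt` and the `(start, end, speaker, text)` segment shape are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def escape_cue_text(text):
    """Escape characters that WebVTT cue text treats as markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def to_webvtt(segments):
    """Render (start_s, end_s, speaker, text) segments as a WebVTT document,
    one cue per segment, with the speaker in a <v> voice span."""
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(f"<v {speaker}>{escape_cue_text(text)}")
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

For instance, `to_webvtt([(0.0, 2.5, "SPEAKER_01", "Hello there.")])` yields the header `WEBVTT`, a blank line, the cue timing `00:00:00.000 --> 00:00:02.500`, and the cue text `<v SPEAKER_01>Hello there.`.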

Transcription Client UI, built with Avalonia FuncUI.

The client uses the SSH protocol for secure data exchange, and the service deletes uploaded files once the job is complete.

TranscriptionService

Contains the core logic for transcription and diarization. Also contains the script ExtractSpeakerEmbeddings.fsx, which can be used to extract the audio embeddings for speaker identification.
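How ExtractSpeakerEmbeddings.fsx combines multiple samples of one speaker is not stated here. One common approach, shown as a hypothetical sketch, is to L2-normalize the embedding of each audio sample and average them into a single reference vector; the embedding model that produces the per-sample vectors is out of scope, and `reference_embedding` is an illustrative name.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]

def reference_embedding(samples):
    """Combine several per-sample embeddings of one speaker into a single
    reference vector: normalize each, average them, then re-normalize."""
    if not samples:
        raise ValueError("need at least one sample embedding")
    normalized = [l2_normalize(s) for s in samples]
    dim = len(normalized[0])
    if any(len(v) != dim for v in normalized):
        raise ValueError("embeddings must share one dimension")
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)
```

Normalizing before averaging keeps one loud or long sample from dominating the reference, since only the direction of each embedding contributes.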

TranscriptionInterop

Common definitions shared between the client and the server.

TranscriptionAndDiarization

Older project that contains batch scripts that were used to develop and refine the transcription processing.

Configuration

The TranscriptionClient is meant to connect to the remote service (running on a GPU machine) via SSH and SCP. SignalR traffic runs over an SSH connection with port forwarding, and file uploads and downloads use the SCP protocol.
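To make the transport concrete, the sketch below builds OpenSSH command lines for the same pattern: an `ssh -N -L` tunnel that forwards a local port to the service's SignalR port, and `scp` for file transfer. It swaps in the OpenSSH command-line tools for illustration; the client's own connection code may differ, and the user, host, ports and directories are placeholders.

```python
def ssh_tunnel_args(user, host, local_port, remote_port):
    """argv for an SSH session that only forwards a port.
    -N: run no remote command; -L: local_port here -> remote_port on the GPU box."""
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]

def scp_upload_args(user, host, local_file, remote_dir):
    """argv to copy a local .mp4 into a (placeholder) upload directory on the service host."""
    return ["scp", local_file, f"{user}@{host}:{remote_dir}"]

def scp_download_args(user, host, remote_file, local_dir):
    """argv to copy a finished .vtt transcript back to the client."""
    return ["scp", f"{user}@{host}:{remote_file}", local_dir]
```

Each list can be passed to `subprocess.run`; the tunnel is better started with `subprocess.Popen`, since `-N` keeps the session open until it is terminated.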

The user ID and password required for the SSH/SCP connection are stored in appsettings.json. For this reason, appsettings.json is excluded from the repository; instead, appsettings.json.template is provided, containing the 'schema' of the settings. Copy the template file to appsettings.json and configure it for your setup.
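The copy-the-template step can be automated so a fresh checkout gets a settings file without clobbering an existing one. A minimal sketch, assuming the template is plain JSON (no comments) and sits in the same folder where appsettings.json should go; `ensure_settings` is a hypothetical helper, and the credentials still have to be filled in by hand.

```python
import json
import shutil
from pathlib import Path

def ensure_settings(folder):
    """Create appsettings.json from appsettings.json.template if it does not
    exist yet, never overwriting an already-configured file. Returns the parsed
    settings so that a malformed file fails fast."""
    folder = Path(folder)
    settings = folder / "appsettings.json"
    template = folder / "appsettings.json.template"
    if not settings.exists():
        shutil.copyfile(template, settings)
    with settings.open(encoding="utf-8") as f:
        return json.load(f)
```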
