
Speakr


This project is dual-licensed. See the License section for details.

Speakr is a personal, self-hosted web application designed for transcribing audio recordings (like meetings), generating concise summaries and titles, and interacting with the content through a chat interface. Keep all your meeting notes and insights securely on your own server.

(Screenshot: hero shot with multilingual support.)

Features Showcase

  • Transcription and Chat: integrated chat alongside the transcript.
  • Light and Dark Mode.
  • Intuitive Transcription Views: simple view and bubble view.
  • Speaker Identification: AI-assisted identification, manual and automatic options, and saved speaker suggestions.
  • ASR and API Options: transcription with the ASR webservice or with an OpenAI-compatible API.
  • Advanced Features: reprocess a transcript with new settings.

What's New?

  • Simplified Docker Setup: Configuration is now managed via a single .env file, making setup faster and easier.
  • Advanced ASR Integration: Added support for ASR endpoints using the onerahmet/openai-whisper-asr-webservice package. This integration is necessary for the speaker diarization feature.
  • Speaker Diarization: Identify and label different speakers in your recordings. Note: This feature requires the ASR Webservice method with the whisperx engine.
  • Speaker Auto-Detection: When using speaker diarization, the system can automatically attempt to detect participant names from the transcript, so you don't have to enter them manually. (ASR method only)
  • Clickable Timestamps: Each transcribed sentence now includes a timestamp. Clicking it jumps the audio player to that point in the recording, making it easier to identify speakers and verify transcription accuracy. (ASR method only)
  • Intuitive Speaker Labeling: A new, more intuitive interface for identifying and labeling speakers. (ASR method only)
  • Transcription Reprocessing: A new "Reprocess" button allows you to re-run transcription with different settings (e.g., to add diarization).
  • Speaker Identification: A new modal helps you name speakers, with highlighting for clarity. You can also automatically identify speakers using an LLM. (ASR method only)
  • Saved Speaker Profiles: Save identified speakers for auto-completion in future transcriptions. View and manage them on your Account page. (ASR method only)
  • Enhanced Summaries: Summarization now includes user context (name, title) and allows for custom prompts.
  • Streaming Chat Responses: The chat interface now streams responses, providing a more interactive and responsive experience.
  • UI Preferences Saved: Your preferences for audio volume and transcription view (simple vs. bubble) are now saved locally in your browser and automatically applied on your next visit.
  • Last Viewed Recording: The application now remembers the last recording you viewed and will take you there directly when you revisit the page.
  • Performance Tracking: The processing time for transcriptions is now tracked and displayed, giving you insight into the performance of your setup.

Features

  • Audio Upload: Upload audio files (MP3, WAV, M4A, etc., depending on your transcription endpoint) via drag-and-drop or file selection.
  • Browser Recording: Record audio directly in your browser from the "New Recording" screen (requires HTTPS, or browser configuration for HTTP; see the Deployment Guide).
  • Transcription: Choose between a standard OpenAI-compatible API or a more advanced ASR Webservice.
  • Speaker Diarization: (ASR/WhisperX only) Automatically detect and separate different speakers in the transcript.
  • Intuitive Speaker Labeling: A new, more intuitive interface for identifying and labeling speakers.
  • AI Summarization & Titling: Generates concise titles and summaries using configurable LLMs.
  • Interactive Chat: Ask questions and interact with the transcription content using an AI model.
  • Metadata Editing: Edit titles, participants, meeting dates, summaries, and notes.
  • User Management: Secure user registration and login system with an admin dashboard for managing users.
  • Customization: Users can set their own language preferences, custom summarization prompts, and professional context to improve AI results.

API Endpoint Requirements

Speakr integrates with external APIs for transcription and text generation. Here are the exact endpoints each service must implement:

Transcription Services

Standard Whisper API Method:

  • Endpoint: /audio/transcriptions
  • Method: POST
  • Format: OpenAI Whisper API compatible
  • Used for: Audio transcription
  • When: When USE_ASR_ENDPOINT=false (default)

Common providers using this method:

  • OpenAI Whisper API (https://api.openai.com/v1) - The original OpenAI service
  • OpenRouter (https://openrouter.ai/api/v1) - Multi-provider API gateway
  • Local APIs (http://localhost:1234/v1) - Self-hosted solutions like LM Studio, Ollama, or custom deployments
  • Other API providers - Any service implementing the OpenAI Whisper API format
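
For reference, a request in this format looks roughly like the following minimal curl sketch (the base URL, key, and model come from the TRANSCRIPTION_* and WHISPER_MODEL settings shown in the setup section below; meeting.mp3 is a placeholder file):

# Transcribe an audio file via an OpenAI-compatible endpoint.
# The response is JSON containing a "text" field with the transcript.
curl -X POST "https://api.openai.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TRANSCRIPTION_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1"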

ASR Webservice Method:

  • Endpoint: /asr
  • Method: POST
  • Format: Custom ASR webservice format
  • Used for: Audio transcription with speaker diarization support
  • When: When USE_ASR_ENDPOINT=true

Recommended provider:

  • onerahmet/openai-whisper-asr-webservice - Docker image that provides the /asr endpoint with WhisperX engine support for speaker diarization
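
A request to this endpoint looks roughly like the sketch below. Note this is an approximation: the multipart field name (audio_file) and query parameters (output, diarize, min_speakers, max_speakers) reflect the onerahmet webservice's interface and may differ between image versions.

# Send an audio file to the ASR webservice with diarization enabled.
# Parameter names are assumptions based on the onerahmet image;
# verify them against the version you run.
curl -X POST "http://your_asr_host:9000/asr?output=json&diarize=true&min_speakers=1&max_speakers=5" \
  -F "audio_file=@meeting.mp3"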

Text Generation Services

Chat Completions API:

  • Endpoint: /chat/completions
  • Method: POST
  • Format: OpenAI Chat Completions API compatible
  • Used for:
    • AI-generated titles and summaries
    • Interactive chat with transcriptions
    • Automatic speaker identification
    • Summary reprocessing
  • When: Always (for all text generation features)

Common providers using this method:

  • OpenAI (https://api.openai.com/v1) - GPT models
  • OpenRouter (https://openrouter.ai/api/v1) - Access to multiple LLM providers
  • Local APIs (http://localhost:1234/v1) - Self-hosted solutions like LM Studio, Ollama, vLLM, or custom deployments
  • Other API providers - Anthropic, Google, Azure OpenAI, or any service implementing the OpenAI Chat Completions format

Example API Base URLs:

  • OpenAI: https://api.openai.com/v1
  • OpenRouter: https://openrouter.ai/api/v1
  • Local APIs: http://localhost:1234/v1
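
A title-generation or chat request in this format looks roughly like the following minimal curl sketch (the TEXT_MODEL_* values come from the .env templates below; the prompt is purely illustrative):

# Ask the chat completions endpoint for a summary (illustrative prompt).
curl -X POST "$TEXT_MODEL_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $TEXT_MODEL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [
          {"role": "user", "content": "Summarize this transcript: ..."}
        ]
      }'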

Setup Instructions

For detailed deployment instructions, see the Deployment Guide.

The recommended method is to use the pre-built Docker image, which is fast and simple.

Easy Setup (Pre-built Docker Image)

You do not need to clone this repository for this method. You only need Docker installed.

  1. Create the docker-compose.yml file: Create a new file named docker-compose.yml and paste the following content into it:

    services:
      app:
        image: learnedmachine/speakr:latest
        container_name: speakr
        restart: unless-stopped
        ports:
          - "8899:8899"
        env_file:
          - .env
        volumes:
          - ./uploads:/data/uploads
          - ./instance:/data/instance
  2. Create a Configuration (.env) File: Your choice here depends on which transcription method you want to use. See the API Endpoint Requirements section above for details on what endpoints each service must implement. Create a new file named .env and paste one of the following templates into it.

    • Option A: Standard Whisper API Method Uses the /audio/transcriptions endpoint. This is the simplest method and works with OpenAI, OpenRouter, local APIs, and other providers implementing the OpenAI Whisper API format.

      # --- Text Generation Model (uses /chat/completions endpoint) ---
      TEXT_MODEL_BASE_URL=https://openrouter.ai/api/v1
      TEXT_MODEL_API_KEY=your_openrouter_api_key
      TEXT_MODEL_NAME=openai/gpt-4o-mini
      
      # --- Transcription Service (uses /audio/transcriptions endpoint) ---
      TRANSCRIPTION_BASE_URL=https://api.openai.com/v1
      TRANSCRIPTION_API_KEY=your_openai_api_key
      WHISPER_MODEL=whisper-1
      
      # --- Application Settings ---
      ALLOW_REGISTRATION=false
      SUMMARY_MAX_TOKENS=8000
      CHAT_MAX_TOKENS=5000
      
      # --- Admin User (created on first run) ---
      ADMIN_USERNAME=admin
      ADMIN_EMAIL=admin@example.com
      ADMIN_PASSWORD=changeme
      
      # --- Docker Settings (rarely need to be changed) ---
      SQLALCHEMY_DATABASE_URI=sqlite:////data/instance/transcriptions.db
      UPLOAD_FOLDER=/data/uploads

      Now, edit the .env file with your API keys and settings.

    • Option B: ASR Webservice Method (for Speaker Diarization) Uses the /asr endpoint. This method requires a separate ASR webservice container but enables speaker identification. This has been tested with the onerahmet/openai-whisper-asr-webservice image. See the Deployment Guide for instructions on how to run the ASR service.

      # --- Text Generation Model (uses /chat/completions endpoint) ---
      TEXT_MODEL_BASE_URL=https://openrouter.ai/api/v1
      TEXT_MODEL_API_KEY=your_openrouter_api_key
      TEXT_MODEL_NAME=openai/gpt-4o-mini
      
      # --- Transcription Service (uses /asr endpoint) ---
      USE_ASR_ENDPOINT=true
      ASR_BASE_URL=http://your_asr_host:9000  # URL of your running ASR webservice
      ASR_DIARIZE=true
      ASR_MIN_SPEAKERS=1
      ASR_MAX_SPEAKERS=5
      
      # --- Application Settings ---
      ALLOW_REGISTRATION=false
      SUMMARY_MAX_TOKENS=8000
      CHAT_MAX_TOKENS=5000
      
      # --- Admin User (created on first run) ---
      ADMIN_USERNAME=admin
      ADMIN_EMAIL=admin@example.com
      ADMIN_PASSWORD=changeme
      
      # --- Docker Settings (rarely need to be changed) ---
      SQLALCHEMY_DATABASE_URI=sqlite:////data/instance/transcriptions.db
      UPLOAD_FOLDER=/data/uploads

      Now, edit the .env file with your ASR service URL and other settings.

  3. Start the Application: In your terminal, in the same directory as your docker-compose.yml and .env files, run:

    docker compose up -d
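
Once the container is up, the app should be reachable at http://localhost:8899. To confirm it started cleanly, you can tail its logs (app is the service name from the compose file above):

# Follow the Speakr container's startup logs; press Ctrl-C to stop.
docker compose logs -f app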

Advanced Setup (Build from Source)

If you want to modify the code or build the Docker image yourself, clone the repository first.

  1. Clone the Repository:
    git clone https://github.com/murtaza-nasir/speakr.git
    cd speakr
  2. Create docker-compose.yml and .env files: Copy the example files from the repository.
    cp docker-compose.example.yml docker-compose.yml
    
    # For standard API
    cp env.whisper.example .env
    
    # OR for ASR Webservice
    cp env.asr.example .env
    Edit the .env file with your settings.
  3. Build and Start:
    docker compose up -d --build

Recommended ASR Webservice Setup

Speaker Diarization only works with the ASR method and requires the whisperx engine. Here is an example docker-compose.yml for running the ASR service itself. You would run this in addition to the Speakr app container.

services:
  whisper-asr-webservice:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: whisper-asr-webservice
    ports:
      - "9000:9000"
    environment:
      - ASR_MODEL=distil-large-v3 # or large-v3, medium
      - ASR_COMPUTE_TYPE=float16     # or int8, float32
      - ASR_ENGINE=whisperx        # REQUIRED for diarization
      - HF_TOKEN=your_hugging_face_token # Required for speaker diarization (gated pyannote models)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["0"]
    restart: unless-stopped

A Note on Diarization Accuracy: For best results, it is often better to set the maximum number of speakers (ASR_MAX_SPEAKERS) slightly higher (e.g., by 1-2) than the actual number present. You can easily merge speakers later in the speaker identification modal.

Speakr has been tested with the recommended onerahmet/openai-whisper-asr-webservice image. Other ASR webservices might work but have not been tested.

Important for ASR Setup: Speaker diarization requires a Hugging Face token and accepting terms for gated models (pyannote). If you encounter issues, check the ASR container logs with docker logs whisper-asr-webservice for troubleshooting.
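
Before pointing Speakr at the ASR service, a quick reachability check from the host can save debugging time. A minimal sketch, assuming the service is on port 9000 as configured above; the /docs path is an assumption based on the image serving FastAPI's interactive docs:

# Expect HTTP 200 if the ASR webservice is up and reachable.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/docs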

For complete ASR setup instructions, model configurations, troubleshooting, and log analysis, see the Deployment Guide.

Usage Guide

  1. Register/Login: Access the application at http://localhost:8899. The admin user is created from the ADMIN_* variables in your .env file on first launch.
  2. Set Preferences (Recommended): Go to your Account page. Here you can:
    • Set your preferred transcription and output languages.
    • Define a custom summarization prompt to tailor summaries to your needs.
    • Add your name, job title, and company to provide more context for AI chat interactions.
    • View and manage your saved speakers.
  3. Upload or Record:
    • Upload: Go to New Recording or drag-and-drop an audio file onto the page.
    • Record: Use the browser recording feature in the "New Recording" screen to record directly from your microphone (requires HTTPS or browser configuration; see Browser Recording Setup).
    • The upload and processing status will appear in a popup.
  4. View and Interact:
    • The main Gallery lists your recordings. Click one to view its details.
    • Listen to the audio with the built-in player.
    • Read the transcription and the AI-generated summary.
    • Edit the title, participants, and other metadata.
    • Chat with Transcript: Use the chat panel to ask questions about the recording.
  5. Speaker Diarization Workflow:
    • To enable speaker diarization, you must use the ASR endpoint method and set ASR_DIARIZE=true in your .env file.
    • When a recording is processed with this option, speakers will be automatically detected and assigned generic labels (e.g., SPEAKER 00, SPEAKER 01).
    • After processing, click the Identify Speakers button on the transcription page.
    • In the speaker identification modal, you can manually assign names to each speaker.
    • Alternatively, use the Auto Identify button to let an AI model attempt to identify and name the speakers based on the conversation context.
    • Saved speakers will be suggested for auto-completion in future sessions.
    • You can also use the Reprocess button to re-transcribe the audio with different diarization settings if needed.

License

This project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPLv3)

    Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.

    • You must create a file named LICENSE (or COPYING) in the root of your repository and paste the full text of the GNU AGPLv3 license into it.
    • Read the full license text carefully to understand your rights and obligations.
  2. Commercial License

    For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available.

    Please contact the Speakr maintainers for details on obtaining a commercial license.

You must choose one of these licenses under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.

Roadmap

Speakr is actively being developed. Future planned features include:

  • Quick Language Switching: A faster way to change transcription or output languages on the fly.
  • Large File Chunking: Automatic splitting of large audio files to support transcription endpoints with file size limits (like OpenAI's 25MB limit).

Contributing

While direct code contributions are not the primary focus at this stage, feedback, bug reports, and feature suggestions are highly valuable! Please feel free to open an Issue on the GitHub repository.

Note on Future Contributions and CLAs: Should this project begin accepting code contributions from external developers in the future, signing a Contributor License Agreement (CLA) will be required before any pull requests can be merged. This policy ensures that the project maintainer receives the necessary rights to distribute all contributions under both the AGPLv3 and the commercial license options offered. Details on the CLA process will be provided if and when the project formally opens up to external code contributions.
