GitHub - the-momentum/notetaker: 🧑‍⚕️🤖 AI-powered audio transcription and smart summarization tool that transforms spoken conversations into structured notes for healthcare professionals.

Notetaker AI

Intelligent Transcription & Summarization for Professionals

Contact us Visit Momentum MIT License

📋 Table of Contents

🔍 About The Project

Notetaker AI transforms how professionals handle meetings, interviews, and consultations with advanced audio-to-text capabilities. It combines precise transcription with intelligent summarization to create concise, structured notes that save time and enhance documentation accuracy.

Notetaker workflow

✨ Key Features

  • 🎙️ Smart Transcription: Convert audio to text with exceptional accuracy, including optional speaker diarization and time alignment
  • 📊 Multiple Summary Formats: Generate summaries in various formats to fit different professional needs:
    • 📝 Text – Simple, readable plain-text format
    • 📋 SOAP – Structured clinical format (Subjective, Objective, Assessment, Plan)
    • 🏥 PKI HL7 CDA – Standards-compliant summary for healthcare interoperability
    • 🩺 Therapy Assessment – Custom format for structured evaluation of therapist performance across key professional competencies
  • ⏳ Long-form Audio Support: Designed to handle recordings of over 1 hour
  • ⚙️ Flexible Deployment: Deploy fully locally with local AI models for full data control, or use the available external integrations
  • ⚙️ Multiple Access Points: Run as an API-only service or with an intuitive Gradio UI for interactive use
  • 🚄 GPU Acceleration: Leverage GPU hardware for faster processing of large audio files
  • 🔧 Customizable: Configure to your specific requirements with extensive environment variables
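
To make the SOAP format above concrete, here is a minimal sketch of how its four sections could be held and rendered as plain text. The `SoapNote` class, its field names, and the `to_text` layout are hypothetical illustrations, not the schema Notetaker AI actually returns.

```python
from dataclasses import dataclass


@dataclass
class SoapNote:
    """Hypothetical container for a SOAP-style summary (not the project's schema)."""

    subjective: str  # what the patient or client reports
    objective: str   # observations and measurements
    assessment: str  # the professional's interpretation
    plan: str        # agreed next steps

    def to_text(self) -> str:
        """Render the four sections as plain text, one labelled block each."""
        sections = [
            ("Subjective", self.subjective),
            ("Objective", self.objective),
            ("Assessment", self.assessment),
            ("Plan", self.plan),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


if __name__ == "__main__":
    note = SoapNote(
        subjective="Reports trouble sleeping for two weeks.",
        objective="Appears tired; no other findings noted.",
        assessment="Likely stress-related insomnia.",
        plan="Sleep hygiene advice; follow up in two weeks.",
    )
    print(note.to_text())
```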

demo.mp4

(back to top)

🚀 Getting Started

Follow these steps to set up Notetaker AI in your environment.

Prerequisites

  • Python: 3.12 or higher
  • Poetry: For dependency management (Installation Guide)
  • FFmpeg: Required for audio processing
  • CUDA Toolkit: 12.2+ recommended (only if using GPU acceleration)
  • Hugging Face Access: You'll need access to these gated models:
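
The tool prerequisites listed above (Python version, Poetry, FFmpeg) can be checked before installation with a short script. This is a sketch under assumptions: the function name `check_prerequisites` is made up for illustration, the executables are assumed to be named `poetry` and `ffmpeg` on your PATH, and the script does not check the CUDA Toolkit or Hugging Face model access.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 12), tools=("poetry", "ffmpeg")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        print("Missing prerequisites:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Python, Poetry and FFmpeg look fine.")
```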

Installation

  1. Clone the repository:

    git clone https://github.com/the-momentum/notetaker
    cd notetaker
  2. Install dependencies:

    # For API only (recommended for production)
    poetry install --without demo --without dev
    
    # With demo interface (for testing and demonstration)
    poetry install --with demo --without dev

(back to top)

📝 Usage

Configuration

  1. Set up environment variables:

    cp .env.example .env

    Edit the .env file with your specific configuration.

  2. Start the application:

    ./run.sh

    The API will be available at http://localhost:8001 by default.

  3. Access the API documentation:

Environment Variables

| Variable | Description | Example Value |
|---|---|---|
| PROJECT_NAME | Name used for logging and display | Notetaker AI |
| BACKEND_CORS_ORIGINS | Allowed CORS origins | ["http://localhost:8000"] |
| HOST | Host address for API availability | 0.0.0.0 |
| PORT | Port for the API server | 8001 |
| OLLAMA_URL | Base URL for Ollama server | http://localhost:11434 |
| LLM_MODEL | LLM model name | llama3.2 |
| USE_LOCAL_MODELS | Whether to use local models | True |
| WHISPER_MODEL | Whisper model type | turbo |
| WHISPER_DEVICE | Device for running Whisper | cpu or cuda |
| WHISPER_COMPUTE_TYPE | Compute type for Whisper | int8 |
| WHISPER_BATCH_SIZE | Batch size for processing | 16 |
| HF_API_KEY | Hugging Face API key | hf_... |
| OPENAI_API_KEY | OpenAI API key | sk-proj-... |
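
As an illustration of how the variables listed above map to typed values, the sketch below reads a few of them into a small settings object. It is not how Notetaker AI loads its configuration: the `Settings` class, `load_settings`, and `_as_bool` are hypothetical names, and the fallback values are simply the example values shown above, which may differ from the project's real defaults.

```python
import json
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'True', 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view over a subset of the environment variables."""

    project_name: str
    cors_origins: list
    host: str
    port: int
    use_local_models: bool
    whisper_device: str
    whisper_batch_size: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        project_name=env.get("PROJECT_NAME", "Notetaker AI"),
        # BACKEND_CORS_ORIGINS is written as a JSON list in the example value
        cors_origins=json.loads(env.get("BACKEND_CORS_ORIGINS", "[]")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8001")),
        use_local_models=_as_bool(env.get("USE_LOCAL_MODELS", "True")),
        whisper_device=env.get("WHISPER_DEVICE", "cpu"),
        whisper_batch_size=int(env.get("WHISPER_BATCH_SIZE", "16")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to try out without touching your shell environment.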

⚠️ Note: The transcription output length depends on the selected model's token limit. If the transcription is too long, it may be truncated or cause errors. Choose a model appropriate for the expected transcription length to ensure complete results.
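
One common workaround for a token limit, which this README does not claim Notetaker AI performs, is to split a long transcript into overlapping chunks and summarize each chunk separately. The sketch below uses a word count as a crude stand-in for tokens; `split_transcript`, `max_words`, and `overlap` are hypothetical names, and the right budget depends on the model you choose.

```python
def split_transcript(text: str, max_words: int = 1500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk already reaches the end of the text
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(4000))
    parts = split_transcript(sample)
    print(f"{len(parts)} chunks, first chunk has {len(parts[0].split())} words")
```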

(back to top)

🖥️ Demo

The interactive Gradio demo provides a user-friendly interface to experience Notetaker AI's capabilities without writing code.

Running the Demo

  1. Install demo dependencies (if not already done):

    poetry install --with demo --without dev
  2. Configure the demo: Update demo/.env.demo with your API base URL.

  3. Launch the integrated demo:

    ./run.sh --demo

    This starts both the API and Gradio interface.

  4. Or run the demo separately (if API is already running):

    poetry run python demo/ui.py

The demo will be available at http://localhost:7860.
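
To confirm that the API and the demo are actually listening, a plain TCP check against the two default ports mentioned in this README (8001 for the API, 7860 for Gradio) is enough. This sketch deliberately avoids calling any HTTP endpoint, since the API routes are not listed here; `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


if __name__ == "__main__":
    for name, port in (("API", 8001), ("Gradio demo", 7860)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If you changed PORT in your .env file, adjust the port numbers accordingly.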

Demo Features

  • 📁 Upload or Record: Submit audio files or record directly in your browser
  • ⚙️ Configure Options: Set parameters for transcription and summarization
  • 📊 Format Selection: Choose between different summary formats
  • ⏱️ Real-time Processing: Watch as your audio is transcribed and summarized
  • 💾 Download Results: Save output as JSON for further use

(back to top)

🐳 Docker Setup

For consistent deployment across environments, use our Docker setup.

Quick Commands

# Build the Docker images
just docker-build

# Rebuild without using cache
just docker-rebuild

# Run the API only
just docker-up

# Run API with Gradio demo
just docker-demo

Access Points

(back to top)

🗺️ Roadmap

We're continuously enhancing Notetaker AI with new capabilities. Here's what's on the horizon:

  • OpenAI API Integration: Direct connection to Whisper via OpenAI API
  • Expanded LLM Support: Integration with additional LLM providers
  • Enhanced Note Formats: More specialized formats and improved customization options
  • Performance Optimizations: Faster processing for large audio files

Have a suggestion? We'd love to hear from you! Contact us or contribute directly.

👥 Contributors

(back to top)

📄 License

Distributed under the MIT License. See LICENSE for more information.


Built with ❤️ by Momentum • Turning conversations into structured knowledge
