- About
- Getting Started
- Usage
- Demo
- Docker Setup
- Roadmap
- Contributors
- License
Notetaker AI transforms how professionals handle meetings, interviews, and consultations with advanced audio-to-text capabilities. It combines precise transcription with intelligent summarization to create concise, structured notes that save time and enhance documentation accuracy.
- Smart Transcription: Convert audio to text with exceptional accuracy, including optional speaker diarization and time alignment
- Multiple Summary Formats: Generate summaries in various formats to fit different professional needs:
  - Text: Simple, readable plain-text format
  - SOAP: Structured clinical format (Subjective, Objective, Assessment, Plan)
  - PKI HL7 CDA: Standards-compliant summary for healthcare interoperability
  - Therapy Assessment: Custom format for structured evaluation of therapist performance across key professional competencies
- Long-form Audio Support: Designed to handle recordings longer than 1 hour
- Flexible Deployment: Deploy fully locally, using local AI models for full data control, or use the available external integrations
- Multiple Access Points: Run as an API-only service or with an intuitive Gradio UI for interactive use
- GPU Acceleration: Leverage GPU hardware for faster processing of large audio files
- Customizable: Configure to your specific requirements with extensive environment variables
Follow these steps to set up Notetaker AI in your environment.
- Python: 3.12 or higher
- Poetry: For dependency management (Installation Guide)
- FFmpeg: Required for audio processing
- CUDA Toolkit: 12.2+ recommended (only if using GPU acceleration)
- Hugging Face Access: You'll need access to these gated models:
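The tool requirements listed above can be verified before installing. The following is a minimal sketch, not part of the project: it assumes the executables are named `poetry` and `ffmpeg` and are expected on `PATH`, and the function name `check_prerequisites` is hypothetical. It does not check the CUDA Toolkit or Hugging Face access.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version from the prerequisites list
REQUIRED_TOOLS = ("poetry", "ffmpeg")  # assumed executable names on PATH


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"MISSING: {issue}")
    print("All prerequisites found." if not issues else f"{len(issues)} problem(s) found.")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.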
1. Clone the repository:

       git clone https://github.com/the-momentum/notetaker
       cd notetaker

2. Install dependencies:

       # For API only (recommended for production)
       poetry install --without demo --without dev

       # With demo interface (for testing and demonstration)
       poetry install --with demo --without dev

3. Set up environment variables:

       cp .env.example .env

   Edit the `.env` file with your specific configuration.

4. Start the application:

       ./run.sh

   The API will be available at http://localhost:8001 by default.

5. Access the API documentation:
   - Swagger UI: http://localhost:8001/docs
   - ReDoc: http://localhost:8001/redoc
Variable | Description | Example Value |
---|---|---|
PROJECT_NAME | Name used for logging and display | Notetaker AI |
BACKEND_CORS_ORIGINS | Allowed CORS origins | ["http://localhost:8000"] |
HOST | Host address for API availability | 0.0.0.0 |
PORT | Port for the API server | 8001 |
OLLAMA_URL | Base URL for Ollama server | http://localhost:11434 |
LLM_MODEL | LLM model name | llama3.2 |
USE_LOCAL_MODELS | Whether to use local models | True |
WHISPER_MODEL | Whisper model type | turbo |
WHISPER_DEVICE | Device for running Whisper | cpu or cuda |
WHISPER_COMPUTE_TYPE | Compute type for Whisper | int8 |
WHISPER_BATCH_SIZE | Batch size for processing | 16 |
HF_API_KEY | Hugging Face API key | hf_... |
OPENAI_API_KEY | OpenAI API key | sk-proj-... |
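
To illustrate how the variables in the table map to typed values, here is a minimal sketch of a settings loader. It is an assumption about how such values could be parsed, not the project's actual configuration code: the `Settings` class, `load_settings` function, and the use of the table's example values as defaults are all hypothetical. The API keys are left out on purpose.

```python
import json
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'True', '1', or 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    project_name: str
    cors_origins: list[str]
    host: str
    port: int
    ollama_url: str
    llm_model: str
    use_local_models: bool
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    whisper_batch_size: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Defaults are the example values from the table, used here as an assumption.
    """
    env = os.environ if env is None else env
    device = env.get("WHISPER_DEVICE", "cpu")
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"WHISPER_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return Settings(
        project_name=env.get("PROJECT_NAME", "Notetaker AI"),
        # The example value is a JSON array, so it is parsed as JSON
        cors_origins=json.loads(env.get("BACKEND_CORS_ORIGINS", '["http://localhost:8000"]')),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8001")),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        llm_model=env.get("LLM_MODEL", "llama3.2"),
        use_local_models=_as_bool(env.get("USE_LOCAL_MODELS", "True")),
        whisper_model=env.get("WHISPER_MODEL", "turbo"),
        whisper_device=device,
        whisper_compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
        whisper_batch_size=int(env.get("WHISPER_BATCH_SIZE", "16")),
    )
```

Passing a plain dictionary, for example `load_settings({"PORT": "9000"})`, makes it easy to try out a configuration without touching the real environment.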
The interactive Gradio demo provides a user-friendly interface to experience Notetaker AI's capabilities without writing code.
1. Install demo dependencies (if not already done):

       poetry install --with demo --without dev

2. Configure the demo: Update `demo/.env.demo` with your API base URL.

3. Launch the integrated demo:

       ./run.sh --demo

   This starts both the API and the Gradio interface.

4. Or run the demo separately (if the API is already running):

       poetry run python demo/ui.py

The demo will be available at http://localhost:7860.
- Upload or Record: Submit audio files or record directly in your browser
- Configure Options: Set parameters for transcription and summarization
- Format Selection: Choose between different summary formats
- Real-time Processing: Watch as your audio is transcribed and summarized
- Download Results: Save output as JSON for further use
For consistent deployment across environments, use our Docker setup.

    # Build the Docker images
    just docker-build

    # Rebuild without using cache
    just docker-rebuild

    # Run the API only
    just docker-up

    # Run API with Gradio demo
    just docker-demo

- API: http://localhost:8001
- API Documentation:
  - Swagger UI: http://localhost:8001/docs
  - ReDoc: http://localhost:8001/redoc
- Gradio Demo (if enabled): http://localhost:7860
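
Containers can take a moment to start, so it may help to poll an endpoint before sending requests. The sketch below is one possible approach using only the Python standard library. The helper names `is_up` and `wait_until_up` are hypothetical, and it assumes the documentation URL answers a plain GET with a 2xx status once the service is ready.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses
        return False


def wait_until_up(url, attempts=30, delay=1.0, probe=is_up, sleep=time.sleep) -> bool:
    """Call `probe(url)` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:8001/docs")` could be called after `just docker-up`; it returns `False` if the endpoint never responds within the given number of attempts.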
We're continuously enhancing Notetaker AI with new capabilities. Here's what's on the horizon:
- OpenAI API Integration: Direct connection to Whisper via OpenAI API
- Expanded LLM Support: Integration with additional LLM providers
- Enhanced Note Formats: More specialized formats and improved customization options
- Performance Optimizations: Faster processing for large audio files
Have a suggestion? We'd love to hear from you! Contact us or contribute directly.
Distributed under the MIT License. See LICENSE for more information.
Built with ❤️ by Momentum • Turning conversations into structured knowledge