A powerful real-time speech translation application that captures audio, transcribes speech in multiple languages, and provides fast, accurate translations, all running locally on your machine.
- Real-time Audio Processing: Captures and processes audio in real time using WebRTC
- Multilingual Support: Automatically detects and transcribes 17+ languages
- High-Quality Speech Recognition: Powered by Whisper Large-v3-turbo for accurate transcription
- Fast Translation: Uses NLLB-200 translation model for quick and accurate translations
- Interactive UI: Clean interface with audio visualization, voice activity detection, and real-time updates
- Export Options: Export translations to multiple formats (TXT, DOCX, PDF, HTML, clipboard)
- Advanced Audio Visualization: Visual feedback with spectrum analysis and voice activity indication
- Sentence Management: Intelligent handling of partial transcripts for natural sentence formation
- Language Auto-detection: Automatically detects the spoken language with confidence metrics
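As a rough illustration of the confidence metrics mentioned in the last item, the langdetect package (listed in the tech stack below) exposes per-language probabilities. This is a standalone sketch of that library's API, not this project's actual detection pipeline:

```python
# Sketch: per-language confidence with langdetect (illustrative only).
from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make results deterministic for short texts

for candidate in detect_langs("Bonjour, comment allez-vous aujourd'hui ?"):
    # Each candidate exposes a .lang ISO code and a .prob confidence in [0, 1].
    print(candidate.lang, round(candidate.prob, 3))
```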
The application currently supports the following languages:
| Language | Code | Support |
|---|---|---|
| English | eng_Latn | Full |
| Korean | kor_Hang | Full |
| Japanese | jpn_Jpan | Full |
| Chinese (Simplified) | cmn_Hans | Full |
| German | deu_Latn | Full |
| French | fra_Latn | Full |
| Spanish | spa_Latn | Full |
| Russian | rus_Cyrl | Full |
| Portuguese | por_Latn | Full |
| Italian | ita_Latn | Full |
| Vietnamese | vie_Latn | Full |
| Thai | tha_Thai | Full |
| Indonesian | ind_Latn | Full |
| Dutch | nld_Latn | Full |
| Turkish | tur_Latn | Full |
| Arabic | ara_Arab | Full |
| Hindi | hin_Deva | Full |
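The codes above follow the FLORES-200 convention that NLLB-200 expects for selecting source and target languages. As a minimal sketch of how such codes drive translation through Hugging Face transformers (the checkpoint name and language pair here are illustrative, not necessarily what this project loads):

```python
# Sketch: translating with NLLB-200 using the language codes above.
# The distilled-600M checkpoint is used for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# The target language is selected by forcing its code as the first generated token.
output_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("kor_Hang"),
    max_length=64,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```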
Backend:
- Python with the Flask web framework
- Socket.IO for bidirectional communication
- Faster-Whisper for speech recognition
- NLLB-200 for neural machine translation
- LangDetect for language detection
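To make the backend wiring concrete, here is a minimal, hypothetical Flask + Socket.IO skeleton in the spirit of `server/app.py`. The event names and handler body are assumptions for illustration, not the project's actual code:

```python
# Hypothetical skeleton of a Flask + Socket.IO audio endpoint.
# Event names and payload shape are assumptions for illustration only.
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"
socketio = SocketIO(app, cors_allowed_origins="*")

@socketio.on("audio_chunk")
def handle_audio_chunk(data):
    # In the real app, raw audio bytes would be buffered, run through
    # voice activity detection, transcribed, and translated here.
    emit("transcript", {"text": "...", "final": False})

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=7880)
```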
Frontend:
- HTML5, CSS3, JavaScript
- Socket.IO for real-time communication
- Web Audio API for audio processing and visualization
- Libraries: docx.js, jsPDF, FileSaver.js
- Python: 3.9+
- GPU: CUDA-compatible GPU recommended for optimal performance
- Memory: 16GB+ RAM recommended, 8GB RAM minimum
- Disk Space: At least 10GB free space for model files
- Browser: Modern browser with WebRTC support
- Network: Stable internet connection (for the initial model downloads; inference itself runs locally)
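Before installing, you can quickly check whether a CUDA device is visible. This sketch assumes PyTorch, which is pulled in as part of the model dependencies:

```python
# Quick check for a usable CUDA GPU (assumes PyTorch is installed).
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; set device='cpu' in server/app.py.")
```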
1. Clone the repository:

   ```bash
   git clone https://github.com/SkyFever/Local-Live-Translator.git
   cd Local-Live-Translator
   ```

2. Set up the backend:

   ```bash
   # Create and activate a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

   # Install dependencies
   pip install -r requirements.txt
   ```

3. Environment Configuration:

   Create a `.env` file in the project root with:

   ```
   SECRET_KEY=your_secret_key_here
   ```

4. Start the server:

   ```bash
   python server/app.py
   ```

   The server will start on `http://localhost:7880` by default.

5. Access the application:

   Open `http://localhost:7880` in your web browser.
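The `SECRET_KEY` above is presumably read from the environment at startup. A typical pattern, shown here as an assumption rather than the project's exact loading code, uses python-dotenv:

```python
# Sketch: loading SECRET_KEY from .env (a common pattern; the project's
# actual loading code may differ).
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment
secret_key = os.environ["SECRET_KEY"]
```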
1. Select Languages:
   - Choose the source language (or use auto-detect)
   - Select the target language for translation

2. Start Recording:
   - Click the "Start" button to begin capturing audio
   - Speak clearly into your microphone
   - View the real-time transcription and translation

3. End Session:
   - Click "Stop" to end the recording session

4. Export Results:
   - Use the export controls to save translations in your preferred format
   - Choose which components to include (original text, translations, timestamps, language info)
You can modify the following parameters in `server/app.py`:

- `model_size`: Change the Whisper model size (e.g., 'medium', 'large-v2', 'large-v3')
- `device`: Set to 'cpu' if CUDA is not available
- `port`: Change the server port (default: 7880)

Adjust the audio processing parameters in `app.py`:

- `MAX_BUFFER_SIZE`: Buffer size for audio chunks
- `min_processing_interval`: Minimum time between processing audio chunks
- Voice activity detection thresholds and parameters
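For context, `model_size` and `device` map onto faster-whisper's `WhisperModel` constructor. This is a generic sketch of that API with illustrative values, not the project's exact initialization code; see `server/app.py` for the real configuration:

```python
# Sketch of how model_size and device are typically passed to faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel(
    "large-v3",              # model_size: e.g. 'medium', 'large-v2', 'large-v3'
    device="cuda",           # set to 'cpu' if CUDA is not available
    compute_type="float16",  # consider 'int8' on CPU for lower memory use
)

segments, info = model.transcribe("sample.wav", vad_filter=True)
print("Detected language:", info.language, "probability:", info.language_probability)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```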
- Microphone Access Issues: Ensure your browser has permission to access the microphone
- Performance Issues: Consider using a smaller Whisper model if experiencing lag
- Language Detection Problems: Speak clearly and provide longer utterances for better language detection
- Audio Quality Issues: Try using a better microphone or reduce background noise for improved recognition
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This project uses the NLLB-200 (No Language Left Behind) translation model from Meta AI, which is licensed under the CC-BY-NC 4.0 license. Therefore:
- Non-Commercial Use Only: This application may only be used for non-commercial purposes without obtaining a separate license from Meta AI.
- Attribution: Appropriate attribution must be provided to Meta AI's NLLB-200 and other included open-source components.
For commercial use, you would need to replace NLLB-200 with an alternative translation solution or obtain a commercial license from Meta AI.
- OpenAI Whisper for the speech recognition technology
- Faster Whisper for the optimized Whisper implementation
- NLLB (No Language Left Behind) by Meta AI for the translation model
- Socket.IO for the real-time communication framework
- Flask for the web application framework