EchoForge - Voice Generation Platform

EchoForge is an advanced voice generation platform that allows you to create lifelike vocal content with customizable parameters.

🚨 Debug Mode Available 🚨

For troubleshooting API issues, a new debug page has been added. To access it:

Start the server as described below
Visit: http://localhost:8765/debug
Use the debug interface to test different API endpoints and trace request/response flow

Repository

https://github.com/toddllm/echoforge

Features

Text-to-speech generation with multiple character voices
Adjustable generation parameters (temperature, top-k, style)
Background task processing for long-running generations
RESTful API for integration with other applications
Web interface with light and dark mode support
Easy environment configuration

Technology

EchoForge is built on top of the Conversational Speech Model (CSM) from Sesame AI Labs. CSM is a speech generation model that generates high-quality, natural-sounding speech from text input. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

Key features of the CSM model:

High-quality speech synthesis
Support for multiple speakers
Contextual awareness for more natural-sounding conversations
Adjustable generation parameters (temperature, top-k)

EchoForge wraps this technology in a user-friendly web interface and API, making it accessible for various applications.

Direct CSM Implementation

EchoForge now uses the Direct CSM implementation by default for faster voice generation, especially on CUDA-enabled devices.

Features

Up to 25x faster generation on GPU compared to CPU
Maintains the same high-quality voice output
Automatically falls back to CPU if CUDA is unavailable

Usage

To start the server with Direct CSM enabled:

python run.py --direct-csm

When using the API, specify device=cuda for faster generation:

curl -X POST http://localhost:8779/api/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here", "voice": "male", "temperature": 0.7, "device": "cuda"}'

Performance

CUDA generation: ~3-6 seconds
CPU generation: ~150 seconds

Installation

Option 1: Using uv (Recommended)

uv is a faster, more reliable Python package installer and resolver.

Install uv:

pip install uv

Clone the repository:

git clone https://github.com/yourusername/echoforge.git
cd echoforge

Create a virtual environment and install dependencies:

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

Option 2: Using pip

Clone the repository:

git clone https://github.com/yourusername/echoforge.git
cd echoforge

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Configuration

Create a .env file in the project root with the following content:

DEBUG=true
HOST=0.0.0.0
PORT=8765
OUTPUT_DIR=./static/voices
STATIC_DIR=./static
TEMPLATES_DIR=./templates

Configure voice directories:

# Create directories for voice data
mkdir -p static/voices/creative

Running the Server

Starting the Server

Use the provided script to start the server:

./start_server.sh

Or manually:

uvicorn run:app --host 0.0.0.0 --port 8765 --reload

Stopping the Server

Use the provided script to stop the server:

./stop_server.sh

Using the Debug Page

The debug page is a powerful tool for troubleshooting voice generation issues:

Navigate to http://localhost:8765/debug in your browser
The debug page provides:
- API selection between /api/v1/generate and /api/voices/generate
- Detailed request/response information
- Task ID validation and tracking
- Audio URL construction debugging
- Network connection testing

Debug Page Features

API Version Selection: Test both API versions to identify compatibility issues
Parameter Controls: Adjust voice generation parameters like temperature and top-k
Request Inspection: View the exact request payload being sent
Response Monitoring: See the raw response and parsed data
Status Checking: Monitor the task progress through different endpoints
URL Validation: Troubleshoot issues with audio URL construction
Network Testing: Test if endpoints are accessible and responding

API Endpoints

Voice Generation

POST /api/v1/generate: Generate voice using API v1

{
  "text": "Text to convert to speech",
  "speaker_id": 1,
  "temperature": 0.7,
  "top_k": 50,
  "device": "auto"
}

POST /api/voices/generate: Generate voice using voices API

{
  "text": "Text to convert to speech",
  "speaker_id": 1,
  "temperature": 0.7,
  "top_k": 50,
  "device": "auto"
}

Task Status

GET /api/v1/tasks/{task_id}: Check task status with API v1
GET /api/voices/tasks/{task_id}: Check task status with voices API

Voice Listing

GET /api/voices: Get list of available voices
GET /api/voices/list: Alternative endpoint for voice listing

Troubleshooting

Common Issues

"GET http://localhost:8765/undefined 404 (Not Found)"
- This indicates an issue with audio URL construction
- Use the debug page to trace the URL construction process
- Ensure task IDs are properly validated
Speaker ID validation errors
- If using speaker_id: 0, the system will automatically map to speaker_id: 1
- Verify speaker IDs in the range 1-5 are available
API endpoint not found
- The system supports both /api/v1/generate and /api/voices/generate
- Use the debug page to test which endpoint is working
- Check for t 7D32 ypos in endpoint URLs
Voice files not playing
- Ensure the static/voices/creative directory contains voice files
- Check the audio URL construction in the debug logs
- Verify file permissions on the voice files

Using the Debug Console

The debug page contains a JavaScript console that provides detailed logging:

Open your browser's developer tools (F12 or right-click -> Inspect)
Navigate to the Console tab
Look for errors or warnings related to API calls
The debug page will also display detailed logs in its interface

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This project uses the CSM model architecture from Sesame AI Labs for voice generation
Special thanks to the open-source community for their contributions to speech synthesis technology

Voice Generation Interface

The EchoForge admin panel includes a simplified voice generation interface that allows you to:

Enter text to be spoken
Select a voice (Male, Female, or Child)
Adjust temperature settings
Generate and download audio files

To access the voice generation interface:

Navigate to http://[server-address]:[port]/admin/voices
Log in with your admin credentials
Use the form to generate voice samples

The interface provides real-time feedback on generation status and device information during processing.

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.github/workflows		.github/workflows
app		app
docker		docker
docs		docs
examples		examples
scripts		scripts
static		static
templates		templates
tests		tests
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AUTH_MIGRATION_PLAN.md		AUTH_MIGRATION_PLAN.md
CONTRIBUTING.md		CONTRIBUTING.md
CSM_INTEGRATION_PLAN.md		CSM_INTEGRATION_PLAN.md
DEBUGGING_GUIDE.md		DEBUGGING_GUIDE.md
DOCUMENTATION_PLAN.md		DOCUMENTATION_PLAN.md
Dockerfile		Dockerfile
FUTURE_ENHANCEMENTS.md		FUTURE_ENHANCEMENTS.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
VOICE_SETUP_GUIDE.md		VOICE_SETUP_GUIDE.md
VOICE_SETUP_README.md		VOICE_SETUP_README.md
debug_env.py		debug_env.py
development_notes.md		development_notes.md
docker-compose.yml		docker-compose.yml
fix_auth_cookie.py		fix_auth_cookie.py
fix_auth_issues.py		fix_auth_issues.py
fix_auth_session.py		fix_auth_session.py
fix_auth_styling.py		fix_auth_styling.py
get-pip.py		get-pip.py
main.py		main.py
port_stability_changes.md		port_stability_changes.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements-freeze.txt		requirements-freeze.txt
requirements.txt		requirements.txt
run.py		run.py
run_admin_tests.sh		run_admin_tests.sh
run_server.sh		run_server.sh
run_tests.sh		run_tests.sh
setup.py		setup.py
setup_env.sh		setup_env.sh
setup_voices.py		setup_voices.py
stop_server.sh		stop_server.sh
ux_improvement_plan.md		ux_improvement_plan.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

EchoForge - Voice Generation Platform

🚨 Debug Mode Available 🚨

Repository

Features

Technology

Direct CSM Implementation

Features

Usage

Performance

Installation

Option 1: Using uv (Recommended)

Option 2: Using pip

Configuration

Running the Server

Starting the Server

Stopping the Server

Using the Debug Page

Debug Page Features

API Endpoints

Voice Generation

Task Status

Voice Listing

Troubleshooting

Common Issues

Using the Debug Console

Contributing

License

Acknowledgements

Voice Generation Interface

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

toddllm/echoforge

Folders and files

Latest commit

History

Repository files navigation

EchoForge - Voice Generation Platform

🚨 Debug Mode Available 🚨

Repository

Features

Technology

Direct CSM Implementation

Features

Usage

Performance

Installation

Option 1: Using uv (Recommended)

Option 2: Using pip

Configuration

Running the Server

Starting the Server

Stopping the Server

Using the Debug Page

Debug Page Features

API Endpoints

Voice Generation

Task Status

Voice Listing

Troubleshooting

Common Issues

Using the Debug Console

Contributing

License

Acknowledgements

Voice Generation Interface

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages