F5-TTS Server is a FastAPI application that provides endpoints for text-to-speech conversion and voice cloning. It offers a simple and efficient way to generate natural-sounding speech from text using various voices and accents.
This project is built on top of F5-TTS, a text-to-speech system that enables high-quality voice synthesis and cloning.
A RunPod template is available for this project: https://runpod.io/console/deploy?template=x9wtzn9izl&ref=2vdt3dn9
The cheapest GPU is enough to run the server.
- Text-to-speech conversion with customizable voices
- Voice cloning capabilities
- Adjustable speech speed
- Audio file upload and processing
- Clone the repository:
git clone https://github.com/ValyrianTech/F5-TTS_server.git
- Navigate to the project directory:
cd F5-TTS_server
- Install the required dependencies:
pip install -r requirements.txt
To start the server, run:
uvicorn f5-tts_server.server:app --host 0.0.0.0 --port 7860
The server provides the following endpoints:
Performs text-to-speech conversion using the base speaker.
Endpoint: /base_tts/
Method: GET
Parameters:
- text (str): The text to convert to speech
- speed (float, optional): Speech speed. Default: 1.0
Changes the voice of an existing audio file.
Endpoint: /change_voice/
Method: POST
Parameters:
- reference_speaker (str): The reference speaker name
- file (file): Audio file to process
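A call to this endpoint might look like the sketch below. It assumes the server is reachable at http://localhost:7860, that reference_speaker is sent as a form field next to the uploaded file, and that the response body is the converted audio. None of these details are confirmed by the description above, so adjust them to the actual server behaviour. The helper name change_voice and its post argument are illustrative only.

```python
import os

BASE_URL = "http://localhost:7860"  # assumed local server address


def change_voice(audio_path, reference_speaker, out_path, post=None):
    """Send an audio file to /change_voice/ and save the returned bytes.

    `post` is a hypothetical hook for substituting the HTTP call (for
    example in tests); by default requests.post is used.
    """
    if post is None:
        import requests  # third-party dependency, imported only when needed
        post = requests.post

    # Keep the file open only for the duration of the upload.
    with open(audio_path, "rb") as audio:
        response = post(
            f"{BASE_URL}/change_voice/",
            files={"file": (os.path.basename(audio_path), audio, "audio/wav")},
            # Assumption: the speaker name travels as a form field.
            data={"reference_speaker": reference_speaker},
        )
    response.raise_for_status()

    # Assumption: the response body is the converted audio.
    with open(out_path, "wb") as out:
        out.write(response.content)
    return out_path
```

With a running server, change_voice("input.wav", "custom_voice", "converted.wav") would write the result to converted.wav, provided a reference speaker with that label exists.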
Upload a reference audio file for voice cloning.
Endpoint: /upload_audio/
Method: POST
Parameters:
- audio_file_label (str): Label for the uploaded audio
- file (file): Audio file to upload
Synthesize speech using a specific voice and style.
Endpoint: /synthesize_speech/
Method: GET
Parameters:
- text (str): Text to synthesize
- voice (str): Voice to use
- speed (float, optional): Speech speed. Default: 1.0
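Because this is a GET endpoint, the parameters travel in the query string and the text must be URL-encoded. The sketch below shows one way to build such a URL with the standard library. The base address and the helper name synthesis_url are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:7860"  # assumed local server address


def synthesis_url(text, voice, speed=1.0):
    """Build a /synthesize_speech/ URL with properly encoded parameters."""
    # urlencode escapes spaces, commas and other reserved characters.
    query = urlencode({"text": text, "voice": voice, "speed": speed})
    return f"{BASE_URL}/synthesize_speech/?{query}"


print(synthesis_url("Hello, this is a test.", "custom_voice"))
```

The resulting URL can be opened with any HTTP client, or the same parameters can be passed to requests.get via params, which performs the encoding itself.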
All synthesis endpoints include these response headers:
- x-elapsed-time: Time taken for synthesis (seconds)
- x-device-used: Device used for synthesis (CPU/GPU)
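These headers can be read from any synthesis response. The following is a minimal sketch of a helper that extracts them. It assumes x-elapsed-time is a plain numeric string, which the description above suggests but does not guarantee, and the name synthesis_stats is hypothetical.

```python
def synthesis_stats(headers):
    """Return elapsed time and device from a mapping of response headers."""
    elapsed = headers.get("x-elapsed-time")
    try:
        # Assumption: the header value is a number of seconds, e.g. "1.25".
        elapsed_seconds = float(elapsed) if elapsed is not None else None
    except ValueError:
        # Fall back to None if the server uses a different format.
        elapsed_seconds = None
    return {
        "elapsed_seconds": elapsed_seconds,
        "device": headers.get("x-device-used"),
    }
```

With the requests library, synthesis_stats(response.headers) works directly, since response.headers is a case-insensitive mapping.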
import requests
# Basic text-to-speech
url = "http://localhost:7860/base_tts/"
params = {
    "text": "Hello, this is a test.",
    "speed": 1.0
}
response = requests.get(url, params=params)
# Save the returned audio to disk
with open("output.wav", "wb") as f:
    f.write(response.content)
# Upload reference audio
url = "http://localhost:7860/upload_audio/"
data = {
    'audio_file_label': 'custom_voice'
}
# Open the file in a context manager so the handle is closed after the upload
with open('reference.wav', 'rb') as audio:
    files = {
        'file': ('reference.wav', audio, 'audio/wav')
    }
    response = requests.post(url, files=files, data=data)
print(response.json()) # Should print: {"message": "File reference.wav uploaded successfully with label custom_voice."}
# Voice cloning
url = "http://localhost:7860/synthesize_speech/"
params = {
    "text": "Hello, this is a test.",
    "voice": "custom_voice",
    "speed": 1.0
}
response = requests.get(url, params=params)
# Save the synthesized audio to disk
with open("cloned_voice.wav", "wb") as f:
    f.write(response.content)