F5-TTS: Diffusion Transformer with ConvNeXt V2, faster training and inference.
E2 TTS: Flat-UNet Transformer, closest reproduction of the paper.
Sway Sampling: Inference-time flow step sampling strategy, greatly improves performance.
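To illustrate what a flow step sampling strategy looks like, here is a minimal sketch of the sway function as described in the F5-TTS paper, `f(u; s) = u + s * (cos(pi/2 * u) - 1 + u)`. The helper names `sway_sample` and `sway_schedule` are hypothetical and not part of the package API, and the default `s = -1.0` is an assumption chosen for illustration.

```python
import math


def sway_sample(u: float, s: float = -1.0) -> float:
    """Map a uniform flow step u in [0, 1] to a swayed step t in [0, 1].

    Negative s shifts steps toward t = 0, positive s toward t = 1.
    The default s = -1.0 is an illustrative assumption.
    """
    return u + s * (math.cos(math.pi / 2 * u) - 1 + u)


def sway_schedule(n_steps: int, s: float = -1.0) -> list[float]:
    """Return n_steps + 1 swayed time points from a uniform grid (hypothetical helper)."""
    return [sway_sample(i / n_steps, s) for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Print a short schedule to see how the steps are redistributed.
    print([round(t, 3) for t in sway_schedule(8)])
```

The endpoints stay fixed at 0 and 1, so only the spacing of the intermediate steps changes. With a negative `s`, more steps land in the early part of the flow.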
- 2025/03/12: F5-TTS v1 base model with better training and inference performance. A few demos are available.
- 2024/10/08: F5-TTS & E2 TTS base models on Hugging Face, Model Scope, Wisemodel.
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
NVIDIA GPU
# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
AMD GPU
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
Intel GPU
# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform
Apple Silicon
# Install the stable pytorch, e.g.
pip install torch torchaudio
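After installing PyTorch for your hardware, it can help to confirm which backend it actually detects. The snippet below is a minimal sketch. `pick_device` is a hypothetical helper, not part of F5-TTS, and the `xpu` and `mps` checks are guarded because older PyTorch builds may not expose them.

```python
import importlib.util


def pick_device() -> str:
    """Return the first accelerator backend PyTorch reports, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # ROCm builds of PyTorch also report through the torch.cuda namespace.
    if torch.cuda.is_available():
        return "cuda"
    # Intel GPUs: torch.xpu is only present in recent PyTorch builds.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    # Apple Silicon: Metal Performance Shaders backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` on a machine with a supported GPU, the installed wheel probably does not match your CUDA, ROCm, or XPU version.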
pip install f5-tts