forked from SWivid/F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

jalvarado-it/F5-TTS

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

F5-TTS: Diffusion Transformer with ConvNeXt V2, with faster training and inference.

E2 TTS: Flat-UNet Transformer, the closest reproduction of the paper.

Sway Sampling: an inference-time flow step sampling strategy that greatly improves performance.

Thanks to all the contributors!

Installation

Create a separate environment if needed

# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts

Install the PyTorch build that matches your device

NVIDIA GPU

# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.0+cu118 torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

AMD GPU

# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2

Intel GPU

# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform

Apple Silicon

# Install the stable pytorch, e.g.
pip install torch torchaudio
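
After installing, it can help to confirm which backend the PyTorch build actually sees. The following is a minimal sketch, not part of F5-TTS: the function name `detect_torch_device` is hypothetical, and the check assumes that ROCm builds report through `torch.cuda` and that `torch.xpu` and `torch.backends.mps` may be absent in some builds.

```python
import importlib.util


def detect_torch_device() -> str:
    """Return the accelerator backend the installed PyTorch build can use."""
    # Report a sentinel instead of raising when PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"

    import torch

    # CUDA builds and ROCm builds both report through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # torch.xpu and torch.backends.mps only exist in some builds,
    # so look them up defensively (assumption: getattr is enough here).
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(detect_torch_device())
```

If this prints `cpu` on a machine with a GPU, the installed wheel most likely does not match the device, and the matching install command above is worth re-running.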

Then you can choose one from below:

1. As a pip package (if just for inference)

pip install git+https://github.com/SWivid/F5-TTS.git

2. Local editable install (if you also plan to train or finetune)

git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, if you need bigvgan)
pip install -e .

Docker usage is also available

# Build from Dockerfile
docker build -t f5tts:v1 .

# Or pull from GitHub Container Registry
docker pull ghcr.io/swivid/f5-tts:main

Inference

1. Gradio App

Currently supported features:

  • Basic TTS with Chunk Inference
  • Multi-Style / Multi-Speaker Generation
  • Voice Chat powered by Qwen2.5-3B-Instruct
  • Custom inference with more language support

# Launch a Gradio app (web interface)
f5-tts_infer-gradio

# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0

# Launch a share link
f5-tts_infer-gradio --share

NVIDIA device docker compose file example

services:
  f5-tts:
    image: ghcr.io/swivid/f5-tts:main
    ports:
      - "7860:7860"
    environment:
      GRADIO_SERVER_PORT: 7860
    entrypoint: ["f5-tts_infer-gradio", "--port", "7860", "--host", "0.0.0.0"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  f5-tts:
    driver: local
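
Once the container from the compose file above is running, a quick way to confirm that the published port is reachable is a plain TCP check. This is a minimal sketch: the helper name `is_port_open` is hypothetical, the host `127.0.0.1` is an assumption about where the container runs, and a successful connection only shows that something is listening on the port, not that the Gradio app has finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 7860 is the port published by the compose file above.
    print(is_port_open("127.0.0.1", 7860))
```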

2. CLI Inference

# Run with flags
# Leaving --ref_text "" makes an ASR model transcribe the reference audio (uses extra GPU memory)
f5-tts_infer-cli \
--model "F5-TTS" \
--ref_audio "ref_audio.wav" \
--ref_text "The content, subtitle or transcription of reference audio." \
--gen_text "Some text you want TTS model generate for you."

# Run with the default settings from src/f5_tts/infer/examples/basic/basic.toml
f5-tts_infer-cli
# Or with your own .toml file
f5-tts_infer-cli -c custom.toml

# Multi voice. See src/f5_tts/infer/README.md
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
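
When the flag-based invocation above is driven from a script, building the arguments as a list avoids shell quoting problems with text that contains quotes or spaces. The sketch below is illustrative only: `build_infer_command` is a hypothetical helper, not part of F5-TTS, and it uses only the flags shown above.

```python
import shlex
from typing import List


def build_infer_command(
    gen_text: str,
    ref_audio: str,
    ref_text: str = "",
    model: str = "F5-TTS",
) -> List[str]:
    """Assemble an argument list for f5-tts_infer-cli from the flags shown above."""
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        # An empty ref_text leaves transcription to the ASR model.
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


cmd = build_infer_command(
    gen_text="Some text you want TTS model generate for you.",
    ref_audio="ref_audio.wav",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it (requires F5-TTS to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` skips the shell entirely, so the generated text never needs escaping.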

3. More instructions

  • For better generation results, take a moment to read the detailed guidance.
  • The Issues are very useful; search them first for keywords from the problem you encountered. If you find no answer, feel free to open an issue.

Training

1. Gradio App

Read the training & finetuning guidance for more instructions.

# Quick start with Gradio web interface
f5-tts_finetune-gradio

Development

Use pre-commit to ensure code quality (will run linters and formatters automatically)

pip install pre-commit
pre-commit install

When making a pull request, before each commit, run:

pre-commit run --all-files

Note: Some model components have linting exceptions for E722 to accommodate tensor notation

Citation

If our work and codebase are useful to you, please cite as:

@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}
    

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
