seongho608/RingFormer

RingFormer Architecture

RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer

Seongho Hong, Yong-Hoon Choi

[Paper] - [Code] - [Demo] - [Weights]

Getting Started

Dependencies

You can install the Python dependencies with

pip install -r requirements.txt

Datasets

The supported dataset is

  • LJSpeech: A single-speaker English speech dataset recorded by one speaker with an American accent. It contains 13,100 short audio clips, each a passage read from a public domain text, for a total of about 24 hours of audio. This makes it well suited to training single-speaker text-to-speech models.
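
As a quick sanity check on a downloaded copy of the dataset, the sketch below parses LJSpeech's metadata.csv, which lists one clip per line as three pipe-separated fields (clip ID, raw transcription, normalized transcription). The function names and the directory layout are illustrative assumptions, and the code is not part of the RingFormer repository.

```python
from pathlib import Path


def parse_metadata_line(line):
    """Split one metadata.csv row into (clip_id, raw_text, normalized_text)."""
    # Rows are pipe-separated and unquoted, so a plain split is used
    # instead of the csv module (transcripts may contain quote characters).
    parts = line.rstrip("\n").split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    return tuple(parts)


def load_metadata(root):
    """Return all parsed rows from <root>/metadata.csv (assumed layout)."""
    path = Path(root) / "metadata.csv"
    with path.open(encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not raise an error.
        return [parse_metadata_line(line) for line in f if line.strip()]
```

Calling `load_metadata("LJSpeech-1.1")` (a hypothetical download location) and checking that the returned list has 13,100 entries is a cheap way to confirm the archive was extracted completely before starting a training run.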

Training

Train with

python train.py -c configs/vits2_ljs_ring.json -m "model name"

TensorBoard

Use

tensorboard --logdir ./logs/

to serve TensorBoard on localhost. It displays the loss curves, synthesized mel-spectrograms, and audio samples.

Inference

python inference.py --text "your text to synthesize" --output "output.wav"
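
To confirm that the command above produced a usable file, the following sketch reads the header of the output with Python's standard `wave` module and reports its sample rate, channel count, and duration. It assumes the script writes an uncompressed PCM WAV file, which the source does not state. The helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    # wave.open raises wave.Error for files it cannot parse, for example
    # WAV files stored in a non-PCM encoding.
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration
```

For example, `wav_info("output.wav")` returns a tuple whose first element should match the sampling rate in the training config. A duration near zero usually means synthesis failed silently.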
