RingFormer Architecture

RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer

Seongho Hong, Yong-Hoon Choi

Getting Started

Dependencies

You can install the Python dependencies with

pip install -r requirements.txt

Datasets

The supported dataset is

LJSpeech: a single-speaker English dataset recorded by one speaker with an American accent. It contains approximately 13,100 short audio clips, each a passage from classic literature or other public domain texts. The total audio duration is around 24 hours, which makes it well suited to training text-to-speech models.
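
As a rough sketch of how the transcripts could be read: the LJSpeech distribution ships a pipe-delimited `metadata.csv` with one line per clip (`clip ID|transcript|normalized transcript`) and the audio in a `wavs/` folder. The helper name `parse_ljspeech_metadata`, the `wavs` prefix, and the sample lines below are illustrative assumptions; the `filelists` and `preprocess` code in this repository may expect a different layout.

```python
from typing import Iterable, List, Tuple


def parse_ljspeech_metadata(
    lines: Iterable[str], wav_dir: str = "wavs"
) -> List[Tuple[str, str]]:
    """Turn LJSpeech-style metadata lines into (wav_path, text) pairs.

    Hypothetical helper, not part of this repository.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("|")
        if len(fields) < 2:
            # Skip blank or malformed lines.
            continue
        clip_id = fields[0]
        # Prefer the normalized transcript (third field) when it is present.
        text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
        pairs.append((f"{wav_dir}/{clip_id}.wav", text))
    return pairs


# Made-up lines in the metadata format, for illustration only.
sample = [
    "CLIP-0001|Chapter 1|Chapter one\n",
    "CLIP-0002|A plain sentence.|A plain sentence.\n",
]
for path, text in parse_ljspeech_metadata(sample):
    print(f"{path}|{text}")
```

In practice the lines would come from `open("metadata.csv", encoding="utf-8")` in the extracted dataset folder.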

Training

Train with

python train.py -c configs/vits2_ljs_ring.json -m "model name"
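
If you want to launch training from a script, for example to queue several runs, the command above can be wrapped as in the sketch below. The function names `build_train_command` and `run_training` and the model name `ringformer_ljs` are hypothetical; only the `-c` and `-m` flags come from the command above. The early JSON check is an added convenience, not something `train.py` is known to require.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List


def build_train_command(config_path: str, model_name: str) -> List[str]:
    """Return the argv list for the training command shown above."""
    config = Path(config_path)
    # Fail early if the config file is missing or is not valid JSON.
    with config.open(encoding="utf-8") as f:
        json.load(f)
    return [sys.executable, "train.py", "-c", str(config), "-m", model_name]


def run_training(config_path: str, model_name: str) -> None:
    """Run train.py as a subprocess; raises CalledProcessError on failure."""
    cmd = build_train_command(config_path, model_name)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

From the repository root, `run_training("configs/vits2_ljs_ring.json", "ringformer_ljs")` would then start a run.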

TensorBoard

Use

tensorboard --logdir ./logs/

to serve TensorBoard on localhost. It shows the loss curves, synthesized mel-spectrograms, and audio samples.

Inference

python inference.py --text "your text to synthesize" --output "output.wav"
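
To sanity-check the synthesized file, a short sketch using Python's standard `wave` module can report its sample rate and duration. This assumes `inference.py` writes an integer PCM WAV file; the `wave` module cannot read float-encoded WAV data, in which case another reader would be needed. The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header information from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

After running the inference command, `print(describe_wav("output.wav"))` shows whether the output has the expected length.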


seongho608/RingFormer
