Mddct/transformer-vocos

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Status: ongoing (work in progress).

Why Vocos with a Transformer or Conformer?

Easy to scale, with good control over latency and caching.
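
The latency/caching claim can be made concrete with a chunk-wise causal attention mask, the usual mechanism streaming transformer or conformer models use. This is only an illustrative sketch; the function name and chunk size are made up, not this repo's code.

import torch

def chunk_causal_mask(num_frames: int, chunk_size: int) -> torch.Tensor:
    """Boolean (T, T) mask: True where frame i may attend to frame j."""
    chunk_id = torch.arange(num_frames) // chunk_size
    return chunk_id.unsqueeze(1) >= chunk_id.unsqueeze(0)

# Frames see their own chunk plus all earlier chunks, so latency is bounded
# by the chunk size and the earlier chunks can be served from a cache.
mask = chunk_causal_mask(num_frames=8, chunk_size=4)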

Why a sequence mask in the GAN?

Training clips need no fixed length (such as 1 s); variable-length audio can be padded and batched.
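
A rough sketch of what the sequence mask buys: padded frames are simply excluded from the per-frame losses, so clips of arbitrary length can share one batch. Names and shapes below are illustrative, not the repo's actual API.

import torch

def valid_frame_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """(B, T) boolean mask, True on real frames, False on padding."""
    return torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None]

def masked_l1(pred: torch.Tensor, target: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Mean absolute error over the valid frames of a padded (B, T, F) batch."""
    mask = valid_frame_mask(lengths, pred.size(1)).unsqueeze(-1)   # (B, T, 1)
    diff = (pred - target).abs() * mask                            # zero out padded frames
    return diff.sum() / (mask.sum() * pred.size(-1)).clamp(min=1)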

Why build it in WeNet?

Caching and multiple speech models are available out of the box.
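
As an illustration of the "out of the box" point, a mel-spectrogram sequence can be fed straight into a WeNet encoder and reused as the vocoder backbone. This is only a sketch: the constructor arguments and forward signature below are assumptions about WeNet's ConformerEncoder, not code from this repo.

import torch
from wenet.transformer.encoder import ConformerEncoder

# Assumed interface; check WeNet's actual signatures before relying on this.
encoder = ConformerEncoder(input_size=100, output_size=256, num_blocks=6)

mels = torch.randn(2, 200, 100)          # (batch, frames, n_mels)
mel_lengths = torch.tensor([200, 150])   # real frame counts per utterance
# WeNet encoders are assumed to return (features, pad_mask); streaming/chunk
# arguments in the same forward call would give the cached, low-latency path.
hidden, pad_mask = encoder(mels, mel_lengths)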

Data Preparation

The training list is a JSONL file, one wav path per line:

{"wav": "/data/BAC009S0764W0121.wav"}
{"wav": "/data/BAC009S0764W0122.wav"}

Train

train_data='train.jsonl'
model_dir='vocos/exp/2025/0.1/transformer/'
tensorboard_dir=${model_dir}/runs/

mkdir -p $model_dir $tensorboard_dir
torchrun --standalone --nnodes=1 --nproc_per_node=8 vocos/main.py -- \
        --config vocos/configs/default.py \
        --config.train_data=${train_data} \
        --config.model_dir=${model_dir} \
        --config.tensorboard_dir=${tensorboard_dir} \
        --config.max_train_steps=1000000
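
The --config / --config.key=value flags follow the ml_collections config_flags pattern, which suggests vocos/configs/default.py exposes a get_config() returning a ConfigDict. The sketch below shows only the fields that appear on the command line above; anything else about the real config is an assumption.

import ml_collections

def get_config() -> ml_collections.ConfigDict:
    config = ml_collections.ConfigDict()
    config.train_data = ''            # jsonl list, overridden by --config.train_data
    config.model_dir = ''             # checkpoint directory
    config.tensorboard_dir = ''       # tensorboard event files
    config.max_train_steps = 1_000_000
    return config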

TODO:

  • training
    • training works
    • check the training process
    • generator
    • discriminator
    • transformer discriminators
    • distillation
    • resume training
    • stereo for music
    • CQT loss
  • dev-set benchmark, etc.
  • inference
    • offline
    • chunk-by-chunk or frame-by-frame
    • ONNX export
  • example: CosyVoice2 + transformer-vocos
