GitHub - LIEGU0317/vits: VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai

LIEGU0317/vits

 
 


How to Use

[Simplified Chinese]

If you're also struggling without a GPU, you might want to try the Featurize platform. Here's my invitation link.

Python Version

Python 3.10

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116

Python 3.7

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu110
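The pairing above between Python version and PyTorch wheel index can be written down as a small lookup. This is only a sketch: the names `INDEX_URLS`, `pytorch_index_url`, and `install_command` are hypothetical, and the table covers just the two Python versions listed here.

```python
import sys

# Wheel index URLs copied from the two install commands above.
INDEX_URLS = {
    (3, 10): "https://download.pytorch.org/whl/cu116",
    (3, 7): "https://download.pytorch.org/whl/cu110",
}


def pytorch_index_url(version=None):
    """Return the wheel index URL for a (major, minor) Python version."""
    if version is None:
        # Default to the interpreter that is running this script.
        version = sys.version_info[:2]
    key = tuple(version)[:2]
    if key not in INDEX_URLS:
        raise ValueError(
            f"No install command is listed for Python {key[0]}.{key[1]}"
        )
    return INDEX_URLS[key]


def install_command(version=None):
    """Build the pip command string for the given Python version."""
    url = pytorch_index_url(version)
    return f"pip3 install torch torchvision torchaudio --index-url {url}"
```

Calling `install_command((3, 10))` returns the first command above. Any other version raises `ValueError` rather than guessing an index.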

Clone the Repository

git clone https://github.com/CjangCjengh/vits.git

Choose Cleaners

This repository is already set up for training in Chinese, so you can skip this step if you are training on Chinese data.

  • Fill in "text_cleaners" in config.json
  • Edit text/symbols.py
  • Remove unnecessary imports from text/cleaners.py
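The first item in the list above can also be scripted. The sketch below assumes that "text_cleaners" is a list stored under a top-level "data" object in config.json; check your own config file before relying on that layout. The helper name `set_text_cleaners` is hypothetical, and `japanese_cleaners` is used only as a placeholder; use a cleaner name that is actually defined in text/cleaners.py.

```python
import json


def set_text_cleaners(config_text, cleaner_name):
    """Return config JSON text with "text_cleaners" set to [cleaner_name].

    Assumption: the key lives under a top-level "data" object.
    """
    config = json.loads(config_text)
    data = config.setdefault("data", {})
    data["text_cleaners"] = [cleaner_name]
    # ensure_ascii=False keeps any non-ASCII values readable.
    return json.dumps(config, indent=2, ensure_ascii=False)


# Trimmed, made-up config used only for illustration.
example = '{"data": {"text_cleaners": ["chinese_cleaners"], "n_speakers": 0}}'
updated = set_text_cleaners(example, "japanese_cleaners")
print(updated)
```

Other keys in the config are left untouched, so the same approach works on a full config.json read from disk.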

Install Dependencies

pip install -r requirements_py310.txt  # or requirements.txt

Create Dataset

Single Speaker

Set "n_speakers" to 0 in config.json.

Format:

path/to/XXX.wav|transcribed text

Example:

dataset/001.wav|こんにちは。
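A filelist in this format can be generated from pairs of audio path and transcription. The following is a minimal sketch; `make_filelist_lines` is a hypothetical helper, and the second entry in the sample data is made up for illustration.

```python
def make_filelist_lines(pairs):
    """Turn (wav_path, text) pairs into 'path|text' lines."""
    lines = []
    for wav_path, text in pairs:
        text = text.strip()
        # '|' separates fields, so it must not appear inside a field.
        if "|" in wav_path or "|" in text:
            raise ValueError("'|' is the field separator and cannot appear in a field")
        lines.append(f"{wav_path}|{text}")
    return lines


pairs = [
    ("dataset/001.wav", "こんにちは。"),
    ("dataset/002.wav", "ありがとう。"),  # made-up second entry
]
lines = make_filelist_lines(pairs)
print("\n".join(lines))
```

To save the result, write the joined lines to a text file with UTF-8 encoding so that non-ASCII transcriptions survive intact.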

Multiple Speakers

Speaker IDs should start from 0.

Format:

path/to/XXX.wav|speaker ID|transcribed text

Example:

dataset/001.wav|0|こんにちは。
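Before preprocessing, it may help to check that every line of a multi-speaker filelist has three fields and that speaker IDs start from 0. The checker below is a sketch, not part of this repository; `check_multispeaker_lines` is a hypothetical name.

```python
def check_multispeaker_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems.

    Line number 0 is used for problems that concern the whole filelist.
    """
    problems = []
    speaker_ids = set()
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("|")
        if len(fields) != 3:
            problems.append((number, f"expected 3 fields, found {len(fields)}"))
            continue
        _path, speaker, text = fields
        # Accept only plain ASCII digits as a speaker ID.
        if not (speaker.isascii() and speaker.isdigit()):
            problems.append((number, f"speaker ID {speaker!r} is not a non-negative integer"))
            continue
        speaker_ids.add(int(speaker))
        if not text.strip():
            problems.append((number, "empty transcription"))
    if speaker_ids and min(speaker_ids) != 0:
        problems.append((0, "speaker IDs should start from 0"))
    return problems
```

An empty result only means the format looks right; it does not confirm that the audio files exist or that the transcriptions are correct.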

Preprocessing

Once this step has been completed, set "cleaned_text" to true in config.json.

# Single speaker
python preprocess.py --text_index 1 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners chinese_cleaners

# Multiple speakers
python preprocess.py --text_index 2 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners chinese_cleaners
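The two `--text_index` values match the zero-based position of the transcription when a filelist line is split on `|`: position 1 for `path|text` and position 2 for `path|speaker ID|text`. The sketch below only illustrates that indexing; `transcription_field` is a hypothetical helper and is not taken from preprocess.py.

```python
def transcription_field(line, text_index):
    """Return the field at position text_index of a '|'-separated line."""
    fields = line.rstrip("\n").split("|")
    return fields[text_index]


single = "dataset/001.wav|こんにちは。"
multi = "dataset/001.wav|0|こんにちは。"

print(transcription_field(single, 1))
print(transcription_field(multi, 2))
```

Both calls return the transcription. Using index 1 on the multi-speaker line would return the speaker ID instead, which is why the value has to match the filelist format.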

Build Monotonic Alignment Search

cd monotonic_align
mkdir "monotonic_align"
python setup.py build_ext --inplace
cd ..

Training

# Single speaker
python train.py -c <config> -m <folder>

# Multiple speakers
python train_ms.py -c <config> -m <folder>
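Which script to run follows from the "n_speakers" setting described earlier: 0 means single speaker, anything larger means multiple speakers. A minimal sketch of that choice, with `training_command` as a hypothetical helper and the paths in the example as placeholders:

```python
def training_command(n_speakers, config_path, model_folder):
    """Return the training command as an argument list."""
    if n_speakers < 0:
        raise ValueError("n_speakers must be 0 or greater")
    # 0 speakers selects the single-speaker script.
    script = "train.py" if n_speakers == 0 else "train_ms.py"
    return ["python", script, "-c", config_path, "-m", model_folder]


# Placeholder paths; substitute your own config file and model folder.
print(" ".join(training_command(0, "path/to/config.json", "my_model")))
print(" ".join(training_command(2, "path/to/config.json", "my_model")))
```

The list form can be passed directly to `subprocess.run` if you want to launch training from a script.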

Inference

Online

See inference.ipynb

Offline

See MoeGoe

Running in Docker

docker run -itd --gpus all --name "container name" -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all "image name"
