If you're also struggling without a GPU of your own, you might want to try the Featurize platform.
Install the PyTorch build that matches your CUDA version:
# CUDA 11.6
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116
# CUDA 11.0
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu110
git clone https://github.com/CjangCjengh/vits.git
The repository has already been adapted for Chinese, so you can skip the following customization step if you are training in Chinese.
- Fill in "text_cleaners" in config.json
- Edit text/symbols.py
- Remove unnecessary imports from text/cleaners.py
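When you edit text/symbols.py, every character your cleaner can output needs an entry in the symbol list, or it cannot be mapped to an ID. One quick check is to run the cleaner over your transcriptions and collect anything not covered. In this minimal sketch `my_cleaners` and the toy `symbols` list are hypothetical stand-ins; substitute the ones you actually define in text/cleaners.py and text/symbols.py.

```python
import re

# Hypothetical stand-ins for what you would define in text/symbols.py.
_pad = "_"
_punctuation = ",.!?-~… "
_letters = "abcdefghijklmnopqrstuvwxyz"
symbols = [_pad] + list(_punctuation) + list(_letters)


def my_cleaners(text):
    """Hypothetical cleaner: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def missing_symbols(texts, cleaner, symbol_list):
    """Return the sorted characters the cleaner emits that are not in symbol_list."""
    known = set(symbol_list)
    unknown = set()
    for t in texts:
        unknown.update(ch for ch in cleaner(t) if ch not in known)
    return sorted(unknown)


if __name__ == "__main__":
    sample = ["Hello,  World!", "ok… 123"]
    print(missing_symbols(sample, my_cleaners, symbols))  # ['1', '2', '3']
```

Here the digits are reported because the toy symbol list has none; you would either add them to `symbols` or have the cleaner convert them to words.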
pip install -r requirements_py310.txt # or requirements.txt
For a single speaker, set "n_speakers" to 0 in config.json.
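If you prefer to script config changes instead of editing by hand, here is a minimal sketch. It assumes the VITS-style layout in which "n_speakers", "text_cleaners" and "cleaned_text" sit under the "data" section of config.json, so check your own file first; the helper names are hypothetical.

```python
import copy
import json
from pathlib import Path


def update_data_config(config, **fields):
    """Return a copy of `config` with `fields` set in its "data" section.

    Assumption: n_speakers / text_cleaners / cleaned_text live under config["data"].
    """
    updated = copy.deepcopy(config)
    updated.setdefault("data", {}).update(fields)
    return updated


def patch_config_file(path, **fields):
    """Load config.json, update its "data" section, and write it back as UTF-8 JSON."""
    p = Path(path)
    config = json.loads(p.read_text(encoding="utf-8"))
    p.write_text(
        json.dumps(update_data_config(config, **fields), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )


if __name__ == "__main__":
    example = {"data": {"n_speakers": 4, "text_cleaners": ["chinese_cleaners"]}}
    print(update_data_config(example, n_speakers=0)["data"])
    # {'n_speakers': 0, 'text_cleaners': ['chinese_cleaners']}
```

For a real file you would call something like `patch_config_file("path/to/config.json", n_speakers=0)`. Returning a copy keeps the original dict untouched, which makes the change easy to inspect before writing it.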
Format:
path/to/XXX.wav|transcribed text
Example:
dataset/001.wav|こんにちは。
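Malformed filelist lines tend to surface only once training starts, so it can help to check them beforehand. This is a minimal sketch of a checker for the single-speaker format above; the function names are hypothetical, and it assumes the transcription never contains a `|`, since that character separates the fields.

```python
def parse_single_speaker_line(line):
    """Split 'path/to/XXX.wav|transcribed text' into (wav_path, text)."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != 2:
        raise ValueError(f"expected 2 '|'-separated fields, got {len(parts)}")
    wav_path, text = parts
    if not wav_path.endswith(".wav"):
        raise ValueError(f"first field is not a .wav path: {wav_path!r}")
    if not text.strip():
        raise ValueError("empty transcription")
    return wav_path, text


def check_filelist(lines):
    """Return (line_number, message) pairs for every malformed line, blank lines included."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            parse_single_speaker_line(line)
        except ValueError as exc:
            errors.append((number, str(exc)))
    return errors


if __name__ == "__main__":
    sample = ["dataset/001.wav|こんにちは。", "dataset/002.wav", ""]
    for number, message in check_filelist(sample):
        print(f"line {number}: {message}")
```

You can pass an open file object straight to `check_filelist`, since iterating over it yields one line at a time.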
For multiple speakers, speaker IDs should start from 0.
Format:
path/to/XXX.wav|speaker ID|transcribed text
Example:
dataset/001.wav|0|こんにちは。
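A similar sketch for the multi-speaker format collects the speaker IDs and checks that they start from 0. It also assumes, as in VITS, that speaker IDs index a speaker-embedding table of size "n_speakers", so the largest ID must be smaller than the "n_speakers" value in config.json. The function names are hypothetical.

```python
def parse_multi_speaker_line(line):
    """Split 'path/to/XXX.wav|speaker ID|transcribed text' into (wav_path, speaker_id, text)."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    wav_path, sid, text = parts
    if not sid.isdecimal():
        raise ValueError(f"speaker ID is not a non-negative integer: {sid!r}")
    return wav_path, int(sid), text


def check_speaker_ids(lines, n_speakers):
    """Return (sorted unique speaker IDs, list of problems found)."""
    ids = sorted({parse_multi_speaker_line(line)[1] for line in lines})
    problems = []
    if ids and ids[0] != 0:
        problems.append(f"speaker IDs start from {ids[0]}, expected 0")
    if ids and ids[-1] >= n_speakers:
        problems.append(
            f"largest speaker ID is {ids[-1]}, so n_speakers must be at least "
            f"{ids[-1] + 1} (got {n_speakers})"
        )
    return ids, problems


if __name__ == "__main__":
    sample = ["dataset/001.wav|0|こんにちは。", "dataset/002.wav|1|こんばんは。"]
    print(check_speaker_ids(sample, n_speakers=2))  # ([0, 1], [])
```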
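Preprocess the filelists with the text cleaner; --text_index is the column that holds the text (1 for single-speaker filelists, 2 for multi-speaker ones). Once you have completed this step, set "cleaned_text" to true in config.json.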
# Single speaker
python preprocess.py --text_index 1 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners chinese_cleaners
# Multiple speakers
python preprocess.py --text_index 2 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners chinese_cleaners
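Conceptually, the preprocessing step runs the chosen cleaner over the text column of each filelist line and writes the cleaned lines back out, with `--text_index` selecting the column. The sketch below mirrors that idea in isolation. `toy_cleaner` is a hypothetical stand-in for `chinese_cleaners`, and the sketch does not reproduce how preprocess.py names or writes its output files.

```python
def clean_filelist_lines(lines, text_index, cleaner):
    """Apply `cleaner` to column `text_index` of each '|'-separated filelist line.

    text_index=1 -> 'path|text' (single speaker)
    text_index=2 -> 'path|speaker ID|text' (multiple speakers)
    """
    cleaned = []
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        fields[text_index] = cleaner(fields[text_index])
        cleaned.append("|".join(fields))
    return cleaned


def toy_cleaner(text):
    """Hypothetical stand-in for a real cleaner: lowercase and collapse whitespace."""
    return " ".join(text.split()).lower()


if __name__ == "__main__":
    print(clean_filelist_lines(["dataset/001.wav|0|Hello   World"], 2, toy_cleaner))
    # ['dataset/001.wav|0|hello world']
```

Passing the wrong `text_index` would clean the speaker ID column (or the path) instead of the text, which is why it has to match the filelist format.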
cd monotonic_align
mkdir "monotonic_align"
python setup.py build_ext --inplace
cd ..
# Single speaker
python train.py -c <config> -m <folder>
# Multiple speakers
python train_ms.py -c <config> -m <folder>
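Which script to run follows from the speaker setup: train.py for a single speaker ("n_speakers" set to 0) and train_ms.py for multiple speakers. A small hypothetical helper that builds the matching command might look like this, again assuming "n_speakers" lives under "data" in config.json.

```python
import json


def training_command(config_path, model_dir, config=None):
    """Build the train command: train.py if n_speakers == 0, otherwise train_ms.py.

    Assumption: n_speakers is stored under config["data"]; if `config` is not
    given, it is read from `config_path`.
    """
    if config is None:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    n_speakers = config.get("data", {}).get("n_speakers", 0)
    script = "train.py" if n_speakers == 0 else "train_ms.py"
    return ["python", script, "-c", config_path, "-m", model_dir]


if __name__ == "__main__":
    print(training_command("path/to/config.json", "my_model", {"data": {"n_speakers": 0}}))
```

`config_path` and `model_dir` correspond to `<config>` and `<folder>` above; the returned list can be handed to `subprocess.run(cmd, check=True)`.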
For inference, see inference.ipynb.
You can also run inference with MoeGoe.
To train inside a Docker container with access to all GPUs, start it like this:
docker run -itd --gpus all --name "container name" -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all "image name"