This project is an all-in-one, easy-to-use voice conversion tool. Aiming for high-quality, high-performance voice conversion, it lets users change voices smoothly and naturally.
- CLI for Colab notebooks
Feature | Description |
---|---|
Music Separation | Utilizes MDX-Net and Demucs for separating audio tracks. |
Voice Conversion | Supports file conversion, batch conversion, conversion with Whisper, and text-to-speech conversion. |
Background Music Editing | Enables editing and manipulation of background music tracks. |
Apply Effects to Audio | Allows application of various effects to enhance or modify audio output. |
Generate Training Data | Creates training data from linked paths for model training. |
Model Training | Supports v1 and v2 models with high-quality encoders for training. |
Model Fusion | Facilitates combining multiple models for enhanced performance. |
Read Model Information | Provides functionality to access and display model metadata. |
Export Models to ONNX | Enables exporting trained models to ONNX format for compatibility. |
Download from Pre-existing Model Repositories | Allows downloading models from established repositories. |
Search for Models on the Web | Supports searching for models online for easy access. |
Pitch Extraction | Extracts pitch information from audio inputs. |
Support for Audio Conversion Inference Using ONNX Models | Enables inference for audio conversion using ONNX-compatible models (see the sketch after the model options below). |
ONNX RVC Models with Indexing | Supports ONNX RVC models with indexing for efficient inference. |
Multiple Model Options | See the list below. |

- F0: pm, dio, mangio-crepe-tiny, mangio-crepe-small, mangio-crepe-medium, mangio-crepe-large, mangio-crepe-full, crepe-tiny, crepe-small, crepe-medium, crepe-large, crepe-full, fcpe, fcpe-legacy, rmvpe, rmvpe-legacy, harvest, yin, pyin, swipe
- F0_ONNX: Some models converted to ONNX for accelerated pitch extraction.
- F0_HYBRID: Combines multiple options, e.g., hybrid[rmvpe+harvest], or all options together.
- EMBEDDERS: contentvec_base, hubert_base, japanese_hubert_base, korean_hubert_base, chinese_hubert_base, portuguese_hubert_base
- EMBEDDERS_ONNX: Pre-converted ONNX versions of the embedding models for accelerated extraction.
- EMBEDDERS_TRANSFORMERS: Pre-converted Hugging Face versions of the embedding models, as an alternative to Fairseq (see the sketch below).
- SPIN_EMBEDDERS: A new embedding extraction model offering potentially higher quality than older methods.
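As a rough illustration of the ONNX inference support mentioned above, here is a minimal onnxruntime sketch. It only loads a model and inspects its tensor names; `model.onnx` is a placeholder, and the real input/output names depend on how the model was exported:

```python
# Minimal onnxruntime sketch (not the project's internal code);
# "model.onnx" is a placeholder for a model exported by this project.
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# Inspect the real input/output names before building a feed dictionary.
print([i.name for i in session.get_inputs()])
print([o.name for o in session.get_outputs()])
```

Likewise, a minimal sketch of the EMBEDDERS_TRANSFORMERS route, extracting HuBERT features with Hugging Face transformers instead of Fairseq (assumption: the stock facebook/hubert-base-ls960 checkpoint stands in for the project's own embedder weights):

```python
# Minimal Hugging Face HuBERT feature-extraction sketch; the checkpoint
# below is the stock facebook/hubert-base-ls960, used here as a stand-in.
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()
wav = torch.randn(1, 16000)  # stand-in for 1 second of 16 kHz mono audio
with torch.no_grad():
    feats = model(wav).last_hidden_state  # shape: (1, frames, 768)
print(feats.shape)
```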
- Step 1: Install Python from the official website (REQUIRES PYTHON 3.10.x OR PYTHON 3.11.x)
- Step 2: Download FFmpeg, extract it, and add it to your PATH
- Step 3: Download and extract the source code
- Step 4: Navigate to the source code directory and open Command Prompt or Terminal
- Step 5: Run the following commands to install the required libraries. First, create and activate a virtual environment:
python -m venv env
env\Scripts\activate
If you have an NVIDIA GPU, run one of the following, depending on your Torch version (you may need to change cu118 to cu121, cu124, cu126, etc., to match your CUDA version):
If using Torch 2.3.1:
python -m pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
If using Torch 2.6.0:
python -m pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu118
Then run:
python -m pip install -r requirements.txt
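To confirm the GPU build of PyTorch installed correctly, a quick sanity check (run inside the activated virtual environment):

```python
# Quick PyTorch install check; run inside the activated "env" venv.
import torch

print(torch.__version__)          # e.g. 2.3.1+cu118
print(torch.cuda.is_available())  # should print True on a working NVIDIA setup
```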
- Step 6: Run the run_app file to open the user interface (note: do not close the Command Prompt or Terminal while the interface is running). Alternatively, launch the interface from Command Prompt or Terminal in the source code directory with the command below.
- To allow the interface to access files outside the project, add --allow_all_disk to the command:
env\Scripts\python.exe main\app\app.py --open
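For example, with access to all drives enabled:
env\Scripts\python.exe main\app\app.py --open --allow_all_disk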
To use TensorBoard for training monitoring, run the tensorboard file, or run:
env\Scripts\python.exe main\app\tensorboard.py
For command-line (CLI) usage, see the available options with:
python main\app\parser.py --help
- This project only supports NVIDIA GPUs
- Currently, new encoders like MRF HIFIGAN do not yet have complete pre-trained datasets
- The MRF HIFIGAN and REFINEGAN encoders do not support training without pitch (pitchless training)
- Models in the URVC repository are collected from AI Hub, HuggingFace, and other repositories. They may carry different licenses (For example, Audioldm2 has model weights with a "Non-Commercial" clause).
- This source code contains third-party software components licensed under non-commercial terms. Any commercial use, including soliciting donations or monetizing derivative works, may violate those licenses and incur legal liability.
- You must ensure that the audio content you upload and convert through this project does not violate the intellectual property rights of third parties.
- The project must not be used for any illegal activities, including but not limited to fraud, harassment, or causing harm to others.
- You are solely responsible for any damages arising from improper use of the product.
- I will not be responsible for any direct or indirect damages arising from the use of this project.
Project | Author/Organization | License |
---|---|---|
Vietnamese-RVC | Phạm Huỳnh Anh | Apache License 2.0 |
Applio | IAHispano | MIT License |
Python-audio-separator | Nomad Karaoke | MIT License |
Retrieval-based-Voice-Conversion-WebUI | RVC Project | MIT License |
RVC-ONNX-INFER-BY-Anh | Phạm Huỳnh Anh | MIT License |
Torch-Onnx-Crepe-By-Anh | Phạm Huỳnh Anh | MIT License |
Hubert-No-Fairseq | Phạm Huỳnh Anh | MIT License |
Local-attention | Phil Wang | MIT License |
TorchFcpe | CN_ChiTu | MIT License |
FcpeONNX | Yury | MIT License |
ContentVec | Kaizhi Qian | MIT License |
Mediafiredl | Santiago Ariel Mansilla | MIT License |
Noisereduce | Tim Sainburg | MIT License |
World.py-By-Anh | Phạm Huỳnh Anh | MIT License |
Mega.py | O'Dwyer Software | Apache 2.0 License |
Gdown | Kentaro Wada | MIT License |
Whisper | OpenAI | MIT License |
PyannoteAudio | pyannote | MIT License |
AudioEditingCode | Hila Manor | MIT License |
StftPitchShift | Jürgen Hock | MIT License |
Codename-RVC-Fork-3 | Codename;0 | MIT License |
This document provides detailed information on the pitch extraction methods used, including their advantages, limitations, strengths, and reliability, based on personal experience. A short pYIN example follows the table.
Method | Type | Advantages | Limitations | Strength | Reliability |
---|---|---|---|---|---|
pm | Praat | Fast | Less accurate | Low | Low |
dio | PYWORLD | Suitable for rap | Less accurate at high frequencies | Medium | Medium |
harvest | PYWORLD | More accurate than DIO | Slower processing | High | Very high |
crepe | Deep Learning | High accuracy | Requires GPU | Very high | Very high |
mangio-crepe | Crepe finetune | Optimized for RVC | Sometimes less accurate than original crepe | Medium to high | Medium to high |
fcpe | Deep Learning | Accurate, real-time | Requires powerful GPU | Good | Medium |
fcpe-legacy | Deep Learning (legacy) | Accurate, real-time | Older version of FCPE | Good | Medium |
rmvpe | Deep Learning | Effective for singing voices | Resource-intensive | Very high | Excellent |
rmvpe-legacy | Deep Learning (legacy) | Supports older systems | Older version of RMVPE | High | Good |
yin | Librosa | Simple, efficient | Prone to octave errors | Medium | Low |
pyin | Librosa | More stable than YIN | More complex computation | Good | Good |
swipe | WORLD | High accuracy | Sensitive to noise | High | Good |
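As an illustration of the Librosa-based methods above (yin/pyin), here is a minimal pYIN sketch; `vocal.wav` is a placeholder input file, and this is not the project's internal implementation:

```python
# Minimal pYIN pitch-extraction sketch with librosa (illustrative only;
# "vocal.wav" is a placeholder, not a file shipped with the project).
import librosa
import numpy as np

y, sr = librosa.load("vocal.wav", sr=16000)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz lower bound
    fmax=librosa.note_to_hz("C7"),  # ~2093 Hz upper bound
    sr=sr,
)
print(np.nanmean(f0))  # mean F0 over voiced frames (unvoiced frames are NaN)
```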
- If you encounter an error while using this source code, I sincerely apologize for the poor experience.
- You can report bugs to us via ISSUE.