GitHub - thuhcsi/hifi-gan: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
forked from jik876/hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

A PyTorch 1.4 implementation of the HiFi-GAN vocoder, adapted from the official GitHub repository, with a slight modification to the mel-spectrogram preprocessing.

In the original repo, the input mel-spectrogram of the target audio is extracted on the fly with PyTorch's STFT (torch.stft). In this repo, the input mel-spectrogram is instead loaded from the stored results of a data preprocessing step.
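As a rough illustration of such a preprocessing step, the sketch below computes a log-mel-spectrogram with NumPy and stores it as a .npy file. It is a simplified stand-in, not this repo's preprocessing code: the function names, the one-file-per-utterance layout, the HTK-style mel filterbank, and the default parameters (22050 Hz, 1024-point FFT, hop of 256, 80 mel bands, 0 to 8000 Hz) are all assumptions. Features used for training must match whatever your configuration file and front end actually produce.

```python
import os
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other front ends use other variants).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(wav, sr=22050, n_fft=1024, hop=256, n_mels=80, fmin=0.0, fmax=8000.0):
    """Return a float32 log-mel-spectrogram of shape (n_mels, n_frames)."""
    pad = (n_fft - hop) // 2
    wav = np.pad(np.asarray(wav, dtype=np.float64), (pad, pad), mode="reflect")
    n_frames = 1 + (len(wav) - n_fft) // hop
    # Build an index matrix so that each row selects one analysis frame.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wav[idx] * np.hanning(n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels, fmin, fmax) @ mag.T
    # Clamp before the log to avoid -inf on silent frames.
    return np.log(np.clip(mel, 1e-5, None)).astype(np.float32)

def save_mel(mel, mel_dir, utt_id):
    """Store one utterance's mel-spectrogram as <mel_dir>/<utt_id>.npy (hypothetical layout)."""
    os.makedirs(mel_dir, exist_ok=True)
    path = os.path.join(mel_dir, utt_id + ".npy")
    np.save(path, mel)
    return path
```

Computing the features once and loading them with np.load at training time trades disk space for less per-batch computation, and it lets the vocoder train on exactly the features your acoustic model uses.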

Training

  1. Prepare your dataset.
    • Extract mel-spectrograms from the training audio data;
    • Divide the dataset into a training set and a validation set;
    • (Optional) Implement your own get_dataset_filelist method in meldataset.py to handle metadata.
  2. Specify the following parameters and run train.py to start training:
    • input_training_file: metadata file of the training set;
    • input_validation_file: metadata file of the validation set;
    • input_mel_dir: directory where the input mel-spectrograms are stored;
    • input_wavs_dir: directory where the target audio data are stored (in .npy);
    • checkpoint_path: directory in which to save the model and training logs;
    • config: path of the configuration file (in JSON, e.g. config.json);
    • other optional parameters.
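
The steps above can be sketched as a small helper script. This is only an illustration under stated assumptions: the metadata format (one utterance ID per line) is hypothetical and may need a matching get_dataset_filelist, the helper names are invented here, and the parameters listed above are assumed to be passed to train.py as `--name value` command-line flags.

```python
import random
from pathlib import Path

def split_filelist(utt_ids, val_fraction=0.05, seed=1234):
    """Shuffle utterance IDs reproducibly and split them into (train, validation)."""
    ids = sorted(utt_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]

def write_filelist(path, ids):
    """Write one utterance ID per line (hypothetical metadata format)."""
    Path(path).write_text("\n".join(ids) + "\n", encoding="utf-8")

def build_train_command(train_file, val_file, mel_dir, wav_dir, ckpt_dir, config):
    """Assemble the train.py invocation; the '--' flag spelling is an assumption."""
    return [
        "python", "train.py",
        "--input_training_file", str(train_file),
        "--input_validation_file", str(val_file),
        "--input_mel_dir", str(mel_dir),
        "--input_wavs_dir", str(wav_dir),
        "--checkpoint_path", str(ckpt_dir),
        "--config", str(config),
    ]
```

The returned list can be passed to subprocess.run, or joined with spaces and pasted into a shell. Using a fixed seed keeps the validation set stable across runs.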

Inference

  1. Download a pretrained HiFi-GAN generator model and its corresponding JSON configuration file into the same directory.
  2. Following the example in inference.py, construct a Vocoder instance by specifying the path of the Generator checkpoint file to load.
  3. A waveform can now be generated from an input mel-spectrogram through the Vocoder.mel2wav interface.
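
Because the checkpoint and its configuration are expected to sit in the same directory, a quick sanity check before inference can catch a missing or mismatched file. The sketch below is an assumption-laden illustration, not code from inference.py: the helper names are hypothetical, the file name config.json is taken from the example above, and the hop_size key is assumed to be present as in the official HiFi-GAN configuration files.

```python
import json
from pathlib import Path

def load_config_beside(checkpoint_path, config_name="config.json"):
    """Load the JSON configuration stored next to the generator checkpoint."""
    config_path = Path(checkpoint_path).parent / config_name
    if not config_path.is_file():
        raise FileNotFoundError(f"No {config_name} found beside {checkpoint_path}")
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)

def expected_num_samples(n_frames, config):
    """Each mel frame is upsampled to hop_size samples (assumed config key)."""
    return n_frames * config["hop_size"]
```

With a hop size of 256, for example, a mel-spectrogram of 32 frames should yield roughly 32 × 256 = 8192 samples from Vocoder.mel2wav. The exact constructor arguments of Vocoder and the expected mel layout are defined in inference.py and are not reproduced here.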
