VAT-SS: Investigation of Speech Separation Models Including Video Source

[🔥 VAT-SS Report]

About • Installation • How To Extract Video Embeddings • How To Train • How To Evaluate • Credits • License

About

VAT-SS is the Andrey-Vera-Teasgen Speech Separation model family. This repository allows users to train and evaluate the SS models described in the report.

Note that in all configs the base model is the state-of-the-art DPTN-AV-repack-by-teasgen, but you may also use the other SS models reported in the paper (take a look at the other configs); an example is shown below.
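
For example, to train with one of the alternative configs, pass its name to Hydra's -cn flag. The config name below is a placeholder; substitute one of the config files shipped in this repository:

# <other_ss_config>.yaml is a placeholder; replace it with any of the alternative SS model configs
python3 train.py -cn <other_ss_config>.yaml dataloader.batch_size=16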

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n project_env python=3.10
    
    # activate env
    conda activate project_env
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

How To Extract Video Embeddings

This section is mandatory before running the training and evaluation scripts for audio-video models. Extracting the video embeddings in advance is necessary to speed up the forward pass.

bash download_lipreader.sh

python make_embeddings.py \
    --cfg_path src/lipreader/configs/lrw_resnet18_mstcn.json \
    --lipreader_path lrw_resnet18_mstcn_video.pth \
    --mouths_dir mouths \
    --embeds_dir embeddings

The embeddings will be saved to --embeds_dir. Please set the correct path to your directory in all Hydra configs at the datasets level; an override example is sketched below.
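
The directory can also be set from the command line instead of editing the config files. The datasets.val.embedding_dir key below is the one used by the inference config later in this README; the corresponding training-side key is an assumption and may differ in your config:

# Hydra command-line override of the embeddings directory at the datasets level
# (datasets.val.embedding_dir is taken from the inference command further below)
python3 inference.py -cn inference_dptn_av.yaml datasets.val.embedding_dir=embeddings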

How To Train

You need a single A100 80 GB GPU to exactly reproduce training; otherwise, please implement and use gradient accumulation.

To train a model, register with WandB and run the following commands:

Two-step training:

python3 train.py -cn dptn_wav_av.yaml dataloader.batch_size=16 writer.run_name=av_dptn_wav_av_v1_video_tanh_gate

Training logs are also available in WandB.

How To Evaluate

Read the How To Extract Video Embeddings section first.

All generated predictions will be saved to the data/saved/inferenced/<dataset part> directory with corresponding names. Download the SOTA pretrained model using:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1egOSgh3qaADxWpxd379nmhLrfZ-5xYEf' -O ./model.tar
tar xvf ./model.tar

To run inference and calculate metrics, provide a custom dataset, set the paths to the WAVs and video embeddings via the command-line arguments datasets.val.audio_dir and datasets.val.embedding_dir, and run:

python3 inference.py -cn inference_dptn_av.yaml dataloader.batch_size=32 inferencer.from_pretrained=model_best.pth datasets.val.part=null datasets.val.audio_dir=<PATH_TO_WAVS> datasets.val.embedding_dir=<PATH_TO_EMBEDDINGS>

Set dataloader.batch_size to no more than len(dataset).

If you don't have ground truth, change device_tensors in the inference_dptn_av.yaml config to device_tensors: ["mix_spectrogram", "mix", "s1_embedding", "s2_embedding"]; metrics then won't be calculated and only predictions will be saved. Alternatively, pass it as a command-line argument: inferencer.device_tensors="["mix_spectrogram","mix","s1_embedding","s2_embedding"]". A combined command is sketched below.
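
Putting the overrides together, here is a sketch of an inference run without ground truth; the paths are placeholders and shell quoting of the list override may need adjustment for your shell:

# Inference without GT: metrics are skipped, only predictions are saved
python3 inference.py -cn inference_dptn_av.yaml \
    dataloader.batch_size=32 \
    inferencer.from_pretrained=model_best.pth \
    datasets.val.part=null \
    datasets.val.audio_dir=<PATH_TO_WAVS> \
    datasets.val.embedding_dir=<PATH_TO_EMBEDDINGS> \
    'inferencer.device_tensors=["mix_spectrogram","mix","s1_embedding","s2_embedding"]'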

Use the following command to run SI-SNRi calculation on the ground-truth and predicted directories:

export PYTHONPATH=./
python3 src/utils/eval_si_snri.py --predicts-dir <PATH_TO_PREDS> --gt-dir <PATH_TO_GTS>

<PATH_TO_PREDS> is a directory containing the predictions file in .pth format; <PATH_TO_GTS> is a directory containing the s1, s2, and mix directories (an illustrative layout is sketched below).
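
An illustrative layout, assuming the usual speech-separation convention that s1 and s2 hold the two target speakers and mix holds the mixtures (file names are assumptions):

# <PATH_TO_GTS>/
#   mix/   mixture WAVs
#   s1/    ground-truth WAVs for speaker 1
#   s2/    ground-truth WAVs for speaker 2
# <PATH_TO_PREDS>/
#   predictions file(s) saved in .pth format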

To evaluate the computational performance of the model, run:

python3 profiler.py

Profiler results for the best model, DPTN-AV-repack, in a Kaggle environment with a P100 GPU:

Metric                    Value
GFLOPs                    108.556458241
CUDA Memory               14378.582016
Inference Time (Mean)     0.09988968074321747
Inference Time (Std)      0.04486224800348282
Number of Parameters      40809590

Credits

This repository is based on a PyTorch Project Template.

License

See the license file in the repository for details.
