Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations
Install MATLAB for feature extraction and Python==3.8 for classification
Install required packages using: pip install -r requirements.txt
UT-Podcast is a speech corpus collected from podcasts; it covers three dialects of English (US, UK, AU). Please download it from here. For more details refer
The train, validation, and test splits of the VoxCeleb corpus are provided in the voxceleb_corpus folder. The VoxCeleb1 corpus can be downloaded from here
MATLAB is used to extract the features (STFT-, SFF-, and ZTW-based). The feature extraction code will be added soon at feature_extraction/
This project implements three neural architectures:
- The code for the Convolutional Neural Network (CNN) architecture can be found in main_cnn.py
- The code for the CNN architecture with an embedded spectral filter as a convolution layer can be found in cnn_spectral_layer.py
- The code for the Temporal Convolutional Neural Network (TCNN) architecture can be found in main_tcnn.py
- The code for the Time-Delay Neural Network (TDNN) architecture can be found in main_tdnn.py
NOTE: Please find the pre-trained models at: https://drive.google.com/drive/folders/1O4ZK1c8I5Vkglyka2fniUTpolyokTAsL?usp=sharing
Unweighted Average Recall (UAR) is used as the classification metric. Evaluation results will be updated soon.
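For readers unfamiliar with UAR, the sketch below shows how it can be computed: recall is calculated separately for each dialect class and then averaged with equal weight, so a large class cannot dominate the score. This is an illustrative, standard-library-only example and is not taken from this repository's code; the function name `unweighted_average_recall` and the toy labels are assumptions made for the illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, with each class weighted equally.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is only defined for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example with the three UT-Podcast dialect labels.
y_true = ["US"] * 6 + ["UK"] * 2 + ["AU"] * 2
y_pred = ["US"] * 6 + ["UK", "US"] + ["US", "US"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the per-class recalls are 1.0 (US), 0.5 (UK), and 0.0 (AU), so UAR is 0.50 while plain accuracy is 0.70, which shows why UAR is the more informative figure when dialect classes are imbalanced. The same quantity is commonly known as macro-averaged recall.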
@article{dialect_class,
title = {Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations},
author={Kethireddy, Rashmi and Kadiri, Sudarsana Reddy and Gangashetty, Suryakanth V},
journal = JASA,
volume = {151},
number = {2},
pages = {1077-1092},
year = {2022}
}