Southern University of Science and Technology, Shenzhen, China
Stars
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
A pipeline that reads lips and generates speech for the content read, i.e., lip-to-speech synthesis.
Audio-Visual Speech Separation with Cross-Modal Consistency
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
The PyTorch-based audio source separation toolkit for researchers
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
cogmhear / avse_challenge
Forked from claritychallenge/clarity
COG-MHEAR Audio-Visual Speech Enhancement Challenge
Python toolkit for likelihood-ratio calibration of binary classifiers
ESC-50: Dataset for Environmental Sound Classification
Speech To Speech: an effort for an open-source, modular GPT-4o
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP)
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
In defence of metric learning for speaker recognition