rickltt (dushuren) / Starred · GitHub
  • Southern University of Science and Technology
  • Shenzhen, China

Python 20 3 Updated Nov 2, 2024

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Python 234 41 Updated Feb 15, 2024

Auto-AVSR: Lip-Reading Sentences Project

Python 352 55 Updated Jan 8, 2025

A pipeline to read lips and generate speech for the read content, i.e., Lip-to-Speech Synthesis.

Python 85 21 Updated Nov 25, 2021

Audio-Visual Speech Separation with Cross-Modal Consistency

Python 232 38 Updated Jul 25, 2023
Python 7 1 Updated Jul 4, 2024

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.

Python 409 32 Updated Jan 25, 2024

Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs

Python 62 7 Updated Jun 22, 2025

The PyTorch-based audio source separation toolkit for researchers

Python 2,407 437 Updated Jan 11, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,724 329 Updated Jan 4, 2024

An open source dataset for source separation

Python 430 71 Updated Feb 9, 2024

Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch

Python 699 58 Updated Nov 27, 2024

Air and bone conduction speech

18 Updated Nov 26, 2022

COG-MHEAR Audio-Visual Speech Enhancement Challenge

Python 40 12 Updated May 7, 2025

Python toolkit for likelihood-ratio calibration of binary classifiers

Python 27 9 Updated Feb 21, 2023

ESC-50: Dataset for Environmental Sound Classification

Python 1,587 301 Updated Mar 20, 2024

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

560 68 Updated Nov 13, 2024
Python 5,577 416 Updated May 11, 2025

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 4,083 463 Updated Apr 15, 2025

This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.

Python 110 8 Updated Jun 25, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 575 53 Updated Jun 9, 2024

Audio Large Language Models

Python 583 33 Updated Jun 2, 2025

Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP)

Python 20 Updated Feb 28, 2025

PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation

Python 171 23 Updated Jun 20, 2024

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 943 143 Updated May 19, 2025

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 165 10 Updated Nov 14, 2024

End-to-End Speech Processing Toolkit

Python 9,239 2,288 Updated Jun 20, 2025

In defence of metric learning for speaker recognition

Python 1,111 284 Updated Mar 26, 2024

A PyTorch-based Speech Toolkit

Python 10,047 1,519 Updated Jun 18, 2025