- Tsinghua University
- Beijing
- https://scholar.google.com/citations?user=w68g1qkAAAAJ&hl=zh-CN&oi=ao
Stars
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Machine Learning applied to sound
Unified automatic quality assessment for speech, music, and sound.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A curated collection of open-source rotating machinery fault datasets
A benchmark fault diagnosis dataset comprising vibration data collected from a gearbox under variable working conditions with intentionally induced faults, encompassing diverse fault severities and …
Benchmark popular audio i/o packages
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Multilingual Voice Understanding Model
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Speech, Language, Audio, Music Processing with Large Language Model
A PyTorch Implementation of Federated Learning
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
Audio Codec Speech processing Universal PERformance Benchmark
Modeling, training, eval, and inference code for OLMo
A library built for easier audio self-supervised training and downstream task evaluation