- Meta
- New York City, NY, US
- 03:36 (UTC -04:00)
- https://bigpon.github.io/
Stars
Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Interactive visualizations of the geometric intuition behind diffusion models.
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
Audio processing using PyTorch 1D convolution networks
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Unified automatic quality assessment for speech, music, and sound.
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Code and data recipes for the paper: Heterogeneous Target Speech Separation
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
A library for soundscape synthesis and augmentation
The PyTorch-based audio source separation toolkit for researchers
ModelScope: bring the notion of Model-as-a-Service to life.
Audio sample repository for the speech separation model "MossFormer2".
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation (PyTorch implementation)
wsj0-{2, 3, 4, 5} mix generation scripts, in Python.
The official implementation of PeriodWave and PeriodWave-Turbo
Generation scripts for EARS-WHAM and EARS-Reverb
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
PAM is a no-reference audio quality metric for audio generation tasks
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment