Multimedia Information Lab., South Korea
Starred repositories
The official Python SDK for Model Context Protocol servers and clients
An open protocol enabling communication and interoperability between opaque agentic applications.
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
10 Lessons to Get Started Building AI Agents
Open-source simulator for autonomous driving research.
PyTorch implementation of our work "Domain-Invariant Representation Learning of Bird Sounds" (arXiv 2024)
An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
Pre-trained models for bioacoustic classification tasks
A benchmark dataset collection for bird sound classification
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
This toolbox aims to unify audio generation model evaluation for easier comparison.
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
Minimum implementation of EDM (Elucidating the Design Space of Diffusion-Based Generative Models) on cifar10 and mnist
Karras et al. (2022) diffusion models for PyTorch
PyTorch implementation of AudioLCM (ACM-MM'24): efficient, high-quality text-to-audio generation with a latent consistency model.
OpenMusic: SOTA Text-to-music (TTM) Generation
A 6-Million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Code for Fast Training of Diffusion Models with Masked Transformers
Refactored and updated version of `stable-audio-tools`, an open-source codebase for audio/music generative models originally by Stability AI.
Vector (and Scalar) Quantization, in PyTorch
A family of diffusion models for text-to-audio generation.
Generative models for conditional audio generation
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research.
The dataset and baseline code for Text-to-Audio Grounding (TAG)