Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Jupyter Notebook 343 27 Updated May 30, 2024

google / break-a-scene

Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]

Python 518 25 Updated Jan 14, 2024

spotify / basic-pitch

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Python 3,946 331 Updated Jan 17, 2025

OpenGVLab / all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of the Open World"

Python 484 17 Updated Aug 9, 2024

lixin4ever / Conference-Acceptance-Rate

Acceptance rates for the major AI conferences

Jupyter Notebook 4,485 309 Updated Jan 24, 2025

RayVentura / ShortGPT

🚀🎬 ShortGPT - Experimental AI framework for youtube shorts / tiktok channel automation

Python 6,502 867 Updated Feb 10, 2025

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,339 201 Updated May 6, 2025

jim-schwoebel / allie

🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.

Python 141 35 Updated Apr 2, 2025

seungheondoh / lp-music-caps

LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]

Python 322 38 Updated Apr 8, 2024

YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Python 256 25 Updated Mar 20, 2024

CrossmodalGroup / DynamicVectorQuantization

Official Pytorch Implementation of Our CVPR2023 Paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"

Python 176 6 Updated Jul 23, 2023

Audio-AGI / WavJourney

WavJourney: Compositional Audio Creation with LLMs

Python 536 43 Updated Sep 28, 2023

baaivision / Emu

Emu Series: Generative Multimodal Models from BAAI

Python 1,720 85 Updated Sep 27, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

15,240 984 Updated May 15, 2025

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,998 2,328 Updated Mar 13, 2025

haoheliu / AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python 2,653 236 Updated Dec 9, 2024

Sanster / IOPaint

Image inpainting tool powered by SOTA AI Model. Remove any unwanted object, defect, people from your pictures or erase and replace(powered by stable diffusion) any thing on your pictures.

Python 21,254 2,159 Updated Apr 29, 2025