Starred repositories
[ACM MM 2024] See or Guess: Counterfactually Regularized Image Captioning
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
A curated list of recent diffusion models for video generation, editing, and various other applications.
CHAIR metric is a rule-based metric for evaluating object hallucination in caption generation.
[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
Acceptance rates for the major AI conferences
🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Official PyTorch Implementation of Our CVPR 2023 Paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"
WavJourney: Compositional Audio Creation with LLMs
Emu Series: Generative Multimodal Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
Open-source and strong foundation image recognition models.
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
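ChatGPT has gone viral, marking a key step toward AGI. This project aims to collect open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenience for everyone.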
Tracking and collecting papers/projects/others related to Segment Anything.
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).