Starred repositories
Solve Visual Understanding with Reinforced VLMs
R1V, trained with AI feedback, answers open-ended visual questions.
Puzzles for learning Triton; play with minimal environment configuration!
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
A Massively Parallel Large Scale Self-Play Framework
😎 Awesome papers on token redundancy reduction
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Efficient Triton Kernels for LLM Training
💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Repo for the Deep Reinforcement Learning Nanodegree program
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A list of recent papers on token compression for ViT and VLM
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Awesome papers & datasets specifically focused on long-term videos.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
verl: Volcano Engine Reinforcement Learning for LLMs
SOTA Re-identification Methods and Toolbox
📚LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥