⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~
This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
The official benchmark implementation for the Chinese Pun Rebus Art Dataset
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media
A simple and efficient Mamba implementation in pure PyTorch and MLX.
A paper list of recent works on token compression for ViT and VLM
Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[TLLM'23] PandaGPT: One Model To Instruction-Follow Them All
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Explainable Multimodal Emotion Reasoning (EMER), Open-vocabulary MER (OV-MER), and AffectGPT
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
A unified evaluation library for multiple machine learning libraries
Graph Transformer Networks (Authors' PyTorch implementation for the NeurIPS 19 paper)
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model in ACM MM 2024 Oral
OpenMMLab Pose Estimation Toolbox and Benchmark.
This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
[ACM Multimedia 2024] Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
4-bit quantization of LLaMA using GPTQ
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
The code for the paper "ECR-Chain: Advancing Generative Language Models to Better Emotion Cause Reasoners through Reasoning Chains" (IJCAI-2024).
Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"