leexinhao (Xinhao Li) / Starred · GitHub

Python · 19 stars · Updated Jun 1, 2025

Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model

Python · 89 stars · 3 forks · Updated May 27, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

22 stars · Updated May 25, 2025

Open-source unified multimodal model

Python · 3,445 stars · 229 forks · Updated May 30, 2025

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

Python · 100 stars · 5 forks · Updated Apr 23, 2025

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python · 1,267 stars · 86 forks · Updated May 29, 2025

Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Only 4GB VRAM is enou…

Python · 1,609 stars · 92 forks · Updated May 16, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook · 1,127 stars · 41 forks · Updated May 21, 2025

An Enhanced CLIP Framework for Learning with Synthetic Captions

Python · 34 stars · 1 fork · Updated Apr 18, 2025

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Python · 132 stars · 7 forks · Updated Jan 30, 2025

Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

34 stars · Updated Jan 8, 2024

FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.

Python · 101 stars · Updated Dec 8, 2024

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Python · 138 stars · 2 forks · Updated May 16, 2025

TransMLA: Multi-Head Latent Attention Is All You Need

Python · 284 stars · 22 forks · Updated Jun 1, 2025

A Fine-grained Benchmark for Video Captioning and Retrieval

Python · 15 stars · Updated Mar 20, 2025

Automatic evals for LLMs

HTML · 399 stars · 47 forks · Updated May 31, 2025

R1-like Video-LLM for Temporal Grounding

Python · 92 stars · Updated May 27, 2025

Python · 884 stars · 56 forks · Updated Mar 24, 2025

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python · 631 stars · 23 forks · Updated May 27, 2025

Collections of Papers and Projects for Multimodal Reasoning.

105 stars · 9 forks · Updated Apr 25, 2025

[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online

Python · 38 stars · 2 forks · Updated Apr 6, 2025

Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains papers, codes, datasets, evaluations, and analyses.

218 stars · 7 forks · Updated May 22, 2025

1 star · Updated Feb 20, 2025

Witness the aha moment of VLM with less than $3.

Python · 3,707 stars · 286 forks · Updated May 19, 2025

Simple RL training for reasoning

Python · 3,601 stars · 268 forks · Updated Apr 10, 2025

Fully open reproduction of DeepSeek-R1

Python · 24,631 stars · 2,277 forks · Updated May 28, 2025

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Python · 105 stars · 4 forks · Updated Apr 22, 2025

(ICLR 2024, CVPR 2024) SparseFormer

Python · 74 stars · 2 forks · Updated Nov 10, 2024

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python · 3,245 stars · 264 forks · Updated Jan 18, 2025

XTuner is a toolkit for efficiently fine-tuning LLMs

Python · 5 stars · 1 fork · Updated Apr 16, 2025