https://scholar.google.com/citations?user=5iSEcFkAAAAJ&hl=en
Stars
[ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts
🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
A Paper List for Humanoid Robot Learning.
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
A curated list of recent diffusion models for video generation, editing, and various other applications.
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
DFloat11: Lossless LLM Compression for Efficient GPU Inference
A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms
This repository collects papers on VLLM applications. We will update new papers irregularly.
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
Implementing DeepSeek R1's GRPO algorithm from scratch
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Official inference framework for 1-bit LLMs
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
Let's make video diffusion practical!
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o-level performance
🤖 The Full Process Python Package for Robot Learning from Demonstration and Robot Manipulation
Democratizing Reinforcement Learning for LLMs
flow-pilot is an openpilot-based driver assistance system that runs on Linux, Windows, and Android-powered machines.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
A fully open-source, low-cost bipedal humanoid robot costing only 20,000 RMB ($3000)
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model