National University of Singapore
Singapore
Stars
Wan: Open and Advanced Large-Scale Video Generative Models
TradingAgents: Multi-Agents LLM Financial Trading Framework
[arXiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!
IEAP: Image Editing As Programs with Diffusion Models
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
This repository includes the official implementation of our paper "Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers"
[Preprint 2025] Thinkless: LLM Learns When to Think
Dimple, the first Discrete Diffusion Multimodal Large Language Model
In-context subject-driven image generation while preserving foreground fidelity
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients.
The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training
Understanding R1-Zero-Like Training: A Critical Perspective
[ICML 2025] Official PyTorch implementation of paper "Ultra-Resolution Adaptation with Ease".
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you, the human researcher, in implementing your research ideas.
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
[CVPR 2025] Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV 2025)
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
PE3R: Perception-Efficient 3D Reconstruction. Take 2-3 photos with your phone, upload them, wait a few minutes, and then start exploring your 3D world via text!
verl: Volcano Engine Reinforcement Learning for LLMs
SGLang is a fast serving framework for large language models and vision language models.
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
[ICCV 2025] Official implementation of paper "Towards Performance Consistency in Multi-Level Model Collaboration"
CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.