- ByteDance
- China
- UTC +08:00
- https://scholar.google.com/citations?user=PH8rJHYAAAAJ&hl
- @tiahch
Stars
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Official repository for the paper "Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers"
Code release for paper "Test-Time Training Done Right"
Open-source Multi-agent Poster Generation from Papers
Efficient Triton implementation of Native Sparse Attention.
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
[CVPR 2025] Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics" https://arxiv.org/abs/2412.11102
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
Understand and test language model architectures on synthetic tasks.
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
A curated collection of resources, tools, and frameworks for developing GUI Agents.
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training