zhangzw12319

🎯

Focusing

PhD Student & CVer & OIer (past)

32 followers · 110 following

CSE Dept, SJTU
Shanghai, China

Lists (1)

✨ Inspiration

1 repository

Stars

Shaier / arxiv_summarizer

This repository provides a Python script to fetch and summarize research papers from arXiv using the free Gemini API

Python 195 25 Updated Mar 5, 2025

OpenDriveLab / UniVLA

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 246 5 Updated May 16, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,060 986 Updated May 17, 2025

DepthAnything / Video-Depth-Anything

[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Python 957 67 Updated Apr 25, 2025

AgibotTech / genie_sim

The Simulation Framework from AgiBot

Python 138 9 Updated May 7, 2025

google-deepmind / tapnet

Tracking Any Point (TAP)

Jupyter Notebook 1,519 141 Updated May 17, 2025

OpenDriveLab / OpenScene

3D Occupancy Prediction Benchmark in Autonomous Driving

Python 356 22 Updated May 27, 2024

OpenDriveLab / Nexus

Nexus: Decoupled Diffusion Sparks Adaptive Scene Generation

Python 37 4 Updated May 15, 2025

changgyhub / leetcode_101

LeetCode 101: A LeetCode Problem-Solving Guide

9,411 1,231 Updated Dec 8, 2024

ABAKA-AI / mooredata-sdk

Python 5 1 Updated May 8, 2025

wzzheng / OccSora

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

Python 174 9 Updated May 31, 2024

cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”

Python 965 48 Updated Apr 21, 2025

flymin / MagicDrive3D

Official implementation of the paper “MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes”

284 4 Updated Jun 8, 2024

BluePointLilac / ContextMenuManager

🖱️ A pure Windows right-click context menu manager

C# 14,560 691 Updated Aug 17, 2024

MINT-SJTU / STI-Bench

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Python 20 1 Updated May 9, 2025

bolianchen / nuscenes_depth

Python 16 1 Updated Jun 1, 2022

tulerfeng / Video-R1

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 520 25 Updated May 16, 2025

WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Python 321 23 Updated Jul 17, 2024

ding523 / Curr_REFT

Python 56 1 Updated Apr 17, 2025

TencentARC / GeometryCrafter

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Python 277 8 Updated Apr 28, 2025

microsoft / MoGe

[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Python 960 55 Updated Mar 23, 2025

royalmelon0505 / dist4d

50 2 Updated Mar 28, 2025

Arlo0o / UniScene-Unified-Occupancy-centric-Driving-Scene-Generation

[CVPR 2025] UniScene: Unified Occupancy-centric Driving Scene Generation

Python 381 10 Updated Apr 24, 2025

Stable-X / Stable3DGen

Python 987 51 Updated May 14, 2025

OpenDriveLab / RoboDual

RoboDual: Dual-System for Robotic Manipulation

Python 76 2 Updated Apr 28, 2025

illume-unified-mllm / ILLUME_plus

96 2 Updated Apr 3, 2025

zoeylove / Multi-cam-Multi-map-VILO

C++ 32 3 Updated Apr 17, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,422 749 Updated May 15, 2025

Jiahao000 / VICT

[CVPR 2025] Test-Time Visual In-Context Tuning

23 1 Updated Mar 28, 2025

Chrixtar / latent-gaussian-rasterization

Modified 3D Gaussian rasterizer for latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Cuda 52 6 Updated Apr 10, 2024