- Reconova
- Beijing
- UTC +08:00
- https://alfredxiangwu.github.io/
- https://orcid.org/0000-0001-5317-1338
- in/wuxiang123
- https://scholar.google.com/citations?user=ZykwvvYAAAAJ
Stars
Solve Visual Understanding with Reinforced VLMs
SpatialLM: Large Language Model for Spatial Understanding
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
A high-throughput and memory-efficient inference and serving engine for LLMs
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Illumination Drawing Tools for Text-to-Image Diffusion Models
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
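Many container images are hosted overseas, for example on gcr. Downloading them from within China is slow and needs acceleration. Committed to providing a stable, reliable, and secure container image service that connects the whole world.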
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
This is a toolbox repository to help evaluate various methods that perform image matching from a pair of images.
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Effortless data labeling with AI support from Segment Anything and other awesome models.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A curated list of awesome header-only C++ libraries
OpenXRLab Structure-from-Motion Toolbox and Benchmark
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification (ICCV2021)
Header-only C++/python library for fast approximate nearest neighbors
[TPAMI 2021] DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition
TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search (ECCV2020)