Stars
[2025 ICML spotlight] When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR 2024 Highlight]
This is the repo for our Detection of Traffic Anomaly (DoTA) dataset.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
🔥[CVPR2025] EventGPT: Event Stream Understanding with Multimodal Large Language Models
[CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception"
Implementation for paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"
Official implementation of "Dense Continuous-Time Optical Flow from Event Cameras"
Official implementation of "Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection" (ECCV2024 Best Paper Candidate)
[ICRA'23] Dataset of Moving Object Detection; Official Implementation of "RGB-Event Fusion for Moving Object Detection in Autonomous Driving"
This is the implementation code for the paper "An Attention-guided Multistream Feature Fusion Network for Early Localization of Risky Traffic Agents in Driving Videos", IEEE Transactions on Intell…
a state-of-the-art-level open visual language model | multimodal pretrained model
GPT4V-level open-source multi-modal model based on Llama3-8B
[CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of the AI City Challenge 2024 Track 2.
Code for the paper "Low Latency Automotive Vision with Event Cameras", published in Nature
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
PyTorch code and models for the DINOv2 self-supervised learning method.
Event-based Vision Resources. Community effort to collect knowledge on event-based vision technology (papers, workshops, datasets, code, videos, etc)
Event-based neural networks
A suite for modeling video with Mamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
PyTorch implementation of our WACV 2023 paper "Image-Consistent Detection of Road Anomalies As Unpredictable Patches"
[NeurIPS 2024] Official implementation of MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection.