Stars
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Code for "PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection".
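AGI extension toolkit supporting AI search, web crawling, and data cleaning, ready to use out of the box. Supports tavily, Tiangong, Baidu Baike, Baijiahao, 360 Baike, Toutiao, WeChat Official Accounts, Sohu Baike, Tencent News, NetEase News, Mafengwo, and Xiaohongshu.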
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
✔ (Completed) The most comprehensive deep learning notes [Tudui: PyTorch] [Mu Li: Dive into Deep Learning] [Andrew Ng: Deep Learning]
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection CVPR 2025
[AAAI 2025] Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Series of work (ECCV2020, CVPR2021, CVPR2021, ECCV2022) about Compositional Learning for Human-Object Interaction Exploration
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
ligang-cs / njustPhDRoad
Forked from dichen-cd/njustPhDRoad
From thesis proposal to leaving campus: the road to a PhD at Nanjing University of Science and Technology
Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL)
[NeurIPS 2024] Official code for paper "EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection"
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
[CVPR2020] "Detecting Attended Visual Targets in Video"
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
Official repo of the paper "Object-aware Gaze Target Detection" (ICCV 2023)
CVPR2022 Distillation Using Oracle Queries for Transformer-based Human-Object Interaction Detection
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
[CVPR'22] Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"
Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection"