Computer Science > Computer Vision and Pattern Recognition

arXiv:2307.00464 (cs)

[Submitted on 2 Jul 2023 (v1), last revised 11 Aug 2023 (this version, v2)]

Title: Human-to-Human Interaction Detection

Authors: Zhenhua Wang, Kaining Ying, Jiajun Meng, Jifeng Ning

Abstract: A comprehensive understanding of human-to-human interactions of interest in video streams, such as queuing, handshaking, fighting and chasing, is of immense importance to public-security surveillance in areas such as campuses, squares and parks. Conventional human interaction recognition uses choreographed videos as inputs, neglects concurrent interactive groups, and performs detection and recognition in separate stages. In contrast, we introduce a new task named human-to-human interaction detection (HID). HID aims to detect subjects, recognize person-wise actions, and group people according to their interactive relations, all in one model. First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I), by adding frame-by-frame annotations of interactive relations. AVA-I consists of 85,254 frames and 86,338 interactive groups, and each image includes up to 4 concurrent interactive groups. Second, we present a novel baseline approach for HID, SaMFormer, which contains a visual feature extractor, a split stage that leverages a Transformer-based model to decode action instances and interactive groups, and a merging stage that reconstructs the relationship between instances and groups. All SaMFormer components are jointly trained in an end-to-end manner. Extensive experiments on AVA-I validate the superiority of SaMFormer over representative methods. The dataset and code will be made public to encourage more follow-up studies.
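
To make the HID output concrete, the following is a minimal sketch of what a per-frame prediction might look like: each detected person carries a box, a set of actions, and a group assignment, and people are then collected into interactive groups. The names `PersonInstance` and `group_people`, the `(x1, y1, x2, y2)` box layout, and the use of `-1` for a person in no group are all illustrative assumptions; they are not taken from SaMFormer or from the AVA-I annotation format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PersonInstance:
    """One detected person in a frame (hypothetical structure)."""
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) layout
    actions: Tuple[str, ...]                # person-wise action labels
    group_id: int                           # assumed: -1 means "no interactive group"


def group_people(instances: List[PersonInstance]) -> Dict[int, List[int]]:
    """Map each group id to the indices of the people assigned to it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, person in enumerate(instances):
        if person.group_id >= 0:  # skip people who interact with no one
            groups[person.group_id].append(idx)
    return dict(groups)


# A made-up frame with two concurrent interactive groups and one bystander.
frame = [
    PersonInstance((10, 20, 60, 180), ("handshaking",), 0),
    PersonInstance((70, 25, 120, 185), ("handshaking",), 0),
    PersonInstance((200, 30, 250, 190), ("chasing",), 1),
    PersonInstance((300, 35, 350, 195), ("chasing",), 1),
    PersonInstance((400, 40, 450, 200), ("standing",), -1),
]

interactive_groups = group_people(frame)
```

Here `interactive_groups` pairs the first two people in one group and the next two in another, while the bystander is left out, which mirrors the idea of several concurrent interactive groups in a single frame.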

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2307.00464 [cs.CV]
	(or arXiv:2307.00464v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.00464

Submission history

From: Kaining Ying
[v1] Sun, 2 Jul 2023 03:24:58 UTC (1,798 KB)
[v2] Fri, 11 Aug 2023 10:08:46 UTC (1,798 KB)
