Computer Science > Computer Vision and Pattern Recognition

arXiv:2203.13387 (cs)

[Submitted on 24 Mar 2022]

Title: CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation

Authors: Mohammed Hassanin, Abdelwahed Khamiss, Mohammed Bennamoun, Farid Boussaid, Ibrahim Radwan

Abstract: 3D human pose estimation can be handled by encoding the geometric dependencies between body parts and enforcing kinematic constraints. Recently, Transformers have been adopted to encode the long-range dependencies between joints in the spatial and temporal domains. While Transformers excel at modeling long-range dependencies, studies have noted the need to improve the locality of vision Transformers. In this direction, we propose a novel pose estimation Transformer featuring rich representations of body joints, which are critical for capturing subtle changes across frames (i.e., inter-feature representation). Specifically, through two novel interaction modules, Cross-Joint Interaction and Cross-Frame Interaction, the model explicitly encodes the local and global dependencies between the body joints. The proposed architecture achieved state-of-the-art performance on two popular 3D human pose estimation datasets, Human3.6M and MPI-INF-3DHP. In particular, our proposed CrossFormer method boosts performance by 0.9% and 0.3% compared to the closest counterpart, PoseFormer, using the detected 2D poses and ground-truth settings, respectively.
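
To make the spatial/temporal distinction in the abstract concrete, the following is a minimal sketch of generic self-attention applied along the joint axis (within a frame) and along the frame axis (per joint). It is not the paper's Cross-Joint or Cross-Frame Interaction module, whose details the abstract does not give. The function names, the identity query/key/value projections, the per-joint temporal attention, and the toy 3-frame, 4-joint input are all assumptions made for illustration; a practical model would use learned projections and multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), purely to keep the sketch short.
    tokens: list of N feature vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][j] is how much token i
    attends to token j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        for q in tokens
    ]
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


def spatial_attention(seq):
    """Attend across joints within each frame; seq[t][j] is a joint feature."""
    return [self_attention(frame)[0] for frame in seq]


def temporal_attention(seq):
    """Attend across frames separately for each joint (an illustrative choice)."""
    num_frames, num_joints = len(seq), len(seq[0])
    per_joint = [
        self_attention([seq[t][j] for t in range(num_frames)])[0]
        for j in range(num_joints)
    ]
    # Transpose back to the [frame][joint] layout of the input.
    return [[per_joint[j][t] for j in range(num_joints)] for t in range(num_frames)]


# Hypothetical toy input: 3 frames, 4 joints, 2D coordinates per joint.
seq = [[[0.1 * t + 0.2 * j, 0.3 * j - 0.1 * t] for j in range(4)] for t in range(3)]
spatial = spatial_attention(seq)
temporal = temporal_attention(seq)
```

Both functions preserve the `[frame][joint][feature]` layout, so in this sketch they can be applied one after the other in either order.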

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.13387 [cs.CV]
	(or arXiv:2203.13387v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.13387

Submission history

From: Mohammed Hassanin
[v1] Thu, 24 Mar 2022 23:40:11 UTC (2,345 KB)
