Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.11330 (cs)

[Submitted on 22 Apr 2023]

Title: Self-supervised Learning by View Synthesis

Authors: Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

Abstract: We present view-synthesis autoencoders (VSA), a self-supervised learning framework designed for vision transformers. Unlike traditional 2D pretraining methods, VSA is pre-trained with multi-view data. In each iteration, the input to VSA is one or more views of a 3D object, and the output is a synthesized image of that object in a different target pose. The decoder of VSA contains several cross-attention blocks that synthesize the target view using the source view as the value, the source pose as the key, and the target pose as the query. This simple approach achieves large-angle view synthesis and learns spatially invariant representations, which serve as a good initialization for transformers on downstream tasks such as 3D classification on ModelNet40, ShapeNet Core55, and ScanObjectNN. VSA significantly outperforms existing methods under linear probing and is competitive under fine-tuning. The code will be made publicly available.
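
The role assignment in the decoder's cross-attention (target pose as query, source pose as key, source view as value) can be illustrated with a minimal single-head sketch in NumPy. This is not the authors' implementation: the function name `pose_cross_attention`, the random stand-in embeddings, the token counts, and the absence of learned projections, multiple heads, and stacked blocks are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pose_cross_attention(target_pose_emb, source_pose_emb, source_view_tokens):
    """Single-head cross-attention with the roles described in the abstract.

    target_pose_emb:    (n_target, d)   queries, one per target-view token
    source_pose_emb:    (n_source, d)   keys, one per source-view token
    source_view_tokens: (n_source, d_v) values, encoded source-view tokens
    Returns the attended features (n_target, d_v) and the attention
    weights (n_target, n_source).
    """
    d = target_pose_emb.shape[-1]
    # Similarity between every target-pose query and every source-pose key.
    scores = target_pose_emb @ source_pose_emb.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each target token is a weighted mix of source-view tokens.
    return weights @ source_view_tokens, weights


# Hypothetical sizes and random stand-ins for real embeddings.
rng = np.random.default_rng(0)
n_source, n_target, d, d_v = 6, 4, 8, 16
source_view_tokens = rng.normal(size=(n_source, d_v))
source_pose_emb = rng.normal(size=(n_source, d))
target_pose_emb = rng.normal(size=(n_target, d))

target_features, attn = pose_cross_attention(
    target_pose_emb, source_pose_emb, source_view_tokens
)
print(target_features.shape)  # (4, 16)
```

In this sketch the attention weights depend only on the two poses, so the source-view content is routed to target positions purely by geometry. Whether the actual VSA decoder mixes content into the queries or keys is not stated in the abstract.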

Comments:	13 pages, 12 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.11330 [cs.CV]
	(or arXiv:2304.11330v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.11330

Submission history

From: Shaoteng Liu
[v1] Sat, 22 Apr 2023 06:12:13 UTC (5,200 KB)
