Computer Science > Robotics

arXiv:2311.00924 (cs)

[Submitted on 2 Nov 2023]

Title: The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

Authors: Carmelo Sferrazza, Younggyo Seo, Hao Liu, Youngwoon Lee, Pieter Abbeel

Abstract: Humans rely on the synergy of their senses for most essential tasks. For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch. This paper draws inspiration from such capabilities and aims to find a systematic approach to fuse visual and tactile information in a reinforcement learning setting. We propose Masked Multimodal Learning (M3L), which jointly learns a policy and visual-tactile representations based on masked autoencoding. The representations jointly learned from vision and touch improve sample efficiency, and unlock generalization capabilities beyond those achievable through each of the senses separately. Remarkably, representations learned in a multimodal setting also benefit vision-only policies at test time. We evaluate M3L on three simulated environments with both visual and tactile observations: robotic insertion, door opening, and dexterous in-hand manipulation, demonstrating the benefits of learning a multimodal policy. Code and videos of the experiments are available at this https URL.

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.00924 [cs.RO]
	(or arXiv:2311.00924v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2311.00924

Submission history

From: Carmelo Sferrazza
[v1] Thu, 2 Nov 2023 01:33:00 UTC (3,367 KB)
