Abstract
Embodied agents, trained to explore and navigate indoor photorealistic environments, have achieved impressive results on standard datasets and benchmarks. So far, experiments and evaluations have involved domestic and working scenes such as offices, flats, and houses. In this paper, we build and release a new 3D space with unique characteristics: that of a complete art museum, which we name ArtGallery3D (AG3D). Compared with existing 3D scenes, the collected space is larger, richer in visual features, and provides very sparse occupancy information. This is challenging for occupancy-based agents, which are usually trained in cluttered domestic environments with plenty of occupancy information. Additionally, we annotate the coordinates of the main points of interest inside the museum, such as paintings, statues, and other exhibits. Thanks to this manual process, we deliver a new benchmark for PointGoal navigation inside the new space. Trajectories in this dataset are far more complex and lengthy than existing ground-truth paths for navigation in Gibson and Matterport3D. We carry out an extensive experimental evaluation in the new space and show that existing methods hardly adapt to this scenario. We believe that the availability of this 3D model will foster future research and help improve existing solutions.
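PointGoal navigation benchmarks like the one introduced here are conventionally scored with Success weighted by Path Length (SPL). As an illustrative sketch (not code from the paper), the following shows how longer, more complex ground-truth trajectories enter that metric: a successful episode is weighted by the ratio of the shortest-path length to the length the agent actually traveled.

```python
def spl(episodes):
    """Success weighted by Path Length (SPL), averaged over episodes.

    Each episode is a tuple (success, shortest_path_len, agent_path_len).
    A successful episode contributes shortest / max(agent, shortest),
    so detours along long museum trajectories reduce the score smoothly;
    failed episodes contribute 0.
    """
    total = 0.0
    for success, shortest, agent in episodes:
        if success:
            total += shortest / max(agent, shortest)
    return total / len(episodes)


# Illustrative episodes: an optimal success, a 2x detour, and a failure.
episodes = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
print(spl(episodes))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

The episode tuples above are hypothetical; in practice the shortest-path length comes from the simulator's navigation mesh and the agent path length is accumulated during the rollout.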
Notes
- 1.
The dataset has been collected at the Galleria Estense museum of Modena and can be found at https://github.com/aimagelab/ag3d.
Acknowledgement
This work has been supported by “Fondazione di Modena” and the “European Training Network on PErsonalized Robotics as SErvice Oriented applications” (PERSEO) MSCA-ITN-2020 project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bigazzi, R., Landi, F., Cascianelli, S., Cornia, M., Baraldi, L., Cucchiara, R. (2022). Embodied Navigation at the Art Gallery. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06426-5
Online ISBN: 978-3-031-06427-2
eBook Packages: Computer Science (R0)