Abstract
This paper is an extended version of a paper selected from JSAI 2019. We study data augmentation for visually-grounded language understanding in the context of a picking task. A typical picking task consists of predicting a target object specified by an ambiguous instruction, e.g., “Pick up the yellow toy near the bottle.” We show that existing methods for understanding such instructions can be improved by data augmentation. More specifically, MTCM [1] and MTCM-GAN [2] achieve better results when data augmentation is applied to latent-space features rather than raw features. Additionally, our results show that latent-space data augmentation improves network accuracy more than regularization methods.
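To make the idea concrete, the following is a minimal sketch of latent-space data augmentation by Gaussian perturbation of pre-extracted feature vectors; the function name, noise model, and hyperparameters are illustrative assumptions and not the exact MTCM/MTCM-GAN procedure described in the paper.

```python
# Minimal sketch (assumption): latent-space augmentation by adding Gaussian
# noise to pre-extracted feature vectors. Encoder choice, noise model, and
# hyperparameters are illustrative, not the exact MTCM/MTCM-GAN setup.
import numpy as np


def augment_latent(features, labels, num_copies=4, noise_std=0.05, seed=0):
    """Return the original latent features plus `num_copies` noisy copies.

    features: (N, D) array of latent vectors, e.g., visual embeddings from a
              pre-trained CNN or linguistic embeddings from a language model.
    labels:   (N,) array of target labels, replicated alongside the copies.
    """
    rng = np.random.default_rng(seed)
    feats = [features]
    labs = [labels]
    for _ in range(num_copies):
        feats.append(features + rng.normal(0.0, noise_std, size=features.shape))
        labs.append(labels)
    return np.concatenate(feats, axis=0), np.concatenate(labs, axis=0)


# Usage: 100 latent vectors of dimension 512 become 500 training samples.
X = np.random.default_rng(1).standard_normal((100, 512)).astype(np.float32)
y = np.random.default_rng(2).integers(0, 2, size=100)
X_aug, y_aug = augment_latent(X, y)
print(X_aug.shape, y_aug.shape)  # (500, 512) (500,)
```

Augmenting in latent space leaves the feature extractors untouched and only multiplies the training set seen by the downstream classifier, which is the setting contrasted above with raw-feature augmentation and with regularization.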
References
Magassouba, A., Sugiura, K., Kawai, H.: A multimodal target-source classifier model for object fetching from natural language instructions. In: Proceedings of the National Congress of the Japanese Society for Artificial Intelligence, pp. 2D3E403–2D3E403 (2019). (in Japanese)
Magassouba, A., Sugiura, K., Trinh Quoc, A., Kawai, H.: Understanding natural language instructions for fetching daily objects using GAN-based multimodal target-source classification. IEEE RA-L 4(4), 3884–3891 (2019)
Iocchi, L., Holz, D., Ruiz-del-Solar, J., Sugiura, K., Van Der Zant, T.: RoboCup@Home: analysis and results of evolving competitions for domestic and service robots. Artif. Intell. 229, 258–281 (2015)
Yu, L., Tan, H., Bansal, M., Berg, T.L.: A joint speaker-listener-reinforcer model for referring expressions. In: CVPR, vol. 2 (2017)
Magassouba, A., Sugiura, K., Kawai, H.: A multimodal classifier generative adversarial network for carry and place tasks from ambiguous language instructions. IEEE RA-L 3(4), 3113–3120 (2018)
Magassouba, A., Sugiura, K., Kawai, H.: Multimodal attention branch network for perspective-free sentence generation. In: Conference on Robot Learning (CoRL) (2019)
Cohen, V., Burchfiel, B., Nguyen, T., Gopalan, N., Tellex, S., Konidaris, G.: Grounding language attributes to objects using Bayesian eigenobjects. In: Proceedings IEEE IROS 2019 (2019)
Nagaraja, V.K., Morariu, V.I., Davis, L.S.: Modeling context between objects for referring expression understanding. In: ECCV, pp. 792–807 (2016)
Hatori, J., et al.: Interactively picking real-world objects with unconstrained spoken language instructions. In: IEEE ICRA, pp. 3774–3781 (2018)
Shridhar, M., Hsu, D.: Interactive visual grounding of referring expressions for human-robot interaction. In: RSS (2018)
Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: ICLR (2016)
Sugiura, K., Kawai, H.: Grounded language understanding for manipulation instructions using GAN-based classification. In: IEEE ASRU (2017)
Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier GANs. In: ICML, pp. 2642–2651 (2017)
Anderson, P., Wu, Q., Teney, D., Bruce, J., Johnson, M., Sünderhauf, N., Reid, I., Gould, S., van den Hengel, A.: Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments. In: CVPR, pp. 3674–3683 (2018)
Bousmalis, K., et al.: Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: Proceedings of IEEE ICRA, pp. 4243–4250 (2018)
Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., Vanhoucke, V.: Sim-to-real: learning agile locomotion for quadruped robots. In: RSS (2018)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, pp. 4171–4186 (2019)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: NIPS, pp. 5769–5779 (2017)
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: ICLR (2018)
Acknowledgement
This work was partially supported by JST CREST, SCOPE and NEDO.