Abstract
Grasping objects intelligently is a challenging task even for humans, and we spend a considerable part of our childhood learning to grasp objects correctly. In the case of robots, we cannot afford to spend that much time teaching them to grasp objects effectively. In the present research we therefore propose an efficient learning architecture based on VQVAE, so that robots can be trained for correct grasping with sufficient data. However, obtaining sufficient labelled data is extremely difficult in the robot grasping domain. To help solve this problem, we investigate a semi-supervised learning-based model that generalizes much better even with a limited labelled dataset. Its performance shows a 6% improvement over existing state-of-the-art models, including our earlier model. During experimentation, we observed that our proposed model, RGGCNN2, performs significantly better, both in grasping isolated objects and in grasping objects in cluttered environments, than existing approaches that do not use unlabelled data for generating grasping rectangles. To the best of our knowledge, an intelligent robot grasping model of this kind, based on semi-supervised learning, trained through representation learning, and exploiting the high-quality learning ability of the GGCNN2 architecture with a limited labelled dataset together with the learned latent embeddings, can serve as a de facto training method; this has been established and validated in this paper through rigorous hardware experiments using the Baxter (Anukul) research robot (video demonstration).
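The cascaded idea summarized above, in which a VQVAE learns discrete latent embeddings (potentially also from unlabelled depth images) and a GGCNN2-style fully convolutional head maps those embeddings to pixel-wise grasp parameters, can be sketched as follows. This is a minimal illustrative sketch in PyTorch rather than the authors' implementation; all layer sizes, the codebook size, and the module names (VectorQuantizer, GraspHead, RGGCNN2Sketch) are assumptions introduced here for illustration.

```python
# Minimal sketch (not the authors' released code) of a VQVAE encoder cascaded
# with a GGCNN2-style grasp head. All hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with straight-through gradients."""
    def __init__(self, num_codes=128, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                                   # z: (B, C, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)          # (B*H*W, C)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        zq = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # Codebook + commitment losses (VQ-VAE), straight-through estimator.
        vq_loss = F.mse_loss(zq, z.detach()) + self.beta * F.mse_loss(z, zq.detach())
        zq = z + (zq - z).detach()
        return zq, vq_loss


class GraspHead(nn.Module):
    """GGCNN2-style head: pixel-wise grasp quality, angle (as sin/cos of 2θ), width."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.quality = nn.Conv2d(16, 1, 1)
        self.cos2 = nn.Conv2d(16, 1, 1)
        self.sin2 = nn.Conv2d(16, 1, 1)
        self.width = nn.Conv2d(16, 1, 1)

    def forward(self, feat):
        f = self.deconv(feat)
        return self.quality(f), self.cos2(f), self.sin2(f), self.width(f)


class RGGCNN2Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                        # depth image -> latent grid
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.vq = VectorQuantizer(code_dim=64)
        self.head = GraspHead(in_ch=64)

    def forward(self, depth):                                # depth: (B, 1, H, W)
        zq, vq_loss = self.vq(self.encoder(depth))
        return self.head(zq), vq_loss


if __name__ == "__main__":
    model = RGGCNN2Sketch()
    (q, cos2, sin2, width), vq_loss = model(torch.randn(2, 1, 224, 224))
    print(q.shape, vq_loss.item())                           # torch.Size([2, 1, 224, 224])
```

Under this sketch, one plausible reading of the semi-supervised setup is that the grasp-map losses are minimized only on labelled images, while the vector-quantization loss (together with a reconstruction decoder, omitted here for brevity) is also trained on unlabelled depth images to shape the latent embeddings.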
Abbreviations
- VAE: Variational auto-encoder
- VQVAE: Vector-quantized VAE
- CNN: Convolutional neural network
- GGCNN: Generative grasp CNN
- GGCNN2: Generative grasp CNN-2
- RGGCNN: Representation-based GGCNN
- RGGCNN2: Representation-based GGCNN2
Acknowledgements
The present research is partially funded by the I-Hub foundation for Cobotics (Technology Innovation Hub of IIT-Delhi set up by the Department of Science and Technology, Govt. of India).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shukla, P., Kushwaha, V. & Nandi, G.C. Development of a robust cascaded architecture for intelligent robot grasping using limited labelled data. Machine Vision and Applications 34, 99 (2023). https://doi.org/10.1007/s00138-023-01459-2