Abstract
Deep reinforcement learning has been widely studied in many fields of robotics, but its application is severely restricted by low convergence efficiency. Although demonstration information can effectively accelerate convergence, over-reliance on demonstrations degrades performance in the real environment and worsens the final convergence. In addition, historical information affects both the utilization efficiency of data and the convergence of the algorithm, yet it has received little attention so far. This paper proposes an improved reinforcement learning algorithm that introduces a demonstration-information utilization mechanism and an LSTM network on top of the Proximal Policy Optimization (PPO) algorithm. Demonstration information provides a prior knowledge base for the robot, and the utilization mechanism balances demonstration data against interaction data, thereby improving data efficiency. We also restructure the network in deep reinforcement learning to incorporate historical information. Experimental results show that the method is feasible; compared with existing solutions, it significantly improves the convergence of autonomous robot learning.
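As one illustrative sketch (not the authors' exact implementation), the balance between demonstration information and interaction information described above can be realized by drawing each training batch from two buffers, with a demonstration fraction that decays as training proceeds; the function names and the exponential decay schedule here are assumptions for illustration only.

```python
import random

def mixed_batch(demo_buffer, interaction_buffer, batch_size, demo_ratio):
    """Sample a training batch that blends demonstration and interaction data.

    demo_ratio is the fraction of the batch drawn from demonstrations;
    decaying it over training shifts reliance from prior knowledge toward
    the robot's own experience (hypothetical scheme, illustrative only).
    """
    n_demo = min(int(batch_size * demo_ratio), len(demo_buffer))
    n_inter = batch_size - n_demo
    batch = (random.sample(demo_buffer, n_demo)
             + random.sample(interaction_buffer, n_inter))
    random.shuffle(batch)  # avoid ordering bias between the two sources
    return batch

def decayed_ratio(initial_ratio, step, decay=0.999):
    """Exponentially decay the demonstration fraction with the training step."""
    return initial_ratio * (decay ** step)
```

At each PPO update, one would call `mixed_batch` with the current `decayed_ratio` so that early updates lean on the prior knowledge base while later updates are dominated by on-policy interaction data.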
Data availability
The authors do not currently permit the data to be made available.
Code availability
The authors do not currently permit the code to be made available.
Funding
This work was supported by the National Natural Science Foundation of China under Grants 61973065 and 52075531; the Fundamental Research Funds for the Central Universities of China under Grant N2104008; the Central Government Guides Local Science and Technology Development Special Fund under Grant 2021JH6/10500129; and the Innovative Talents Support Program of Liaoning Provincial Universities under Grant LR2020047.
Author information
Authors and Affiliations
Contributions
Fei Wang and Ben Cui conceived the project. Ben Cui and Yue Liu conducted the experiments in the simulation environment and collected the test data. Fei Wang and Baiming Ren completed the real-world part of the experiment. Fei Wang and Ben Cui analyzed the data and wrote the manuscript. Yue Liu and Baiming Ren provided valuable comments. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethics approval
This article does not contain any studies with human participants performed by any of the authors.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent for publication
The authors declare that they consent to publication.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, F., Cui, B., Liu, Y. et al. Deep Reinforcement Learning for Peg-in-hole Assembly Task Via Information Utilization Method. J Intell Robot Syst 106, 16 (2022). https://doi.org/10.1007/s10846-022-01713-1