Abstract
Dexterous manipulation tasks usually involve multiple objectives, and the priorities of these objectives may vary across the phases of a task. Current methods do not account for objective priorities or how they change during the task, so a robot struggles, or even fails, to learn a good policy. In this work, we develop a novel Adaptive Hierarchical Curriculum to guide a robot in learning manipulation tasks with multiple prioritized objectives. Our method determines the objective priorities during the learning process and updates the learning sequence of the objectives to adapt to the changing priorities in different phases. A smooth transition function is developed to mitigate the effect of updating the learning sequence on learning stability. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm in which the robot needs to manipulate a target surrounded by obstacles. The simulation and physical experiment results show that the proposed method outperforms the baseline methods, achieving a 92.5% success rate over 40 tests and taking, on average, 36.4% less time to finish the task.
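The abstract describes the mechanism only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea, not the paper's actual formulation: per-objective rewards are combined under a priority weighting, and a logistic blend stands in for the smooth transition function when the priorities are updated. All names (smooth_blend, shaped_reward, the objective labels, transition_steps) are assumptions introduced for illustration.

```python
# Illustrative sketch only; the paper's exact curriculum and reward design are not reproduced here.
import math

def smooth_blend(old_weights, new_weights, step, transition_steps=200):
    """Interpolate between two priority weightings with a logistic schedule."""
    # alpha rises smoothly from ~0 to ~1 over the transition window,
    # so the combined reward does not jump when priorities change.
    alpha = 1.0 / (1.0 + math.exp(-10.0 * (step / transition_steps - 0.5)))
    return {k: (1 - alpha) * old_weights[k] + alpha * new_weights[k]
            for k in old_weights}

def shaped_reward(objective_rewards, weights):
    """Combine per-objective rewards under the current priority weighting."""
    return sum(weights[k] * r for k, r in objective_rewards.items())

# Hypothetical example: priorities shift from reaching the target to avoiding obstacles.
old_w = {"reach": 0.7, "avoid": 0.2, "grasp": 0.1}
new_w = {"reach": 0.2, "avoid": 0.6, "grasp": 0.2}
for step in (0, 100, 200):
    w = smooth_blend(old_w, new_w, step)
    r = shaped_reward({"reach": 0.5, "avoid": -0.1, "grasp": 0.0}, w)
    print(f"step {step}: weights={w}, reward={r:.3f}")
```

Here the blend simply interpolates reward weights; the paper's transition function and priority-determination rule may differ.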
Data Availability
Not applicable.
Code Availability
Not applicable.
Funding
This material is based on work supported by the US NSF under grants 1652454 and 2114464. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Author information
Contributions
All authors contributed to the study conception and design. The first draft of the manuscript was written by Lingfeng Tao. Dr. Jiucai Zhang and Dr. Xiaoli Zhang provided comments and edits toward the final manuscript.
Ethics declarations
Ethics Approval
Ethical approval was waived by the local Ethics Committee of Colorado School of Mines in view of the retrospective nature of the study and because all the procedures performed were part of routine care.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Consent for Publication
The participants have consented to the submission of the case report to the journal.
Conflict of Interest
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Table 3 shows the performance statistics of the trained policies across the three phases over 40 evaluations, including the number of times the robot touched obstacles, the average time consumption in each phase, and the average normalized reward in each phase. LS is not included because it has no phases during training.
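As a rough illustration of how such per-phase statistics could be aggregated from evaluation logs, the following sketch is a hypothetical example, not the paper's evaluation code: the episode structure, phase names, and field names are assumptions, and the toy values are placeholders rather than the reported results in Table 3.

```python
# Hypothetical aggregation sketch; data layout and field names are assumptions.
from statistics import mean

def summarize_phases(episodes):
    """Aggregate per-phase statistics over a list of evaluation episodes.

    Each episode is assumed to be a dict mapping phase name ->
    {"touched_obstacle": bool, "time_s": float, "norm_reward": float}.
    """
    summary = {}
    for phase in episodes[0]:
        summary[phase] = {
            "obstacle_touches": sum(ep[phase]["touched_obstacle"] for ep in episodes),
            "avg_time_s": mean(ep[phase]["time_s"] for ep in episodes),
            "avg_norm_reward": mean(ep[phase]["norm_reward"] for ep in episodes),
        }
    return summary

# Toy usage with two placeholder episodes (illustrative values only)
eps = [
    {"approach": {"touched_obstacle": False, "time_s": 3.1, "norm_reward": 0.82},
     "grasp":    {"touched_obstacle": True,  "time_s": 2.4, "norm_reward": 0.65}},
    {"approach": {"touched_obstacle": False, "time_s": 2.9, "norm_reward": 0.88},
     "grasp":    {"touched_obstacle": False, "time_s": 2.2, "norm_reward": 0.71}},
]
print(summarize_phases(eps))
```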
Cite this article
Tao, L., Zhang, J. & Zhang, X. Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum. J Intell Robot Syst 106, 1 (2022). https://doi.org/10.1007/s10846-022-01680-7