
Learning dexterous in-hand manipulation

Published: 01 January 2020

Abstract

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies that can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system such as friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM.
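
The approach summarized above trains entirely in simulation and relies on resampling simulator properties (e.g., friction coefficients and the object's appearance) at the start of every episode, so that the resulting policy is robust enough to transfer to the physical hand. A minimal Python sketch of that per-episode randomization loop is given below; the SimEnv class, its parameter names, and the sampling ranges are illustrative assumptions, not the environment or values used in the paper.

    import numpy as np

    # Illustrative sketch of per-episode domain randomization (not the paper's code).
    # SimEnv is a toy stand-in for the physics simulation of the hand and object.

    class SimEnv:
        def __init__(self):
            self.friction = 1.0                             # nominal friction coefficient
            self.object_color = np.array([0.5, 0.5, 0.5])   # nominal RGB appearance

        def reset(self):
            # A real environment would also reset joint states and the object pose.
            return np.zeros(4)                              # placeholder observation

    def randomize(env, rng):
        # Friction: log-uniform multiplicative noise keeps the coefficient positive.
        env.friction = np.exp(rng.uniform(np.log(0.7), np.log(1.3)))
        # Appearance: a random RGB color stands in for texture/lighting randomization.
        env.object_color = rng.uniform(0.0, 1.0, size=3)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        env = SimEnv()
        for episode in range(3):
            randomize(env, rng)                             # a new "world" every episode
            obs = env.reset()                               # a policy rollout would start here
            print(f"episode {episode}: friction={env.friction:.3f}, "
                  f"color={np.round(env.object_color, 2)}")

Because the policy never experiences the same simulated world twice, it must learn strategies that work across the whole randomized family of simulators, which is what makes transfer to the unrandomized physical system plausible.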




Information

Published In

International Journal of Robotics Research, Volume 39, Issue 1, January 2020, 155 pages
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Publisher

Sage Publications, Inc., United States

Publication History

Published: 01 January 2020

Author Tags

  1. Dexterous manipulation
  2. multifingered hands
  3. adaptive control
  4. learning and adaptive systems
  5. humanoid robots

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 11 Dec 2024


Cited By

  • (2024) ELA: Exploited Level Augmentation for Offline Learning in Zero-Sum Games. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2357–2359. DOI: 10.5555/3635637.3663159. Online publication date: 6-May-2024.
  • (2024) Mastering Robot Control through Point-based Reinforcement Learning with Pre-training. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2198–2200. DOI: 10.5555/3635637.3663106. Online publication date: 6-May-2024.
  • (2024) Policy Learning for Off-Dynamics RL with Deficient Support. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1093–1100. DOI: 10.5555/3635637.3662965. Online publication date: 6-May-2024.
  • (2024) Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey. ACM Computing Surveys 57(4): 1–35. DOI: 10.1145/3703453. Online publication date: 10-Dec-2024.
  • (2024) A Projection-based Exploration Method for Multi-Agent Coordination. Proceedings of the 2024 3rd International Symposium on Intelligent Unmanned Systems and Artificial Intelligence, pp. 8–14. DOI: 10.1145/3669721.3669723. Online publication date: 17-May-2024.
  • (2024) A configuration of multi-agent reinforcement learning integrating prior knowledge. Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition, pp. 1–6. DOI: 10.1145/3663976.3664019. Online publication date: 26-Apr-2024.
  • (2024) Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics. ACM SIGGRAPH 2024 Conference Papers, pp. 1–10. DOI: 10.1145/3641519.3657505. Online publication date: 13-Jul-2024.
  • (2024) Safe Controller Synthesis for Nonlinear Systems Using Bayesian Optimization Enhanced Reinforcement Learning. Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, pp. 1–10. DOI: 10.1145/3641513.3650137. Online publication date: 14-May-2024.
  • (2024) Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures. ACM Computing Surveys 56(6): 1–39. DOI: 10.1145/3640312. Online publication date: 12-Jan-2024.
  • (2024) Offline Reinforcement Learning for Optimizing Production Bidding Policies. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5251–5259. DOI: 10.1145/3637528.3671555. Online publication date: 25-Aug-2024.