
Adaptive dynamic programming as a theory of sensorimotor control

  • Original Paper
  • Biological Cybernetics

Abstract

Many characteristics of sensorimotor control can be explained by models based on optimization and optimal control theories. However, most previous models assume that the central nervous system has precise knowledge of the sensorimotor system and the environment it interacts with. This assumption is difficult to justify theoretically and has not been convincingly validated by experiments. To address this problem, this paper presents a new computational mechanism for sensorimotor control from the perspective of adaptive dynamic programming (ADP), which shares some features of reinforcement learning. The ADP-based model suggests that the command signal for human movement is derived directly from real-time sensory data, without the need to identify the system dynamics. An iterative learning scheme based on the proposed ADP theory is developed, along with rigorous convergence analysis. Interestingly, the computational model advocated here is able to reproduce the motor learning behavior observed in experiments where a divergent force field or a velocity-dependent force field was present. In addition, this modeling strategy provides a clear way to perform stability analysis of the overall system. Hence, we conjecture that human sensorimotor systems use an ADP-type mechanism to control movements and to achieve successful adaptation to uncertainties present in the environment.
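
To make the idea of learning a controller from data without identifying the dynamics more concrete, here is a minimal sketch. It is not the scheme developed in the paper, whose details are not given in this abstract. It swaps in a simpler, well-known technique: least-squares policy iteration on a quadratic Q-function for a deterministic, discrete-time, scalar linear system. The plant constants `A_TRUE` and `B_TRUE`, the cost weights, and all function names are assumptions chosen for illustration. The learner only sees sampled triples `(x, u, x_next)` and never reads the plant constants.

```python
import random

# Hypothetical scalar plant x_next = A_TRUE*x + B_TRUE*u (assumed values).
# The learner never reads these constants; it only observes samples from plant().
A_TRUE, B_TRUE = 1.1, 0.5   # open-loop unstable
Q_COST, R_COST = 1.0, 1.0   # stage cost q*x^2 + r*u^2 (assumed weights)


def plant(x, u):
    """Black-box environment: returns the next state."""
    return A_TRUE * x + B_TRUE * u


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def evaluate_policy(k, samples):
    """Fit Q_k(x,u) = h0*x^2 + 2*h1*x*u + h2*u^2 for the policy u = -k*x.

    Uses the Bellman equation Q_k(x,u) - Q_k(x', -k*x') = q*x^2 + r*u^2,
    which is linear in (h0, h1, h2), and solves it by least squares.
    """
    G = [[0.0] * 3 for _ in range(3)]
    g = [0.0] * 3
    for x, u, x_next in samples:
        d = [x * x - x_next * x_next,
             2 * x * u + 2 * k * x_next * x_next,
             u * u - k * k * x_next * x_next]
        cost = Q_COST * x * x + R_COST * u * u
        for i in range(3):
            g[i] += d[i] * cost
            for j in range(3):
                G[i][j] += d[i] * d[j]
    return solve_linear(G, g)   # normal equations (D^T D) h = D^T c


def learn_gain(k0, iterations=10, n_samples=50, seed=0):
    """Policy iteration from data, starting at a stabilizing gain k0."""
    rng = random.Random(seed)
    k = k0
    for _ in range(iterations):
        samples = []
        for _ in range(n_samples):
            x = rng.uniform(-1.0, 1.0)
            u = -k * x + rng.uniform(-1.0, 1.0)   # current policy plus exploration
            samples.append((x, u, plant(x, u)))
        h0, h1, h2 = evaluate_policy(k, samples)
        k = h1 / h2   # improvement step: minimize Q_k(x, u) over u
    return k


def riccati_gain(iterations=1000):
    """Model-based reference gain, used only to check the learned result."""
    p = Q_COST
    for _ in range(iterations):
        p = (Q_COST + A_TRUE ** 2 * p
             - (A_TRUE * B_TRUE * p) ** 2 / (R_COST + B_TRUE ** 2 * p))
    return A_TRUE * B_TRUE * p / (R_COST + B_TRUE ** 2 * p)
```

Calling `learn_gain(1.0)` starts from a gain that already stabilizes the assumed plant and should approach the value returned by `riccati_gain()`, even though the learning loop never uses `A_TRUE` or `B_TRUE` directly. The exploration term added to `u` matters: without it, the sampled data would not determine all three coefficients of the Q-function.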



Acknowledgments

We would like to thank the Editor and anonymous reviewers for the constructive comments that are helpful for improving the presentation of this paper.

Author information

Corresponding author

Correspondence to Zhong-Ping Jiang.

Additional information

This work has been supported in part by the National Science Foundation Grants DMS-0906659, ECCS-1101401, and ECCS-1230040.

Cite this article

Jiang, Y., Jiang, ZP. Adaptive dynamic programming as a theory of sensorimotor control. Biol Cybern 108, 459–473 (2014). https://doi.org/10.1007/s00422-014-0613-7

  • DOI: https://doi.org/10.1007/s00422-014-0613-7
