Abstract
Many characteristics of sensorimotor control can be explained by models based on optimization and optimal control theories. However, most previous models assume that the central nervous system has precise knowledge of the sensorimotor system and the environment it interacts with. This viewpoint is difficult to justify theoretically and has not been convincingly validated by experiments. To address this problem, this paper presents a new computational mechanism for sensorimotor control from the perspective of adaptive dynamic programming (ADP), which shares some features of reinforcement learning. The ADP-based model for sensorimotor control suggests that a command signal for human movement is derived directly from real-time sensory data, without the need to identify the system dynamics. An iterative learning scheme based on the proposed ADP theory is developed, along with rigorous convergence analysis. Interestingly, the computational model advocated here reproduces the motor learning behavior observed in experiments where a divergent force field or a velocity-dependent force field was present. In addition, this modeling strategy provides a clear way to perform stability analysis of the overall system. Hence, we conjecture that human sensorimotor systems use an ADP-type mechanism to control movements and to achieve successful adaptation to uncertainties present in the environment.
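The abstract does not spell out the iterative learning scheme. As a rough illustration of the kind of iteration that ADP methods build on, the sketch below runs classical model-based policy iteration for a continuous-time linear-quadratic regulator: evaluate the current feedback gain by solving a Lyapunov equation, then improve the gain. This is not the paper's algorithm. Data-driven ADP variants typically replace the model-based evaluation step with a least-squares fit to measured trajectories, which is omitted here. The system matrices, cost weights, initial gain, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def solve_lyapunov(A_cl, M):
    """Solve A_cl.T @ P + P @ A_cl + M = 0 for P by vectorizing the equation."""
    n = A_cl.shape[0]
    I = np.eye(n)
    # Row-major vectorization: vec(X P Y) = kron(X, Y.T) vec(P).
    L = np.kron(A_cl.T, I) + np.kron(I, A_cl.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, n_iter=20):
    """Model-based policy iteration for continuous-time LQR.

    K0 must be stabilizing, i.e. A - B @ K0 has eigenvalues with
    negative real parts (an assumption of this sketch).
    """
    K = np.array(K0, dtype=float)
    for _ in range(n_iter):
        A_cl = A - B @ K
        # Policy evaluation: cost matrix P of the current gain K.
        P = solve_lyapunov(A_cl, Q + K.T @ R @ K)
        # Policy improvement: K = R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical damped point mass (illustrative values, not from the paper).
A = np.array([[0.0, 1.0], [0.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[5.0, 5.0]])  # stabilizing: closed-loop poles at -1 and -5

P, K = policy_iteration(A, B, Q, R, K0)
```

For this toy system the algebraic Riccati equation can be solved by hand, which gives a convenient check: the iteration should approach P = [[2, 1], [1, 1]] and K = [[1, 1]].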
Acknowledgments
We would like to thank the Editor and the anonymous reviewers for their constructive comments, which helped improve the presentation of this paper.
Additional information
This work has been supported in part by the National Science Foundation Grants DMS-0906659, ECCS-1101401, and ECCS-1230040.
Cite this article
Jiang, Y., Jiang, ZP. Adaptive dynamic programming as a theory of sensorimotor control. Biol Cybern 108, 459–473 (2014). https://doi.org/10.1007/s00422-014-0613-7
DOI: https://doi.org/10.1007/s00422-014-0613-7