Research article · ICML Conference Proceedings · DOI: 10.1145/1390156.1390218

Space-indexed dynamic programming: learning to follow trajectories

Published: 05 July 2008

Abstract

We consider the task of learning to accurately follow a trajectory in a vehicle such as a car or helicopter. A number of dynamic programming algorithms, such as Differential Dynamic Programming (DDP) and Policy Search by Dynamic Programming (PSDP), can efficiently compute non-stationary policies for these tasks; such policies are in general well-suited to trajectory following, since they can easily generate different control actions at different times in order to follow the trajectory. However, a weakness of these algorithms is that their policies are time-indexed: they apply different policies depending on the current time. This is problematic because 1) the current time may not correspond well to where we are along the trajectory, and 2) the uncertainty over states can prevent these algorithms from finding any good policy at all. In this paper we propose a method for space-indexed dynamic programming that overcomes both these difficulties. We begin by showing how a dynamical system can be rewritten in terms of a spatial index variable (i.e., how far along the trajectory we are) rather than as a function of time. We then use these space-indexed dynamical systems to derive space-indexed versions of the DDP and PSDP algorithms. Finally, we show that these algorithms perform well on a variety of control tasks, both in simulation and on real systems.
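The central construction described in the abstract, rewriting time-indexed dynamics dx/dt = f(x, u) as space-indexed dynamics dx/ds = f(x, u) / (ds/dt), where s measures progress along the reference trajectory, can be sketched roughly as follows. This is a minimal illustrative sketch only, not the paper's actual formulation: the kinematic car model, the constant `V`, and the `s_dot_along_x` progress function are assumptions chosen for a straight reference path along the x-axis.

```python
import numpy as np

# Hypothetical time-indexed kinematic car: state x = (px, py, heading),
# control u = heading rate. V is an assumed constant forward speed.
V = 1.0

def f_time(x, u):
    """Time-indexed dynamics dx/dt."""
    px, py, th = x
    return np.array([V * np.cos(th), V * np.sin(th), u])

def f_space(x, u, s_dot):
    """Space-indexed dynamics dx/ds, obtained by dividing dx/dt by
    ds/dt, the rate of progress along the reference trajectory."""
    return f_time(x, u) / s_dot

def rollout_space(x0, controls, ds, s_dot_fn):
    """Euler-integrate the space-indexed system over fixed spatial steps ds,
    so step k always corresponds to the same point along the trajectory,
    regardless of how fast the vehicle is moving."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for u in controls:
        s_dot = s_dot_fn(x)              # progress rate ds/dt at current state
        x = x + ds * f_space(x, u, s_dot)
        traj.append(x.copy())
    return np.array(traj)

# Assumed straight reference along the x-axis: progress rate is the velocity
# component along the reference direction (clamped away from zero, since the
# reparameterization is only valid while the vehicle makes forward progress).
s_dot_along_x = lambda x: max(V * np.cos(x[2]), 1e-3)

traj = rollout_space([0.0, 0.0, 0.0], controls=[0.0] * 10, ds=0.1,
                     s_dot_fn=s_dot_along_x)
```

The key property this buys is the one the abstract motivates: a non-stationary policy indexed by step k of this rollout depends on position along the trajectory rather than on elapsed time.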




Published In

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN:9781605582054
DOI:10.1145/1390156
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ICML '08
Sponsor:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Citations

Cited By

  • (2017)DMP and GMR based teaching by demonstration for a KUKA LBR robot2017 23rd International Conference on Automation and Computing (ICAC)10.23919/IConAC.2017.8081982(1-6)Online publication date: Sep-2017
  • (2017)Domain of Attraction Expansion for Physics-Based Character ControlACM Transactions on Graphics10.1145/3072959.300990736:4(1)Online publication date: 16-Jul-2017
  • (2017)Domain of Attraction Expansion for Physics-Based Character ControlACM Transactions on Graphics10.1145/300990736:2(1-11)Online publication date: 29-Mar-2017
  • (2014)Dual execution of optimized contact interaction trajectories2014 IEEE/RSJ International Conference on Intelligent Robots and Systems10.1109/IROS.2014.6942539(47-54)Online publication date: Sep-2014
  • (2013)Reinforcement learning in robotics: A surveyThe International Journal of Robotics Research10.1177/027836491349572132:11(1238-1274)Online publication date: 23-Aug-2013
  • (2013)Virtual test driver for critically stable driving maneuvers16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013)10.1109/ITSC.2013.6728495(1835-1839)Online publication date: Oct-2013
  • (2013)Trajectory optimization and optimal control of vehicle dynamics under critically stable driving conditions2013 International Conference on System Science and Engineering (ICSSE)10.1109/ICSSE.2013.6614644(117-121)Online publication date: Jul-2013
  • (2012)Reinforcement Learning in Robotics: A SurveyReinforcement Learning10.1007/978-3-642-27645-3_18(579-610)Online publication date: 2012
  • (2009)Contact-aware nonlinear control of dynamic charactersACM SIGGRAPH 2009 papers10.1145/1576246.1531387(1-9)Online publication date: 27-Jul-2009
  • (2009)Contact-aware nonlinear control of dynamic charactersACM Transactions on Graphics10.1145/1531326.153138728:3(1-9)Online publication date: 27-Jul-2009
