Research article · ICML Conference Proceedings · DOI: 10.1145/1390156.1390218

Space-indexed dynamic programming: learning to follow trajectories

Published: 05 July 2008

Abstract

We consider the task of learning to accurately follow a trajectory in a vehicle such as a car or helicopter. A number of dynamic programming algorithms, such as Differential Dynamic Programming (DDP) and Policy Search by Dynamic Programming (PSDP), can efficiently compute non-stationary policies for these tasks; such policies are in general well-suited to trajectory following, since they can easily generate different control actions at different times in order to follow the trajectory. However, a weakness of these algorithms is that their policies are time-indexed: they apply different policies depending on the current time. This is problematic because 1) the current time may not correspond well to where we are along the trajectory, and 2) the uncertainty over states can prevent these algorithms from finding any good policy at all. In this paper we propose a method for space-indexed dynamic programming that overcomes both these difficulties. We begin by showing how a dynamical system can be rewritten in terms of a spatial index variable (i.e., how far along the trajectory we are) rather than as a function of time. We then use these space-indexed dynamical systems to derive space-indexed versions of the DDP and PSDP algorithms. Finally, we show that these algorithms perform well on a variety of control tasks, both in simulation and on real systems.
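The central construction described in the abstract, rewriting time-indexed dynamics dx/dt = f(x, u) as space-indexed dynamics dx/ds = f(x, u) / (ds/dt), where s measures progress along the reference trajectory, can be sketched roughly as follows. This is a minimal illustrative sketch only, not the paper's actual formulation: the kinematic car model, the constant `V`, and the `s_dot_along_x` progress function are assumptions chosen for a straight reference path along the x-axis.

```python
import numpy as np

# Hypothetical time-indexed kinematic car: state x = (px, py, heading),
# control u = heading rate. V is an assumed constant forward speed.
V = 1.0

def f_time(x, u):
    """Time-indexed dynamics dx/dt."""
    px, py, th = x
    return np.array([V * np.cos(th), V * np.sin(th), u])

def f_space(x, u, s_dot):
    """Space-indexed dynamics dx/ds, obtained by dividing dx/dt by
    ds/dt, the rate of progress along the reference trajectory."""
    return f_time(x, u) / s_dot

def rollout_space(x0, controls, ds, s_dot_fn):
    """Euler-integrate the space-indexed system over fixed spatial steps ds,
    so step k always corresponds to the same point along the trajectory,
    regardless of how fast the vehicle is moving."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for u in controls:
        s_dot = s_dot_fn(x)              # progress rate ds/dt at current state
        x = x + ds * f_space(x, u, s_dot)
        traj.append(x.copy())
    return np.array(traj)

# Assumed straight reference along the x-axis: progress rate is the velocity
# component along the reference direction (clamped away from zero, since the
# reparameterization is only valid while the vehicle makes forward progress).
s_dot_along_x = lambda x: max(V * np.cos(x[2]), 1e-3)

traj = rollout_space([0.0, 0.0, 0.0], controls=[0.0] * 10, ds=0.1,
                     s_dot_fn=s_dot_along_x)
```

The key property this buys is the one the abstract motivates: a non-stationary policy indexed by step k of this rollout depends on position along the trajectory rather than on elapsed time.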




Published In

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN:9781605582054
DOI:10.1145/1390156
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ICML '08
Sponsor:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Citations

Cited By

  • (2017)DMP and GMR based teaching by demonstration for a KUKA LBR robot2017 23rd International Conference on Automation and Computing (ICAC)10.23919/IConAC.2017.8081982(1-6)Online publication date: Sep-2017
  • (2017)Domain of Attraction Expansion for Physics-Based Character ControlACM Transactions on Graphics10.1145/3072959.300990736:4(1)Online publication date: 16-Jul-2017
  • (2017)Domain of Attraction Expansion for Physics-Based Character ControlACM Transactions on Graphics10.1145/300990736:2(1-11)Online publication date: 29-Mar-2017
  • (2014)Dual execution of optimized contact interaction trajectories2014 IEEE/RSJ International Conference on Intelligent Robots and Systems10.1109/IROS.2014.6942539(47-54)Online publication date: Sep-2014
  • (2013)Reinforcement learning in robotics: A surveyThe International Journal of Robotics Research10.1177/027836491349572132:11(1238-1274)Online publication date: 23-Aug-2013
  • (2013)Virtual test driver for critically stable driving maneuvers16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013)10.1109/ITSC.2013.6728495(1835-1839)Online publication date: Oct-2013
  • (2013)Trajectory optimization and optimal control of vehicle dynamics under critically stable driving conditions2013 International Conference on System Science and Engineering (ICSSE)10.1109/ICSSE.2013.6614644(117-121)Online publication date: Jul-2013
  • (2012)Reinforcement Learning in Robotics: A SurveyReinforcement Learning10.1007/978-3-642-27645-3_18(579-610)Online publication date: 2012
  • (2009)Contact-aware nonlinear control of dynamic charactersACM SIGGRAPH 2009 papers10.1145/1576246.1531387(1-9)Online publication date: 27-Jul-2009
  • (2009)Contact-aware nonlinear control of dynamic charactersACM Transactions on Graphics10.1145/1531326.153138728:3(1-9)Online publication date: 27-Jul-2009
