Random-TD Function Approximator

Hassab Elgawi Osman

Image Science and Engineering Lab, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan

Received: July 7, 2008

Accepted: January 9, 2009

Published: March 20, 2009

Keywords: adaptive control, function approximation (FA), TD-learning, random forests (RF)

Abstract

In this paper, an adaptive controller architecture based on a combination of temporal-difference (TD) learning and an on-line variant of the Random Forest (RF) classifier is proposed. We call this implementation Random-TD. The approach iteratively improves its control strategies by exploiting only the relevant parts of the action space and is able to learn entirely in on-line mode. This capability for on-line adaptation brings us closer to the goal of more robust and adaptable control. To demonstrate the applicability of the approach, it has been applied to a non-linear, non-stationary control task, Cart-Pole balancing, and to high-dimensional control problems (Ailerons, Elevator, Kinematics, and Friedman). The results demonstrate that our hybrid approach is adaptable and can significantly improve the performance of TD methods while speeding up the learning process.
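
As a rough illustration of the general idea of pairing TD learning with an ensemble function approximator, the sketch below runs TD(0) on a small random-walk chain and stores the value function in an average of piecewise-constant predictors built on random thresholds. This is not the Random-TD implementation described in the paper: the random-partition ensemble is only a simple stand-in for an on-line random forest, and the class name `RandomPartitionEnsemble`, the function `run_td0`, the five-state chain, and all parameter values are assumptions chosen for illustration.

```python
import bisect
import random


class RandomPartitionEnsemble:
    """Hypothetical stand-in for an on-line forest: each member splits
    [0, 1] at random thresholds and stores one value per cell."""

    def __init__(self, n_members, n_splits, rng):
        self.cuts = [sorted(rng.random() for _ in range(n_splits))
                     for _ in range(n_members)]
        self.leaves = [[0.0] * (n_splits + 1) for _ in range(n_members)]

    def _leaf_ids(self, x):
        # Index of the cell that x falls into, for every member.
        return [bisect.bisect(cuts, x) for cuts in self.cuts]

    def predict(self, x):
        # Ensemble prediction is the mean of the members' cell values.
        ids = self._leaf_ids(x)
        total = sum(leaf[i] for leaf, i in zip(self.leaves, ids))
        return total / len(self.leaves)

    def update(self, x, td_error, alpha):
        # Shift every active cell, so the prediction at x moves by
        # alpha * td_error.
        for leaf, i in zip(self.leaves, self._leaf_ids(x)):
            leaf[i] += alpha * td_error


def run_td0(episodes=2000, alpha=0.05, gamma=1.0, seed=0, n_states=5):
    """TD(0) on a random walk: states 1..n_states, start in the middle,
    reward 1 for leaving on the right and 0 for leaving on the left."""
    rng = random.Random(seed)
    model = RandomPartitionEnsemble(n_members=10, n_splits=8, rng=rng)
    scale = n_states + 1  # maps state s to x = s / scale in (0, 1)
    for _ in range(episodes):
        s = (n_states + 1) // 2
        while 1 <= s <= n_states:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next < 1 or s_next > n_states
            reward = 1.0 if s_next > n_states else 0.0
            bootstrap = 0.0 if terminal else gamma * model.predict(s_next / scale)
            td_error = reward + bootstrap - model.predict(s / scale)
            model.update(s / scale, td_error, alpha)
            s = s_next
    return [model.predict(s / scale) for s in range(1, n_states + 1)]


if __name__ == "__main__":
    print(run_td0())
```

For this chain the true state values are s/6 for s = 1, ..., 5, so the learned estimates should rise from left to right. With a constant step size they keep fluctuating around those values rather than settling exactly.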

Cite this article as:

H. Osman, “Random-TD Function Approximator,” J. Adv. Comput. Intell. Intell. Inform., Vol.13 No.2, pp. 155-161, 2009.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.