Abstract
We focus on the potential capability of Exploitation-oriented Learning (XoL) in non-Markov multi-agent environments. XoL retains some degree of rationality in non-Markov environments, and its effectiveness has been confirmed by computer simulations. The Penalty Avoiding Rational Policy Making algorithm (PARP), one of the XoL methods, was designed to learn a penalty-avoiding policy. Improved PARP is an extension of PARP that reduces memory use and copes with uncertainty. Although the effectiveness of Improved PARP has been confirmed in computer simulations, no results have been reported for a real-world environment. In this paper, we show the effectiveness of Improved PARP in a real-world environment using a keepaway task, a testbed for multi-agent soccer environments.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miyazaki, K., Itou, M., Kobayashi, H. (2012). Evaluation of the Improved Penalty Avoiding Rational Policy Making Algorithm in Real World Environment. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science, vol 7196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28487-8_28
DOI: https://doi.org/10.1007/978-3-642-28487-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28486-1
Online ISBN: 978-3-642-28487-8
eBook Packages: Computer Science (R0)