Abstract
While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms.
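To make the ideas in the abstract concrete: benchmarks with known Pareto fronts let an evaluation measure how closely the set of policies an algorithm learns approximates that front. The following Python is a minimal illustrative sketch (not code from the paper) of two basic building blocks such an evaluation can rest on: filtering a set of policy value vectors to its non-dominated (Pareto) subset, and a two-objective hypervolume metric relative to a reference point. The function names, toy values, and reference point are assumptions for illustration; both routines assume all objectives are to be maximised.

```python
import numpy as np

def pareto_front(values):
    """Filter policy value vectors to their non-dominated (Pareto) subset.

    `values` is an (n_policies, n_objectives) array; all objectives are
    assumed to be maximised. Illustrative sketch, not the paper's code.
    """
    values = np.asarray(values, dtype=float)
    keep = np.ones(len(values), dtype=bool)
    for i, p in enumerate(values):
        if not keep[i]:
            continue  # p is itself dominated; anything it dominates is too
        # q is dominated by p if p >= q on every objective, p > q on at least one
        dominated = np.all(p >= values, axis=1) & np.any(p > values, axis=1)
        keep &= ~dominated
    return values[keep]

def hypervolume_2d(front, ref):
    """Hypervolume of a two-objective maximisation front w.r.t. point `ref`.

    Assumes `front` is non-dominated and every point dominates `ref`, so
    sorting by the first objective gives descending second objectives.
    """
    front = sorted(map(tuple, front))      # ascending on objective 1
    hv, prev_x = 0.0, ref[0]
    for x, y in front:
        hv += (x - prev_x) * (y - ref[1])  # vertical slab between prev_x and x
        prev_x = x
    return hv

# Toy example: three candidate policies, two objectives each.
vals = [[1.0, 5.0], [2.0, 3.0], [1.5, 2.0]]   # [1.5, 2.0] is dominated
front = pareto_front(vals)                    # -> [[1., 5.], [2., 3.]]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # -> 8.0
```

A larger hypervolume indicates a set of policies that dominates more of the objective space, which is one hedged reading of how a metric over an approximated front can support the kind of algorithm comparison the abstract describes.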