Abstract
While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms.
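To make the ideas in the abstract concrete: benchmarks with known Pareto fronts let an evaluation measure how closely the set of policies an algorithm learns approximates that front. The following Python is a minimal illustrative sketch (not code from the paper) of two basic building blocks such an evaluation can rest on: filtering a set of policy value vectors to its non-dominated (Pareto) subset, and a two-objective hypervolume metric relative to a reference point. The function names, toy values, and reference point are assumptions for illustration; both routines assume all objectives are to be maximised.

```python
import numpy as np

def pareto_front(values):
    """Filter policy value vectors to their non-dominated (Pareto) subset.

    `values` is an (n_policies, n_objectives) array; all objectives are
    assumed to be maximised. Illustrative sketch, not the paper's code.
    """
    values = np.asarray(values, dtype=float)
    keep = np.ones(len(values), dtype=bool)
    for i, p in enumerate(values):
        if not keep[i]:
            continue  # p is itself dominated; anything it dominates is too
        # q is dominated by p if p >= q on every objective, p > q on at least one
        dominated = np.all(p >= values, axis=1) & np.any(p > values, axis=1)
        keep &= ~dominated
    return values[keep]

def hypervolume_2d(front, ref):
    """Hypervolume of a two-objective maximisation front w.r.t. point `ref`.

    Assumes `front` is non-dominated and every point dominates `ref`, so
    sorting by the first objective gives descending second objectives.
    """
    front = sorted(map(tuple, front))      # ascending on objective 1
    hv, prev_x = 0.0, ref[0]
    for x, y in front:
        hv += (x - prev_x) * (y - ref[1])  # vertical slab between prev_x and x
        prev_x = x
    return hv

# Toy example: three candidate policies, two objectives each.
vals = [[1.0, 5.0], [2.0, 3.0], [1.5, 2.0]]   # [1.5, 2.0] is dominated
front = pareto_front(vals)                    # -> [[1., 5.], [2., 3.]]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # -> 8.0
```

A larger hypervolume indicates a set of policies that dominates more of the objective space, which is one hedged reading of how a metric over an approximated front can support the kind of algorithm comparison the abstract describes.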