Abstract
Preference-based reinforcement learning has recently been introduced as a generalization of conventional reinforcement learning. Instead of numerical rewards, which are often difficult to specify, it assumes weaker feedback in the form of qualitative preferences between states or trajectories. A specific realization of preference-based reinforcement learning is approximate policy iteration using label ranking. We propose an extension of this method in which label ranking is replaced by so-called dyad ranking. The main advantage of this extension is the ability of dyad ranking to learn from feature descriptions of actions, which are often available in reinforcement learning. Several simulation studies are conducted to confirm the usefulness of the approach.
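To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of how a dyad ranker can induce a policy: dyads of state and action feature vectors are scored by a bilinear utility u(s, a) = s^T W a, in the spirit of the bilinear Plackett-Luce model, and W is trained from pairwise preferences between actions. The class and function names, the dimensions, and the update rule are illustrative assumptions.

import numpy as np

# Minimal sketch (not the authors' implementation): a dyad ranker with a
# bilinear utility u(s, a) = s^T W a over state features s and action
# features a, trained from pairwise preferences between actions.

class BilinearDyadRanker:
    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = np.zeros((state_dim, action_dim))  # bilinear weight matrix
        self.lr = lr

    def utility(self, s, a):
        # Latent utility of the dyad (s, a).
        return float(s @ self.W @ a)

    def update(self, s, a_pref, a_other):
        # One stochastic gradient step on the pairwise logistic loss
        # -log sigmoid(u(s, a_pref) - u(s, a_other)), i.e. the two-item
        # Plackett-Luce (Bradley-Terry) likelihood.
        margin = self.utility(s, a_pref) - self.utility(s, a_other)
        coef = 1.0 / (1.0 + np.exp(margin))  # = sigmoid(-margin)
        self.W += self.lr * coef * np.outer(s, a_pref - a_other)

def greedy_policy(ranker, s, actions):
    # The induced policy: in state s, choose the top-ranked action.
    return max(actions, key=lambda a: ranker.utility(s, a))

In each policy-iteration round, rollouts from sampled states yield qualitative comparisons between candidate actions; these dyadic preferences are used to update the ranker, whose greedy policy is then used in the next round.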
Notes
1. Note that the number of actions is not fixed per rollout but rather depends on the quality of the current policy. In particular, a rollout may stop prematurely before the maximal trajectory length L is reached.
2. Throughout all experiments, we used the RPC method in conjunction with logistic regression (see the sketch below).
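As a rough illustration of this setup, not the exact experimental code, RPC (ranking by pairwise comparison) fits one binary logistic model per pair of labels from the observed pairwise preferences and ranks labels at prediction time by soft voting over the pairwise probabilities. The data format and all names below are illustrative assumptions.

from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch of RPC with logistic base learners. Training data:
# a feature matrix X and preferences of the form (instance index,
# preferred label, other label). One binary classifier per label pair.

def train_rpc(X, pairwise_prefs, labels):
    models = {}
    for li, lj in combinations(labels, 2):
        idx, y = [], []
        for n, winner, loser in pairwise_prefs:
            if {winner, loser} == {li, lj}:
                idx.append(n)
                y.append(1 if winner == li else 0)
        if len(set(y)) == 2:  # both outcomes needed to fit a classifier
            models[(li, lj)] = LogisticRegression().fit(X[idx], y)
    return models

def rank_labels(models, x, labels):
    # Soft voting: each pairwise model distributes one vote between its
    # two labels according to the predicted preference probability.
    votes = {l: 0.0 for l in labels}
    for (li, lj), model in models.items():
        p = model.predict_proba(x.reshape(1, -1))[0, 1]  # P(li preferred to lj)
        votes[li] += p
        votes[lj] += 1.0 - p
    return sorted(labels, key=votes.get, reverse=True)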
Acknowledgements
This work was supported by the German Research Foundation (DFG) within the Collaborative Research Center “On-The-Fly Computing” (SFB 901). We are grateful to Javad Rahnama for his help with the case study on image pipeline configuration.
Cite this paper
Schäfer, D., Hüllermeier, E.: Preference-based reinforcement learning using dyad ranking. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds.) Discovery Science. DS 2018. Lecture Notes in Computer Science, vol. 11198. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01771-2_11