Abstract
This paper investigates the impact of reward shaping on learning in a reinforcement-learning-based spoken dialogue system.
A diffuse reward function gives a reward after each transition between two dialogue states, whereas a sparse function gives a reward only at the end of the dialogue. Reward shaping consists of learning a diffuse reward function that does not modify the optimal policy with respect to the sparse one.
Two reward shaping methods are applied to a corpus of dialogues evaluated with numerical performance scores. Learning with the resulting diffuse functions is compared to the sparse case, and it is shown on simulated dialogues that the policies learnt after reward shaping achieve higher performance.
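For context, reward shaping is commonly formalised through the potential-based construction of Ng et al. (1999), which guarantees that adding the shaping term leaves the optimal policy unchanged. A minimal sketch is given below; the potential Φ is a generic placeholder, not the potential actually learnt from the evaluated corpus in this paper.

\[
R'(s, a, s') \;=\; R(s, a, s') + \gamma\,\Phi(s') - \Phi(s)
\]

Here R is the original (possibly sparse) reward, γ the discount factor, and Φ any real-valued function of the dialogue state. Under R', every transition receives a reward even when R is non-zero only at the end of the dialogue, and the optimal policy under R' coincides with that under R.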