
Reward Shaping for Statistical Optimisation of Dialogue Management

  • Conference paper
Statistical Language and Speech Processing (SLSP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)


Abstract

This paper investigates the impact of reward shaping on the learning of a reinforcement learning-based spoken dialogue system.

A diffuse reward function gives a reward after each transition between two dialogue states, whereas a sparse function gives a reward only at the end of the dialogue. Reward shaping consists of learning a diffuse reward function that does not modify the optimal policy relative to the sparse one.
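
To make the distinction concrete, the following Python sketch (an illustration, not code from the paper) contrasts the two kinds of reward function for a slot-filling dialogue; the slots_filled counter, the task_completed flag and all numeric values are hypothetical placeholders.

    # Hypothetical sparse vs. diffuse reward functions for a
    # slot-filling dialogue; states are plain dictionaries.

    def sparse_reward(state, next_state, done):
        # Sparse: a single reward at the end of the dialogue,
        # zero after every intermediate transition.
        if done:
            return 20.0 if next_state.get("task_completed") else -10.0
        return 0.0

    def diffuse_reward(state, next_state, done):
        # Diffuse: a reward after each transition between two
        # dialogue states.
        r = -1.0  # small per-turn cost, favours short dialogues
        r += 5.0 * (next_state["slots_filled"] - state["slots_filled"])
        if done and next_state.get("task_completed"):
            r += 20.0
        return r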

Two reward shaping methods are applied to a corpus of dialogues evaluated with numerical performance scores. Learning with these shaped functions is compared to the sparse case, and it is shown on simulated dialogues that the policies learnt after reward shaping lead to higher performance.
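
A classical way to construct such a policy-preserving diffuse function is potential-based reward shaping (Ng, Harada and Russell, ICML 1999): adding F(s, s') = gamma * Phi(s') - Phi(s) to every transition reward changes the learning signal but provably not the optimal policy. The sketch below illustrates that general recipe with a hypothetical potential function; it is not necessarily either of the two shaping methods studied in the paper.

    GAMMA = 0.99  # discount factor of the dialogue MDP (assumed value)

    def potential(state):
        # Hypothetical potential Phi(s): dialogue progress measured
        # by the number of slots filled so far.
        return 5.0 * state["slots_filled"]

    def shaped_reward(state, next_state, done, base_reward):
        # Potential-based shaping:
        #   R'(s, s') = R(s, s') + GAMMA * Phi(s') - Phi(s)
        # Fixing Phi to 0 at terminal states keeps the shaping
        # policy-invariant for episodic tasks.
        phi_next = 0.0 if done else potential(next_state)
        return base_reward + GAMMA * phi_next - potential(state)

Combined with the sparse function above, shaped_reward(s, s2, done, sparse_reward(s, s2, done)) yields a diffuse training signal whose optimal policy coincides with that of the original sparse reward.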

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

El Asri, L., Laroche, R., Pietquin, O. (2013). Reward Shaping for Statistical Optimisation of Dialogue Management. In: Dediu, A.-H., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science (LNAI), vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_8

  • DOI: https://doi.org/10.1007/978-3-642-39593-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39592-5

  • Online ISBN: 978-3-642-39593-2

  • eBook Packages: Computer Science (R0)
