Research article
DOI: 10.1145/3341069.3342970

Multi-attending Memory Network for Modeling Multi-turn Dialogue

Published: 22 June 2019

Abstract

Modeling and reasoning over the dialogue history is a central challenge in building a good multi-turn conversational agent. End-to-end memory networks with recurrent or gated architectures have shown promise for conversation modeling. However, they still suffer from relatively low computational efficiency due to their complex architectures, and they depend on costly strong supervision or fixed prior knowledge. This paper proposes a multi-head attention based end-to-end approach, the multi-attending memory network, which requires no additional information or knowledge and can effectively model and reason about multi-turn dialogue history. Specifically, a parallel multi-head attention mechanism models the conversational context by attending to different important sections of the full dialogue. A stacked architecture with shortcut connections then reasons over the memory (the result of context modeling). Experiments on the bAbI-dialog datasets demonstrate the effectiveness of the proposed approach.
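
To make the abstract's two mechanisms concrete, here is a minimal PyTorch sketch of the general idea: parallel multi-head attention attends to different parts of the embedded dialogue history, and stacked hops with shortcut connections reason over that memory. All class names, the bag-of-words utterance encoder, and the output layer are illustrative assumptions; the paper's actual architecture may differ.

import torch
import torch.nn as nn

class MultiAttendingHop(nn.Module):
    # One reasoning hop: parallel multi-head attention over the dialogue
    # memory, wrapped in a shortcut (residual) connection.
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, state, memory):
        # state:  (batch, 1, d_model)        current dialogue state
        # memory: (batch, n_turns, d_model)  embedded history utterances
        attended, _ = self.attn(state, memory, memory)
        return state + attended  # shortcut connection around the hop

class MultiAttendingMemoryNetwork(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_hops=3):
        super().__init__()
        # Bag-of-words utterance encoder (an assumption made for brevity).
        self.embed = nn.EmbeddingBag(vocab_size, d_model)
        self.hops = nn.ModuleList(
            MultiAttendingHop(d_model, n_heads) for _ in range(n_hops))
        # Scores responses over a candidate vocabulary (also an assumption).
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, history, query):
        # history: (batch, n_turns, n_words) token ids of past utterances
        # query:   (batch, n_words) token ids of the current utterance
        b, t, w = history.shape
        memory = self.embed(history.view(b * t, w)).view(b, t, -1)
        state = self.embed(query).unsqueeze(1)
        for hop in self.hops:            # stacked hops reason over memory
            state = hop(state, memory)
        return self.out(state.squeeze(1))

# Example: 2 dialogues, 5 past turns of 8 tokens each, vocabulary of 1000.
net = MultiAttendingMemoryNetwork(vocab_size=1000)
scores = net(torch.randint(0, 1000, (2, 5, 8)), torch.randint(0, 1000, (2, 8)))

The shortcut connection around each hop follows the residual/highway intuition the abstract alludes to: later hops refine, rather than overwrite, the dialogue state, which also eases gradient flow through the stack.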




    Published In

    HPCCT '19: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference
    June 2019
    293 pages
    ISBN:9781450371858
    DOI:10.1145/3341069
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. multi-attending memory network
    2. multi-head attention
    3. multi-turn dialogue
    4. shortcut connections

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Strategy Priority Research Program of Chinese Academy of Sciences
    • the National Key Research and Development Program of China
    • the National Natural Science Foundation of China
    • the National Natural Science Foundation of China together with the National Research Foundation of Singapore

    Conference

    HPCCT 2019

