Multi-Hop Question Generation Using Hierarchical Encoding-Decoding and Context Switch Mechanism
Abstract
1. Introduction
2. Related Work and Background
2.1. Question Generation
2.2. Multi-Hop Question Generation
2.3. Evaluation of Question Generation
2.4. Seq2seq Generation Model and Attention-Based Decoder
RNN-Based Seq2Seq Model
2.5. Attention-Based Decoder
3. Model Architecture
3.1. Encoder
3.2. Decoder
3.3. The Context Switch Mechanism
3.4. Training Objective
4. Experiments
4.1. Data Preparation
4.2. Training and Inference Setup
4.3. Evaluation
4.3.1. Evaluated Models
- Our model-1: Our proposed hierarchical encoding-decoding QG model;
- Our model-2: The proposed QG model integrated with a larger vocabulary that eliminates all unknown (UNK) tokens;
- Semantic-Graph: A framework that builds semantic graphs over the input and encodes them with an attention-based gated graph neural network [28];
- Semantic-Graph*: Semantic-Graph with the context switch mechanism;
- RNN: A vanilla RNN-based seq2seq model;
- GPT-2: A large transformer-based language model [37]; a sketch of how such a baseline can be queried follows this list.
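The paper does not detail how the GPT-2 baseline was prompted or fine-tuned. Below is a minimal sketch of how such a baseline can be queried with the Hugging Face transformers library; the checkpoint name, prompt format, and decoding parameters are our illustrative assumptions, not the authors' setup:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical setup: the base "gpt2" checkpoint and this prompt format
# are illustrative choices, not the configuration used in the paper.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = ("Kiss and Tell is a 1945 American comedy film starring Shirley Temple. "
           "Shirley Temple was born in 1928.")
prompt = f"Context: {context}\nQuestion:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=40,           # cap the length of the generated question
    do_sample=True,              # nucleus sampling for more varied questions
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and decode only the newly generated continuation.
question = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
print(question.strip())
```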
4.3.2. Automatic Evaluation
- BLEU-N: Measures precision based on the n-gram overlap between generated questions and references [13]. We compute BLEU-[1,2,3,4] in this experiment.
- ROUGE-L: Measures precision and recall over the longest common subsequence (LCS) shared by system outputs and references [38].
- METEOR: Applies a series of stages (e.g., word stemming, synonym matching) to map unigrams between system outputs and references, then computes a weighted harmonic mean of precision and recall over those mappings, with recall weighted more heavily than precision [15]. A minimal sketch of these metrics follows this list.
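For concreteness, here is a minimal, self-contained sketch of the quantities these metrics are built on: BLEU's clipped n-gram precision, ROUGE-L's LCS-based F-score, and METEOR's recall-weighted harmonic mean. It omits BLEU's brevity penalty and geometric averaging as well as METEOR's stemming and synonym stages; in practice one would use an established implementation (e.g., nltk or sacrebleu):

```python
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Modified n-gram precision (BLEU building block):
    clipped overlap count divided by total hypothesis n-grams."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return overlap / max(sum(hyp_ngrams.values()), 1)

def lcs_len(a, b):
    """Length of the longest common subsequence, as used by ROUGE-L."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f(hyp, ref, beta=1.2):
    """ROUGE-L F-score from LCS precision and recall (beta > 1 weights recall)."""
    lcs = lcs_len(hyp, ref)
    p, r = lcs / max(len(hyp), 1), lcs / max(len(ref), 1)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p) if p and r else 0.0

def meteor_fmean(p, r):
    """METEOR's harmonic mean of unigram precision/recall,
    with recall weighted 9x over precision (Banerjee and Lavie, 2005)."""
    return 10 * p * r / (r + 9 * p) if p and r else 0.0

hyp = "which city hosted the 2008 summer olympics".split()
ref = "what city hosted the 2008 olympic games".split()
print([round(ngram_precision(hyp, ref, n), 3) for n in (1, 2, 3, 4)])  # BLEU-1..4 precisions
print(round(rouge_l_f(hyp, ref), 3))                                   # ROUGE-L F-score
```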
4.3.3. Human Evaluation
4.3.4. Questions of Different Types
5. Discussion and Future Work
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Das, B.; Majumder, M.; Phadikar, S.; Sekh, A.A. Automatic question generation and answer assessment: A survey. Res. Pract. Technol. Enhanc. Learn. 2021, 16, 1–15. [Google Scholar] [CrossRef]
- Graesser, A.C.; Chipman, P.; Haynes, B.C.; Olney, A. AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Trans. Educ. 2005, 48, 612–618. [Google Scholar] [CrossRef]
- Kurdi, G.; Leo, J.; Parsia, B.; Sattler, U.; Al-Emari, S. A systematic review of automatic question generation for educational purposes. Int. J. Artif. Intell. Educ. 2020, 30, 121–204. [Google Scholar] [CrossRef] [Green Version]
- Room, C. Question generation. Algorithms 2020, 12, 43. [Google Scholar]
- Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.; Salakhutdinov, R.; Manning, C.D. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2369–2380. [Google Scholar] [CrossRef] [Green Version]
- Heilman, M.; Smith, N.A. Question Generation via Overgenerating Transformations and Ranking. Available online: https://apps.dtic.mil/sti/citations/ADA531042 (accessed on 1 January 2009).
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014. [Google Scholar]
- Du, X.; Shao, J.; Cardie, C. Learning to Ask: Neural Question Generation for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017. [Google Scholar]
- Zhang, S.; Bansal, M. Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.; Potts, C. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013. [Google Scholar]
- Tai, K.S.; Socher, R.; Manning, C.D. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015. [Google Scholar]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002. [Google Scholar]
- Lin, C.Y.; Hovy, E. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, AB, Canada, 27 May–1 June 2003. [Google Scholar]
- Banerjee, S.; Lavie, A. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, 9–10 April 2005. [Google Scholar]
- Reiter, E. A Structured Review of the Validity of BLEU. Comput. Linguist. 2018, 44, 393–401. [Google Scholar] [CrossRef]
- Pan, L.; Lei, W.; Chua, T.; Kan, M. Recent Advances in Neural Question Generation. arXiv 2019, arXiv:1905.08949. [Google Scholar]
- Sun, X.; Liu, J.; Lyu, Y.; He, W.; Ma, Y.; Wang, S. Answer-focused and Position-aware Neural Question Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
- See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017. [Google Scholar]
- Ma, X.; Zhu, Q.; Zhou, Y.; Li, X.; Wu, D. Improving Question Generation with Sentence-level Semantic Matching and Answer Position Inferring. arXiv 2020, arXiv:1912.00879. [Google Scholar] [CrossRef]
- Chen, Y.; Wu, L.; Zaki, M.J. Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation. In Proceedings of the 2019 International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Dhole, K.; Manning, C.D. Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
- Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016. [Google Scholar]
- Zhou, W.; Zhang, M.; Wu, Y. Question-type Driven Question Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
- Tuan, L.A.; Shah, D.; Barzilay, R. Capturing greater context for question generation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Duan, N.; Tang, D.; Chen, P.; Zhou, M. Question Generation for Question Answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
- Gupta, D.; Chauhan, H.; Akella, R.T.; Ekbal, A.; Bhattacharyya, P. Reinforced Multi-task Approach for Multi-hop Question Generation. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 13–18 September 2020. [Google Scholar]
- Pan, L.; Xie, Y.; Feng, Y.; Chua, T.S.; Kan, M.Y. Semantic Graphs for Generating Deep Questions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
- Xie, Y.; Pan, L.; Wang, D.; Kan, M.Y.; Feng, Y. Exploring Question-Specific Rewards for Generating Deep Questions. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 13–18 September 2020. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989, 1, 270–280. [Google Scholar] [CrossRef]
- Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 12 December 2014. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Gardner, M.; Grus, J.; Neumann, M.; Tafjord, O.; Dasigi, P.; Liu, N.F.; Peters, M.; Schmitz, M.; Zettlemoyer, L. AllenNLP: A Deep Semantic Natural Language Processing Platform. In Proceedings of the Workshop for NLP Open Source Software (NLP-OSS), Melbourne, Australia, 15–20 July 2018. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. Available online: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (accessed on 11 October 2021).
- Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. [Google Scholar]
- Patil, C.; Patwardhan, M. Visual Question Generation: The State of the Art. ACM Comput. Surv. 2020, 53, 1–22. [Google Scholar] [CrossRef]
- Castellano, G.; Vessio, G. Deep learning approaches to pattern extraction and recognition in paintings and drawings: An overview. Neural Comput. Appl. 2021, 33, 12263–12282. [Google Scholar] [CrossRef]
Table 1. Automatic evaluation results: ROUGE-L, METEOR, and BLEU-1 through BLEU-4 scores of all evaluated models.

Model | ROUGE-L | METEOR | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
---|---|---|---|---|---|---|
Our model-2 | 27.34 | 18.25 | 26.48 | 13.87 | 8.46 | 5.54 |
Our model-1 | 26.92 | 17.66 | 26.60 | 13.69 | 8.34 | 5.47 |
RNN | 26.43 | 16.56 | 25.21 | 13.06 | 8.03 | 5.37 |
Semantic-Graph* | 26.06 | 20.71 | 27.20 | 14.41 | 9.02 | 6.05 |
GPT-2 | 26.82 | 16.62 | 27.01 | 17.72 | 12.49 | 9.14 |
Semantic-Graph | 25.74 | 20.32 | 26.55 | 13.94 | 8.57 | 5.56 |
Table 2. Human evaluation results: number of rated questions (N) and mean ratings on each criterion.

Model | N | Overall | Fluency | Relevance | Answerability | Complexity |
---|---|---|---|---|---|---|
Gold | 408 | 5.05 | 5.16 | 5.00 | 5.09 | 4.97 |
Our model-2 | 395 | 4.96 | 5.01 | 4.94 | 4.93 | 4.94 |
Our model-1 | 417 | 4.86 | 4.91 | 4.87 | 4.80 | 4.87 |
RNN | 382 | 4.83 | 4.79 | 4.89 | 4.80 | 4.85 |
Semantic-Graph* | 394 | 4.74 | 4.63 | 4.80 | 4.79 | 4.76 |
GPT-2 | 418 | 4.69 | 4.77 | 4.72 | 4.52 | 4.77 |
Semantic-Graph | 406 | 4.62 | 4.64 | 4.68 | 4.57 | 4.58 |
Table 3. Distribution of question types (%) in the reference set and in each model's generated questions.

Model | What | Which | Who | Other | How | Where | When |
---|---|---|---|---|---|---|---|
Reference | 40.8 | 23.0 | 15.9 | 10.1 | 4.1 | 4.0 | 2.2 |
Our model-2 | 55.2 | 16.6 | 16.1 | 8.4 | 0.6 | 0.3 | 3.0 |
Our model-1 | 49.6 | 23.5 | 13.8 | 9.4 | 1.0 | 0.9 | 2.1 |
RNN | 49.7 | 29.7 | 11.1 | 7.8 | 0.4 | 0.0 | 1.5 |
Semantic-Graph* | 37.7 | 21.3 | 17.0 | 14.7 | 4.1 | 4.0 | 1.4 |
GPT-2 | 30.2 | 26.1 | 11.1 | 26.4 | 2.2 | 2.4 | 1.8 |
Semantic-Graph | 36.2 | 20.2 | 15.4 | 18.7 | 3.2 | 2.9 | 3.4 |
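The paper does not state how question types were assigned for this breakdown. A common heuristic, sketched below under that assumption, keys on the question's leading wh-word and buckets everything else as "Other":

```python
from collections import Counter

WH_WORDS = {"what", "which", "who", "how", "where", "when"}

def question_type(question: str) -> str:
    """Classify a question by its leading wh-word; everything else is 'other'.
    This is an illustrative heuristic, not the paper's (unstated) procedure."""
    tokens = question.strip().lower().split()
    return tokens[0] if tokens and tokens[0] in WH_WORDS else "other"

# Tallying the type distribution over a list of generated questions:
questions = ["Which city hosted the 2008 Olympics?", "Why is the sky blue?"]
print(Counter(question_type(q) for q in questions))  # Counter({'which': 1, 'other': 1})
```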
Table 4. Automatic evaluation scores broken down by question type ("-": not available).

Model | What | Which | Who | Other | How | Where | When |
---|---|---|---|---|---|---|---|
Our model-2 | 26.57 | 29.86 | 26.73 | 28.32 | 30.23 | 25.49 | 27.76 |
Our model-1 | 26.50 | 27.15 | 25.96 | 29.29 | 29.16 | 25.86 | 29.38 |
RNN | 25.92 | 26.98 | 24.90 | 30.06 | 20.19 | - | 26.34 |
Semantic-Graph* | 26.05 | 25.80 | 25.40 | 25.67 | 29.93 | 26.25 | 31.16 |
GPT-2 | 24.15 | 27.03 | 23.50 | 30.60 | 27.26 | 23.92 | 29.16 |
Semantic-Graph | 26.11 | 26.24 | 24.74 | 24.76 | 28.79 | 27.96 | 24.07 |
Table 5. Human evaluation scores broken down by question type ("-": not available).

Model | What | Which | Who | Other | How | Where | When |
---|---|---|---|---|---|---|---|
Our model-2 | 4.97 | 4.80 | 4.94 | 5.27 | 5.63 | - | 4.82 |
Our model-1 | 4.92 | 4.76 | 4.85 | 4.60 | 5.67 | 5.00 | 5.15 |
RNN | 4.81 | 4.95 | 4.68 | 4.67 | 6.00 | - | 5.50 |
Semantic-Graph* | 4.76 | 4.62 | 4.86 | 4.88 | 4.47 | 4.84 | 3.94 |
GPT-2 | 4.95 | 4.61 | 4.36 | 4.59 | 5.61 | 5.28 | 4.04 |
Semantic-Graph | 4.63 | 4.79 | 4.59 | 4.50 | 4.42 | 4.72 | 4.17 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).