An Interactive Framework of Cross-Lingual NLU for In-Vehicle Dialogue
Figure captions:
- Interactive attention model based on contrastive learning.
- Contrastive-learning-based encoder.
- Overall accuracy of MIvD in different languages under baseline.
- Experimental results of different baseline models under (a) mBERT and (b) XLM-R.
Abstract
1. Introduction
- (1) This paper provides a set of NLU datasets for an in-vehicle dialogue system in cross-lingual scenarios, covering Chinese, Arabic, Japanese, and English. Intents and slots are annotated following the annotation specification of the published MultiATIS++ [10] dataset.
- (2) It proposes a cross-lingual interactive framework for in-vehicle human–computer dialogue systems. The framework uses an end-to-end architecture for joint modeling: a pretrained encoder enhanced with contrastive learning, and an interactive attention mechanism in the downstream tasks that lets intent detection and slot filling exchange information.
- (3) It validates the IABCL framework on the self-built MIvD dataset and on public datasets. To the best of our knowledge, this is the first work to evaluate joint intent detection and slot filling for in-vehicle dialogue systems in a cross-lingual setting. The experimental results show that the attention-based interactive framework achieves remarkable results in cross-lingual intent detection and slot filling.
2. Related Work on Cross-Lingual NLU
2.1. In-Vehicle Dialogue NLU
2.2. Cross-Lingual Transfer Learning
2.3. Joint Learning Frameworks for Intent Detection and Slot Filling
3. Modeling Framework
3.1. Contrastive-Learning-Based Encoder
3.1.1. Positive Sample
3.1.2. Negative Sample
3.1.3. Loss Function of Contrastive Learning
- Sentence-level intent CL module, which aligns sentence representations across languages for intent detection (see the loss sketch after this list).
- Word-level slot CL module, which aligns word-level slot representations across languages for slot filling.
- Semantic-level intent-slot CL module, which aligns representations between slots and intents.
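All three modules can be instantiated with an InfoNCE-style objective. The sketch below shows one minimal PyTorch formulation, assuming an anchor representation, a positive view (e.g., its code-switched counterpart), and a set of negatives; the function name `info_nce` and the temperature value are illustrative, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style contrastive loss (illustrative sketch).

    anchor:    (d,)   representation of the original utterance/token
    positive:  (d,)   representation of its code-switched counterpart
    negatives: (n, d) representations of other samples in the batch/queue
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True) / temperature  # (1,)
    neg_sim = negatives @ anchor / temperature                         # (n,)
    logits = torch.cat([pos_sim, neg_sim], dim=0)                      # (1 + n,)

    # The positive pair is class 0: cross-entropy pulls the positive pair
    # together and pushes the negatives apart.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```

The same form can be applied at the sentence level (intent), the word level (slots), and between intent and slot representations, differing only in which vectors are fed in as anchor, positive, and negatives.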
3.2. Interactive-Learning-Based Decoder
- Intent detection task attention head (see the sketch after this list).
- Slot filling task attention head.
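A minimal sketch of one such bidirectional attention step is shown below, built on `torch.nn.MultiheadAttention`. The class name `InteractiveAttention` and the tensor shapes are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class InteractiveAttention(nn.Module):
    """Illustrative co-attention between the intent and slot streams.

    The intent query attends over slot (token-level) features and the slot
    features attend over the intent query, so each task's representation is
    conditioned on the other's.
    """
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        self.intent_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.slot_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, intent_repr, slot_repr):
        # intent_repr: (batch, 1, hidden)       -- e.g., the sentence ([CLS]) vector
        # slot_repr:   (batch, seq_len, hidden) -- token-level features
        intent_out, _ = self.intent_attn(intent_repr, slot_repr, slot_repr)
        slot_out, _ = self.slot_attn(slot_repr, intent_repr, intent_repr)
        return intent_out, slot_out
```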
3.3. Multitask Learning
- Loss of the intent detection task.
- Loss of the slot filling task (the combined objective is sketched after this list).
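The two task losses are combined with the three contrastive terms from Section 3.1 into a single training objective. Below is a minimal sketch of one such combination, assuming the λ1/λ2/λ3 weights listed in the experimental parameters; the exact weighting scheme used in the paper's equations may differ.

```python
import torch.nn.functional as F

def joint_loss(intent_logits, intent_labels,
               slot_logits, slot_labels,
               cl_sent, cl_word, cl_sem,
               lambda1=0.01, lambda2=0.005, lambda3=0.01,
               ignore_index=-100):
    """Multitask objective: task losses plus weighted contrastive terms (sketch)."""
    # Intent detection: one label per sentence.
    intent_loss = F.cross_entropy(intent_logits, intent_labels)
    # Slot filling: one label per token; padded positions are ignored.
    slot_loss = F.cross_entropy(
        slot_logits.view(-1, slot_logits.size(-1)),
        slot_labels.view(-1),
        ignore_index=ignore_index,
    )
    return (intent_loss + slot_loss
            + lambda1 * cl_sent + lambda2 * cl_word + lambda3 * cl_sem)
```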
4. Experimental Setup
4.1. Experimental Data
4.2. Evaluation Indicators and Baseline Model
4.2.1. Evaluation Indicators
- Intent accuracy: the percentage of sentences whose intent is predicted correctly; it measures intent detection performance.
- Slot F1: slot filling is evaluated with the F1 score, the harmonic mean of precision and recall; a slot prediction counts as correct only on an exact match [30].
- Overall accuracy: the proportion of sentences for which both the intent and all slots are predicted correctly, so it reflects intent detection and slot filling jointly [13]. All three indicators are sketched after this list.
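A minimal sketch of how the three indicators can be computed is given below. It assumes BIO-tagged slot sequences and uses the `seqeval` package for the entity-level F1, which may differ in detail from the authors' exact scoring script.

```python
from seqeval.metrics import f1_score  # entity-level F1 over BIO-tagged slots

def evaluate(intent_preds, intent_golds, slot_preds, slot_golds):
    """Compute intent accuracy, slot F1, and overall accuracy (illustrative sketch).

    intent_preds/golds: list of intent labels, one per sentence
    slot_preds/golds:   list of BIO tag sequences, one per sentence
    """
    n = len(intent_golds)
    intent_correct = [p == g for p, g in zip(intent_preds, intent_golds)]
    slot_correct = [p == g for p, g in zip(slot_preds, slot_golds)]

    intent_acc = sum(intent_correct) / n
    slot_f1 = f1_score(slot_golds, slot_preds)  # harmonic mean of precision and recall
    # A sentence counts toward overall accuracy only if the intent and every slot tag match.
    overall_acc = sum(i and s for i, s in zip(intent_correct, slot_correct)) / n
    return intent_acc, slot_f1, overall_acc
```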
4.2.2. Baseline Model
- CoSDA-ML. Qin et al. [12] proposed a dynamic code-switching method that randomly performs multilingual word-level substitutions (sketched after this list). For a fair comparison on MIvD, this paper fine-tunes with the Chinese training data plus the code-switched data.
- Multilingual-ZeroShot. Krishnan et al. [31] train a joint intent detection and slot filling model on English data and transfer it zero-shot to other languages.
- GL-CLEF. A global–local contrastive learning framework for explicit alignment, proposed by Qin et al. [13].
- LAJ-MCJ. A label-aware multi-level contrastive learning framework for explicit alignment, proposed by Liang et al. [32].
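For reference, the following sketch illustrates CoSDA-ML-style word-level code switching. The `code_switch` helper and the bilingual dictionary format are hypothetical; the 0.55 switching probability mirrors the setting listed under the experimental parameters.

```python
import random

def code_switch(tokens, bilingual_dicts, switch_prob=0.55):
    """Randomly replace words with dictionary translations in other languages
    (sketch of CoSDA-ML-style augmentation; dictionary format is assumed).

    tokens:          list of source-language words
    bilingual_dicts: {lang: {src_word: [translations]}}
    switch_prob:     probability of replacing each word
    """
    out = []
    for tok in tokens:
        if random.random() < switch_prob:
            lang = random.choice(list(bilingual_dicts))
            candidates = bilingual_dicts[lang].get(tok)
            if candidates:
                out.append(random.choice(candidates))
                continue
        out.append(tok)
    return out
```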
4.3. Experimental Parameters
5. Result Analysis and Discussion
- RQ1: Does IABCL predict better results than other cross-lingual transfer models on the MIvD dataset?
- RQ2: How well does IABCL work on public datasets?
- RQ3: What role do contrastive learning and interactive attention specifically play?
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hall, T.; Beecham, S.; Bowes, D.; Gray, D.; Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 2011, 38, 1276–1304. [Google Scholar] [CrossRef]
- Liu, S.; Qi, Z.; Wang, Y.; Tang, Q.; He, Y.; Liu, Y. Design and analysis of dialogue based on vehicle. Veh. Electr. Appl. 2023, 1, 46–48. (In Chinese) [Google Scholar] [CrossRef]
- Zhang, L. Research and Application of Vehicle-Mounted Human-Computer Interface Interaction Design for Voice Interaction; Beijing University of Posts and Telecommunications: Beijing, China, 2021. (In Chinese) [Google Scholar]
- Du, Z. Research on dialogue System of Vehicle HUD Based on Gesture Recognition; Hebei University of Science and Technology: Shijiazhuang, China, 2019. (In Chinese) [Google Scholar]
- Gao, Q.; Xu, Y. Dialogue system of vehicle touch control. Instrum. Technol. Sens. 2012, 2, 72–74. (In Chinese) [Google Scholar]
- Khalil, T. Cross-lingual intent classification in a low resource industrial setting. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar]
- Arora, A. Cross-lingual transfer learning for intent detection of COVID-19 utterances. In Proceedings of the Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online, 15 August–4 September 2020. [Google Scholar]
- Gerz, D.; Su, P.-H.; Kusztos, R.; Mondal, A.; Lis, M.; Singhal, E.; Mrkšić, N.; Wen, T.-H.; Vulić, I. Multilingual and Cross-Lingual Intent Detection from Spoken Data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online, 7–11 November 2021; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 7468–7475. [Google Scholar]
- Liu, Z.; Winata, G.I.; Lin, Z.; Xu, P.; Fung, P. Attention-informed mixed-language training for zero-shot cross-lingual task-oriented dialogue systems. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8433–8440. [Google Scholar]
- Xu, W.; Haider, B.; Mansour, S. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv 2020, arXiv:2004.14353. [Google Scholar]
- FitzGerald, J.G. STIL-Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++. arXiv 2020, arXiv:2010.00760. [Google Scholar]
- Qin, L. CoSDA-ML: Multi-lingual code-switching data augmentation for zero-shot cross-lingual NLP. arXiv 2020, arXiv:2006.06402. [Google Scholar]
- Qin, L.; Chen, Q.; Xie, T. GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding. arXiv 2022, arXiv:2204.08325. [Google Scholar]
- Yang, B. Design and Implementation of Vehicle Speech Control System Based on Oral Comprehension by Rules and Statistical Methods; Hebei University of Science and Technology: Shijiazhuang, China, 2018. (In Chinese) [Google Scholar]
- Rafiepour, M.; Sartakhti, J.S. CTRAN: CNN-Transformer-based Network for NLU. arXiv 2023, arXiv:2303.10606. [Google Scholar]
- Ruder, S.; Vulić, I.; Søgaard, A. A survey of cross-lingual word embedding models. J. Artif. Intell. Res. 2019, 65, 569–631. [Google Scholar] [CrossRef]
- Zhuang, F.; Luo, P.; He, Q.; Shi, Z. Research progress of transfer learning. J. Softw. Sci. 2014, 26, 26–39. (In Chinese) [Google Scholar]
- Lefevre, F.; Mairesse, F.; Young, S. Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation. In Proceedings of the Eleventh Annual Conference of the International Speech Communication Association, Chiba, Japan, 26–30 September 2010. [Google Scholar]
- Adams, O. Cross-lingual word embeddings for low-resource language modeling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain, 3–7 April 2017. [Google Scholar]
- Johnson, A. Cross-lingual transfer learning for Japanese named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
- Li, Y. Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding. In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018. [Google Scholar]
- Schuster, S.; Gupta, S.; Shah, R.; Lewis, M. Cross-lingual transfer learning for multilingual task-oriented dialog. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
- Zhang, X.; Wang, H. A joint model of intent determination and slot filling for spoken language understanding. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016. [Google Scholar]
- Wang, Y.; Shen, Y.; Jin, H. A bi-model based on semantic frame parsing model for intent detection and slot filling. arXiv 2018, arXiv:1812.10235. [Google Scholar]
- Liu, B.; Lane, I. Attention-based recurrent neural network models for joint intent detection and slot filling. In Proceedings of the Interspeech, San Francisco, CA, USA, 8–12 September 2016. [Google Scholar]
- Chen, Q.; Zhuo, Z.; Wang, W. BERT for joint intent classification and slot filling. arXiv 2019, arXiv:1902.10909. [Google Scholar]
- Zhou, P.; Huang, Z.; Liu, F.; Zou, Y. PIN: A novel parallel interactive network for spoken language understanding. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021. [Google Scholar]
- Qin, L. A co-interactive transformer for joint slot filling and intent detection. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021. [Google Scholar]
- Zhang, C.; Chen, J.; Li, Q.; Deng, B.; Wang, J.; Chen, C. A survey of deep contrastive learning. Acta Autom. Sin. 2023, 49, 15–39. (In Chinese) [Google Scholar] [CrossRef]
- Qin, L. A survey on spoken language understanding: Recent advances and new frontiers. arXiv 2021, arXiv:2103.03095. [Google Scholar]
- Krishnan, J.; Anastasopoulos, A.; Purohit, H.; Rangwala, H. Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling. In Proceedings of the 1st Workshop on Multilingual Representation Learning, Online, 11 November 2021; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 211–223. [Google Scholar]
- Liang, S. Label-aware Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022. [Google Scholar]
| Language | Utterances (Train) | Utterances (Dev) | Utterances (Test) | Intent Types | Slot Types |
|---|---|---|---|---|---|
| en | 4488 | 490 | 893 | 18 | 84 |
| es | 4488 | 490 | 893 | 18 | 84 |
| pt | 4488 | 490 | 893 | 18 | 84 |
| de | 4488 | 490 | 893 | 18 | 84 |
| fr | 4488 | 490 | 893 | 18 | 84 |
| zh | 4488 | 490 | 893 | 18 | 84 |
| ja | 4488 | 490 | 893 | 18 | 84 |
| hi | 1440 | 160 | 893 | 17 | 75 |
| tr | 578 | 60 | 715 | 17 | 71 |
| Language | Utterances (Train) | Utterances (Dev) | Utterances (Test) | Intent Types | Slot Types |
|---|---|---|---|---|---|
| zh | 22,152 | 2770 | 2769 | 17 | 14 |
| ja | - | 392 | 392 | 17 | 14 |
| en | - | 501 | 500 | 17 | 14 |
| ar | - | 493 | 493 | 17 | 14 |
| Language | Utterance | Intent | Slot Value |
|---|---|---|---|
| zh | 哎帮我把空调调一下19度 | adjust_ac_temperature | 19 |
| ja | 風量を九に | adjust_ac_windspeed | 九 |
| en | Open ac cooling mode | open_ac_mode | cooling |
| ar | درجة ثلاثين على الحرارة درجة تعيين. | adjust_ac_temperature | ثلاثين |
| Experimental Module/Parameter | Value |
|---|---|
| Pretrained model | mBERT (12 Transformer layers, hidden size 768, 12 attention heads) |
| Batch size | 32 |
| Learning rate | 5 × 10⁻⁶ |
| Code-switching probability | 0.55 |
| Maximum sequence length | 128 |
| Optimizer | Adam |
| Dropout | 0.1 |
| Number of negative samples | 16 |
| λ1 | 0.01 |
| λ2 | 0.005 |
| λ3 | 0.01 |
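The sketch below shows one way these settings could be wired together with the Hugging Face `transformers` library; the checkpoint name `bert-base-multilingual-cased` and the single example utterance are assumptions for illustration, not the paper's released training script.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hyperparameters from the table above (illustrative wiring only).
MODEL_NAME = "bert-base-multilingual-cased"  # mBERT: 12 layers, hidden size 768, 12 heads
BATCH_SIZE = 32
LEARNING_RATE = 5e-6
MAX_SEQ_LEN = 128
DROPOUT = 0.1

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME, hidden_dropout_prob=DROPOUT)
optimizer = torch.optim.Adam(encoder.parameters(), lr=LEARNING_RATE)

# Encode an example utterance; downstream intent/slot heads would consume this output.
batch = tokenizer(["open ac cooling mode"], padding="max_length",
                  truncation=True, max_length=MAX_SEQ_LEN, return_tensors="pt")
outputs = encoder(**batch)  # outputs.last_hidden_state: (1, MAX_SEQ_LEN, 768)
```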
| Intent Accuracy (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| CoSDA-ML (2020) | 96.24 | 60.71 | 60.32 | 60.33 | 69.40 |
| Multilingual-ZeroShot (2021) | 96.78 | 61.22 | 30.22 | 31.60 | 54.95 |
| GL-CLEF (2022) | 97.32 | 72.19 | 61.86 | 54.40 | 71.44 |
| IABCL | 97.40 | 73.97 | 62.67 | 61.60 | 73.86 |

| Slot F1 (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| CoSDA-ML (2020) | 93.00 | 72.27 | 26.77 | 53.54 | 61.39 |
| Multilingual-ZeroShot (2021) | 90.42 | 36.52 | 19.47 | 36.48 | 45.72 |
| GL-CLEF (2022) | 95.53 | 75.54 | 28.62 | 55.48 | 63.79 |
| IABCL | 95.72 | 79.68 | 30.54 | 56.15 | 65.22 |

| Overall Acc (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| CoSDA-ML (2020) | 90.08 | 46.68 | 28.00 | 43.55 | 52.07 |
| Multilingual-ZeroShot (2021) | 88.76 | 15.00 | 5.47 | 7.90 | 29.28 |
| GL-CLEF (2022) | 93.71 | 58.92 | 28.80 | 37.20 | 54.66 |
| IABCL | 94.07 | 60.45 | 28.89 | 46.40 | 57.33 |
| Method | mBERT Intent Acc (%) | mBERT Slot F1 (%) | mBERT Overall Acc (%) | XLM-R Intent Acc (%) | XLM-R Slot F1 (%) | XLM-R Overall Acc (%) |
|---|---|---|---|---|---|---|
| mBERT (2019) * | 88.42 | 61.66 | 36.29 | - | - | - |
| XLM-R (2020) * | - | - | - | 93.02 | 57.38 | 33.31 |
| CoSDA-ML (2020) | 90.87 | 68.08 | 43.15 | 93.04 | 70.01 | 43.72 |
| Ensemble-Net (2021) * | 87.20 | 55.78 | - | - | - | - |
| LAJ-MCJ (2022) | 92.41 | 78.23 | 52.50 | 93.49 | 75.69 | 47.58 |
| IABCL | 92.96 | 78.08 | 51.04 | 93.13 | 75.99 | 49.38 |
| Intent Acc (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| Our-cl | 97.32 | 73.27 | 60.98 | 58.99 | 72.64 |
| Our-bi_interaction | 96.93 | 73.19 | 61.86 | 60.00 | 72.99 |
| Our-all | 95.66 | 72.95 | 59.22 | 57.80 | 71.40 |
| Our | 97.40 | 73.97 | 62.67 | 61.60 | 73.91 |

| Slot F1 (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| Our-cl | 95.53 | 76.82 | 29.84 | 55.00 | 62.29 |
| Our-bi_interaction | 95.35 | 75.54 | 29.92 | 55.48 | 64.07 |
| Our-all | 93.63 | 75.30 | 28.62 | 54.17 | 62.93 |
| Our | 95.72 | 79.68 | 30.54 | 56.15 | 65.52 |

| Overall Acc (%) | ZH | JA | AR | EN | Avg |
|---|---|---|---|---|---|
| Our-cl | 93.71 | 59.82 | 28.80 | 41.40 | 55.93 |
| Our-bi_interaction | 93.64 | 59.18 | 28.89 | 43.80 | 56.37 |
| Our-all | 92.66 | 55.95 | 27.18 | 39.80 | 53.89 |
| Our | 94.07 | 60.45 | 30.22 | 46.40 | 57.78 |
Li, X.; Fang, L.; Zhang, L.; Cao, P. An Interactive Framework of Cross-Lingual NLU for In-Vehicle Dialogue. Sensors 2023, 23, 8501. https://doi.org/10.3390/s23208501