Abstract
In this paper, we propose a simple and general lightweight approach named AL-RE2 for text matching models, and conduct experiments on three well-studied benchmark datasets across the tasks of natural language inference and paraphrase identification. First, we explore the feasibility of compressing the dimensionality of word embedding vectors using principal component analysis, and then analyze how the information retained at different dimensionalities affects model accuracy. Balancing compression efficiency against information loss, we choose 128 dimensions to represent each word, which reduces the model to 1.6M parameters. Finally, we analyze in detail the feasibility of replacing standard convolution with depthwise separable convolution in text matching. The experimental results show that our model’s inference speed is at least 1.5 times faster and its parameter count 42.76% smaller than those of comparably performing models, while its accuracy on the SciTail dataset is state-of-the-art among all lightweight models.
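The two compression ideas in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the vocabulary size, the random embeddings standing in for pretrained vectors, and the kernel/channel sizes are illustrative assumptions; only the 300-to-128 PCA reduction and the depthwise-separable parameter arithmetic reflect the paper's stated approach.

```python
import numpy as np

# --- PCA compression of word embeddings (300-d -> 128-d), via SVD ---
rng = np.random.default_rng(0)
vocab, d_in, d_out = 1000, 300, 128          # vocab size is a placeholder
emb = rng.standard_normal((vocab, d_in))     # stand-in for pretrained vectors

centered = emb - emb.mean(axis=0)
# Rows of vt are the principal axes; keep the top d_out of them.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
compressed = centered @ vt[:d_out].T         # shape: (vocab, 128)

# --- Parameter count: standard vs. depthwise separable 1-D conv ---
k, c = 3, 128                                # kernel width, channels (assumed)
standard = k * c * c                         # one full conv kernel
separable = k * c + c * c                    # depthwise conv + pointwise 1x1
ratio = separable / standard
print(compressed.shape, standard, separable, round(ratio, 3))
```

For these assumed sizes the depthwise separable layer needs 16,768 parameters versus 49,152 for the standard convolution, i.e. roughly a third, which is the kind of saving that makes the 1.6M-parameter budget reachable.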
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Key R&D Program of China and the Major Programs of the National Social Science Foundation of China.
Funding
Open access funding provided by the National Key R&D Program of China (Grant Nos. 2021YFB3101300, 2021YFB3101302, 2021YFB3101305), the Major Programs of the National Social Science Foundation of China (Grant No. 22&ZD147) and the National Social Science Fund of China (Grant No. 23VRC094).
Author information
Authors and Affiliations
Contributions
All authors contributed equally to this work.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics Approval
We declare that this submission follows the policies outlined in the Guide for Authors. The current research involves no human participants or animals.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Yan, D., Jiang, W. et al. Exploring highly concise and accurate text matching model with tiny weights. World Wide Web 27, 39 (2024). https://doi.org/10.1007/s11280-024-01262-7