Abstract
In this paper, we propose a simple and general lightweight approach named AL-RE2 for text matching models, and conduct experiments on three well-studied benchmark datasets across the tasks of natural language inference and paraphrase identification. First, we explore the feasibility of compressing the dimensionality of word embedding vectors using principal component analysis, and then analyze how the information retained at different dimensionalities affects model accuracy. Balancing compression efficiency against information loss, we choose 128 dimensions to represent each word, which reduces the model to 1.6M parameters. Finally, we analyze in detail the feasibility of replacing standard convolution with depthwise separable convolution in text matching. The experimental results show that our model’s inference speed is at least 1.5 times faster and its parameter count 42.76% smaller than those of comparably performing models, while its accuracy on the SciTail dataset is state-of-the-art among all lightweight models.
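The two compression ideas in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the vocabulary size, the random embeddings standing in for pretrained vectors, and the kernel/channel sizes are illustrative assumptions; only the 300-to-128 PCA reduction and the depthwise-separable parameter arithmetic reflect the paper's stated approach.

```python
import numpy as np

# --- PCA compression of word embeddings (300-d -> 128-d), via SVD ---
rng = np.random.default_rng(0)
vocab, d_in, d_out = 1000, 300, 128          # vocab size is a placeholder
emb = rng.standard_normal((vocab, d_in))     # stand-in for pretrained vectors

centered = emb - emb.mean(axis=0)
# Rows of vt are the principal axes; keep the top d_out of them.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
compressed = centered @ vt[:d_out].T         # shape: (vocab, 128)

# --- Parameter count: standard vs. depthwise separable 1-D conv ---
k, c = 3, 128                                # kernel width, channels (assumed)
standard = k * c * c                         # one full conv kernel
separable = k * c + c * c                    # depthwise conv + pointwise 1x1
ratio = separable / standard
print(compressed.shape, standard, separable, round(ratio, 3))
```

For these assumed sizes the depthwise separable layer needs 16,768 parameters versus 49,152 for the standard convolution, i.e. roughly a third, which is the kind of saving that makes the 1.6M-parameter budget reachable.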
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Key R&D Program of China and the Major Programs of the National Social Science Foundation of China.
Funding
Open access funding provided by the National Key R&D Program of China (Grant Nos. 2021YFB3101300, 2021YFB3101302, 2021YFB3101305), the Major Programs of the National Social Science Foundation of China (Grant No. 22&ZD147) and the National Social Science Fund of China (Grant No. 23VRC094).
Author information
Authors and Affiliations
Contributions
All authors contributed equally to this work.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics Approval
We declare that this submission follows the policies outlined in the Guide for Authors. The current research involves no human participants or animals.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Yan, D., Jiang, W. et al. Exploring highly concise and accurate text matching model with tiny weights. World Wide Web 27, 39 (2024). https://doi.org/10.1007/s11280-024-01262-7