
Exploring highly concise and accurate text matching model with tiny weights

World Wide Web 27, 39 (2024)

Abstract

In this paper, we propose a simple and general lightweight approach named AL-RE2 for text matching models and conduct experiments on three well-studied benchmark datasets spanning natural language inference and paraphrase identification. First, we explore the feasibility of compressing word embedding vectors with principal component analysis and analyze how the information retained at different dimensionalities affects model accuracy. Balancing compression efficiency against information loss, we choose 128 dimensions to represent each word, reducing the model to 1.6M parameters. Finally, we analyze in detail the feasibility of replacing standard convolution with depthwise separable convolution for text matching. The experimental results show that our model's inference speed is at least 1.5 times faster and its parameter count 42.76% smaller than similarly performing models, while its accuracy on the SciTail dataset is state-of-the-art among all lightweight models.


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported by the National Key R&D Program of China and the Major Programs of the National Social Science Foundation of China.

Funding

Open access funding provided by the National Key R&D Program of China (Grant Nos. 2021YFB3101300, 2021YFB3101302, 2021YFB3101305), the Major Programs of the National Social Science Foundation of China (Grant No. 22&ZD147) and the National Social Science Fund of China (Grant No. 23VRC094).

Author information


Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Danfeng Yan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics Approval

We declare that this submission follows the policies outlined in the Guide for Authors. The current research involves no human participants or animals.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Li, Y., Yan, D., Jiang, W. et al. Exploring highly concise and accurate text matching model with tiny weights. World Wide Web 27, 39 (2024). https://doi.org/10.1007/s11280-024-01262-7

