Research Article

A Fuzzy Multigranularity Convolutional Neural Network With Double Attention Mechanisms for Measuring Semantic Textual Similarity

Published: 01 October 2024

Abstract

Semantic textual similarity (STS) is a fundamental task in natural language processing (NLP). Recent advances demonstrate that deep-learning-based approaches can measure STS with impressive accuracy. However, existing studies do not capture the spatial location of important information with their attention mechanisms, fail to model sentences from a whole-sentence perspective, and neglect semantic fuzziness. In this article, we propose a novel double attentive fuzzy convolutional neural network (DAFCNN) that measures STS more accurately while accounting for semantic fuzziness. First, we introduce a spatial attention module and combine it with improved attentive convolutions to build a multigranularity convolutional neural network in DAFCNN, which not only extracts critical spatial location information but also models sentences from multiple perspectives at the word and sentence levels. Second, DAFCNN pioneers a fuzzy learning module (FLM) for extracting fuzzy semantic features. Using a fuzzy membership function, a fuzzy aggregation operator, and trainable parameters and weights, FLM maps sentence representations into a fuzzy space, yielding representations with richer and more accurate semantics. Third, compared with various state-of-the-art STS models, DAFCNN reduces mean-square error by 14.57% and increases Pearson's γ by 4.61% and Spearman's ρ by 8.57% on STS scoring datasets, and it improves accuracy by 3.39% and F1-score by 2.41% on a semantic classification dataset. An ablation experiment demonstrates the effectiveness of each module of DAFCNN. Finally, the experimental results also indicate that FLM is a promising new attempt to incorporate fuzzy set theory into NLP.
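The abstract describes the FLM only at a high level: a fuzzy membership function, a fuzzy aggregation operator, and trainable parameters and weights that map a crisp sentence representation into a fuzzy space. The following PyTorch sketch illustrates that general idea only; it is not the paper's implementation. The class name, tensor shapes, and the specific choices of Gaussian membership functions and a trainable weighted-average aggregation are assumptions made for illustration.

import torch
import torch.nn as nn

class FuzzyLearningModule(nn.Module):
    """Illustrative sketch (not the paper's FLM): maps a crisp sentence
    vector to fuzzy membership degrees via Gaussian membership functions
    with trainable centres/widths, then aggregates them with trainable
    weights and concatenates the result with the crisp representation."""

    def __init__(self, dim: int, n_fuzzy_sets: int = 5):
        super().__init__()
        # One trainable (centre, width) pair per fuzzy set and feature dimension.
        self.centres = nn.Parameter(torch.linspace(-1.0, 1.0, n_fuzzy_sets).repeat(dim, 1))
        self.log_widths = nn.Parameter(torch.zeros(dim, n_fuzzy_sets))
        # Trainable aggregation weights over the fuzzy sets.
        self.agg_weights = nn.Parameter(torch.ones(n_fuzzy_sets) / n_fuzzy_sets)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) crisp sentence representation.
        x = x.unsqueeze(-1)                                   # (batch, dim, 1)
        widths = self.log_widths.exp()                        # strictly positive widths
        # Gaussian membership degrees in [0, 1]: (batch, dim, n_fuzzy_sets).
        membership = torch.exp(-((x - self.centres) ** 2) / (2 * widths ** 2 + 1e-8))
        # Weighted-average aggregation over the fuzzy sets -> (batch, dim).
        weights = torch.softmax(self.agg_weights, dim=0)
        fuzzy_repr = (membership * weights).sum(dim=-1)
        # Concatenate crisp and fuzzy views into an enriched representation.
        return torch.cat([x.squeeze(-1), fuzzy_repr], dim=-1)

# Example usage with made-up dimensions:
# flm = FuzzyLearningModule(dim=300)
# sentence_vecs = torch.randn(8, 300)   # batch of 8 sentence embeddings
# enriched = flm(sentence_vecs)         # shape (8, 600): crisp + fuzzy views

In this sketch, the Gaussian functions play the role of the membership function and the softmax-weighted sum plays the role of the aggregation operator; both the centres/widths and the aggregation weights are learned end to end, which mirrors the "trainable parameters and weights" mentioned in the abstract.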



            Information & Contributors

            Information

            Published In

            cover image IEEE Transactions on Fuzzy Systems
            IEEE Transactions on Fuzzy Systems  Volume 32, Issue 10
            Oct. 2024
            597 pages

            Publisher

            IEEE Press

