
Chinese nested entity recognition method for the finance domain based on heterogeneous graph network

Published: 25 September 2024

Abstract

In the finance domain, nested named entity recognition has become a prominent topic within named entity recognition. Traditional nested entity recognition methods tend to ignore the dependency relationships between entities, and most are designed for general-domain English text. We therefore propose HGFNER, a Chinese nested entity recognition method for the finance domain based on a heterogeneous graph network. The method consists of two parts: a boundary division model for candidate entities and an internal relationship graph model for candidate entities. First, the boundary division model, which incorporates expert knowledge, identifies the flat entities contained in the text and segments the text, addressing the long entity boundaries and strong domain-specific features typical of the Chinese finance domain. Then, heterogeneous graphs represent the internal structure of entities in terms of both spatial and syntactic dependencies, so that dependency relationships between entities are learned from multiple perspectives. To keep the model efficient, we also propose DAAC_BM, a fast matching algorithm for n-gram sequences in domain dictionaries, which addresses the memory overflow and wasted space that multi-pattern fast matching algorithms face when matching Chinese text. In addition, we present CFNE, a Chinese nested entity dataset for the financial field, which, as far as we know, is the first publicly available annotated dataset in the field. HGFNER achieves a state-of-the-art macro-F1 of 86.41% on CFNE.
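The abstract does not describe how DAAC_BM works internally, so the following is only a rough illustration of the general problem it targets: finding every occurrence of many dictionary terms in Chinese text in a single pass. The sketch uses a plain dictionary-based Aho-Corasick automaton, not the authors' algorithm and not a double-array layout. The function names `build_automaton` and `find_matches` and the three dictionary entries are hypothetical, chosen for this example only.

```python
from collections import deque


def build_automaton(patterns):
    """Build a character-level Aho-Corasick automaton (illustrative only)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # dictionary terms that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start, end, term) for every dictionary hit, overlaps included."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, i + 1, pat))
    return matches


# Hypothetical finance dictionary; real domain dictionaries are far larger.
automaton = build_automaton(["中国银行", "银行", "中国"])
for start, end, term in find_matches("中国银行发布年报", automaton):
    print(start, end, term)
```

Because the automaton reports overlapping hits, the shorter terms "中国" and "银行" are returned alongside the enclosing term "中国银行", which is the kind of nested candidate span the method has to handle. Storing one hash map per state, as done here, is simple but memory-hungry for large Chinese character sets, which is presumably part of the motivation for a more compact structure.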



Published In

Information Processing and Management: an International Journal  Volume 61, Issue 5
Sep 2024
850 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Chinese finance domain
  2. Nested named entity recognition
  3. Heterogeneous graphs
  4. Expert knowledge

Qualifiers

  • Research-article
