Abstract
In entity relation extraction, a growing number of researchers focus on the semi-supervised bootstrapping method because it does not require a large manually annotated corpus: starting from a small seed set, it expands iteratively to build a large-scale knowledge base. After many iterations, however, “semantic drift” appears; that is, accuracy declines as errors accumulate. To improve the accuracy of the extracted relation instances and the quality of the patterns, the reliability of both instances and patterns must be evaluated. This paper uses a large-scale set of news headline sentences obtained from a search engine and evaluates the reliability of an instance by the co-occurrence between relation description words and that sentence set. It then evaluates the reliability of a pattern by the numbers of positive and negative instances in the pattern's historical matching record, and selects new patterns for extension and optimization. The experimental results show that applying these reliability evaluations during iteration effectively improves both the accuracy of relation extraction and the quality of the extracted patterns.
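The two reliability checks described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formulas, so everything here is an assumption made for illustration: instance reliability is taken as the share of headlines mentioning both entities that also contain a description word, and pattern reliability as the ratio of positive matches to all matches (a Snowball-style confidence). The names `PatternRecord`, `instance_reliability`, `pattern_reliability`, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternRecord:
    """Historical matching record of one pattern (hypothetical structure)."""
    positive: int = 0  # matched instances judged reliable
    negative: int = 0  # matched instances judged unreliable


def instance_reliability(entity_pair, description_words, headlines):
    """Assumed score: among headlines containing both entities, the
    fraction that also contains at least one description word."""
    e1, e2 = entity_pair
    with_pair = [h for h in headlines if e1 in h and e2 in h]
    if not with_pair:
        return 0.0
    hits = sum(any(w in h for w in description_words) for h in with_pair)
    return hits / len(with_pair)


def pattern_reliability(record):
    """Assumed score: positive matches divided by all recorded matches."""
    total = record.positive + record.negative
    return record.positive / total if total else 0.0


def update_record(record, reliability, threshold=0.5):
    """Count an instance as positive or negative for the pattern that
    produced it; the threshold value is an illustrative assumption."""
    if reliability >= threshold:
        record.positive += 1
    else:
        record.negative += 1


headlines = [
    "Alice founded Acme in 1990",
    "Alice visits Acme headquarters",
    "Bob joins Acme",
]
score = instance_reliability(("Alice", "Acme"), ["founded", "founder"], headlines)
record = PatternRecord(positive=3, negative=1)
update_record(record, score)
```

In a bootstrapping loop under these assumptions, only patterns whose reliability stays above a chosen cutoff would be kept for the next iteration, which is one way to limit the error accumulation behind semantic drift.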
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61471232).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Qin, Z., Ye, F. (2019). Research on Reliability of Instance and Pattern in Semi-supervised Entity Relation Extraction. In: Patnaik, S., Jain, V. (eds) Recent Developments in Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 752. Springer, Singapore. https://doi.org/10.1007/978-981-10-8944-2_44
DOI: https://doi.org/10.1007/978-981-10-8944-2_44
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8943-5
Online ISBN: 978-981-10-8944-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)