Geographic Knowledge Graph Attribute Normalization: Improving the Accuracy by Fusing Optimal Granularity Clustering and Co-Occurrence Analysis
Figure 1. Flow chart of attribute normalization technique.
Figure 2. Property Alignment Schematic.
Figure 3. Classification System for Geographic Attribute Data.
Figure 4. Clustering granularity diagram. Each color represents a class and each circle represents a property node. (a) Too-fine-grained classification: members that should be in A are divided into other classes; (b) standard classification with suitable granularity; (c) too-coarse classification: members that should not be in A are divided into A.
Figure 5. Flow chart of attribute normalization technique.
Figure 6. 0.7–0.9 similarity threshold graph.
Figure 7. Comparison chart of the three methods and average results.
Abstract
1. Introduction
2. Related Work
3. Methods and Models
3.1. Overview
3.2. Method Modeling: Optimal Granularity Attribute Clustering Based on Labeled Target Detection Algorithm
3.3. Method Modeling: Accurate Identification of Synonymous Attributes Based on Co-Occurrence Analysis and Rule Reasoning
3.3.1. Outcome Scoring Strategy Based on Co-Occurrence Analysis
3.3.2. Result Optimization Based on Rule Reasoning
4. Experiment and Discussion
4.1. Introduction to DataSets
4.2. Experimental Condition
4.2.1. Experimental Parameters
- (1) Word2vec parameter setting
- (2) Similarity threshold parameter setting
4.2.2. Experimental Evaluation Index
4.3. Experimental Results and Analysis
4.3.1. Clustering Granularity Experiments
4.3.2. Experimental Results of Exact Identification of Synonymous Attributes
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Rule 1 | ||
Rule 2 | ||
Rule 3 |
Data Source | Number of Triples | Features | Vector Dimension | Results |
---|---|---|---|---|
Encyclopedia data, all triples | 25,455,709 | Larger corpus | 100 | 2233 entries with similarity greater than 0.8; high accuracy |
Encyclopedia data, introduction triples | 28,196 | Strong contextual relevance | 100 | 253,275 entries with similarity greater than 0.8; poor results |
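The Results column above counts items whose similarity exceeds 0.8. As a rough illustration of that thresholding step, the sketch below counts pairs of attribute vectors whose cosine similarity is above a threshold. Cosine similarity is an assumption here (it is the usual choice for word2vec vectors), and the attribute names, the 3-dimensional toy vectors, and the helper `pairs_above` are hypothetical; the table reports 100-dimensional vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, for illustration only.
vectors = {
    "elevation": [0.9, 0.1, 0.0],
    "altitude": [0.8, 0.2, 0.1],
    "area": [0.0, 1.0, 0.1],
}


def pairs_above(vectors, threshold=0.8):
    """Return attribute pairs whose cosine similarity exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) > threshold
    ]


candidates = pairs_above(vectors, threshold=0.8)
print(len(candidates), candidates)
```

Raising or lowering `threshold` changes how many candidate pairs survive, which is the trade-off the similarity threshold setting controls.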
Category | Metric | Clustering parameter 1.0 | Clustering parameter 0.8 | Clustering parameter 0.6 | Clustering parameter 0.4 | Clustering parameter 0.2 | Parameter Selection Result |
---|---|---|---|---|---|---|---|
Mountain | Precision | 0.862 | 0.871 | 0.902 | 0.916 * | 0.895 | 0.4 |
Mountain | Error rate | 0.060 | 0.056 | 0.041 | 0.036 * | 0.047 | 0.4 |
Water | Precision | 0.912 | 0.912 | 0.942 | 0.965 * | 0.942 | 0.4 |
Water | Error rate | 0.056 | 0.056 | 0.043 | 0.026 * | 0.043 | 0.4 |
Forest | Precision | 0.833 | 0.833 | 0.962 * | 0.926 | 0.882 | 0.6 |
Forest | Error rate | 0.126 | 0.126 | 0.025 * | 0.063 | 0.088 | 0.6 |
Field | Precision | 0.897 | 0.897 | 0.966 * | 0.951 | 0.930 | 0.6 |
Field | Error rate | 0.081 | 0.081 | 0.011 * | 0.023 | 0.069 | 0.6 |
Lake | Precision | 0.908 | 0.949 * | 0.937 | 0.937 | 0.888 | 0.8 |
Lake | Error rate | 0.066 | 0.042 * | 0.054 | 0.054 | 0.084 | 0.8 |
Grass | Precision | 0.934 | 0.934 | 0.934 | 0.962 * | 0.934 | 0.4 |
Grass | Error rate | 0.054 | 0.054 | 0.054 | 0.027 * | 0.054 | 0.4 |
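The last column of the table above appears to record, for each category, the clustering parameter at which precision is highest and error rate is lowest (the starred cells). A minimal Python sketch of that selection follows, using the values from the table. The function name `select_parameter` and the tie-breaking rule are illustrative assumptions, not the authors' stated procedure.

```python
PARAMS = [1.0, 0.8, 0.6, 0.4, 0.2]

# (precision, error rate) per clustering parameter, copied from the table.
RESULTS = {
    "Mountain": ([0.862, 0.871, 0.902, 0.916, 0.895],
                 [0.060, 0.056, 0.041, 0.036, 0.047]),
    "Water": ([0.912, 0.912, 0.942, 0.965, 0.942],
              [0.056, 0.056, 0.043, 0.026, 0.043]),
    "Forest": ([0.833, 0.833, 0.962, 0.926, 0.882],
               [0.126, 0.126, 0.025, 0.063, 0.088]),
    "Field": ([0.897, 0.897, 0.966, 0.951, 0.930],
              [0.081, 0.081, 0.011, 0.023, 0.069]),
    "Lake": ([0.908, 0.949, 0.937, 0.937, 0.888],
             [0.066, 0.042, 0.054, 0.054, 0.084]),
    "Grass": ([0.934, 0.934, 0.934, 0.962, 0.934],
              [0.054, 0.054, 0.054, 0.027, 0.054]),
}


def select_parameter(precision, error_rate):
    """Pick the parameter with the highest precision.

    Ties on precision are broken by the lowest error rate (an assumption).
    """
    best = max(range(len(PARAMS)),
               key=lambda i: (precision[i], -error_rate[i]))
    return PARAMS[best]


selected = {cat: select_parameter(p, e) for cat, (p, e) in RESULTS.items()}
print(selected)
```

For every category in the table, the parameter chosen this way matches the value in the Parameter Selection Result column.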
Attribute Type | Attribute | P | R |
---|---|---|---|
Spatial Relations | Cardinal Direction Relation | 100% | 96.7% |
Spatial Relations | Topological Relation | 92.4% | 89.4% |
Spatial Relations | Distance Relation | 100% | 95.2% |
Data Attributes | Metrology | 97.4% | 99.6% |
Data Attributes | Coordinate | 95.7% | 100% |
Data Attributes | Time | 100% | 100% |
Category | Method | P | R | F1 |
---|---|---|---|---|
Mountain | Similarity results | 34.0% | 98.3% | 50.5% |
Mountain | Co-occurrence analysis results | 49.1% | 98.2% | 65.4% |
Mountain | Rule-based modification results | 91.1% | 96.3% | 93.6% |
Water | Similarity results | 29.0% | 98.2% | 44.8% |
Water | Co-occurrence analysis results | 57.4% | 97.3% | 72.2% |
Water | Rule-based modification results | 88.5% | 96.7% | 92.4% |
Forest | Similarity results | 31.0% | 99.3% | 47.2% |
Forest | Co-occurrence analysis results | 61.2% | 99.2% | 75.6% |
Forest | Rule-based modification results | 92.7% | 98.1% | 95.3% |
Field | Similarity results | 32.0% | 98.5% | 48.3% |
Field | Co-occurrence analysis results | 63.0% | 98.3% | 76.7% |
Field | Rule-based modification results | 93.3% | 94.9% | 94.1% |
Lake | Similarity results | 28.0% | 98.7% | 43.6% |
Lake | Co-occurrence analysis results | 59.4% | 98.6% | 74.1% |
Lake | Rule-based modification results | 90.5% | 95.7% | 93.0% |
Grass | Similarity results | 33.0% | 99.2% | 49.5% |
Grass | Co-occurrence analysis results | 62.8% | 99.0% | 76.8% |
Grass | Rule-based modification results | 90.1% | 93.9% | 92.0% |
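The F1 column in the table above is the standard harmonic mean of precision (P) and recall (R), F1 = 2PR / (P + R). A short Python sketch recomputes it for the Mountain rows; the helper name `f1_score` is illustrative. Because the published P and R are rounded to one decimal place, a recomputed F1 may differ from the reported value by about 0.1 percentage points.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mountain category, (P, R) in percent, copied from the table.
mountain = {
    "Similarity results": (34.0, 98.3),
    "Co-occurrence analysis results": (49.1, 98.2),
    "Rule-based modification results": (91.1, 96.3),
}

for method, (p, r) in mountain.items():
    print(f"{method}: F1 = {f1_score(p, r):.1f}%")
```

The harmonic mean is dominated by the smaller of the two inputs, which is why the similarity-only rows have a low F1 despite recall above 98%.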
Method | P | R | F1 |
---|---|---|---|
Similarity results | 31.2% | 98.7% | 47.3% |
Co-occurrence analysis results | 58.8% | 98.4% | 73.5% |
Rule-based modification results | 91.0% | 95.9% | 93.4% |
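The averages in the table above are consistent with a plain arithmetic mean of the six per-category values for each metric, with F1 averaged directly rather than recomputed from the averaged P and R. The following Python sketch reproduces them from the per-category figures; the names `PER_CATEGORY` and `macro_average` are illustrative.

```python
from statistics import mean

# Per-category values in percent, in the order Mountain, Water, Forest,
# Field, Lake, Grass, copied from the per-category results table.
PER_CATEGORY = {
    "Similarity results": {
        "P": [34.0, 29.0, 31.0, 32.0, 28.0, 33.0],
        "R": [98.3, 98.2, 99.3, 98.5, 98.7, 99.2],
        "F1": [50.5, 44.8, 47.2, 48.3, 43.6, 49.5],
    },
    "Co-occurrence analysis results": {
        "P": [49.1, 57.4, 61.2, 63.0, 59.4, 62.8],
        "R": [98.2, 97.3, 99.2, 98.3, 98.6, 99.0],
        "F1": [65.4, 72.2, 75.6, 76.7, 74.1, 76.8],
    },
    "Rule-based modification results": {
        "P": [91.1, 88.5, 92.7, 93.3, 90.5, 90.1],
        "R": [96.3, 96.7, 98.1, 94.9, 95.7, 93.9],
        "F1": [93.6, 92.4, 95.3, 94.1, 93.0, 92.0],
    },
}


def macro_average(scores):
    """Average each metric over categories, rounded to one decimal place."""
    return {metric: round(mean(values), 1) for metric, values in scores.items()}


averages = {method: macro_average(s) for method, s in PER_CATEGORY.items()}
for method, avg in averages.items():
    print(method, avg)
```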
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, C.; Zhang, B.; Liu, W.; Du, M.; Luo, N.; Zhai, X.; Ba, T. Geographic Knowledge Graph Attribute Normalization: Improving the Accuracy by Fusing Optimal Granularity Clustering and Co-Occurrence Analysis. ISPRS Int. J. Geo-Inf. 2022, 11, 360. https://doi.org/10.3390/ijgi11070360