A Knowledge-Based Filtering Method for Open Relations among Geo-Entities
Figure 1. Flowchart for acquiring geo-related knowledge.
Figure 2. Flowchart of confidence prediction of geo-entity relations.
Figure 3. (a) MSE for all samples; (b) MSE for semantic relation samples; (c) MSE for spatial relation samples.
Figure 4. (a) ROC and AUC for all samples; (b) ROC and AUC for semantic relation samples; (c) ROC and AUC for spatial relation samples.
Figure 5. (a) Precision curve and recall curve changing with different thresholds; (b) curve in which recall varies with precision.
Abstract
1. Introduction
- (1) Propose a novel framework to automatically filter geo-entity relations. This framework provides a new way of identifying credible geographic information from web text according to human knowledge.
- (2) Establish a credible KB of geo-entity relations (confidence value ≥ 0.7), which can be used to construct and complement a geographic knowledge graph.
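As a minimal sketch of how the 0.7 cut-off in contribution (2) could be applied, the snippet below keeps only the triples whose predicted confidence reaches the threshold. The `ScoredTriple` type, the `filter_credible` helper, and the sample triples with their scores are hypothetical and serve only as illustration; they are not taken from the paper's data.

```python
from typing import List, NamedTuple


class ScoredTriple(NamedTuple):
    """A geo-entity relation triple with a predicted confidence (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    confidence: float


# Threshold stated in the text: triples with confidence >= 0.7 count as credible.
CREDIBLE_THRESHOLD = 0.7


def filter_credible(triples: List[ScoredTriple],
                    threshold: float = CREDIBLE_THRESHOLD) -> List[ScoredTriple]:
    """Return the triples whose confidence is at or above the threshold."""
    return [t for t in triples if t.confidence >= threshold]


# Illustrative, made-up candidates; the scores are not experimental results.
candidates = [
    ScoredTriple("Beijing", "is capital of", "China", 0.93),
    ScoredTriple("Paris", "is located in", "France", 0.70),
    ScoredTriple("Amazon River", "was founded by", "Brazil", 0.12),
]

credible_kb = filter_credible(candidates)
for triple in credible_kb:
    print(triple)
```

The comparison is inclusive, so a triple scored exactly 0.7 is kept, matching the "≥ 0.7" wording above.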
2. Related Works
2.1. Quality Assessment of Structured Geographical Information
2.2. Quality Assessment of Unstructured Geographical Information
3. Methodology
3.1. Acquiring Geo-Related Knowledge from KBs
3.2. Predicting Confidence for Geo-Entity Relations
4. Experiments
4.1. Data
- (1) Fine-grained categories of geo-entities were extracted from the DBpedia Ontology (261 in total). These cover organizations (e.g., company, school, government agency, and bank) and places (e.g., island, country, ocean, mountain, road, factory, and hotel).
- (2) Class pairs of geo-entities were extracted from the ontology and fact triples of DBpedia (1,159 in total).
- (3) Relational indicators were acquired from the ontology and fact triples of DBpedia and WordNet (177 in total).
- (4) English Wikipedia articles on geographical entries (2.8 GB in total) were used to extract geo-entity relation triples. We generated 517,805 triples by feeding these articles into a relation extraction (RE) system, the Stanford OpenIE system (https://nlp.stanford.edu/software/openie.html).
- (5) All articles from English Wikipedia (14.2 GB in total) were used as the corpus to train the doc2vec model. Each vector has 100 dimensions.
4.2. Experimental Design
4.3. Metrics
- (1) MSE: We measure the MSE between the predicted confidence value and the real probability; the lower, the better. In Formula (2), n is the number of triples in each interval, and the other two terms are the predicted confidence value and the real probability of each interval.
- (2) ROC and AUC: We order the triples by their confidence values, compute the true positive rate (TPR) and the false positive rate (FPR) according to Formulas (3) and (4) and the confusion matrix (Table 2), and then plot the ROC curve, where the x-axis represents the FPR and the y-axis represents the TPR. The closer a method's ROC curve is to the point (0,1), the better its performance. AUC is the area under the ROC curve; the higher, the better.
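The two metrics above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the "real probability" of an interval is assumed to be the share of manually confirmed triples in it, the interval edges (0, 0.3, 0.7, 1) are assumed from the intervals that appear in the result tables, and TPR and FPR use the standard definitions TP / (TP + FN) and FP / (FP + TN). All function names and the sample scores are hypothetical.

```python
def interval_mse(confidences, labels, edges=(0.0, 0.3, 0.7, 1.0)):
    """MSE per confidence interval.

    Assumption: the real probability of an interval is the fraction of its
    triples labelled 1; each predicted confidence is compared against it.
    """
    results = {}
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]  # the last interval is closed on the right
        members = [(c, y) for c, y in zip(confidences, labels)
                   if lo <= c < hi or (is_last and c == hi)]
        if not members:
            continue
        n = len(members)
        real_prob = sum(y for _, y in members) / n
        results[(lo, hi)] = sum((c - real_prob) ** 2 for c, _ in members) / n
    return results


def roc_points(confidences, labels):
    """(FPR, TPR) points from sweeping a threshold down the ranked triples.

    Requires at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(confidences, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last triple of a tied-score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores and manual labels (1 = correct triple, 0 = incorrect).
confidences = [0.9, 0.8, 0.75, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]

mse_by_interval = interval_mse(confidences, labels)
curve = roc_points(confidences, labels)
area = auc(curve)
print(mse_by_interval)
print(curve)
print(round(area, 3))
```

On this toy input, 11 of the 12 positive-negative pairs are ranked correctly, so the area comes out at 11/12 (about 0.917). In practice a library routine such as scikit-learn's `roc_curve` would normally be preferred over hand-rolled code.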
5. Results and Discussion
5.1. MSE
5.2. ROC and AUC
5.3. Determine the Threshold
5.4. Effect of Credible Triple Filtering
5.5. Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Methods | Relation Type: All | Relation Type: Semantic Relation | Relation Type: Spatial Relation |
---|---|---|---|
StanOIE | All-StanOIE | Se-StanOIE | Sp-StanOIE |
KNOWfact | All-KNOWfact | Se-KNOWfact | Sp-KNOWfact |
KNOWfact+lex | All-KNOWfact+lex | Se-KNOWfact+lex | Sp-KNOWfact+lex |

Manual Annotation | Predicted Result: Positive Tuples | Predicted Result: Negative Tuples |
---|---|---|
1 | TP | FN |
0 | FP | TN |

(a)

Sample Type | Interval | KNOWfact+lex | KNOWfact | StanOIE |
---|---|---|---|---|
All-0 | [0, 0.3) | 24.79% | 19.25% | 1.80% |
All-0 | [0.7, 1] | 1.68% | 2.29% | 95.54% |
All-1 | [0, 0.3) | 1.09% | 6.56% | 0.90% |
All-1 | [0.7, 1] | 66.48% | 14.57% | 97.99% |

(b)

Sample Type | Interval | KNOWfact+lex | KNOWfact | StanOIE |
---|---|---|---|---|
Se-0 | [0, 0.3) | 25.59% | 20.23% | 1.93% |
Se-0 | [0.7, 1] | 0.96% | 1.10% | 95.18% |
Se-1 | [0, 0.3) | 2.06% | 10.69% | 1.37% |
Se-1 | [0.7, 1] | 40.35% | 9.31% | 96.55% |

(c)

Sample Type | Interval | KNOWfact+lex | KNOWfact | StanOIE |
---|---|---|---|---|
Sp-0 | [0, 0.3) | 19.23% | 12.50% | 0.96% |
Sp-0 | [0.7, 1] | 6.72% | 10.57% | 98.07% |
Sp-1 | [0, 0.3) | 0.00% | 1.93% | 0.39% |
Sp-1 | [0.7, 1] | 95.75% | 20.46% | 99.62% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yu, L.; Qiu, P.; Gao, J.; Lu, F. A Knowledge-Based Filtering Method for Open Relations among Geo-Entities. ISPRS Int. J. Geo-Inf. 2019, 8, 59. https://doi.org/10.3390/ijgi8020059