More Web Proxy on the site http://driver.im/

research-article

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Authors:

Jilles Vreeken,

Danai KoutraAuthors Info & Claims

WWW '20: Proceedings of The Web Conference 2020

Pages 1115 - 1126

https://doi.org/10.1145/3366423.3380189

Published: 20 April 2020 Publication History

Abstract

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG.

In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle—a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification, (identifying the location of up to 93% of missing entities—over 10% more than baselines), while also being efficient for large knowledge graphs.

References

[1]

Charu C. Aggarwal. 2016. Outlier Analysis (2nded.). Springer Publishing Company, Incorporated.

[2]

Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, Vol. 22. 207–216.

[3]

Leman Akoglu, Duen Horng Chau, Jilles Vreeken, Nikolaj Tatti, Hanghang Tong, and Christos Faloutsos. 2013. Mining connection pathways for marked nodes in large graphs. In Proceedings of the 14th IEEE International Conference on Data Mining (ICDM), Dallas, Texas.

[4]

Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29, 3 (2015).

Digital Library

[5]

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In Proceedings of the 6th International Semantic Web Conference, Busan, Korea. Springer.

Digital Library

[6]

Nikita Bhutani, Xinyi Zheng, and HV Jagadish. 2019. Learning to Answer Complex Questions over Knowledge Bases with Query Composition. In Proceedings of the 28th ACM Conference on Information and Knowledge Management (CIKM), Beijing, China. 739–748.

Digital Library

[7]

Carlos Bobed, Pierre Maillot, Peggy Cellier, and Sébastien Ferré. 2019. Data-driven Assessment of Structural Evolution of RDF Graphs. Semantic Web (2019).

[8]

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV.

[9]

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), Atlanta, GA.

[10]

Yurong Cheng, Lei Chen, Ye Yuan, and Guoren Wang. 2018. Rule-based graph repairing: Semantic and efficient repairing methods. In Proceedings of the 34th International Conference on Data Engineering (ICDE), Paris, France. 773–784.

[11]

Thomas M Cover and Joy A Thomas. 2012. Elements of information theory. John Wiley & Sons.

[12]

Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. Grami: Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment 7, 7 (2014).

Digital Library

[13]

Wenfei Fan, Xin Wang, and Yinghui Wu. 2014. Answering graph pattern queries using views. In Proceedings of the 30th International Conference on Data Engineering (ICDE), Chicago, IL.

[14]

Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu. 2015. Association rules with graph patterns. Proceedings of the VLDB Endowment 8, 12 (2015), 1502–1513.

Digital Library

[15]

Wenfei Fan, Yinghui Wu, and Jingbo Xu. 2016. Functional dependencies for graphs. In Proceedings of the 25th ACM Conference on Information and Knowledge Management (CIKM), Indianapolis, IN. 1843–1857.

Digital Library

[16]

Luis Galárraga, Simon Razniewski, Antoine Amarilli, and Fabian M Suchanek. 2017. Predicting completeness in knowledge bases. In Proceeding of the 10th ACM International Conference on Web Search and Data Mining (WSDM), Cambridge, UK.

Digital Library

[17]

Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M Suchanek. 2015. Fast rule mining in ontological knowledge bases with AMIE+. The VLDB Journal 24, 6 (2015).

Digital Library

[18]

Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web (WWW), Rio de Janeiro, Brazil.

Digital Library

[19]

S. Goebl, A. Tonch, C. Böhm, and C. Plant. 2016. MeGS: Partitioning Meaningful Subgraph Structures Using Minimum Description Length. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), Barcelona, Spain.

[20]

Vinh Thinh Ho, Daria Stepanova, Mohamed H Gad-Elrab, Evgeny Kharlamov, and Gerhard Weikum. 2018. Rule learning from knowledge graphs guided by embedding models. In Proceedings of the 17th International Semantic Web Conference, Monterey, CA. Springer, 72–90.

Digital Library

[21]

Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019. Knowledge graph embedding based question answering. In Proceeding of the 12th ACM International Conference on Web Search and Data Mining (WSDM), Melbourne, Australia. 105–113.

Digital Library

[22]

Shengbin Jia, Yang Xiang, Xiaojun Chen, Kun Wang, 2019. Triple Trustworthiness Measurement for Knowledge Graph. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. ACM, 2865–2871.

Digital Library

[23]

Linnan Jiang, Lei Chen, and Zhao Chen. 2018. Knowledge Base Enhancement via Data Facts and Crowdsourcing. In Proceedings of the 34th International Conference on Data Engineering (ICDE), Paris, France. 1109–1119.

[24]

Di Jin, Ryan A. Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, and Danai Koutra. 2019. Latent Network Summarization: Bridging Network Embedding and Summarization. In Proceedings of the 21st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Anchorage, AK. 987–997.

Digital Library

[25]

Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos. 2014. VoG: Summarizing and Understanding Large Graphs. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA. 91–99.

[26]

Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. 2018. Graph summarization methods and applications: A survey. Comput. Surveys 51, 3 (2018).

Digital Library

[27]

Weizhi Ma, Min Zhang, Yue Cao, Woojeong Jin, Chenyang Wang, Yiqun Liu, Shaoping Ma, and Xiang Ren. 2019. Jointly Learning Explainable Rules for Recommendation with Knowledge Graph. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. ACM, 1210–1221.

Digital Library

[28]

Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli, and Heiner Stuckenschmidt. 2019. Anytime bottom-up rule learning for knowledge graph completion. (2019).

[29]

André Melo and Heiko Paulheim. 2017. Detection of relation assertion errors in knowledge graphs. In Proceedings of the 9th Knowledge Capture Conference, K-CAP 2017, Austin, TX.

Digital Library

[30]

Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph Summarization with Bounded Error. In Proceedings of the 2008 ACM International Conference on Management of Data (SIGMOD), Vancouver, BC. 14.

Digital Library

[31]

Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2015. A review of relational machine learning for knowledge graphs. Proc. IEEE 104, 1 (2015), 11–33.

[32]

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA.

[33]

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing yago: scalable machine learning for linked data. In Proceedings of the 21st International Conference on World Wide Web (WWW), Lyon, France.

Digital Library

[34]

Caleb C Noble and Diane J Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC.

Digital Library

[35]

Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web 8, 3 (2017).

[36]

Heiko Paulheim and Christian Bizer. 2014. Improving the quality of linked data using statistical distributions. International Journal on Semantic Web & Information Systems 10, 2(2014).

Digital Library

[37]

Mohammad Rashid, Marco Torchiano, Giuseppe Rizzo, Nandana Mihindukulasooriya, and Oscar Corcho. 2019. A quality assessment approach for evolving knowledge bases. Semantic Web 10, 2 (2019).

[38]

Jorma Rissanen. 1978. Modeling by Shortest Data Description. Automatica 14, 1 (1978).

[39]

Jorma Rissanen. 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics 11, 2 (1983).

[40]

J. Rissanen. 1985. Minimum Description Length Principle. In Encyclopedia of Statistical Sciences. Vol. V. John Wiley and Sons.

[41]

Tara Safavi, Davide Mottin, Caleb Belth, Emmanuel Müller, Lukas Faber, and Danai Koutra. 2019. Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. In Proceedings of the 19th IEEE International Conference on Data Mining (ICDM), Beijing, China.

[42]

Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2015. Timecrunch: Interpretable dynamic graph summarization. In Proceedings of the 21st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Sydney, Australia.

Digital Library

[43]

Claude Elwood Shannon. 1948. A mathematical theory of communication. Bell system technical journal 27, 3 (1948), 379–423.

[44]

Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding streams in knowledge graphs to support fact checking. In Proceedings of the 17th IEEE International Conference on Data Mining (ICDM), New Orleans, LA. 859–864.

[45]

Qi Song, Yinghui Wu, and Xin Luna Dong. 2016. Mining summaries for knowledge graph search. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), Barcelona, Spain.

[46]

Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (WWW), Alberta, Canada.

Digital Library

[47]

Thomas Pellissier Tanon, Daria Stepanova, Simon Razniewski, Paramita Mirza, and Gerhard Weikum. 2017. Completeness-aware rule learning from knowledge graphs. In Proceedings of the 16th International Semantic Web Conference, Vienna, Austria.

[48]

Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. 2071–2080.

Digital Library

[49]

Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12(2017).

[50]

Yinghui Wu, Shengqi Yang, Mudhakar Srivatsa, Arun Iyengar, and Xifeng Yan. 2013. Summarizing answer graphs induced by keyword queries. Proceedings of the VLDB Endowment 6, 14 (2013).

Digital Library

[51]

Gensheng Zhang, Damian Jimenez, and Chengkai Li. 2018. Maverick: Discovering exceptional facts from knowledge graphs. In Proceedings of the 27th ACM Conference on Information and Knowledge Management (CIKM), Torino, Italy.

Digital Library

[52]

Wen Zhang, Bibek Paudel, Liang Wang, Jiaoyan Chen, Hai Zhu, Wei Zhang, Abraham Bernstein, and Huajun Chen. 2019. Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. 2366–2377.

Digital Library

[53]

Mussab Zneika, Dan Vodislav, and Dimitris Kotzinos. 2019. Quality metrics for RDF graph summarization. Semantic Web (2019), 1–30.

Cited By

Wang XAo XZhang FZhang ZHe Q(2025)Knowledge Error Detection via Textual and Structural Joint LearningBig Data Mining and Analytics10.26599/BDMA.2024.90200408:1(233-240)Online publication date: Feb-2025
https://doi.org/10.26599/BDMA.2024.9020040
Yang YChen JXiang Y(2025)A review on the reliability of knowledge graph: from a knowledge representation learning perspectiveWorld Wide Web10.1007/s11280-024-01316-w28:1Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1007/s11280-024-01316-w
Zhang JYing RShao J(2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
https://dl.acm.org/doi/10.1145/3698823
Show More Cited By

Index Terms

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Index terms have been assigned to the content through auto-classification.

Recommendations

Enabling inductive knowledge graph completion via structure-aware attention network
Abstract
Knowledge graph completion (KGC) aims at complementing missing entities and relations in a knowledge graph (KG). Popular KGC approaches based on KG embedding are typically limited to the transductive setting, i.e., all entities must be seen during ...
Missing-edge aware knowledge graph inductive inference through dual graph learning and traversing
Abstract
Knowledge graph (KG) is a kind of structured human knowledge of modeling the relations between real-world entities. This paper studies the KG inductive inference problem, i.e., predicting the relations for out-of-KG entities. However, due to the ...
Highlights
- A rule-inductive model is proposed to solve the KG inductive inference problem.
- The semantic information of relation’s argument is used to construct a dual graph.
- A dual graph traversing method is proposed to learn all possible ...
Meta-Knowledge Transfer for Inductive Knowledge Graph Embedding
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Knowledge graphs (KGs) consisting of a large number of triples have become widespread recently, and many knowledge graph embedding (KGE) methods are proposed to embed entities and relations of a KG into continuous vector spaces. Such embedding methods ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '20: Proceedings of The Web Conference 2020

April 2020

3143 pages

ISBN:9781450370233

DOI:10.1145/3366423

Editors:
Yennun Huang
Acadmica sinica, Taiwan
,
Irwin King
The Chinese University of Hong Kong, Hong Kong
,
Tie-Yan Liu
Microsoft Research Asia, China
,
Maarten van Steen
University of Twente, Netherlands

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 April 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

29
Total Citations
View Citations
719
Total Downloads

Downloads (Last 12 months)65
Downloads (Last 6 weeks)7

Reflects downloads up to 15 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Wang XAo XZhang FZhang ZHe Q(2025)Knowledge Error Detection via Textual and Structural Joint LearningBig Data Mining and Analytics10.26599/BDMA.2024.90200408:1(233-240)Online publication date: Feb-2025
https://doi.org/10.26599/BDMA.2024.9020040
Yang YChen JXiang Y(2025)A review on the reliability of knowledge graph: from a knowledge representation learning perspectiveWorld Wide Web10.1007/s11280-024-01316-w28:1Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1007/s11280-024-01316-w
Zhang JYing RShao J(2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
https://dl.acm.org/doi/10.1145/3698823
Zhang BLi JDai YSerra ESpezzano F(2024)Distilling Knowledge Based on Curriculum Learning for Temporal Knowledge Graph EmbeddingsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679896(4248-4252)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679896
Wan t发仁宋xu Z(2024)Knowledge graph error detection with unsupervised triplet networkSeventh Global Intelligent Industry Conference (GIIC 2024)10.1117/12.3032265(36)Online publication date: 11-Oct-2024
https://doi.org/10.1117/12.3032265
Liu XTang JLi MHan JXiao GJiang J(2024)SeSICL: Semantic and Structural Integrated Contrastive Learning for Knowledge Graph Error DetectionIEEE Access10.1109/ACCESS.2024.338454312(56088-56096)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3384543
Yang YChen JGao XXiang Y(2024)Dual De-confounded Causal Intervention method for knowledge graph error detectionKnowledge-Based Systems10.1016/j.knosys.2024.112644305(112644)Online publication date: Dec-2024
https://doi.org/10.1016/j.knosys.2024.112644
Wang RWong RSun D(2024)Estimation-based optimizations for the semantic compression of RDF knowledge basesInformation Processing and Management: an International Journal10.1016/j.ipm.2024.10379961:5Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1016/j.ipm.2024.103799
Zhou YZhu CZhu W(2024)ProMvSDInformation Processing and Management: an International Journal10.1016/j.ipm.2024.10370561:4Online publication date: 18-Jul-2024
https://dl.acm.org/doi/10.1016/j.ipm.2024.103705
Xue XZhang CWang YSong HXue XNiu Z(2024)Heterogeneous Views and Spatial Structure Enhancement for triple error detectionExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.124938256:COnline publication date: 5-Dec-2024
https://dl.acm.org/doi/10.1016/j.eswa.2024.124938
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents