[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3366423.3380189acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Published: 20 April 2020 Publication History

Abstract

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG.
In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle—a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification, (identifying the location of up to 93% of missing entities—over 10% more than baselines), while also being efficient for large knowledge graphs.

References

[1]
Charu C. Aggarwal. 2016. Outlier Analysis (2nded.). Springer Publishing Company, Incorporated.
[2]
Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, Vol. 22. 207–216.
[3]
Leman Akoglu, Duen Horng Chau, Jilles Vreeken, Nikolaj Tatti, Hanghang Tong, and Christos Faloutsos. 2013. Mining connection pathways for marked nodes in large graphs. In Proceedings of the 14th IEEE International Conference on Data Mining (ICDM), Dallas, Texas.
[4]
Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29, 3 (2015).
[5]
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In Proceedings of the 6th International Semantic Web Conference, Busan, Korea. Springer.
[6]
Nikita Bhutani, Xinyi Zheng, and HV Jagadish. 2019. Learning to Answer Complex Questions over Knowledge Bases with Query Composition. In Proceedings of the 28th ACM Conference on Information and Knowledge Management (CIKM), Beijing, China. 739–748.
[7]
Carlos Bobed, Pierre Maillot, Peggy Cellier, and Sébastien Ferré. 2019. Data-driven Assessment of Structural Evolution of RDF Graphs. Semantic Web (2019).
[8]
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV.
[9]
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), Atlanta, GA.
[10]
Yurong Cheng, Lei Chen, Ye Yuan, and Guoren Wang. 2018. Rule-based graph repairing: Semantic and efficient repairing methods. In Proceedings of the 34th International Conference on Data Engineering (ICDE), Paris, France. 773–784.
[11]
Thomas M Cover and Joy A Thomas. 2012. Elements of information theory. John Wiley & Sons.
[12]
Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. Grami: Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment 7, 7 (2014).
[13]
Wenfei Fan, Xin Wang, and Yinghui Wu. 2014. Answering graph pattern queries using views. In Proceedings of the 30th International Conference on Data Engineering (ICDE), Chicago, IL.
[14]
Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu. 2015. Association rules with graph patterns. Proceedings of the VLDB Endowment 8, 12 (2015), 1502–1513.
[15]
Wenfei Fan, Yinghui Wu, and Jingbo Xu. 2016. Functional dependencies for graphs. In Proceedings of the 25th ACM Conference on Information and Knowledge Management (CIKM), Indianapolis, IN. 1843–1857.
[16]
Luis Galárraga, Simon Razniewski, Antoine Amarilli, and Fabian M Suchanek. 2017. Predicting completeness in knowledge bases. In Proceeding of the 10th ACM International Conference on Web Search and Data Mining (WSDM), Cambridge, UK.
[17]
Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M Suchanek. 2015. Fast rule mining in ontological knowledge bases with AMIE+. The VLDB Journal 24, 6 (2015).
[18]
Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web (WWW), Rio de Janeiro, Brazil.
[19]
S. Goebl, A. Tonch, C. Böhm, and C. Plant. 2016. MeGS: Partitioning Meaningful Subgraph Structures Using Minimum Description Length. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), Barcelona, Spain.
[20]
Vinh Thinh Ho, Daria Stepanova, Mohamed H Gad-Elrab, Evgeny Kharlamov, and Gerhard Weikum. 2018. Rule learning from knowledge graphs guided by embedding models. In Proceedings of the 17th International Semantic Web Conference, Monterey, CA. Springer, 72–90.
[21]
Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019. Knowledge graph embedding based question answering. In Proceeding of the 12th ACM International Conference on Web Search and Data Mining (WSDM), Melbourne, Australia. 105–113.
[22]
Shengbin Jia, Yang Xiang, Xiaojun Chen, Kun Wang, 2019. Triple Trustworthiness Measurement for Knowledge Graph. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. ACM, 2865–2871.
[23]
Linnan Jiang, Lei Chen, and Zhao Chen. 2018. Knowledge Base Enhancement via Data Facts and Crowdsourcing. In Proceedings of the 34th International Conference on Data Engineering (ICDE), Paris, France. 1109–1119.
[24]
Di Jin, Ryan A. Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, and Danai Koutra. 2019. Latent Network Summarization: Bridging Network Embedding and Summarization. In Proceedings of the 21st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Anchorage, AK. 987–997.
[25]
Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos. 2014. VoG: Summarizing and Understanding Large Graphs. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA. 91–99.
[26]
Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. 2018. Graph summarization methods and applications: A survey. Comput. Surveys 51, 3 (2018).
[27]
Weizhi Ma, Min Zhang, Yue Cao, Woojeong Jin, Chenyang Wang, Yiqun Liu, Shaoping Ma, and Xiang Ren. 2019. Jointly Learning Explainable Rules for Recommendation with Knowledge Graph. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. ACM, 1210–1221.
[28]
Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli, and Heiner Stuckenschmidt. 2019. Anytime bottom-up rule learning for knowledge graph completion. (2019).
[29]
André Melo and Heiko Paulheim. 2017. Detection of relation assertion errors in knowledge graphs. In Proceedings of the 9th Knowledge Capture Conference, K-CAP 2017, Austin, TX.
[30]
Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph Summarization with Bounded Error. In Proceedings of the 2008 ACM International Conference on Management of Data (SIGMOD), Vancouver, BC. 14.
[31]
Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2015. A review of relational machine learning for knowledge graphs. Proc. IEEE 104, 1 (2015), 11–33.
[32]
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA.
[33]
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing yago: scalable machine learning for linked data. In Proceedings of the 21st International Conference on World Wide Web (WWW), Lyon, France.
[34]
Caleb C Noble and Diane J Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC.
[35]
Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web 8, 3 (2017).
[36]
Heiko Paulheim and Christian Bizer. 2014. Improving the quality of linked data using statistical distributions. International Journal on Semantic Web & Information Systems 10, 2(2014).
[37]
Mohammad Rashid, Marco Torchiano, Giuseppe Rizzo, Nandana Mihindukulasooriya, and Oscar Corcho. 2019. A quality assessment approach for evolving knowledge bases. Semantic Web 10, 2 (2019).
[38]
Jorma Rissanen. 1978. Modeling by Shortest Data Description. Automatica 14, 1 (1978).
[39]
Jorma Rissanen. 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics 11, 2 (1983).
[40]
J. Rissanen. 1985. Minimum Description Length Principle. In Encyclopedia of Statistical Sciences. Vol. V. John Wiley and Sons.
[41]
Tara Safavi, Davide Mottin, Caleb Belth, Emmanuel Müller, Lukas Faber, and Danai Koutra. 2019. Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. In Proceedings of the 19th IEEE International Conference on Data Mining (ICDM), Beijing, China.
[42]
Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2015. Timecrunch: Interpretable dynamic graph summarization. In Proceedings of the 21st ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Sydney, Australia.
[43]
Claude Elwood Shannon. 1948. A mathematical theory of communication. Bell system technical journal 27, 3 (1948), 379–423.
[44]
Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding streams in knowledge graphs to support fact checking. In Proceedings of the 17th IEEE International Conference on Data Mining (ICDM), New Orleans, LA. 859–864.
[45]
Qi Song, Yinghui Wu, and Xin Luna Dong. 2016. Mining summaries for knowledge graph search. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), Barcelona, Spain.
[46]
Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (WWW), Alberta, Canada.
[47]
Thomas Pellissier Tanon, Daria Stepanova, Simon Razniewski, Paramita Mirza, and Gerhard Weikum. 2017. Completeness-aware rule learning from knowledge graphs. In Proceedings of the 16th International Semantic Web Conference, Vienna, Austria.
[48]
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. 2071–2080.
[49]
Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12(2017).
[50]
Yinghui Wu, Shengqi Yang, Mudhakar Srivatsa, Arun Iyengar, and Xifeng Yan. 2013. Summarizing answer graphs induced by keyword queries. Proceedings of the VLDB Endowment 6, 14 (2013).
[51]
Gensheng Zhang, Damian Jimenez, and Chengkai Li. 2018. Maverick: Discovering exceptional facts from knowledge graphs. In Proceedings of the 27th ACM Conference on Information and Knowledge Management (CIKM), Torino, Italy.
[52]
Wen Zhang, Bibek Paudel, Liang Wang, Jiaoyan Chen, Hai Zhu, Wei Zhang, Abraham Bernstein, and Huajun Chen. 2019. Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning. In Proceedings of the 28th International Conference on World Wide Web (WebConf), San Francisco, CA. 2366–2377.
[53]
Mussab Zneika, Dan Vodislav, and Dimitris Kotzinos. 2019. Quality metrics for RDF graph summarization. Semantic Web (2019), 1–30.

Cited By

View all
  • (2025)Knowledge Error Detection via Textual and Structural Joint LearningBig Data Mining and Analytics10.26599/BDMA.2024.90200408:1(233-240)Online publication date: Feb-2025
  • (2025)A review on the reliability of knowledge graph: from a knowledge representation learning perspectiveWorld Wide Web10.1007/s11280-024-01316-w28:1Online publication date: 1-Jan-2025
  • (2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
  • Show More Cited By

Index Terms

  1. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        WWW '20: Proceedings of The Web Conference 2020
        April 2020
        3143 pages
        ISBN:9781450370233
        DOI:10.1145/3366423
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 20 April 2020

        Permissions

        Request permissions for this article.

        Check for updates

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20
        Sponsor:
        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)65
        • Downloads (Last 6 weeks)7
        Reflects downloads up to 15 Jan 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2025)Knowledge Error Detection via Textual and Structural Joint LearningBig Data Mining and Analytics10.26599/BDMA.2024.90200408:1(233-240)Online publication date: Feb-2025
        • (2025)A review on the reliability of knowledge graph: from a knowledge representation learning perspectiveWorld Wide Web10.1007/s11280-024-01316-w28:1Online publication date: 1-Jan-2025
        • (2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
        • (2024)Distilling Knowledge Based on Curriculum Learning for Temporal Knowledge Graph EmbeddingsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679896(4248-4252)Online publication date: 21-Oct-2024
        • (2024)Knowledge graph error detection with unsupervised triplet networkSeventh Global Intelligent Industry Conference (GIIC 2024)10.1117/12.3032265(36)Online publication date: 11-Oct-2024
        • (2024)SeSICL: Semantic and Structural Integrated Contrastive Learning for Knowledge Graph Error DetectionIEEE Access10.1109/ACCESS.2024.338454312(56088-56096)Online publication date: 2024
        • (2024)Dual De-confounded Causal Intervention method for knowledge graph error detectionKnowledge-Based Systems10.1016/j.knosys.2024.112644305(112644)Online publication date: Dec-2024
        • (2024)Estimation-based optimizations for the semantic compression of RDF knowledge basesInformation Processing and Management: an International Journal10.1016/j.ipm.2024.10379961:5Online publication date: 1-Sep-2024
        • (2024)ProMvSDInformation Processing and Management: an International Journal10.1016/j.ipm.2024.10370561:4Online publication date: 18-Jul-2024
        • (2024)Heterogeneous Views and Spatial Structure Enhancement for triple error detectionExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.124938256:COnline publication date: 5-Dec-2024
        • Show More Cited By

        View Options

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format.

        HTML Format

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media