research-article

Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact

Author:

Xin Luna Dong

Proceedings of the VLDB Endowment, Volume 16, Issue 12

Pages 4130 - 4137

https://doi.org/10.14778/3611540.3611636

Published: 01 August 2023

Abstract

Knowledge Graphs (KGs) have been used to support a wide range of applications, from web search to personal assistants. In this paper, we describe three generations of knowledge graphs: entity-based KGs, which have been supporting general search and question answering (e.g., at Google and Bing); text-rich KGs, which have been supporting search and recommendations for products, bioinformatics, etc. (e.g., at Amazon and Alibaba); and the emerging integration of KGs and large language models (LLMs), which we call dual neural KGs. We describe the characteristics of each generation of KGs, the crazy ideas behind the scenes in constructing such KGs, and the techniques developed over time to enable industry impact. In addition, we use KGs as examples to demonstrate a recipe for evolving research ideas from innovations to production practice, and then to the next level of innovation, to advance both science and business.


Recommendations

Knowledge Graphs: An Information Retrieval Perspective

In this survey, we provide an overview of the literature on knowledge graphs (KGs) in the context of information retrieval (IR). Modern IR systems can benefit from information available in KGs in multiple ways, independent of whether the KGs are publicly ...

Entity Alignment Between Knowledge Graphs Using Entity Type Matching
Knowledge Science, Engineering and Management

The task of entity alignment between knowledge graphs (KGs) aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based entity alignment methods get extended attention. Most of them firstly ...

Entity alignment between knowledge graphs using attribute embeddings
AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models are proposed for this task. Such models are built on top of a knowledge graph ...


Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 12 (August 2023), 685 pages

ISSN: 2150-8097

Editors: Georgia Koutrika (Athena Research Center) and Jun Yang (Duke University)

Publisher: VLDB Endowment


Bibliometrics

Total citations: 7
Total downloads: 59
Downloads (last 12 months): 34
Downloads (last 6 weeks): 3

Reflects downloads up to 25 Feb 2025.


Cited By

Colombo A, Bernasconi A, Ceri S (2025). An LLM-assisted ETL pipeline to build a high-quality knowledge graph of the Italian legislation. Information Processing & Management 62:4 (104082). Online publication date: Jul-2025. https://doi.org/10.1016/j.ipm.2025.104082

Wang Z, Chen H, Xu G, Ren M (2025). A novel large-language-model-driven framework for named entity recognition. Information Processing & Management 62:3 (104054). Online publication date: May-2025. https://doi.org/10.1016/j.ipm.2024.104054

Wang K, Xu Y, Luo S (2024). TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning. Proceedings of the VLDB Endowment 17:10 (2459-2472). Online publication date: 1-Jun-2024. https://dl.acm.org/doi/10.14778/3675034.3675039

Cao L, von Ehrenheim V, Granroth-Wilding M, Anselmo Stahl R, McCornack A, Catovic A, Cavalcanti Rocha D, Baeza-Yates R, Bonchi F (2024). CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4816-4827). Online publication date: 25-Aug-2024. https://dl.acm.org/doi/10.1145/3637528.3671515

Mankari S, Sanghavi A (2024). Adaptive Conversation Recommendation Systems: Leveraging Large Language Models and Knowledge Graphs. 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI) (1-6). Online publication date: 29-Nov-2024. https://doi.org/10.1109/IDICAIEI61867.2024.10842757

Iglesias E, Sakor A, Rohde P, Janev V, Vidal M (2024). Fragmenting Data Strategies to Scale Up the Knowledge Graph Creation. Semantic Intelligence (11-23). Online publication date: 29-Dec-2024. https://doi.org/10.1007/978-981-97-7356-5_2

Koohborfardhaghighi S, De Geyter G, Kaliner E (2024). Unlocking the Power of LLM-Based Question Answering Systems: Enhancing Reasoning, Insight, and Automation with Knowledge Graphs. Intelligent Systems Design and Applications (156-171). Online publication date: 23-Jul-2024. https://doi.org/10.1007/978-3-031-64776-5_16
