research-article

Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact

Published: 01 August 2023

Abstract

Knowledge Graphs (KGs) have been used to support a wide range of applications, from web search to personal assistants. In this paper, we describe three generations of knowledge graphs: entity-based KGs, which have been supporting general search and question answering (e.g., at Google and Bing); text-rich KGs, which have been supporting search and recommendations for products, bioinformatics, etc. (e.g., at Amazon and Alibaba); and the emerging integration of KGs and LLMs, which we call dual neural KGs. We describe the characteristics of each generation of KGs, the crazy ideas behind the scenes in constructing such KGs, and the techniques developed over time to enable industry impact. In addition, we use KGs as examples to demonstrate a recipe to evolve research ideas from innovations to production practice, and then to the next level of innovations, to advance both science and business.

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 12
August 2023
685 pages
ISSN:2150-8097

Publisher

VLDB Endowment
