DOI: 10.1145/3613905.3650844
Work in Progress

The Promise and Challenge of Large Language Models for Knowledge Engineering: Insights from a Hackathon

Published: 11 May 2024

Abstract

Knowledge engineering (KE) is the process of building, maintaining and using knowledge-based systems, which today often take the form of knowledge graphs (KGs). Large Language Models (LLMs) have the potential to improve automation in KE work, owing to the richness of their training data and their performance on natural language processing tasks. We conducted a multiple-methods study exploring user opinions and needs regarding the use of LLMs in KE. We used ethnographic techniques to observe KE workers using LLMs to solve KE tasks during a hackathon, and followed up with interviews with some of the participants. This interim study found that, despite LLMs’ promising capabilities for efficient knowledge acquisition and requirements elicitation, deploying them effectively requires an extended set of capabilities and training, particularly in prompting and understanding data. LLMs can be useful for simple quality assessment tasks, but in complex scenarios their output is hard to control and evaluation may require novel approaches. With this study, we aim to document how KE stakeholders interact with LLMs, identify areas of potential, and understand the barriers to their effective use. We find that copilot approaches may be valuable for developing processes in which a human, or a team of humans, is assisted by generative AI.

Cited By

  • (2024) Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap. ACM Transactions on Autonomous and Adaptive Systems 19:3 (1-60). DOI: 10.1145/3686803. Online publication date: 30-Sep-2024

Information & Contributors

Information

Published In

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
May 2024
4761 pages
ISBN: 9798400703317
DOI: 10.1145/3613905
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Interviews
  2. Knowledge Engineering
  3. Knowledge Graph
  4. Large Language Models

Qualifiers

  • Work in progress
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%
