research-article

Building event-centric knowledge graphs from news

Published: 01 March 2016

Abstract

Knowledge graphs have gained increasing popularity in the past couple of years, thanks to their adoption in everyday search engines. Typically, they consist of fairly static and encyclopedic facts about persons and organizations (e.g., a celebrity's birth date, occupation and family members) obtained from large repositories such as Freebase or Wikipedia.

In this paper, we present a method and tools to automatically build knowledge graphs from news articles. As news articles describe changes in the world through the events they report, we present an approach to create Event-Centric Knowledge Graphs (ECKGs) using state-of-the-art natural language processing and semantic web techniques. Such ECKGs capture long-term developments and histories on hundreds of thousands of entities and are complementary to the static encyclopedic information in traditional knowledge graphs.

We describe our event-centric representation schema, the challenges in extracting event information from news, our open source pipeline, and the knowledge graphs we have extracted from four different news corpora: general news (Wikinews), the FIFA World Cup, the Global Automotive Industry, and Airbus A380 airplanes. Furthermore, we present an assessment of the accuracy of the pipeline in extracting the triples of the knowledge graphs. Moreover, through an event-centered browser and visualization tool, we show how approaching information from news in an event-centric manner can increase the user's understanding of the domain, facilitate the reconstruction of news storylines, and enable exploratory investigation of facts hidden in the news.
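As a rough illustration of the contrast the abstract draws between static encyclopedic facts and event-centric knowledge, the sketch below stores news events as first-class nodes and reconstructs one entity's history from them. It is a minimal plain-Python sketch under stated assumptions, not the paper's pipeline or schema: triples are plain string tuples rather than RDF, the property names (`sem:hasActor`, `sem:hasTime`, `sem:hasPlace`) are borrowed from the Simple Event Model (SEM) vocabulary, and the identifiers (such as `ex:CompanyA` and `ex:ev1`), the dates and the helper `entity_history` are hypothetical.

```python
from collections import namedtuple

# A triple is (subject, predicate, object); plain strings stand in for IRIs.
Triple = namedtuple("Triple", ["s", "p", "o"])

# Static, encyclopedic fact of the kind found in traditional knowledge graphs
# (hypothetical sample data, shown only for contrast).
static_facts = [
    Triple("ex:CompanyA", "ex:foundedIn", "1970"),
]

# Event-centric facts: each event is its own node, linked to participants,
# places and times with SEM-style properties (hypothetical sample data).
event_facts = [
    Triple("ex:ev1", "rdf:type", "sem:Event"),
    Triple("ex:ev1", "sem:hasActor", "ex:CompanyA"),
    Triple("ex:ev1", "sem:hasPlace", "ex:CityX"),
    Triple("ex:ev1", "sem:hasTime", "2014-03-02"),
    Triple("ex:ev2", "rdf:type", "sem:Event"),
    Triple("ex:ev2", "sem:hasActor", "ex:CompanyA"),
    Triple("ex:ev2", "sem:hasActor", "ex:CompanyB"),
    Triple("ex:ev2", "sem:hasTime", "2012-11-15"),
    Triple("ex:ev3", "rdf:type", "sem:Event"),
    Triple("ex:ev3", "sem:hasActor", "ex:CompanyB"),
    Triple("ex:ev3", "sem:hasTime", "2013-06-01"),
]


def entity_history(triples, entity):
    """Hypothetical helper: return (time, event) pairs for every event in
    which `entity` is an actor, in chronological order. ISO-8601 dates sort
    correctly as strings; events without a time sort first."""
    events = {t.s for t in triples
              if t.p == "sem:hasActor" and t.o == entity}
    times = {t.s: t.o for t in triples if t.p == "sem:hasTime"}
    return sorted((times.get(ev, ""), ev) for ev in events)


print(entity_history(event_facts, "ex:CompanyA"))
# [('2012-11-15', 'ex:ev2'), ('2014-03-02', 'ex:ev1')]
```

Because each reported event is a node of its own, a newly extracted event only appends triples and never overwrites what is already known about an entity, which is what lets an event-centric graph accumulate a history rather than a single current value.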


Information

Published In

Web Semantics: Science, Services and Agents on the World Wide Web, Volume 37, Issue C, March 2016, 207 pages

Publisher

Elsevier Science Publishers B. V., Netherlands


Author Tags

1. Big data
2. Event extraction
3. Event-centric knowledge
4. Information integration
5. Natural language processing
6. Real world data


Cited By

• (2024) Construction and Teaching Application of Knowledge Graph in the Field of International Trade. Proceedings of the International Conference on Decision Science & Management, pp. 270-275. DOI: 10.1145/3686081.3686127. Online publication date: 26-Apr-2024.
• (2024) Prompt for extraction. Knowledge-Based Systems, vol. 289, issue C. DOI: 10.1016/j.knosys.2024.111544. Online publication date: 8-Apr-2024.
• (2024) TCEKG: A Temporal and Causal Event Knowledge Graph for Power Distribution Network Fault Diagnosis. Advanced Intelligent Computing Technology and Applications, pp. 505-517. DOI: 10.1007/978-981-97-5615-5_41. Online publication date: 5-Aug-2024.
• (2024) Event Evolution Analysis of Network Text Based on Pre-trained Language Model and Event Graph. Cooperative Design, Visualization, and Engineering, pp. 52-62. DOI: 10.1007/978-3-031-71315-6_6. Online publication date: 15-Sep-2024.
• (2023) Construction of Gesar Epic Event Graph Based on Event Extraction. Proceedings of the 2023 4th International Conference on Computer Science and Management Technology, pp. 431-436. DOI: 10.1145/3644523.3644601. Online publication date: 13-Oct-2023.
• (2023) OKG: A Knowledge Graph for Fine-grained Understanding of Social Media Discourse on Inequality. Proceedings of the 12th Knowledge Capture Conference 2023, pp. 166-174. DOI: 10.1145/3587259.3627557. Online publication date: 5-Dec-2023.
• (2023) DICE: a Dataset of Italian Crime Event news. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2985-2995. DOI: 10.1145/3539618.3591904. Online publication date: 19-Jul-2023.
• (2023) Human-in-the-Loop Rule Discovery for Micropost Event Detection. IEEE Transactions on Knowledge and Data Engineering, vol. 35, issue 8, pp. 8100-8111. DOI: 10.1109/TKDE.2022.3208345. Online publication date: 1-Aug-2023.
• (2023) A Software Reference Architecture for Journalistic Knowledge Platforms. Knowledge-Based Systems, vol. 276, issue C. DOI: 10.1016/j.knosys.2023.110750. Online publication date: 27-Sep-2023.
• (2023) Shards of Knowledge – Modeling Attributions for Event-Centric Knowledge Graphs. Conceptual Modeling, pp. 259-276. DOI: 10.1007/978-3-031-47262-6_14. Online publication date: 6-Nov-2023.
