DOI: 10.1145/3227609.3227670

Mining and Leveraging Background Knowledge for Improving Named Entity Linking

Published: 25 June 2018

Abstract

Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development.
The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these sources, concerning completeness, timeliness and semantic correctness, that seriously challenge their usefulness. Information Extraction methods are therefore faced with problems such as name variance and type confusability. If multiple Linked Data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge.
This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges with regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge.
Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.

Published In

WIMS '18: Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics
June 2018
398 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Knowledge-rich Information Extraction
  2. Linked Data Quality Information Extraction
  3. Named Entity Linking
  4. Natural Language Processing
  5. Semantic Technologies

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WIMS '18

Acceptance Rates

Overall Acceptance Rate 140 of 278 submissions, 50%

Cited By

  • (2022) Building Knowledge Graphs and Recommender Systems for Suggesting Reskilling and Upskilling Options from the Web. Information 13:11 (510). DOI: 10.3390/info13110510. Online publication date: 25-Oct-2022
  • (2022) Slot Filling for Extracting Reskilling and Upskilling Options from the Web. Natural Language Processing and Information Systems, pp. 279-290. DOI: 10.1007/978-3-031-08473-7_25. Online publication date: 13-Jun-2022
  • (2021) The Application of Text Mining Algorithms to Discover One Topic Objects in Digital Learning Repositories. 2021 28th Conference of Open Innovations Association (FRUCT), pp. 502-509. DOI: 10.23919/FRUCT50888.2021.9347611. Online publication date: 27-Jan-2021
  • (2021) RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network. Proceedings of the Web Conference 2021, pp. 1673-1685. DOI: 10.1145/3442381.3449917. Online publication date: 19-Apr-2021
  • (2020) Named Entity Extraction for Knowledge Graphs: A Literature Overview. IEEE Access 8, pp. 32862-32881. DOI: 10.1109/ACCESS.2020.2973928. Online publication date: 2020
  • (2019) Verification of Web Videos Through Analysis of Their Online Context. Video Verification in the Fake News Era, pp. 191-221. DOI: 10.1007/978-3-030-26752-0_7. Online publication date: 18-Sep-2019
  • (2019) Real-Time Story Detection and Video Retrieval from Social Media Streams. Video Verification in the Fake News Era, pp. 17-52. DOI: 10.1007/978-3-030-26752-0_2. Online publication date: 18-Sep-2019
  • (2019) Multimodal Analytics Dashboard for Story Detection and Visualization. Video Verification in the Fake News Era, pp. 281-299. DOI: 10.1007/978-3-030-26752-0_10. Online publication date: 18-Sep-2019
  • (2018) On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance. Procedia Computer Science 137, pp. 33-42. DOI: 10.1016/j.procs.2018.09.004. Online publication date: 2018
