research-article

Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India

Authors:

Ali Hürriyetoğlu,

Çağri Yoltar,

Deniz Yüret

GIR'18: Proceedings of the 12th Workshop on Geographic Information Retrieval

Article No.: 8, Pages 1 - 10

https://doi.org/10.1145/3281354.3281363

Published: 06 November 2018

Abstract

Place name recognition is one of the key tasks in information extraction. In this paper, we tackle this task for English news from India. We first analyze the results obtained with available tools and corpora, and then train our own models to obtain better results. Most previous work on entity recognition for English uses similar corpora for both training and testing, yet we observe that performance drops significantly when the models are tested on different datasets. For this reason, we trained various models on combinations of several corpora. Our results show that training on combinations of several corpora improves the relative performance of these models, but more research in this area is still necessary to obtain place name recognizers that generalize to any given dataset.
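The cross-dataset comparison described in the abstract depends on scoring place name predictions the same way on every test set. The abstract does not state which metric was used; the sketch below assumes entity-level exact-match precision, recall, and F1 over BIO-tagged tokens with a `LOC` label, in the style of CoNLL evaluation. The function names `loc_spans` and `place_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def loc_spans(tags: List[str], label: str = "LOC") -> Set[Span]:
    """Collect (start, end) spans of `label` entities from one BIO tag sequence."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag == f"I-{label}" and start is not None
        if not inside:
            if start is not None:
                spans.add((start, i))
                start = None
            if tag == f"B-{label}":
                start = i
    return spans


def place_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 for place names (assumed metric)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = loc_spans(g), loc_spans(p)
        tp += len(gs & ps)   # spans predicted with exact boundaries
        fp += len(ps - gs)   # predicted spans absent from the gold data
        fn += len(gs - ps)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy sentence: the model finds the two-token place name but misses the second one.
gold = [["B-LOC", "I-LOC", "O", "B-LOC"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
precision, recall, f1 = place_f1(gold, pred)
```

Applying the same function to each held-out corpus gives one score per training and test combination, which is the kind of comparison needed to see a performance drop on out-of-domain data.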

Cited By

Hürriyetoğlu A, Yörük E, Yüret D, Yoltar Ç, Gürel B, Duruşan F, Mutlu O, Akdemir A. (2019). Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-Context Setting. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 425-432. Online publication date: 3-Aug-2019. https://doi.org/10.1007/978-3-030-28577-7_32

Hürriyetoğlu A, Yörük E, Yüret D, Yoltar Ç, Gürel B, Duruşan F, Mutlu O. (2019). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. Advances in Information Retrieval, 316-323. Online publication date: 14-Apr-2019. https://dl.acm.org/doi/10.1007/978-3-030-15719-7_42

Index Terms

Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Retrieval tasks and goals
      1. Information extraction

Published In

GIR'18: Proceedings of the 12th Workshop on Geographic Information Retrieval

November 2018

37 pages

ISBN:9781450360340

DOI:10.1145/3281354

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSPATIAL: ACM Special Interest Group on Spatial Information

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

European Research Council

Conference

SIGSPATIAL '18

Sponsor:

SIGSPATIAL

SIGSPATIAL '18: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

November 6, 2018

Seattle, WA, USA

Acceptance Rates

GIR'18 Paper Acceptance Rate 8 of 12 submissions, 67%;

Overall Acceptance Rate 46 of 61 submissions, 75%

Bibliometrics

Total Citations: 2
Total Downloads: 144
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025
