DOI: 10.1145/3281354.3281363
research-article

Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India

Published: 06 November 2018

Abstract

Place name recognition is one of the key tasks in Information Extraction. In this paper, we tackle this task for English-language news from India. We first analyze the results obtained with available tools and corpora, and then train our own models to obtain better results. Most previous work on entity recognition for English uses similar corpora for training and testing, yet we observe that performance drops significantly when the models are tested on different datasets. For this reason, we trained various models on combinations of several corpora. Our results show that training on combinations of corpora improves the relative performance of these models, but further research is needed to obtain place name recognizers that generalize to any given dataset.
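The abstract describes measuring how place name recognition degrades across datasets and training on combinations of corpora. The sketch below illustrates, under stated assumptions, two ingredients of that kind of evaluation: strict (exact-span) precision, recall and F1 for place names over BIO-tagged sentences, and enumerating every non-empty combination of training corpora. The BIO scheme, the `LOC` label, exact-span scoring, the function names (`extract_spans`, `strict_prf`, `training_combinations`) and the corpus names are all hypothetical choices for illustration, not details taken from the paper.

```python
from itertools import combinations


def extract_spans(tags, label="LOC"):
    """Return the set of (start, end) token spans tagged `label` in BIO format.

    `end` is exclusive. An I- tag that does not continue a span of the same
    label opens a new span (a lenient convention chosen for this sketch).
    """
    begin, inside = f"B-{label}", f"I-{label}"
    spans = set()
    start = None  # start index of the span currently open, if any
    for i, tag in enumerate(tags):
        if tag == begin or (tag == inside and start is None):
            if start is not None:  # a new B- closes the previous span
                spans.add((start, i))
            start = i
        elif tag == inside:
            continue  # extend the open span
        elif start is not None:  # any other tag closes the open span
            spans.add((start, i))
            start = None
    if start is not None:  # span running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def strict_prf(gold_sents, pred_sents, label="LOC"):
    """Exact-span precision, recall and F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = extract_spans(gold, label), extract_spans(pred, label)
        tp += len(g & p)  # spans with identical boundaries
        fp += len(p - g)  # predicted spans not in the gold standard
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def training_combinations(corpora):
    """Yield every non-empty combination of the available training corpora."""
    for k in range(1, len(corpora) + 1):
        yield from combinations(corpora, k)


if __name__ == "__main__":
    # Hypothetical sentence: "Rains hit New Delhi and Mumbai"
    gold = [["O", "O", "B-LOC", "I-LOC", "O", "B-LOC"]]
    pred = [["O", "O", "B-LOC", "O", "O", "B-LOC"]]  # only "New" tagged
    print(strict_prf(gold, pred))  # prints (0.5, 0.5, 0.5)

    # Hypothetical corpus names; each combination would be one training run.
    for combo in training_combinations(["corpus_a", "corpus_b", "corpus_c"]):
        print("train on:", "+".join(combo))
```

Exact-span matching is a strict convention: in the example, tagging only "New" in "New Delhi" counts as both a false positive and a false negative, so differences in annotation guidelines between corpora can lower scores even when a model locates the right place.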


Cited By

  • Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-Context Setting. In Experimental IR Meets Multilinguality, Multimodality, and Interaction (2019), 425-432. DOI: 10.1007/978-3-030-28577-7_32. Online publication date: 3-Aug-2019.
  • A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In Advances in Information Retrieval (2019), 316-323. DOI: 10.1007/978-3-030-15719-7_42. Online publication date: 14-Apr-2019.

Information

Published In

GIR'18: Proceedings of the 12th Workshop on Geographic Information Retrieval
November 2018
37 pages
ISBN: 9781450360340
DOI: 10.1145/3281354
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Entity Extraction
  2. Machine Learning
  3. Named Entity Recognition
  4. Natural Language Processing
  5. Place Name Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGSPATIAL '18

Acceptance Rates

GIR'18 Paper Acceptance Rate 8 of 12 submissions, 67%;
Overall Acceptance Rate 46 of 61 submissions, 75%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Dec 2024