DOI: 10.1145/3383583.3398618
short-paper

On the Ambiguity and Relevance of Place Names in Scientific Text

Published: 01 August 2020

Abstract

How hard is it to systematically identify and disambiguate place names in scientific text? To address this question, we applied MapAffil, a toponymic search interface, to a random sample of 500 place name sentences from PubMed abstracts.
The algorithm correctly identified and disambiguated 39.2% of the place names in the sentences. An error analysis revealed six distinct challenges: biological terms (14.2%), method terms (11.6%), acronyms (10%), references (6%), other entity names (4.2%), and other errors (2.2%). Interestingly, a large portion of the correctly identified place names appeared irrelevant to the subject matter.
Many of these errors can be fixed easily, but irrelevance is much harder to address because it depends on semantics and purpose. To study the role of place in scientific text, it is not sufficient to disambiguate accurately; it is also necessary to assess the degree of relevance.
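
As a reading aid, the reported shares can be turned into sentence counts. The sketch below assumes that every percentage is taken over the 500 sampled sentences; the abstract does not state the denominator explicitly. Under that assumption the listed categories cover 87.4% of the sample, and the abstract does not say how the remaining 12.6% were classified. The names `SAMPLE_SIZE`, `REPORTED_PER_MILLE`, and `to_counts` are illustrative, not taken from the paper.

```python
# Assumption: all percentages are relative to the 500 sampled sentences.
SAMPLE_SIZE = 500

# Shares from the abstract, stored in tenths of a percent so the
# arithmetic stays in exact integers (39.2% -> 392).
REPORTED_PER_MILLE = {
    "Correct": 392,
    "Biological terms": 142,
    "Method terms": 116,
    "Acronyms": 100,
    "References": 60,
    "Other entity names": 42,
    "Other errors": 22,
}


def to_counts(per_mille, sample_size):
    """Convert per-mille shares into whole-sentence counts."""
    return {label: share * sample_size // 1000 for label, share in per_mille.items()}


counts = to_counts(REPORTED_PER_MILLE, SAMPLE_SIZE)
listed_per_mille = sum(REPORTED_PER_MILLE.values())
unlisted_count = SAMPLE_SIZE - sum(counts.values())

for label, n in counts.items():
    print(f"{label:<20}{n:>4}")
print(f"{'Not itemized':<20}{unlisted_count:>4}")
```

With a denominator of 500, every reported share maps to a whole number of sentences, which is consistent with the assumption but does not confirm it.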

Supplementary Material

MP4 File (3383583.3398618.mp4)
How hard is it to systematically identify and disambiguate place names in scientific text? To address this question, we applied MapAffil, a toponymic search interface, to a random sample of 500 place name sentences from PubMed abstracts. Several types of common geotagging misclassifications in scientific text were detected. Many of these mislabeled place names can be fixed easily, as discussed in the paper. A comparison of the proportions of misclassification types between biomedical text and newsletter text shows that biomedical papers contain more potential misclassifications. Finally, several potential solutions are discussed. With more annotated records, a better model could be trained based on these findings.
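
The solutions themselves are not described on this page. Purely as an illustration of why biological and method terms get mislabeled, and of one simple kind of fix, the sketch below compares a case-insensitive gazetteer lookup with a case-sensitive one. The gazetteer, the example sentence, and both functions are hypothetical; this is not MapAffil's algorithm or the authors' proposed method.

```python
import re

# Hypothetical mini-gazetteer; not taken from the paper or from MapAffil.
GAZETTEER = {"Turkey", "Bath", "Boston"}
GAZETTEER_LOWER = {name.lower() for name in GAZETTEER}


def tokenize(sentence):
    """Split a sentence into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", sentence)


def naive_tag(sentence):
    """Tag every token that matches a gazetteer entry, ignoring case."""
    return [tok for tok in tokenize(sentence) if tok.lower() in GAZETTEER_LOWER]


def case_sensitive_tag(sentence):
    """Tag only tokens that match a gazetteer entry with exact capitalization."""
    return [tok for tok in tokenize(sentence) if tok in GAZETTEER]


# Hypothetical sentence: "turkey" is a biological term and "bath" a method term.
sentence = "Wild turkey samples were incubated in a water bath in Boston."
print(naive_tag(sentence))
print(case_sensitive_tag(sentence))
```

A capitalization check of this kind would not help with acronyms or sentence-initial words, and it says nothing about whether a correctly tagged place is relevant to the subject matter.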


Cited By

  • (2024) Augmenting web-based tourist support system with microblog analyzed data. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02247-8. Online publication date: 16-Jun-2024

Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
August 2020
611 pages
ISBN: 9781450375856
DOI: 10.1145/3383583
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. geoparsing
  2. named entity recognition
  3. place name ambiguity
  4. pubmed
  5. toponym resolution

Qualifiers

  • Short-paper

Funding Sources

  • US National Institutes of Health

Conference

JCDL '20

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%
