research-article

External Knowledge and Query Strategies in Active Learning: a Study in Clinical Information Extraction

Authors:

Mahnoosh Kholghi,

Laurianne Sitbon,

Guido Zuccon,

Anthony NguyenAuthors Info & Claims

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Pages 143 - 152

https://doi.org/10.1145/2806416.2806550

Published: 17 October 2015 Publication History

Get Access

Abstract

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy allows to achieve, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the necessary high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that (1) amongst existing query strategies, the ones based on the classification model's confidence are a better choice for clinical data as they perform equally well with a much lighter computational load, and (2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% less tokens and concepts to manually annotate than with state-of-the-art query strategies.

References

[1]

A. R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the American Medical Informatics Association Annual Symposium, 17--21, 2001.

Abstract

References

Cited By

Index Terms

Recommendations

Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey

Active Learning for Name Entity Recognition with External Knowledge

Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations