research-article

Annotating longitudinal clinical narratives for de-identification

Published: 01 December 2015

Abstract

• De-identification shared task for longitudinal clinical records.
• Protected Health Information in records replaced with realistic surrogates.
• First corpus of its kind available for distribution.
• Used for Track 1 of the 2014 i2b2/UTHealth NLP shared task.

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was replaced with realistic surrogates automatically and then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.
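
The abstract reports entity-based micro-averaged F-measures. As a rough illustration of what such a score involves, the sketch below pools true positives, false positives, and false negatives over all documents before computing precision, recall, and F1. It assumes strict matching on (start, end, type) triples; the `Entity` type, the `micro_f1` function, and the toy annotations are hypothetical and are not the shared task's official evaluation script, whose matching rules may differ.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (start offset, end offset, PHI type).
Entity = Tuple[int, int, str]


def micro_f1(
    gold: Dict[str, Set[Entity]],
    pred: Dict[str, Set[Entity]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), micro-averaged over all documents.

    Counts are pooled across documents first, so documents with many
    entities weigh more than documents with few. An entity counts as
    correct only if offsets and type match exactly (an assumption).
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # entities found with exact span and type
        fp += len(p - g)   # predicted entities absent from the gold standard
        fn += len(g - p)   # gold entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy annotations, invented for illustration only.
gold = {
    "doc1": {(0, 4, "NAME"), (10, 20, "DATE")},
    "doc2": {(5, 9, "HOSPITAL")},
}
pred = {
    "doc1": {(0, 4, "NAME"), (10, 18, "DATE")},   # DATE span is off by two
    "doc2": {(5, 9, "HOSPITAL"), (30, 35, "AGE")},  # AGE is spurious
}

precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A token-based score, such as the annotator agreement figure quoted above, would apply the same pooling to individual tokens rather than whole entity spans, so a partially correct span earns partial credit.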



      Published In

      Journal of Biomedical Informatics, Volume 58, Issue S
      December 2015
      222 pages

      Publisher

      Elsevier Science

      San Diego, CA, United States

      Author Tags

      1. Annotation
      2. De-identification
      3. HIPAA
      4. Natural language processing

      Qualifiers

      • Research-article

      Cited By

      • (2023) Few-Shot Named Entity Recognition via Label-Attention Mechanism. Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, 466-471. doi:10.1145/3594315.3594358. Online publication date: 17-Mar-2023
      • (2021) De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. Intelligent Systems, 33-41. doi:10.1007/978-3-030-91699-2_3. Online publication date: 29-Nov-2021
      • (2020) Deidentification of free-text medical records using pre-trained bidirectional transformers. Proceedings of the ACM Conference on Health, Inference, and Learning, 214-221. doi:10.1145/3368555.3384455. Online publication date: 2-Apr-2020
      • (2019) SurfCon. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1578-1586. doi:10.1145/3292500.3330894. Online publication date: 25-Jul-2019
      • (2017) A cascaded approach for Chinese clinical text de-identification with less annotation effort. Journal of Biomedical Informatics, 73:C, 76-83. doi:10.1016/j.jbi.2017.07.017. Online publication date: 1-Sep-2017
      • (2017) Intrainstitutional EHR collections for patient-level information retrieval. Journal of the Association for Information Science and Technology, 68:11, 2636-2648. doi:10.1002/asi.23884. Online publication date: 1-Nov-2017
      • (2016) Optimizing annotation resources for natural language de-identification via a game theoretic framework. Journal of Biomedical Informatics, 61:C, 97-109. doi:10.1016/j.jbi.2016.03.019. Online publication date: 1-Jun-2016
