Embedded Word Representations for Rich Indexing: A Case Study for Medical Records

Katherine Metcalf & David Leake

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11156)

Included in the following conference series:

International Conference on Case-Based Reasoning

Abstract

Case indexing decisions must often confront the tradeoff between rich semantic indexing schemes, which provide effective retrieval at large indexing cost, and shallower indexing schemes, which enable low-cost indexing but may be less reliable. Indexing for textual case-based reasoning is often based on information retrieval approaches that minimize index acquisition cost but sacrifice semantic information. This paper presents JointEmbed, a method for automatically generating rich indices. JointEmbed automatically generates continuous vector space embeddings that implicitly capture semantic information, leveraging multiple knowledge sources such as free text cases and pre-existing knowledge graphs. JointEmbed generates effective indices by applying pTransR, a novel approach for modelling knowledge graphs, to encode and summarize contents of domain knowledge resources. JointEmbed is applied to the medical CBR task of retrieving relevant patient electronic health records, for which potential health consequences make retrieval quality paramount. An evaluation supports that JointEmbed outperforms previous methods.
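To make the idea of embedding-based indices concrete, the following is a minimal sketch of retrieval over a case base whose cases are indexed by continuous vectors. It is not the JointEmbed or pTransR implementation, which the abstract does not specify. The averaging of word vectors, the cosine similarity measure, the toy two-dimensional vectors, and all names (`embed_case`, `retrieve`, `word_vectors`, `case_base`) are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed_case(tokens, word_vectors):
    """Hypothetical index: the mean of the vectors of the known tokens.

    Returns None when no token has a vector.
    """
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def retrieve(query_tokens, case_base, word_vectors, k=1):
    """Return the ids of the k cases whose indices are most similar to the query."""
    query_vec = embed_case(query_tokens, word_vectors)
    if query_vec is None:
        return []
    scored = []
    for case_id, tokens in case_base.items():
        case_vec = embed_case(tokens, word_vectors)
        if case_vec is not None:
            scored.append((cosine(query_vec, case_vec), case_id))
    scored.sort(reverse=True)  # highest similarity first
    return [case_id for _, case_id in scored[:k]]


# Toy data, invented for illustration only.
word_vectors = {
    "fever": [1.0, 0.0],
    "cough": [0.9, 0.1],
    "fracture": [0.0, 1.0],
    "cast": [0.1, 0.9],
}
case_base = {
    "record_a": ["fever", "cough"],
    "record_b": ["fracture", "cast"],
}

print(retrieve(["cough"], case_base, word_vectors, k=1))
```

In a setting like the one the abstract describes, the word vectors would be learned from free-text cases and knowledge graphs rather than written by hand; the sketch only shows how vector indices support similarity-based retrieval once they exist.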

Acknowledgement

This work is supported by the Indiana University Precision Health Initiative.

Author information

Authors and Affiliations

School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
Katherine Metcalf & David Leake

Authors

Katherine Metcalf
David Leake

Corresponding author

Correspondence to David Leake.

Editor information

Editors and Affiliations

Wright State University, Dayton, OH, USA
Michael T. Cox
Mälardalen University, Västeras, Sweden
Peter Funk
Mälardalen University, Västeras, Sweden
Shahina Begum

About this paper

Cite this paper

Metcalf, K., Leake, D. (2018). Embedded Word Representations for Rich Indexing: A Case Study for Medical Records. In: Cox, M., Funk, P., Begum, S. (eds) Case-Based Reasoning Research and Development. ICCBR 2018. Lecture Notes in Computer Science, vol 11156. Springer, Cham. https://doi.org/10.1007/978-3-030-01081-2_18

DOI: https://doi.org/10.1007/978-3-030-01081-2_18
Published: 09 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01080-5
Online ISBN: 978-3-030-01081-2
eBook Packages: Computer Science (R0)