Abstract
Case indexing decisions must often confront the tradeoff between rich semantic indexing schemes, which provide effective retrieval at high indexing cost, and shallower indexing schemes, which enable low-cost indexing but may be less reliable. Indexing for textual case-based reasoning (CBR) is often based on information retrieval approaches that minimize index acquisition cost but sacrifice semantic information. This paper presents JointEmbed, a method for automatically generating rich indices. JointEmbed generates continuous vector space embeddings that implicitly capture semantic information, leveraging multiple knowledge sources such as free-text cases and pre-existing knowledge graphs. It generates effective indices by applying pTransR, a novel approach for modelling knowledge graphs, to encode and summarize the contents of domain knowledge resources. JointEmbed is applied to the medical CBR task of retrieving relevant patient electronic health records, for which potential health consequences make retrieval quality paramount. An evaluation supports the claim that JointEmbed outperforms previous methods.
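To make the idea of embedding-based indices concrete, the following minimal sketch shows how cases indexed by continuous vectors might be retrieved by nearest-neighbour search. It is an illustration only, not the JointEmbed or pTransR implementation: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names (`cosine`, `retrieve`), the record identifiers, and the three-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_index, case_base, k=2):
    """Return the ids of the k cases whose index vectors are most similar to the query."""
    ranked = sorted(
        case_base,
        key=lambda case_id: cosine(query_index, case_base[case_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy indices; learned embeddings would have many more dimensions.
case_base = {
    "record_a": [0.9, 0.1, 0.0],
    "record_b": [0.0, 1.0, 0.2],
    "record_c": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

print(retrieve(query, case_base, k=2))
```

In this sketch the index of a case is simply its vector, so the cost of indexing a new case is the cost of computing its embedding, while retrieval reduces to ranking stored vectors by similarity to the query.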
Acknowledgement
This work is supported by the Indiana University Precision Health Initiative.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Metcalf, K., Leake, D. (2018). Embedded Word Representations for Rich Indexing: A Case Study for Medical Records. In: Cox, M., Funk, P., Begum, S. (eds) Case-Based Reasoning Research and Development. ICCBR 2018. Lecture Notes in Computer Science, vol 11156. Springer, Cham. https://doi.org/10.1007/978-3-030-01081-2_18
DOI: https://doi.org/10.1007/978-3-030-01081-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01080-5
Online ISBN: 978-3-030-01081-2
eBook Packages: Computer Science, Computer Science (R0)