Abstract
Consideration of subgroups or domains within medical image datasets is crucial for the development and evaluation of robust and generalizable machine learning systems. To tackle the domain identification problem, we examine deep unsupervised generative clustering approaches for representation learning and clustering. The Variational Deep Embedding (VaDE) model is trained to learn lower-dimensional representations of images based on a Mixture-of-Gaussians latent space prior distribution while optimizing cluster assignments. We propose the Conditionally Decoded Variational Deep Embedding (CDVaDE) model, which incorporates additional variables of choice, such as the class labels, as conditioning factors to guide the clustering towards subgroup structures in the data that have not been known or recognized previously. We analyze the behavior of CDVaDE on multiple datasets and compare it to other deep clustering algorithms. Our experimental results demonstrate that the considered models are capable of separating digital pathology images into meaningful subgroups. We provide a general-purpose implementation of all considered deep clustering methods as part of the open-source Python package DomId (https://github.com/DIDSR/DomId).
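To make the clustering step concrete, the following is a minimal sketch of how a latent code could be assigned to a cluster under a Mixture-of-Gaussians prior, following the standard VaDE formulation in which the responsibility of cluster c is proportional to its mixture weight times a diagonal Gaussian density. It is not taken from DomId: the function names, the toy parameters, and the choice to condition the decoder by concatenating the latent code with a one-hot label are illustrative assumptions.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mean, var)
    )


def cluster_responsibilities(z, weights, means, variances):
    """Posterior p(c | z) for each mixture component c of the latent prior."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    unnormalized = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def decoder_input(z, condition):
    """Hypothetical conditioning: append a one-hot label to the latent code.

    Concatenation is an assumption made for illustration; the abstract only
    states that extra variables act as conditioning factors for the decoder.
    """
    return list(z) + list(condition)


# Toy two-cluster prior in a 2-D latent space (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

z = [0.2, -0.1]
gamma = cluster_responsibilities(z, weights, means, variances)
domain = max(range(len(gamma)), key=gamma.__getitem__)
print("responsibilities:", gamma, "predicted cluster:", domain)
```

Because the conditioning variable is supplied to the decoder directly, the latent code does not need to encode it, which is one way to read the abstract's claim that conditioning steers the clusters towards structure not explained by the class labels.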
Acknowledgments
The authors would like to thank Dr. Marios Gavrielides for providing access to the HER2 dataset and for helpful discussion. This project was supported in part by an appointment to the Research Participation Program at the U.S. Food and Drug Administration administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration. XS acknowledges support from the Hightech Agenda Bayern.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sidulova, M., Sun, X., Gossmann, A. (2023). Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14227. Springer, Cham. https://doi.org/10.1007/978-3-031-43993-3_64
DOI: https://doi.org/10.1007/978-3-031-43993-3_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43992-6
Online ISBN: 978-3-031-43993-3
eBook Packages: Computer Science, Computer Science (R0)