Abstract
Systematic analysis of high-resolution whole slide images enables more effective diagnosis, prognosis, and prediction of cancer and other important diseases. Because whole slide images are enormous, their analysis requires extensive computing resources that are not commonly available. Memory limitations force images to be divided into smaller regions for processing, which leads to inaccurate results if objects that cross region boundaries are ignored. In this paper, we propose a highly scalable and cost-effective MapReduce-based framework for whole slide image analysis, and provide a cloud-based implementation. The framework adopts a grid-based overlapping partitioning scheme and parallelizes image segmentation with MapReduce. It handles boundary objects gracefully through a highly efficient spatial-indexing-based matching method, thus avoiding loss of accuracy due to partitioning. We demonstrate that the system is highly scalable and cost-effective: in our experiments, analyzing one image on Amazon Elastic MapReduce costs less than fifteen cents on average.
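The partitioning and boundary-handling ideas described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation. It assumes that each segmented object is reduced to an axis-aligned bounding box, that two detections from different tiles are duplicates when their intersection-over-union reaches a threshold, and that a uniform grid of cells stands in for the spatial index, which the abstract does not specify. The names `overlapping_tiles` and `merge_boundary_objects`, as well as the `tile`, `margin`, `bucket`, and `threshold` parameters, are hypothetical. In a MapReduce job, each (core, extended) tile would presumably drive one map task that segments its extended region, and the merge would run in a reduce step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A box is (x0, y0, x1, y1) in pixels, half-open: x0 <= x < x1, y0 <= y < y1.
# Boxes are assumed to have positive width and height.
Box = Tuple[int, int, int, int]


def overlapping_tiles(width: int, height: int, tile: int, margin: int) -> List[Tuple[Box, Box]]:
    """Partition an image into a grid of (core, extended) regions (hypothetical helper).

    Each core is a tile x tile grid cell (clipped at the right and bottom edges).
    The extended region adds `margin` pixels on every side, clipped to the image.
    Any object no larger than `margin` in either dimension lies entirely inside
    the extended region of the tile whose core holds its top-left corner, so at
    least one tile sees it whole.
    """
    tiles = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            core = (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
            ext = (max(x0 - margin, 0), max(y0 - margin, 0),
                   min(core[2] + margin, width), min(core[3] + margin, height))
            tiles.append((core, ext))
    return tiles


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boundary_objects(detections: List[Tuple[int, Box]], bucket: int = 64,
                           threshold: float = 0.5) -> List[Box]:
    """Merge duplicate detections of one object reported by different tiles.

    `detections` holds (tile_id, box) pairs from per-tile segmentation. A uniform
    grid of `bucket`-pixel cells is the (assumed) spatial index: only boxes that
    share a cell are compared. Boxes from different tiles with IoU >= threshold
    are unioned, and each group is replaced by its overall bounding box.
    """
    parent = list(range(len(detections)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Build the grid index: cell -> indices of boxes that touch that cell.
    index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (_, (x0, y0, x1, y1)) in enumerate(detections):
        for cy in range(y0 // bucket, (y1 - 1) // bucket + 1):
            for cx in range(x0 // bucket, (x1 - 1) // bucket + 1):
                index[(cx, cy)].append(i)

    # Compare only candidate pairs that share a cell; repeated unions are harmless.
    for members in index.values():
        for k, i in enumerate(members):
            for j in members[k + 1:]:
                ti, bi = detections[i]
                tj, bj = detections[j]
                if ti != tj and iou(bi, bj) >= threshold:
                    parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = defaultdict(list)
    for i, (_, box) in enumerate(detections):
        groups[find(i)].append(box)
    return sorted(
        (min(b[0] for b in g), min(b[1] for b in g), max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    )


if __name__ == "__main__":
    tiles = overlapping_tiles(width=1000, height=600, tile=500, margin=50)
    print(len(tiles))  # 4
    # One interior object, plus one nucleus straddling x = 500 that tiles 0 and 1
    # both segmented, with slightly different boundaries.
    dets = [(0, (100, 100, 130, 130)), (0, (480, 100, 520, 140)), (1, (481, 100, 520, 140))]
    print(merge_boundary_objects(dets))  # [(100, 100, 130, 130), (480, 100, 520, 140)]
```

Comparing only boxes that share a grid cell avoids testing every pair of detections against each other. Replacing a matched group by its bounding-box union is a simplification made for this sketch; a real pipeline would more likely reconcile the segmented polygons themselves.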
Acknowledgements
This work is supported in part by NSF IIS 1350885, by NSF ACI 1350885, by Grant Number K25CA181503 from the National Institutes of Health, by Grant Number R01LM009239 from the National Library of Medicine, by Grant Number 1U24CA180924-01A1 from the National Cancer Institute, and by CNPq.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Vo, H. et al. (2017). Cloud-Based Whole Slide Image Analysis Using MapReduce. In: Wang, F., Yao, L., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2016. Lecture Notes in Computer Science, vol 10186. Springer, Cham. https://doi.org/10.1007/978-3-319-57741-8_5
DOI: https://doi.org/10.1007/978-3-319-57741-8_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57740-1
Online ISBN: 978-3-319-57741-8
eBook Packages: Computer Science, Computer Science (R0)