Abstract
Spatial transcriptomics allows RNA abundance to be measured at high spatial resolution, making it possible to systematically link the morphology of cellular neighbourhoods with spatially localized gene expression. Here, we report the development of a deep learning algorithm for predicting local gene expression from haematoxylin-and-eosin-stained histopathology images, using a new dataset of 30,612 spatially resolved gene expression measurements matched to histopathology images from 23 patients with breast cancer. We identified over 100 genes whose expression can be predicted from the histopathology images at a resolution of 100 µm, including known breast cancer biomarkers of intratumoral heterogeneity and of the co-localization of tumour growth and immune activation. We also show that the algorithm generalizes well to The Cancer Genome Atlas and to other breast cancer gene expression datasets without the need for re-training. Predicting the spatially resolved transcriptome of a tissue directly from tissue images may enable image-based screening for molecular biomarkers with spatial variation.
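The abstract does not state how a gene is judged to be "predictable". The following is a minimal sketch of one plausible criterion, assuming spot-level predictions for held-out patients are already available. For each gene, it computes the Pearson correlation between predicted and measured expression, converts that correlation to a one-sided p-value with the Fisher z approximation, and controls the family-wise error rate across genes with Holm's step-down procedure (Holm, 1979). The function names, the input layout and the toy data are hypothetical. This is not presented as the authors' exact evaluation protocol.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sided_p(r, n):
    """Approximate p-value for H1: r > 0 using the Fisher z-transformation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| ~ 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1.0 - NormalDist().cdf(z)


def holm_reject(pvalues, alpha=0.05):
    """Holm's step-down procedure: one reject flag per hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


def predictable_genes(predicted, measured, alpha=0.05):
    """Genes whose predicted and measured per-spot expression correlate positively.

    predicted, measured: dict mapping gene name -> list of per-spot values,
    with spots in the same order in both (hypothetical input layout).
    """
    genes = sorted(measured)
    pvals = [
        one_sided_p(pearson_r(predicted[g], measured[g]), len(measured[g]))
        for g in genes
    ]
    flags = holm_reject(pvals, alpha)
    return [g for g, keep in zip(genes, flags) if keep]


# Toy example with two hypothetical genes measured at 50 spots.
measured = {
    "A": [float(i) for i in range(50)],
    "B": [float(i % 5) for i in range(50)],
}
predicted = {
    "A": [i + (1.0 if i % 2 else -1.0) for i in range(50)],  # tracks A closely
    "B": [4.0 - v for v in measured["B"]],  # anti-correlated with B
}
print(predictable_genes(predicted, measured))  # ['A']
```

The normal approximation keeps the sketch dependency-free. In practice, statsmodels' `multipletests` with `method="holm"` applies the same correction. For slide-level comparisons with bulk data such as TCGA, one option is to average the patch-level predictions over each slide before correlating them with bulk expression. This is also an assumption.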
Data availability
The main data supporting the results in this study are available within the paper and its Supplementary Information. Raw files for the breast cancer samples are available through a materials transfer agreement with Å.B. (ake.borg@med.lu.se). All images and processed data are available at http://www.spatialtranscriptomicsresearch.org. The 10x Genomics spatial gene expression data can be downloaded from https://wp.10xgenomics.com/spatial-transcriptomics. All data from TCGA are publicly available from the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov).
Code availability
The code for ST-Net is available at https://github.com/bryanhe/ST-Net.
Acknowledgements
J.Z. is supported by the NSF (grant no. CCF 1763191), NIH (grant nos. R21 MD012867-01, P30AG059307 and U01MH098953) and grants from the Silicon Valley Foundation and the Chan Zuckerberg Initiative. J.L. is supported by the Swedish Foundation for Strategic Research, Swedish Cancer Society and Swedish Research Council.
Author information
Contributions
All of the authors contributed to the project planning and writing of the manuscript. B.H. and L.B. performed analysis. L.S., Å.B. and J.L. generated data. J.M., J.L. and J.Z. supervised the project.
Ethics declarations
Competing interests
J.L. is an author on patent nos. PCT/EP2012/056823 (WO2012/140224), PCT/EP2013/071645 (WO2014/060483) and PCT/EP2016/057355 applied for by Spatial Transcriptomics AB/10x Genomics Inc. covering the described technology.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary figures and tables.
About this article
Cite this article
He, B., Bergenstråhle, L., Stenbeck, L. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat Biomed Eng 4, 827–834 (2020). https://doi.org/10.1038/s41551-020-0578-x
DOI: https://doi.org/10.1038/s41551-020-0578-x
© Springer Nature Limited