Abstract
Molecular profiling guides precision treatment of breast cancer; however, Asian patients are underrepresented in publicly available large-scale studies. We established a comprehensive multiomics cohort of 773 Chinese patients with breast cancer and systematically analyzed their genomic, transcriptomic, proteomic, metabolomic, radiomic and digital pathology characteristics. Here we show that breast cancers in Asian individuals harbored more targetable AKT1 mutations than those in white individuals. Integrated analysis revealed a higher proportion of the HER2-enriched subtype, with correspondingly more frequent ERBB2 amplification and higher HER2 protein abundance, in the Chinese HR+HER2+ cohort, underscoring the importance of anti-HER2 therapy for these patients. Furthermore, comprehensive metabolomic and proteomic analyses identified ferroptosis as a potential therapeutic target for basal-like tumors. Integrating clinical, transcriptomic, metabolomic, radiomic and pathological features enabled efficient stratification of patients into groups with different recurrence risks. Our study provides a public resource and new insights into the biology and ancestry specificity of breast cancer in the Asian population, supporting the development of further precision treatment approaches.
Data availability
WES, CNA, RNA-seq and metabolome data that support the findings of this study have been deposited in the Genome Sequence Archive database under accession code PRJCA017539. MS data have been deposited in iProX under accession code IPX0006535000. Human breast cancer genomic, transcriptomic and protein data were derived from the FUSCC targeted sequencing cohort, the TCGA Research Network, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). The datasets derived from TCGA, METABRIC and CPTAC are available at the cBioPortal website (www.cbioportal.org/). FUSCC targeted sequencing data are available in the Fudan Data Portal (https://data.3steps.cn/cdataportal/). All other data supporting the findings of this study are available from the corresponding author on reasonable request. Source data are provided with this paper.
Code availability
All data analysis and processing were conducted using published software packages, whose details are described and referenced in the Methods. No new code or mathematical algorithms were generated in this study.
References
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Waks, A. G. & Winer, E. P. Breast cancer treatment: a review. JAMA 321, 288–300 (2019).
Gennari, A. et al. ESMO clinical practice guideline for the diagnosis, staging and treatment of patients with metastatic breast cancer. Ann. Oncol. https://doi.org/10.1016/j.annonc.2021.09.019 (2021).
Razavi, P. et al. The genomic landscape of endocrine-resistant advanced breast cancers. Cancer Cell 34, 427–438 (2018).
Jiang, Y. Z. et al. Genomic and transcriptomic landscape of triple-negative breast cancers: subtypes and treatment strategies. Cancer Cell 35, 428–440 (2019).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
Ciriello, G. et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163, 506–519 (2015).
Sammut, S. J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).
Boehm, K. M. et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer 3, 723–733 (2022).
Mertins, P. et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
Krug, K. et al. Proteogenomic landscape of breast cancer tumourigenesis and targeted therapy. Cell 183, 1436–1456 (2020).
Pan, J. W. et al. The molecular landscape of Asian breast cancers reveals clinically relevant population-specific differences. Nat. Commun. 11, 6433 (2020).
Kan, Z. et al. Multi-omics profiling of younger Asian breast cancers reveals distinctive molecular signatures. Nat. Commun. 9, 1725 (2018).
Shimoi, T. et al. Hotspot mutation profiles of AKT1 in Asian women with breast and endometrial cancers. BMC Cancer 21, 1131 (2021).
Lang, G. T. et al. Characterization of the genomic landscape and actionable mutations in Chinese breast cancers by clinical sequencing. Nat. Commun. 11, 5679 (2020).
Lee, Y. R. et al. WWP1 gain-of-function inactivation of PTEN in cancer predisposition. N. Engl. J. Med. 382, 2103–2116 (2020).
Lee, Y. R. et al. Reactivation of PTEN tumour suppressor for cancer treatment through inhibition of a MYC-WWP1 inhibitory pathway. Science https://doi.org/10.1126/science.aau0159 (2019).
Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014).
Wolf, D. M. et al. Redefining breast cancer subtypes to guide treatment prioritization and maximize response: predictive biomarkers across 10 cancer therapies. Cancer Cell https://doi.org/10.1016/j.ccell.2022.05.005 (2022).
Hakimi, A. A. et al. An integrated metabolic atlas of clear cell renal cell carcinoma. Cancer Cell 29, 104–116 (2016).
Xiao, Y. et al. Comprehensive metabolomics expands precision medicine for triple-negative breast cancer. Cell Res. 32, 477–490 (2022).
Xiao, Y. et al. Multi-omics profiling reveals distinct microenvironment characterization and suggests immune escape mechanisms of triple-negative breast cancer. Clin. Cancer Res. 25, 5002–5014 (2019).
Pusztai, L. et al. Durvalumab with olaparib and paclitaxel for high-risk HER2-negative stage II/III breast cancer: results from the adaptively randomized I-SPY2 trial. Cancer Cell https://doi.org/10.1016/j.ccell.2021.05.009 (2021).
Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).
Uhrig, S. et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460 (2021).
Ding, R. et al. Breast cancer screening and early diagnosis in Chinese women. Cancer Biol. Med. https://doi.org/10.20892/j.issn.2095-3941.2021.0676 (2022).
Lee, S. K. et al. Is the high proportion of young age at breast cancer onset a unique feature of Asian breast cancer? Breast Cancer Res. Treat. 173, 189–199 (2019).
Zhu, B. et al. Comparison of somatic mutation landscapes in Chinese versus European breast cancer patients. HGG Adv. 3, 100076 (2022).
Wander, S. A. et al. The genomic landscape of intrinsic and acquired resistance to cyclin-dependent kinase 4/6 inhibitors in patients with hormone receptor-positive metastatic breast cancer. Cancer Discov. 10, 1174–1193 (2020).
Kalinsky, K. et al. Effect of capivasertib in patients with an AKT1 E17K-mutated tumour: NCI-MATCH subprotocol EAY131-Y nonrandomized trial. JAMA Oncol. 7, 271–278 (2021).
Smyth, L. M. et al. Capivasertib, an AKT kinase inhibitor, as monotherapy or in combination with fulvestrant in patients with AKT1 (E17K)-mutant, ER-positive metastatic breast cancer. Clin. Cancer Res. 26, 3947–3957 (2020).
Jones, R. H. et al. Fulvestrant plus capivasertib versus placebo after relapse or progression on an aromatase inhibitor in metastatic, oestrogen receptor-positive breast cancer (FAKTION): a multicentre, randomised, controlled, phase 2 trial. Lancet Oncol. 21, 345–357 (2020).
Gianni, L. et al. Efficacy and safety of neoadjuvant pertuzumab and trastuzumab in women with locally advanced, inflammatory, or early HER2-positive breast cancer (NeoSphere): a randomised multicentre, open-label, phase 2 trial. Lancet Oncol. 13, 25–32 (2012).
Robidoux, A. et al. Lapatinib as a component of neoadjuvant therapy for HER2-positive operable breast cancer (NSABP protocol B-41): an open-label, randomised phase 3 trial. Lancet Oncol. 14, 1183–1192 (2013).
de Azambuja, E. et al. Lapatinib with trastuzumab for HER2-positive early breast cancer (NeoALTTO): survival outcomes of a randomised, open-label, multicentre, phase 3 trial and their association with pathological complete response. Lancet Oncol. 15, 1137–1146 (2014).
Gianni, L. et al. Neoadjuvant chemotherapy with trastuzumab followed by adjuvant trastuzumab versus neoadjuvant chemotherapy alone, in patients with HER2-positive locally advanced breast cancer (the NOAH trial): a randomised controlled superiority trial with a parallel HER2-negative cohort. Lancet 375, 377–384 (2010).
Shao, Z. et al. Efficacy, safety, and tolerability of pertuzumab, trastuzumab, and docetaxel for patients with early or locally advanced ERBB2-positive breast cancer in Asia: the PEONY Phase 3 randomized clinical trial. JAMA Oncol. 6, e193692 (2020).
Llombart-Cussac, A. et al. HER2-enriched subtype as a predictor of pathological complete response following trastuzumab and lapatinib without chemotherapy in early-stage HER2-positive breast cancer (PAMELA): an open-label, single-group, multicentre, phase 2 trial. Lancet Oncol. 18, 545–554 (2017).
Prat, A. et al. HER2-enriched subtype and ERBB2 expression in HER2-positive breast cancer treated with dual HER2 blockade. J. Natl Cancer Inst. 112, 46–54 (2020).
Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).
Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
Tang, X. et al. A joint analysis of metabolomics and genetics of breast cancer. Breast Cancer Res. 16, 415 (2014).
Terunuma, A. et al. MYC-driven accumulation of 2-hydroxyglutarate is associated with breast cancer prognosis. J. Clin. Invest. 124, 398–412 (2014).
Nguyen, T. et al. Uncovering the role of N-acetyl-aspartyl-glutamate as a glutamate reservoir in cancer. Cell Rep. 27, 491–501 (2019).
Muthusamy, T. et al. Serine restriction alters sphingolipid diversity to constrain tumour growth. Nature 586, 790–795 (2020).
Ogretmen, B. Sphingolipid metabolism in cancer signalling and therapy. Nat. Rev. Cancer 18, 33–50 (2018).
Zheng, J. & Conrad, M. The metabolic underpinnings of ferroptosis. Cell Metab. 32, 920–937 (2020).
Chen, X., Kang, R., Kroemer, G. & Tang, D. Broadening horizons: the role of ferroptosis in cancer. Nat. Rev. Clin. Oncol. 18, 280–296 (2021).
Jiang, L. et al. Radiogenomic analysis reveals tumour heterogeneity of triple-negative breast cancer. Cell Rep. Med. 3, 100694 (2022).
Zhao, S. et al. Deep learning framework for comprehensive molecular and prognostic stratifications of triple-negative breast cancer. Fundam. Res. https://doi.org/10.1016/j.fmre.2022.06.008 (2022).
Jiang, Y.-Z. et al. Integrated molecular portraits of breast cancer. Nat. Protoc. https://doi.org/10.21203/rs.3.pex-2435/v1 (2023).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
Paquet, E. R. & Hallett, M. T. Absolute assignment of breast cancer intrinsic molecular subtype. J. Natl Cancer Inst. https://doi.org/10.1093/jnci/dju357 (2015).
Chen, B., Khodadoust, M. S., Liu, C. L., Newman, A. M. & Alizadeh, A. A. Profiling tumour infiltrating immune cells with CIBERSORT. Methods Mol. Biol. 1711, 243–259 (2018).
Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).
Telli, M. L. et al. Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 22, 3764–3773 (2016).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Benard, B. A. et al. Clonal architecture predicts clinical outcomes and drug sensitivity in acute myeloid leukemia. Nat. Commun. 12, 7244 (2021).
Amin, S. B. et al. Comparative molecular life history of spontaneous canine and human gliomas. Cancer Cell 37, 243–257 (2020).
Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).
McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).
Chen, D. et al. Identification and characterization of robust hepatocellular carcinoma prognostic subtypes based on an integrative metabolite-protein interaction network. Adv. Sci. 8, e2100311 (2021).
Johansson, H. J. et al. Breast cancer quantitative proteome and proteogenomic landscape. Nat. Commun. 10, 1600 (2019).
Chen, Y. J. et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell 182, 226–244 (2020).
Avants, B. B., Epstein, C. L., Grossman, M. & Gee, J. C. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008).
Neal, J. T. et al. Organoid modeling of the tumour immune microenvironment. Cell 175, 1972–1988 (2018).
Sachs, N. et al. A living biobank of breast cancer organoids captures disease heterogeneity. Cell 172, 373–386 (2018).
Acknowledgements
This work was supported by grants from the National Key Research and Development Project of China (grant no. 2020YFA0112304 to Z.-M.S. and Y.-Z.J., and 2021YFF1201300 to Y.-Z.J., W.Huang and J.S.), the National Natural Science Foundation of China (grant nos. 92159301, 82341003 and 91959207 to Z.-M.S., 82272822 to Y.-Z.J., 82272704 to D.M. and 32370701 to L.S.), the Shanghai Key Laboratory of Breast Cancer (grant no. 12DZ2260100 to Z.-M.S.), the Shanghai Hospital Development Center Municipal Project for Developing Emerging and Frontier Technology in Shanghai Hospitals (grant no. SHDC12021103 to Z.-M.S.), the Program of Shanghai Academic/Technology Research Leader (grant no. 20XD1421100 to Y.-Z.J.), the Natural Science Foundation of Shanghai (grant nos. 22ZR1479200 to Y.-Z.J. and 23ZR1411800 to X.J.), the Shanghai Rising-Star Program (grant no. 23QA1401400 to D.M.), the Youth Talent Program of Shanghai Health Commission (grant no. 2022YQ012 to X.J.) and the Shanghai Municipal Science and Technology Major Project (grant no. 2023SHZDZX02 to L.S.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We are grateful to Computing for the Future at Fudan and the Human Phenome Data Center of Fudan University for computing support. We also thank J. Xu from Nanjing University of Information Science and Technology for editing the manuscript.
Author information
Contributions
Z.-M.S., W.Huang, Y.Z. and Y.-Z.J. outlined the manuscript content. J.S. and W.Huang performed the genomic sequencing. Y.Y., W.Hou, Y.L., Q.C., J.Y., N.Z., L.S. and Y.Z. performed RNA sequencing and contributed to data processing and analysis. W.L., W.G. and T.G. performed proteomics. S.Z., G.-H.S., W.-T.Y., C.Y. and Y.G. contributed to multimodal data integration. Y.-Z.J., D.M., X.J., Y.-F.Z., T.F., C.-J.L., L.-J.D., C.-L.L. and W.-J.Z. contributed to literature survey, data collection and data analysis. Y.-Z.J., D.M., X.J. and Y.X. prepared the figures and drafted the manuscript, with contributions from all authors. V.K., F.B., C.V., A.D., N.M.S., T.W. and C.M.P. helped with data interpretation and manuscript editing. All authors approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Cancer thanks Xiaohong Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Clinical and molecular characteristics of the Chinese Breast Cancer Genome Atlas (CBCGA) cohort.
a, Cohort and omics information. b, Confusion matrix showing the concordance between immunohistochemistry (IHC) subtypes and PAM50 subtypes; numbers on the diagonal represent subtype agreement between the two subtyping methods (n = 752 tumors). Abbreviations for PAM50 subtypes: LumA, luminal A; LumB, luminal B; HER2, HER2-enriched; Basal, basal-like; Normal, normal-like. c, Confusion matrix showing the concordance between AIMS subtypes and PAM50 subtypes (n = 752 tumors). d, Differentially expressed proteins across PAM50 subtypes. From left to right, differential expression analyses were conducted between Luminal A (n = 56 tumors), Luminal B (n = 77 tumors), HER2-enriched (n = 59 tumors) or Basal-like (n = 59 tumors) tumors and all other subtypes (n = 215, 194, 212 and 212 tumors, respectively). e, f, Differentially expressed polar metabolites (e) and lipids (f) across PAM50 subtypes. From left to right, differential expression analyses were conducted between Luminal A (n = 119 tumors), Luminal B (n = 144 tumors), HER2-enriched (n = 98 tumors) or Basal-like (n = 52 tumors) tumors and all other subtypes (n = 324, 299, 345 and 391 tumors, respectively). For d–f, two-sided P values were determined by the Mann–Whitney U-test and adjusted by the Benjamini–Hochberg procedure. Proteins, polar metabolites and lipids that did not meet the thresholds of absolute log2 fold change (|log2FC|) > 1 or FDR < 0.05 are colored gray.
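Several legends in this section (Extended Data Figs. 1, 3, 5 and 6) adjust P values with the Benjamini–Hochberg procedure. As an illustration only, not the authors' code, the adjustment can be sketched in pure Python; `benjamini_hochberg` is a hypothetical helper name, and in practice a tested library routine would normally be used.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini–Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    # Indices of the raw P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step down from the largest P value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

In a one-versus-rest comparison such as panels d–f, the raw Mann–Whitney P values for all features tested in that comparison would presumably be adjusted together in a single call.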
Extended Data Fig. 2 Comparisons between breast cancers arising in Chinese individuals (CBCGA) and in white individuals from The Cancer Genome Atlas (TCGA).
a–e, Gene-level somatic mutation frequencies in IDC cases of the Luminal A (CBCGA: n = 182 tumors; TCGA: n = 229 tumors) (a), Luminal B (CBCGA: n = 180 tumors; TCGA: n = 183 tumors) (b), HER2-enriched (CBCGA: n = 121 tumors; TCGA: n = 35 tumors) (c), Basal-like (CBCGA: n = 83 tumors; TCGA: n = 86 tumors) (d) and Normal-like (CBCGA: n = 41 tumors; TCGA: n = 9 tumors) (e) cohorts. f, AKT1 mutation frequency in IDC cases from East Asian (CBCGA: n = 624 tumors; targeted sequencing cohort: n = 3,208 tumors; NCCH: n = 311 tumors) and white (TCGA: n = 474 tumors; METABRIC: n = 1,866 tumors) breast cancer cohorts. ‘*’ denotes cohorts in which PAM50 subtypes are not available; for these cohorts, AKT1 mutation frequency across all cases is shown. g, AKT1 mutation sites in luminal A IDC patients in the CBCGA (upper) and TCGA white individual (lower) cohorts.
Extended Data Fig. 3 Comparisons of molecular subtype and ERBB2 amplification between breast cancers arising in Chinese individuals (CBCGA) and in TCGA white individuals.
a, b, Proportion of Luminal A (a) and HER2-enriched (b) breast cancers among IDC cases in the CBCGA Chinese (n = 716 tumors) and TCGA Asian (n = 47 tumors) cohorts compared with the TCGA white individual (n = 490 tumors) and METABRIC (n = 1,974 tumors) cohorts. c–e, Gene-level somatic copy number alterations in IDC cases of the CBCGA and TCGA white individual cohorts, grouped by IHC-based subtype: amplifications (upper) and deletions (lower) in HR+HER2- (c), HR-HER2+ (d) and triple-negative breast cancer (e). For a, b, P values were obtained from two-sided Fisher’s exact tests and adjusted by the Benjamini–Hochberg procedure.
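Panels a and b compare subtype proportions between cohorts with two-sided Fisher's exact tests. A minimal pure-Python sketch of that test on a 2×2 table follows; the function name and the table layout (rows are cohorts, columns are subtype present or absent) are assumptions for illustration. In practice `scipy.stats.fisher_exact` would typically be used instead.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: two cohorts; columns: subtype present / subtype absent.
    With the margins fixed, the top-left cell is hypergeometric.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Probability of the table whose top-left cell is x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance absorbs floating-point ties.
    p = sum(
        table_prob(x)
        for x in range(lo, hi + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)
```

Each subtype comparison would yield one such P value, and the resulting set would then be adjusted with the Benjamini–Hochberg procedure as the legend states.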
Extended Data Fig. 4 Quality control of proteomics and impact of copy number alteration on mRNA and protein expression.
a, Bar plot showing the number of genes detected in each batch; in total, 10,864 genes were detected. b, Principal component analysis (PCA) evaluating batch effects using all genes detected in more than 70% of included samples after normalization and batch-effect removal. c, Dot plots showing Pearson’s correlation between technical replicates (samples within batches 33 and 34) using all genes detected in more than 70% of included samples after normalization and batch-effect removal. d, Venn diagrams depicting the cis-effects of CNAs (FDR < 0.05) along the central dogma in this study and in the studies published by Mertins and colleagues10 (n = 74 tumors) and Krug and colleagues11 (n = 122 tumors). e, f, Box plots showing mRNA and protein levels of WWP1 (e) and CCND1 (f) across GISTIC scores in each PAM50 subtype. Sample numbers for the WWP1 analysis were as follows: LumA, n = 188 tumors (RNA) and n = 52 tumors (protein); LumB, n = 198 tumors (RNA) and n = 73 tumors (protein); HER2, n = 121 tumors (RNA) and n = 49 tumors (protein); Basal, n = 88 tumors (RNA) and n = 44 tumors (protein); Normal, n = 47 tumors (RNA) and n = 20 tumors (protein). Sample numbers for the CCND1 analysis were as follows: LumA, n = 147 tumors (RNA) and n = 41 tumors (protein); LumB, n = 163 tumors (RNA) and n = 58 tumors (protein); HER2, n = 105 tumors (RNA) and n = 41 tumors (protein); Basal, n = 76 tumors (RNA) and n = 31 tumors (protein); Normal, n = 37 tumors (RNA) and n = 12 tumors (protein).
In box plots, the centre line represents the median, the box limits represent the upper and lower quartiles, the whiskers represent 1.5× the interquartile range, and points represent individual samples. g, h, Forest plots of multivariate Cox regression analysis for relapse-free survival, adjusting for PAM50 clusters, tumor size and lymph node status, in the overall population (n = 271 tumors) (g) and the HR+HER2- subgroup (n = 148 tumors) (h). Error bars represent the 95% confidence intervals (CI) of the hazard ratio (HR), and the centre of each error bar indicates the HR. i, Gene set enrichment analysis (GSEA) comparing the molecular characteristics of each integrated cluster with the others. Pathways significantly enriched in a given cluster (FDR < 0.25) are shown. j, Heat map showing the abundance of immune cells in Cluster 3 (n = 75 tumors) and non-Cluster 3 (n = 196 tumors) breast cancers. Cell types significantly elevated in the Cluster 3 subgroup are marked with asterisks. k, Enrichment of immunotherapy predictive signatures in integrated clusters and PAM50 subtypes, as indicated by a logistic model, in the overall population (n = 271 tumors) and the HR+HER2- subgroup (n = 148 tumors). For d, P values were obtained from Spearman’s rank test with false discovery rate correction. For e, f, two-sided Wilcoxon rank tests were conducted to compare mRNA or protein levels between samples with GISTIC scores of ‘0’ and ‘2’ in each PAM50 subtype. *, P < 0.05; N.S., not significant (P > 0.05). For g, h, P values were obtained from two-sided multivariate Cox regression analysis; bold font indicates P < 0.05. For j, P values were obtained from unpaired two-sided t-tests.
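Panel d of this figure assesses cis-effects of CNAs with Spearman's rank correlation, for example between a gene's GISTIC score and its mRNA or protein level. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, so that the many tied GISTIC scores are handled; function names are hypothetical, and the P value and FDR correction are deliberately omitted.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation, e.g. GISTIC score vs mRNA or protein level."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (var_x * var_y) ** 0.5
```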
Extended Data Fig. 5 Quality control and overview of polar metabolomic and lipidomic data in CBCGA.
a, Distribution of quality control (QC) samples in principal component analysis (PCA) of polar metabolomic data in positive (left) and negative (right) ion modes. b, Distribution of QC samples in PCA of lipidomic data in positive (left) and negative (right) ion modes. c, Numbers and proportions of annotated polar metabolites (top) and lipids (bottom) in our study. FA, fatty acid; GL, glycerolipid; GP, glycerophospholipid; SP, sphingolipid; ST, sterol lipid. d, Volcano plots of the 669 annotated polar metabolites (top) and 1,312 lipids (bottom) profiled. Differentially abundant metabolites of different categories are individually color-coded. e, Log2 fold change (FC) of different categories of polar metabolites (top) and lipids (bottom) between tumor and normal tissues. The dashed red line indicates equal metabolite abundance in tumor and normal tissues. Tumor, n = 501 biologically independent samples; Normal, n = 76 biologically independent samples. The center line indicates the median, box bounds indicate the 25th and 75th percentiles, and whiskers represent 1.5× the interquartile range. f, Pathway-based analysis of metabolomic changes between tumor and normal tissues. The differential abundance (DA) score captures the average, gross change of all metabolites in a pathway: a score of 1 indicates that all measured metabolites in the pathway increase in tumor compared with normal tissues, and a score of −1 indicates that all measured metabolites in the pathway decrease. Only pathways with at least three measured metabolites were used for DA score calculation. Tumor, n = 501 biologically independent samples; Normal, n = 76 biologically independent samples. For d, P values were calculated using the two-sided Kruskal–Wallis test and adjusted by the Benjamini–Hochberg procedure.
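The DA score in panel f is commonly defined (following Hakimi et al., cited in the reference list) as the number of significantly increased metabolites minus the number of significantly decreased metabolites, divided by the number of metabolites measured in the pathway. The sketch below assumes that definition; the FDR cutoff of 0.05 and the class and function names are hypothetical choices, not details given in the legend. Only the minimum of three measured metabolites per pathway comes from the legend.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class MetaboliteTest:
    pathway: str   # pathway the metabolite is annotated to
    log2fc: float  # log2 fold change, tumor vs normal
    fdr: float     # adjusted P value of the tumor-vs-normal test


def da_scores(
    tests: Iterable[MetaboliteTest],
    fdr_cutoff: float = 0.05,  # hypothetical significance threshold
    min_metabolites: int = 3,  # legend: pathways need >= 3 measured metabolites
) -> Dict[str, float]:
    """DA score = (n significantly up - n significantly down) / n measured."""
    by_pathway: Dict[str, List[MetaboliteTest]] = {}
    for t in tests:
        by_pathway.setdefault(t.pathway, []).append(t)

    scores: Dict[str, float] = {}
    for pathway, members in by_pathway.items():
        if len(members) < min_metabolites:
            continue  # too few measured metabolites to score this pathway
        up = sum(1 for t in members if t.fdr < fdr_cutoff and t.log2fc > 0)
        down = sum(1 for t in members if t.fdr < fdr_cutoff and t.log2fc < 0)
        scores[pathway] = (up - down) / len(members)
    return scores
```

Because non-significant metabolites still count in the denominator, a score of exactly 1 or −1 requires every measured metabolite in the pathway to change significantly in the same direction.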
Extended Data Fig. 6 Integrated analysis of immunogenomic characteristics of breast cancer.
a, CIBERSORT-estimated proportions of 22 immune cell types among TME phenotypes (Cold: n = 296 tumors; Moderate: n = 191 tumors; Hot: n = 265 tumors). Cell abundance was normalized across samples. b, ESTIMATE-evaluated immune and stromal signatures among TME phenotypes in each PAM50 subtype (LumA: n = 222 tumors; LumB: n = 221 tumors; HER2: n = 148 tumors; Basal: n = 112 tumors; Normal: n = 49 tumors). In the box plot, the center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. c, K-means clustering of the TCGA cohort based on the estimated abundance of 24 microenvironment cell types (Cold: n = 419 tumors; Moderate: n = 458 tumors; Hot: n = 202 tumors). d, Distribution of TME phenotypes across PAM50 subtypes in the TCGA cohort. e, Proportions of tumor microenvironment cells deconvoluted on the basis of scRNA-seq data (n = 752 tumors). f, g, Comparison of MHC (f) and innate immune (g) molecule expression among TME phenotypes in each indicated PAM50 subtype (n = 752 tumors). h, Comparison of the virus mimicry signature among TME phenotypes in each indicated intrinsic subtype (LumA: n = 222 tumors; LumB: n = 221 tumors; HER2: n = 148 tumors; Basal: n = 112 tumors; Normal: n = 49 tumors). The center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. For b, h, P values were calculated using the two-sided Kruskal–Wallis test and adjusted by the Benjamini–Hochberg (BH) procedure.
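Panel c assigns TCGA tumors to Cold, Moderate and Hot phenotypes by k-means clustering on the estimated abundance of 24 microenvironment cell types. A pure-Python sketch of Lloyd's k-means follows. The deterministic farthest-first initialisation and the rule of naming clusters by ascending mean abundance are assumptions made for this illustration, not details given in the legend, and the inputs are assumed to be already normalized per cell type.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [list(data[0])]
    while len(centers) < k:
        # Next centre: the sample farthest from all centres chosen so far.
        farthest = max(data, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centre for every sample.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in data]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def name_tme_phenotypes(data, labels, names=("Cold", "Moderate", "Hot")) -> List[str]:
    """Name clusters by ascending mean abundance across all cell types (assumed rule)."""
    def cluster_mean(c: int) -> float:
        values = [v for p, lab in zip(data, labels) if lab == c for v in p]
        return sum(values) / len(values) if values else float("-inf")

    ranked = sorted(range(len(names)), key=cluster_mean)
    mapping = {c: names[i] for i, c in enumerate(ranked)}
    return [mapping[lab] for lab in labels]
```

Usage: `name_tme_phenotypes(data, kmeans(data, 3))` returns one phenotype name per sample, where `data` holds one abundance vector per tumor.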
Extended Data Fig. 7 Recurrent ERBB2 fusion transcripts in HER2-positive tumors.
a, Distribution of fusion genes across chromosomes. b, Circular plot showing the landscape of fusion genes. Recurrent fusions (detected in more than two samples) are displayed as connected gene pairs, with the width of each connecting arc representing the number of samples harboring the fusion. Red indicates novel gene fusions absent from public databases (FusionGDB and ChimerDB). c, Bar chart showing the top 11 recurrent fusion genes. d, e, Distribution of fusion genes across IHC subtypes (d) (HR+HER2-, n = 468 tumors; HR+HER2+, n = 100 tumors; HR-HER2+, n = 81 tumors; TNBC, n = 103 tumors; Paratumour, n = 60 samples) and PAM50 subtypes (e) (Luminal A, n = 222 tumors; Luminal B, n = 221 tumors; HER2-enriched, n = 148 tumors; Basal-like, n = 112 tumors; Normal-like, n = 49 tumors; Paratumour, n = 60 samples). In the box plots, the center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. f, Proportions of fusion types proximal to ERBB2 on chromosome 17q. g, Circos plot displaying ERBB2 fusions. h, Propensity-matched survival analysis of HER2-positive patients with or without ERBB2 fusions. For d, e, statistical analysis was performed using the Kruskal–Wallis test. For h, survival distributions were compared using the log-rank test.
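Panel h compares survival distributions with the log-rank test. The minimal two-group sketch below assumes that propensity matching has already been performed upstream (not shown); the function name is hypothetical, and a tested implementation such as `logrank_test` in `lifelines.statistics` would normally be preferred.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def logrank_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[int]
) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 if the event (e.g. relapse) was observed, 0 if censored.
    groups: 1 for the group of interest (e.g. ERBB2 fusion), 0 otherwise.
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk just before t
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance term is undefined when n == 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```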
Extended Data Fig. 8 Data dimensions, overall performance of multimodal prognosis prediction models and feature importance of the TMPIC model.
a, UpSet plot showing the number of patients with each combination of data modalities. Vertical bars in the upper plot show the number of patients with the data modality combination denoted by the black circles in the plot below. C, clinical stage; I, IHC subtype; T, transcriptomic data; P, digital pathology data; M, metabolomic data; R, radiologic data. b, Comparison of C-indices of models using a single modality (n = 6 models), 2 to 3 modalities (n = 15 models) and 4 to 6 modalities (n = 16 models). In the box plot, the center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. FDR, false discovery rate. c, Feature importance scores of the TMPIC model. New C-indices were calculated by dropping each individual feature from the TMPIC model, and the feature importance score was calculated as the difference between the original and new C-indices in the testing cohort (n = 80 patients). For b, P values were obtained from the Kruskal–Wallis test with false discovery rate correction.
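Panel c scores each feature by how much the testing-cohort C-index falls when that feature is dropped. The sketch below shows Harrell's C-index and that drop-one comparison. As a simplifying assumption, the "model" is a hypothetical fixed linear risk score whose dropped term is simply omitted; the legend does not say whether TMPIC was refitted after each feature was removed, and all names here are illustrative.

```python
from typing import Dict, Optional, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the earlier event has higher risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def linear_risk(features: Dict[str, float], coefs: Dict[str, float], drop: Optional[str] = None) -> float:
    """Hypothetical linear risk score; `drop` omits one feature's term."""
    return sum(w * features[name] for name, w in coefs.items() if name != drop)


def drop_one_importance(samples, times, events, coefs) -> Dict[str, float]:
    """Importance = original C-index minus C-index with the feature dropped."""
    base = harrell_c_index(times, events, [linear_risk(s, coefs) for s in samples])
    return {
        name: base - harrell_c_index(times, events, [linear_risk(s, coefs, drop=name) for s in samples])
        for name in coefs
    }
```

A positive score means the C-index drops without the feature, so the feature contributes to discrimination; a score near zero suggests the feature is redundant given the others.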
Supplementary information
Supplementary Table 1
a, Clinical and molecular characteristics of the involved patients. b, Contribution of mutational signatures in each intrinsic subtype. c, Frequent somatic mutations and germline variants shown in Fig. 1. d, Frequent cancer-related copy number gains/amplifications across intrinsic subtypes. e, Frequent cancer-related copy number losses/deletions across intrinsic subtypes. f, Transcriptome data shown in Fig. 1. g, Differentially expressed proteins across intrinsic subtypes. h, Differentially expressed polar metabolites across intrinsic subtypes. i, Differentially expressed lipids across intrinsic subtypes.
Supplementary Table 2
a, Clinical features and molecular subtypes between CBCGA and TCGA white individuals. b, Frequent mutations between CBCGA and TCGA white individuals (IDC). c, Intrinsic subtypes between CBCGA and TCGA white individuals (IDC). d, Enriched copy number amplifications between CBCGA and TCGA white individuals (IDC). e, Enriched copy number deletions between CBCGA and TCGA white individuals (IDC).
Supplementary Table 3
Effects of CNAs on mRNA and protein (P values were calculated using the two-sided Spearman’s rank test and were adjusted for multiple testing using the FDR method).
Supplementary Table 4
a, Additional samples. Supplementary information on the additional 58 TNBC samples used for metabolomic detection. b, Polar metabolites. Log2-transformed abundance of MS2-annotated polar metabolites in tumor and normal tissues of the CBCGA cohort. c, Lipids. Log2-transformed abundance of MS2-annotated lipids in tumor and normal tissues of the CBCGA cohort. d, Protein network. Protein annotations of the metabolic protein network. e, Metabolite network. Polar metabolite annotations of the metabolite network. f, Correlations. Correlations between subtype-specific metabolic proteins and subtype-specific polar metabolites.
Supplementary Table 5
a, Single-sample GSEA-estimated abundance of tumor microenvironment cells. b, CIBERSORT-estimated proportions of tumor microenvironment cells. c, scRNA deconvolution. Deconvoluted proportions of tumor microenvironment cells based on scRNA-seq data. d, Immunogenomic indicators of the cohort. e, Somatic mutations of each TME phenotype. f, Copy number alterations of each TME phenotype.
Supplementary Table 6
a, List of fusion events in CBCGA cohort. b, The reading frame of fusion transcripts in CBCGA cohort.
Supplementary Table 7
a, Features for multimodal integration. b, C-indices of models combining multimodal features to stratify patient prognosis in the testing cohort. c, Risk scores for each patient and values of multimodal features used in the TMPIC model.
Source data
Source Data Fig. 1
Statistical Source Data.
Source Data Fig. 2
Statistical Source Data.
Source Data Fig. 3
Statistical Source Data.
Source Data Fig. 4
Statistical Source Data.
Source Data Fig. 5
Statistical Source Data.
Source Data Fig. 6
Statistical Source Data.
Source Data Extended Data Fig. 1
Statistical Source Data.
Source Data Extended Data Fig. 2
Statistical Source Data.
Source Data Extended Data Fig. 3
Statistical Source Data.
Source Data Extended Data Fig. 4
Statistical Source Data.
Source Data Extended Data Fig. 5
Statistical Source Data.
Source Data Extended Data Fig. 6
Statistical Source Data.
Source Data Extended Data Fig. 7
Statistical Source Data.
Source Data Extended Data Fig. 8
Statistical Source Data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiang, YZ., Ma, D., Jin, X. et al. Integrated multiomic profiling of breast cancer in the Chinese population reveals patient stratification and therapeutic vulnerabilities. Nat Cancer 5, 673–690 (2024). https://doi.org/10.1038/s43018-024-00725-0
DOI: https://doi.org/10.1038/s43018-024-00725-0