Abstract
Tumor purity is the proportion of tumor cells in a sampled admixture. Estimating it is a key step both for understanding the tumor micro-environment and for reducing false positives and false negatives in genomic analysis. However, existing approaches often lose accuracy on samples with a high degree of heterogeneity, because the patterns of clonal architecture present in the sequencing data interfere with the signals that purity estimation algorithms expect. In this article, we propose a computational method, EMPurity, which accurately infers the tumor purity of samples with a high degree of heterogeneity. EMPurity captures the patterns of both tumor purity and clonal structure in a probabilistic model. The model parameters are calculated directly from aligned reads, which prevents errors in variant calling results from propagating into the estimate. We test EMPurity on a series of datasets against three popular approaches, and EMPurity outperforms them across different simulation configurations.
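To illustrate why clonal architecture can interfere with the signal a purity estimator expects, the following is a minimal sketch and not the EMPurity model. It assumes heterozygous somatic SNVs in copy-neutral diploid regions, where the expected variant allele frequency (VAF) is purity × cell fraction / 2. The function names `expected_vaf` and `naive_purity`, and all numeric values, are hypothetical choices made for this illustration.

```python
def expected_vaf(purity, cell_fraction=1.0):
    """Expected VAF of a heterozygous somatic SNV in a copy-neutral diploid region.

    purity: fraction of tumor cells in the sample (assumed value).
    cell_fraction: fraction of tumor cells carrying the mutation
                   (1.0 for a clonal mutation, < 1.0 for a subclonal one).
    """
    return purity * cell_fraction / 2.0


def naive_purity(vafs):
    """Naive estimator that treats every mutation as clonal: 2 * mean VAF."""
    return min(1.0, 2.0 * sum(vafs) / len(vafs))


# Hypothetical sample: true purity 0.8, with one subclone present in 40% of tumor cells.
true_purity = 0.8
clonal = [expected_vaf(true_purity) for _ in range(50)]
subclonal = [expected_vaf(true_purity, cell_fraction=0.4) for _ in range(50)]

# With clonal mutations only, the naive estimator recovers the true purity.
homogeneous_estimate = naive_purity(clonal)

# Mixing in subclonal mutations lowers the mean VAF and biases the estimate downward.
heterogeneous_estimate = naive_purity(clonal + subclonal)

print(f"clonal only:       {homogeneous_estimate:.2f}")
print(f"clonal + subclone: {heterogeneous_estimate:.2f}")
```

Under these assumptions the naive estimator underestimates purity once subclonal mutations are included, which is the kind of bias that motivates modeling purity and clonal structure jointly.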
Acknowledgement
This work is supported by the National Science Foundation of China (Grant No: 81400632), Shaanxi Science Plan Project (Grant No: 2014JM8350) and the Fundamental Research Funds for the Central Universities (XJTU).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Geng, Y. et al. (2017). Accurately Estimating Tumor Purity of Samples with High Degree of Heterogeneity from Cancer Sequencing Data. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_25
DOI: https://doi.org/10.1007/978-3-319-63312-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)