CNV_MCD: Detection of copy number variations based on minimum covariance determinant using next-generation sequencing data

Published: 21 November 2024

Abstract

Copy number variation (CNV), a pivotal form of genomic structural variation, plays a critical role in the genetic diversity of cancer genomes. Many studies treat CNV identification as an outlier detection problem. Read depth (RD) signals for genomic segments are first extracted from next-generation sequencing (NGS) data, and CNVs are then detected by assigning each segment an outlier score based on the distance between its RD signal and those of adjacent segments. However, the mean and covariance estimators of the global distribution that are commonly used to calculate this distance are themselves distorted by CNVs, which leads to inaccurate detection. To solve this problem, we introduce a new method, CNV_MCD, for detecting CNVs based on the minimum covariance determinant (MCD). CNV_MCD uses the MCD method to estimate the mean and covariance of the RD profile: rather than computing these parameters directly from all segments, it derives them from the subset of segments whose covariance matrix has the smallest determinant. This yields a robust distance for each genomic segment, which serves as its outlier score. Furthermore, we apply fast median filtering to correct baseline drift in the outlier scores and use a chi-squared approximation to determine the cutoff distance for CNVs. These enhancements facilitate the detection of small CNVs and make CNV_MCD a complementary method for CNV detection in low-coverage sequencing data. Extensive experiments on both simulated and real datasets demonstrate that CNV_MCD outperforms other popular CNV detection methods. Overall, our method offers a more robust and reliable technique for CNV detection, which can help elucidate the genetic mechanisms underlying complex diseases such as cancer.
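The pipeline summarized in the abstract (robust MCD estimate, robust distance as outlier score, median-filter baseline correction, chi-squared cutoff) can be illustrated with a minimal sketch. This is not the CNV_MCD implementation. It assumes a one-dimensional RD profile (one value per bin), for which the MCD subset is simply the contiguous window of sorted values with the smallest variance. The function names, the support fraction of 0.75, the window width of 11, the 0.975 quantile, and the choice to subtract the running median from the scores are all illustrative assumptions, not values taken from the paper.

```python
import statistics
from statistics import NormalDist


def consistency_factor(alpha):
    """Rescale the variance of the central alpha-fraction of a normal
    sample so that it is consistent for the full variance."""
    if alpha >= 1:
        return 1.0
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return 1.0 / (1.0 - 2.0 * z * NormalDist().pdf(z) / alpha)


def univariate_mcd(values, support_fraction=0.75):
    """Exact 1-D MCD: mean and rescaled variance of the h-point subset
    with the smallest variance (a contiguous window of the sorted data)."""
    n = len(values)
    h = max(2, int(support_fraction * n))
    ordered = sorted(values)
    best_var, best_mean = None, None
    for start in range(n - h + 1):
        window = ordered[start:start + h]
        var = statistics.pvariance(window)
        if best_var is None or var < best_var:
            best_var, best_mean = var, statistics.fmean(window)
    if best_var == 0:
        raise ValueError("degenerate subset: zero variance")
    return best_mean, best_var * consistency_factor(h / n)


def running_median(values, window=11):
    """Median filter with windows truncated at the edges. The window
    should be much wider than the CNVs one hopes to keep."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]


def chi2_cutoff_1df(p=0.975):
    """Quantile of chi-squared with 1 degree of freedom, using the
    identity chi2_1 = Z**2 for a standard normal Z."""
    return NormalDist().inv_cdf((1 + p) / 2) ** 2


def detect_cnv_bins(rd, support_fraction=0.75, window=11, p=0.975):
    """Return indices of bins whose corrected robust distance exceeds
    the chi-squared cutoff (assumed workflow, not the paper's code)."""
    mean, var = univariate_mcd(rd, support_fraction)
    # Squared robust distance of each bin from the MCD location.
    scores = [(x - mean) ** 2 / var for x in rd]
    # Assumed baseline correction: subtract the running median, floor at 0.
    baseline = running_median(scores, window)
    corrected = [max(0.0, s - b) for s, b in zip(scores, baseline)]
    cutoff = chi2_cutoff_1df(p)
    return [i for i, s in enumerate(corrected) if s > cutoff]


# Hypothetical RD profile: 20 bins near depth 100 with a gain at bins 8-9.
rd_demo = [100, 102, 98, 101, 99, 100, 103, 97, 150, 152,
           100, 99, 101, 98, 102, 100, 97, 103, 101, 99]
print(detect_cnv_bins(rd_demo))
```

Because the location and scale come only from the tightest 75% of bins, the two amplified bins do not inflate the variance estimate, which is the failure mode of the global mean and covariance described above. A multivariate RD profile would need a general MCD solver rather than the sorted-window shortcut used here.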

Published In

Digital Signal Processing, Volume 154, Issue C (Nov 2024), 623 pages

Publisher

Academic Press, Inc., United States

Author Tags

1. Next-generation sequencing
2. Copy number variation
3. Minimum covariance determinant
4. Median filtering
5. Low coverage

Qualifiers

• Research-article