Co-differential Gene Selection and Clustering Based on Graph Regularized Multi-View NMF in Cancer Genomic Data
Figure 1. Performance of the Multi-view Non-negative Matrix Factorization (MvNMF) with different values of k and r. (a) Clustering performance of MvNMF on PAAD and HNSC with respect to k; (b) clustering performance of MvNMF on ESCA and COAD with respect to k; (c) clustering performance of MvNMF on PAAD, ESCA and COAD with respect to r.
Figure 2. Performance of the graph regularized MvNMF (GMvNMF) with different values of k, r and λ. (a) Clustering performance of GMvNMF on PAAD, HNSC, ESCA and COAD with respect to k; (b) clustering performance of GMvNMF on PAAD, HNSC, ESCA and COAD with respect to r; (c) clustering performance of GMvNMF on PAAD, HNSC, ESCA and COAD with respect to λ.
Figure 3. Convergence curves of joint Non-negative Matrix Factorization (jNMF), integrated NMF (iNMF), integrative orthogonality-regularized NMF (iONMF), MvNMF, and GMvNMF.
Abstract
1. Introduction
- To cluster samples and select features from multi-view data at the same time, a novel integrated model called MvNMF is proposed. In the MvNMF framework, the shared basis matrix reconstructs the latent cluster group structure, which contributes to improved clustering performance. Co-differential genes can be selected because the shared coefficient matrix recovers the common feature pattern across the different views.
- Graph regularization is applied to the objective function to form the GMvNMF method, so that GMvNMF can capture the manifold structure of the multi-view data. This contributes to the performance improvement of the integrated model.
- Experiments on cancer genomic data were designed to illustrate the validity of the GMvNMF method, and they achieved satisfactory results.
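The manifold-preserving idea in the graph regularization bullet can be illustrated with a minimal sketch. It assumes a binary k-nearest-neighbor affinity graph over samples and the usual graph Laplacian smoothness term; the function names (`knn_affinity`, `graph_laplacian`, `smoothness`), the binary weighting, and the random demo data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def knn_affinity(samples, n_neighbors=5):
    """Binary, symmetric kNN affinity matrix; rows of `samples` are data points.

    Assumes n_neighbors is smaller than the number of samples.
    """
    sq = np.sum(samples ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * samples @ samples.T
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    n = samples.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                   # symmetrize

def graph_laplacian(affinity):
    """Unnormalized Laplacian L = D - A, with D the diagonal degree matrix."""
    return np.diag(affinity.sum(axis=1)) - affinity

def smoothness(V, L):
    """Tr(V^T L V): small when neighboring samples have similar representations."""
    return float(np.trace(V.T @ L @ V))

# Illustrative random data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
samples = rng.random((20, 3))
A = knn_affinity(samples, n_neighbors=4)
L = graph_laplacian(A)
V = rng.random((20, 2))                         # one low-dimensional row per sample

# The trace term equals half the affinity-weighted sum of squared distances.
pairwise = 0.5 * sum(
    A[i, j] * np.sum((V[i] - V[j]) ** 2) for i in range(20) for j in range(20)
)
print(np.isclose(smoothness(V, L), pairwise))
```

Adding a multiple of this trace term to a factorization objective penalizes representations that separate samples which are neighbors in the original space.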
2. Materials and Methods
2.1. Joint Non-Negative Matrix Factorization and Representative Variants
2.2. Graph Regularization
2.3. Graph Regularized Multi-View Non-Negative Matrix Factorization
2.3.1. Objective Function
2.3.2. Optimization of GMvNMF
Algorithm 1: GMvNMF
Input: the multi-view data and the model parameters.
Output: the factor matrices.
Initialization: initial values for the factor matrices.
Repeat
    Update each factor matrix in turn by (15), (16), and (17);
Until convergence
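The repeat-until-convergence structure of such an algorithm can be sketched with the well-known multiplicative updates for single-view graph regularized NMF. This is a generic illustration and not the update rules (15) to (17) of GMvNMF, which are not reproduced here; the function name `gnmf`, the symbols `U`, `V`, `A`, `lam`, the fixed iteration count, and the ring-graph demo are all assumptions.

```python
import numpy as np

def gnmf(X, A, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Single-view graph regularized NMF: X (m x n) ~ U (m x k) @ V.T (k x n).

    A is a symmetric, nonnegative n x n affinity matrix over the columns of X.
    Minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V) with L = D - A.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(A.sum(axis=1))
    L = D - A
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (A @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        obj = np.linalg.norm(X - U @ V.T, "fro") ** 2 + lam * np.trace(V.T @ L @ V)
        history.append(float(obj))
    return U, V, history

# Illustrative data: random nonnegative matrix and a ring graph over its columns.
rng = np.random.default_rng(1)
X = rng.random((30, 12))
n = X.shape[1]
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

U, V, history = gnmf(X, A, k=3, lam=0.5, n_iter=100)
print(history[0], history[-1])
```

In practice the loop would stop once the change in the objective falls below a tolerance; a fixed iteration count is used here only to keep the sketch short.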
3. Results
3.1. Datasets
3.2. Parameter Setting
3.3. Convergence and Computational Time Analysis
3.4. Clustering Results
3.4.1. Evaluation Metrics
3.4.2. Multi-View Clustering Results
- jNMF clustered the PAAD and COAD datasets better than iNMF, iONMF, and MvNMF. This suggests that modifications to the traditional NMF integration model may lose useful information, which in turn affects the clustering results. On the ESCA and HNSC datasets, however, MvNMF outperformed jNMF, iNMF, and iONMF when the evaluation metrics are considered as a whole. This supports the validity of the proposed MvNMF model, which better preserves the complementary information between multiple views.
- Table 3 shows that the precision of GMvNMF and MvNMF was similar on the four multi-view datasets. However, GMvNMF was at least about 18, 32, and 20 percentage points higher than MvNMF in terms of AC, recall, and F-measure, respectively. GMvNMF therefore had better clustering performance, which shows that it is necessary to consider the manifold structure present in multi-view data.
- Taking the four multi-view datasets in Table 3 as a whole, the proposed GMvNMF method had the best clustering performance. With respect to the average values of AC, recall, precision, and F-measure, GMvNMF outperformed the other methods by about 23, 39, 0.67, and 25 percentage points, respectively. GMvNMF is therefore an effective integration model that takes into account the latent group structure and the intrinsic geometric information of multi-view data.
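The "at least about 18, 32 and 20" figures in the second bullet can be re-derived from the mean values reported in Table 3. The sketch below copies the MvNMF and GMvNMF means from that table (standard deviations omitted) and takes, for each metric, the smallest GMvNMF gain across the four datasets; the dictionary layout and the helper name `min_gain` are illustrative.

```python
# Mean values (percent) from Table 3 as (MvNMF, GMvNMF); standard deviations omitted.
table3 = {
    "PAAD": {"AC": (63.86, 95.59), "Recall": (56.30, 91.90),
             "Precision": (97.88, 95.99), "F-measure": (69.89, 92.12)},
    "ESCA": {"AC": (68.04, 93.23), "Recall": (51.10, 97.21),
             "Precision": (93.51, 95.20), "F-measure": (64.47, 95.97)},
    "COAD": {"AC": (65.13, 92.01), "Recall": (47.15, 94.70),
             "Precision": (89.94, 93.42), "F-measure": (61.45, 92.22)},
    "HNSC": {"AC": (67.70, 86.18), "Recall": (55.23, 87.07),
             "Precision": (95.53, 94.93), "F-measure": (69.10, 88.63)},
}

def min_gain(metric):
    """Smallest GMvNMF-minus-MvNMF difference over the four datasets."""
    return min(gmv - mv for mv, gmv in (table3[d][metric] for d in table3))

for metric in ("AC", "Recall", "Precision", "F-measure"):
    print(f"{metric}: {min_gain(metric):.2f}")
```

The smallest gains for AC, recall, and F-measure all occur on HNSC, while the precision difference changes sign across datasets, which is consistent with the statement that precision was similar for the two methods.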
3.5. Gene Selection Results
3.5.1. Co-Differentially Expressed Gene Selection Results
3.5.2. Discussion of Co-Differential Genes
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Datasets | Data Types | Tumor Samples | Normal Samples | Genes |
---|---|---|---|---|
PAAD | GE, CNV, ME | 176 | 4 | 19,877 |
ESCA | GE, CNV, ME | 183 | 9 | 19,877 |
HNSC | GE, CNV, ME | 398 | 20 | 19,877 |
COAD | GE, CNV, ME | 262 | 19 | 16,977 |
Methods | Time (s) |
---|---|
jNMF | 2.8808 ± 1.7 × 10^−4 |
iNMF | 3.4647 ± 1.3 × 10^−3 |
iONMF | 5.7375 ± 2.8 × 10^−3 |
MvNMF | 1.3495 ± 7.0 × 10^−5 |
GMvNMF | 1.0767 ± 1.4 × 10^−4 |
Datasets | Metrics | jNMF | iNMF | iONMF | MvNMF | GMvNMF |
---|---|---|---|---|---|---|
PAAD | AC (%) | 70.39 ± 3.71 | 70.30 ± 3.71 | 65.01 ± 2.73 | 63.86 ± 0.78 | 95.59 ± 0.05 |
PAAD | Recall (%) | 61.78 ± 7.34 | 56.49 ± 8.30 | 53.17 ± 5.48 | 56.30 ± 2.77 | 91.90 ± 5.28 |
PAAD | Precision (%) | 97.93 ± 0.03 | 98.35 ± 0.01 | 97.89 ± 0.00 | 97.88 ± 0.03 | 95.99 ± 1.92 |
PAAD | F-measure (%) | 71.99 ± 5.26 | 66.92 ± 7.06 | 65.65 ± 4.65 | 69.89 ± 1.92 | 92.12 ± 5.03 |
ESCA | AC (%) | 65.32 ± 3.70 | 66.42 ± 3.49 | 57.64 ± 0.21 | 68.04 ± 0.70 | 93.23 ± 0.21 |
ESCA | Recall (%) | 51.48 ± 6.67 | 54.39 ± 6.55 | 51.90 ± 0.67 | 51.10 ± 3.75 | 97.21 ± 0.74 |
ESCA | Precision (%) | 88.16 ± 5.84 | 88.29 ± 6.21 | 94.70 ± 0.20 | 93.51 ± 0.51 | 95.20 ± 0.00 |
ESCA | F-measure (%) | 62.81 ± 6.60 | 65.61 ± 6.25 | 67.16 ± 0.55 | 64.47 ± 3.39 | 95.97 ± 0.37 |
COAD | AC (%) | 73.91 ± 1.84 | 71.00 ± 1.33 | 66.99 ± 0.68 | 65.13 ± 0.03 | 92.01 ± 0.01 |
COAD | Recall (%) | 57.15 ± 6.54 | 51.28 ± 5.16 | 50.24 ± 2.95 | 47.15 ± 1.58 | 94.70 ± 3.61 |
COAD | Precision (%) | 90.02 ± 3.29 | 87.60 ± 4.52 | 90.18 ± 1.88 | 89.94 ± 0.64 | 93.42 ± 0.00 |
COAD | F-measure (%) | 68.25 ± 5.34 | 63.53 ± 5.11 | 63.79 ± 2.8 | 61.45 ± 1.58 | 92.22 ± 3.23 |
HNSC | AC (%) | 66.75 ± 0.00 | 66.16 ± 0.01 | 66.39 ± 0.00 | 67.70 ± 0.03 | 86.18 ± 2.67 |
HNSC | Recall (%) | 53.62 ± 2.19 | 51.18 ± 2.21 | 50.68 ± 2.20 | 55.23 ± 2.44 | 87.07 ± 5.57 |
HNSC | Precision (%) | 95.22 ± 0.38 | 94.30 ± 0.39 | 94.03 ± 0.39 | 95.53 ± 0.33 | 94.93 ± 0.05 |
HNSC | F-measure (%) | 67.85 ± 2.01 | 65.61 ± 1.96 | 65.09 ± 2.03 | 69.10 ± 2.20 | 88.63 ± 3.30 |
Methods | PAAD N | PAAD HRS | PAAD ARS | ESCA N | ESCA HRS | ESCA ARS | COAD N | COAD HRS | COAD ARS | HNSC N | HNSC HRS | HNSC ARS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
jNMF | 374 | 84.93 | 4.89 | 168 | 76.15 | 5.19 | 142 | 103.7 | 7.02 | 175 | 168.23 | 17.75 |
iNMF | 375 | 84.93 | 4.84 | 171 | 76.15 | 5.31 | 144 | 103.7 | 7.71 | 175 | 102.98 | 16.65 |
iONMF | 375 | 100.56 | 5.19 | 170 | 76.15 | 5.36 | 141 | 165.65 | 8.64 | 175 | 168.23 | 17.52 |
MvNMF | 365 | 100.56 | 5.23 | 170 | 76.15 | 5.52 | 145 | 165.65 | 8.66 | 177 | 168.23 | 18.00 |
GMvNMF | 376 | 100.56 | 5.53 | 182 | 76.15 | 5.69 | 152 | 173.12 | 8.37 | 177 | 168.23 | 17.60 |
Gene ID | Gene Symbol | Related GO Annotations | Related Diseases | Relevance Score |
---|---|---|---|---|
672 | BRCA1 | RNA binding and ligase activity | Breast-Ovarian Cancer, Familial 1 and Pancreatic Cancer 4 | 173.12 |
675 | BRCA2 | protease binding and histone acetyltransferase activity | Fanconi Anemia, Complementation Group D1 and Breast Cancer | 135.87 |
1956 | EGFR | identical protein binding and protein kinase activity | Inflammatory Skin and Bowel Disease, Neonatal, 2 and Lung Cancer | 104.16 |
3569 | IL6 | signaling receptor binding and growth factor activity | Kaposi Sarcoma and Rheumatoid Arthritis, Systemic Juvenile | 58.74 |
4318 | MMP9 | identical protein binding and metalloendopeptidase activity | Metaphyseal Anadysplasia 2 and Metaphyseal Anadysplasia | 45.57 |
1495 | CTNNA1 | actin filament binding | Macular Dystrophy, Patterned, 2 and Butterfly-Shaped Pigment Dystrophy | 41.99 |
1950 | EGF | calcium ion binding and epidermal growth factor receptor binding | Hypomagnesemia 4, Renal and Familial Primary Hypomagnesemia with Normocalciuria and Normocalcemia | 40.84 |
5594 | MAPK1 | transferase activity, transferring phosphorus-containing groups and protein tyrosine kinase activity | Chromosome 22Q11.2 Deletion Syndrome, Distal and Pertussis | 39.23 |
2475 | MTOR | transferase activity, transferring phosphorus-containing groups and protein serine/threonine kinase activity | Focal Cortical Dysplasia, Type II and Smith-Kingsmore Syndrome | 34.07 |
887 | CCKBR | G-protein coupled receptor activity and 1-phosphatidylinositol-3-kinase regulator activity | Panic Disorder and Anxiety | 23.43 |
Gene ID | Gene Symbol | Related GO Annotations | Related Diseases | Paralog Gene |
---|---|---|---|---|
999 | CDH1 | calcium ion binding and protein phosphatase binding | Gastric Cancer, Hereditary Diffuse and Blepharocheilodontic Syndrome 1 | CDH3 |
1499 | CTNNB1 | DNA binding transcription factor activity and binding | Mental Retardation, Autosomal Dominant 19 and Pilomatrixoma | JUP |
1956 | EGFR | identical protein binding and protein kinase activity | Inflammatory Skin and Bowel Disease, Neonatal, 2 and Lung Cancer | ERBB4 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yu, N.; Gao, Y.-L.; Liu, J.-X.; Shang, J.; Zhu, R.; Dai, L.-Y. Co-differential Gene Selection and Clustering Based on Graph Regularized Multi-View NMF in Cancer Genomic Data. Genes 2018, 9, 586. https://doi.org/10.3390/genes9120586