Abstract
In the biological domain, it is increasingly common to apply several high-throughput technologies to the same set of samples. We propose a Covariate-Related Structure Extraction (CRSE) approach that explores relationships between different types of high-dimensional molecular data (views) in the context of sample covariate information from the experimental design, for example class membership. Real-world data analysis with an initial pipeline implementation of CRSE shows that the proposed approach successfully captures cross-view structures underlying multiple biologically relevant classification schemes, making it possible to predict class labels for unseen examples from either view or across views.
Acknowledgement
We thank Ming Jin, Jin Zhao, Basem Kanawati, Philippe Schmitt-Kopplin, Andreas Albert, J. Barbro Winkler, and Anton R. Schäffner for kindly providing the datasets used in this study.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhou, L., Georgii, E., Plant, C., Böhm, C. (2016). Covariate-Related Structure Extraction from Paired Data. In: Renda, M., Bursa, M., Holzinger, A., Khuri, S. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2016. Lecture Notes in Computer Science, vol 9832. Springer, Cham. https://doi.org/10.1007/978-3-319-43949-5_11
DOI: https://doi.org/10.1007/978-3-319-43949-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43948-8
Online ISBN: 978-3-319-43949-5
eBook Packages: Computer Science, Computer Science (R0)