Abstract
We define the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, etc.). This problem has not yet been investigated in the biclustering literature. We propose a new method, HBC (Heterogeneous BiClustering), designed to extract biclusters from heterogeneous, large-scale, sparse data matrices. The goal of this method is to handle medical data gathered by hospitals (on patients, stays, acts, diagnoses, prescriptions, etc.) and to provide valuable insight into such data. HBC takes advantage of data sparsity and uses a constructive greedy heuristic to build a large number of possibly overlapping biclusters. The proposed method compares favorably with a standard biclustering algorithm on small numeric data sets. Experiments on real-life data sets further confirm its scalability and efficiency.
C. Dhaenens—This work was partially supported by project ClinMine - ANR-13-TECS-0009.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Vandromme, M., Jacques, J., Taillard, J., Jourdan, L., Dhaenens, C. (2016). A Scalable Biclustering Method for Heterogeneous Medical Data. In: Pardalos, P., Conca, P., Giuffrida, G., Nicosia, G. (eds) Machine Learning, Optimization, and Big Data. MOD 2016. Lecture Notes in Computer Science, vol 10122. Springer, Cham. https://doi.org/10.1007/978-3-319-51469-7_6
DOI: https://doi.org/10.1007/978-3-319-51469-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51468-0
Online ISBN: 978-3-319-51469-7
eBook Packages: Computer Science (R0)