Abstract
Biclustering is a data mining technique used to simultaneously partition the set of samples and the set of their attributes (features) into subsets (clusters). Samples and features clustered together are supposed to have a high relevance to each other. In this paper we provide a new mathematical programming formulation for unsupervised biclustering. The proposed model involves the solution of a fractional 0–1 programming problem. A linear-mixed 0–1 reformulation as well as two heuristic-based approaches are developed. Encouraging computational results on clustering real DNA microarray data sets are presented. In addition, we also discuss theoretical computational complexity issues related to biclustering.
Similar content being viewed by others
References
Baz M, Hunsaker B, Brooks JP, Gosavi A (2007) Automated tuning of optimization parameters. Technical Report, University of Pittsburgh
Ben-Dor A, Chor B, Karp R, Yakhini Z (2000b) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10:373–384
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Busygin S, Jacobsen G, Krämer E (2002) Double conjugated clustering applied to leukemia microarray data. In: SDM 2002 workshop on clustering high dimensional data and its applications
Busygin S, Prokopyev OA, Pardalos PM (2005) Feature selection for consistent biclustering via fractional 0–1 programming. J Combin Optim 10(1):7–21
Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35:2964–2987
Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th international conference on intelligent systems for molecular biology, pp 93–103
Cho H, Dhillon IS, Guan Y, Sra S (2004) Minimum sum-squared residue co-clustering of gene expression data. In: Proceedings of the 4th SIAM international conference on data mining (SDM), April 22–24, Lake Buena Vista, Florida
Coclustering Software. http://www.cs.utexas.edu/users/dml/Software/cocluster.html
Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), August 26–29, San Francisco, CA
Dhillon IS, Guan Y (2003) Information theoretic clustering of sparse co-occurrence data. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), November 19–22, Melbourne, FL
Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27, Washington, DC
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, San Francisco
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Hansen P, Poggi de Aragão M, Ribeiro CC (1991) Hyperbolic 0–1 programming and query optimization in information retrieval. Math Program 52:256–263
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
Hsiao L-L, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, Dillon W, Lee KF, Clark KE, Haverty P, Weng Z, Mutter G, Frosch MP, MacDonald ME, Milford EL, Crum CP, Bueno R, Pratt RE, Mahadevappa M, Warrington JA, Stephanopoulos G, Stephanopoulos G, Gullans SR (2001) A compendium of gene expression in normal human tissues. Physiol Genomics 7:97–104
HuGE Index.org Website. http://www.hugeindex.org
Kluger Y, Basri R, Chang JT, Gerstein M (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13:703–716
Madeira S, Oliviera AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45
Martin R (1999) Large scale linear and integer optimization: a unified approach. Kluwer Academic, Boston
National Human Genome Research Institute (2008) DNA microarray fact sheet. http://www.genome.gov/page.cfm?pageID=10000533#2, last accessed January
Prokopyev OA, Huang H-X, Pardalos PM (2005a) On complexity of unconstrained hyperbolic 0–1 programming problems. Oper Res Lett 33:312–318
Prokopyev OA, Meneses C, Oliveira CAS, Pardalos PM (2005b) On multiple-ratio hyperbolic 0–1 programming problems. Pac J Optim 1(2):327–345
Saipe S (1975) Solving a (0,1) hyperbolic program by branch and bound. Naval Res Logist Q 22:497–515
Tawarmalani M, Ahmed S, Sahinidis N (2002) Global optimization of 0–1 hyperbolic programs. J Glob Optim 24:385–416
Wu T-H (1997) A note on a global approach for general 0–1 fractional programming. Eur J Oper Res 101:220–223
Yang J, Wang W, Wang H, Yu P (2002) δ-clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th IEEE international conference on data engineering, pp 517–528
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Trapp, A., Prokopyev, O.A. & Busygin, S. Finding checkerboard patterns via fractional 0–1 programming. J Comb Optim 20, 1–26 (2010). https://doi.org/10.1007/s10878-008-9186-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-008-9186-5