Finding checkerboard patterns via fractional 0–1 programming

Andrew Trapp¹,
Oleg A. Prokopyev¹ &
Stanislav Busygin²

136 Accesses
5 Citations
Explore all metrics

Abstract

Biclustering is a data mining technique used to simultaneously partition the set of samples and the set of their attributes (features) into subsets (clusters). Samples and features clustered together are supposed to have a high relevance to each other. In this paper we provide a new mathematical programming formulation for unsupervised biclustering. The proposed model involves the solution of a fractional 0–1 programming problem. A linear-mixed 0–1 reformulation as well as two heuristic-based approaches are developed. Encouraging computational results on clustering real DNA microarray data sets are presented. In addition, we also discuss theoretical computational complexity issues related to biclustering.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

References

Baz M, Hunsaker B, Brooks JP, Gosavi A (2007) Automated tuning of optimization parameters. Technical Report, University of Pittsburgh
Ben-Dor A, Chor B, Karp R, Yakhini Z (2000b) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10:373–384
Article Google Scholar
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
MATH Google Scholar
Busygin S, Jacobsen G, Krämer E (2002) Double conjugated clustering applied to leukemia microarray data. In: SDM 2002 workshop on clustering high dimensional data and its applications
Busygin S, Prokopyev OA, Pardalos PM (2005) Feature selection for consistent biclustering via fractional 0–1 programming. J Combin Optim 10(1):7–21
Article MATH MathSciNet Google Scholar
Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35:2964–2987
Article MATH MathSciNet Google Scholar
Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th international conference on intelligent systems for molecular biology, pp 93–103
Cho H, Dhillon IS, Guan Y, Sra S (2004) Minimum sum-squared residue co-clustering of gene expression data. In: Proceedings of the 4th SIAM international conference on data mining (SDM), April 22–24, Lake Buena Vista, Florida
Coclustering Software. http://www.cs.utexas.edu/users/dml/Software/cocluster.html
Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), August 26–29, San Francisco, CA
Dhillon IS, Guan Y (2003) Information theoretic clustering of sparse co-occurrence data. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), November 19–22, Melbourne, FL
Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27, Washington, DC
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, San Francisco
MATH Google Scholar
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Article Google Scholar
Hansen P, Poggi de Aragão M, Ribeiro CC (1991) Hyperbolic 0–1 programming and query optimization in information retrieval. Math Program 52:256–263
Google Scholar
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
Article Google Scholar
Hsiao L-L, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, Dillon W, Lee KF, Clark KE, Haverty P, Weng Z, Mutter G, Frosch MP, MacDonald ME, Milford EL, Crum CP, Bueno R, Pratt RE, Mahadevappa M, Warrington JA, Stephanopoulos G, Stephanopoulos G, Gullans SR (2001) A compendium of gene expression in normal human tissues. Physiol Genomics 7:97–104
Google Scholar
HuGE Index.org Website. http://www.hugeindex.org
Kluger Y, Basri R, Chang JT, Gerstein M (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13:703–716
Article Google Scholar
Madeira S, Oliviera AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45
Article Google Scholar
Martin R (1999) Large scale linear and integer optimization: a unified approach. Kluwer Academic, Boston
MATH Google Scholar
National Human Genome Research Institute (2008) DNA microarray fact sheet. http://www.genome.gov/page.cfm?pageID=10000533#2, last accessed January
Prokopyev OA, Huang H-X, Pardalos PM (2005a) On complexity of unconstrained hyperbolic 0–1 programming problems. Oper Res Lett 33:312–318
Article MATH MathSciNet Google Scholar
Prokopyev OA, Meneses C, Oliveira CAS, Pardalos PM (2005b) On multiple-ratio hyperbolic 0–1 programming problems. Pac J Optim 1(2):327–345
MATH MathSciNet Google Scholar
Saipe S (1975) Solving a (0,1) hyperbolic program by branch and bound. Naval Res Logist Q 22:497–515
Article MATH Google Scholar
Tawarmalani M, Ahmed S, Sahinidis N (2002) Global optimization of 0–1 hyperbolic programs. J Glob Optim 24:385–416
Article MATH MathSciNet Google Scholar
Wu T-H (1997) A note on a global approach for general 0–1 fractional programming. Eur J Oper Res 101:220–223
Article MATH Google Scholar
Yang J, Wang W, Wang H, Yu P (2002) δ-clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th IEEE international conference on data engineering, pp 517–528

Download references

Author information

Authors and Affiliations

Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Andrew Trapp & Oleg A. Prokopyev
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, 32611, USA
Stanislav Busygin

Authors

Andrew Trapp
View author publications
You can also search for this author in PubMed Google Scholar
Oleg A. Prokopyev
View author publications
You can also search for this author in PubMed Google Scholar
Stanislav Busygin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Oleg A. Prokopyev.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Trapp, A., Prokopyev, O.A. & Busygin, S. Finding checkerboard patterns via fractional 0–1 programming. J Comb Optim 20, 1–26 (2010). https://doi.org/10.1007/s10878-008-9186-5

Download citation

Published: 18 September 2008
Issue Date: July 2010
DOI: https://doi.org/10.1007/s10878-008-9186-5

Finding checkerboard patterns via fractional 0–1 programming

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dominant Set Biclustering

A novel biclustering algorithm of binary microarray data: BiBinCons and BiBinAlter

A Binary Factor Graph Model for Biclustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Finding checkerboard patterns via fractional 0–1 programming

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dominant Set Biclustering

A novel biclustering algorithm of binary microarray data: BiBinCons and BiBinAlter

A Binary Factor Graph Model for Biclustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation