research-article

Mining discrete patterns via binary matrix factorization

Authors:

Jieping Ye

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 757 - 766

https://doi.org/10.1145/1557019.1557103

Published: 28 June 2009

Abstract

Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rank-one binary matrix approximations that identify the dominant patterns of the data, while preserving its discrete property. A best approximation on such data has a minimum set of inconsistent entries, i.e., mismatches between the given binary data and the approximate matrix. Due to the hardness of the problem, previous accounts of such problems employ heuristics and the resulting approximation may be far away from the optimal one. In this paper, we show that the rank-one binary matrix approximation can be reformulated as a 0-1 integer linear program (ILP). However, the ILP formulation is computationally expensive even for small-size matrices. We propose a linear program (LP) relaxation, which is shown to achieve a guaranteed approximation error bound. We further extend the proposed formulations using the regularization technique, which is commonly employed to address overfitting. The LP formulation is restricted to medium-size matrices, due to the large number of variables involved for large matrices. Interestingly, we show that the proposed approximate formulation can be transformed into an instance of the minimum s-t cut problem, which can be solved efficiently by finding maximum flows. Our empirical study shows the efficiency of the proposed algorithm based on the maximum flow. Results also confirm the established theoretical bounds.
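
To make the objective in the abstract concrete, the following is a minimal sketch of rank-one binary approximation by exhaustive search. It is not the ILP, LP relaxation, or maximum-flow algorithm from the paper; brute force is used here only as a stand-in that is feasible for tiny matrices. The function names `mismatches` and `best_rank_one` and the example matrix are assumptions made for illustration. For binary entries, the mismatch count equals sum(a_ij) + sum((1 - 2*a_ij) * x_i * y_j), which is the kind of objective a 0-1 program would linearize.

```python
from itertools import product


def mismatches(A, x, y):
    """Count entries where A[i][j] differs from the rank-one product x[i] * y[j]."""
    return sum(
        A[i][j] != x[i] * y[j]
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def best_rank_one(A):
    """Return (error, x, y) minimizing mismatches over binary vectors x and y.

    Enumerates every binary x (2**m candidates), so it is only usable for
    small matrices. For a fixed x, each y[j] can be chosen independently.
    """
    m, n = len(A), len(A[0])
    best_err, best_x, best_y = None, None, None
    for x in product((0, 1), repeat=m):
        y = []
        for j in range(n):
            # y[j] = 0: the approximated column is all zeros, so every 1 mismatches.
            cost0 = sum(A[i][j] for i in range(m))
            # y[j] = 1: the approximated column equals x.
            cost1 = sum(A[i][j] != x[i] for i in range(m))
            y.append(1 if cost1 < cost0 else 0)
        err = mismatches(A, x, y)
        if best_err is None or err < best_err:
            best_err, best_x, best_y = err, list(x), y
    return best_err, best_x, best_y


# Hypothetical 4 x 3 binary data matrix.
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
err, x, y = best_rank_one(A)
print(err, x, y)
```

The exponential enumeration over x illustrates why an exact approach becomes expensive quickly, which is the motivation the abstract gives for the relaxed and flow-based formulations.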

Index Terms

Mining discrete patterns via binary matrix factorization
1. Information systems
  1. Information systems applications
    1. Data mining

Information & Contributors

Information

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

June 2009

1426 pages

ISBN:9781605584959

DOI:10.1145/1557019

General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)

Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD09

KDD09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

June 28 - July 1, 2009

Paris, France

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 31
Total Downloads: 712
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Dec 2024

Citations

Cited By

Li H, Wang J, Zhang N, Zhang W (2024). Binary matrix factorization via collaborative neurodynamic optimization. Neural Networks, 176 (106348). Online publication date: Aug-2024.
https://doi.org/10.1016/j.neunet.2024.106348
Velingker A, Vötsch M, Woodruff D, Zhou S (2023). Fast (1 + ε)-approximation algorithms for binary matrix factorization. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds.), Proceedings of the 40th International Conference on Machine Learning (34952-34977). Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619863
Fomin F, Panolan F, Patil A, Tanveer A (2022). Boolean and $\mathbb{F}_{p}$-Matrix Factorization: From Theory to Practice. 2022 International Joint Conference on Neural Networks (IJCNN) (1-8). Online publication date: 18-Jul-2022.
https://doi.org/10.1109/IJCNN55064.2022.9892947
Sripratak P, Punnen A, Stephen T (2022). The Bipartite Boolean Quadric Polytope. Discrete Optimization, 44:P1. Online publication date: 1-May-2022.
https://dl.acm.org/doi/10.1016/j.disopt.2021.100657
Punnen A (2022). The Bipartite QUBO. The Quadratic Unconstrained Binary Optimization Problem (261-300). Online publication date: 9-Apr-2022.
https://doi.org/10.1007/978-3-031-04520-2_10
Malik O, Ushijima-Mwesigwa H, Roy A, Mandal A, Ghosh I (2021). Binary matrix factorization on special purpose hardware. PLOS ONE, 16:12 (e0261250). Online publication date: 16-Dec-2021.
https://doi.org/10.1371/journal.pone.0261250
Lu H, Chen X, Shi J, Vaidya J, Atluri V, Hong Y, Huang W (2020). Algorithms and Applications to Weighted Rank-one Binary Matrix Factorization. ACM Transactions on Management Information Systems, 11:2 (1-33). Online publication date: 3-May-2020.
https://dl.acm.org/doi/10.1145/3386599
Li Y, Shah D, Song D, Yu C (2020). Nearest Neighbors for Matrix Estimation Interpreted as Blind Regression for Latent Variable Model. IEEE Transactions on Information Theory, 66:3 (1760-1784). Online publication date: Mar-2020.
https://doi.org/10.1109/TIT.2019.2950299
Beckerleg M, Thompson A (2020). A divide-and-conquer algorithm for binary matrix completion. Linear Algebra and its Applications. Online publication date: Apr-2020.
https://doi.org/10.1016/j.laa.2020.04.017
Fomin F, Golovach P, Panolan F (2020). Parameterized low-rank binary matrix approximation. Data Mining and Knowledge Discovery. Online publication date: 2-Jan-2020.
https://doi.org/10.1007/s10618-019-00669-5
