[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1557019.1557103acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Mining discrete patterns via binary matrix factorization

Published: 28 June 2009 Publication History

Abstract

Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rank-one binary matrix approximations that identify the dominant patterns of the data, while preserving its discrete property. A best approximation on such data has a minimum set of inconsistent entries, i.e., mismatches between the given binary data and the approximate matrix. Due to the hardness of the problem, previous accounts of such problems employ heuristics and the resulting approximation may be far away from the optimal one. In this paper, we show that the rank-one binary matrix approximation can be reformulated as a 0-1 integer linear program (ILP). However, the ILP formulation is computationally expensive even for small-size matrices. We propose a linear program (LP) relaxation, which is shown to achieve a guaranteed approximation error bound. We further extend the proposed formulations using the regularization technique, which is commonly employed to address overfitting. The LP formulation is restricted to medium-size matrices, due to the large number of variables involved for large matrices. Interestingly, we show that the proposed approximate formulation can be transformed into an instance of the minimum s-t cut problem, which can be solved efficiently by finding maximum flows. Our empirical study shows the efficiency of the proposed algorithm based on the maximum flow. Results also confirm the established theoretical bounds.

Supplementary Material

JPG File (p757-ye.jpg)
MP4 File (p757-ye.mp4)

References

[1]
Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE TPAMI, Sep 2004.
[2]
G. Dantzig. Maximization of a linear function of variables subject to linear inequalities. In Activity Analysis of Production and Allocation. Wiley, 1951.
[3]
A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. ACM, 35(4):921--940, 1988.
[4]
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, USA, 3rd edition, 1996.
[5]
I. Heller and C. B. Tompkins. An extension of a theorem of Dantzig's. Ann. of Math. Stud., no. 38, pages 247--254. 1956.
[6]
D. S. Hochbaum and A. Pathria. Forest harvesting and minimum cuts: a new approach to handling spatial constraints. Forest Science, 43(4):544--554, 1997.
[7]
A. J. Hoffman and J. B. Kruskal. Integral boundary points of convex polyhedra. Annals of Mathematics Studies, no. 38, pages 223--246. Princeton University Press, 1956.
[8]
I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[9]
J. A. Kelner and D. A. Spielman. A randomized polynomial-time simplex algorithm for linear programming. In ACM STOC 2006, pages 51--60, 2006.
[10]
L. G. Khachiyan. A polynomial algorithm in linear programming {in russian}. Doklady Akademii Nauk SSSR, 244:1093--1096, 1979. English translation: Soviet Mathematics Doklady 20 (1979), 191--194.
[11]
V. Klee and G. J. Minty. How good is the simplex algorithm? In Inequalities, III, pages 159--175. Academic Press, New York, 1972.
[12]
T. G. Kolda and D. P. O'Leary. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 16(4):322--346, 1998.
[13]
M. Koyutürk and A. Grama. PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets. In ACM SIGKDD, pages 147--156, 2003.
[14]
M. Koyutürk, A. Grama, and N. Ramakrishnan. Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets. IEEE TKDE, 17(4):447--461, April 2005.
[15]
M. Koyuturk, A. Grama, and N. Ramakrishnan. Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis. ACM Trans. Math. Softw., 32(1):33--69, 2006.
[16]
K. Nishino, Y. Sato, and K. Ikeuchi. Eigen-texture method: appearance compression based on 3d model. In IEEE CVPR 1999, pages 618--624, 1999.
[17]
D. A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of ACM, 51(3):385--463, 2004.
[18]
M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71--86, 1991.

Cited By

View all
  • (2024)Binary matrix factorization via collaborative neurodynamic optimizationNeural Networks10.1016/j.neunet.2024.106348176(106348)Online publication date: Aug-2024
  • (2023)Fast (1 + ε)-approximation algorithms for binary matrix factorizationProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619863(34952-34977)Online publication date: 23-Jul-2023
  • (2022)Boolean and $\mathbb{F}_{p}$-Matrix Factorization: From Theory to Practice2022 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN55064.2022.9892947(1-8)Online publication date: 18-Jul-2022
  • Show More Cited By

Index Terms

  1. Mining discrete patterns via binary matrix factorization

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 June 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. binary matrix factorization
    2. integer linear program
    3. maximum flow
    4. minimum cut
    5. rank-one
    6. regularization

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)13
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 20 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Binary matrix factorization via collaborative neurodynamic optimizationNeural Networks10.1016/j.neunet.2024.106348176(106348)Online publication date: Aug-2024
    • (2023)Fast (1 + ε)-approximation algorithms for binary matrix factorizationProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619863(34952-34977)Online publication date: 23-Jul-2023
    • (2022)Boolean and $\mathbb{F}_{p}$-Matrix Factorization: From Theory to Practice2022 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN55064.2022.9892947(1-8)Online publication date: 18-Jul-2022
    • (2022)The Bipartite Boolean Quadric PolytopeDiscrete Optimization10.1016/j.disopt.2021.10065744:P1Online publication date: 1-May-2022
    • (2022)The Bipartite QUBOThe Quadratic Unconstrained Binary Optimization Problem10.1007/978-3-031-04520-2_10(261-300)Online publication date: 9-Apr-2022
    • (2021)Binary matrix factorization on special purpose hardwarePLOS ONE10.1371/journal.pone.026125016:12(e0261250)Online publication date: 16-Dec-2021
    • (2020)Algorithms and Applications to Weighted Rank-one Binary Matrix FactorizationACM Transactions on Management Information Systems10.1145/338659911:2(1-33)Online publication date: 3-May-2020
    • (2020)Nearest Neighbors for Matrix Estimation Interpreted as Blind Regression for Latent Variable ModelIEEE Transactions on Information Theory10.1109/TIT.2019.295029966:3(1760-1784)Online publication date: Mar-2020
    • (2020)A divide-and-conquer algorithm for binary matrix completionLinear Algebra and its Applications10.1016/j.laa.2020.04.017Online publication date: Apr-2020
    • (2020)Parameterized low-rank binary matrix approximationData Mining and Knowledge Discovery10.1007/s10618-019-00669-5Online publication date: 2-Jan-2020
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media