More Web Proxy on the site http://driver.im/

research-article

Algorithms and Applications to Weighted Rank-one Binary Matrix Factorization

Authors:

Jaideep Vaidya,

Vijayalakshmi Atluri,

Wei HuangAuthors Info & Claims

ACM Transactions on Management Information Systems (TMIS), Volume 11, Issue 2

Article No.: 7, Pages 1 - 33

https://doi.org/10.1145/3386599

Published: 03 May 2020 Publication History

Abstract

Many applications use data that are better represented in the binary matrix form, such as click-stream data, market basket data, document-term data, user-permission data in access control, and others. Matrix factorization methods have been widely used tools for the analysis of high-dimensional data, as they automatically extract sparse and meaningful features from data vectors. However, existing matrix factorization methods do not work well for the binary data. One crucial limitation is interpretability, as many matrix factorization methods decompose an input matrix into matrices with fractional or even negative components, which are hard to interpret in many real settings. Some matrix factorization methods, like binary matrix factorization, do limit decomposed matrices to binary values. However, these models are not flexible to accommodate some data analysis tasks, like trading off summary size with quality and discriminating different types of approximation errors. To address those issues, this article presents weighted rank-one binary matrix factorization, which is to approximate a binary matrix by the product of two binary vectors, with parameters controlling different types of approximation errors. By systematically running weighted rank-one binary matrix factorization, one can effectively perform various binary data analysis tasks, like compression, clustering, and pattern discovery. Theoretical properties on weighted rank-one binary matrix factorization are investigated and its connection to problems in other research domains are examined. As weighted rank-one binary matrix factorization in general is NP-hard, efficient and effective algorithms are presented. Extensive studies on applications of weighted rank-one binary matrix factorization are also conducted.

References

[1]

Harry C. Andrews and Calude L. Patterson. 1976. Singular value decomposition and digital image processing. IEEE Trans. Acoust. Speech Sign. Process. 24, 1 (1976), 26--28.

[2]

Xue Bai, Ram Gopal, Manuel Nunez, and Dmitry Zhdanov. 2012. On the prevention of fraud and privacy exposure in process information flow. INFORMS J. Comput. 24, 3 (2012), 416--432.

Digital Library

[3]

F. Barahona, M. Junger, and G. Reinelt. 1989. Experiments in quadratic 0-1 programming. Math. Program. 44, 2 (Jul. 1989), 127--137.

Digital Library

[4]

J. E. Beasley. 1998. Heuristic Algorithms for the Unconstrained Binary Quadratic Programming Problem. Technical Report. Management School, Imperial College.

[5]

Mandell Bellmore and H. Donald Ratliff. 1971. Set covering and involutory bases. Manage. Sci. 18, 3 (1971), 194--206.

Digital Library

[6]

M. W. Berry, S. T. Dumais, and G. W. O’Brien. 1995. Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 4 (1995), 573--595.

Digital Library

[7]

Ella Bingham, Ata Kaban, and Mikael Fortelius. 2009. The aspect Bernoulli model: Multiple causes of presences and absences. Pattern Anal. Appl. 12, 1 (Jan. 2009), 55--78.

Digital Library

[8]

P. S. Bradley, Usama M. Fayyad, and O. L. Mangasarian. 1999. Mathematical programming for data mining: Formulations and challenges. INFORMS J. Comput. 11, 3 (1999), 217--238.

[9]

Douglas Burdick, Manuel Calimlim, and Johannes Gehrke. 2001. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, 443--452.

[10]

Alberto Caprara, Matteo Fischetti, and Paolo Toth. 1999. A heuristic method for the set covering problem. Operat. Res. 47, 5 (1999), 730--743.

Digital Library

[11]

V. Chvatal. 1979. A greedy heuristic for the set-covering problem. Math. Operat. Res. 4, 3 (1979), 233--235.

Digital Library

[12]

M. Dawande, P. Keskinocak, and S. Tayur. 1997. On the biclique problem in bipartite graphs. GSIA working paper 1996-04, Carnegie-Mellon University.

[13]

Chrysanthos Dellarocas. 2003. The digitization of word of mouth: Promise and challenges of online feedback mechanisms. Manage. Sci. 49, 10 (2003), 1407--1424.

Digital Library

[14]

Alina Ene, William Horne, Nikola Milosavljevic, Prasad Rao, Robert Schreiber, and Robert E. Tarjan. 2008. Fast exact and heuristic methods for role minimization problems. In Proceedings of the ACM Symposium on Access Control Models and Technologies (SACMAT’08). 1--10.

[15]

David F. Ferraiolo, Ravi Sandhu, Serban Gavrila, D. Richard Kuhn, and Ramaswamy Chandramouli. 2001. Proposed NIST standard for role-based access control. ACM Trans. Inf. Syst. Secur. 4, 3 (Aug. 2001), 224--274.

Digital Library

[16]

Mario Frank, Andreas P. Streich, David Basin, and Joachim M. Buhmann. 2012. Multi-assignment clustering for Boolean data. J. Mach. Learn. Res. 13, 1 (Feb. 2012), 459--489.

Digital Library

[17]

Y. Gao and G. Church. 2005. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics 21, 21 (2005), 3970--3975.

Digital Library

[18]

Floris Geerts, Bart Goethals, and Taneli Mielikainen. 2004. Tiling databases. In Discovery Science. Springer, 278--289.

[19]

Vladimir Gligorijević, Yannis Panagakis, and Stefanos Zafeiriou. 2018. Non-negative matrix factorizations for multiplex network analysis. IEEE Trans. Pattern Anal. Mach. Intell. 41, 4 (2018), 928--940.

Digital Library

[20]

Fred Glover, Gary A. Kochenberger, and Bahram Alidaee. 1998. Adaptive memory tabu search for binary quadratic programs. Manage. Sci. 44, 3 (1998), 336--345.

Digital Library

[21]

G. H. Golub and C. F. Van Loan. 1996. Matrix Computations (3rd ed.). The Johns Hopkins Universtiy Press, Baltimore, MD.

[22]

Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, and Sergei Vassilvitskii. 2017. Local search methods for k-means with outliers. Proc. VLDB Endow. 10, 7 (2017), 757--768.

Digital Library

[23]

Benjamin David Haeffele and René Vidal. 2019. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 14, 8 (2019).

[24]

Dorit S. Hochbaum, Nimrod Megiddo, Joseph Naor, and Arie Tamir. 1993. Tight bounds and 2-approximation algorithms for integer programs with two variables per inequality. Math. Program. 62 (1993), 69--83.

[25]

Hisao Ishibuchi and Yusuke Nojima. 2007. Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. Int. J. Approx. Reason. 44, 1 (2007), 4--31.

Digital Library

[26]

Ata Kabán and Ella Bingham. 2008. Factorisation and denoising of 0-1 data: A variational approach. Neurocomputing 71, 10--12 (Jun. 2008), 2291--2308.

Digital Library

[27]

Tamara G. Kolda and Dianne P. O’Leary. 2000. Algorithm 805: Computation and uses of the semidiscrete matrix decomposition. ACM Trans. Math. Softw. 26, 3 (2000), 415--435.

Digital Library

[28]

Mehmet Koyutürk and Ananth Grama. 2003. PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). 147--156.

Digital Library

[29]

Mehmet Koyuturk, Ananth Grama, and Naren Ramakrishnan. 2005. Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets. IEEE Trans. Knowl. Data Eng. 17, 4 (2005), 447--461.

Digital Library

[30]

Martin Kuhlmann, Dalia Shohat, and Gerhard Schimpf. 2003. Role mining—Revealing business roles for security administration using data mining technology. In Proceedings of the 8th ACM Symposium on Access Control Models and Technologies (SACMAT’03). ACM, New York, NY, 179--186.

Digital Library

[31]

Tarald O. Kvalseth. 1987. Entropy and correlation: Some comments. IEEE Trans. Syst. Man Cybernet. 17, 3 (May 1987), 517--519.

[32]

Ken Lang. 1995. NewsWeeder: Learning to filter netnews. In Proceedings of the 12th International Machine Learning Conference.

[33]

D. D. Lee and H. S. Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401 (1999), 788--791.

[34]

Tao Li. 2005. A general model for clustering binary data. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, NY, 188--197.

Digital Library

[35]

Edo Liberty, Ram Sriharsha, and Maxim Sviridenko. 2016. An algorithm for online k-means clustering. In Proceedings of the 18th Workshop on Algorithm Engineering and Experiments (ALENEX’16). SIAM, 81--89.

[36]

Haibing Lu, Jaideep Vaidya, and Vijayalakshmi Atluri. 2014. An optimization framework for role mining. J. Comput. Secur. 22, 1 (Jan. 2014), 1--31.

[37]

Haibing Lu, J. Vaidya, and V. Atluri. [n.d.] An optimization framework for role mining. (unpublished).

[38]

Haibing Lu, J. Vaidya, V. Atluri, and Yuan Hong. 2012. Constraint-aware role mining via extended Boolean matrix decomposition. IEEE Trans. Depend. Sec. Comput. 9, 5 (Sept.-Oct. 2012), 655--669.

[39]

Haibing Lu, Jaideep Vaidya, Vijayalakshmi Atluri, Heechang Shin, and Lili Jiang. 2011. Weighted rank-one binary matrix factorization. In Proceedings of the SIAM Internation Conference on Data Mining (SDM’11). 283--294.

[40]

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY.

[41]

James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity. Technical Report. McKinsey Global Institute.

[42]

Peter Merz and Bernd Freisleben. 1999. Genetic algorithms for binary quadratic programming. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’99). Morgan Kauffman, 417--424.

[43]

Peter MerZ and Bernd Freisleben. 2002. Greedy and local search heuristics for unconstrained binary quadratic programming. J. Heurist. 8, 2 (Mar. 2002), 197--213.

[44]

Miettinen Pauli, Mielikainen Taneli, Gionis Aristides, Das Gautam, and Mannila Heikki. 2008. The discrete basis problem. IEEE Trans. Knowl. Data Eng. 20, 10 (2008), 1348--1362.

Digital Library

[45]

René Peeters. 2003. The maximum edge biclique problem is NP-complete. Discr. Appl. Math. 131, 3 (2003), 651--654.

Digital Library

[46]

Andrew I. Schein, Lawrence K. Saul, and Lyle H. Ungar. 2003. A generalized linear model for principal component analysis of binary data. In Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics. 546431.

[47]

D. Sculley. 2010. Web-scale K-means Clustering. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 1177--1178.

Digital Library

[48]

Farial Shahnaz, Michael W. Berry, V. Paul Pauca, and Robert J. Plemmons. 2006. Document clustering using nonnegative matrix factorization. Inf. Process. Manage. 42, 2 (Mar. 2006), 373--386.

[49]

Bao-Hong Shen, Shuiwang Ji, and Jieping Ye. 2009. Mining discrete patterns via binary matrix factorization. In Proceedings of the ACM Knowledge Discovery and Data Mining Conference (KDD’09). 757--766.

Digital Library

[50]

Pradeep Shenoy, Jayant R. Haritsa, S. Sudarshan, Gaurav Bhalotia, Mayank Bawa, and Devavrat Shah. 2000. Turbo-charging vertical mining of large databases. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00). ACM, New York, NY, 22--33.

Digital Library

[51]

Tomáš Singliar and Miloš Hauskrecht. 2006. Noisy-OR component analysis and its application to link analysis. J. Mach. Learn. Res. 7 (Dec. 2006), 2189--2213.

[52]

Alexander Strehl and Joydeep Ghosh. 2003. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J. Comput. 15, 2 (2003), 208--230.

Digital Library

[53]

Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to Data Mining (1st ed.). Addison-Wesley Longman, Boston, MA.

[54]

Jaideep Vaidya, Vijayalakshmi Atluri, and Qi Guo. 2007. The role mining problem: Finding a minimal descriptive set of roles. In Proceedings of the ACM Symposium on Access Control Models and Technologies (SACMAT’07). 175--184.

Digital Library

[55]

Alfredo Vellido, José David Martín-Guerrero, and Paulo J. G. Lisboa. 2012. Making machine learning models interpretable. In Proceedings of the European Symposium on Artifical Neural Networks (ESANN’12), Vol. 12. Citeseer, 163--172.

[56]

Zhongyuan Zhang, Tao Li, Chris Ding, and Xiangsun Zhang. 2007. Binary matrix factorization with applications. In Proceedings of the 2007 7th IEEE International Conference on Data Mining. 391--400.

Digital Library

[57]

Zhong-Yuan Zhang, Tao Li, Chris Ding, Xian-Wen Ren, and Xiang-Sun Zhang. 2010. Binary matrix factorization for analyzing gene expression data. Data Min. Knowl. Discov. 20, 1 (2010), 28.

Digital Library

Cited By

Li HWang JZhang NZhang W(2024)Binary matrix factorization via collaborative neurodynamic optimizationNeural Networks10.1016/j.neunet.2024.106348176(106348)Online publication date: Aug-2024
https://doi.org/10.1016/j.neunet.2024.106348
Dantas PSabino da Silva WCordeiro LCarvalho C(2024)A comprehensive review of model compression techniques in machine learningApplied Intelligence10.1007/s10489-024-05747-w54:22(11804-11844)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1007/s10489-024-05747-w
Wu YHuang Wayne WYu’d LTiand Y(2023)Group Recommendation Based on Heterogeneous Graph Algorithm for EBSNsIEEE Access10.1109/ACCESS.2022.322459811(1854-1866)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2022.3224598
Show More Cited By

Index Terms

Algorithms and Applications to Weighted Rank-one Binary Matrix Factorization
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

Nonnegative matrix factorization via rank-one downdate
ICML '08: Proceedings of the 25th international conference on Machine learning

Nonnegative matrix factorization (NMF) was popularized as a tool for data mining by Lee and Seung in 1999. NMF attempts to approximate a matrix with nonnegative entries by a product of two low-rank matrices, also with nonnegative entries. We propose an ...
Quadratic nonnegative matrix factorization

In Nonnegative Matrix Factorization (NMF), a nonnegative matrix is approximated by a product of lower-rank factorizing matrices. Most NMF methods assume that each factorizing matrix appears only once in the approximation, thus the approximation is ...
Binary Matrix Factorization with Applications
ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

An interesting problem in Nonnegative Matrix Factorization (NMF) is to factorize the matrix X which is of some specific class, for example, binary matrix. In this paper, we extend the standard NMF to Binary Matrix Factorization (BMF for short): given a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Management Information Systems

ACM Transactions on Management Information Systems Volume 11, Issue 2

Research Commentary

June 2020

115 pages

ISSN:2158-656X

EISSN:2158-6578

DOI:10.1145/3398026

Editor:
Daniel Zeng
Chinese Academy of Sciences, China

Issue’s Table of Contents

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 May 2020

Accepted: 01 March 2020

Revised: 01 February 2020

Received: 01 September 2017

Published in TMIS Volume 11, Issue 2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSFC
National Institutes of Health
State Grid Corporation of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
342
Total Downloads

Downloads (Last 12 months)24
Downloads (Last 6 weeks)2

Reflects downloads up to 20 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Li HWang JZhang NZhang W(2024)Binary matrix factorization via collaborative neurodynamic optimizationNeural Networks10.1016/j.neunet.2024.106348176(106348)Online publication date: Aug-2024
https://doi.org/10.1016/j.neunet.2024.106348
Dantas PSabino da Silva WCordeiro LCarvalho C(2024)A comprehensive review of model compression techniques in machine learningApplied Intelligence10.1007/s10489-024-05747-w54:22(11804-11844)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1007/s10489-024-05747-w
Wu YHuang Wayne WYu’d LTiand Y(2023)Group Recommendation Based on Heterogeneous Graph Algorithm for EBSNsIEEE Access10.1109/ACCESS.2022.322459811(1854-1866)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2022.3224598
Crampton JEiben EGutin GKarapetyan DMajumdar DDietrich SChowdhury OTakabi D(2022)Generalized Noise Role MiningProceedings of the 27th ACM on Symposium on Access Control Models and Technologies10.1145/3532105.3535024(91-102)Online publication date: 7-Jun-2022
https://dl.acm.org/doi/10.1145/3532105.3535024
Zeng ZLin JPan WMing ZLu Z(2022)PAS: A Position-Aware Similarity Measurement for Sequential Recommendation2022 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN55064.2022.9892267(1-8)Online publication date: 18-Jul-2022
https://doi.org/10.1109/IJCNN55064.2022.9892267
Santoso SHarsani PHarahap ESunarya PAyu Sanjaya Y(2022)Enrichment Program using PROMETHEE for Decision Support Systems of Prospective Assistance Funds Disabilities2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA)10.1109/ICTIIA54654.2022.9935971(1-5)Online publication date: 23-Sep-2022
https://doi.org/10.1109/ICTIIA54654.2022.9935971
Wu JZhang PWang YShi J(2022)Integrated aviation model and metaheuristic algorithm for hub-and-spoke network design and airline fleet planningTransportation Research Part E: Logistics and Transportation Review10.1016/j.tre.2022.102755164(102755)Online publication date: Aug-2022
https://doi.org/10.1016/j.tre.2022.102755
Wang JDai ZZhang WShi J(2022)Operating room scheduling for non-operating room anesthesia with emergency uncertaintyAnnals of Operations Research10.1007/s10479-022-04870-6321:1-2(565-588)Online publication date: 8-Aug-2022
https://doi.org/10.1007/s10479-022-04870-6
Wang JDai ZChang AShi J(2022)Surgical scheduling by Fuzzy model considering inpatient beds shortage under uncertain surgery durationsAnnals of Operations Research10.1007/s10479-022-04645-z315:1(463-505)Online publication date: 22-Mar-2022
https://doi.org/10.1007/s10479-022-04645-z

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Issue’s Table of Contents