[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1553374.1553498acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlConference Proceedingsconference-collections
research-article

Multi-assignment clustering for Boolean data

Published: 14 June 2009 Publication History

Abstract

Conventional clustering methods typically assume that each data item belongs to a single cluster. This assumption does not hold in general. In order to overcome this limitation, we propose a generative method for clustering vectorial data, where each object can be assigned to multiple clusters. Using a deterministic annealing scheme, our method decomposes the observed data into the contributions of individual clusters and infers their parameters.
Experiments on synthetic Boolean data show that our method achieves higher accuracy in the source parameter estimation and superior cluster stability compared to state-of-the-art approaches. We also apply our method to an important problem in computer security known as role mining. Experiments on real-world access control data show performance gains in generalization to new employees against other multi-assignment methods. In challenging situations with high noise levels, our approach maintains its good performance, while alternative state-of-the-art techniques lack robustness.

References

[1]
Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Int Conf on Mngm of Data, 22, 207--216.
[2]
Buhmann, J., & Küühnel, H. (1993). Vector quantization with complexity costs. IEEE Trans on Information Theory (pp. 1133--1145).
[3]
Colantonio, A., Di Pietro, R., & Ocello, A. (2008). A cost-driven approach to role engineering. Symp on Appl Comp (pp. 2129--2136).
[4]
Ene, A., Horne, W., Milosavljevic, N., Rao, P., Schreiber, R., & Tarjan, R. E. (2008). Fast exact and heuristic methods for role minimization problems. Symp on Access Control Models and Technologies (pp. 1--10).
[5]
Ferraiolo, D. F., Sandhu, R., Gavrila, S., Kuhn, D. R., & Chandramouli, R. (2001). Proposed NIST standard for role-based access control. ACM Trans Inf Syst Secur, 4, 224--274.
[6]
Frank, M., Basin, D., & Buhmann, J. M. (2008). A class of probabilistic models for role engineering. Conf on Computer and Communications Security (pp. 299--310).
[7]
Griffiths, T. L., & Ghahramani, Z. (2005). Infinite latent feature models and the indian buffet process. Conf on Neural Information Processing Systems (pp. 475--482).
[8]
Kabán, A., & Bingham, E. (2008). Factorisation and denoising of 0--1 data: A variational approach. Neurocomputing, 71, 2291--2308.
[9]
Kemp, C., Tenenbaum, J. B., Griffths, T. L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. Nat Conf on Artificial Intelligence (pp. 763--770).
[10]
Kuhlmann, M., Shohat, D., & Schimpf, G. (2003). Role mining - revealing business roles for security administration using data mining technology. Symp on Access Control Models and Technologies (pp. 179--186).
[11]
Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16, 1299--1323.
[12]
Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2006). The Discrete Basis Problem. Proc of Principles and Practice of Knowledge Discovery in Databases (pp. 335--346).
[13]
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
[14]
Rose, K., Gurewitz, E., & Fox, G. (1992). Vector quantization by deterministic annealing. IEEE Trans on Information Theory (pp. 2210--2239).
[15]
Streich, A. P., & Buhmann, J. M. (2008). Classification of multi-labeled data: A generative approach. Europ Conf on Machine Learning (pp. 390--405).
[16]
Vaidya, J., Atluri, V., & Guo, Q. (2007). The Role Mining Problem: Finding a minimal descriptive set of roles. Symp on Access Control Models and Technologies (pp. 175--184).
[17]
Wood, F. (2006). A non-parametric bayesian method for inferring hidden causes. Conf on Uncertainty in Artificial Intelligence (pp. 536--543).

Cited By

View all
  • (2024)DEA-based internal validity index for clusteringJournal of the Operational Research Society10.1080/01605682.2024.2348621(1-14)Online publication date: 14-May-2024
  • (2022)META-CODE: Community Detection via Exploratory Learning in Topologically Unknown NetworksProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557639(4034-4038)Online publication date: 17-Oct-2022
  • (2020)Graph Clustering with Embedding Propagation2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378031(858-867)Online publication date: 10-Dec-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research: Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2009

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

ICML '09
Sponsor:
  • Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)1
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)DEA-based internal validity index for clusteringJournal of the Operational Research Society10.1080/01605682.2024.2348621(1-14)Online publication date: 14-May-2024
  • (2022)META-CODE: Community Detection via Exploratory Learning in Topologically Unknown NetworksProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557639(4034-4038)Online publication date: 17-Oct-2022
  • (2020)Graph Clustering with Embedding Propagation2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378031(858-867)Online publication date: 10-Dec-2020
  • (2019)The Next 700 Policy MinersProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security10.1145/3319535.3354196(95-112)Online publication date: 6-Nov-2019
  • (2019)Bayesian Clustering for HIV1 Protease Inhibitor Contact MapsArtificial Intelligence in Medicine10.1007/978-3-030-21642-9_35(281-285)Online publication date: 30-May-2019
  • (2018)Linear Independent Component Analysis Over Finite Fields: Algorithms and BoundsIEEE Transactions on Signal Processing10.1109/TSP.2018.287200666:22(5875-5886)Online publication date: 5-Oct-2018
  • (2017)Bayesian Boolean matrix factorisationProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305890.3305988(2969-2978)Online publication date: 6-Aug-2017
  • (2017)Feature representation for social circles detection using MACNeural Computing and Applications10.1007/s00521-016-2222-y28:9(2395-2402)Online publication date: 1-Sep-2017
  • (2017)Overlapping Community Detection in Social Networks with Node Attributes by Neighborhood InfluenceModels, Algorithms, and Technologies for Network Analysis10.1007/978-3-319-56829-4_14(187-203)Online publication date: 24-Jun-2017
  • (2016)Detection of Rogue Certificates from Trusted Certificate Authorities Using Deep Neural NetworksACM Transactions on Privacy and Security10.1145/297559119:2(1-31)Online publication date: 17-Sep-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media