More Web Proxy on the site http://driver.im/

research-article

Multi-assignment clustering for Boolean data

Authors:

Andreas P. Streich,

Joachim M. BuhmannAuthors Info & Claims

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

Pages 969 - 976

https://doi.org/10.1145/1553374.1553498

Published: 14 June 2009 Publication History

Abstract

Conventional clustering methods typically assume that each data item belongs to a single cluster. This assumption does not hold in general. In order to overcome this limitation, we propose a generative method for clustering vectorial data, where each object can be assigned to multiple clusters. Using a deterministic annealing scheme, our method decomposes the observed data into the contributions of individual clusters and infers their parameters.

Experiments on synthetic Boolean data show that our method achieves higher accuracy in the source parameter estimation and superior cluster stability compared to state-of-the-art approaches. We also apply our method to an important problem in computer security known as role mining. Experiments on real-world access control data show performance gains in generalization to new employees against other multi-assignment methods. In challenging situations with high noise levels, our approach maintains its good performance, while alternative state-of-the-art techniques lack robustness.

References

[1]

Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Int Conf on Mngm of Data, 22, 207--216.

Digital Library

[2]

Buhmann, J., & Küühnel, H. (1993). Vector quantization with complexity costs. IEEE Trans on Information Theory (pp. 1133--1145).

Digital Library

[3]

Colantonio, A., Di Pietro, R., & Ocello, A. (2008). A cost-driven approach to role engineering. Symp on Appl Comp (pp. 2129--2136).

Digital Library

[4]

Ene, A., Horne, W., Milosavljevic, N., Rao, P., Schreiber, R., & Tarjan, R. E. (2008). Fast exact and heuristic methods for role minimization problems. Symp on Access Control Models and Technologies (pp. 1--10).

Digital Library

[5]

Ferraiolo, D. F., Sandhu, R., Gavrila, S., Kuhn, D. R., & Chandramouli, R. (2001). Proposed NIST standard for role-based access control. ACM Trans Inf Syst Secur, 4, 224--274.

Digital Library

[6]

Frank, M., Basin, D., & Buhmann, J. M. (2008). A class of probabilistic models for role engineering. Conf on Computer and Communications Security (pp. 299--310).

Digital Library

[7]

Griffiths, T. L., & Ghahramani, Z. (2005). Infinite latent feature models and the indian buffet process. Conf on Neural Information Processing Systems (pp. 475--482).

[8]

Kabán, A., & Bingham, E. (2008). Factorisation and denoising of 0--1 data: A variational approach. Neurocomputing, 71, 2291--2308.

Digital Library

[9]

Kemp, C., Tenenbaum, J. B., Griffths, T. L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. Nat Conf on Artificial Intelligence (pp. 763--770).

[10]

Kuhlmann, M., Shohat, D., & Schimpf, G. (2003). Role mining - revealing business roles for security administration using data mining technology. Symp on Access Control Models and Technologies (pp. 179--186).

Digital Library

[11]

Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16, 1299--1323.

Digital Library

[12]

Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2006). The Discrete Basis Problem. Proc of Principles and Practice of Knowledge Discovery in Databases (pp. 335--346).

[13]

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.

Digital Library

[14]

Rose, K., Gurewitz, E., & Fox, G. (1992). Vector quantization by deterministic annealing. IEEE Trans on Information Theory (pp. 2210--2239).

Digital Library

[15]

Streich, A. P., & Buhmann, J. M. (2008). Classification of multi-labeled data: A generative approach. Europ Conf on Machine Learning (pp. 390--405).

[16]

Vaidya, J., Atluri, V., & Guo, Q. (2007). The Role Mining Problem: Finding a minimal descriptive set of roles. Symp on Access Control Models and Technologies (pp. 175--184).

Digital Library

[17]

Wood, F. (2006). A non-parametric bayesian method for inferring hidden causes. Conf on Uncertainty in Artificial Intelligence (pp. 536--543).

Cited By

Zhao JAn Q(2024)DEA-based internal validity index for clusteringJournal of the Operational Research Society10.1080/01605682.2024.2348621(1-14)Online publication date: 14-May-2024
https://doi.org/10.1080/01605682.2024.2348621
Hou YTran CShin WAl Hasan MXiong L(2022)META-CODE: Community Detection via Exploratory Learning in Topologically Unknown NetworksProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557639(4034-4038)Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557639
Yang CLiu LLiu MWang ZZhang CHan J(2020)Graph Clustering with Embedding Propagation2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378031(858-867)Online publication date: 10-Dec-2020
https://doi.org/10.1109/BigData50022.2020.9378031
Show More Cited By

Index Terms

Recommendations

Multi-assignment clustering for boolean data

We propose a probabilistic model for clustering Boolean data where an object can be simultaneously assigned to multiple clusters. By explicitly modeling the underlying generative process that combines the individual source emissions, highly structured ...
Multi-assignment clustering for boolean data

We propose a probabilistic model for clustering Boolean data where an object can be simultaneously assigned to multiple clusters. By explicitly modeling the underlying generative process that combines the individual source emissions, highly structured ...
On cluster tree for nested and multi-density data clustering

Clustering is one of the important data mining tasks. Nested clusters or clusters of multi-density are very prevalent in data sets. In this paper, we develop a hierarchical clustering approach-a cluster tree to determine such cluster structure and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

June 2009

1331 pages

ISBN:9781605585161

DOI:10.1145/1553374

General Chair:
Andrea Danyluk
Williams College
,
Program Chairs:
Léon Bottou
NEC Laboratories America
,
Michael Littman
Rutgers University

Copyright © 2009 Copyright 2009 by the author(s)/owner(s).

Sponsors

NSF
Microsoft Research: Microsoft Research
MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Research-article

Funding Sources

Commission for Technology and Innovation

Conference

ICML '09

Sponsor:

Microsoft Research

ICML '09: The 26th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

June 14 - 18, 2009

Quebec, Montreal, Canada

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

44
Total Citations
View Citations
328
Total Downloads

Downloads (Last 12 months)15
Downloads (Last 6 weeks)1

Reflects downloads up to 15 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhao JAn Q(2024)DEA-based internal validity index for clusteringJournal of the Operational Research Society10.1080/01605682.2024.2348621(1-14)Online publication date: 14-May-2024
https://doi.org/10.1080/01605682.2024.2348621
Hou YTran CShin WAl Hasan MXiong L(2022)META-CODE: Community Detection via Exploratory Learning in Topologically Unknown NetworksProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557639(4034-4038)Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557639
Yang CLiu LLiu MWang ZZhang CHan J(2020)Graph Clustering with Embedding Propagation2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378031(858-867)Online publication date: 10-Dec-2020
https://doi.org/10.1109/BigData50022.2020.9378031
Cotrini CCorinzia LWeghorn TBasin DCavallaro LKinder JWang XKatz J(2019)The Next 700 Policy MinersProceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security10.1145/3319535.3354196(95-112)Online publication date: 6-Nov-2019
https://dl.acm.org/doi/10.1145/3319535.3354196
Prabhakaran SVogt J(2019)Bayesian Clustering for HIV1 Protease Inhibitor Contact MapsArtificial Intelligence in Medicine10.1007/978-3-030-21642-9_35(281-285)Online publication date: 30-May-2019
https://doi.org/10.1007/978-3-030-21642-9_35
Painsky ARosset SFeder M(2018)Linear Independent Component Analysis Over Finite Fields: Algorithms and BoundsIEEE Transactions on Signal Processing10.1109/TSP.2018.287200666:22(5875-5886)Online publication date: 5-Oct-2018
https://dl.acm.org/doi/10.1109/TSP.2018.2872006
Rukat THolmes CTitsias MYau C(2017)Bayesian Boolean matrix factorisationProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305890.3305988(2969-2978)Online publication date: 6-Aug-2017
https://dl.acm.org/doi/10.5555/3305890.3305988
Alonso JParedes RRosso P(2017)Feature representation for social circles detection using MACNeural Computing and Applications10.1007/s00521-016-2222-y28:9(2395-2402)Online publication date: 1-Sep-2017
https://dl.acm.org/doi/10.1007/s00521-016-2222-y
Chesnokov V(2017)Overlapping Community Detection in Social Networks with Node Attributes by Neighborhood InfluenceModels, Algorithms, and Technologies for Network Analysis10.1007/978-3-319-56829-4_14(187-203)Online publication date: 24-Jun-2017
https://doi.org/10.1007/978-3-319-56829-4_14
Dong ZKane KCamp L(2016)Detection of Rogue Certificates from Trusted Certificate Authorities Using Deep Neural NetworksACM Transactions on Privacy and Security10.1145/297559119:2(1-31)Online publication date: 17-Sep-2016
https://dl.acm.org/doi/10.1145/2975591
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents