Abstract
This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known; however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions for problems of practical size.
Similar content being viewed by others
References
ALBATINEH, A.N., NIEWIADORMSKA-BUGAJ, M., and MIHALKO, D. (2006), “On Similarity Indices and Correction for Chance Agreement,” Journal of Classification, 23, 301–313.
COHEN, J. (1960), “A Coefficient of Agreement for Nominal Scales,” Educational and Psychological Measurement, 20, 37–46.
COSARES, S., and HOCHBAUM, D.S. (1994), “Strongly Polynomial Algorithms for the Quadratic Transportation Problem with a Fixed Number of Sources,” Mathematics of Operations Research, 19, 94–111.
FOWLKES. E.B., and MALLOWS, C.L. (1983), “A Method for Comparing Two Hierarchical Clusterings,” Journal of the American Statistical Association, 78, 553–569.
GOODMAN, L., and KRUSKAL, W. (1979), Measures of Association for Cross Classification, New York: Springer-Verlag.
HOCHBAUM, D.S., SHAMIR, R., and SHANTHIKUMAR, J.G. (1992), “A Polynomial Algorithm for an Integer Quadratic Non-Separable Transportation Problem,” Mathematical Programming, 55, 359–371.
HOFFMAN, A.J., and WIELANDT, H.W. (1953), “The Variation of the Spectrum of a Normal Matrix,” Duke Mathematical Journal, 20, 37–39.
HUBERT, L.J. (1987), Assignment Methods in Combinatorial Data Analysis, New York: Marcel Dekker.
HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
HUBERT, L.J., and BAKER, F.J. (1978),. “Evaluating the Conformity of Sociometric Measurements,” Psychometrika, 43, 31–41.
ILOG CPLEX 6.5 (1999), User’s Manual, Mountain View, CA: ILOG, Inc.
JACCARD, P. (1912), “The Distribution of the Flora in the Alpine Zone,” The New Phytologist, 11, 37–50.
KATZ, L., and POWELL, J.H. (1953), “A Proposed Index of the Conformity of One Sociometric Measurement to Another,” Psychometrika, 18, 249–256.
LIGHT, R.J. (1971), “Measures of Response Agreement for Categorical Data: Some Generalizations and Alternatives,” Psychological Bulletin, 66, 376–390.
MESSATFA, H. (1992), “An Algorithm to Maximize the Agreement Between Partitions,” Journal of Classification, 9, 5–15.
RAND, W.M. (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.
STEINLEY, D. (2004), “Properties of the Hubert-Arabie Adjusted Rand Index,” Psychological Methods, 9, 386–396.
Author information
Authors and Affiliations
Corresponding author
Additional information
We would like to thank the Editor, Willem Heiser, and an anonymous reviewer for helpful comments that resulted in improvements of this article.
Rights and permissions
About this article
Cite this article
Brusco, M.J., Steinley, D. A Binary Integer Program to Maximize the Agreement Between Partitions. J Classif 25, 185–193 (2008). https://doi.org/10.1007/s00357-008-9013-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00357-008-9013-9