Abstract
One of the most important distinctions that must be made in clustering research is the difference between models (or problems) and the methods for solving those problems. Nowhere is this more evident than with the evaluation of the popular affinity propagation algorithm (apcluster.m), which is a MATLAB implementation of a neural clustering method that has received significant attention in the biological sciences and other disciplines. Several authors have undertaken comparisons of apcluster.m with methods designed for models that fall within the class of uncapacitated facility location problems (UFLPs). These comparative models include the p-center (or K-center) model and, more importantly, the p-median (or K-median) model. The results across studies are conflicting and clouded by the fact that, although similar, the optimization model underlying apcluster.m is slightly different from the p-median model and appreciably different from the pcenter model. In this paper, we clarify that apcluster.m is actually a heuristic for a ‘maximization version’ of another model in the class of UFLPs, which is known as the simple plant location problem (SPLP). An exact method for the SPLP is described, and the apcluster.m program is compared to a fast heuristic procedure (sasplp.m) in both a simulation experiment and across numerous datasets from the literature. Although the exact method is the preferred approach when computationally feasible, both apcluster.m and sasplp.m are efficient and effective heuristic approaches, with the latter slightly outperforming the former in most instances.
Similar content being viewed by others
References
AGMON, S. (1954), “The Relaxation Method for Linear Inequalities”, Canadian Journal of Mathematics, 6, 382–392.
ALBA, E., and DOMINGUEZ, E. (2006), “Comparative Analysis of Modern Optimization Tools for the p-median Problem”, Statistics and Computing, 16, 251–260.
ALP, O., ERKUT, E., and DREZNER, Z. (2003), “An Efficient Genetic Algorithm for the p-Median Problem”, Annals of Operations Research, 122, 21–42.
ANDERSON, E. (1935), “The Irises of the Gaspé Peninsula”, Bulletin of the American Iris Society, 59, 2–5.
APELTSIN, L., MORRIS, J.H., BABBITT, P.C., and FERRIN, T.E. (2011), “Improving the Quality of Protein Similarity Network Clustering Algorithms Using the Network Edge Weight Distribution”, Bioinformatics, 27, 326–333.
AVELLA, P., SASSANO, A., and VASIL’EV, I. (2007), “Computational Study of Large-Scale p-Median Problems”, Mathematical Programming A, 109, 89–114.
BALINSKI, M.L. (1965), “Integer Programming: Methods, Uses, Computation”, Management Science, 12, 253–313.
BELTRAN, C., TADONKI, C., and VIAL, J. (2006), “Solving the p-Median Problem with a Semi-Lagrangian Relaxation”, Computational Optimization and Applications, 35(2), 239–260.
BERROU, C., GLAVIEUX, A., and THITIMAJSHIMA, P. (1993), “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes (1)”, in Proceedings of the IEEE International Conference on Communications, ICC 93, Vol. 2, pp. 1064–1070.
BILDE, O., and KRARUP, J. (1977), “Sharp Lower Bounds and Efficient Algorithms for the Simple Plant Location Problem”, Annals of Discrete Mathematics, 1, 79–88.
BLANCHARD, S.J., ALOISE, D., and DESARBO, W.S. (2012), “The Heterogeneous pmedian Problem for Categorization Based Clustering”, Psychometrika, 77, 741–762.
BRUSCO, M.J., and CRADIT, J.D. (2001), “A Variable Selection Heuristic for K-means Clustering”, Psychometrika, 66, 249–270.
BRUSCO, M.J., and KÖHN, H.-F. (2008a), “Comment on ‘Clustering by Passing Messages Between Data Points’”, Science, 319, 726c.
BRUSCO, M.J., and KÖHN, H.-F. (2008b), “Optimal Partitioning of a Data Set Based on the p-median Model”, Psychometrika, 73, 89–105.
BRUSCO, M.J., and KÖHN, H.-F. (2009), “Exemplar-Based Clustering via Simulated Annealing”, Psychometrika, 74, 457–475.
BRUSCO, M.J., and STEINLEY, D. (2007), “A Comparison of Heuristic Procedures for Minimum Within-Cluster Sums of Squares Partitionin”, Psychometrika, 72, 583–600.
CHANG, J.T. (2012), “Deriving Transcriptional Programs and Functional Processes from Gene Expression Databases”, Bioinformatics, 28, 1122–1129.
CHEN, L., CHAN, T.-H., CHOYKE, P.L., HILLMAN, E.M.C., CHI, C.-Y., BHUJWALLA, Z.M., WANG, G., WANG, S.S., SZABO, Z., and WANG, Y. (2011), “CAM-CM: A Signal Deconvolution Tool for in vivo Dynamic Contrast-Enhanced Imaging of Complex Tissues”, Bioinformatics, 27, 2607–2609.
CHIYOSHI, F., and GALVÃO, R.D. (2000), “A Statistical Analysis of Simulated Annealing Applied to the p-Median Problem”, Annals of Operations Research, 96, 61–74.
CHRISTOFIDES, N., and BEASLEY, J. (1982), “A Tree Search Algorithm for the pmedian Problem”, European Journal of Operational Research, 10, 196–204.
CORNUEJOLS, G., FISHER, M.L., and NEMHAUSER, G.L. (1977), “Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms”, Management Science, 23, 789–810.
DUECK, D. (2009), “Affinity Propagation: Clustering Data by Passing Messages”, Unpublished Doctoral Dissertation, Graduate Department of Electrical and Computer Engineering, University of Toronto.
EFROYMSON, M.A., and RAY, T.L. (1966), “A Branch-and-Bound Algorithm for Plant Location”, Operations Research, 14, 361–375.
EL-SHAIEB, A.M. (1973), “A New Algorithm for Locating Sources Among Destinations”, Management Science, 20, 221–231.
ERLENKOTTER, D. (1978), “A Dual Procedure for Uncapacitated Facility Location”, Operations Research, 26, 992–1009.
FISHER, R.A. (1936), “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179–188.
FREY, B., and DUECK, D. (2007), “Clustering by Passing Messages Between Data Points”, Science, 315, 972–976.
FREY, B., and DUECK, D. (2008), “Response to Comment on ‘Clustering by Passing Messages Between Data Points’”, Science, 319, 726d.
GALVÃO, R.D. (1980), “A Dual-Bounded Algorithm for the p-median Problem”, Operations Research, 28, 1112–1121.
GALVÃO, R.D. (2004), “Uncapacitated Facility Location Problems: Contributions”, Pesquisa Operacional, 24, 7–38.
GALVÃO, R.D., and RAGGI, L.A. (1989), “A Method for Solving to Optimality Uncapacitated Location Problems”, Annals of Operations Research, 18, 225–244.
GRÖTSCHEL, M., and HOLLAND, O. (1991), “Solution of Large-Scale Symmetric Traveling Salesman Problems”, Mathematical Programming, 51, 141–202.
HAIR, J.F., ANDERSON, R.E., TATHAM, R.L., and BLACK, W.C. (1998), Multivariate Data Analysis (5th ed.), Upper Saddle River, NJ: Prentice Hall.
HAKIMI, S.L. (1964), “Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph”, Operations Research, 12, 450–459.
HAKIMI, S.L. (1965), “Optimum Distribution of Switching Centers in a Communication Network and Some Related Graph Theory Problems”, Operations Research, 123, 462–475.
HANJOUL, P., and PEETERS, D. (1985), “A Comparison of Two Dual-Based Procedures for Solving the p-Median Problem”, European Journal of Operational Research, 20, 387–396.
HANSEN, P., and JAUMARD, B. (1997), “Cluster Analysis and Mathematical Programming”, Mathematical Programming, 79, 191–215.
HANSEN, P., and MLADENOVIĆ, N. (1997), “Variable Neighborhood Search for the p-Median”, Location Science, 5, 207–226.
HANSEN, P., and MLADENOVIĆ, N. (2008), “Complement to a Comparative Analysis of Heuristics for the p-Median Problem”, Statistics and Computing, 18, 41–46.
HANSEN, P., MLADENOVIĆ, N., and PEREZ-BRITO, D. (2001), “Variable Neighborhood Decomposition Search”, Journal of Heuristics, 7, 335–350.
HELD, M., and KARP, R.M. (1970), “The Traveling Salesman Problem and Minimum Spanning Trees”, Operations Research, 18, 1138–1162.
HELD, M., WOLFE, P., and CROWDER, H.P. (1974), “Validation of Subgradient Optimization”, Mathematical Programming, 6, 62–88.
HEINZ, G., PETERSON, L.J., JOHNSON, R.W., and KERK, C.J. (2003), “Exploring Relationships in Body Dimensions”, Journal of Statistical Education, 11, www.amstat.org/publications/jse/v11n2/datasets.heinz.html.
HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2, 193–218.
JARVINEN, P., RAJALA, J., and SINERVO, H. (1972), “A Branch-and-Bound Algorithm for Seeking the p-median”, Operations Research, 20, 173–178.
KARALETSOS, T., STEGLE, O., DREYER, D., WINN, J., and BORGWARDT, K.M. (2012), “ShapePheno: Unsupervised Extraction of Shape Phenotypes from biological Image Collections”, Bioinformatics, 28, 1001–1008.
KAUFMAN, L., and ROUSSEEUW, P.J. (2005), Finding Groups in Data: An Introduction to Cluster Analysis (2nd ed.), New York: Wiley.
KIDDLE, S.J., WINDRAM, O.P.F., MCHATTIE, S., MEAD, A., BEYNON, J., BUCHANAN-WOLLASTON, V., DENBY, K.J., and MUKHERJEE, S. (2010), “Temporal Clustering by Affinity Propagation Reveals Transcriptional Modules in Arabidopsis Thaliana”, Bioinformatics, 26, 355–362.
KLASTORIN, T. (1985), “The p-median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach”, Management Science, 31, 84–95.
KÖHN, H.-F., STEINLEY, D., and BRUSCO, M.J. (2010), “The p-median Model as a Tool for Clustering Psychological Data”, Psychological Methods, 15, 87–95.
KUEHN, A.A., and HAMBURGER, M.J. (1963), “A Heuristic Program for Locating Warehouses”, Management Science, 9, 643–666.
LEVANOVA, T., and LORESH, M.A. (2004), “Algorithms of Ant System and Simulated Annealing for the p-median Problem”, Automation and Remote Control, 65, 431–438.
LIN, S., and KERNIGHAN, B.W. (1973), “An Effective Heuristic Algorithm for the Traveling Salesman Problem”, Operations Research, 21, 498–516.
MARANZANA, F.E. (1964), “On the Location of Supply Points to Minimize Transportation Costs”, Operations Research, 12, 138–139.
MATHWORKS, INC. (2006), Using MATLAB (Version 7), Natick MA: The MathWorks, Inc.
MÉZARD, M., PARISI, G., and ZECCHINA, R. (2002), “Analytic and Algorithmic Solution of Random Satisfiability Problems”, Science, 297, 812–815.
MILLIGAN, G.W. (1980), “An Examination of the Effects of Six Types of Error Perturbation on Fifteen Clustering Algorithms”, Psychometrika, 45, 325–342.
MILLIGAN, G.W. (1996), “Clustering Validation: Results and Implications for Applied Analyses.” in Clustering and Classification, eds. P. Arabie, L.J. Hubert, and G. De Soete, River Edge NJ: World Scientific Publishing, pp. 321–375.
MILLIGAN, G.W., and COOPER, M.C. (1985), “An Examination of Procedures for Determining the Number of Clusters in a Data Set”, Psychometrika, 50, 159–179.
MILLIGAN, G.W., and COOPER, M.C. (1988), “A Study of the Standardization of Variables in Cluster Analysis”, Journal of Classification, 5, 181–204.
MLADENOVIĈ, N., BRIMBERG, J., HANSEN, P., and MORENO-PÉREZ, J.A. (2007), “The p-Median Problem: A Survey of Metaheuristic Approaches”, European Journal of Operational Research, 179, 927–939.
MOTZKIN, T., and SCHOENBERG, I.J. (1954), “The Relaxation Method for Linear Inequalities”, Canadian Journal of Mathematics, 6, 393–404.
MUKHERJEE, S., and HILL, S.M. (2011), “Network Clustering: Probing Biological Heterogeneity by Sparse Graphical Models”, Bioinformatics, 27, 994–1000.
MULVEY, J.M., and CROWDER, H.P. (1979), “Cluster Analysis: An Application of Lagrangian Relaxation”, Management Science, 25, 329–340.
NARULA, S.C., OGBU, U.I., and SAMUELSON, H.M. (1977), “An Algorithm for the pmedian Problem”, Operations Research, 25, 709–713.
RAO, M. R. (1971), “Cluster Analysis and Mathematical Programming”, Journal of the American Statistical Association, 66, 622–626.
REINELT, G. (2001), TSPLIB, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95.
RESENDE, M.G.C., and WERNECK, R.F. (2004), “A Hybrid Heuristic for the p-median Problem”, Journal of Heuristics, 10, 59–88.
REVELLE, C.S., and SWAIN, R. (1970), “Central Facilities Location”, Geographical Analysis, 2, 30–42.
ROLLAND, E., SCHILLING, D.A. and CURRENT, J.R. (1996), “An Efficient Tabu Search Procedure for the p-Median Problem”, European Journal of Operational Research, 96, 329–342.
SPÄTH, H. (1980), Cluster Analysis Algorithms for Data Reduction and Classification of Objects, New York: Wiley.
SPELLMAN, P.T., SHERLOK, G., ZHANG, M.Q., IYER, V.R., ANDERS, K., EISEN, M.B., BROWN, P.O., BOTSTEIN, D., and FUTCHER, B. (1998), “Comprehensive Identification of Cell Cycle-Regulated Genes of the Yeast Saccharomyces Cerevisiae by Microarray Hybridization”, Molecular Biology of the Cell, 9, 3273–3297.
STEINLEY, D. (2003), “Local Optima in K-means Clustering: What You Don’t Know May Hurt You”, Psychological Methods, 8, 294–304.
STEINLEY, D. (2004), “Properties of the Hubert-Arabie Adjusted Rand Index”, Psychological Methods, 9, 386–396.
STEINLEY, D. (2006), “Profiling Local Optima in K-means Clustering: Developing a Diagnostic Technique”, Psychological Methods, 11, 178–192.
STEINLEY, D., and BRUSCO, M.J. (2007), “Initializing K-means Batch Clustering: A Critical Analysis of Several Techniques”, Journal of Classification, 24, 99–121.
STEINLEY, D., and BRUSCO, M.J. (2008a), “A New Variable Weighting and Selection Procedure for K-means Cluster Analysis”, Multivariate Behavioral Research, 43, 77–108.
STEINLEY, D., and BRUSCO, M.J. (2008b), “Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures”, Psychometrika, 73, 125–144.
STEINLEY, D., and BRUSCO, M.J. (2011a), “Choosing the Number of Clusters in Kmeans Clustering”, Psychological Methods, 16, 285–297.
STEINLEY, D., and BRUSCO, M.J. (2011b), “Evaluating Mixture-Modeling for Clustering: Recommendations and Cautions”, Psychological Methods, 16, 63–79.
STEINLEY, D., and HENSON, R. (2005), “An Analytic Method for Generating Clusters with Known Overlap”, Journal of Classification, 22, 221–250.
TANG, D., ZHU, Q., and YANG, F. (2010), “A Poisson-Based Adaptive Affinity Propagation Clustering for SAGE Data”, Computational Biology and Chemistry, 34, 63–70.
TEITZ, M.B. and BART, P. (1968), “Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph”, Operations Research, 16, 955–961.
THIZY, J.-M., VAN WASSENHOVE, L., and KHUMAWALA, B. (1985), “Comparison of Exact and Approximate Methods of Solving the Uncapacitated Plant Location Problem”, Journal of Operations Management, 6, 23–34.
VINOD, H. (1969), “Integer Programming and the Theory of Grouping”, Journal of the American Statistical Association, 64, 506–517.
VLASBLOM, J. and WODAK, S.J. (2009), “Markov Clustering versus Affinity Propagation for the Partitioning of Protein Interaction Graphs”, BMC Bioinformatics, 10, 99.
WHITAKER, R. (1983), “A Fast Algorithm for the Greedy Interchange of Large-Scale Clustering and Median Location Problems”, INFOR, 21, 95–108.
WOŹNIAK, M., TIURYN, J., and DUTKOWSKI, J. (2010), “MODEVO: Exploring Modularity and Evolution of Protein Interaction Networks”, Bioinformatics, 26, 1790–1791.
ZHANG, J., LI, D., CHEN, H. and FANG, F. (2011), “Analysis of Activity in fMRI Data Using Affinity Propagation Clustering”, Computer Methods in Biomechanics and Biomedical Engineering, 14, 271–281.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Brusco, M.J., Steinley, D. Affinity Propagation and Uncapacitated Facility Location Problems. J Classif 32, 443–480 (2015). https://doi.org/10.1007/s00357-015-9187-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00357-015-9187-x