Abstract
The k-means algorithm is probably the most well-known and most popular clustering method in existence today. This work evaluates if a new, autonomous, kernel k-means approach for graph node clustering coupled with the modularity criterion can rival, e.g., the well-established Louvain method. We test the algorithm on social network datasets of various sizes and types. The new method estimates the optimal kernel or distance parameters as well as the natural number of clusters in the dataset based on modularity. Results indicate that this simple black-box algorithm manages to perform on par with the Louvain method given the same input.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data, pp. 25–71. Springer, Heidelberg (2006). doi:10.1007/3-540-28349-8_2
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, P10008 (2008)
Bolla, M.: Penalized versions of the Newman-Girvan modularity and their relation to normalized cuts and k-means clustering. Phys. Rev. E 84(1), 016108 (2011)
Borg, I., Groenen, P.: Modern Multidimensional Scaling: Theory and Applications, 2nd edn. Springer, Heidelberg (1997)
Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20, 172–188 (2008)
Chebotarev, P.: A class of graph-geodetic distances generalizing the shortest-path and the resistance distances. Discrete Appl. Math. 159(5), 295–302 (2011)
Chebotarev, P.: The graph bottleneck identity. Adv. Appl. Math. 47(3), 403–413 (2011)
Chebotarev, P.: The walk distances in graphs. Discrete Appl. Math. 160(10–11), 1484–1500 (2012)
Chebotarev, P., Shamis, E.: The matrix-forest theorem and measuring relations in small social groups. Autom. Remote Control 58(9), 1505–1514 (1997)
Chebotarev, P., Shamis, E.: The forest metric for graph vertices. Electron. Notes Discrete Math. 11, 98–107 (2002)
Chung, F., Yau, S.T.: Coverings, heat kernels and spanning trees. J. Comb. 6, 163–184 (1998)
Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Inf. Process. Med. Imaging 3, 263–274 (1995)
Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
Daniel, W.W.: Applied Non-parametric Statistics. The Duxbury Advanced Series in Statistics and Decision Sciences. PWS-Kent Publishing Company, Boston (1990)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Devooght, R., Mantrach, A., Kivimaki, I., Bersini, H., Jaimes, A., Saerens, M.: Random walks based modularity: application to semi-supervised learning. In: Proceedings of the 23rd International World Wide Web Conference (WWW 2014), pp. 213–224 (2014)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, Hoboken (1973)
Estrada, E.: The communicability distance in graphs. Linear Algebra Appl. 436(11), 4317–4328 (2012)
Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proc. Natl. Acad. Sci. U.S.A. 104(1), 36–41 (2007)
Fouss, F., Saerens, M., Shimbo, M.: Algorithms and Models for Network Data and Link Analysis. Cambridge University Press, Cambridge (2016)
Fouss, F., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of graph kernels on a collaborative recommendation task. In: Proceedings of the 6th International Conference on Data Mining (ICDM 2006), pp. 863–868 (2006)
Françoisse, K., Kivimäki, I., Mantrach, A., Rossi, F., Saerens, M.: A bag-of-paths framework for network data analysis. Neural Netw. 90, 90–111 (2017)
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 7821–7826 (2002)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Ito, T., Shimbo, M., Kudo, T., Matsumoto, Y.: Application of kernels to link analysis. In: Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 586–592 (2005)
Ivashkin, V., Chebotarev, P.: Do logarithmic proximity measures outperform plain ones in graph clustering? In: Proceedings of 6th International Conference on Network Analysis (2016)
Kandola, J., Cristianini, N., Shawe-Taylor, J.: Learning semantic similarity. In: Advances in Neural Information Processing Systems (NIPS 2002), vol. 15, pp. 657–664 (2002)
Katz, L.: A new status index derived from sociometric analysis. Psychmetrika 18(1), 39–43 (1953)
Kivimäki, I., Lebichot, B., Saerens, M.: Developments in the theory of randomized shortest paths with a comparison of graph node distances. Phys. A: Stat. Mech. Appl. 393, 600–616 (2014)
Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Proceedings of the 19th International Conference on Machine Learning (ICML 2002), pp. 315–322 (2002)
Krebs, V.: New political patterns (2008). http://www.orgnet.com/divided.html
Lancichinetti, A., Fortunato, S.: Limits of modularity maximization in community detection. Phys. Rev. E 84(6), 066122 (2011)
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 46–110 (2008)
Lang, K.: 20 newsgroups dataset. http://bit.ly/lang-newsgroups
Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3), 036104 (2006)
Newman, M.E.J.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103, 8577–8582 (2006)
Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
Reichardt, J., Bornholdt, S.: Detecting fuzzy community structures in complex networks with a Potts model. Phys. Rev. Lett. 93(21), 218701 (2004)
Saerens, M., Achbany, Y., Fouss, F., Yen, L.: Randomized shortest-path problems: two related models. Neural Comput. 21(8), 2363–2404 (2009)
Siegel, S.: Non-parametric Statistics for the Behavioral Sciences. McGraw-Hill, New York city (1956)
Smola, A.J., Kondor, R.: Kernels and regularization on graphs. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT-Kernel 2003. LNCS, vol. 2777, pp. 144–158. Springer, Heidelberg (2003). doi:10.1007/978-3-540-45167-9_12
Sommer, F., Fouss, F., Saerens, M.: Comparison of graph node distances on clustering tasks. In: Villa, A.E.P., Masulli, P., Pons Rivero, A.J. (eds.) ICANN 2016. LNCS, vol. 9886, pp. 192–201. Springer, Cham (2016). doi:10.1007/978-3-319-44778-0_23
von Luxburg, U., Radl, A., Hein, M.: Getting lost in space: large sample analysis of the commute distance. In: Proceedings of the 23th Neural Information Processing Systems conference (NIPS 2010), pp. 2622–2630 (2010)
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering based on the commute-time kernel. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 1037–1045. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71701-0_117
Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering with the sigmoid commute-time kernel: a comparative study. Data Knowl. Eng. 68(3), 338–361 (2009)
Yen, L., Mantrach, A., Shimbo, M., Saerens, M.: A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 785–793 (2008)
Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)
Acknowledgements
This work was supported in part by the FNRS and UCL (FSR) through a PhD scholarship. This work was also partially supported by the Immediate and the Brufence projects funded by InnovIris (Brussels Region). We thank these institutions for giving us the opportunity to conduct both fundamental and applied research.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sommer, F., Fouss, F., Saerens, M. (2017). Modularity-Driven Kernel k-means for Community Detection. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science(), vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_48
Download citation
DOI: https://doi.org/10.1007/978-3-319-68612-7_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68611-0
Online ISBN: 978-3-319-68612-7
eBook Packages: Computer ScienceComputer Science (R0)