Abstract
Semi-supervised clustering uses limited background knowledge to aid unsupervised clustering algorithms. A recently introduced kernel method for semi-supervised clustering has been shown to outperform previous semi-supervised clustering approaches. However, the kernel's parameter is left to manual tuning, and the chosen value can strongly affect the quality of the results. Selecting the kernel's parameter therefore remains a critical open problem when only limited supervision, provided as pairwise constraints, is available. In this paper, we derive a new optimization criterion that automatically determines the optimal parameter of an RBF kernel directly from the data and the given constraints. Our approach integrates the constraints into the clustering objective function and optimizes the parameter of a Gaussian kernel iteratively during the clustering process. Experimental comparisons on simulated and real data demonstrate the effectiveness and advantages of the proposed algorithm.
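To make the role of the kernel parameter concrete, the following Python sketch shows how the width of a Gaussian (RBF) kernel might be chosen from pairwise constraints alone. It is only an illustration and not the optimization criterion derived in the paper: the function names (`rbf_kernel`, `constraint_score`, `select_sigma`), the toy data, the candidate grid, and the scoring rule (mean kernel similarity over must-link pairs minus mean similarity over cannot-link pairs) are all assumptions introduced here, and a simple grid search stands in for the iterative optimization carried out during clustering.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def constraint_score(K, must_link, cannot_link):
    """Hypothetical score: must-link pairs should be similar, cannot-link pairs dissimilar."""
    ml = np.mean([K[i, j] for i, j in must_link])
    cl = np.mean([K[i, j] for i, j in cannot_link])
    return ml - cl

def select_sigma(X, must_link, cannot_link, candidates):
    """Grid search over candidate widths (a stand-in for an iterative update)."""
    scores = [
        constraint_score(rbf_kernel(X, s), must_link, cannot_link)
        for s in candidates
    ]
    return candidates[int(np.argmax(scores))]

# Toy data (assumed): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])
must_link = [(0, 1), (20, 21)]    # pairs known to share a cluster
cannot_link = [(0, 20), (1, 21)]  # pairs known to lie in different clusters
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]

best_sigma = select_sigma(X, must_link, cannot_link, candidates)
print("selected sigma:", best_sigma)
```

Very small widths push every off-diagonal kernel value toward 0 and very large widths push them all toward 1, so in both extremes the kernel no longer separates must-link from cannot-link pairs. This is the sensitivity to the parameter value that motivates selecting it from the data and the constraints rather than by hand.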
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, B., Domeniconi, C. (2006). An Adaptive Kernel Method for Semi-supervised Clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_49
DOI: https://doi.org/10.1007/11871842_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science (R0)