Abstract
The process of clustering groups together data points so that intra-cluster similarity is maximized while inter-cluster similarity is minimized. Support vector clustering (SVC) is a clustering approach that can identify arbitrarily shaped cluster boundaries. The execution time of SVC depends heavily on several factors: choice of the width of a kernel function that determines a nonlinear transformation of the input data, solution of a quadratic program, and the way that the output of the quadratic program is used to produce clusters. This paper builds on our prior SVC research in two ways. First, we propose a method for identifying a kernel width value in a region where our experiments suggest that clustering structure is changing significantly. This can form the starting point for efficient exploration of the space of kernel width values. Second, we offer a technique, called cone cluster labeling, that uses the output of the quadratic program to build clusters in a novel way that avoids an important deficiency present in previous methods. Our experimental results use both two-dimensional and high-dimensional data sets.
Similar content being viewed by others
Notes
MATLAB is a registered trademark of The MathWorks, Inc..
The feature space is a Hilbert space, so the Pythagorean theorem holds.
MATLAB is a registered trademark of The MathWorks, Inc.
References
Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support vector clustering. J Mach Learning Res 2:125–137
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley-Interscience, New York
Dong J, Krzyz ak A, Suen C (2005) Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans Pattern Anal Mach Intell 27(4):603–618
Estivill-Castro V (2002) Why so many clustering algorithms—a position paper. SIGKDD Explorations 4(1):65–75
Estivill-Castro V, Lee I (2000) Automatic clustering via boundary extraction for mining massive point-data sets. In: Proceedings of the 5th international conference on geocomputation
Estivill-Castro V, Lee I (2000) Hierarchical clustering based on spatial proximity using delaunay diagram. In: Proceedings of 9th international symposium on spatial data handling, pp 7a.26–7a.41
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd international conference on knowledge discovery and data mining (KDD-96), Portland, pp 226–231
Everitt BS, Landau S, Leese M (2001) Cluster analysis, 4th edn. Oxford University Press, New York
Fasulo D (1999) An analysis of recent work on clustering algorithms. Technical report 01-03-02, University of Washington
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Harel D, Koren Y (2001) Clustering spatial data using random walks. In: Proceedings of knowledge discovery and data mining (KDD’01), pp 281–286
Horn D (2001) Clustering via Hilbert space. Physica A 302:70–79
Hartuv E, Shamir R (2000) A clustering algorithm based on graph connectivity. Inf Process Lett 76(200):175–181
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York
Jonyer I, Holder LB, Cook DJ (2001) Graph-based hierarchical conceptual clustering. Int J Artif Intell Tools 10(1–2):107–135
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surveys 31:264–323
Lee S, Daniels K (2004) Gaussian kernel width exploration in support vector clustering. Technical report 2004-009, University of Massachusetts Lowell, Department of Computer Science
Lee S, Daniels K (2005) Gaussian kernel width generator for support vector clustering. In: He M, Narasimhan G, Petoukhov S (eds) Proceedings, international conference on bioinformatics and its applications and advances in bioinformatics and its applications. Advances in bioinformatics and its applications. Series in mathematical biology and medicine, vol 8. World Scientific, pp 151–162
Lee S, Daniels K (2006) Cone cluster labeling for support vector clustering. In: Proceedings of 2006 SIAM conference on data mining, pp 484–488
Lee S (2005) Gaussian kernel width selection and fast cluster labeling for support vector clustering. Doctoral thesis and Technical report 2005-009, University of Massachusetts Lowell, Department of Computer Science
Lee J, Lee D (2005) An improved cluster labeling method for support vector clustering. IEEE Trans Pattern Anal Mach Intell 27:461–464
Mortenson M (2006) Geometric modeling, 3rd edn. Industrial Press Inc, New York
Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/∼mlearn/mlrepository.html
Platt J (1999) Fast training of support vector machines using sequential minimal optimization. In: Scholkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208
Preparata FP, Shamos MI (1985) Computational geometry. Springer, New York
Vapnik VN (1995) The nature of statistical learning theory, 2nd edn. Springer, New York
Yang J, Estivill-Castro V, Chalup SK (2002) Support vector clustering through proximity graph modeling. In: Proceedings of 9th international conference on neural information processing (ICONIP’02), pp 898–903
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lee, SH., Daniels, K.M. Gaussian kernel width exploration and cone cluster labeling for support vector clustering. Pattern Anal Applic 15, 327–344 (2012). https://doi.org/10.1007/s10044-011-0244-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-011-0244-8