More Web Proxy on the site http://driver.im/

article

Improved search strategies and extensions to k-medoids-based clustering algorithms

Authors:

John F. Roddick,

Jeng-Shyang PanAuthors Info & Claims

International Journal of Business Intelligence and Data Mining, Volume 3, Issue 2

Pages 212 - 231

https://doi.org/10.1504/IJBIDM.2008.020520

Published: 01 September 2008 Publication History

Abstract

In this paper two categories of improvements are suggested that can be applied to most k-medoids-based algorithms - conceptual/algorithmic improvements, and implementational improvements. These include the revisiting of the accepted cases for swap comparison and the application of partial distance searching and previous medoid indexing to clustering. Various hybrids are then applied to a number of k-medoids-based algorithms and the method is shown to be generally applicable. Experimental results on both artificial and real datasets demonstrate that when applied to CLARANS the number of distance calculations can be reduced by up to 98%.

References

[1]

Babu, T.R. and Murty, M.N. (2001) 'Comparison of genetic algorithm based prototype selection schemes', Pattern Recognition, Vol. 34, pp. 523-525.

[2]

Baek, S.J., Jeon, B.K. and Sung, K.M. (1997) 'A fast encoding algorithm for vector quantization', IEEE Signal Processing Letters, Vol. 4, No. 2, pp. 325-327.

[3]

Bei, C.D. and Gray, R.M. (1985) 'A improvement of the minimum distortion encoding algorithm for vector quantization', IEEE Transactions on Communication COM, Vol. 33, No. 10, pp. 1132-1133.

[4]

Berkhin, R. (2002) Survey of Clustering Data Mining Techniques, Technical Report, Accrue Software Inc.

[5]

Ceglar, A., Morrall, R. and Roddick, J. (2006) 'Mining medical administrative data - the PKB system', in Ackermann, M., Soares, C. and Guidemann, B. (Eds.): ECML PKDD 2006 Workshop on Practical Data Mining: Applications, Experiences and Challenges, Berlin, pp. 59-66.

[6]

Cetin, A. and Weerackody, V. (1988) 'Design of vector quantizers using simulated annealing', IEEE Transactions on Circuits and Systems, Vol. 35, No. 12, pp. 1550-1555.

[7]

Chen, S.H. and Pan, J.S. (1989) 'Fast search algorithm for vq-based recognition of isolated word', IEE Proc. I, Vol. 136, No. 6, pp. 391-396.

[8]

Chu, S.C., Roddick, J.F. and Pan, J.S. (2001) 'A comparative study and extensions to k-medoids algorithms', in Li, D. (Ed.): Fifth International Conference on Optimization: Techniques and Applications (ICOTA 2001), Vol. 4, Hong Kong, pp. 1708-1717.

[9]

Chu, S.C., Roddick, J.F. and Pan, J.S. (2002) 'An efficient k-medoids-based algorithm using previous medoid index, triangular inequality elimination criteria and partial distance search', in Kambayashi, Y., Winiwarter, W. and Arikawa, M. (Eds.): 4th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2002), Springer, Aix-en-Provence, France, pp. 63-72.

[10]

Enright, A.J. and Ouzounis, C.A. (2000) 'Generage: a robust algorithm for sequence clustering and domain detection', Bioinformatics, Vol. 16, No. 5, pp. 451-457.

[11]

Ester, M., Kriegel, H-P., Sander, J. and Xu, X. (1996) 'A density-based algorithm for discovering clusters in large spatial databases with noise', in Simoudis, E., Han, J. and Fayyad, U. (Eds.): Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, Portland, Oregon, pp. 226-231.

[12]

Estivill-Castro, V. and Lee, I. (2000) 'AUTOCLUST: automatic clustering via boundary extraction for massive point-data sets', Fifth International Conference on Geocomputation, Springer, pp. 23-25.

[13]

Fisher, D., Xu, L., Carnes, J., Reich, Y., Fenves, S., Chen, J., Shiavi, R., Biswas, G. and Weinberg, J. (1993) 'Applying ai clustering to engineering tasks', IEEE Intelligent Systems, Vol. 8, No. 6, pp. 51-60.

Digital Library

[14]

Gamal, A., Hemachandra, L., Shperling, I. and Wei, V. (1987) 'Using simulated annealing to design good codes', IEEE Transactions on Information Theory, Vol. 33, No. 1, pp. 116-123.

Digital Library

[15]

Ganti, V., Gehrke, J. and Ramakrishnan, R. (1999) 'CACTUS - clustering categorical data using summaries', International Conference on Knowledge Discovery and Data Mining, San Diego, USA, pp. 73-83.

Digital Library

[16]

Gersho, A. and Gray, R.M. (1992) Vector Quantization and Signal Compression, Kluwer, Boston, MA.

[17]

Greene, D., Tsymbal, A., Bolshakova, N. and Cunningham, P. (2004) 'Ensemble clustering in medical diagnostics', in Long, R., Antani, S., Lee, D., Nutter, B. and Zhang, M. (Eds.): 17th IEEE Symposium on Computer-Based Medical Systems, CBMS 2004, Bethesda, Maryland, pp. 576-581.

[18]

Guan, L. and Kamel, M. (1992) 'Equal-average hyperplane partitioning method for vector quantization of image data', Pattern Recognition Letters, Vol. 13, No. 10, pp. 693-699.

Digital Library

[19]

Guha, S., Rastogi, R. and Shim, K. (1998) 'Cure: an efficient clustering algorithm for large databases', ACM SIGMOD International Conference on the Management of Data, Seattle, WA, USA, pp. 73-84.

Digital Library

[20]

Han, J., Kamber, M. and Tung, A.K.H. (2001) 'Spatial clustering methods in data mining: a survey', Geographic Data Mining and Knowledge Discovery, Taylor and Francis, London, Research Monographs in Geographic Information Sytems.

[21]

Huang, H.C., Pan, J.S., Lu, Z.M., Sun, S.H. and Hang, H.M. (2001) 'Vector quantization based on genetic simulated annealing', Signal Processing, Vol. 81, No. 7, pp. 1513-1523.

Digital Library

[22]

Jain, A.K., Murty, M. and Flynn, P. (1999) 'Data clustering: a review', ACM Computing Surveys, Vol. 31, No. 3, pp. 264-323.

Digital Library

[23]

Jolion, J., Meer, P. and Bataouche, S. (1991) 'Robust clustering with applications in computer vision', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, pp. 791-802.

Digital Library

[24]

Karypis, G., Han, E-H. and Kumar, V. (1999) 'CHAMELEON: a hierarchical clustering algorithm using dynamic modeling', Computer, Vol. 32, pp. 32-68.

Digital Library

[25]

Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, New York.

[26]

Kirkpatrick, S., Gelatt, J.C.D. and Vecchi, M.P. (1983) 'Optimization by simulated annealing', Science, Vol. 220, No. 4598, pp. 671-680.

[27]

Krishnapuram, R., Joshi, A. and Yi, L. (1999) 'A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering', IEEE International Fuzzy Systems Conference, Seoul, Korea, pp. 1281-1286.

[28]

Lecompte, D., Kaufman, L. and Rousseeuw, P. (1986) 'Hierarchical cluster analysis of emotional concerns and personality characteristics in a freshman population', Acta Psychiatrica Belgica, Vol. 86, pp. 324-333.

[29]

Lee, C.H. and Chen, L.H. (1994) 'Fast closest codeword search algorithm for vector quantization', IEE Proc. Vision Image and Signal Processing, Vol. 141, No. 3, pp. 143-148.

[30]

Lucasius, C.B., Dane, A.D. and Kateman, G. (1993) 'On k-medoid clustering of large data sets with the aid of a genetic algorithm: background, feasibility and comparison', Analytica Chimica Acta, pp. 647-669.

[31]

MacQueen, J. (1967) 'Some methods for classification and analysis of multivariate observations', 5th Berkeley Symposium on Mathematics, Statistics and Probability, Vol. 1, pp. 281-296.

[32]

Markl, V., Ramsak, F. and Bayer, R. (1999) 'Improving OLAP performance by multidimensional hierarchical clustering', Proc. IDEAS Conference, Montreal, Canada, pp. 165-177.

[33]

Mobasher, B., Cooley, R. and Srivastava, J. (1999) 'Creating adaptive web sites through usage-based clustering of urls', Knowledge and Data Engineering Workshop, Chicago, IL, USA, pp. 19-25.

[34]

Ng, R. (1996) 'Spatial data mining: Discovering knowledge of clusters from maps', ACM SIGMOD Workshop on Research Issue on Data Mining and Knowledge Discovery, Montreal, Canada.

[35]

Ng, R. and Han, J. (1994) 'Efficient and effective clustering methods for spatial data mining', in Bocca, J.B., Jarke, M. and Zaniolo, C. (Eds.): Twentieth International Conference on Very Large Data Bases, Morgan Kaufmann, Santiago, Chile, pp. 144-155.

[36]

Ng, R. and Han, J. (2002) 'CLARANS: a method for clustering objects for spatical data mining', IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 5, pp. 1003-1016.

Digital Library

[37]

Pan, J., McInnes, F. and Jack, M. (1996a) 'Bound for minkowski metric or quadratic metric applied to vq codeword search', IEE Proceedings on Vision, Image and Signal Processing, Vol. 143, No. 1, pp. 67-71.

[38]

Pan, J.S., Lu, Z.M. and Sun, S.H. (2000) 'A fast codeword search algorithm for image coding based on mean-variance pyramids of codewords', IEE Electronics Letters, Vol. 36, No. 3, pp. 210-211.

[39]

Pan, J.S., McInnes, F.R. and Jack, M.A. (1996b) 'Fast clustering algorithm for vector quantization', Pattern Recognition, Vol. 29, No. 3, pp. 511-518.

[40]

Pan, J.S., Wang, J., Fang, H.L. and Chen, C. (1998) 'A modifed tabu search approach for texture segmentation using 2 - d non-separable wavelet frames', 10th International Conference on Tools with Artificial Intelligence, pp. 474-481.

[41]

Sander, J., Ester, M., Kriegel, H-P. and Xu, X. (1998) 'Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications', Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 169-194.

Digital Library

[42]

Sato, M., Sato, Y. and Jain, L. (1997) Fuzzy Clustering Models and Applications, Physica-Verlag, Germany.

[43]

Soleymani, M.R. and Morgera, S.D. (1987) 'A heigh-speed algorithm for vector quantization', IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1946-1948.

[44]

Vecchi, M.P. and Kirkpatrick, S. (1983) 'Global wiring by simulated annealing', IEEE Transactions on Computer-Aided Design, Vol. 2, No. 4, pp. 215-222.

Digital Library

[45]

Vidal, E. (1986) 'An algorithm for finding nearest neighbours in (approximately) constant average time', Pattern Recognition Letters, Vol. 4, pp. 145-157.

[46]

von Luxburg, U. and Ben-David, S. (2005) 'Towards a statistical theory of clustering', PASCAL Workshop on Statistics and Optimization of Clustering, London, UK.

[47]

Wang, X. and Hamilton, H. (2003) 'Dbrs: a density-based spatial clustering method with random sampling', 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2003), Vol. 2637 of LNCS, Springer, Seoul, Korea, pp. 563-575.

[48]

Xu, R. (2005) 'Survey of clustering algorithms', IEEE Transactions on Neural Networks, Vol. 16, No. 3, pp. 645-678.

Digital Library

[49]

Zamir, O., Etzion, O., Mandani, O. and Karp, R. (1997) 'Fast and intuitive clustering of web documents', in Heckerman, D., Mannila, H., Pregibon, D. and Uthurusamy, R. (Eds.): Third International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California, Newport Beach, CA, USA, pp. 287-290.

[50]

Zhang, T., Ramakrishnan, R. and Livny, M. (1996) 'BIRCH: an efficient clustering method for very large databases', ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada, pp. 103-114.

Digital Library

Cited By

Barioni MRazente HMarcelino ATraina ATraina C(2014)Open issues for partitioning clustering methodsWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery10.1002/widm.11274:3(161-177)Online publication date: 1-May-2014
https://dl.acm.org/doi/10.1002/widm.1127

Index Terms

Improved search strategies and extensions to k-medoids-based clustering algorithms

Recommendations

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
Similarity Search and Applications
Abstract
Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids.
In Euclidean geometry the mean—as used ...
Performance Analysis of Clustering Methods for Outlier Detection
ACCT '12: Proceedings of the 2012 Second International Conference on Advanced Computing & Communication Technologies

Outliers detection is a task that finds objects that are dissimilar or inconsistent with respect to the remaining data. It has many uses in applications like fraud detection, network intrusion detection and clinical diagnosis of diseases. Using ...
An improved K-medoids algorithm based on step increasing and optimizing medoids

The proposed clustering algorithm improves performance and preserves efficiency.We propose a candidate medoids subset to optimize the clustering medoids.We propose increasing the medoid methods in a step-wise fashion.Results report better performances ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image International Journal of Business Intelligence and Data Mining

International Journal of Business Intelligence and Data Mining Volume 3, Issue 2

September 2008

111 pages

ISSN:1743-8195

EISSN:1743-8187

Issue’s Table of Contents

Publisher

Inderscience Publishers

Geneva 15, Switzerland

Publication History

Published: 01 September 2008

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Barioni MRazente HMarcelino ATraina ATraina C(2014)Open issues for partitioning clustering methodsWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery10.1002/widm.11274:3(161-177)Online publication date: 1-May-2014
https://dl.acm.org/doi/10.1002/widm.1127

View Options

View options

Figures

Tables

Media

View Issue’s Table of Contents