[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Sampling for Nyström Extension-Based Spectral Clustering: Incremental Perspective and Novel Analysis

Published: 20 July 2016 Publication History

Abstract

Sampling is the key aspect for Nyström extension based spectral clustering. Traditional sampling schemes select the set of landmark points on a whole and focus on how to lower the matrix approximation error. However, the matrix approximation error does not have direct impact on the clustering performance. In this article, we propose a sampling framework from an incremental perspective, i.e., the landmark points are selected one by one, and each next point to be sampled is determined by previously selected landmark points. Incremental sampling builds explicit relationships among landmark points; thus, they work together well and provide a theoretical guarantee on the clustering performance. We provide two novel analysis methods and propose two schemes for selecting-the-next-one of the framework. The first scheme is based on clusterability analysis, which provides a better guarantee on clustering performance than schemes based on matrix approximation error analysis. The second scheme is based on loss analysis, which provides maximized predictive ability of the landmark points on the (implicit) labels of the unsampled points. Experimental results on a wide range of benchmark datasets demonstrate the superiorities of our proposed incremental sampling schemes over existing sampling schemes.

References

[1]
Gavin Anne-Claude, Aloy Patrick, Grandi Paola, Krause Roland, Boesche Markus, Marzioch Martina, Rau Christina, Jensen Lars Juhl, Bastuck Sonja, and Dmpelfeld Birgit. 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 7084 (Mar. 2006), 631--636.
[2]
Mohamed-Ali Belabbas and Patrick J. Wolfe. 2009. Spectral methods in machine learning and new strategies for very large datasets. Proc. Nat. Acad. Sci. USA 51, 6 (Jan. 2009), 369--374.
[3]
Sylvain Brohe and Jacques Van Helden. 2006. Evaluation of clustering algorithms for protein--protein interaction networks. BMC Bioinform. 7, 1602 (2006), 2791--2797.
[4]
Deng Cai and Xinlei Chen. 2015. Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybern. 45, 8 (Aug. 2015), 1669--1680.
[5]
Kamp Christel and Christensen Kim. 2005. Spectral analysis of protein--protein interactions in Drosophila melanogaster. Phys. Rev. E 71, 4 (Apr. 2005), 100--119.
[6]
Sean R. Collins, Patrick Kemmeren, Xue-Chu Zhao, Jack F. Greenblatt, Forrest Spencer, Frank C. P. Holstege, Jonathan S. Weissman, and Nevan J. Krogan. 2007. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteom. 6, 3 (Jan. 2007), 439--450.
[7]
Romain Couillet and Florent Benaych-Georges. 2015. Understanding big data spectral clustering. In Proceedings of the IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing. IEEE, Cancun, Mexico, 29--32.
[8]
Chandler Davis and W. M. Kahan. 1970. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7, 1 (1970), 1--46.
[9]
Shi Fei Ding, Hong Jie Jia, and Zhong Zhi Shi. 2014. Spectral clustering algorithm based on adaptive Nyström sampling for big data analysis. J. Softw. 25, 9 (2014), 2037--2049.
[10]
Petros Drineas and Michael W. Mahoney. 2005. On the Nyström method for approximating a gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6 (Dec. 2005), 2153--2175.
[11]
Ahmed K. Farahat, Ali Ghodsi, and Mohamed Kamel. 2011. A novel greedy algorithm for Nyström approximation. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR, Fort Lauderdale, USA, 269--277.
[12]
Charless Fowlkes, Serge Belongie, Fan R. K. Chung, and Jitendra Malik. 2004. Spectral grouping using Nyström methods. IEEE Trans. Pattern Anal. Mach. Intell. 26, 2 (Feb. 2004), 214--225.
[13]
Alex Gittens and Michael Mahoney. 2013. Revisiting the Nyström for improved large-scale machine learning. In Proceedings of the 30th International Conference on Machine Learning. IMLS, Atlanta, GA, USA, 567--575.
[14]
Blake Hunter and Thomas Strohmer. 2010. Performance analysis of spectral clustering on compressed, incomplete and inaccurate measurements. Comput. Res. Repository (2010), arXiv: abs/1011.0997.
[15]
Ying Kang, Bo Yu, Weiping Wang, and Dan Meng. 2015. Spectral clustering for large-scale social networks via a pre-coarsening sampling based Nyström method. In Advances in Knowledge Discovery and Data Mining. Springer, Ho Chi Minh City, Vietnam, 106--118.
[16]
Inoue Kentaro, Li Weijiang, and Kurata Hiroyuki. 2010. Diffusion model based spectral clustering for protein-protein interaction networks. PLoS One 5, 9 (Sep. 2010), 2771--2794.
[17]
Nguyen Lu Dang Khoa and Sanjay Chawla. 2012. Large scale spectral clustering using resistance distance and Spielman--Teng solvers. In Proceedings of 15th International Conference on Discovery Science. Springer, Lyon, France, 7--21.
[18]
Nevan J. Krogan, Cagney Gerard, Yu Haiyuan, Zhong Gouqing, Guo Xinghua, Ignatchenko Alexandr, Li Joyce, Pu Shuye, Datta Nira, and Aaron P. Tikuisis. 2006. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 7084 (Mar. 2006), 637--643.
[19]
Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. 2012. Sampling methods for the Nyström method. J. Mach. Learn. Res. 13 (Apr. 2012), 981--1006.
[20]
Jianyuan Li, Yingjie Xia, Zhenyu Shan, and Yuncai Liu. 2015. Scalable constrained spectral clustering. IEEE Trans. Knowl. Data Eng. 27, 2 (Feb. 2015), 589--593.
[21]
Mu Li, Wei Bi, James T. Kwok, and Bao-Liang. Lu. 2015. Large-scale Nyström kernel matrix approximation using randomized SVD. IEEE Trans. Neural Netw. Learn. Syst. 26, 1 (Jan. 2015), 152--164.
[22]
Mu Li, James T. Kwok, and B.-L. Lu. 2010. Making large-scale Nyström approximation possible. In Proceedings of the International Conference on Machine Learning. ACM, Haifa, Israel, 631--638.
[23]
Mu Li, Xiao-Chen Lian, James T. Kwok, and Bao-Liang Lu. 2011. Time and space efficient spectral clustering via column sampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Colorado Springs, CO, USA, 2297--2304.
[24]
Mu Li, Gary L. Miller, and Rongkun Peng. 2013. Iterative row sampling. In Proceedings of the IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, Berkeley, CA, USA, 127--136.
[25]
Ming Lin, Fei Wang, and Changshui Zhang. 2015. Large-scale eigenvector approximation via Hilbert space embedding Nyström. Pattern Recognit. 48, 5 (May 2015), 1904--1912.
[26]
Jialu Liu, Chi Wang, Marina Danilevsky, and Jiawei Han. 2013. Large-scale spectral clustering on graphs. In Proceedings of the International Joint Conference on Artificial Intelligence. Morgan Kaufmann, Beijing, China, 1486--1492.
[27]
David R. Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Barcelona, Spain, 416--425.
[28]
David R. Martin, Charless C. Fowlkes, and Jitendra Malik. 2004. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26, 5 (May 2004), 530--549.
[29]
Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2002. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems. MIT Press, Vancouver, British Columbia, Canada, 849--856.
[30]
Guimin Qin and Lin Gao. 2010. Spectral clustering for detecting protein complexes in protein--protein interaction (PPI) networks. Math. Comput. Model. 52, 11C12 (Dec. 2010), 2066--2074.
[31]
Tomoya Sakai and Atsushi Imiya. 2009. Fast spectral clustering with random projection and sampling. In Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition. Springer, Leipzig, Germany, 372--384.
[32]
T. Semertzidis, D. Rafailidis, M. G. Strintzis, and P. Daras. 2015. Large-scale spectral clustering based on pairwise constraints. Inf. Process. Manag. 51, 5 (Sep. 2015), 616--624.
[33]
Taner Z. Sen, Andrzej Kloczkowski, and Robert L. Jernigan. 2006. Functional clustering of yeast proteins from the protein--protein interaction network. BMC Bioinform. 7, 1 (Jul. 2006), 355.
[34]
Ohad Shamir and Naftali Tishby. 2011. Spectral clustering on a budget. In Proceedings of the International Conference on Artificial Intelligence and Statistics. JMLR, Fort Lauderdale, USA, 661--669.
[35]
Si Si, Cho-Jui Hsieh, and Inderjit S. Dhillon. 2014. Memory efficient kernel approximation. In Proceedings of the 31th International Conference on Machine Learning. ACM, Beijing, China, 701--709.
[36]
Alexander Strehl and Joydeep Ghosh. 2002. Cluster ensembles -- A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3 (Dec. 2002), 583--617.
[37]
Shusen Wang, Chao Zhang, Hui Qian, and Zhihua Zhang. 2014. Improving the modified Nyström method using spectral shifting. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 611--620.
[38]
Shusen Wang and Zhihua Zhang. 2014. Efficient algorithms and error analysis for the modified Nyström method. In Proceedings of the International Conference on Artificial Intelligence and Statistics. JMLR, Reykjavik, Iceland, 996--1004.
[39]
Christopher K. I. Williams and Matthias Seeger. 2000. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems. MIT Press, Vancouver, British Columbia, Canada, 682--688.
[40]
Donghui Yan, Ling Huang, and Michael I. Jordan. 2009. Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Paris, France, 907--916.
[41]
Kai Zhang, Ivor W. Tsang, and James T. Kwok. 2008. Improved Nyström low-rank approximation and error analysis. In Proceedings of the International Conference on Machine Learning. ACM, Helsinki, Finland, 1232--1239.
[42]
Xianchao Zhang and Quanzeng You. 2011. Clusterability analysis and incremental sampling for Nyström extension based spectral clustering. In Proceedings of the IEEE International Conference on Data Mining. IEEE, New York, NY, USA, 942--951.

Cited By

View all
  • (2024)Tractability of approximation by general shallow networksAnalysis and Applications10.1142/S021953052440001322:03(535-568)Online publication date: 9-Feb-2024
  • (2022)Divide-and-conquer based large-scale spectral clusteringNeurocomputing10.1016/j.neucom.2022.06.006501(664-678)Online publication date: Aug-2022
  • (2022)An Efficient Deep Learning Approach Using Improved Generative Adversarial Networks for Incomplete Information Completion of Self-driving VehiclesJournal of Grid Computing10.1007/s10723-022-09610-520:3Online publication date: 22-Jun-2022
  • Show More Cited By

Index Terms

  1. Sampling for Nyström Extension-Based Spectral Clustering: Incremental Perspective and Novel Analysis

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Knowledge Discovery from Data
    ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 1
    February 2017
    288 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/2974720
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2016
    Accepted: 01 May 2016
    Revised: 01 May 2016
    Received: 01 August 2014
    Published in TKDD Volume 11, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Nyström extension
    2. Spectral clustering
    3. clusterability analysis
    4. incremental sampling
    5. loss analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)81
    • Downloads (Last 6 weeks)24
    Reflects downloads up to 11 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Tractability of approximation by general shallow networksAnalysis and Applications10.1142/S021953052440001322:03(535-568)Online publication date: 9-Feb-2024
    • (2022)Divide-and-conquer based large-scale spectral clusteringNeurocomputing10.1016/j.neucom.2022.06.006501(664-678)Online publication date: Aug-2022
    • (2022)An Efficient Deep Learning Approach Using Improved Generative Adversarial Networks for Incomplete Information Completion of Self-driving VehiclesJournal of Grid Computing10.1007/s10723-022-09610-520:3Online publication date: 22-Jun-2022
    • (2021)An efficient graph clustering algorithm by exploiting k-core decomposition and motifsComputers and Electrical Engineering10.1016/j.compeleceng.2021.10756496:PBOnline publication date: 1-Dec-2021
    • (2020)Hubness-based Sampling Method for Nyström Spectral Clustering2020 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN48605.2020.9207089(1-8)Online publication date: Jul-2020
    • (2019)Multi-graph fusion for multi-view spectral clusteringKnowledge-Based Systems10.1016/j.knosys.2019.105102(105102)Online publication date: Oct-2019
    • (2019)An autoencoder-based spectral clustering algorithmSoft Computing - A Fusion of Foundations, Methodologies and Applications10.1007/s00500-019-03994-524:3(1661-1671)Online publication date: 17-Apr-2019
    • (2018)An anchor-based spectral clustering methodFrontiers of Information Technology & Electronic Engineering10.1631/FITEE.170026219:11(1385-1396)Online publication date: 26-Dec-2018
    • (2018)Partial Sum Minimization of Singular Values Representation on Grassmann ManifoldsACM Transactions on Knowledge Discovery from Data10.1145/309269012:1(1-22)Online publication date: 23-Jan-2018
    • (2017)A Spectral Clustering Method for Large-Scale Geostatistical DatasetsMachine Learning and Data Mining in Pattern Recognition10.1007/978-3-319-62416-7_18(248-261)Online publication date: 2-Jul-2017

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media