
A least squares formulation for a class of generalized eigenvalue problems in machine learning

Published: 14 June 2009

Abstract

Many machine learning algorithms can be formulated as a generalized eigenvalue problem. One major limitation of such a formulation is that the generalized eigenvalue problem is computationally expensive to solve, especially for large-scale problems. In this paper, we show that under a mild condition, a class of generalized eigenvalue problems in machine learning can be formulated as a least squares problem. This class of problems includes classical techniques such as Canonical Correlation Analysis (CCA), Partial Least Squares (PLS), and Linear Discriminant Analysis (LDA), as well as Hypergraph Spectral Learning (HSL). As a result, various regularization techniques can be readily incorporated into the formulation to improve model sparsity and generalization ability. In addition, the least squares formulation leads to efficient and scalable implementations based on iterative conjugate-gradient-type algorithms. We report experimental results that confirm the established equivalence relationship. The results also demonstrate the efficiency and effectiveness of the equivalent least squares formulations on large-scale problems.
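The claimed equivalence can be illustrated on the simplest member of this class, binary-class LDA: the generalized eigenvalue problem S_b w = λ S_t w and a least squares regression of the centered data onto scaled class-indicator targets produce the same projection direction up to scaling. The sketch below is my own illustration of this classical fact using NumPy/SciPy, not code from the paper; the target encoding (n/n1 for one class, -n/n2 for the other) is one standard choice and the synthetic data are arbitrary.

```python
import numpy as np
from scipy.linalg import eigh, lstsq

rng = np.random.default_rng(0)

# Two Gaussian classes in 2-D (arbitrary synthetic data).
n1, n2 = 60, 40
X1 = rng.normal([0.0, 0.0], 1.0, size=(n1, 2))
X2 = rng.normal([3.0, 1.0], 1.0, size=(n2, 2))
X = np.vstack([X1, X2])
n = n1 + n2

m, m1, m2 = X.mean(0), X1.mean(0), X2.mean(0)
Xc = X - m  # centered data

# Between-class and total scatter matrices.
Sb = n1 * np.outer(m1 - m, m1 - m) + n2 * np.outer(m2 - m, m2 - m)
St = Xc.T @ Xc

# (1) Generalized eigenvalue problem: Sb w = lambda * St w.
_, evecs = eigh(Sb, St)          # St is symmetric positive definite here
w_eig = evecs[:, -1]             # eigenvector of the largest eigenvalue

# (2) Least squares regression onto scaled class-indicator targets.
t = np.concatenate([np.full(n1, n / n1), np.full(n2, -n / n2)])
w_ls = lstsq(Xc, t)[0]

# The two directions coincide up to scaling.
cos = abs(w_eig @ w_ls) / (np.linalg.norm(w_eig) * np.linalg.norm(w_ls))
print(f"|cos(angle)| = {cos:.6f}")  # close to 1
```

The practical payoff claimed in the abstract is on the least squares side: for large sparse problems, step (2) can be handled by iterative solvers such as LSQR or conjugate gradient instead of the dense eigendecomposition in step (1), and penalties such as the lasso can be added directly to the regression.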




Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ICML '09
Sponsor: Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)28
  • Downloads (Last 6 weeks)3
Reflects downloads up to 15 Jan 2025


Cited By

  • (2024) On the Equivalence of Linear Discriminant Analysis and Least Squares Regression. IEEE Transactions on Neural Networks and Learning Systems, pp. 1-11. DOI: 10.1109/TNNLS.2022.3208944. Online: 2024.
  • (2024) Dimensionality reduction through clustering of variables and canonical correlation. Journal of the Korean Statistical Society. DOI: 10.1007/s42952-024-00290-3. Online: 26 Sep 2024.
  • (2024) Collaborative and dynamic kernel discriminant analysis for large-scale problems: applications in multi-class learning and novelty detection. Progress in Artificial Intelligence. DOI: 10.1007/s13748-023-00309-6. Online: 22 Jan 2024.
  • (2023) Discriminative Projected Clustering via Unsupervised LDA. IEEE Transactions on Neural Networks and Learning Systems, 34(11), pp. 9466-9480. DOI: 10.1109/TNNLS.2022.3202719. Online: Nov 2023.
  • (2023) A Scalable Unsupervised Feature Selection With Orthogonal Graph Representation for Hyperspectral Images. IEEE Transactions on Geoscience and Remote Sensing, 61, pp. 1-13. DOI: 10.1109/TGRS.2023.3284475. Online: 2023.
  • (2022) fgSpMSpV: A Fine-grained Parallel SpMSpV Framework on HPC Platforms. ACM Transactions on Parallel Computing, 9(2), pp. 1-29. DOI: 10.1145/3512770. Online: 11 Apr 2022.
  • (2022) A Unified Framework for Probabilistic Component Analysis. Machine Learning and Knowledge Discovery in Databases, pp. 469-484. DOI: 10.1007/978-3-662-44851-9_30. Online: 10 Mar 2022.
  • (2022) A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve. Languages and Compilers for Parallel Computing, pp. 118-133. DOI: 10.1007/978-3-030-99372-6_8. Online: 24 Mar 2022.
  • (2021) Clustered Discriminant Regression for High-Dimensional Data Feature Extraction and Its Applications in Healthcare and Additive Manufacturing. IEEE Transactions on Automation Science and Engineering, 18(4), pp. 1998-2010. DOI: 10.1109/TASE.2020.3029028. Online: Oct 2021.
  • (2020) Learning Distilled Graph for Large-Scale Social Network Data Clustering. IEEE Transactions on Knowledge and Data Engineering, 32(7), pp. 1393-1404. DOI: 10.1109/TKDE.2019.2904068. Online: 1 Jul 2020.
