
Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds

Published: 01 June 2005

Abstract

Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
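The bound-optimization strategy with component-wise updates that the abstract describes can be illustrated in the binary case: the curvature of the logistic loss along coordinate j is bounded by B_j = Σ_i x_ij²/4, so each coordinate update minimizes a quadratic surrogate that majorizes the true objective, and a Laplacian (l1) prior turns that minimization into a soft-threshold step. The sketch below is illustrative only (the function names and data are hypothetical, and it is not the authors' implementation of the multiclass algorithm):

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator arising from the Laplacian (l1) prior.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_logistic_cd(X, y, lam=1.0, n_sweeps=100):
    """Component-wise bound optimization for l1-regularized binary
    logistic regression. Labels y must be in {-1, +1}. Because the
    logistic loss has coordinate-wise curvature at most
    B_j = sum_i x_ij^2 / 4, each update minimizes a quadratic surrogate
    that upper-bounds the objective, guaranteeing monotone descent."""
    n, d = X.shape
    w = np.zeros(d)
    margins = np.zeros(n)           # y_i * (x_i . w), maintained incrementally
    B = (X ** 2).sum(axis=0) / 4.0  # per-coordinate curvature bounds
    for _ in range(n_sweeps):
        for j in range(d):
            if B[j] == 0.0:
                continue
            p = 1.0 / (1.0 + np.exp(margins))     # sigma(-margin_i)
            g = -(y * X[:, j] * p).sum()          # dLoss/dw_j
            w_new = soft_threshold(w[j] - g / B[j], lam / B[j])
            if w_new != w[j]:
                margins += y * X[:, j] * (w_new - w[j])
                w[j] = w_new
    return w
```

Because the threshold is applied exactly, irrelevant coordinates are driven to exact zeros rather than merely small values, which is the source of the sparsity the paper exploits.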


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 27, Issue 6
June 2005
176 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. Bayesian inference
  2. supervised learning
  3. bound optimization
  4. classification
  5. expectation maximization (EM)
  6. generalization bounds
  7. learning theory
  8. multinomial logistic regression
  9. sparsity

Qualifiers

  • Research-article

Cited By

  • (2024) "Group Penalized Multinomial Logit Models and Stock Return Direction Prediction," IEEE Transactions on Information Theory, vol. 70, no. 6, pp. 4297-4318, DOI: 10.1109/TIT.2024.3376751. Online publication date: 13-Mar-2024.
  • (2024) "MTLSC-Diff," Knowledge-Based Systems, vol. 303, DOI: 10.1016/j.knosys.2024.112415. Online publication date: 4-Nov-2024.
  • (2024) "G-LASSO/G-SCAD/G-MCP penalized trinomial logit dynamic models predict up trends, sideways trends and down trends for stock returns," Expert Systems with Applications, vol. 249, DOI: 10.1016/j.eswa.2024.123476. Online publication date: 1-Sep-2024.
  • (2024) "An embedded feature selection method based on generalized classifier neural network for cancer classification," Computers in Biology and Medicine, vol. 168, DOI: 10.1016/j.compbiomed.2023.107677. Online publication date: 12-Apr-2024.
  • (2024) "The appeals of quadratic majorization-minimization," Journal of Global Optimization, vol. 89, no. 3, pp. 509-558, DOI: 10.1007/s10898-023-01361-1. Online publication date: 1-Jul-2024.
  • (2023) "Demystifying softmax gating function in Gaussian mixture of experts," Proc. 37th Int'l Conf. Neural Information Processing Systems, pp. 4624-4652, DOI: 10.5555/3666122.3666328. Online publication date: 10-Dec-2023.
  • (2023) "Simultaneous Dimension Reduction and Variable Selection for Multinomial Logistic Regression," INFORMS Journal on Computing, vol. 35, no. 5, pp. 1044-1060, DOI: 10.1287/ijoc.2022.0132. Online publication date: 1-Sep-2023.
  • (2022) "abess," J. Machine Learning Research, vol. 23, no. 1, pp. 9206-9212, DOI: 10.5555/3586589.3586791. Online publication date: 1-Jan-2022.
  • (2022) "Compressed-Domain ECG-based Biometric User Identification Using Task-Driven Dictionary Learning," ACM Transactions on Computing for Healthcare, vol. 3, no. 3, pp. 1-15, DOI: 10.1145/3461701. Online publication date: 7-Apr-2022.
  • (2022) "Activity-Based Model using GPS Data and Google APIs," Proc. 2022 IEEE 25th Int'l Conf. Intelligent Transportation Systems (ITSC), pp. 1723-1729, DOI: 10.1109/ITSC55140.2022.9922042. Online publication date: 8-Oct-2022.
