Abstract
Given a dataset, each element of which is labeled by one of k labels, we construct, by a very fast algorithm, a k-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches (Fung & Mangasarian, 2001; Suykens & Vandewalle, 1999) can be interpreted as ridge regression applied to classification problems (Evgeniou, Pontil, & Poggio, 2000). Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as little as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice for taking advantage of its fast performance. However, this approach has a drawback: the resulting two-class problems are often very unbalanced, leading in some cases to poor performance. To deal with this problem, we propose balancing the k classes and a novel Newton refinement modification to PSVM. Computational results indicate that these two modifications preserve the speed of PSVM while often yielding significant test set improvement over a plain one-from-the-rest PSVM application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test set correctness.
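To make the PSVM formulation concrete, the following is a minimal NumPy sketch, assuming the standard linear PSVM of Fung & Mangasarian (2001), in which the classifier (w, γ) solves the ridge-regression-like linear system (I/ν + E′E)[w; γ] = E′De with E = [A −e] and D = diag(labels). The one-from-the-rest wrapper balances each two-class subproblem with inverse-class-frequency weights; this particular weighting scheme and all function names are illustrative assumptions rather than the paper's exact method, and the paper's Newton refinement step is omitted.

```python
import numpy as np

def psvm_train(A, d, nu=1.0, weights=None):
    """Linear proximal SVM: solve (I/nu + E'SE)[w; gamma] = E'Sd,
    where E = [A, -e] and S = diag(weights) (S = I gives plain PSVM).

    A : (m, n) data matrix, one point per row.
    d : (m,) labels in {-1, +1}.
    Returns (w, gamma) for the classifier sign(x . w - gamma).
    """
    m, n = A.shape
    s = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    E = np.hstack([A, -np.ones((m, 1))])
    H = np.eye(n + 1) / nu + E.T @ (s[:, None] * E)   # I/nu + E'SE
    u = np.linalg.solve(H, E.T @ (s * d))             # right-hand side E'Sd
    return u[:n], u[n]

def one_from_rest_psvm(A, y, nu=1.0):
    """Balanced one-from-the-rest multicategory PSVM.

    Trains one weighted two-class PSVM per label; each side of every
    subproblem receives equal total weight m/2 (an illustrative
    balancing choice). Returns a dict of per-class (w, gamma) pairs.
    """
    y = np.asarray(y)
    m = len(y)
    models = {}
    for c in np.unique(y):
        d = np.where(y == c, 1.0, -1.0)
        pos = (d > 0).sum()                           # size of class c
        weights = np.where(d > 0, m / (2.0 * pos), m / (2.0 * (m - pos)))
        models[c] = psvm_train(A, d, nu, weights)
    return models

def predict(models, X):
    """Assign each row of X to the class whose plane scores highest."""
    scores = np.stack([X @ w - g for w, g in models.values()], axis=1)
    return np.array(list(models))[np.argmax(scores, axis=1)]
```

Because each subproblem reduces to a single (n+1)-by-(n+1) linear solve, training k classifiers remains fast even for large m, which is the speed property the abstract emphasizes.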
References
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK User’s Guide, third edition. Philadelphia, Pennsylvania: SIAM. http://www.netlib.org/lapack/.
Bennett, K. P. & Mangasarian, O. L. (1993). Multicategory separation via linear programming. Optimization Methods and Software, 3, 27–39.
Bottou, L., Cortes, C., Denker, J., Drucker, H., Guyon, I., Jackel, L., LeCun, Y., Muller, U., Sackinger, E., Simard, P., & Vapnik, V. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. International Conference on Pattern Recognition (pp. 77–87). IEEE Computer Society Press.
Bradley, P. S. & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.
Bredensteiner, E. J. & Bennett, K. P. (1999). Multicategory classification by support vector machines. Computational Optimization and Applications, 12, 53–79.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:2, 121–167.
Chen, C. & Mangasarian, O. L. (1996). Hybrid misclassification minimization. Advances in Computational Mathematics, 5:2, 127–136. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-05.ps.
Cherkassky, V. & Mulier, F. (1998). Learning from Data: Concepts, Theory, and Methods. New York: John Wiley & Sons.
CPLEX Optimization Inc., Incline Village, Nevada. (1992). Using the CPLEX(TM) Linear Optimizer and CPLEX(TM) Mixed Integer Optimizer (Version 2.0).
Dasgupta, S. (2000). Experiments with random projection. Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000) (pp. 143–151). San Francisco, CA: Morgan Kaufmann Publishers.
Evgeniou, T., Pontil, M., & Poggio, T. (2000). Regularization networks and support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 171–203). Cambridge, MA: MIT Press.
Facchinei, F. (1995). Minimization of SC1 functions and the Maratos effect. Operations Research Letters, 17, 131–137.
Fung, G. & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In F. Provost & R. Srikant (Eds.), Proceedings KDD-2001: Knowledge discovery and data mining, San Francisco, CA (pp. 77–86). New York: Association for Computing Machinery. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/01-02.ps.
Furey, T. S., Duffy, N., Cristianini, N., Bednarski, D., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:10, 906–914.
Van Gestel, T., Suykens, J., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002). Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes. Neural Processing Letters, 15:1, 45–48.
Hiriart-Urruty, J.-B., Strodiot, J. J., & Nguyen, V. H. (1984). Generalized Hessian matrix and second-order optimality conditions for problems with C^{1,1} data. Applied Mathematics and Optimization, 11, 43–56.
Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Hsu, C.-W. & Lin, C.-J. (2001). A comparison of methods for multi-class support vector machines. http://www.csie.ntu.edu.tw/cjlin/papers.html.
Kanzow, C., Qi, H., & Qi, L. (2001). On the minimum norm solution of linear programs. Preprint, University of Hamburg, Hamburg. Journal of Optimization Theory and Applications, to appear. http://www.math.uni-hamburg.de/home/kanzow/paper.html.
Lee, Y.-J. & Mangasarian, O. L. (2001). RSVM: Reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5–7, CD-ROM. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps.
Lee, Y.-J. & Mangasarian, O. L. (2001). SSVM: A smooth support vector machine. Computational Optimization and Applications, 20, 5–22. Data Mining Institute, University of Wisconsin, Technical Report 99-03. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps.
Mangasarian, O. L. (1994). Nonlinear Programming. Philadelphia, PA: SIAM.
Mangasarian, O. L. (2000). Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 135–146). Cambridge, MA: MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
MATLAB. (1994–2001). User’s Guide. The MathWorks, Inc., Natick, MA 01760. http://www.mathworks.com.
Murphy, P. M. & Aha, D. W. (1992). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Polyak, B. T. (1987). Introduction to Optimization. New York: Optimization Software, Inc., Publications Division.
Rockafellar, R. T. (1970). Convex Analysis. Princeton, New Jersey: Princeton University Press.
Roth, V. & Steinhage, V. (1999). Nonlinear discriminant analysis using kernel functions. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems–NIPS*99 (pp. 568–574).
Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. J. (1999). Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10, 1000–1017.
Smola, A. J. & Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. Proc. 17th International Conf. on Machine Learning (pp. 911–918). San Francisco, CA: Morgan Kaufmann.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. Singapore: World Scientific Publishing Co.
Suykens, J. A. K., Lukas, L., Van Dooren, P., De Moor, B., & Vandewalle, J. (1999). Least squares support vector machine classifiers: A large scale algorithm. European Conference on Circuit Theory and Design, ECCTD’99 (pp. 839–842). Stresa, Italy.
Suykens, J. A. K. & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9:3, 293–300.
Suykens, J. A. K. & Vandewalle, J. (1999). Multiclass least squares support vector machines. Proceedings of IJCNN’99, CD-ROM. Washington, DC.
Tikhonov, A. N. & Arsenin, V. Y. (1977). Solutions of Ill–Posed Problems. New York: John Wiley & Sons.
Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. New York: Springer.
Weston, J. & Watkins, C. (1998). Multi-class support vector machines. Technical report csd-tr-98-04, Royal Holloway, University of London, Surrey, England.
Williams, C. K. I. & Seeger, M. (2000). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems (NIPS2000). http://www.kernel-machines.org.
Editor: Shai Ben-David
Cite this article
Fung, G.M., Mangasarian, O.L. Multicategory Proximal Support Vector Machine Classifiers. Mach Learn 59, 77–97 (2005). https://doi.org/10.1007/s10994-005-0463-6