Abstract
Given a dataset, each element of which is labeled by one of k labels, we construct, by a very fast algorithm, a k-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches (Fung & Mangasarian, 2001; Suykens & Vandewalle, 1999) can be interpreted as ridge regression applied to classification problems (Evgeniou, Pontil, & Poggio, 2000). Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as little as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice for taking advantage of its fast performance. However, this approach has a drawback: the resulting two-class problems are often very unbalanced, leading in some cases to poor performance. To deal with this problem, we propose balancing the k classes and a novel Newton refinement modification to PSVM. Computational results indicate that these two modifications preserve the speed of PSVM while often yielding significant test set improvement over a plain one-from-the-rest PSVM application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test set correctness.
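To make the PSVM formulation concrete, the following is a minimal NumPy sketch, assuming the standard linear PSVM of Fung & Mangasarian (2001), in which the classifier (w, γ) solves the ridge-regression-like linear system (I/ν + E′E)[w; γ] = E′De with E = [A −e] and D = diag(labels). The one-from-the-rest wrapper balances each two-class subproblem with inverse-class-frequency weights; this particular weighting scheme and all function names are illustrative assumptions rather than the paper's exact method, and the paper's Newton refinement step is omitted.

```python
import numpy as np

def psvm_train(A, d, nu=1.0, weights=None):
    """Linear proximal SVM: solve (I/nu + E'SE)[w; gamma] = E'Sd,
    where E = [A, -e] and S = diag(weights) (S = I gives plain PSVM).

    A : (m, n) data matrix, one point per row.
    d : (m,) labels in {-1, +1}.
    Returns (w, gamma) for the classifier sign(x . w - gamma).
    """
    m, n = A.shape
    s = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    E = np.hstack([A, -np.ones((m, 1))])
    H = np.eye(n + 1) / nu + E.T @ (s[:, None] * E)   # I/nu + E'SE
    u = np.linalg.solve(H, E.T @ (s * d))             # right-hand side E'Sd
    return u[:n], u[n]

def one_from_rest_psvm(A, y, nu=1.0):
    """Balanced one-from-the-rest multicategory PSVM.

    Trains one weighted two-class PSVM per label; each side of every
    subproblem receives equal total weight m/2 (an illustrative
    balancing choice). Returns a dict of per-class (w, gamma) pairs.
    """
    y = np.asarray(y)
    m = len(y)
    models = {}
    for c in np.unique(y):
        d = np.where(y == c, 1.0, -1.0)
        pos = (d > 0).sum()                           # size of class c
        weights = np.where(d > 0, m / (2.0 * pos), m / (2.0 * (m - pos)))
        models[c] = psvm_train(A, d, nu, weights)
    return models

def predict(models, X):
    """Assign each row of X to the class whose plane scores highest."""
    scores = np.stack([X @ w - g for w, g in models.values()], axis=1)
    return np.array(list(models))[np.argmax(scores, axis=1)]
```

Because each subproblem reduces to a single (n+1)-by-(n+1) linear solve, training k classifiers remains fast even for large m, which is the speed property the abstract emphasizes.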
References
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK User’s Guide, third edition. Philadelphia, Pennsylvania: SIAM. http://www.netlib.org/lapack/.
Bennett, K. P. & Mangasarian, O. L. (1993). Multicategory separation via linear programming. Optimization Methods and Software, 3, 27–39.
Bottou, L., Cortes, C., Denker, J., Drucker, H., Guyon, I., Jackel, L., LeCun, Y., Muller, U., Sackinger, E., Simard, P., & Vapnik, V. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. International Conference on Pattern Recognition (pp. 77–87). IEEE Computer Society Press.
Bradley, P. S. & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.
Bredensteiner, E. J. & Bennett, K. P. (1999). Multicategory classification by support vector machines. Computational Optimization and Applications, 12, 53–79.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:2, 121–167.
Chen, C. & Mangasarian, O. L. (1996). Hybrid misclassification minimization. Advances in Computational Mathematics, 5:2, 127–136. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-05.ps.
Cherkassky, V. & Mulier, F. (1998). Learning from Data: Concepts, Theory, and Methods. New York: John Wiley & Sons.
CPLEX Optimization Inc., Incline Village, Nevada. (1992). Using the CPLEX(TM) Linear Optimizer and CPLEX(TM) Mixed Integer Optimizer (Version 2.0).
Dasgupta, S. (2000). Experiments with random projection. Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000) (pp. 143–151). San Francisco, CA: Morgan Kaufmann Publishers.
Evgeniou, T., Pontil, M., & Poggio, T. (2000). Regularization networks and support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 171–203). Cambridge, MA: MIT Press.
Facchinei, F. (1995). Minimization of SC1 functions and the Maratos effect. Operations Research Letters, 17, 131–137.
Fung, G. & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In F. Provost & R. Srikant (Eds.), Proceedings KDD-2001: Knowledge discovery and data mining, San Francisco, CA (pp. 77–86). New York: Association for Computing Machinery. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/01-02.ps.
Furey, T. S., Duffy, N., Cristianini, N., Bednarski, D., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:10, 906–914.
Van Gestel, T., Suykens, J., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002). Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes. Neural Processing Letters, 15:1, 45–48.
Hiriart-Urruty, J.-B., Strodiot, J. J., & Nguyen, V. H. (1984). Generalized Hessian matrix and second-order optimality conditions for problems with C^{1,1} data. Applied Mathematics and Optimization, 11, 43–56.
Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Hsu, C.-W. & Lin, C.-J. (2001). A comparison of methods for multi-class support vector machines. http://www.csie.ntu.edu.tw/cjlin/papers.html.
Kanzow, C., Qi, H., & Qi, L. (2001). On the minimum norm solution of linear programs. Preprint, University of Hamburg, Hamburg. Journal of Optimization Theory and Applications, to appear. http://www.math.uni-hamburg.de/home/kanzow/paper.html.
Lee, Y.-J. & Mangasarian, O. L. (2001). RSVM: Reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5–7, CD-ROM. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps.
Lee, Y.-J. & Mangasarian, O. L. (2001). SSVM: A smooth support vector machine. Computational Optimization and Applications, 20, 5–22. Data Mining Institute, University of Wisconsin, Technical Report 99-03. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps.
Mangasarian, O. L. (1994). Nonlinear Programming. Philadelphia, PA: SIAM.
Mangasarian, O. L. (2000). Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 135–146). Cambridge, MA: MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
MATLAB. (1994–2001). User’s Guide. The MathWorks, Inc., Natick, MA 01760. http://www.mathworks.com.
Murphy, P. M. & Aha, D. W. (1992). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Polyak, B. T. (1987). Introduction to Optimization. New York: Optimization Software, Inc., Publications Division.
Rockafellar, R. T. (1970). Convex Analysis. Princeton, New Jersey: Princeton University Press.
Roth, V. & Steinhage, V. (1999). Nonlinear discriminant analysis using kernel functions. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems–NIPS*99 (pp. 568–574).
Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. J. (1999). Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10, 1000–1017.
Smola, A. J. & Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. Proc. 17th International Conf. on Machine Learning (pp. 911–918). San Francisco, CA: Morgan Kaufmann.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. Singapore: World Scientific Publishing Co.
Suykens, J. A. K., Lukas, L., Van Dooren, P., De Moor, B., & Vandewalle, J. (1999). Least squares support vector machine classifiers: A large scale algorithm. European Conference on Circuit Theory and Design, ECCTD’99 (pp. 839–842). Stresa, Italy.
Suykens, J. A. K. & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9:3, 293–300.
Suykens, J. A. K. & Vandewalle, J. (1999). Multiclass least squares support vector machines. Proceedings of IJCNN’99, CD-ROM. Washington, DC.
Tikhonov, A. N. & Arsenin, V. Y. (1977). Solutions of Ill–Posed Problems. New York: John Wiley & Sons.
Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. New York: Springer.
Weston, J. & Watkins, C. (1998). Multi-class support vector machines. Technical report csd-tr-98-04, Royal Holloway, University of London, Surrey, England.
Williams, C. K. I. & Seeger, M. (2000). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems (NIPS2000). http://www.kernel-machines.org.
Editor: Shai Ben-David
Cite this article
Fung, G.M., Mangasarian, O.L. Multicategory Proximal Support Vector Machine Classifiers. Mach Learn 59, 77–97 (2005). https://doi.org/10.1007/s10994-005-0463-6