[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Problem formulations and solvers in linear SVM: a review

  • Published:
Artificial Intelligence Review Aims and scope Submit manuscript

Abstract

Support vector machine (SVM) is an optimal margin based classification technique in machine learning. SVM is a binary linear classifier which has been extended to non-linear data using Kernels and multi-class data using various techniques like one-versus-one, one-versus-rest, Crammer Singer SVM, Weston Watkins SVM and directed acyclic graph SVM (DAGSVM) etc. SVM with a linear Kernel is called linear SVM and one with a non-linear Kernel is called non-linear SVM. Linear SVM is an efficient technique for high dimensional data applications like document classification, word-sense disambiguation, drug design etc. because under such data applications, test accuracy of linear SVM is closer to non-linear SVM while its training is much faster than non-linear SVM. SVM is continuously evolving since its inception and researchers have proposed many problem formulations, solvers and strategies for solving SVM. Moreover, due to advancements in the technology, data has taken the form of ‘Big Data’ which have posed a challenge for Machine Learning to train a classifier on this large-scale data. In this paper, we have presented a review on evolution of linear support vector machine classification, its solvers, strategies to improve solvers, experimental results, current challenges and research directions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. datasets used in the experiments are available at link: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

References

  • Abe S (2015) Fuzzy support vector machines for multilabel classification. Pattern Recogn 48(6):2110–2117

    Article  MATH  Google Scholar 

  • Abe S, Inoue T (2002) Fuzzy support vector machines for multiclass problems. In: European symposium on artificial neural networks, pp 113–118

  • Beck A, Tetruashvili L (2013) On the convergence of block coordinate descent type methods. SIAM J Optim 23(4):2037–2060

    Article  MathSciNet  MATH  Google Scholar 

  • Becker S, Fadili M (2012) A quasi-newton proximal splitting method. In: Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. Proceedings of a meeting held December 3–6, 2012, Lake Tahoe, Nevada, pp 2627–2635

  • Blondel M, Seki K, Uehara K (2013) Block coordinate descent algorithms for large-scale sparse multiclass classification. Mach Learn 93(1):31–52

    Article  MathSciNet  MATH  Google Scholar 

  • Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual ACM workshop on computational learning theory, ACM Press, pp 144–152

  • Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Lechevallier Y, Saporta G (eds) Proceedings of the 19th international conference on computational statistics (COMPSTAT’2010), Springer, Paris, pp 177–187. http://leon.bottou.org/papers/bottou-2010

  • Bottou L (2012) Neural networks: tricks of the trade. 2nd edn, Springer, Berlin, pp 421–436

  • Bottou L, Lin CJ (2007) Support vector machine solvers. Science 3(1):1–27

    Google Scholar 

  • Bottou L, Curtis FE, Nocedal J (2016) Optimization methods for large-scale machine learning. arXiv:1606.04838

  • Bredensteiner EJ, Bennett KP (1999) Computational optimization: a tribute to olvi mangasarian, vol I, Springer, Boston, chap Multicategory Classification by Support Vector Machines, pp 53–79

  • Byrd RH, Hansen SL, Nocedal J, Singer Y (2016) A stochastic quasi-newton method for large-scale optimization. SIAM J Optim 26(2):1008–1031

    Article  MathSciNet  MATH  Google Scholar 

  • Cao LJ, Keerthi SS, Ong CJ, Zhang JQ, Periyathamby U, Fu XJ, Lee HP (2006) Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans Neural Netw 17(4):1039–1049

    Article  Google Scholar 

  • Cauchy AL (1847) Méthode générale pour la résolution des systèmes d’équations simultanées. Compte Rendu des S’eances de L’Acad’emie des Sciences XXV S’erie A(25):536–538

    Google Scholar 

  • Chang CC, Hsu CW, Lin CJ (2000) The analysis of decomposition methods for support vector machines. IEEE Transactions on Neural Networks 11(4):1003–1008

  • Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27

    Article  Google Scholar 

  • Chang KW, Hsieh CJ, Lin CJ (2008) Coordinate descent method for large-scale l2-loss linear support vector machines. J Mach Learn Res 9:1369–1398

    MathSciNet  MATH  Google Scholar 

  • Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178

    Article  MathSciNet  MATH  Google Scholar 

  • Chauhan VK, Dahiya K, Sharma A (2017a) Mini-batch block-coordinate based stochastic average adjusted gradient methods to solve big data problems. In: Zhang ML, Noh YK (eds) Proceedings of the ninth Asian conference on machine learning, PMLR, Proceedings of machine learning research, vol 77, pp 49–64. http://proceedings.mlr.press/v77/chauhan17a.html

  • Chauhan VK, Dahiya K, Sharma A (2017b) Trust region levenberg-marquardt method for linear svm. In: Ninth international conference on advances in pattern recognition (ICAPR-2017)

  • Yc C, Su C (2016) Distance-based margin support vector machine for classification. Appl Math Comput 283:141–152

    MathSciNet  Google Scholar 

  • Collobert R, Bengio S, Bengio Y (2002) A parallel mixture of svms for very large scale problems. Neural Comput 14(5):1105–1114

    Article  MATH  Google Scholar 

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

    MATH  Google Scholar 

  • Cotter A, Shalev-Shwartz S, Srebro N (2013) Learning optimally sparse support vector machines. In: Proceedings of the 30th ICML, pp 266–274

  • Crammer K, Singer Y (2002) On the learnability and design of output codes for multiclass problems. Mach Learn 47(1995):201–233

    Article  MATH  Google Scholar 

  • De S, Taylor G, Goldstein T (2015) Variance reduction for distributed stochastic gradient descent. arXiv:1512.01708

  • Defazio A, Bach F, Lacoste-Julien S (2014) Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Proceedings of the 27th international conference on neural information processing systems, MIT Press, Cambridge, NIPS’14, pp 1646–1654

  • Evgeniou T, Pontil M (2004) Regularized multi–task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’04, pp 109–117

  • Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. In: Advances in computational mathematics, MIT Press, Cambridge, pp 1–50

  • Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) LIBLINEAR: a library for large linear classification. JMLR 9:1871–1874

    MATH  Google Scholar 

  • Farquhar JDR, Meng H, Szedmak S, Hardoon DR, Shawe-Taylor J (2006) Two view learning: Svm-2k, theory and practice. In: Advances in neural information processing systems, MIT Press, Cambridge

  • Ferris MC, Munson TS (2002) Interior-point methods for massive support vector machines. SIAM J Optim 13(3):783–804

    Article  MathSciNet  MATH  Google Scholar 

  • Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Comput J 7(2):149–154. https://doi.org/10.1093/comjnl/7.2.149

    Article  MathSciNet  MATH  Google Scholar 

  • Franc V, Sonnenburg S (2008) Optimized cutting plane algorithm for support vector machines. In: Proceedings of the 25th international conference on machine learning, ACM, New York, NY, ICML ’08, pp 320–327

  • Friedman J (1996) Another approach to polychotomous classification. Stanford University, Tech Rep, Dept Statistics

  • Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’01, pp 77–86

  • Gao Y, Sun S (2010) An empirical evaluation of linear and nonlinear kernels for text classification using support vector machines. In: 2010 seventh international conference on fuzzy systems and knowledge discovery, vol 4, pp 1502–1505, https://doi.org/10.1109/FSKD.2010.5569327

  • Gibson N (2011) Gradient-based methods for optimization. part i

  • Graf HP, Cosatto E, Bottou L, Durdanovic I, Vapnik V (2005) Parallel support vector machines: the cascade SVM. In: Advances in neural information processing systems pp 521–528

  • Guermeur Y (2002) Combining discriminant models with new multi-class svms. Pattern Anal Appl 5(2):168–179

    Article  MathSciNet  MATH  Google Scholar 

  • Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49(6):409–436

    Article  MathSciNet  MATH  Google Scholar 

  • Hsieh CJ, Chang KW, Lin CJ, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear svm. In: Proceedings of the 25th international conference on machine learning, ACM, New York, NY, ICML ’08, pp 408–415

  • Hsieh CJ, Si S, Dhillon I (2014) A divide-and-conquer solver for Kernel support vector machines. JMLR W&Cp 32(1):566–574

    Google Scholar 

  • Hsu CW, Lin CJ (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 13(2):415–425. https://doi.org/10.1109/72.991427

    Article  Google Scholar 

  • Jayadeva, Khemchandani R, Chandra S (2004) Fast and robust learning through fuzzy linear proximal support vector machines. Neurocomput 61(C):401–411

    Article  Google Scholar 

  • Jayadeva KR, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910

    Article  MATH  Google Scholar 

  • Khemchandani R, Chandra S (2017) Twin support vector machines–models, extensions and applications, studies in computational intelligence, vol 659. Springer, Berlin

  • Ji Y, Sun S (2011) Multitask multiclass support vector machines. In: 2011 IEEE 11th international conference on data mining workshops, pp 512–518

  • Ji Y, Sun S (2013) Multitask multiclass support vector machines: model and experiments. Pattern Recogn 46(3):914–924

    Article  MATH  Google Scholar 

  • Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the sixteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, ICML ’99, pp 200–209

  • Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD ’06, pp 217–226

  • Joachims T, Dortmund U, Joachimscsuni-dortmundde T (1999) Making large-scale SVM learning practical. In: Advances in Kernel methods–support vector learning, pp 41–56

  • Joachims T, Finley T, Yu CNJ (2009) Cutting-plane training of structural svms. Mach Learn 77(1):27–59

    Article  MATH  Google Scholar 

  • Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems 26, Curran Associates, Inc., pp 315–323

  • Kao W, Chung K, Sun C, Lin C (2004) Decomposition methods for linear support vector machines. Neural Comput 16:1689–1704

    Article  MATH  Google Scholar 

  • Keerthi SS, DeCoste D (2005) A modified finite newton method for fast solution of large scale linear svms. JMLR 6:341–361

    MathSciNet  MATH  Google Scholar 

  • Keerthi SS, Decoste D (2006) Building support vector machines with reduced classifier complexity. JMLR 7:1493–1515

    MathSciNet  MATH  Google Scholar 

  • Keerthi SS, Sundararajan S, Chang KW, Hsieh CJ, Lin CJ (2008) A sequential dual method for large scale multi-class linear svms. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, KDD ’08, pp 408–416

  • Knerr S, Personnaz L, Dreyfus G (1990) Neurocomputing: algorithms, architectures and applications. Springer, Berlin, pp 41–50

    Book  Google Scholar 

  • Konečný J, Richtárik P (2013) Semi-stochastic gradient descent methods, 1:19. arXiv:1312.1666

  • Kressel UHG (1999) Advances in Kernel methods. MIT Press, Cambridge, MA, pp 255–268

    Google Scholar 

  • Ladicky L, Torr P (2011) Locally linear support vector machines. ICML, Stockholm, pp 985–992

    Google Scholar 

  • Lazar A (2010) A comparison of linear support vector machine algorithms on large non-sparse datasets. In: Proceedings–9th ICMLA

  • Lee CP, Lin CJ (2013) A study on l2-loss (squared hinge-loss) multi-class svm. Neural Comput 25:1302–1323

    Article  MathSciNet  MATH  Google Scholar 

  • Lee CP, Roth D (2015) Distributed box-constrained quadratic optimization for dual linear SVM. In: Proceedings of the 32nd ICML, 37

  • Lee YJ, Mangasarian O (2001) Ssvm: a smooth support vector machine for classification. Comput Optim Appl 20(1):5–22

    Article  MathSciNet  MATH  Google Scholar 

  • Levenberg K (1944) A method for the solution of certain problems in least squares. Quart Appl Math 2:164–168

    Article  MathSciNet  MATH  Google Scholar 

  • Li L, Li Y, Su H, Chu J (2006) Intelligent computing: international conference on intelligent computing, ICIC 2006, Kunming, China, August 16–19, 2006. Proceedings, Part I, Springer Berlin Heidelberg, Berlin, Heidelberg, chap Least Squares Support Vector Machines Based on Support Vector Degrees, pp 1275–1281

  • Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014a) Scaling distributed machine learning with the parameter server. In: Proceedings of the 11th USENIX conference on operating systems design and implementation, USENIX Association, Berkeley, CA, OSDI’14, pp 583–598

  • Li M, Zhang T, Chen Y, Smola AJ (2014b) Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’14, pp 661–670

  • Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471

    Article  Google Scholar 

  • Lin CJ (2001) On the convergence of the decomposition method for support vector machines. IEEE Trans Neural Netw 12(6):1288–1298

    Article  Google Scholar 

  • Lin CJ, Weng RC, Keerthi SS (2008) Trust region newton method for logistic regression. JMLR 9:627–650

    MathSciNet  MATH  Google Scholar 

  • Lin CY, Tsai CH, Lee CP, Lin CJ (2014) Large-scale logistic regression and linear support vector machines using spark. In: 2014 IEEE international conference on Big Data (Big Data) pp 519–528

  • Lin HT, Lin CJ, Weng RC (2007) A note on platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276

    Article  Google Scholar 

  • Liu D, Shi Y, Tian Y, Huang X (2016) Ramp loss least squares support vector machine. J Comput Sci 14:61–68

    Article  MathSciNet  Google Scholar 

  • Lu Z, Xiao L (2015) On the complexity analysis of randomized block-coordinate descent methods. Math Program 152(1):615–642

    Article  MathSciNet  MATH  Google Scholar 

  • Luss R, d’Aspremont A (2009) Support vector machine classification with indefinite kernels. Math Program Comput 1(2):97–118

    Article  MathSciNet  MATH  Google Scholar 

  • Mangasarian OL (2001) A finite newton method for classification problems. Tech. rep., Data Mining Institute, Computer Sciences Department, University of Wisconsin

  • Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7:1517–1530

    MathSciNet  MATH  Google Scholar 

  • Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. JMLR 1:161–177

    MathSciNet  MATH  Google Scholar 

  • Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74

    Article  Google Scholar 

  • Manno A, Palagi L, Sagratella S (2015) A class of parallel decomposition algorithms for SVMs training, pp 1–24

  • Marquardt D (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441

    Article  MathSciNet  MATH  Google Scholar 

  • Moré JJ (1978) Numerical analysis. In: Proceedings of the biennial conference held at Dundee, June 28–July 1, 1977, Springer, Berlin, pp 105–116

  • Needell D, Srebro N, Ward R (2014) Stochastic gradient descent, weighted sampling, and the randomized kaczmarz algorithm. In: Proceedings of the 27th international conference on neural information processing systems, MIT Press, Cambridge, MA, NIPS’14, pp 1017–1025

  • Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J Optim 22(2):341–362

    Article  MathSciNet  MATH  Google Scholar 

  • Nie F, Huang Y, Wang X, Huang H (2014) New primal SVM solver with linear computational cost for big data classifications. In: Proceedings of the 31st international conference on international conference on machine learning, Vol 32, JMLR.org, ICML’14, pp II–505–II–513

  • Nie F, Wang X, Huang H (2017) Multiclass capped Lp-norm SVM for robust classifications. The 31st AAAI Conference on Artificial Intelligence (AAAI), San Francisco, USA

  • Nocedal WS (1999) Numerical optimization. Springer, New York

    Book  MATH  Google Scholar 

  • Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239

    Article  Google Scholar 

  • Parlett B (1998) The symmetric eigenvalue problem. Society for Industrial and Applied Mathematics, Philadelphia, PA

  • Peng X (2010) A nu-twin support vector machine (nu-tsvm) classifier and its geometric algorithms. Inf Sci 180(20):3863–3875

    Article  MATH  Google Scholar 

  • Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines. Microsoft Res pp MSR–TR–98–14

  • Platt JC, Cristianini N, Shawe-Taylor J (2000) Large margin dags for multiclass classification. Adv Neural Inf Process Syst 12:547–553

    Google Scholar 

  • Prez-Cruz F, Weston J, Herrmann D, Schölkopf B (2003) Extension of the nu-SVM range for classification. Nato Sci Series Sub Series iii Comput Syst Sci 190:179–196

    Google Scholar 

  • Ranganathan A (2004) The levenberq-marquardt algorithm

  • Recht B, Re C, Wright S, Niu F (2011) Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems 24, Curran Associates, Inc., pp 693–701

  • Richtárik P, Takáč M (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math Program 144(1–2):1–38

    Article  MathSciNet  MATH  Google Scholar 

  • Richtárik P, Takáč M (2016) Parallel coordinate descent methods for big data optimization. Math Program 156(1):433–484

    Article  MathSciNet  MATH  Google Scholar 

  • Schmidt M, Le Roux N, Bach F (2016) Minimizing finite sums with the stochastic average gradient. Math Program pp 1–30

  • Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245

    Article  Google Scholar 

  • Shalev-Shwartz S, Singer Y (2006) Online learning meets optimization in the dual, pp 423–437

  • Shalev-Shwartz S, Zhang T (2013a) Accelerated mini-batch stochastic dual coordinate ascent. In: NIPS’ 13 Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol 1. pp 378–385

  • Shalev-Shwartz S, Zhang T (2013b) Stochastic dual coordinate ascent methods for regularized loss. JMLR 14(1):567–599

    MathSciNet  MATH  Google Scholar 

  • Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th international conference on machine learning, ACM, New York, NY, ICML ’07, pp 807–814

  • Shawe-Taylor J, Sun S (2011) A review of optimization methodologies in support vector machines. Neurocomputing 74(17):3609–3618

    Article  Google Scholar 

  • Shi Z, Liu R (2016) A better convergence analysis of the block coordinate descent method for large scale machine learning. arXiv:1608.04826

  • Stefano M, Mikhail B (2011) Laplacian support vector machines trained in the primal. JMLR 12:1149–1184

    MathSciNet  MATH  Google Scholar 

  • Steinwart I (2004) Sparseness of support vector machinessome asymptotically sharp bounds. Adv Neural Inf Process Syst 16:1069–1076

    Google Scholar 

  • Stempfel G, Ralaivola L (2009) Learning SVMs from sloppily labeled data. In: Proceedings of the 19th international conference on artificial neural networks: Part I, Springer, Berlin, ICANN ’09, pp 884–893

  • Sun S, Xie X (2016) Semisupervised support vector machines with tangent space intrinsic manifold regularization. IEEE Trans Neural Netw Learn Syst 27(9):1827–1839

    Article  MathSciNet  Google Scholar 

  • Suykens J, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300

    Article  Google Scholar 

  • Tak M, Bijral A, Richtrik P, Srebro N (2013) Mini-batch primal and dual methods for SVMs. In: 30th international conference on machine learning, Springer, Berlin, pp 537–552

  • Teo CH, Smola A, Vishwanathan SV, Le QV (2007) A scalable modular convex solver for regularized risk minimization. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’07, pp 727–736

  • Teo CH, Vishwanthan S, Smola AJ, Le QV (2010) Bundle methods for regularized risk minimization. JMLR 11:311–365

    MathSciNet  MATH  Google Scholar 

  • Tsai CH, Lin CJCY (2014) Incremental and decremental training for linear classification. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, pp 343–352

  • Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494

    Article  MathSciNet  MATH  Google Scholar 

  • Tsochantaridis I, Hofmann T, Joachims T, Altun Y (2004) Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the twenty-first international conference on machine learning, ACM, New York, NY, ICML ’04, p 104

  • Vapnik VN (1998) Statistical learning theory, 1st edn. Wiley, New York

    MATH  Google Scholar 

  • Wang Z, Crammer K, Vucetic S (2012) Breaking the curse of kernelization : budgeted stochastic gradient descent for large-scale SVM training. JMLR 13(1):3103–3131

    MathSciNet  MATH  Google Scholar 

  • Wen T, Edelman A, Gorsich D (2003) Contemporary mathematics. American Mathematical Society, Boston, MA, chap A Fast Projected Conjugate Gradient Algorithm for Training Support Vector Machines, pp 245–263

  • Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: ESANN

  • Willoughby RA (1979) Solutions of ill-posed problems (a. n. tikhonov and v. y. arsenin). SIAM Rev 21(2):266–267

    Article  Google Scholar 

  • Woodsend K, Gondzio J (2009) Hybrid MPI/OpenMP parallel linear support vector machine training. JMLR 10:1937–1953

    MathSciNet  MATH  Google Scholar 

  • Wright SJ (2015) Coordinate descent algorithms. Math Program 151(1):3–34

    Article  MathSciNet  MATH  Google Scholar 

  • Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J Optim 24(4):2057–2075

    Article  MathSciNet  MATH  Google Scholar 

  • Xie X, Sun S (2015) Multi-view twin support vector machines. Intell Data Anal 19(4):701–712

    Article  Google Scholar 

  • Xu C, Tao D, Xu C (2013) A survey on multi-view learning, pp 1–59. arXiv:1304.5634v1

  • Xu Y, Yin W (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J Imaging Sci 6(3):1758–1789

    Article  MathSciNet  MATH  Google Scholar 

  • Xu Y, Yin W (2015) Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J Optim 25(3):1686–1716

    Article  MathSciNet  MATH  Google Scholar 

  • Yang Y, Chen J, Zhu J (2016) Distributing the stochastic gradient sampler for large-scale lda. In: Proceedings of the 22Nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’16, pp 1975–1984

  • Ye J, Xiong T (2007) SVM versus least squares SVM. In: JMLR W&P, pp 644–651

  • Yuan GX, Chang KW, Hsieh CJ, Lin CJ (2010) A comparison of optimization methods and software for large-scale L1-regularized linear classification. JMLR 11:3183–3234

    MathSciNet  MATH  Google Scholar 

  • Yuan GX, Ho CH, Lin CJ (2012) Recent advances of large-scale linear classification. Proc IEEE 100(9):2584–2603

    Article  Google Scholar 

  • Zanghirati G, Zanni L (2003) A parallel solver for large quadratic programs in training support vector machines. Parallel Comput 29(4):535–551

    Article  MathSciNet  MATH  Google Scholar 

  • Zanni L, Serafini T, Zanghirati G (2006) Parallel software for training large scale support vector machines on multiprocessor systems. JMLR 7:1467–1492

    MathSciNet  MATH  Google Scholar 

  • Zhang A, Gu Q (2016) Accelerated stochastic block coordinate descent with optimal sampling. In: Proceedings of the 22Nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’16, pp 2035–2044

  • Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on machine learning, ACM, New York, NY, ICML ’04

  • Zhu J, Rosset S, Hastie T, Tibshirani R (2003) 1-norm support vector machines. In: Neural information processing systems, MIT Press, Cambridge, p 16

Download references

Acknowledgements

First author would like to thank Ministry of Human Resource Development, Government of INDIA, for providing fellowship (University Grants Commission–Senior Research Fellowship) to pursue his Ph.D. work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Anuj Sharma.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chauhan, V.K., Dahiya, K. & Sharma, A. Problem formulations and solvers in linear SVM: a review. Artif Intell Rev 52, 803–855 (2019). https://doi.org/10.1007/s10462-018-9614-6

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10462-018-9614-6

Keywords