Abstract
Optimization of simulation-based or data-driven systems is a challenging task that has attracted significant attention in the recent literature. An efficient approach for optimizing systems that lack analytical expressions is to fit surrogate models. Due to their increased flexibility, nonlinear interpolating functions, such as radial basis functions and Kriging, have been predominantly used as surrogates for data-driven optimization; however, these methods lead to complex nonconvex formulations. Alternatively, commonly used regression-based surrogates lead to simpler formulations, but they are less flexible and can be inaccurate when the functional form is not known a priori. In this work, we investigate the efficiency of subset selection regression techniques for developing surrogate functions that balance both accuracy and complexity. Subset selection creates sparse regression models by selecting only a subset of the original features, which are linearly combined to generate a diverse set of surrogate models. Five different subset selection techniques are compared with commonly used nonlinear interpolating surrogate functions with respect to optimization solution accuracy, computation time, sampling requirements, and model sparsity. Our results indicate that subset selection-based regression functions exhibit promising performance when the dimensionality is low, while interpolation performs better for higher dimensional problems.
Cite this article
Kim, S.H., Boukouvala, F. Machine learning-based surrogate modeling for data-driven optimization: a comparison of subset selection for regression techniques. Optim Lett 14, 989–1010 (2020). https://doi.org/10.1007/s11590-019-01428-7