Mixed-integer quadratic programming reformulations of multi-task learning models

Matteo Lapucci ^,,
Davide Pucci

Department of Information Engineering, Università degli Studi di Firenze, Via di Santa Marta 3, 50139 Florence, Italy

Received: 13 February 2021 Revised: 12 January 2022 Accepted: 14 February 2022 Published: 24 March 2022

In this manuscript, we consider well-known multi-task learning (MTL) models from the literature for linear regression problems, such as clustered MTL or weakly constrained MTL. We propose novel reformulations of the training problem for these models, based on mixed-integer quadratic programming (MIQP) techniques. We show that our approach allows to drive the optimization process up to certified global optimality, exploiting popular off-the-shelf software solvers. By computational experiments on both synthetic and real-world datasets, we show that this strategy generally leads to improvements in terms of the predictive performance of the models, if compared to the classical local optimization techniques, based on alternating minimization strategies, that are usually employed. We also suggest a number of possible extensions of our model that should further improve the quality of the obtained regressors, introducing, for example, sparsity and features selection elements.
- multitask learning,
- clustered MTL,
- weakly constrained MTL,
- MIQP,
- global optimization
Citation: Matteo Lapucci, Davide Pucci. Mixed-integer quadratic programming reformulations of multi-task learning models[J]. Mathematics in Engineering, 2023, 5(1): 1-16. doi: 10.3934/mine.2023020

Related Papers:

Abstract

In this manuscript, we consider well-known multi-task learning (MTL) models from the literature for linear regression problems, such as clustered MTL or weakly constrained MTL. We propose novel reformulations of the training problem for these models, based on mixed-integer quadratic programming (MIQP) techniques. We show that our approach allows to drive the optimization process up to certified global optimality, exploiting popular off-the-shelf software solvers. By computational experiments on both synthetic and real-world datasets, we show that this strategy generally leads to improvements in terms of the predictive performance of the models, if compared to the classical local optimization techniques, based on alternating minimization strategies, that are usually employed. We also suggest a number of possible extensions of our model that should further improve the quality of the obtained regressors, introducing, for example, sparsity and features selection elements.

References

[1]	A. Agarwal, S. Gerber, H. Daume, Learning multiple tasks using manifold regularization, In: Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2010, 46–54.
[2]	A. Ahmed, A. Das, A. J. Smola, Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising, In: Proceedings of the 7th ACM international conference on Web search and data mining, 2014,153–162.
[3]	R. K. Ando, T. Zhang, P. Bartlett, A framework for learning predictive structures from multiple tasks and unlabeled data, J. Mach. Learn. Res., 6 (2005), 1817–1853.
[4]	A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning, Mach. Learn., 73 (2008), 243–272. http://dx.doi.org/10.1007/s10994-007-5040-8 doi: 10.1007/s10994-007-5040-8
[5]	J. Bai, K. Zhou, G. Xue, H. Zha, G. Sun, B. Tseng, et al., Multi-task learning for learning to rank in web search, In: Proceedings of the 18th ACM conference on Information and knowledge management, 2009, 1549–1552. http://dx.doi.org/10.1145/1645953.1646169
[6]	A. Barzilai, K. Crammer, Convex multi-task learning by clustering, In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015, 65–73.
[7]	D. P. Bertsekas, J. N. Tsitsiklis, Parallel and distributed computation: numerical methods, NJ: Prentice hall Englewood Cliffs, 1989.
[8]	D. Bertsimas, A. King, Logistic regression: From art to science, Statist. Sci., 32 (2017), 367–384. http://dx.doi.org/10.1214/16-STS602 doi: 10.1214/16-STS602
[9]	D. Bertsimas, A. King, R. Mazumder, Best subset selection via a modern optimization lens, Ann. Statist., 44 (2016), 813–852. http://dx.doi.org/10.1214/15-AOS1388 doi: 10.1214/15-AOS1388
[10]	S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Now Publishers Inc, 2011. http://dx.doi.org/10.1561/2200000016
[11]	R. H. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), 1190–1208. http://dx.doi.org/10.1137/0916069 doi: 10.1137/0916069
[12]	R. Caruana, Multitask learning, Mach. Learn., 28 (1997), 41–75. http://dx.doi.org/10.1023/A:1007379606734 doi: 10.1023/A:1007379606734
[13]	J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, 42–50.
[14]	E. Civitelli, M. Lapucci, F. Schoen, A. Sortino, An effective procedure for feature subset selection in logistic regression based on information criteria, Comput. Optim. Appl., 80 (2001), 1–32. http://dx.doi.org/10.1007/s10589-021-00288-1 doi: 10.1007/s10589-021-00288-1
[15]	L. Di Gangi, M. Lapucci, F. Schoen, A. Sortino, An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series, Comput. Optim. Appl., 74 (2019), 919–948. http://dx.doi.org/10.1007/s10589-019-00134-5 doi: 10.1007/s10589-019-00134-5
[16]	T. Evgeniou, C. A. Micchelli, M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615–637.
[17]	T. Evgeniou, M. Pontil, Regularized multi–task learning, In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004,109–117. http://dx.doi.org/10.1145/1014052.1014067
[18]	H. Goldstein, Multilevel modelling of survey data, Journal of the Royal Statistical Society. Series D (The Statistician), 40 (1991), 235–244. http://dx.doi.org/10.2307/2348496 doi: 10.2307/2348496
[19]	A. Gómez, O. A. Prokopyev, A mixed-integer fractional optimization approach to best subset selection, INFORMS J. Comput., 33 (2021), 551–565. http://dx.doi.org/10.1287/ijoc.2020.1031 doi: 10.1287/ijoc.2020.1031
[20]	Gurobi Optimization, LLC, Gurobi optimizer reference manual. Available from: https://www.gurobi.com/documentation/9.5/refman/index.html.
[21]	L. Han, Y. Zhang, Learning multi-level task groups in multi-task learning, In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, 2638–2644.
[22]	Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou, R. Maia, Fusion of multiple parameterisations for dnn-based sinusoidal speech synthesis with multi-task learning, In: Sixteenth annual conference of the international speech communication association, 2015.
[23]	A. Jalali, S. Sanghavi, C. Ruan, P. Ravikumar, A dirty model for multi-task learning, In: Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2010,964–972.
[24]	Z. Kang, K. Grauman, F. Sha, Learning with whom to share in multi-task feature learning, In: Proceedings of the 28 th International Conference on Machine Learning, 2011, 1–8.
[25]	Q. Liu, Q. Xu, V. W. Zheng, H. Xue, Z. Cao, Q. Yang, Multi-task learning for cross-platform sirna efficacy prediction: an in-silico study, BMC Bioinformatics, 11 (2010), 181. http://dx.doi.org/10.1186/1471-2105-11-181 doi: 10.1186/1471-2105-11-181
[26]	X. Lu, Y. Wang, X. Zhou, Z. Zhang, Z. Ling, Traffic sign recognition via multi-modal tree-structure embedded multi-task learning, IEEE Trans. Intell. Transp. Syst., 18 (2016), 960–972. http://dx.doi.org/10.1109/TITS.2016.2598356 doi: 10.1109/TITS.2016.2598356
[27]	R. Miyashiro, Y. Takano, Mixed integer second-order cone programming formulations for variable selection in linear regression, Eur. J. Oper. Res., 247 (2015), 721–731. http://dx.doi.org/10.1016/j.ejor.2015.06.081 doi: 10.1016/j.ejor.2015.06.081
[28]	T. Sato, Y. Takano, R. Miyashiro, A. Yoshise, Feature subset selection for logistic regression via mixed integer optimization, Comput. Optim. Appl., 64 (2016), 865–880. http://dx.doi.org/10.1007/s10589-016-9832-2 doi: 10.1007/s10589-016-9832-2
[29]	C. Su, F. Yang, S. Zhang, Q. Tian, L. S. Davis, W. Gao, Multi-task learning with low rank attribute embedding for person re-identification, In: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, 3739–3747. http://dx.doi.org/10.1109/ICCV.2015.426
[30]	J. Torres, G. Bai, J. Wang, L. Zhao, C. Vaca, C. Abad, Sign-regularized multi-task learning, 2021, arXiv: 2102.11191.
[31]	A. Tsanas, M. Little, P. McSharry, L. Ramig, Accurate telemonitoring of parkinson's disease progression by noninvasive speech tests, IEEE Trans. Biomed. Eng., 57 (2010), 884–893. http://dx.doi.org/10.1109/TBME.2009.2036000 doi: 10.1109/TBME.2009.2036000
[32]	H. Wang, F. Nie, H. Huang, S. Risacher, C. Ding, A. J. Saykin, et al., Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance, In: 2011 International Conference on Computer Vision, 2011,557–562. http://dx.doi.org/10.1109/ICCV.2011.6126288
[33]	H. Wang, F. Nie, H. Huang, J. Yan, S. Kim, S. Risacher, et al., High-order multi-task feature learning to identify longitudinal phenotypic markers for alzheimer's disease progression prediction, In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012, 1277–1285.
[34]	J. Wang, L. Zhao, L. Wu, Multi-convex inequality-constrained alternating direction method of multipliers, 2019, arXiv: 1902.10882.
[35]	C. Williams, E. V. Bonilla, K. M. Chai, Multi-task gaussian process prediction, In: Proceedings of the 20th International Conference on Neural Information Processing Systems, 2007,153–160.
[36]	Z. Wu, C. Valentini-Botinhao, O. Watts, S. King, Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis, In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2015, 4460–4464. http://dx.doi.org/10.1109/ICASSP.2015.7178814
[37]	Q. Xu, S. J. Pan, H. H. Xue, Q. Yang, Multitask learning for protein subcellular location prediction, IEEE/ACM Trans. Comput. Bi, 8 (2010), 748–759. http://dx.doi.org/10.1109/TCBB.2010.22 doi: 10.1109/TCBB.2010.22
[38]	T. Zhang, B. Ghanem, S. Liu, N. Ahuja, Robust visual tracking via multi-task sparse learning, In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, 2042–2049. http://dx.doi.org/10.1109/CVPR.2012.6247908
[39]	Y. Zhang, Q. Yang, A survey on multi-task learning, IEEE Trans. Knowl. Data Eng., in press. http://dx.doi.org/10.1109/TKDE.2021.3070203
[40]	Y. Zhang, D.-Y. Yeung, A regularization approach to learning task relationships in multitask learning, ACM Trans. Knowl. Discov. Data, 8 (2014), 1–31. http://dx.doi.org/10.1145/2538028 doi: 10.1145/2538028
[41]	J. Zheng, L. M. Ni, Time-dependent trajectory regression on road networks via multi-task learning, In: Proceedings of the twenty-seventh AAAI conference on artificial intelligence, 2013, 1048–1055.
[42]	J. Zhou, J. Chen, J. Ye, Clustered multi-task learning via alternating structure optimization, Adv. Neural Inf. Process. Syst., 24 (2011), 702–710.
[43]	Q. Zhou, Q. Zhao, Flexible clustered multi-task learning by learning representative tasks, IEEE Trans. Pattern Anal., 38 (2015), 266–278. http://dx.doi.org/10.1109/TPAMI.2015.2452911 doi: 10.1109/TPAMI.2015.2452911

Reader Comments

Your name:*

Email:*
© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)