Abstract
We present a method for learning sparse representations shared across multiple tasks. The method generalizes the well-known single-task 1-norm regularization and is based on a novel non-convex regularizer that controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem, for which we present an iterative algorithm that converges to an optimal solution. The algorithm has a simple interpretation: it alternates between a supervised step, in which it learns task-specific functions, and an unsupervised step, in which it learns sparse representations of these functions that are common across the tasks. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve performance relative to learning each task independently and yield a small number of learned features common across related tasks. As a special case, our algorithm can also be used to simply select, rather than learn, a few common variables across the tasks.
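To make the alternating scheme concrete, the sketch below shows one standard instantiation for the squared loss, consistent with the convex reformulation the abstract describes: tasks are coupled through a shared positive semidefinite matrix D, the supervised step solves an independent D-weighted ridge problem per task, and the unsupervised step updates D from the matrix square root of W Wᵀ. This is a minimal illustration under these assumptions; the function name, data layout, and smoothing constant eps are our illustrative choices, not code from the paper.

```python
# Minimal sketch of the alternating supervised/unsupervised scheme from the
# abstract, assuming a squared loss. W stacks the task weight vectors as
# columns; D is the shared feature matrix of the convex reformulation.
# The smoothing constant eps keeps D invertible and is an illustrative choice.
import numpy as np
from scipy.linalg import sqrtm

def multitask_feature_learning(X, Y, gamma=1.0, n_iter=50, eps=1e-6):
    """X: list of (n_t, d) task design matrices; Y: list of (n_t,) targets."""
    d = X[0].shape[1]
    T = len(X)
    D = np.eye(d) / d                        # start from the isotropic metric
    W = np.zeros((d, T))
    for _ in range(n_iter):
        # Supervised step: with D fixed, each task decouples into a
        # D-weighted ridge regression solved via its normal equations,
        # (X_t^T X_t + gamma * D^{-1}) w_t = X_t^T y_t.
        D_inv = np.linalg.inv(D + eps * np.eye(d))
        for t in range(T):
            A = X[t].T @ X[t] + gamma * D_inv
            W[:, t] = np.linalg.solve(A, X[t].T @ Y[t])
        # Unsupervised step: with W fixed, the optimal shared matrix is the
        # trace-normalized matrix square root of W W^T.
        C = sqrtm(W @ W.T + eps * np.eye(d)).real
        D = C / np.trace(C)
    return W, D
```

After convergence, eigenvalues of D that shrink toward zero suppress the corresponding feature directions in every task simultaneously, which is how an alternation of this kind arrives at a small set of features shared across tasks.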
References
Aaker, D. A., Kumar, V., & Day, G. S. (2004). Marketing research (8th ed.). New York: Wiley.
Abernethy, J., Bach, F., Evgeniou, T., & Vert, J.-P. (2006). Low-rank matrix factorization with attributes (Technical Report 2006/68/TOM/DS). INSEAD, Working paper.
Ando, R. K., & Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817–1853.
Argyriou, A., Micchelli, C. A., & Pontil, M. (2005). Learning convex combinations of continuously parameterized basic kernels. In Lecture notes in artificial intelligence : Vol. 3559. Proceedings of the 18th annual conference on learning theory (COLT) (pp. 338–352). Berlin: Springer.
Argyriou, A., Evgeniou, T., & Pontil, M. (2007a). Multi-task feature learning. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems (Vol. 19, pp. 41–48). Cambridge: MIT Press.
Argyriou, A., Micchelli, C. A., & Pontil, M. (2007b). Representer theorems for spectral norms. Working paper, Dept. of Computer Science, University College London.
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404.
Bakker, B., & Heskes, T. (2003). Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research, 4, 83–99.
Baxter, J. (2000). A model for inductive bias learning. Journal of Artificial Intelligence Research, 12, 149–198.
Ben-David, S., & Schuller, R. (2003). Exploiting task relatedness for multiple task learning. In Lecture notes in computer science : Vol. 2777. Proceedings of the 16th annual conference on learning theory (COLT) (pp. 567–580). Berlin: Springer.
Bennett, K. P., & Embrechts, M. J. (2003). An optimization perspective on partial least squares. In J. A. K. Suykens, G. Horvath, S. Basu, C. Micchelli, J. Vandewalle (Eds.), NATO science series III: computer & systems sciences : Vol. 190. Advances in learning theory: methods, models and applications (pp. 227–250). Amsterdam: IOS Press.
Bhatia, R. (1997). Graduate texts in mathematics. Matrix analysis. New York: Springer.
Borga, M. (1998). Learning multidimensional signal processing. PhD thesis, Dept. of Electrical Engineering, Linköping University, Sweden.
Borwein, J. M., & Lewis, A. S. (2005). CMS books in mathematics. Convex analysis and nonlinear optimization: theory and examples. Berlin: Springer.
Boyd, S. P., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Breiman, L., & Friedman, J. H. (1997). Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society, Series B, 59(1), 3–54.
Caponnetto, A., & De Vito, E. (2006). Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, August 2006.
Caruana, R. (1997). Multi-task learning. Machine Learning, 28, 41–75.
Chapelle, O., & Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 17, pp. 257–264). Cambridge: MIT Press.
Donoho, D. (2004). For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution. Preprint, Dept. of Statistics, Stanford University.
Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637.
Evgeniou, T., Pontil, M., & Toubia, O. (2006). A convex optimization approach to modeling consumer heterogeneity in conjoint estimation (Technical Report). INSEAD.
Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American control conference (Vol. 6, pp. 4734–4739).
Goldstein, H. (1991). Multilevel modelling of survey data. The Statistician, 40, 235–244.
Golub, G. H., & van Loan, C. F. (1996). Matrix computations. Baltimore: Johns Hopkins University Press.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16(12), 2639–2664.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). Springer series in statistics. The elements of statistical learning: data mining, inference and prediction. Berlin: Springer.
Heisele, B., Serre, T., Pontil, M., Vetter, T., & Poggio, T. (2002). Categorization by learning and combining object parts. In Advances in neural information processing systems (Vol. 14, pp. 1239–1245). Cambridge: MIT Press.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377.
Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5, 248–264.
Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st international conference on machine learning.
Lawrence, N. D., & Platt, J. C. (2004). Learning to learn with the informative vector machine. In R. Greiner (Ed.), Proceedings of the 21st international conference on machine learning. Helsinki: Omnipress.
Lenk, P. J., DeSarbo, W. S., Green, P. E., & Young, M. R. (1996). Hierarchical Bayes conjoint analysis: recovery of partworth heterogeneity from reduced experimental designs. Marketing Science, 15(2), 173–191.
Lewis, A. S. (1995). The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2(1), 173–183.
Maurer, A. (2006). Bounds for linear multi-task learning. Journal of Machine Learning Research, 7, 117–139.
Micchelli, C. A., & Pinkus, A. (1994). Variational problems arising from balancing several error criteria. Rendiconti di Matematica, Serie VII, 14, 37–86.
Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17, 177–204.
Neve, M., De Nicolao, G., & Marchesi, L. (2007). Nonparametric identification of population models via Gaussian processes. Automatica (Journal of IFAC), 43(7), 1134–1144.
Obozinski, G., Taskar, B., & Jordan, M. I. (2006). Multi-task feature selection (Technical Report). Department of Statistics, UC Berkeley, June 2006.
Poggio, T., & Girosi, F. (1998). A sparse representation for function approximation. Neural Computation, 10, 1445–1454.
Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., & Poggio, T. (2005). Theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex (AI Memo 2005-036). Massachusetts Institute of Technology.
Srebro, N., Rennie, J. D. M., & Jaakkola, T. S. (2005). Maximum-margin matrix factorization. In Advances in neural information processing systems (Vol. 17, pp. 1329–1336). Cambridge: MIT Press.
Torralba, A., Murphy, K. P., & Freeman, W. T. (2004). Sharing features: efficient boosting procedures for multiclass object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 762–769).
Wahba, G. (1990). CBMS-NSF regional conference series in applied mathematics: Vol. 59. Spline models for observational data. Philadelphia: SIAM.
Wold, S., Ruhe, A., Wold, H., & Dunn III, W. J. (1984). The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing, 5(3), 735–743.
Xue, Y., Liao, X., Carin, L., & Krishnapuram, B. (2007). Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research, 8, 35–63.
Yu, K., Tresp, V., & Schwaighofer, A. (2005). Learning Gaussian processes from multiple tasks. In Proceedings of the 22nd international conference on machine learning.
Zhang, J., Ghahramani, Z., & Yang, Y. (2006). Learning multiple related tasks using latent independent component analysis. In Advances in neural information processing systems (Vol. 18, pp. 1585–1592). Cambridge: MIT Press.
Additional information
Editors: Daniel Silver, Kristin Bennett, Richard Caruana.
This is a longer version of the conference paper (Argyriou et al. in Advances in neural information processing systems, vol. 19, 2007a). It includes new theoretical and experimental results.
About this article
Cite this article
Argyriou, A., Evgeniou, T. & Pontil, M. Convex multi-task feature learning. Mach Learn 73, 243–272 (2008). https://doi.org/10.1007/s10994-007-5040-8