Abstract
Given multiple prediction problems such as regression or classification, we are interested in a joint inference framework that can effectively share information between tasks to improve prediction accuracy, especially when the number of training examples per task is small. In this paper we propose a probabilistic framework that supports a family of latent variable models for different multi-task learning scenarios. We show that the framework generalizes standard learning methods for single prediction problems and effectively models the shared structure among different prediction tasks. Furthermore, we present efficient algorithms for both empirical Bayes estimation and point estimation. Our experiments on simulated datasets and real-world classification datasets demonstrate the effectiveness of the proposed models in two evaluation settings: a standard multi-task learning setting and a transfer learning setting.
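As a rough, self-contained illustration of the information-sharing idea only (a shrinkage-to-shared-prior sketch, not the specific latent variable models proposed in the paper), the Python snippet below fits per-task ridge regressions whose weight vectors are pulled toward a shared prior mean, which is itself re-estimated from all tasks in the spirit of empirical Bayes. All function names and hyperparameter values are illustrative assumptions.

    # Minimal sketch: each task's weights w_t are assumed drawn from a
    # shared Gaussian prior N(mu, sigma2 * I). We alternate between the
    # MAP estimate of each task's weights given mu and re-estimating mu
    # from all tasks. Hypothetical names/values; not the paper's model.
    import numpy as np

    def fit_multitask_ridge(tasks, sigma2=1.0, noise2=0.1, n_iters=10):
        """tasks: list of (X_t, y_t) pairs, X_t of shape (n_t, d)."""
        d = tasks[0][0].shape[1]
        mu = np.zeros(d)                      # shared prior mean
        for _ in range(n_iters):
            Ws = []
            for X, y in tasks:
                # MAP weights under the current prior: solve
                # (X'X/noise2 + I/sigma2) w = X'y/noise2 + mu/sigma2
                A = X.T @ X / noise2 + np.eye(d) / sigma2
                b = X.T @ y / noise2 + mu / sigma2
                Ws.append(np.linalg.solve(A, b))
            mu = np.mean(Ws, axis=0)          # re-estimate shared mean
        return mu, Ws

    # Tiny usage example with simulated related tasks
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=3)
    tasks = []
    for _ in range(5):
        X = rng.normal(size=(8, 3))           # few examples per task
        w_t = w_true + 0.1 * rng.normal(size=3)
        tasks.append((X, X @ w_t + 0.05 * rng.normal(size=8)))
    mu, Ws = fit_multitask_ridge(tasks)

With only a handful of examples per task, the shared mean acts as a data-driven prior, so each task's estimate borrows strength from the others; this is the general effect the paper's framework aims for, realized there through richer latent variable models.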
Editor: Daniel L. Silver, Kristin Bennett, Richard Caruana.
Cite this article
Zhang, J., Ghahramani, Z. & Yang, Y. Flexible latent variable models for multi-task learning. Mach Learn 73, 221–242 (2008). https://doi.org/10.1007/s10994-008-5050-1