Multitask Learning

Rich Caruana¹

Abstract

Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and presents new results for MTL with k-nearest neighbor and kernel regression. In this paper we demonstrate multitask learning in three domains. We explain how multitask learning works, and show that there are many opportunities for multitask learning in real domains. We present an algorithm and results for multitask learning with case-based methods like k-nearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees. Because multitask learning works, can be applied to many different kinds of domains, and can be used with different learning algorithms, we conjecture there will be many opportunities for its use on real-world problems.
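The idea of learning tasks in parallel over a shared representation can be illustrated with a minimal sketch. It assumes a one-hidden-layer backprop net in NumPy with one linear output per task. The function name `train_shared_net`, the hyperparameters, and the synthetic two-task data are all hypothetical choices made for illustration; they are not the architecture, data, or experiments of the paper.

```python
import numpy as np


def train_shared_net(X, Y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer net with one linear output per task.

    X has shape (n, d); Y has shape (n, t), one column per task.
    The hidden layer (W1, b1) is shared by all tasks; each task owns
    one column of W2. Returns the parameters and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    t = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, t))
    b2 = np.zeros(t)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # shared hidden representation
        err = H @ W2 + b2 - Y      # per-task prediction errors
        # Loss: squared error summed over tasks, averaged over examples.
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        dP = 2.0 * err / n
        # The error signals of all tasks are summed into the shared layer.
        dZ = (dP @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ dP)
        b2 -= lr * dP.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1, W2, b2), losses


# Synthetic data (an assumption for illustration): two tasks that both
# depend on the same underlying feature s = x0 + x1.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
s = X[:, 0] + X[:, 1]
Y = np.column_stack([np.sin(s), np.cos(s)])

params, losses = train_shared_net(X, Y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this sketch the only coupling between tasks is the shared hidden layer: the gradient reaching `W1` is the sum of every task's error signal, which is one way the training signals of related tasks can act as an inductive bias on the representation.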

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213
Rich Caruana

About this article

Cite this article

Caruana, R. Multitask Learning. Machine Learning 28, 41–75 (1997). https://doi.org/10.1023/A:1007379606734

Issue Date: July 1997
DOI: https://doi.org/10.1023/A:1007379606734