Task clustering and gating for Bayesian multitask learning

Published: 01 December 2003

Abstract

Modeling a collection of similar regression or classification tasks can be improved by making the tasks 'learn from each other'. In machine learning, this subject is approached through 'multitask learning', where parallel tasks are modeled as multiple outputs of the same network. In multilevel analysis this is generally implemented through the mixed-effects linear model where a distinction is made between 'fixed effects', which are the same for all tasks, and 'random effects', which may vary between tasks. In the present article we will adopt a Bayesian approach in which some of the model parameters are shared (the same for all tasks) and others more loosely connected through a joint prior distribution that can be learned from the data. We seek in this way to combine the best parts of both the statistical multilevel approach and the neural network machinery. The standard assumption expressed in both approaches is that each task can learn equally well from any other task. In this article we extend the model by allowing more differentiation in the similarities between tasks. One such extension is to make the prior mean depend on higher-level task characteristics. More unsupervised clustering of tasks is obtained if we go from a single Gaussian prior to a mixture of Gaussians. This can be further generalized to a mixture of experts architecture with the gates depending on task characteristics. All three extensions are demonstrated through application both on an artificial data set and on two real-world problems, one a school problem and the other involving single-copy newspaper sales.
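
To make the mixture-of-Gaussians extension concrete, the sketch below is a minimal, hypothetical illustration in Python/NumPy, not the algorithm from the paper: it fits each task's regression weights independently, then runs EM for a spherical Gaussian mixture over those fitted weight vectors, which is the role the mixture prior plays in clustering similar tasks. The synthetic data, cluster count, and ridge penalty are all assumptions made for illustration.

    # Minimal sketch (illustrative, not the paper's algorithm): task clustering
    # via a mixture of Gaussians over per-task regression weights.
    # Two-stage approximation: (1) independent ridge fit per task,
    # (2) EM for a spherical Gaussian mixture over the fitted weights.
    import numpy as np

    rng = np.random.default_rng(0)
    n_tasks, n_samples, dim, n_clusters = 20, 30, 3, 2

    # Synthetic data: each task's true weights come from one of two clusters.
    centers = 3.0 * rng.normal(size=(n_clusters, dim))
    z_true = rng.integers(n_clusters, size=n_tasks)
    w_true = centers[z_true] + 0.3 * rng.normal(size=(n_tasks, dim))
    X = rng.normal(size=(n_tasks, n_samples, dim))
    y = np.einsum('tnd,td->tn', X, w_true) + 0.1 * rng.normal(size=(n_tasks, n_samples))

    # Stage 1: independent ridge estimate of each task's weight vector.
    w_hat = np.stack([
        np.linalg.solve(X[t].T @ X[t] + 0.1 * np.eye(dim), X[t].T @ y[t])
        for t in range(n_tasks)
    ])

    # Stage 2: EM for a spherical Gaussian mixture over the task weights.
    mu = w_hat[rng.choice(n_tasks, size=n_clusters, replace=False)]
    var = 1.0
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(50):
        # E-step: responsibility r[t, k] of cluster k for task t.
        sq = ((w_hat[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        log_r = np.log(pi) - 0.5 * sq / var - 0.5 * dim * np.log(var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, cluster means, shared variance.
        nk = r.sum(axis=0)
        pi = nk / n_tasks
        mu = (r.T @ w_hat) / nk[:, None]
        sq = ((w_hat[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = max((r * sq).sum() / (n_tasks * dim), 1e-8)

    print("recovered clusters:", r.argmax(axis=1))
    print("true clusters:     ", z_true)

In the paper itself the prior and the task-level posteriors are learned jointly (and, in the gating extension, cluster membership depends on task characteristics); the two-stage version above only conveys why a mixture prior groups similar tasks.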

Published In

The Journal of Machine Learning Research, Volume 4 (December 2003), 1486 pages.
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
