Task clustering and gating for Bayesian multitask learning

Published: 01 December 2003

Abstract

Modeling a collection of similar regression or classification tasks can be improved by making the tasks 'learn from each other'. In machine learning, this subject is approached through 'multitask learning', where parallel tasks are modeled as multiple outputs of the same network. In multilevel analysis this is generally implemented through the mixed-effects linear model where a distinction is made between 'fixed effects', which are the same for all tasks, and 'random effects', which may vary between tasks. In the present article we will adopt a Bayesian approach in which some of the model parameters are shared (the same for all tasks) and others more loosely connected through a joint prior distribution that can be learned from the data. We seek in this way to combine the best parts of both the statistical multilevel approach and the neural network machinery. The standard assumption expressed in both approaches is that each task can learn equally well from any other task. In this article we extend the model by allowing more differentiation in the similarities between tasks. One such extension is to make the prior mean depend on higher-level task characteristics. More unsupervised clustering of tasks is obtained if we go from a single Gaussian prior to a mixture of Gaussians. This can be further generalized to a mixture of experts architecture with the gates depending on task characteristics. All three extensions are demonstrated through application both on an artificial data set and on two real-world problems, one a school problem and the other involving single-copy newspaper sales.
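
To make the mixture-of-Gaussians extension concrete, the sketch below is a minimal, hypothetical illustration in Python/NumPy, not the algorithm from the paper: it fits each task's regression weights independently, then runs EM for a spherical Gaussian mixture over those fitted weight vectors, which is the role the mixture prior plays in clustering similar tasks. The synthetic data, cluster count, and ridge penalty are all assumptions made for illustration.

    # Minimal sketch (illustrative, not the paper's algorithm): task clustering
    # via a mixture of Gaussians over per-task regression weights.
    # Two-stage approximation: (1) independent ridge fit per task,
    # (2) EM for a spherical Gaussian mixture over the fitted weights.
    import numpy as np

    rng = np.random.default_rng(0)
    n_tasks, n_samples, dim, n_clusters = 20, 30, 3, 2

    # Synthetic data: each task's true weights come from one of two clusters.
    centers = 3.0 * rng.normal(size=(n_clusters, dim))
    z_true = rng.integers(n_clusters, size=n_tasks)
    w_true = centers[z_true] + 0.3 * rng.normal(size=(n_tasks, dim))
    X = rng.normal(size=(n_tasks, n_samples, dim))
    y = np.einsum('tnd,td->tn', X, w_true) + 0.1 * rng.normal(size=(n_tasks, n_samples))

    # Stage 1: independent ridge estimate of each task's weight vector.
    w_hat = np.stack([
        np.linalg.solve(X[t].T @ X[t] + 0.1 * np.eye(dim), X[t].T @ y[t])
        for t in range(n_tasks)
    ])

    # Stage 2: EM for a spherical Gaussian mixture over the task weights.
    mu = w_hat[rng.choice(n_tasks, size=n_clusters, replace=False)]
    var = 1.0
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(50):
        # E-step: responsibility r[t, k] of cluster k for task t.
        sq = ((w_hat[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        log_r = np.log(pi) - 0.5 * sq / var - 0.5 * dim * np.log(var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, cluster means, shared variance.
        nk = r.sum(axis=0)
        pi = nk / n_tasks
        mu = (r.T @ w_hat) / nk[:, None]
        sq = ((w_hat[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = max((r * sq).sum() / (n_tasks * dim), 1e-8)

    print("recovered clusters:", r.argmax(axis=1))
    print("true clusters:     ", z_true)

In the paper itself the prior and the task-level posteriors are learned jointly (and, in the gating extension, cluster membership depends on task characteristics); the two-stage version above only conveys why a mixture prior groups similar tasks.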

Published In

The Journal of Machine Learning Research, Volume 4 (December 2003), 1486 pages.
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
