DOI: 10.1145/1273496.1273558

Learning a meta-level prior for feature relevance from multiple related tasks

Published: 20 June 2007

Abstract

In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning (learning on an ensemble of related tasks) to construct an informative prior on feature relevance. We assume that features themselves have meta-features that are predictive of their relevance to the prediction task, and we model their relevance as a function of the meta-features using hyperparameters (called meta-priors). We present a convex optimization algorithm for simultaneously learning the meta-priors and feature weights from an ensemble of related prediction tasks that share a similar relevance structure. Our approach transfers the meta-priors among different tasks, which makes it possible to deal with settings where tasks have nonoverlapping features or where the relevance of the features varies across tasks. We show that learning feature relevance improves performance on two real data sets that illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing arguments of a verb in a sentence.
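
To make the idea of a meta-feature-driven prior concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a log-linear link from meta-features to a per-feature Gaussian prior variance, a linear regression model, and hand-set meta-prior values; the abstract specifies none of these, and the convex procedure that learns the meta-priors jointly across tasks is not reproduced here. The names `prior_variances`, `map_weights`, `meta_features`, and `meta_prior` are hypothetical.

```python
import numpy as np

def prior_variances(meta_features, meta_prior):
    # Assumed log-linear link: variance_j = exp(meta_features[j] . meta_prior).
    return np.exp(meta_features @ meta_prior)

def map_weights(X, y, prior_var, noise_var):
    # MAP estimate under w_j ~ N(0, prior_var[j]) and Gaussian noise. Solves
    # (X^T X / noise_var + diag(1 / prior_var)) w = X^T y / noise_var.
    A = X.T @ X / noise_var + np.diag(1.0 / prior_var)
    return np.linalg.solve(A, X.T @ y / noise_var)

# Synthetic task: only the first three of eight features are relevant.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 8
true_w = np.array([2.0, -1.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# One-hot meta-features: column 0 marks features 0-2, column 1 marks the rest.
meta_features = np.zeros((n_features, 2))
meta_features[:3, 0] = 1.0
meta_features[3:, 1] = 1.0

# Hypothetical meta-prior, set by hand here. In the transfer setting it would
# be estimated from related tasks and reused on a new one.
meta_prior = np.array([2.0, -30.0])

prior_var = prior_variances(meta_features, meta_prior)
w_informed = map_weights(X, y, prior_var, noise_var=0.01)
print(np.round(w_informed, 3))
```

Because the prior depends on a feature only through its meta-features, the same `meta_prior` vector can be applied to a task whose features are entirely different, provided those features are described by the same meta-features. This is the property the abstract relies on for tasks with nonoverlapping features.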


    Information & Contributors

    Information

    Published In

    ICML '07: Proceedings of the 24th International Conference on Machine Learning
    June 2007
    1233 pages
    ISBN: 9781595937933
    DOI: 10.1145/1273496
    Sponsors

    • Machine Learning Journal

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    ICML '07 & ILP '07

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%
