
Discriminative Bayesian Filtering for the Semi-supervised Augmentation of Sequential Observation Data

  • Conference paper
  • In: Computational Science – ICCS 2021 (ICCS 2021)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12743)

Abstract

We aim to construct a probabilistic classifier to predict a latent, time-dependent Boolean label given an observed vector of measurements. Our training data consists of sequences of observations paired with a label for precisely one of the observations in each sequence. As an initial approach, we learn a baseline supervised classifier by training on the labeled observations alone, ignoring the unlabeled observations in each sequence. We then leverage this first classifier and the sequential structure of our data to build a second training set as follows: (1) we apply the first classifier to each unlabeled observation and then (2) we filter the resulting estimates to incorporate information from the labeled observations, creating a much larger training set. We describe a Bayesian filtering framework that can be used to perform step 2 and show how a second classifier built using this larger, filtered training set can outperform the initial classifier.

At Adobe, our motivating application entails predicting customer segment membership from readily available proprietary features. We administer surveys to collect label data for our subscribers and then generate feature data for these customers at regular intervals around the survey time. While we could train a supervised classifier using paired feature and label data from the survey time alone, the availability of nearby feature data and the relative expense of polling motivate this semi-supervised approach. We perform an ablation study comparing both a baseline classifier and a likelihood-based augmentation approach to our proposed method and show that our method yields the greatest improvement in predictive performance for an in-house classifier.
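
To make the two-step procedure concrete, the following Python sketch illustrates one way it could look in code. It is an assumption-laden mock-up rather than the paper's implementation: scikit-learn's LogisticRegression stands in for the in-house classifier, a generic two-state forward-backward smoother (with self-transition probabilities alpha0 and alpha1) stands in for the paper's Algorithm 1, and the toy data layout (`sequences`, `labeled_idx`, `labels`) is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data layout (invented for illustration): 50 sequences of 8
# five-dimensional observations, each with exactly one labeled time step.
rng = np.random.default_rng(0)
sequences = [rng.normal(size=(8, 5)) for _ in range(50)]
labeled_idx = [int(rng.integers(0, 8)) for _ in range(50)]
labels = [int(rng.random() < 0.5) for _ in range(50)]

def fit_base_classifier(sequences, labeled_idx, labels):
    """Step 1: supervised baseline trained on the labeled observations only."""
    X = np.vstack([seq[t] for seq, t in zip(sequences, labeled_idx)])
    return LogisticRegression(max_iter=1000).fit(X, np.asarray(labels))

def filter_sequence(probs, t_star, y_star, alpha0, alpha1):
    """Step 2 (sketch): smooth per-observation classifier probabilities on a
    two-state Markov chain, clamping the known label at time t_star."""
    T = len(probs)
    A = np.array([[alpha0, 1.0 - alpha0],        # latent-label transition
                  [1.0 - alpha1, alpha1]])       # probabilities
    lik = np.column_stack([1.0 - probs, probs])  # classifier pseudo-likelihoods
    lik[t_star] = 0.0
    lik[t_star, y_star] = 1.0                    # clamp the surveyed label
    fwd, bwd = np.zeros((T, 2)), np.ones((T, 2))
    fwd[0] = lik[0] / lik[0].sum()
    for t in range(1, T):                        # forward pass
        fwd[t] = lik[t] * (fwd[t - 1] @ A)
        fwd[t] /= fwd[t].sum()
    for t in range(T - 2, -1, -1):               # backward pass
        bwd[t] = A @ (lik[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return post[:, 1] / post.sum(axis=1)         # P(label = 1) at each time

# Augment: every observation now carries a filtered soft label.
clf0 = fit_base_classifier(sequences, labeled_idx, labels)
augmented = [(seq, filter_sequence(clf0.predict_proba(seq)[:, 1], t, y, 0.9, 0.9))
             for seq, t, y in zip(sequences, labeled_idx, labels)]
```

A second classifier trained on the augmented pairs, with the filtered probabilities used as soft labels or thresholded into hard ones, then plays the role of the improved model described above.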


Notes

  1. Given a set \(\{\alpha_0^k\}_{k\in K}\) of candidate values for \(\alpha_0\) and a set \(\{\alpha_1^\ell\}_{\ell\in L}\) for \(\alpha_1\), we select parameters via an exhaustive grid search as follows. For each \((k,\ell)\in K\times L\), we apply Algorithm 1 with \(\alpha_0^k\) and \(\alpha_1^\ell\) to the training set, train a classifier on the resulting filtered dataset, and then evaluate this classifier's predictive performance on the validation set (using AUC). Upon completion, we select the parameter values \(\alpha_0^k\) and \(\alpha_1^\ell\) that yield the most performant classifier.
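
For illustration, the grid search in this note might be sketched as follows. This is a hypothetical mock-up, not the authors' code: `filter_training_set` and `train_classifier` are invented stand-ins for Algorithm 1 and the in-house classifier, and `X_val`, `y_val` denote the held-out validation features and labels.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

def select_alphas(alpha0_grid, alpha1_grid, train_seqs, X_val, y_val,
                  filter_training_set, train_classifier):
    """Exhaustive grid search over (alpha0, alpha1), scored by validation AUC."""
    best_auc, best_pair = float("-inf"), None
    for a0, a1 in product(alpha0_grid, alpha1_grid):
        filtered = filter_training_set(train_seqs, a0, a1)  # apply Algorithm 1
        clf = train_classifier(filtered)                    # fit on filtered set
        auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
        if auc > best_auc:                                  # keep the best pair
            best_auc, best_pair = auc, (a0, a1)
    return best_pair, best_auc
```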


Acknowledgements

The author would like to thank his manager Binjie Lai, her manager Xiang Wu, and his coworkers at Adobe, especially Eunyee Koh, who performed an internal review. The author is also grateful to the anonymous reviewers for their thoughtful feedback and to his former advisor Matthew T. Harrison for inspiring this discriminative filtering approach.

Author information

Correspondence to Michael C. Burkhart.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Burkhart, M.C. (2021). Discriminative Bayesian Filtering for the Semi-supervised Augmentation of Sequential Observation Data. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2021. Lecture Notes in Computer Science, vol. 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_22

  • DOI: https://doi.org/10.1007/978-3-030-77964-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77963-4

  • Online ISBN: 978-3-030-77964-1

  • eBook Packages: Computer Science (R0)
