Abstract
We aim to construct a probabilistic classifier to predict a latent, time-dependent Boolean label given an observed vector of measurements. Our training data consists of sequences of observations paired with a label for precisely one observation in each sequence. As an initial approach, we learn a baseline supervised classifier by training on the labeled observations alone, ignoring the unlabeled observations in each sequence. We then leverage this first classifier and the sequential structure of our data to build a second training set as follows: (1) we apply the first classifier to each unlabeled observation, and then (2) we filter the resulting estimates to incorporate information from the labeled observations, creating a much larger training set. We describe a Bayesian filtering framework that can be used to perform step 2 and show that a second classifier trained on this larger, filtered dataset can outperform the initial classifier.
At Adobe, our motivating application entails predicting customer segment membership from readily available proprietary features. We administer surveys to collect label data for our subscribers and then generate feature data for these customers at regular intervals around the survey time. While we can train a supervised classifier using paired feature and label data from the survey time alone, the availability of nearby feature data and the relative expense of polling motivate this semi-supervised approach. We perform an ablation study comparing both a baseline classifier and a likelihood-based augmentation approach to our proposed method and show that our method delivers the greatest improvement in predictive performance for an in-house classifier.
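To make the two-step procedure concrete, here is a minimal sketch in Python. The logistic-regression base model, the two-state forward-backward smoother with stickiness parameter `alpha_stay`, the sequence container (features, labeled index, label), and the confidence threshold are all illustrative assumptions standing in for our in-house classifier and for the filtering framework developed in the paper.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# The base model, container format, and parameter names are assumptions
# for illustration, not the paper's in-house classifier or Algorithm 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

def forward_backward(p_obs, labeled_t, labeled_y, alpha_stay=0.9):
    """Smooth per-step classifier probabilities p_obs (shape [T]) over a
    two-state (Boolean) latent chain, clamping the known label labeled_y
    at index labeled_t. Returns P(y_t = 1 | all evidence) for each t."""
    T = len(p_obs)
    trans = np.array([[alpha_stay, 1 - alpha_stay],
                      [1 - alpha_stay, alpha_stay]])
    # Discriminative emission proxy: classifier output as state evidence.
    emit = np.stack([1 - p_obs, p_obs], axis=1)   # shape [T, 2]
    emit[labeled_t] = np.eye(2)[labeled_y]        # clamp the survey label
    fwd = np.zeros((T, 2))
    fwd[0] = emit[0] * 0.5                        # uniform initial prior
    for t in range(1, T):                         # forward (filtering) pass
        fwd[t] = emit[t] * (fwd[t - 1] @ trans)
        fwd[t] /= fwd[t].sum()
    bwd = np.ones((T, 2))
    for t in range(T - 2, -1, -1):                # backward (smoothing) pass
        bwd[t] = trans @ (emit[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return post[:, 1] / post.sum(axis=1)

def augment_and_retrain(sequences, threshold=0.9):
    """`sequences` is assumed to be a list of (X_seq [T, d], t_star, y_star)."""
    # Stage 1: baseline classifier on the labeled observations only.
    X_lab = np.vstack([X[t] for X, t, _ in sequences])
    y_lab = np.array([y for _, _, y in sequences])
    base = LogisticRegression().fit(X_lab, y_lab)
    # Stage 2: pseudo-label every observation, filter, keep confident steps.
    X_aug, y_aug = [], []
    for X, t_star, y_star in sequences:
        p = base.predict_proba(X)[:, 1]
        q = forward_backward(p, t_star, y_star)
        keep = (q > threshold) | (q < 1 - threshold)
        X_aug.append(X[keep])
        y_aug.append((q[keep] > 0.5).astype(int))
    return LogisticRegression().fit(np.vstack(X_aug), np.concatenate(y_aug))
```

In this sketch the clamped survey label anchors the smoother, so pseudo-labels near the survey time inherit its certainty, while distant or ambiguous steps fall below the confidence threshold and are discarded.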
Notes
1. Given a set \(\{\alpha _0^k\}_{k\in K}\) of candidate values for \(\alpha _0\) and a set \(\{\alpha _1^\ell \}_{\ell \in L}\) for \(\alpha _1\), we select parameters via an exhaustive grid search as follows. For each \((k,\ell )\in K\times L\), we apply Algorithm 1 with \(\alpha _0^k\) and \(\alpha _1^\ell \) to the training set, train a classifier on the resulting filtered dataset, and then evaluate this classifier’s predictive performance on the validation set (using AUC). Upon completion, we select the parameter values \(\alpha _0^k\) and \(\alpha _1^\ell \) that yield the most performant classifier.
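A minimal sketch of this grid search follows, assuming stand-in callables `filter_training_set` (for Algorithm 1) and `train_classifier` (for the in-house model); neither name comes from the paper.

```python
# Exhaustive grid search over (alpha_0, alpha_1), selecting by validation AUC.
# `filter_training_set` and `train_classifier` are hypothetical stand-ins.
from itertools import product
from sklearn.metrics import roc_auc_score

def select_alphas(alpha0_grid, alpha1_grid, train_seqs, X_val, y_val,
                  filter_training_set, train_classifier):
    best = (None, None, -float("inf"))
    for a0, a1 in product(alpha0_grid, alpha1_grid):
        X_f, y_f = filter_training_set(train_seqs, a0, a1)  # Algorithm 1
        clf = train_classifier(X_f, y_f)
        auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
        if auc > best[2]:
            best = (a0, a1, auc)
    return best  # (alpha_0, alpha_1, validation AUC)
```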
Acknowledgements
The author would like to thank his manager Binjie Lai, her manager Xiang Wu, and his coworkers at Adobe, especially Eunyee Koh, for performing an internal review. The author is also grateful to the anonymous reviewers for their thoughtful feedback and to his former advisor Matthew T. Harrison for inspiring this discriminative filtering approach.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Burkhart, M.C. (2021). Discriminative Bayesian Filtering for the Semi-supervised Augmentation of Sequential Observation Data. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2021. Lecture Notes in Computer Science, vol. 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_22
DOI: https://doi.org/10.1007/978-3-030-77964-1_22
Print ISBN: 978-3-030-77963-4
Online ISBN: 978-3-030-77964-1