Classification Active Learning Based on Mutual Information
Figure 1. Experimental results of the different querying approaches for sequential active learning (k = 1). (a–c) Average; (d–f) STD; (g–i) p-value: MI vs. Entropy; (j–l) p-value: MI vs. Random.
Figure 2. Experimental results of the different querying approaches for batch active learning (k = 5). (a–c) Average; (d–f) STD; (g–i) p-value: MI vs. Entropy; (j–l) p-value: MI vs. Random.
Figure 3. Experimental results of the different querying approaches for batch active learning (k = 10). (a–c) Average; (d–f) STD; (g–i) p-value: MI vs. Entropy; (j–l) p-value: MI vs. Random.
Figure 4. Experimental results of the different querying approaches for batch active learning (k = 20). (a–c) Average; (d–f) STD; (g–i) p-value: MI vs. Entropy; (j–l) p-value: MI vs. Random.
Abstract
1. Introduction
2. Active Learning with Mutual Information
2.1. Formulation
2.2. Evaluating Mutual Information
2.3. Randomized vs. Deterministic Submodular Optimizations
Algorithm 1: The deterministic approach
Inputs: the objective function f, the set of unlabeled indices, the query batch size k
Outputs: a subset of the unlabeled indices of size k
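The paper's exact pseudocode is not reproduced on this page; as an illustration of the deterministic approach, the standard greedy scheme for maximizing a set function under a cardinality constraint (as analyzed by Nemhauser et al., cited below) can be sketched as follows. The function names and the toy coverage objective are illustrative assumptions standing in for the paper's MI objective:

```python
def greedy_select(f, candidates, k):
    """Deterministic greedy maximization under a cardinality constraint:
    at each step, add the candidate with the largest marginal gain
    f(S + {i}) - f(S), until |S| = k."""
    S = set()
    pool = list(candidates)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for i in pool:
            gain = f(S | {i}) - f(S)
            if gain > best_gain:
                best, best_gain = i, gain
        S.add(best)
        pool.remove(best)
    return S

# Toy coverage objective (a stand-in for the MI objective): each index
# "covers" a set of items, and f counts the items covered so far.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}}
coverage = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
```

For monotone submodular objectives this greedy rule carries the classical (1 - 1/e) approximation guarantee, which is why it is the default deterministic baseline.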
Algorithm 2: The randomized approach
Inputs: the objective function f, the set of unlabeled indices, the query batch size k
Outputs: a subset of the unlabeled indices of size k
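Likewise, the randomized approach is not reproduced verbatim here; the sketch below illustrates the general randomized-greedy idea (in the style of Buchbinder et al., cited below): at each step, choose uniformly at random among the k candidates with the largest marginal gains. Names and the fixed seed are illustrative assumptions:

```python
import random

def randomized_greedy_select(f, candidates, k, rng=None):
    """Randomized greedy under a cardinality constraint: at each of the
    k steps, rank the remaining candidates by marginal gain, then add
    one of the top-k candidates chosen uniformly at random."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    S = set()
    pool = list(candidates)
    for _ in range(k):
        ranked = sorted(pool, key=lambda i: f(S | {i}) - f(S), reverse=True)
        choice = rng.choice(ranked[:k])
        S.add(choice)
        pool.remove(choice)
    return S
```

Restricting the random draw to the top-k gains is what preserves an approximation guarantee even for non-monotone submodular objectives, while the randomness decorrelates successive batches.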
2.4. Total Complexity Reduction
2.5. Further Speed-Up
3. Results and Discussion
3.1. Cardiotocography
3.2. MNIST (Modified National Institute of Standards and Technology)
3.3. Bach Choral Harmony
3.4. Experimental Settings
- Pess-MI-Det: Pessimistic MI with deterministic optimization (Algorithm 1);
- Pess-MI-Rand: Pessimistic MI with randomized optimization (Algorithm 2);
- Opt-MI-Det: Optimistic MI with deterministic optimization (Algorithm 1);
- Opt-MI-Rand: Optimistic MI with randomized optimization (Algorithm 2);
- entropy: Entropy objective with deterministic optimization (Algorithm 1);
- random: Random querying.
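The MI criterion itself depends on the trained model and is not reproduced on this page; for the two baselines in the list above, a minimal sketch (with hypothetical helper names) of entropy-based and random querying over a pool of predictive class distributions might look like:

```python
import math
import random

def entropy_query(probs, k):
    """Entropy baseline: select the k pool points whose predictive
    class distribution has the highest Shannon entropy (i.e., the
    points the current classifier is most uncertain about)."""
    H = lambda p: -sum(pi * math.log(pi) for pi in p if pi > 0)
    return sorted(probs, key=lambda i: H(probs[i]), reverse=True)[:k]

def random_query(probs, k, rng=None):
    """Random baseline: sample k pool indices uniformly."""
    return (rng or random.Random(0)).sample(sorted(probs), k)
```

Here `probs` maps each unlabeled index to its predicted class-probability vector; both helpers return the batch of indices to send to the oracle for labeling.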
3.5. Numerical Results
4. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Appendix: Update Equation for Multinomial Logistic Regression
Multinomial Logistic Regression as a Discriminative Classifier
Updating the Gradient Vector
Updating the Inverse Hessian Matrix
References
- Lewis, D.D.; Gale, W.A. A Sequential Algorithm for Training Text Classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 3–6 July 1994.
- Freund, Y.; Seung, H.S.; Shamir, E.; Tishby, N. Selective sampling using the query by committee algorithm. Mach. Learn. 1997, 28, 133–168.
- Cohn, D.A.; Ghahramani, Z.; Jordan, M.I. Active learning with statistical models. 1996; arXiv:cs/9603104.
- Campbell, C.; Cristianini, N.; Smola, A.J. Query Learning with Large Margin Classifiers. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000.
- Roy, N.; McCallum, A. Toward Optimal Active Learning through Monte Carlo Estimation of Error Reduction. In Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001.
- Settles, B.; Craven, M.; Ray, S. Multiple-instance active learning. Adv. Neural Inf. Process. Syst. 2008, 20, 1289–1296.
- Brinker, K. Incorporating Diversity in Active Learning with Support Vector Machines. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003.
- Holub, A.; Perona, P.; Burl, M.C. Entropy-based active learning for object recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, AK, USA, 23–28 June 2008.
- Chen, Y.; Krause, A. Near-optimal batch mode active learning and adaptive submodular optimization. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
- Guo, Y.; Schuurmans, D. Discriminative batch mode active learning. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.86.6929 (accessed on 2 February 2016).
- Azimi, J.; Fern, A.; Zhang-Fern, X.; Borradaile, G.; Heeringa, B. Batch Active Learning via Coordinated Matching. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, UK, 27 June–3 July 2012.
- Hoi, S.C.H.; Jin, R.; Zhu, J.; Lyu, M.R. Batch Mode Active Learning and Its Application to Medical Image Classification. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 2006.
- Guo, Y. Active instance sampling via matrix partition. Available online: http://papers.nips.cc/paper/3919-active-instance-sampling-via-matrix-partition (accessed on 2 February 2016).
- Li, X.; Guo, Y. Adaptive Active Learning for Image Classification. In Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013.
- Guo, Y.; Greiner, R. Optimistic Active Learning Using Mutual Information. In Proceedings of 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, 6–12 January 2007.
- Krause, A.; Singh, A.; Guestrin, C. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res. 2008, 9, 235–284.
- Dilkina, B.; Damoulas, T.; Gomes, C.; Fink, D. AL2: Learning for Active Learning. In Proceedings of the Machine Learning for Sustainability Workshop at the 25th Conference on Neural Information Processing Systems, Sierra Nevada, Spain, 16–17 December 2011.
- Wei, K.; Iyer, R.; Bilmes, J. Submodularity in data subset selection and active learning. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015.
- Bach, F. Learning with submodular functions: A convex optimization perspective. 2013; arXiv:1111.6453v2.
- Nemhauser, G.L.; Wolsey, L.A.; Fisher, M.L. An analysis of approximations for maximizing submodular set functions—I. Math. Program. 1978, 14, 265–294.
- Krause, A.; Golovin, D. Submodular Function Maximization. Available online: https://las.inf.ethz.ch/files/krause12survey.pdf (accessed on 2 February 2016).
- Nemhauser, G.L.; Wolsey, L.A. Best algorithms for approximating the maximum of a submodular set function. Math. Oper. Res. 1978, 3, 177–188.
- Feige, U.; Mirrokni, V.S.; Vondrak, J. Maximizing non-monotone submodular functions. SIAM J. Comput. 2011, 40, 1133–1153.
- Buchbinder, N.; Feldman, M.; Naor, J.; Schwartz, R. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, New Brunswick, NJ, USA, 20–23 October 2012; pp. 649–658.
- Buchbinder, N.; Feldman, M.; Naor, J.; Schwartz, R. Submodular Maximization with Cardinality Constraints. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, Portland, OR, USA, 5–7 January 2014.
- Lichman, M. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/datasets.html (accessed on 2 February 2016).
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Radicioni, D.P.; Esposito, R. BREVE: An HMPerceptron-Based Chord Recognition System. In Advances in Music Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2010; pp. 143–164.
- McCullagh, P.; Nelder, J.A. Generalized linear models. Ann. Statist. 1984, 12, 1589–1596.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sourati, J.; Akcakaya, M.; Dy, J.G.; Leen, T.K.; Erdogmus, D. Classification Active Learning Based on Mutual Information. Entropy 2016, 18, 51. https://doi.org/10.3390/e18020051