
Active learning from stream data using optimal weight classifier ensemble

Published: 01 December 2010

Abstract

In this paper, we propose a new research problem on active learning from data streams, where data volumes grow continuously and labeling all data is expensive and impractical. The objective is to label a small portion of the stream data, from which a model is derived to predict future instances as accurately as possible. To tackle the technical challenges raised by the dynamic nature of stream data, i.e., increasing data volumes and evolving decision concepts, we propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build a classifier ensemble. We argue that a classifier ensemble's variance directly corresponds to its error rate, so reducing the ensemble's variance is equivalent to improving its prediction accuracy. Consequently, one should label instances toward the minimization of the variance of the underlying classifier ensemble. Accordingly, we introduce a minimum-variance (MV) principle to guide the instance-labeling process for data streams. In addition, we derive an optimal-weight calculation method to determine the weight values for the classifier ensemble. The MV principle and the optimal-weighting module are combined to build an active learning framework for data streams. Experimental results on synthetic and real-world data demonstrate the performance of the proposed work in comparison with other approaches.



Published In

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Volume 40, Issue 6
December 2010
224 pages

Publisher

IEEE Press

Publication History

Received: 23 July 2009
Revised: 12 November 2009
Accepted: 07 January 2010

Author Tags

1. active learning
2. classifier ensemble
3. stream data
