10.5555/1888305.1888328
Article

Classification and novel class detection of data streams in a dynamic feature space

Published: 20 September 2010

Abstract

Data stream classification poses many challenges, most of which are not addressed by the state of the art. We present DXMiner, which addresses four major challenges to data stream classification, namely infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature-evolution are also major challenges, and most existing approaches ignore them. Concept-evolution occurs in the stream when novel classes arrive, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to the infinite length and concept-drift problems. Most existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some types of data, for example text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
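To make the feature-evolution and concept-evolution problems concrete, the following is a minimal Python sketch, not DXMiner itself. It assumes a toy text stream processed in chunks, frequency-based feature selection, a nearest-centroid model, and a fixed distance radius for flagging possible novel classes; all function names, the radius, and the example documents are hypothetical and are not taken from the paper. The idea shown is that a model trained on an older feature set and a test instance drawn from a newer one can be compared in the union of both feature sets, with missing features treated as zero.

```python
from collections import Counter
import math

def select_features(docs, k):
    """Pick the k most frequent tokens in a chunk (a stand-in for real feature selection)."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return {tok for tok, _ in ranked[:k]}

def vectorize(doc, features):
    """Sparse term-count vector restricted to the given feature set."""
    counts = Counter(doc.split())
    return {f: counts[f] for f in features if counts[f] > 0}

def distance(u, v):
    """Euclidean distance over the union of keys; an absent feature counts as 0."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))

def train_centroids(docs, labels, features):
    """One mean vector per class, computed in the chunk's own feature space."""
    sums, n = {}, Counter(labels)
    for doc, y in zip(docs, labels):
        sums.setdefault(y, Counter()).update(vectorize(doc, features))
    return {y: {f: c / n[y] for f, c in acc.items()} for y, acc in sums.items()}

def classify(doc, centroids, model_features, current_features, radius):
    """Compare in the union feature space; flag instances far from every known class."""
    vec = vectorize(doc, model_features | current_features)
    label, dist = min(((y, distance(vec, c)) for y, c in centroids.items()),
                      key=lambda pair: pair[1])
    return (label if dist <= radius else "novel?"), dist

# Older chunk: two known classes.
old_docs = ["goal match team", "goal team win", "vote election party", "election vote poll"]
old_labels = ["sport", "sport", "politics", "politics"]
old_features = select_features(old_docs, k=6)
centroids = train_centroids(old_docs, old_labels, old_features)

# Newer chunk: new vocabulary (and, implicitly, a new topic) has appeared.
new_docs = ["vaccine virus outbreak", "virus vaccine trial", "goal team match"]
new_features = select_features(new_docs, k=4)

known, _ = classify("goal team match", centroids, old_features, new_features, radius=1.6)
flagged, _ = classify("vaccine virus outbreak", centroids, old_features, new_features, radius=1.6)
# With a static feature space the new vocabulary is invisible to the old model.
static, _ = classify("vaccine virus outbreak", centroids, old_features, set(), radius=1.6)
```

In this toy run the document with unseen vocabulary is flagged only when the newer features are included; with the older feature set alone it collapses to an empty vector and is absorbed into an existing class, which illustrates why a static feature space can hide novel classes in text streams.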



    Published In

    ECML PKDD'10: Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
    September 2010
    518 pages
    ISBN: 364215882X

    Sponsors

    • PASCAL2 - Pattern Analysis, Statistical Modelling and Computational Learning
    • Nokia
    • Google Inc.
    • INRIA: Institut Natl de Recherche en Info et en Automatique
    • Yahoo! Labs

    Publisher

    Springer-Verlag

    Berlin, Heidelberg
