research-article

A Framework for On-Demand Classification of Evolving Data Streams

Published: 01 May 2006

Abstract

Current models of the classification problem do not effectively handle bursts of particular classes arriving at different times. Instead, they concentrate on one-pass classification modeling of very large data sets. Our model treats data stream classification dynamically: simultaneous training and test streams are used to classify data as it arrives. This reflects real-life situations, where test streams must be classified in real time while both the training and test streams evolve. The aim is a classification system whose training model adapts quickly to changes in the underlying data stream. To achieve this, we propose an on-demand classification process that dynamically selects the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains high classification accuracy on an evolving data stream while remaining efficient.
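The idea of selecting a window of past training data on demand can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a plain nearest-centroid classifier in place of whatever summary structure the paper uses, and the names `centroids`, `predict`, `select_horizon`, `classify_on_demand`, the candidate `horizons`, and the `holdout` size are all hypothetical. It only shows the general pattern of scoring several candidate windows against the most recent labeled points and keeping the one that classifies them best.

```python
from collections import defaultdict
from math import dist


def centroids(points):
    """Return the per-class mean vector of (features, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for x, y in points:
        if y not in sums:
            sums[y] = list(x)
        else:
            sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(model, x):
    """Label of the class centroid nearest to x (Euclidean distance)."""
    return min(model, key=lambda y: dist(model[y], x))


def select_horizon(train, horizons, holdout=20):
    """Pick the window length whose model best classifies the newest
    labeled points. Ties go to the earliest entry in `horizons`."""
    fit_part, check_part = train[:-holdout], train[-holdout:]
    best_h, best_acc = None, -1.0
    for h in horizons:
        model = centroids(fit_part[-h:])  # last h points before the holdout
        hits = sum(predict(model, x) == y for x, y in check_part)
        acc = hits / len(check_part)
        if acc > best_acc:
            best_h, best_acc = h, acc
    return best_h, best_acc


def classify_on_demand(train, x, horizons=(10, 20, 40, 80, 160), holdout=20):
    """Choose a window for the current state of the stream, then classify x
    with a model built from the most recent points of that length."""
    h, _ = select_horizon(train, horizons, holdout)
    return predict(centroids(train[-h:]), x), h
```

Listing the candidate horizons from shortest to longest means that, when several windows score equally, the most recent data is preferred. After an abrupt change in the stream, short windows containing only post-change points tend to score best, while a model built on the full history can still reflect the old class distribution.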

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 18, Issue 5
May 2006
143 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. Stream classification
  2. geometric time frame
  3. microclustering
  4. nearest neighbor

Cited By

  • (2022) Analysis of Performance Improvement of Real-time Internet of Things Application Data Processing in the Movie Industry Platform. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/5237252. Online publication date: 1-Jan-2022
  • (2022) Automatic disease vector mosquitoes identification via hierarchical data stream classification. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing (1005-1012). DOI: 10.1145/3477314.3507019. Online publication date: 25-Apr-2022
  • (2022) Auxiliary Network: Scalable and Agile Online Learning for Dynamic System with Inconsistently Available Inputs. Neural Information Processing (549-561). DOI: 10.1007/978-3-031-30105-6_46. Online publication date: 22-Nov-2022
  • (2020) Novel Class Detection with Concept Drift in Data Stream - AhtNODE. International Journal of Distributed Systems and Technologies, 11:1 (15-26). DOI: 10.4018/IJDST.2020010102. Online publication date: 1-Jan-2020
  • (2019) Dynamic Structure Embedded Online Multiple-Output Regression for Streaming Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:2 (323-336). DOI: 10.1109/TPAMI.2018.2794446. Online publication date: 1-Feb-2019
  • (2019) A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering. Knowledge and Information Systems, 61:2 (755-798). DOI: 10.1007/s10115-019-01336-3. Online publication date: 1-Nov-2019
  • (2017) Learning with feature evolvable streams. Proceedings of the 31st International Conference on Neural Information Processing Systems (1416-1426). DOI: 10.5555/3294771.3294906. Online publication date: 4-Dec-2017
  • (2017) News event evolution model based on the reading willingness and modified TF-IDF formula. Journal of High Speed Networks, 23:1 (33-47). DOI: 10.3233/JHS-170555. Online publication date: 1-Jan-2017
  • (2017) A multiple-instance stream learning framework for adaptive document categorization. Knowledge-Based Systems, 120:C (198-210). DOI: 10.1016/j.knosys.2017.01.001. Online publication date: 15-Mar-2017
  • (2015) A survey on data stream clustering and classification. Knowledge and Information Systems, 45:3 (535-569). DOI: 10.1007/s10115-014-0808-1. Online publication date: 1-Dec-2015