Article

Stop Chasing Trends: Discovering High Order Models in Evolving Data

Authors: Shixi Chen, Haixun Wang, Shuigeng Zhou, Philip S. Yu

ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

Pages 923-932

https://doi.org/10.1109/ICDE.2008.4497501

Published: 07 April 2008

Abstract

Many applications are driven by evolving data: patterns in Web traffic, program execution traces, network event logs, and similar sources are often non-stationary. Building prediction models for evolving data has therefore become an important and challenging task. Currently, most approaches work by "chasing trends"; that is, they keep learning or updating models from the evolving data and use these impromptu models for online prediction. In many cases, this proves both costly and ineffective: much time is wasted re-learning recurring concepts, yet the classifier may always remain one step behind the current trend. In this paper, we propose to mine high-order models in evolving data. More often than not, a data stream contains a limited number of concepts, or stable distributions, and the stream switches among them constantly. We mine all such concepts offline from a historical stream and build high-quality models for each of them. At run time, combining historical concept change patterns with cues provided by an online training stream, we find the most likely current concept and use its corresponding models to classify data in an unlabeled stream. The primary advantage of the high-order model approach is its high accuracy. Experiments show that on benchmark datasets, the classification error of the high-order model is only a small fraction of that of the current best approaches. Another important benefit is that, unlike state-of-the-art approaches, our approach does not require users to tune any parameters to achieve satisfactory results on streams of different characteristics.
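The run-time step described in the abstract (combine historical concept change patterns with cues from a labeled online stream, pick the most likely current concept, and classify with that concept's model) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the change patterns are summarized as a concept-to-concept transition matrix and applies a plain Bayesian filtering update. The names `update_belief`, `classify`, the `hit` parameter, and the two threshold models are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# A per-concept classifier: maps one feature value to a class label.
# (Hypothetical stand-in for the high-quality models mined offline.)
Model = Callable[[float], int]


def update_belief(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    models: Sequence[Model],
    x: float,
    y: int,
    hit: float = 0.9,
) -> List[float]:
    """One filtering step over the set of known concepts.

    belief[i]        -- current probability that concept i is active
    transition[i][j] -- assumed probability of switching from concept i to j
    hit              -- assumed chance that the active concept's model labels
                        a training example correctly (must lie in (0, 1))
    """
    k = len(belief)
    # Predict: push the belief through the historical change pattern.
    prior = [sum(belief[i] * transition[i][j] for i in range(k)) for j in range(k)]
    # Correct: reward concepts whose model agrees with the labeled example.
    post = [
        prior[j] * (hit if models[j](x) == y else 1.0 - hit) for j in range(k)
    ]
    total = sum(post)
    return [p / total for p in post]


def classify(belief: Sequence[float], models: Sequence[Model], x: float) -> int:
    """Label an unlabeled example with the most likely concept's model."""
    best = max(range(len(belief)), key=lambda j: belief[j])
    return models[best](x)


# Two toy concepts with opposite decision rules.
MODELS: List[Model] = [
    lambda x: int(x > 0.5),   # concept 0
    lambda x: int(x <= 0.5),  # concept 1
]
# Concepts are assumed to be "sticky": they rarely switch between steps.
TRANSITION = [[0.95, 0.05], [0.05, 0.95]]

if __name__ == "__main__":
    belief = [0.5, 0.5]
    # Labeled examples drawn while concept 1 is active.
    for x, y in [(0.2, 1), (0.8, 0), (0.1, 1)]:
        belief = update_belief(belief, TRANSITION, MODELS, x, y)
    print("belief:", belief)
    print("prediction for 0.9:", classify(belief, MODELS, 0.9))
```

Because the per-concept models are fixed ahead of time, the only state maintained online in this sketch is the belief vector, which is what lets a recurring concept be recognized again without re-learning it.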

Published In

ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, April 2008, 1628 pages

ISBN: 9781424418367

Publisher

IEEE Computer Society, United States

Cited By

Gandhi J, Gandhi V (2020). Novel Class Detection with Concept Drift in Data Stream - AhtNODE. International Journal of Distributed Systems and Technologies 11:1 (15-26). DOI: 10.4018/IJDST.2020010102. Online publication date: 1-Jan-2020.
https://dl.acm.org/doi/10.4018/IJDST.2020010102
Ali M, Al-Shaer E, Khan H, Khayam S (2013). Automated Anomaly Detector Adaptation using Adaptive Threshold Tuning. ACM Transactions on Information and System Security 15:4 (1-30). DOI: 10.1145/2445566.2445569. Online publication date: 1-Apr-2013.
https://dl.acm.org/doi/10.1145/2445566.2445569
Kamath K, Caverlee J, Chen X, Lebanon G, Wang H, Zaki M (2012). Content-based crowd retrieval on the real-time web. Proceedings of the 21st ACM international conference on Information and knowledge management (195-204). DOI: 10.1145/2396761.2396789. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396761.2396789
Masud M, Woolam C, Gao J, Khan L, Han J, Hamlen K, Oza N (2012). Facing the reality of data stream classification. Knowledge and Information Systems 33:1 (213-244). DOI: 10.1007/s10115-011-0447-8. Online publication date: 1-Oct-2012.
https://dl.acm.org/doi/10.1007/s10115-011-0447-8
Brice P, Jiang W, Wan G (2011). A Cluster-Based Context-Tree Model for Multivariate Data Streams with Applications to Anomaly Detection. INFORMS Journal on Computing 23:3 (364-376). DOI: 10.1287/ijoc.1100.0407. Online publication date: 1-Jul-2011.
https://dl.acm.org/doi/10.1287/ijoc.1100.0407
Sadik S, Gruenwald L, Desai B, Cruz I, Bernardino J (2011). Online outlier detection for data streams. Proceedings of the 15th Symposium on International Database Engineering & Applications (88-96). DOI: 10.1145/2076623.2076635. Online publication date: 21-Sep-2011.
https://dl.acm.org/doi/10.1145/2076623.2076635
Wang P, Wang H, Wang W, Sellis T, Miller R, Kementsietsidis A, Velegrakis Y (2011). Finding semantics in time series. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (385-396). DOI: 10.1145/1989323.1989364. Online publication date: 12-Jun-2011.
https://dl.acm.org/doi/10.1145/1989323.1989364
Masud M, Chen Q, Gao J, Khan L, Han J, Thuraisingham B (2010). Classification and novel class detection of data streams in a dynamic feature space. Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II (337-352). DOI: 10.5555/1888305.1888328. Online publication date: 20-Sep-2010.
https://dl.acm.org/doi/10.5555/1888305.1888328
Wang Y, Zuo J, Yang N, Duan L, Li H, Zhu J (2010). An efficient approach for mining segment-wise intervention rules in time-series streams. Proceedings of the 11th international conference on Web-age information management (238-249). DOI: 10.5555/1884017.1884049. Online publication date: 15-Jul-2010.
https://dl.acm.org/doi/10.5555/1884017.1884049
Tan Y, Gu X, Wang H, Richa A, Guerraoui R (2010). Adaptive system anomaly prediction for large-scale hosting infrastructures. Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing (173-182). DOI: 10.1145/1835698.1835741. Online publication date: 25-Jul-2010.
https://dl.acm.org/doi/10.1145/1835698.1835741
