[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1066157.1066226acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

BRAID: stream mining through group lag correlations

Published: 14 June 2005 Publication History

Abstract

The goal is to monitor multiple numerical streams, and determine which pairs are correlated with lags, as well as the value of each such lag. Lag correlations (and anti-correlations) are frequent, and very interesting in practice: For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of fluoride in the drinking water may lead to fewer dental cavities, some years later. Additional settings include network analysis, sensor monitoring, financial data analysis, and moving object tracking. Such data streams are often correlated (or anti-correlated), but with an unknown lag.We propose BRAID, a method to detect lag correlations between data streams. BRAID can handle data streams of semi-infinite length, incrementally, quickly, and with small resource consumption. We also provide a theoretical analysis, which, based on Nyquist's sampling theorem, shows that BRAID can estimate lag correlations with little, and often with no error at all. Our experiments on real and realistic data show that BRAID detects the correct lag perfectly most of the time (the largest relative error was about 1%); while it is up to 40,000 times faster than the naive implementation.

References

[1]
D. J. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Aurora: a new model and architecture for data stream management. VLDB Journal, 12(2):120--139, 2003.
[2]
R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. FODO, pages 69--84, Oct. 1993.
[3]
B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain.: Operator scheduling for memory minimization in data stream systems. ACM SIGMOD, pages 253--264, June 2003.
[4]
G. E. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 1994.
[5]
R. P. Brent. Algorithm for Minimization without Derivatives. Dover Publications, 2002.
[6]
D. Carney, U. Cetintemel, A. Rasin, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Operator scheduling in a data stream manager. VLDB, pages 838--849, Sept. 2003.
[7]
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. Telegraphcq: Continuous dataflow processing for an uncertain world. CIDR, Jan. 2003.
[8]
S. Chandrasekaran and M. J. Franklin. Remembrance of streams past: Overload-sensitive management of archived streams. VLDB, pages 348--359, August-September 2004.
[9]
C. D. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. Gigascope: A stream database for network applications. ACM SIGMOD, pages 647--651, June 2003.
[10]
A. Das, J. Gehrke, and M. Riedewald. Approximate join processing over data streams. ACM SIGMOD, pages 40--51, June 2003.
[11]
A. Dobra, M. N. Garofalakis, J. Gehrke, and R. Rastogi. Processing complex aggregate queries over data streams. ACM SIGMOD, pages 61--72, June 2002.
[12]
P. Domingos and G. Hulten. Mining high-speed data streams. KDD, pages 71--80, 2000.
[13]
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In Proc. ACM SIGMOD, pages 419--429, Minneapolis, MN, May 25--27 1994.
[14]
G. E. Forsythe. Computer Methods for Mathematical Computations. Prentice-Hall, 1977.
[15]
S. Ganguly, M. N. Garofalakis, and R. Rastogi. Processing set expressions over continuous update streams. ACM SIGMOD, pages 265--276, June 2003.
[16]
V. Ganti, J. Gehrke, and R. Ramakrishnan. Mining data streams under block evolution. SIGKDD Explorations, 3(2):1--10, 2002.
[17]
A. C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Near-optimal sparse fourier representations via sampling. STOC, pages 152--161, 2002.
[18]
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. VLDB, pages 79--88, Sept. 2001.
[19]
S. Guha, C. Kim, and K. Shim. Xwave: Approximate extended wavelets for streaming data. VLDB, pages 288--299, August-September 2004.
[20]
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE TKDE, 15(3):515--528, 2003.
[21]
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[22]
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. KDD, pages 97--106, 2001.
[23]
E. J. Keogh. Exact indexing of dynamic time warping. VLDB, pages 406--417, Aug. 2002.
[24]
E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM SIGMOD, pages 151--162, May 2001.
[25]
K. Koper, T. Wallace, S. Taylor, and H. Hartse. Forensic seismology and the sinking of the kursk. EOS Trans., AGU, 82, pages 37,45--46, 2001.
[26]
N. Koudas, B. C. Ooi, K.-L. Tan, and R. Zhang. Approximate nn queries on streams with guaranteed error/performance bounds. VLDB, pages 804--815, August-September 2004.
[27]
B. P. Lathi. Signal Processing and Linear Systems. Oxford University Press, 1998.
[28]
S. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. ACM SIGMOD, pages 49--60, June 2002.
[29]
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. S. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, approximation, and resource management in a data stream management system. CIDR, Jan. 2003.
[30]
S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, hands-off stream mining. VLDB, pages 560--571, Sept. 2003.
[31]
N. Tatbul, U. Cetintemel, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Load shedding in a data stream manager. VLDB, pages 309--320, Sept. 2003.
[32]
M. Wang, T. Madhyastha, N. H. Chang, S. Papadimitriou, and C. Faloutsos. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. ICDE, Feb. 2002.
[33]
B.-K. Yi, N. Sidiropoulos, T. Johnson, H. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE, pages 13--22, 2000.
[34]
Y. Zhu and D. Shasha. Statistical monitoring of thousands of data streams in real time. VLDB, pages 358--369, Aug. 2002.
[35]
Y. Zhu and D. Shasha, Efficient elastic burst detection in data streams. KDD, pages 336--345, 2003.

Cited By

View all
  • (2024)Subsequence Join in Streaming Time Series under Dynamic Time WarpingVietnam Journal of Computer Science10.1142/S219688882350020311:02(241-274)Online publication date: 5-Jan-2024
  • (2024)Static and Streaming Discovery of Maximal Linear Representation Between Time SeriesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.328727336:1(401-415)Online publication date: Jan-2024
  • (2024)Optimizing Operators for Temporal and Spatiotemporal Data2024 25th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM61037.2024.00068(331-333)Online publication date: 24-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN:1595930604
DOI:10.1145/1066157
  • Conference Chair:
  • Fatma Ozcan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2005

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS05
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)25
  • Downloads (Last 6 weeks)4
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Subsequence Join in Streaming Time Series under Dynamic Time WarpingVietnam Journal of Computer Science10.1142/S219688882350020311:02(241-274)Online publication date: 5-Jan-2024
  • (2024)Static and Streaming Discovery of Maximal Linear Representation Between Time SeriesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.328727336:1(401-415)Online publication date: Jan-2024
  • (2024)Optimizing Operators for Temporal and Spatiotemporal Data2024 25th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM61037.2024.00068(331-333)Online publication date: 24-Jun-2024
  • (2024)Comparing the Performance of Three Computational Methods for Estimating the Effective Reproduction NumberJournal of Computational Biology10.1089/cmb.2023.006531:2(128-146)Online publication date: 1-Feb-2024
  • (2023)Detecting Lead-Lag Relationships in Stock Returns and Portfolio StrategiesSSRN Electronic Journal10.2139/ssrn.4599565Online publication date: 2023
  • (2023)AMNES: Accelerating the Computation of Data Correlation Using FPGAsProceedings of the VLDB Endowment10.14778/3625054.362505616:13(4174-7187)Online publication date: 1-Sep-2023
  • (2023)Dangoron: Network Construction on Large-scale Time Series Data across Sliding WindowsCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589399(269-271)Online publication date: 4-Jun-2023
  • (2023)Unraveling the Impact of COVID-19 Pandemic Dynamics on Commercial Water-Use VariationJournal of Water Resources Planning and Management10.1061/JWRMD5.WRENG-5940149:8Online publication date: Aug-2023
  • (2023)Inference and analysis on the evidential reasoning rule with time-lagged dependenciesEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.106978126:PCOnline publication date: 1-Nov-2023
  • (2022)Combining Filtering and Cross-Correlation Efficiently for Streaming Time SeriesACM Transactions on Knowledge Discovery from Data10.1145/350273816:5(1-24)Online publication date: 24-May-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media