DOI: 10.1145/2390045.2390062
research-article

Towards benchmarking stream data warehouses

Published: 02 November 2012

Abstract

Data management systems are facing two challenges driven by the requirements of emerging data-intensive applications: more data and less time to process the data. Data volumes continue to increase as new sources and data collecting mechanisms appear. At the same time, these sources tend to be highly dynamic and generate data in the form of a stream, which requires quick reaction to newly arrived data. Traditional data warehouses enable scalable data storage and analytics, including the ability to define nested levels of materialized views. However, views are typically refreshed during downtimes (e.g., every night), which does not meet the latency requirements of many applications. Stream data warehousing is a new data management technology that allows nearly continuous view refresh as new data arrive, which enables seamless integration of real-time monitoring and business intelligence with long-term data mining. In this paper, we argue that a new benchmark is required for stream warehouses, which should focus on measuring the property that determines the utility of these systems, namely how well they can keep up with the incoming data and guarantee the "freshness" of materialized views.

References

[1] Brad Adelberg, Hector Garcia-Molina, Ben Kao: Applying Update Streams in a Soft Real-Time Database System. SIGMOD Conference 1995: 245-256
[2] Mona Ahuja, Cheng Che Chen, Ravi Gottapu, Jorg Hallmann, Waqar Hasan, Richard Johnson, Maciek Kozyrczak, Ramesh Pabbati, Neeta Pandit, Sreenivasulu Pokuri, Krishna Uppala: Peta-scale data warehousing at Yahoo! SIGMOD Conference 2009: 855-862
[3] Arvind Arasu, Mitch Cherniack, Eduardo F. Galvez, David Maier, Anurag Maskey, Esther Ryvkina, Michael Stonebraker, Richard Tibbetts: Linear Road: A Stream Data Management Benchmark. VLDB 2004: 480-491
[4] Arian Baer, Antonio Barbuzzi, Pietro Michiardi, Fabio Ricciato: Two parallel approaches to network data analysis. 5th Workshop on Large Scale Distributed Systems and Middleware (LADIS), 2011
[5] Magdalena Balazinska, YongChul Kwon, Nathan Kuchta, Dennis Lee: Moirae: History-Enhanced Monitoring. CIDR 2007: 375-386
[6] MohammadHossein Bateni, Lukasz Golab, MohammadTaghi Hajiaghayi, Howard J. Karloff: Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses. Theory Comput. Syst. 49(4): 757-780 (2011)
[7] Badrish Chandramouli, Jonathan Goldstein, Songyun Duan: Temporal Analytics on Big Data for Web Advertising. ICDE 2012: 90-101
[8] Junghoo Cho, Hector Garcia-Molina: Synchronizing a Database to Improve Freshness. SIGMOD Conference 2000: 117-128
[9] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears: Benchmarking cloud serving systems with YCSB. SoCC 2010: 143-154
[10] Jeffrey Dean, Sanjay Ghemawat: MapReduce: a flexible data processing tool. Commun. ACM 53(1): 72-77 (2010)
[11] Nathan Folkert, Abhinav Gupta, Andrew Witkowski, Sankar Subramanian, Srikanth Bellamkonda, Shrikanth Shankar, Tolga Bozkaya, Lei Sheng: Optimizing Refresh of a Set of Materialized Views. VLDB 2005: 1043-1054
[12] Michael J. Franklin, Sailesh Krishnamurthy, Neil Conway, Alan Li, Alex Russakovsky, Neil Thombre: Continuous Analytics: Rethinking Query Processing in a Network-Effect World. CIDR 2009
[13] Lukasz Golab, Theodore Johnson: Consistency in a Stream Warehouse. CIDR 2011: 114-122
[14] Lukasz Golab, Theodore Johnson, J. Spencer Seidel, Vladislav Shkapenyuk: Stream warehousing with DataDepot. SIGMOD Conference 2009: 847-854
[15] Lukasz Golab, Theodore Johnson, Subhabrata Sen, Jennifer Yates: A Sequence-Oriented Stream Warehouse Paradigm for Network Monitoring Applications. PAM 2012: 53-63
[16] Lukasz Golab, Theodore Johnson, Vladislav Shkapenyuk: Scalable Scheduling of Updates in Streaming Data Warehouses. IEEE Trans. Knowl. Data Eng. 24(6): 1092-1105 (2012)
[17] Lukasz Golab, M. Tamer Ozsu: Data Stream Management. Morgan & Claypool Publishers, 2010
[18] Matteo Golfarelli, Stefano Rizzi: Data Warehouse Testing. IJDWM 7(2): 26-43 (2011)
[19] Sailesh Krishnamurthy, Michael J. Franklin, Jeffrey Davis, Daniel Farina, Pasha Golovko, Alan Li, Neil Thombre: Continuous analytics over discontinuous streams. SIGMOD Conference 2010: 1081-1092
[20] Wilburt Labio, Ramana Yerneni, Hector Garcia-Molina: Shrinking the Warehouse Update Window. SIGMOD Conference 1999: 383-394
[21] MAWI Working Group Traffic Archive. http://mawi.wide.ad.jp/mawi/
[22] Marcelo R. N. Mendes, Pedro Bizarro, Paulo Marques: A Performance Study of Event Processing Systems. TPCTC 2009: 221-236
[23] M. Asif Naeem, Gillian Dobbie, Gerald Weber: HYBRIDJOIN for Near-Real-Time Data Warehousing. IJDWM 7(4): 21-42 (2011)
[24] M. Asif Naeem, Gillian Dobbie, Gerald Weber, Shafiq Alam: R-MESHJOIN for near-real-time data warehousing. DOLAP 2010: 53-60
[25] Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker: A comparison of approaches to large-scale data analysis. SIGMOD Conference 2009: 165-178
[26] Neoklis Polyzotis, Spiros Skiadopoulos, Panos Vassiliadis, Alkis Simitsis, Nils-Erik Frantzell: Meshing Streaming Updates with Persistent Data in an Active Data Warehouse. IEEE Trans. Knowl. Data Eng. 20(7): 976-991 (2008)
[27] Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein: Enabling Real-Time Querying of Live and Historical Stream Data. SSDBM 2007: 28
[28] Alkis Simitsis, Panos Vassiliadis, Umeshwar Dayal, Anastasios Karagiannis, Vasiliki Tziovara: Benchmarking ETL Workflows. TPCTC 2009: 199-220
[29] Christian Thomsen, Torben Bach Pedersen, Wolfgang Lehner: RiTE: Providing On-Demand Data for Right-Time Data Warehousing. ICDE 2008: 456-465
[30] TPC-DS benchmark. Transaction Processing Performance Council, http://www.tpc.org/tpcds/default.asp
[31] Kristin Tufte, Jin Li, David Maier, Vassilis Papadimos, Robert L. Bertini, James Rucker: Travel time estimation using NiagaraST and latte. SIGMOD Conference 2007: 1091-1093

Cited By

  • (2023) The stream data warehouse. Future Generation Computer Systems 142:C (212-227). DOI: 10.1016/j.future.2023.01.003. Online publication date: 1-May-2023
  • (2014) Database Processing Benchmarks. Encyclopedia of Information Science and Technology, Third Edition (1741-1747). DOI: 10.4018/978-1-4666-5888-2.ch167. Online publication date: 31-Jul-2014
  • (2012) DOLAP 2012 workshop summary. Proceedings of the 21st ACM international conference on Information and knowledge management (2780-2781). DOI: 10.1145/2396761.2398765. Online publication date: 29-Oct-2012

Reviews

Amos O Olagunju

The conduct of business continues to change rapidly as new technologies and techniques evolve. One result of this change is that enterprises and governments sometimes find themselves maintaining historical business data in very large databases without adequate time-sensitive data mining algorithms. A reliable framework for data mining multidimensional graphs of heterogeneous information networks exists for the familiar online analytical processing (OLAP) technology [1]. However, effective business decisions can still benefit from intelligent systems that are capable of processing new real-time transactions by taking advantage of the historical business patterns in data warehouses. How often should an effective data warehouse be refreshed to accommodate new incoming business transactions? How should businesses gauge the effectiveness of continuously updated data warehouses that provide business intelligence? The authors of this paper propose a set of ideas for testing the performance of data warehouses that support almost instantaneous refreshing of table views as new data arrives. The framework for benchmarking data warehouses that restore table views in practically real-time involves two steps. First, the authors assess the latency between the incoming fresh business data and the propagation of this data through the base tables and into views. From this, they compute the "staleness" associated with processing alternative hierarchies of tables and real views of workloads containing different queries and batch window data sizes. This results in a near real-time staleness value for new transactions over a given period, which is the sum of all areas under the staleness value graph lines.
The metric for investigating the performance of transaction processing systems for data warehouses is the ratio of total queries multiplied by the gigabyte rate of the transaction processing system, relative to the total load time of transaction data plus its power test consumption and the total execution times for testing different queries for insertion, deletion, and addition operations. The authors comprehensively review the literature to ascertain the credibility of bid metrics for assessing the performance of data warehouses that almost instantaneously update tables and provide real views of incoming transaction data. They use a hypothetical situation to illustrate the computation of the staleness value, and propose future practical and simulation experiments to validate metrics for stream data warehouses. Clearly, the authors open up an interesting, long-overdue discussion on the reliability of the data warehouses that data mining algorithms operate on. Until new prototype systems emerge that can measure the reliability of stream data warehouses, the practical applications of the metrics advocated by the authors will remain doubtful and open to discussion.

Online Computing Reviews Service
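
The staleness computation summarized in the review can be illustrated with a small sketch. It is not the authors' benchmark code: it assumes that the staleness of a view at time t is t minus the timestamp of the newest data the view reflects, so staleness grows linearly between refreshes and drops when a refresh completes. The function names `staleness_area` and `total_staleness` and the `(completion_time, data_timestamp)` update format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical update record: (time the refresh completed,
# timestamp of the newest data it made visible in the view).
Update = Tuple[float, float]


def _segment_area(a: float, b: float, fresh: float) -> float:
    """Area under staleness(t) = t - fresh on [a, b] (a trapezoid)."""
    return (b - a) * ((a - fresh) + (b - fresh)) / 2.0


def staleness_area(updates: List[Update], start: float, end: float,
                   initial_freshness: float) -> float:
    """Integrate the assumed staleness curve of one view over [start, end].

    `updates` must be sorted by completion time.
    """
    area = 0.0
    t = start
    fresh = initial_freshness
    for done, data_ts in updates:
        if done <= start:
            # Refresh finished before the window: only affects freshness.
            fresh = max(fresh, data_ts)
            continue
        if done >= end:
            break
        area += _segment_area(t, done, fresh)
        t = done
        fresh = max(fresh, data_ts)
    # Last stretch, from the final refresh to the end of the window.
    area += _segment_area(t, end, fresh)
    return area


def total_staleness(views: Dict[str, List[Update]], start: float,
                    end: float, initial_freshness: float) -> float:
    """Sum the per-view areas, as in 'the sum of all areas' in the review."""
    return sum(staleness_area(u, start, end, initial_freshness)
               for u in views.values())


if __name__ == "__main__":
    views = {
        "base_table": [(10.0, 10.0), (20.0, 20.0)],  # refreshed promptly
        "derived_view": [(10.0, 5.0)],               # one late, partial refresh
    }
    print(total_staleness(views, start=0.0, end=30.0, initial_freshness=0.0))
```

Under these assumptions, a lower total means the warehouse kept its views fresher over the measurement window. A weighted sum per view would be a natural variation, but that is a design choice beyond what the review states.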


Information & Contributors

Information

Published In

DOLAP '12: Proceedings of the fifteenth international workshop on Data warehousing and OLAP
November 2012
154 pages
ISBN: 9781450317214
DOI: 10.1145/2390045
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data warehouse benchmarking
  2. materialized view maintenance
  3. stream data warehousing

Qualifiers

  • Research-article

Conference

CIKM'12
Acceptance Rates

Overall Acceptance Rate 29 of 79 submissions, 37%

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Jan 2025
