Abstract
Building OLAP services on cloud platforms is now commonplace, and adopters want to understand and characterize how OLAP engines perform on the cloud. However, traditional OLAP benchmarks are usually designed for on-premise environments. When used to evaluate cloud OLAP engines, they adapt poorly to cloud environments and cannot execute benchmarks that reflect cloud scenarios. To address these issues, this paper proposes Raven, a cloud-oriented OLAP benchmark with a flexible system architecture and diversified workloads. Raven supports deployment as a cloud service and integration with various cloud OLAP engines. In addition, to simulate complex cloud query scenarios, we design a group of timeline-based and service-oriented workloads. We implement Raven on the Amazon AWS cloud platform and use it to evaluate several widely used OLAP engines of typical types, including Presto, SparkSQL, Kylin, and Athena. Experimental results show that Raven can effectively benchmark diverse OLAP engines as well as different configuration settings of the same engine. We also explore an OLAP case study on the cloud using Raven.
Acknowledgments
This work is funded in part by the China National Science Foundation (No. 62072230, U1811461), the Fundamental Research Funds for the Central Universities (No. 020214380089, 020214380098), Jiangsu Province Science and Technology Key Program (No. BE2021729), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, T. et al. (2023). Raven: Benchmarking Monetary Expense and Query Efficiency of OLAP Engines on the Cloud. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13946. Springer, Cham. https://doi.org/10.1007/978-3-031-30678-5_45
DOI: https://doi.org/10.1007/978-3-031-30678-5_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30677-8
Online ISBN: 978-3-031-30678-5
eBook Packages: Computer Science, Computer Science (R0)