
research-article

Anser: Adaptive Information Sharing Framework of AnalyticDB

Authors:

Feifei Li

Proceedings of the VLDB Endowment, Volume 16, Issue 12

Pages 3636 - 3648

https://doi.org/10.14778/3611540.3611553

Published: 01 August 2023

Abstract

The surge in data analytics has driven growing demand for AnalyticDB on Alibaba Cloud, which now serves thousands of customers across a wide range of business sectors. Its most notable feature is the diversity of the workloads it handles, including batch processing, real-time data analytics, and unstructured data analytics. A major challenge in improving overall performance for such diverse workloads is optimizing long-running complex queries without sacrificing the efficiency of short-running interactive queries. Existing methods use runtime dynamic statistics for adaptive query processing, but they usually target specific scenarios rather than providing a holistic solution.

To address this challenge, we propose Anser, a new framework that enhances the design of traditional distributed data warehouses by embedding an information sharing mechanism. This mechanism efficiently manages the production and consumption of various kinds of dynamic information across the system. On top of Anser, we introduce a novel scheduling policy that optimizes both data exchanges and information exchanges within the physical plan, accelerating complex analytical queries without sacrificing the performance of short-running interactive queries. Comprehensive experiments on public and in-house workloads demonstrate the effectiveness and efficiency of the proposed information sharing framework.
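The abstract describes the information sharing mechanism only at a high level: some operators produce dynamic information at runtime and other operators consume it. The Python sketch below illustrates that producer/consumer pattern using one familiar kind of runtime information, a join-key filter published by a hash-join build side and consumed by the probe-side scan. It is a hypothetical illustration, not Anser's implementation: `InfoHub`, `publish`, `consume`, `build_side`, `probe_scan`, the string keys, and the bounded-wait policy are all assumptions. A consumer waits at most a fixed timeout and otherwise proceeds unfiltered, which is one simple way to keep short queries from stalling on information that arrives late.

```python
import threading
from typing import Any, Dict, Iterable, List, Optional


class InfoHub:
    """Hypothetical registry through which operators publish and await runtime information."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, Any] = {}
        self._events: Dict[str, threading.Event] = {}

    def _event(self, key: str) -> threading.Event:
        # One Event per key, created lazily by whichever side (producer or consumer) arrives first.
        with self._lock:
            return self._events.setdefault(key, threading.Event())

    def publish(self, key: str, value: Any) -> None:
        # Store the value before signalling so that a woken consumer always finds it.
        with self._lock:
            self._values[key] = value
        self._event(key).set()

    def consume(self, key: str, timeout: float) -> Optional[Any]:
        # Block for at most `timeout` seconds; None means "proceed without this information".
        if self._event(key).wait(timeout):
            with self._lock:
                return self._values[key]
        return None


def build_side(hub: InfoHub, key: str, build_rows: Iterable[dict], col: str) -> None:
    # Producer (hypothetical): a hash-join build side publishes the set of its join keys.
    hub.publish(key, {row[col] for row in build_rows})


def probe_scan(hub: InfoHub, key: str, probe_rows: Iterable[dict],
               col: str, timeout: float) -> List[dict]:
    # Consumer (hypothetical): the probe-side scan prunes rows only if the filter arrives in time.
    keys = hub.consume(key, timeout)
    if keys is None:
        return list(probe_rows)  # fallback: scan unfiltered instead of blocking indefinitely
    return [row for row in probe_rows if row[col] in keys]


if __name__ == "__main__":
    hub = InfoHub()
    dims = [{"id": 1}, {"id": 3}]
    facts = [{"dim_id": i, "v": i * 10} for i in range(5)]
    producer = threading.Thread(target=build_side, args=(hub, "q1.join0", dims, "id"))
    producer.start()
    # Keeps only the fact rows whose dim_id appears on the build side (1 and 3).
    print(probe_scan(hub, "q1.join0", facts, "dim_id", timeout=1.0))
    producer.join()
    # No producer ever publishes "q1.join1", so the scan gives up after 10 ms and keeps all 5 rows.
    print(len(probe_scan(hub, "q1.join1", facts, "dim_id", timeout=0.01)))
```

Bounded waiting is only one possible consumption policy. According to the abstract, Anser's scheduling policy also decides how data and information exchanges are arranged within the physical plan, and this sketch does not model that.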



Information

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 12
August 2023
685 pages
ISSN: 2150-8097

Editors:
Georgia Koutrika, Athena Research Center
Jun Yang, Duke University

Publisher

VLDB Endowment



Bibliometrics

Total Citations: 0
Total Downloads: 112
Downloads (last 12 months): 50
Downloads (last 6 weeks): 5

Reflects downloads up to 05 Mar 2025.
