More Web Proxy on the site http://driver.im/

research-article

SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries

Authors:

Yoon-Min Nam Nam,

Donghyoung Han Han,

Min-Soo Kim KimAuthors Info & Claims

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Pages 2055 - 2070

https://doi.org/10.1145/3318464.3380565

Published: 31 May 2020 Publication History

Abstract

The concept of OLAP query processing is now being widely adopted in various applications. The number of complex queries containing the joins between non-unique keys (called FK-FK joins) increases in those applications. However, the existing in-memory OLAP systems tend not to handle such complex queries efficiently since they generate a large amount of intermediate results or incur a huge amount of probe cost. In this paper, we propose an effective query planning method for complex OLAP queries. It generates a query plan containing n-ary join operators based on a cost model. The plan does not generate intermediate results for processing FK-FK joins and significantly reduces the probe cost. We also propose an efficient processing method for n-ary join operators. We implement the prototype system SPRINTER by integrating our proposed methods into an open-source in-memory OLAP system. Through experiments using the TPC-DS benchmark, we have shown that SPRINTER outperforms the state-of-the-art OLAP systems for complex queries.

Supplementary Material

MP4 File (3318464.3380565.mp4)

Presentation Video

Download
89.16 MB

References

[1]

Christopher R Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2016. Emptyheaded: A relational engine for graph processing. In SIGMOD.

Digital Library

[2]

Foto N Afrati and Jeffrey D Ullman. 2011. Optimizing multiway joins in a map-reduce environment. IEEE Transactions on Knowledge and Data Engineering (TKDE) 23, 9 (2011), 1282--1298.

Digital Library

[3]

Rafi Ahmed, Rajkumar Sen, Meikel Poess, and Sunil Chakkappen. 2014. Of snowstorms and bushy trees. In VLDB.

[4]

Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In FoCS.

[5]

Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. [n.d.]. Multi-core, main-memory joins: Sort vs. hash revisited.

[6]

Ronald Barber, Guy Lohman, Vijayshankar Raman, Richard Sidle, Sam Lightstone, and Berni Schiefer. 2015. In-memory BLU acceleration in IBM's DB2 and dashDB: Optimized for modern workloads and hardware architectures. In ICDE.

[7]

Nathan Bell and Jared Hoberock. 2012. Thrust: A productivity-oriented library for CUDA. In GPU computing gems Jade edition. Elsevier.

[8]

Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD.

[9]

Peter A Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR.

[10]

Sebastian Breß, Henning Funke, and Jens Teubner. 2016. Robust query processing in co-processor-accelerated databases. In SIGMOD.

[11]

Jack Chen, Samir Jindel, Robert Walzer, Rajkumar Sen, Nika Jimsheleishvilli, and Michael Andrews. 2016. The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database. In VLDB.

[12]

Zhimin Chen, Vivek Narasayya, and Surajit Chaudhuri. 2014. Fast foreign-key detection in Microsoft SQL server PowerPivot for Excel. In PVLDB.

[13]

Shumo Chu, Magdalena Balazinska, and Dan Suciu. 2015. From theory to practice: Efficient join query evaluation in a parallel database system. In SIGMOD.

[14]

Henning Funke, Sebastian Breß, Stefan Noll, Volker Markl, and Jens Teubner. 2018. Pipelined query processing in coprocessor environments. In SIGMOD.

[15]

Michael Gowanlock and Ben Karsin. 2018. Sorting Large Datasets with Heterogeneous CPU/GPU Architectures. In Proc. IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[16]

Martin Grund, Jens Krüger, Hasso Plattner, Alexander Zeier, Philippe Cudre-Mauroux, and Samuel Madden. 2010. HYRISE: a main memory hybrid storage engine. In VLDB.

[17]

Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl. 2013. Hardware-oblivious parallelism for in-memory column-stores. In VLDB.

[18]

Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding. 2016. Quickr: Lazily approximating complex adhoc queries in bigdata clusters. In SIGMOD.

Digital Library

[19]

Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs. In VLDB.

[20]

Hyerin Kim, NaNa Kang, Kang-Wook Chon, Seonho Kim, NaHye Lee, JaeHyung Koo, and Min-Soo Kim. 2015. MRPrimer: a MapReduce based method for the thorough design of valid and ranked primers for PCR. Nucleic acids research 43, 20 (2015), e130--e130.

[21]

Kinetica. 2019. Kinetica: Active Analytics Platform for the Extreme Data Economy. https://www.kinetica.com.

[22]

Arun Kumar, Jeffrey Naughton, Jignesh M Patel, and Xiaojin Zhu. 2016. To join or not to join?: Thinking twice about joins before feature selection. In SIGMOD.

[23]

Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, Teck-Hua Lee, et al. 2015. Oracle database in-memory: A dual format in-memory database. In ICDE.

[24]

Per-Åke Larson, Adrian Birka, Eric N Hanson, Weiyun Huang, Michal Nowakiewicz, and Vassilis Papadimos. 2015. Real-time analytical processing with SQL server. In VLDB.

[25]

Per-Åke Larson, Cipri Clinciu, Eric N Hanson, Artem Oks, Susan L Price, Srikumar Rangarajan, Aleksandras Surna, and Qingqing Zhou. 2011. SQL server column store indexes. In SIGMOD.

[26]

Thomas J Lee, Yannick Pouliot, Valerie Wagner, Priyanka Gupta, David WJ Stringer-Calvert, Jessica D Tenenbaum, and Peter D Karp. 2006. BioWarehouse: a bioinformatics database warehouse toolkit. BMC bioinformatics 7, 1 (2006), 170.

[27]

Feilong Liu and Spyros Blanas. 2015. Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In SoCC.

[28]

Prashanth Menon, Todd C Mowry, and Andrew Pavlo. 2017. Relaxed operator fusion for in-memory databases: Making compilation, vectorization, and prefetching work together at last. In VLDB.

[29]

D Merrill. 2015. CUB v1. 5.3: CUDA Unbound, a library of warp-wide, block-wide, and device-wide GPU parallel primitives. NVIDIA Research (2015).

[30]

Raghunath Othayoth Nambiar and Meikel Poess. 2006. The making of TPC-DS.

[31]

Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst case optimal join algorithms. In PODS.

[32]

Hung Q Ngo, Christopher Ré, and Atri Rudra. 2014. Skew Strikes Back: New Developments in the Theory of Join Algorithms. ACM SIGMOD Rec. 42, 4 (2014), 5--16.

Digital Library

[33]

OmniSci. 2019. A github repository of OmniSci database. https://github.com/omnisci/omniscidb.

[34]

Jignesh M Patel, Harshad Deshmukh, Jianqiao Zhu, Navneet Potti, Zuyu Zhang, Marc Spehlmann, Hakan Memisoglu, and Saket Saurabh. 2018. Quickstep: A data platform based on the scaling-up approach. In VLDB.

[35]

Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based pipelined query processing engine. In SIGMOD.

Digital Library

[36]

Adam Perelman and Christopher Ré. 2015. DunceCap: Compiling worst-case optimal query plans. In SIGMOD.

[37]

Chuck Pheatt. 2008. Intel® threading building blocks. Journal of Computing Sciences in Colleges 23, 4 (2008), 298--298.

Digital Library

[38]

Francesco Scarcello. 2005. Query answering exploiting structural properties. ACM SIGMOD Rec. Vol. 34, No. 3, pp. 91--99 (2005).

Digital Library

[39]

Stefan Schuh, Xiao Chen, and Jens Dittrich. 2016. An experimental comparison of thirteen relational equi-joins in main memory. In SIGMOD.

[40]

Anil Shanbhag and S Sudarshan. 2014. Optimizing join enumeration in transformation-based query optimizers. In VLDB.

[41]

Vishal Sikka, Franz Färber, Anil Goel, and Wolfgang Lehner. 2013. SAP HANA: The evolution from a modern main-memory data platform to an enterprise application platform. In VLDB.

Digital Library

[42]

Elias Stehle and Hans-Arno Jacobsen. 2017. A memory bandwidth efficient hybrid radix sort on gpus. In SIGMOD.

[43]

Susan Tu and Christopher Ré. 2015. Duncecap: Query plans using generalized hypertree decompositions. In SIGMOD.

Digital Library

[44]

Todd L Veldhuizen. 2014. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT.

[45]

Andreas Weininger. 2002. Efficient execution of joins in a star schema. In SIGMOD.

[46]

PetrieWong, Zhian He, and Eric Lo. 2013. Parallel analytics as a service. In SIGMOD.

[47]

Konstantinos Xirogiannopoulos and Amol Deshpande. 2019. Memory- Efficient Group-by Aggregates over Multi-Way Joins. In arXiv preprint arXiv:1906.05745.

[48]

Meihui Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M Procopiuc, and Divesh Srivastava. 2010. On Multi-Column Foreign Key Discovery. In PVLDB.

[49]

Xiaofei Zhang, Lei Chen, and Min Wang. 2012. Efficient multi-way theta-join processing using mapreduce. In VLDB.

[50]

Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M Patel. 2017. Looking ahead makes query plans robust: Making the initial case with in-memory star schema data warehouse workloads. In VLDB.

[51]

Marcin Zukowski, Mark Van de Wiel, and Peter A Boncz. 2012. Vectorwise: A Vectorized Analytical DBMS. In ICDE.

Cited By

Han YWu ZWu PZhu RYang JTan LZeng KCong GQin YPfadler AQian ZZhou JLi JCui B(2022)Cardinality estimation in DBMSProceedings of the VLDB Endowment10.14778/3503585.350358615:4(752-765)Online publication date: 14-Apr-2022
https://dl.acm.org/doi/10.14778/3503585.3503586
Zhang HYu JZhang YZhao KIves ZBonifati AEl Abbadi A(2022)Parallel Query Processing: To Separate Communication from ComputationProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526164(1447-1461)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526164
Wang JPang WWeng CZhou A(2022)D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systemsFrontiers of Computer Science10.1007/s11704-022-2160-z17:4Online publication date: 12-Dec-2022
https://doi.org/10.1007/s11704-022-2160-z
Show More Cited By

Index Terms

SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization
        Query planning

Recommendations

An Optimal Algorithm for Processing Distributed Star Queries

The problem of optimal query processing in distributed database systems was shown to be NP-hard. However, for a special type of queries called star queries, we have developed a polynomial optimal algorithm. Semijoin tactics are applied for query ...
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database Theory

The problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Processing multi-join queries

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

June 2020

2925 pages

ISBN:9781450367356

DOI:10.1145/3318464

General Chairs:
David Maier
Portland State University, USA
,
Rachel Pottinger
University of British Columbia, Canada
,
Program Chairs:
AnHai Doan
University of Wisconsin, USA
,
Wang-Chiew Tan
Megagon Labs, USA
,
Publications Chairs:
Abdussalam Alawini
University of Illinois at Urbana-Champaign, USA
,
Hung Q. Ngo
RelationalAI, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Institute of Information communications Technology Planning Evaluation (IITP)
the National Research Foundation of Korea (NRF)

Conference

SIGMOD/PODS '20

Sponsor:

SIGMOD

SIGMOD/PODS '20: International Conference on Management of Data

June 14 - 19, 2020

OR, Portland, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
1,305
Total Downloads

Downloads (Last 12 months)92
Downloads (Last 6 weeks)6

Reflects downloads up to 19 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Han YWu ZWu PZhu RYang JTan LZeng KCong GQin YPfadler AQian ZZhou JLi JCui B(2022)Cardinality estimation in DBMSProceedings of the VLDB Endowment10.14778/3503585.350358615:4(752-765)Online publication date: 14-Apr-2022
https://dl.acm.org/doi/10.14778/3503585.3503586
Zhang HYu JZhang YZhao KIves ZBonifati AEl Abbadi A(2022)Parallel Query Processing: To Separate Communication from ComputationProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526164(1447-1461)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526164
Wang JPang WWeng CZhou A(2022)D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systemsFrontiers of Computer Science10.1007/s11704-022-2160-z17:4Online publication date: 12-Dec-2022
https://doi.org/10.1007/s11704-022-2160-z
Zhang FZhang CYang LZhang SHe BLu WDu X(2021)Fine-Grained Multi-Query Stream Processing on Integrated ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.306640732:9(2303-2320)Online publication date: 5-Apr-2021
https://dl.acm.org/doi/10.1109/TPDS.2021.3066407

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents