[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3318464.3380565acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries

Published: 31 May 2020 Publication History

Abstract

The concept of OLAP query processing is now being widely adopted in various applications. The number of complex queries containing the joins between non-unique keys (called FK-FK joins) increases in those applications. However, the existing in-memory OLAP systems tend not to handle such complex queries efficiently since they generate a large amount of intermediate results or incur a huge amount of probe cost. In this paper, we propose an effective query planning method for complex OLAP queries. It generates a query plan containing n-ary join operators based on a cost model. The plan does not generate intermediate results for processing FK-FK joins and significantly reduces the probe cost. We also propose an efficient processing method for n-ary join operators. We implement the prototype system SPRINTER by integrating our proposed methods into an open-source in-memory OLAP system. Through experiments using the TPC-DS benchmark, we have shown that SPRINTER outperforms the state-of-the-art OLAP systems for complex queries.

Supplementary Material

MP4 File (3318464.3380565.mp4)
Presentation Video

References

[1]
Christopher R Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2016. Emptyheaded: A relational engine for graph processing. In SIGMOD.
[2]
Foto N Afrati and Jeffrey D Ullman. 2011. Optimizing multiway joins in a map-reduce environment. IEEE Transactions on Knowledge and Data Engineering (TKDE) 23, 9 (2011), 1282--1298.
[3]
Rafi Ahmed, Rajkumar Sen, Meikel Poess, and Sunil Chakkappen. 2014. Of snowstorms and bushy trees. In VLDB.
[4]
Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In FoCS.
[5]
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. [n.d.]. Multi-core, main-memory joins: Sort vs. hash revisited.
[6]
Ronald Barber, Guy Lohman, Vijayshankar Raman, Richard Sidle, Sam Lightstone, and Berni Schiefer. 2015. In-memory BLU acceleration in IBM's DB2 and dashDB: Optimized for modern workloads and hardware architectures. In ICDE.
[7]
Nathan Bell and Jared Hoberock. 2012. Thrust: A productivity-oriented library for CUDA. In GPU computing gems Jade edition. Elsevier.
[8]
Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD.
[9]
Peter A Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR.
[10]
Sebastian Breß, Henning Funke, and Jens Teubner. 2016. Robust query processing in co-processor-accelerated databases. In SIGMOD.
[11]
Jack Chen, Samir Jindel, Robert Walzer, Rajkumar Sen, Nika Jimsheleishvilli, and Michael Andrews. 2016. The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database. In VLDB.
[12]
Zhimin Chen, Vivek Narasayya, and Surajit Chaudhuri. 2014. Fast foreign-key detection in Microsoft SQL server PowerPivot for Excel. In PVLDB.
[13]
Shumo Chu, Magdalena Balazinska, and Dan Suciu. 2015. From theory to practice: Efficient join query evaluation in a parallel database system. In SIGMOD.
[14]
Henning Funke, Sebastian Breß, Stefan Noll, Volker Markl, and Jens Teubner. 2018. Pipelined query processing in coprocessor environments. In SIGMOD.
[15]
Michael Gowanlock and Ben Karsin. 2018. Sorting Large Datasets with Heterogeneous CPU/GPU Architectures. In Proc. IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[16]
Martin Grund, Jens Krüger, Hasso Plattner, Alexander Zeier, Philippe Cudre-Mauroux, and Samuel Madden. 2010. HYRISE: a main memory hybrid storage engine. In VLDB.
[17]
Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl. 2013. Hardware-oblivious parallelism for in-memory column-stores. In VLDB.
[18]
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding. 2016. Quickr: Lazily approximating complex adhoc queries in bigdata clusters. In SIGMOD.
[19]
Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs. In VLDB.
[20]
Hyerin Kim, NaNa Kang, Kang-Wook Chon, Seonho Kim, NaHye Lee, JaeHyung Koo, and Min-Soo Kim. 2015. MRPrimer: a MapReduce based method for the thorough design of valid and ranked primers for PCR. Nucleic acids research 43, 20 (2015), e130--e130.
[21]
Kinetica. 2019. Kinetica: Active Analytics Platform for the Extreme Data Economy. https://www.kinetica.com.
[22]
Arun Kumar, Jeffrey Naughton, Jignesh M Patel, and Xiaojin Zhu. 2016. To join or not to join?: Thinking twice about joins before feature selection. In SIGMOD.
[23]
Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, Teck-Hua Lee, et al. 2015. Oracle database in-memory: A dual format in-memory database. In ICDE.
[24]
Per-Åke Larson, Adrian Birka, Eric N Hanson, Weiyun Huang, Michal Nowakiewicz, and Vassilis Papadimos. 2015. Real-time analytical processing with SQL server. In VLDB.
[25]
Per-Åke Larson, Cipri Clinciu, Eric N Hanson, Artem Oks, Susan L Price, Srikumar Rangarajan, Aleksandras Surna, and Qingqing Zhou. 2011. SQL server column store indexes. In SIGMOD.
[26]
Thomas J Lee, Yannick Pouliot, Valerie Wagner, Priyanka Gupta, David WJ Stringer-Calvert, Jessica D Tenenbaum, and Peter D Karp. 2006. BioWarehouse: a bioinformatics database warehouse toolkit. BMC bioinformatics 7, 1 (2006), 170.
[27]
Feilong Liu and Spyros Blanas. 2015. Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In SoCC.
[28]
Prashanth Menon, Todd C Mowry, and Andrew Pavlo. 2017. Relaxed operator fusion for in-memory databases: Making compilation, vectorization, and prefetching work together at last. In VLDB.
[29]
D Merrill. 2015. CUB v1. 5.3: CUDA Unbound, a library of warp-wide, block-wide, and device-wide GPU parallel primitives. NVIDIA Research (2015).
[30]
Raghunath Othayoth Nambiar and Meikel Poess. 2006. The making of TPC-DS.
[31]
Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst case optimal join algorithms. In PODS.
[32]
Hung Q Ngo, Christopher Ré, and Atri Rudra. 2014. Skew Strikes Back: New Developments in the Theory of Join Algorithms. ACM SIGMOD Rec. 42, 4 (2014), 5--16.
[33]
OmniSci. 2019. A github repository of OmniSci database. https://github.com/omnisci/omniscidb.
[34]
Jignesh M Patel, Harshad Deshmukh, Jianqiao Zhu, Navneet Potti, Zuyu Zhang, Marc Spehlmann, Hakan Memisoglu, and Saket Saurabh. 2018. Quickstep: A data platform based on the scaling-up approach. In VLDB.
[35]
Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based pipelined query processing engine. In SIGMOD.
[36]
Adam Perelman and Christopher Ré. 2015. DunceCap: Compiling worst-case optimal query plans. In SIGMOD.
[37]
Chuck Pheatt. 2008. Intel® threading building blocks. Journal of Computing Sciences in Colleges 23, 4 (2008), 298--298.
[38]
Francesco Scarcello. 2005. Query answering exploiting structural properties. ACM SIGMOD Rec. Vol. 34, No. 3, pp. 91--99 (2005).
[39]
Stefan Schuh, Xiao Chen, and Jens Dittrich. 2016. An experimental comparison of thirteen relational equi-joins in main memory. In SIGMOD.
[40]
Anil Shanbhag and S Sudarshan. 2014. Optimizing join enumeration in transformation-based query optimizers. In VLDB.
[41]
Vishal Sikka, Franz Färber, Anil Goel, and Wolfgang Lehner. 2013. SAP HANA: The evolution from a modern main-memory data platform to an enterprise application platform. In VLDB.
[42]
Elias Stehle and Hans-Arno Jacobsen. 2017. A memory bandwidth efficient hybrid radix sort on gpus. In SIGMOD.
[43]
Susan Tu and Christopher Ré. 2015. Duncecap: Query plans using generalized hypertree decompositions. In SIGMOD.
[44]
Todd L Veldhuizen. 2014. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT.
[45]
Andreas Weininger. 2002. Efficient execution of joins in a star schema. In SIGMOD.
[46]
PetrieWong, Zhian He, and Eric Lo. 2013. Parallel analytics as a service. In SIGMOD.
[47]
Konstantinos Xirogiannopoulos and Amol Deshpande. 2019. Memory- Efficient Group-by Aggregates over Multi-Way Joins. In arXiv preprint arXiv:1906.05745.
[48]
Meihui Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M Procopiuc, and Divesh Srivastava. 2010. On Multi-Column Foreign Key Discovery. In PVLDB.
[49]
Xiaofei Zhang, Lei Chen, and Min Wang. 2012. Efficient multi-way theta-join processing using mapreduce. In VLDB.
[50]
Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M Patel. 2017. Looking ahead makes query plans robust: Making the initial case with in-memory star schema data warehouse workloads. In VLDB.
[51]
Marcin Zukowski, Mark Van de Wiel, and Peter A Boncz. 2012. Vectorwise: A Vectorized Analytical DBMS. In ICDE.

Cited By

View all
  • (2022)Cardinality estimation in DBMSProceedings of the VLDB Endowment10.14778/3503585.350358615:4(752-765)Online publication date: 14-Apr-2022
  • (2022)Parallel Query Processing: To Separate Communication from ComputationProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526164(1447-1461)Online publication date: 10-Jun-2022
  • (2022)D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systemsFrontiers of Computer Science10.1007/s11704-022-2160-z17:4Online publication date: 12-Dec-2022
  • Show More Cited By

Index Terms

  1. SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
      June 2020
      2925 pages
      ISBN:9781450367356
      DOI:10.1145/3318464
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 31 May 2020

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. FK-FK join
      2. co-processing
      3. complex OLAP query processing
      4. n-ary join operator
      5. query optimization
      6. query planning

      Qualifiers

      • Research-article

      Funding Sources

      • Institute of Information communications Technology Planning Evaluation (IITP)
      • the National Research Foundation of Korea (NRF)

      Conference

      SIGMOD/PODS '20
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)92
      • Downloads (Last 6 weeks)6
      Reflects downloads up to 19 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2022)Cardinality estimation in DBMSProceedings of the VLDB Endowment10.14778/3503585.350358615:4(752-765)Online publication date: 14-Apr-2022
      • (2022)Parallel Query Processing: To Separate Communication from ComputationProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526164(1447-1461)Online publication date: 10-Jun-2022
      • (2022)D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systemsFrontiers of Computer Science10.1007/s11704-022-2160-z17:4Online publication date: 12-Dec-2022
      • (2021)Fine-Grained Multi-Query Stream Processing on Integrated ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.306640732:9(2303-2320)Online publication date: 5-Apr-2021

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media