[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Data-parallel query processing on non-uniform data

Published: 01 February 2020 Publication History

Abstract

Graphics processing units (GPUs) promise spectacular performance advantages when used as database coprocessors. Their massive compute capacity, however, is often hampered by control flow divergence caused by non-uniform data distributions. When data-parallel work items demand for different amounts or types of processing, instructions execute with lowered efficiency. Query compilation techniques---a recent advance in GPU-accelerated database processing---suffer from the problem even more, because divergence effects are amplified during the execution of fused pipeline operators.
In this work, we identify two types of control flow divergence---filter divergence and expansion divergence---that frequently occur in real world workloads. We quantify the problem for two poster cases and propose techniques to balance these divergence effects. By balancing divergence effects, our approach is able to restore processing efficiency even when pipelines contain heavily skewed operations. Our query compiler DogQC has a wider range of functionality than other query coprocessors and achieves performance improvements. We observe shorter execution times for TPC-H benchmark queries by factors up to 4.51x compared with existing GPU query compilers and by factors up to 4.54x compared with CPU-based systems.

References

[1]
P. Bakkum and S. Chakradhar. Efficient data management for GPU databases. High Performance Computing on Graphics Processing Units, 2012.
[2]
M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In Proceedings of the conference on high performance graphics 2009, pages 159--166. ACM, 2009.
[3]
C. Binnig, S. Hildenbrand, and F. Färber. Dictionary-based order-preserving string compression for main memory column stores. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 283--296. ACM, 2009.
[4]
P. Boncz, T. Neumann, and O. Erling. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In Technology Conference on Performance Evaluation and Benchmarking, pages 61--76. Springer, 2013.
[5]
P. A. Boncz, S. Manegold, M. L. Kersten, et al. Database architecture optimized for the new bottleneck: Memory access. In VLDB, pages 54--65, 1999.
[6]
S. Borkar and A. A. Chien. The Future of Microprocessors. Communications of the ACM, 54(5):67--77, May 2011.
[7]
S. Breß, H. Funke, and J. Teubner. Robust query processing in co-processor-accelerated databases. In Proceedings of the 2016 International Conference on Management of Data, pages 1891--1906. ACM, 2016.
[8]
S. Breß, B. Köcher, H. Funke, S. Zeuch, T. Rabl, and V. Markl. Generating custom code for efficient query execution on heterogeneous processors. VLDB Journal, 27(6):797--822, 2018.
[9]
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer networks, 33(1-6):309--320, 2000.
[10]
G. Chen and X. Shen. Free launch: optimizing GPU dynamic kernel launches through thread reuse. In Proceedings of the 48th International Symposium on Microarchitecture, pages 407--419. ACM, 2015.
[11]
A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel GPU methods for single-source shortest paths. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pages 349--359. IEEE, 2014.
[12]
H. Funke, S. Breß, S. Noll, V. Markl, and J. Teubner. Pipelined Query Processing in Coprocessor Environments. In Proceedings of the 2018 International Conference on Management of Data, pages 1603--1618. ACM, 2018.
[13]
M. Heimel, M. Saecker, H. Pirk, S. Manegold, and V. Markl. Hardware-oblivious parallelism for in-memory column-stores. PVLDB, 6(9):709--720, 2013.
[14]
Intel Corporation. Accelerating x265 with Intel Advanced Vector Extensions 512 (Intel AVX-512). 2018.
[15]
B. Jang, D. Schaa, P. Mistry, and D. Kaeli. Exploiting memory access patterns to improve memory performance in data-parallel architectures. IEEE Transactions on Parallel and Distributed Systems, 22(1):105--118, 2010.
[16]
Z. Jia, M. Maggioni, J. Smith, and D. P. Scarpazza. Dissecting the NVidia Turing T4 GPU via Microbenchmarking. arXiv preprint arXiv:1903.07486, 2019.
[17]
T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. GPU join processing revisited. In Proceedings of the Eighth International Workshop on Data Management on New Hardware, pages 55--62. ACM, 2012.
[18]
A. Kipf, H. Lang, V. Pandey, R. A. Persa, P. Boncz, T. Neumann, and A. Kemper. Approximate geospatial joins with precision guarantees. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 1360--1363. IEEE, 2018.
[19]
H. Lang, A. Kipf, L. Passing, P. Boncz, T. Neumann, and A. Kemper. Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. In Proceedings of the 14th International Workshop on Data Management on New Hardware, page 5. ACM, 2018.
[20]
V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 743--754. ACM, 2014.
[21]
H. Liu and H. H. Huang. Enterprise: breadth-first graph traversal on GPUs. In SC'15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1--12. IEEE, 2015.
[22]
L. Liu, Y. Zhang, M. Liu, C. Wang, and J. Wang. A-MapCG: an adaptive MapReduce framework for GPUs. In 2017 International Conference on Networking, Architecture, and Storage (NAS), pages 1--8. IEEE, 2017.
[23]
X. Mei and X. Chu. Dissecting GPU memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, 28(1):72--86, 2016.
[24]
I. Müller, C. Ratsch, F. Faerber, et al. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. In EDBT, pages 283--294, 2014.
[25]
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, 2011.
[26]
NVidia Corporation. NVidia Kepler GPU Architecture. 2012.
[27]
NVidia Corporation. NVidia Turing GPU Architecture. 2018.
[28]
OmniSci Incorporated. OmniSciDB. https://www.omnisci.com/, 2019.
[29]
J. Paul, J. He, and B. He. GPL: A GPU-based pipelined query processing engine. In Proceedings of the 2016 International Conference on Management of Data, pages 1935--1950. ACM, 2016.
[30]
H. Pirk, S. Manegold, M. L. Kersten, et al. Accelerating Foreign-Key Joins using Asymmetric Memory Channels. In ADMS@VLDB, pages 27--35, 2011.
[31]
O. Polychroniou and K. A. Ross. Vectorized Bloom filters for advanced SIMD processors. In Proceedings of the Tenth International Workshop on Data Management on New Hardware, page 6. ACM, 2014.
[32]
R. Rui and Y.-C. Tu. Fast Equi-Join Algorithms on GPUs: Design and Implementation. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, page 17. ACM, 2017.
[33]
M. Sha, Y. Li, and K.-L. Tan. GPU-based Graph Traversal on Compressed Graphs. In Proceedings of the 2019 International Conference on Management of Data, pages 775--792. ACM, 2019.
[34]
J. Sompolski, M. Zukowski, and P. Boncz. Vectorization vs. compilation in query execution. In Proceedings of the Seventh International Workshop on Data Management on New Hardware, pages 33--40. ACM, 2011.
[35]
G. Vasiliadis, M. Polychronakis, S. Antonatos, E. P. Markatos, and S. Ioannidis. Regular expression matching on graphics hardware for intrusion detection. In International Workshop on Recent Advances in Intrusion Detection, pages 265--283. Springer, 2009.
[36]
J. Wang and S. Yalamanchili. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In 2014 IEEE International Symposium on Workload Characterization (IISWC), pages 51--60. IEEE, 2014.
[37]
H. Wu, G. Diamos, S. Cadambi, and S. Yalamanchili. Kernel weaver: Automatically fusing database primitives for efficient GPU computation. In 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pages 107--118. IEEE, 2012.
[38]
H. Wu, G. Diamos, T. Sheard, M. Aref, S. Baxter, M. Garland, and S. Yalamanchili. Red fox: An execution environment for relational query processing on gpus. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, page 44. ACM, 2014.
[39]
Y. Yuan, R. Lee, and X. Zhang. The Yin and Yang of processing data warehousing queries on GPU devices. PVLDB, 6(10):817--828, 2013.
[40]
Y. Zu, M. Yang, Z. Xu, L. Wang, X. Tian, K. Peng, and Q. Dong. GPU-based NFA implementation for memory efficient high speed regular expression matching. In ACM SIGPLAN Notices, volume 47, pages 129--140. ACM, 2012.

Cited By

View all
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 1-Apr-2024
  • (2024)Free Your Hash Join From Slow Synchronizations: A Fast Hash Join Implementation on GPUProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674437(106-107)Online publication date: 5-Jul-2024
  • (2024)How Does Software Prefetching Work on GPU Query Processing?Proceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663445(1-9)Online publication date: 10-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 13, Issue 6
February 2020
170 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 February 2020
Published in PVLDB Volume 13, Issue 6

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)3
Reflects downloads up to 12 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 1-Apr-2024
  • (2024)Free Your Hash Join From Slow Synchronizations: A Fast Hash Join Implementation on GPUProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674437(106-107)Online publication date: 5-Jul-2024
  • (2024)How Does Software Prefetching Work on GPU Query Processing?Proceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663445(1-9)Online publication date: 10-Jun-2024
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2024)Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUsThe Journal of Supercomputing10.1007/s11227-024-06031-x80:11(15122-15160)Online publication date: 1-Jul-2024
  • (2023)A Case for Graphics-Driven Query ProcessingProceedings of the VLDB Endowment10.14778/3603581.360359016:10(2499-2511)Online publication date: 1-Jun-2023
  • (2023)Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUsProceedings of the 19th International Workshop on Data Management on New Hardware10.1145/3592980.3595307(19-26)Online publication date: 18-Jun-2023
  • (2023)Fine-Grained Tuple Transfer for Pipelined Query Execution on CPU-GPU CoprocessorDatabase Systems for Advanced Applications10.1007/978-3-031-30637-2_2(19-34)Online publication date: 17-Apr-2023
  • (2022)GHiveProceedings of the 13th Symposium on Cloud Computing10.1145/3542929.3563503(158-172)Online publication date: 7-Nov-2022
  • (2022)GaccO - A GPU-accelerated OLTP DBMSProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517876(1003-1016)Online publication date: 10-Jun-2022
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media