DOI: 10.1145/2485278.2485282
Research article

Optimizing select conditions on GPUs

Published: 24 June 2013

Abstract

Implementations of data processing operators on GPUs have achieved significant performance improvements over their multicore CPU counterparts. To achieve maximum performance, database operator implementations must take special features of GPU architectures into consideration. A crucial difference is that the unit of execution is a group ("warp") of threads, 32 threads in our target architecture, as opposed to a single thread on CPUs. In the presence of branches, threads in a warp have to follow the same execution path; if some threads diverge, the different paths are serialized. Additionally, as on CPUs, branches degrade the efficiency of instruction scheduling. Here, we study conjunctive selection queries where branching hurts performance. We compute the optimal execution plan for a conjunctive query, taking branch penalties into account, and consider both single-kernel and multi-kernel plans. Our evaluation suggests that divergence affects performance significantly and that our techniques reduce resource underutilization and improve operator performance.
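
To make the divergence argument concrete, the following is a minimal illustrative sketch in Python, not the paper's cost model or implementation. It assumes one thread per row and a single predicate evaluated with a branch, and it counts how many warps of 32 threads contain both passing and failing rows (and would therefore serialize both paths). The function names and the independence assumption behind the closed-form estimate are assumptions introduced here for illustration.

```python
WARP_SIZE = 32  # warp width stated in the abstract


def divergent_warp_fraction(predicate_results, warp_size=WARP_SIZE):
    """Fraction of full warps whose threads disagree on the predicate outcome.

    predicate_results holds one boolean per row; row i is assumed to be
    handled by thread i, so consecutive groups of warp_size rows form a warp.
    A trailing partial warp is ignored.
    """
    n_warps = len(predicate_results) // warp_size
    if n_warps == 0:
        return 0.0
    divergent = 0
    for w in range(n_warps):
        lanes = predicate_results[w * warp_size:(w + 1) * warp_size]
        passed = sum(lanes)
        # A warp diverges only if some, but not all, of its threads pass.
        if 0 < passed < warp_size:
            divergent += 1
    return divergent / n_warps


def expected_divergent_fraction(selectivity, warp_size=WARP_SIZE):
    """Closed-form estimate, assuming rows pass independently.

    A warp avoids divergence only if all threads pass or all threads fail.
    """
    all_pass = selectivity ** warp_size
    all_fail = (1.0 - selectivity) ** warp_size
    return 1.0 - all_pass - all_fail


if __name__ == "__main__":
    for s in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(s, expected_divergent_fraction(s))
```

Under these assumptions, a warp escapes divergence only when the predicate is almost always true or almost always false for its rows; at a selectivity of 0.5 nearly every warp diverges. This is why the order and grouping of predicates, and whether they are evaluated with branches at all, can matter for a conjunctive selection.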

Published In

DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware
June 2013
65 pages
ISBN: 9781450321969
DOI: 10.1145/2485278

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS'13

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%
