Optimizing select conditions on GPUs

Authors:

Evangelia A. Sitaridi,

Kenneth A. Ross

DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware

Article No.: 4, Pages 1 - 8

https://doi.org/10.1145/2485278.2485282

Published: 24 June 2013

Abstract

Implementations of data processing operators on GPU processors have achieved significant performance improvements over their multicore CPU counterparts. To achieve maximum performance, database operator implementations must take into consideration special features of GPU architectures. A crucial difference is that the unit of execution is a group ("warp") of threads, 32 threads in our target architecture, as opposed to a single thread for CPUs. In the presence of branches, threads in a warp have to follow the same execution path; if some threads diverge, then the different paths are serialized. Additionally, as on CPUs, branches degrade the efficiency of instruction scheduling. Here, we study conjunctive selection queries where branching hurts performance. We compute the optimal execution plan for a conjunctive query, taking branch penalties into account, and consider both single-kernel and multi-kernel plans. Our evaluation suggests that divergence affects performance significantly and that our techniques reduce resource underutilization and improve operator performance.
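To make the divergence effect concrete, here is a minimal Python sketch of a toy lane-occupancy model for a short-circuit conjunctive selection. It is not the cost model or the plans from the paper. It assumes only what the abstract states: a warp has 32 threads, and a warp keeps executing a code path as long as at least one of its threads is on it. The function names `warp_cost` and `utilization`, the two predicates, and the per-predicate costs are hypothetical, chosen purely for illustration.

```python
WARP_SIZE = 32  # warp width stated in the abstract


def warp_cost(rows, predicates, costs):
    """Toy model of one warp evaluating p1 AND p2 AND ... with short-circuiting.

    Returns (issued, useful): lane-slots the warp occupies versus lane-slots
    that do real work. Lanes whose row already failed sit idle while the
    remaining lanes evaluate the next predicate.
    """
    active = [True] * len(rows)
    issued = useful = 0
    for pred, cost in zip(predicates, costs):
        n_active = sum(active)
        if n_active == 0:
            break  # the whole warp skips the remaining predicates
        issued += cost * WARP_SIZE  # the warp occupies all lanes regardless
        useful += cost * n_active   # only still-active lanes do useful work
        active = [a and pred(r) for a, r in zip(active, rows)]
    return issued, useful


def utilization(table, predicates, costs):
    """Fraction of occupied lane-slots that do useful work over the whole table."""
    issued = useful = 0
    for start in range(0, len(table), WARP_SIZE):
        i, u = warp_cost(table[start:start + WARP_SIZE], predicates, costs)
        issued += i
        useful += u
    return useful / issued


# Hypothetical predicates: a cheap one followed by a 4x more expensive one.
preds = [lambda r: r % 2 == 0, lambda r: r < 32]
costs = [1, 4]

interleaved = list(range(64))                               # every warp diverges
clustered = list(range(0, 64, 2)) + list(range(1, 64, 2))   # warps agree internally

print(utilization(interleaved, preds, costs))  # 0.6
print(utilization(clustered, preds, costs))    # 1.0
```

Both tables hold the same 64 rows. In the interleaved layout, half the lanes of every warp idle during the expensive second predicate. In the clustered layout, each warp either evaluates it with all lanes or skips it entirely, which is the kind of underutilization the abstract refers to.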

Optimizing select conditions on GPUs
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Published In

DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware

June 2013

65 pages

ISBN: 9781450321969

DOI: 10.1145/2485278

Conference Chairs: Ryan Johnson (University of Toronto), Alfons Kemper (Technische Universität München)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS'13

Sponsor:

SIGMOD

SIGMOD/PODS'13: International Conference on Management of Data

June 24, 2013

New York, New York

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%
