More Web Proxy on the site http://driver.im/

research-article

Efficient Top-K Query Processing on Massively Parallel Hardware

Authors:

Samuel MaddenAuthors Info & Claims

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

Pages 1557 - 1570

https://doi.org/10.1145/3183713.3183735

Published: 27 May 2018 Publication History

Abstract

A common operation in many data analytics workloads is to find the top-k items, i.e., the largest or smallest operations according to some sort order (implemented via LIMIT or ORDER BY expressions in SQL). A naive implementation of top-k is to sort all of the items and then return the first k, but this does much more work than needed. Although efficient implementations for top-k have been explored on traditional multi-core processors, there has been no prior systematic study of top-k implementations on GPUs, despite open requests for such implementations in GPU-based frameworks like TensorFlow and ArrayFire. In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k. The bitonic top-k algorithm is up to a factor of \new15x faster than sort and 4x faster than a variety of other possible implementations for values of k up to 256. We also develop a cost model to predict the performance of several of our algorithms, and show that it accurately predicts actual performance on modern GPUs.

References

[1]

{n. d.}. Arrayfire discussion on top-k. http://bit.ly/2lLuFS1. ({n. d.}).

[2]

{n. d.}. Issue to add gpu verion of top-k to tensorflow. https://github.com/ tensorflow/tensorflow/issues/5719. ({n. d.}).

[3]

Martín Abadi et al. 2016. TensorFlow: A system for large-scale machine learning. In OSDI.

Digital Library

[4]

Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2010. GGKS: Grinnell GPU k-selection. http://code.google.com/p/ggks/. (2010).

[5]

Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2012. Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) (2012).

Digital Library

[6]

Kenneth E Batcher. 1968. Sorting networks and their applications. In Proceedings of the spring joint computer conference.

Digital Library

[7]

Sebastian Breß, Henning Funke, and Jens Teubner. 2016. Robust query processing in co-processor-accelerated databases. In SIGMOD. ACM.

Digital Library

[8]

Jatin Chhugani, Anthony D Nguyen, Victor W Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. 2008. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB (2008).

Digital Library

[9]

Wenbin Fang, Bingsheng He, and Qiong Luo. 2010. Database compression on graphics processors. PVLDB (2010).

Digital Library

[10]

Naga Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. 2006. GPUTeraSort: high performance graphics co-processor sorting for large database management. In SIGMOD.

Digital Library

[11]

Mark Harris. 2007. Optimizing cuda. SC07: High Performance Computing With CUDA (2007).

[12]

Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl. 2013. Hardware-oblivious parallelism for in-memory column-stores. PVLDB (2013).

Digital Library

[13]

Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. CSUR (2008).

Digital Library

[14]

James Malcolm et al. 2012. ArrayFire: a GPU acceleration platform. In SPIE.

[15]

Duane Merrill and Andrew Grimshaw. 2011. High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing. Parallel Processing Letters (2011).

[16]

Todd Mostak. 2013. An overview of MapD (massively parallel database). White paper, Massachusetts Institute of Technology (2013).

[17]

Hagen Peters, Ole Schulz-Hildebrandt, and Norbert Luttenberger. 2010. Fast in-place sorting with cuda based on bitonic sort. Parallel Processing and Applied Mathematics (2010), 403--410.

Digital Library

[18]

Holger Pirk, Stefan Manegold, and Martin Kersten. 2014. Waste not Efficient co-processing of relational data. In ICDE. IEEE.

[19]

Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden. 2016. Voodoo-a vector algebra for portable database performance on modern hardware. PVLDB (2016).

Digital Library

[20]

Eyal Rozenberg and Peter Boncz. 2017. Faster across the PCIe bus: a GPU library for lightweight decompression: including support for patched compression schemes. In DaMoN. ACM.

Digital Library

[21]

Nadathur Satish, Mark Harris, and Michael Garland. 2009. Designing efficient sorting algorithms for manycore GPUs. In Parallel &Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on. IEEE, 1--10.

Digital Library

[22]

Elias Stehle and Hans-Arno Jacobsen. 2017. A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs. In SIGMOD. ACM.

Digital Library

[23]

Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of processing data warehousing queries on GPU devices. PVLDB (2013).

Digital Library

Cited By

Han WVargaftik SMitzenmacher MKarp BBasat R(2024)Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient CompressionProceedings of the 23rd ACM Workshop on Hot Topics in Networks10.1145/3696348.3696857(186-194)Online publication date: 18-Nov-2024
https://dl.acm.org/doi/10.1145/3696348.3696857
Yin DShi JZhang MHuang ZLiu JDong FCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature GridProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680908(11089-11098)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3680908
Li YZhou BZhang JWei XLi YChen Y(2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3650200.3656596
Show More Cited By

Index Terms

Efficient Top-K Query Processing on Massively Parallel Hardware

Recommendations

Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

Recently, Intel Xeon Phi is emerging as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi as a case study for parallel ...
Massively parallel differential evolution--pattern search optimization with graphics hardware acceleration: an investigation on bound constrained optimization problems

This paper presents a novel parallel Differential Evolution (DE) algorithm with local search for solving function optimization problems, utilizing graphics hardware acceleration. As a population-based meta-heuristic, DE was originally designed for ...
Efficient Top-k Query Answering through its Top-N Rewritings Using Views
PIKM '15: Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge Management

Recently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data

May 2018

1874 pages

ISBN:9781450347037

DOI:10.1145/3183713

General Chairs:
Gautam Das
University of Texas at Arlington, USA
,
Christopher Jermaine
Rice University, USA
,
Philip Bernstein
Microsoft Research, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGMOD/PODS '18

Sponsor:

SIGMOD

SIGMOD/PODS '18: International Conference on Management of Data

June 10 - 15, 2018

TX, Houston, USA

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

37
Total Citations
View Citations
1,359
Total Downloads

Downloads (Last 12 months)122
Downloads (Last 6 weeks)17

Reflects downloads up to 05 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Han WVargaftik SMitzenmacher MKarp BBasat R(2024)Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient CompressionProceedings of the 23rd ACM Workshop on Hot Topics in Networks10.1145/3696348.3696857(186-194)Online publication date: 18-Nov-2024
https://dl.acm.org/doi/10.1145/3696348.3696857
Yin DShi JZhang MHuang ZLiu JDong FCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature GridProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680908(11089-11098)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3680908
Li YZhou BZhang JWei XLi YChen Y(2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3650200.3656596
Li YZhou BZhang JWei XLi YChen YLee IChabbi MSteuwer M(2024)POSTER: RadiK: Scalable Radix Top-K Selection on GPUsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638478(472-474)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638478
Ming ZHu YZhou WZheng XYao CFeng DMencagli GDazzi PLowenthal DBadia R(2024)ADTopk: All-Dimension Top-k Compression for High-Performance Data-Parallel DNN TrainingProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658678(135-147)Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3625549.3658678
Su JGan MChen GGuo WChen C(2024)High-Similarity-Pass Attention for Single Image Super-ResolutionIEEE Transactions on Image Processing10.1109/TIP.2023.334829333(610-624)Online publication date: 2024
https://doi.org/10.1109/TIP.2023.3348293
Huang YFan XYan SWeng C(2024)Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00289(3767-3781)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00289
Yoon DOh S(2024)Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid59990.2024.00043(307-316)Online publication date: 6-May-2024
https://doi.org/10.1109/CCGrid59990.2024.00043
Zhang CLi JJiang S(2024)Durable reverse top-k queries on time-varying preferenceWorld Wide Web10.1007/s11280-024-01293-027:5Online publication date: 2-Aug-2024
https://doi.org/10.1007/s11280-024-01293-0
Yang YZhang GWu YZhao ZFu Y(2024)Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUsThe Journal of Supercomputing10.1007/s11227-024-06031-x80:11(15122-15160)Online publication date: 29-Mar-2024
https://doi.org/10.1007/s11227-024-06031-x
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten