[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3183713.3183735acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Efficient Top-K Query Processing on Massively Parallel Hardware

Published: 27 May 2018 Publication History

Abstract

A common operation in many data analytics workloads is to find the top-k items, i.e., the largest or smallest operations according to some sort order (implemented via LIMIT or ORDER BY expressions in SQL). A naive implementation of top-k is to sort all of the items and then return the first k, but this does much more work than needed. Although efficient implementations for top-k have been explored on traditional multi-core processors, there has been no prior systematic study of top-k implementations on GPUs, despite open requests for such implementations in GPU-based frameworks like TensorFlow and ArrayFire. In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k. The bitonic top-k algorithm is up to a factor of \new15x faster than sort and 4x faster than a variety of other possible implementations for values of k up to 256. We also develop a cost model to predict the performance of several of our algorithms, and show that it accurately predicts actual performance on modern GPUs.

References

[1]
{n. d.}. Arrayfire discussion on top-k. http://bit.ly/2lLuFS1. ({n. d.}).
[2]
{n. d.}. Issue to add gpu verion of top-k to tensorflow. https://github.com/ tensorflow/tensorflow/issues/5719. ({n. d.}).
[3]
Martín Abadi et al. 2016. TensorFlow: A system for large-scale machine learning. In OSDI.
[4]
Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2010. GGKS: Grinnell GPU k-selection. http://code.google.com/p/ggks/. (2010).
[5]
Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2012. Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) (2012).
[6]
Kenneth E Batcher. 1968. Sorting networks and their applications. In Proceedings of the spring joint computer conference.
[7]
Sebastian Breß, Henning Funke, and Jens Teubner. 2016. Robust query processing in co-processor-accelerated databases. In SIGMOD. ACM.
[8]
Jatin Chhugani, Anthony D Nguyen, Victor W Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. 2008. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB (2008).
[9]
Wenbin Fang, Bingsheng He, and Qiong Luo. 2010. Database compression on graphics processors. PVLDB (2010).
[10]
Naga Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. 2006. GPUTeraSort: high performance graphics co-processor sorting for large database management. In SIGMOD.
[11]
Mark Harris. 2007. Optimizing cuda. SC07: High Performance Computing With CUDA (2007).
[12]
Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl. 2013. Hardware-oblivious parallelism for in-memory column-stores. PVLDB (2013).
[13]
Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. CSUR (2008).
[14]
James Malcolm et al. 2012. ArrayFire: a GPU acceleration platform. In SPIE.
[15]
Duane Merrill and Andrew Grimshaw. 2011. High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing. Parallel Processing Letters (2011).
[16]
Todd Mostak. 2013. An overview of MapD (massively parallel database). White paper, Massachusetts Institute of Technology (2013).
[17]
Hagen Peters, Ole Schulz-Hildebrandt, and Norbert Luttenberger. 2010. Fast in-place sorting with cuda based on bitonic sort. Parallel Processing and Applied Mathematics (2010), 403--410.
[18]
Holger Pirk, Stefan Manegold, and Martin Kersten. 2014. Waste not Efficient co-processing of relational data. In ICDE. IEEE.
[19]
Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden. 2016. Voodoo-a vector algebra for portable database performance on modern hardware. PVLDB (2016).
[20]
Eyal Rozenberg and Peter Boncz. 2017. Faster across the PCIe bus: a GPU library for lightweight decompression: including support for patched compression schemes. In DaMoN. ACM.
[21]
Nadathur Satish, Mark Harris, and Michael Garland. 2009. Designing efficient sorting algorithms for manycore GPUs. In Parallel &Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on. IEEE, 1--10.
[22]
Elias Stehle and Hans-Arno Jacobsen. 2017. A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs. In SIGMOD. ACM.
[23]
Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of processing data warehousing queries on GPU devices. PVLDB (2013).

Cited By

View all
  • (2024)Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient CompressionProceedings of the 23rd ACM Workshop on Hot Topics in Networks10.1145/3696348.3696857(186-194)Online publication date: 18-Nov-2024
  • (2024)FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature GridProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680908(11089-11098)Online publication date: 28-Oct-2024
  • (2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bitonic top-k
  2. top-k algorithms for gpu

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18
Sponsor:

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)122
  • Downloads (Last 6 weeks)17
Reflects downloads up to 05 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient CompressionProceedings of the 23rd ACM Workshop on Hot Topics in Networks10.1145/3696348.3696857(186-194)Online publication date: 18-Nov-2024
  • (2024)FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature GridProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680908(11089-11098)Online publication date: 28-Oct-2024
  • (2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
  • (2024)POSTER: RadiK: Scalable Radix Top-K Selection on GPUsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638478(472-474)Online publication date: 2-Mar-2024
  • (2024)ADTopk: All-Dimension Top-k Compression for High-Performance Data-Parallel DNN TrainingProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658678(135-147)Online publication date: 3-Jun-2024
  • (2024)High-Similarity-Pass Attention for Single Image Super-ResolutionIEEE Transactions on Image Processing10.1109/TIP.2023.334829333(610-624)Online publication date: 2024
  • (2024)Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00289(3767-3781)Online publication date: 13-May-2024
  • (2024)Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid59990.2024.00043(307-316)Online publication date: 6-May-2024
  • (2024)Durable reverse top-k queries on time-varying preferenceWorld Wide Web10.1007/s11280-024-01293-027:5Online publication date: 2-Aug-2024
  • (2024)Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUsThe Journal of Supercomputing10.1007/s11227-024-06031-x80:11(15122-15160)Online publication date: 29-Mar-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media