Fast In-Place Sorting with CUDA Based on Bitonic Sort

Hagen Peters, Ole Schulz-Hildebrandt & Norbert Luttenberger

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6067)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

State-of-the-art graphics processors provide high processing power, and the programmability offered by frameworks like CUDA makes GPUs increasingly usable as high-performance co-processors for general-purpose computing. Sorting is well investigated in computer science in general, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures.

We present a high-performance in-place implementation of Batcher’s bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort to arbitrary input lengths and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses and thereby greatly increases the performance of the implementation.
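To make the idea of an in-place bitonic sorting network for arbitrary input length concrete, the following is a minimal sequential sketch in Python. It is not the authors' CUDA implementation, which this page does not show. It assumes one common way of handling arbitrary lengths: a network whose comparators all point in the same direction, where any comparator whose partner index lies beyond the end of the array is skipped (equivalent to padding with +infinity). The function name `bitonic_sort_in_place` is hypothetical. Each iteration of an inner `for i` loop is independent of the others in the same step, so on a GPU it could be assigned to its own thread.

```python
def bitonic_sort_in_place(a):
    """Sort list `a` ascending, in place, using a bitonic sorting network.

    Works for any length: comparators that would touch an index >= len(a)
    are skipped, which behaves like padding the input with +infinity.
    """
    n = len(a)
    k = 2  # size of the blocks being merged in the current phase
    while k < 2 * n:  # run phases until k reaches the next power of two >= n
        # First step of the merge: compare each element with its mirror
        # image inside its block of size k (partner = i with low bits flipped).
        for i in range(n):
            partner = i ^ (k - 1)
            if i < partner < n and a[i] > a[partner]:
                a[i], a[partner] = a[partner], a[i]
        # Remaining steps: compare/exchange at distances k/4, k/8, ..., 1.
        j = k // 4
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if i < partner < n and a[i] > a[partner]:
                    a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2


data = [7, 3, 9, 1, 4]  # length 5, not a power of two
bitonic_sort_in_place(data)
print(data)
```

Because every comparator moves the smaller value to the lower index, no extra buffer is needed and the sort stays in place. The sketch says nothing about the paper's actual contribution of mapping compare/exchange operations to threads so that global-memory traffic is reduced.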

Author information

Authors and Affiliations

Research Group for Communication Systems, Department of Computer Science, Christian-Albrechts-University Kiel, Germany
Hagen Peters, Ole Schulz-Hildebrandt & Norbert Luttenberger

Editor information

Editors and Affiliations

Institute of Computational and Information Sciences, Czestochowa University of Technology,
Roman Wyrzykowski
Department of Electrical Engineering and Computer Science, University of Tennessee, TN 37996-3450, Knoxville, USA
Jack Dongarra
Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego 73, PL-42-200, Czestochowa, Poland
Konrad Karczewski
Department of Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, 2800, Kongens Lyngby, Denmark
Jerzy Wasniewski

Cite this paper

Peters, H., Schulz-Hildebrandt, O., Luttenberger, N. (2010). Fast In-Place Sorting with CUDA Based on Bitonic Sort. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_42

DOI: https://doi.org/10.1007/978-3-642-14390-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14389-2
Online ISBN: 978-3-642-14390-8
eBook Packages: Computer Science, Computer Science (R0)
