[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Analyzing and enhancing the parallel sort operation on multithreaded architectures

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

The sort operation is a core part of many critical applications (e.g., database management systems). Despite the large efforts to parallelize it, the fact that it suffers from high data-dependencies vastly limits its performance. Multithreaded architectures are emerging as the most demanding technology in leading-edge processors. These architectures include simultaneous multithreading, chip multiprocessors, and machines combining different multithreading technologies. In this paper, we analyze the memory behavior and improve the performance of the most recent parallel radix and quick integer sort algorithms on modern multithreaded architectures. We achieve speedups up to 4.69× for radix sort and up to 4.17× for quicksort on a machine with 4 multithreaded processors compared to single threaded versions, respectively. We find that since radix sort is CPU-intensive, it exhibits better results on chip multiprocessors where multiple CPUs are available. While quicksort is accomplishing speedups on all types of multithreading processers due to its ability to overlap memory miss latencies with other useful processing.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Brodal G, Fagerberg R, Moruz G (2008) On the adaptiveness of quicksort. J Exp Algorithm (JEA) 12:Article 3.2

    MathSciNet  Google Scholar 

  2. Chen J, Juang P, Ko K, Contreras G, Penry D, Rangan R, Stoler A, Peh L, Martonosi M (2005) Hardware-modulated parallelism in chip multiprocessors. ACM SIGARCH Comput Archit News Arch 33(4):54–63

    Article  Google Scholar 

  3. Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms, 2nd edn. MIT Press and McGraw-Hill, Cambridge, New York. ISBN 0-262-03293-7. Section 8.4: Bucket Sort, pp 174–177

    MATH  Google Scholar 

  4. DeWitt D, Naughton J, Schneider D (1992) Parallel Sorting on Shared Nothing Architectures Using Probabilistic Splitting. In: Proceedings of the 1st Intel conference on parallel and distributed info systems, 1992, pp 280–291

  5. Graefe G (2006) Implementing sorting in database systems. ACM Comput Surv (CSUR) 38(3)

  6. Hammond L, Nayfeh B, Olukotun K (1997) A single-chip multiprocessor. IEEE Comput 30(9):79–85

    Google Scholar 

  7. Intel® (2009) VTune Performance Analyzer for Linux. URL: http://www.intel.com/software/products/vtune/

  8. Intel C++ Compiler for Linux (2009) URL: http://www.intel.com/cd/software/products/asmo-na/eng/compilers/277618.htm

  9. Intel® Core 2 Duo (2009) URL: http://www.intel.com/products/processor/core2duo/index.htm

  10. Intel Hyper-Threading Technology (2002) URL: http://www.intel.com/technology/itj/2002/volume06issue01/vol6iss1_hyper_threading_technology.pdf

  11. Jiménez-González D, Navarro JJ, Larriba-Pey J (1999) Communication and cache conscious radix sort. In: Proceedings of the international conference on supercomputing, 1999, pp 76–83

  12. Jiménez-González D, Navarro JJ, Larriba-Pey J (2001) Fast Parallel in-memory 64-bit Sorting. In: Proceedings of the 15th ACM international conference on supercomputing (ICS), 2001, pp 114–122

  13. Jiménez-González D, Navarro JJ, Larriba-Pey J (2003) CC-Radix: a cache conscious sorting based on radix sort. In: Proceedings of the 11th Euromicro conference on parallel distributed and network-based processing (PDP), 2003, pp 101–108

  14. Knuth D (1997) The art of computer programming, vol 3: sorting and searching, 3rd edn. Addison-Wesley, Reading

    MATH  Google Scholar 

  15. LaMarca A, Ladner R (1997) The influence of caches on the performance of sorting. In: Proceeding of the ACM/SIAM symposium on discrete algorithms, 1997, pp 370–379

  16. Larriba-Pey JL, Jimenez D, Navarro J (1997) An analysis of superscalar Sorting Algorithms on an R8000 Processor. In: Proceedings of the 17th international conference of the Chilean Computer Science Society (SCCC), 1997, pp 125–134

  17. Lee S, Jeon M, Kim D, Sohn A (2002) Partition parallel radix sort. J Parallel Distrib Comput 656–668

  18. Marr DT, Binns F, Hill DL, Hinton G, Koufaty DA, Miller JA, Upton M (2002) Hyper-threading technology architecture and microarchitecture. Intel Technol J (Q1):4–15

  19. OpenMP® (2009) URL: http://www.openmp.org/

  20. Rahman N, Raman R (2000) Adapting radix sort to the memory hierarchy. In: Proceedings of the 2nd workshop on algorithm engineering and experiments (ALENEX), 2000, pp 131–146

  21. Rahman N, Raman R (2000) Analysing the cache behaviour of non-uniform distribution sorting algorithms. In: Proceedings of the European symposium on algorithms (ESA), 2000, pp 380–391

  22. Sedgewick R (1978) Implementing quicksort programs. Commun ACM 21:847–857

    Article  MATH  Google Scholar 

  23. Sohn A, Kodama Y (1998) Load balanced parallel radix sort. In: Proceeding of the international conference of supercomputing, 1998, pp 305–312

  24. Tsigas P, Zhang Yi (2002) Parallel quicksort seems to outperform sample sort on cache-coherent shared memory multiprocessors: an evaluation on SUN ENTERPRISE 10000. Technical Report 2002-03, Department of Computer Science, Chalmers University of Technology

  25. Tsigas P, Zhang Yi (2003) A Simple, Fast Parallel Implementation of Quicksort and its Performance Evaluation on Sun Enterprise 10000. In: Proceedings of the 11th EUROMICRO conference on parallel distributed and network-based processing (PDP), 2003, pp 372–381

  26. Tullsen D, Eggers S, Levy H (1995) Simultaneous multithreading: maximizing on-chip parallelism. In: Proceedings of the 22nd annual international symposium on computer architecture, (ISCA), 1995

  27. Xiao L, Zhang X, Kubricht SA (2000) Improving memory performance of sorting algorithms. ACM J Exp Algorithm 5(3):1–22

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wessam M. Hassanein.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rashid, L., Hassanein, W.M. & Hammad, M.A. Analyzing and enhancing the parallel sort operation on multithreaded architectures. J Supercomput 53, 293–312 (2010). https://doi.org/10.1007/s11227-009-0294-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-009-0294-5

Keywords

Navigation