[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Efficient sorting using registers and caches

Published: 31 December 2002 Publication History

Abstract

Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines.A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-MERGE, which achieves better performance in practice over algorithms that are superior in the theoretical models. R-MERGE is designed to minimize memory stall cycles rather than cache misses by considering features common to many system designs.

Supplementary Material

PS File (p9-wickremesinghe.ps)
TAR File (p9-wickremesinghe.tex.tar)

References

[1]
AGGARWAL, A., ALPERN, B., CHANDRA, A. K., AND SNIR, M. 1987. A model for hierarchical memory. Volume 19 (New York, 1987), pp. 305-314.]]
[2]
AGGARWAL, A., CHANDRA, A., AND SNIR, M. 1987. Hierarchical memory with block transfer. Volume 28 (Los Angeles, 1987), pp. 204-216.]]
[3]
AGGARWAL, A. AND VITTER, J.S. 1988. The Input/Output complexity of sorting and related problems. Commun. ACM 31, 9, 1116-1127.]]
[4]
ALPERN, B., CARTER, L., FEIG, E., AND SELKER, T. 1994. The uniform memory hierarchy model of computation. 12, 2-3, 72-109.]]
[5]
ANDERSON, J. M., BERC, L. M., DEAN, J., GHEMAWAT, S., HENZINGER, M. R., LEUNG, S. A., SITES, R. L., VANDERVOORDE, M. T., WALDSPURGER, C. A., AND WEIHL, W. E. 1997. Continuous profiling: Where have all the cycles gone? In Proceedings of the 16th Symposium on Operating Systems Principles (SOSP-97), Volume 31,5 of Operating Systems Review (New York, Oct. 5-8 1997), pp. 1-14. ACM Press.]]
[6]
ARGE, L., CHASE, J., VITTER, J. S., AND WICKREMESINGHE, R. 2000. Efficient sorting using registers and caches. In WAE, Workshop on Algorithm Engineering, Lecture Notes in Computer Science (Sept. 2000). Springer. Journal version to appear in the ACM Journal of Experimental Algorithmics.]]
[7]
CALLAHAN, D., CARR, S., AND KENNEDY, K. 1990. Improving register allocation for subscripted variables. In Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation (White Plains, NY, June 1990).]]
[8]
EDMONDSON, J. H., RUBINFELD, P. I., BANNON, P. J., BENSCHNEIDER, B. J., BERNSTEIN, D., CASTELINO, R. W., COOPER, E. M., DEVER, D. E., DONCHIN, D. R., FISCHER, T. C., JAIN, A. K., MEHTA, S., MEYER, J. E., PRESTON, R. P., RAJAGOPALAN, V., SOMANATHAN, C., TAYLOR, S. A., AND WOLRICH, G.M. 1995. Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal 7, 1, 119-135.]]
[9]
FRIGO, LEISERSON, PROKOP, AND RAMACHANDRAN. 1999. Cache-oblivious algorithms. In FOCS: IEEE Symposium on Foundations of Computer Science (FOCS) (1999).]]
[10]
HENNESSY, J. L. AND PATTERSON, D. A. 1995. Computer Architecture: A Quantitative Approach (second ed.). Morgan Kaufmann.]]
[11]
KNUTH, D. E. 1998. Sorting and Searching (2nd ed.), Volume 3 of The Art of Computer Programming. Addison-Wesley, Reading, MA.]]
[12]
LADNER, R., FIX, J., AND LAMARCA, A. 1999. Cache performance analysis of traversals and random accesses. In Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms (1999).]]
[13]
LAM, M. S., ROTHBERG, E. E., AND WOLF, M. E. 1991. The cache performance and optimizations of blocked algorithms. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (April 1991).]]
[14]
LAMARCA, A. AND LADNER, R. E. 1997. The influence of caches on the performance of sorting. Volume 7 (January 1997), pp. 370-379.]]
[15]
RAHMAN, N. AND RAMAN, R. 1999. Analysing cache effects in distribution sorting. In 3rd Workshop on Algorithm Engineering (July 1999).]]
[16]
RAHMAN, N. AND RAMAN, R. 2000. Adapting radix sort to the memory hierarchy. In ALENEX, Workshop on Algorithm Engineering and Experimentation (2000).]]
[17]
RANADE, A., KOTHARI, S., AND UDUPA, R. 2000. Register efficient mergesorting. In International Conference on High Performance Computing (2000).]]
[18]
SANDERS, P. 1999a. Accessing multiple sequences through set associative caches. In J. WIEDERMANN, P. VAN EMDE BOAS, AND M. NIELSEN Eds., Automata, Languages and Programruing, 26th International Colloquium, ICALP'99, Prague, Czech Republic, July 11-15, 1999, Proceedings, Volume 1644 of Lecture Notes in Computer Science (1999). Springer.]]
[19]
SANDERS, P. 1999b. Fast priority queues for cached memory. In ALENEX, Workshop on Algorithm Engineering and Expermentation, Lecture Notes in Computer Science (1999).]]
[20]
SEN, S. AND CHATTERJEE, S. 2000. Towards a theory of cache-efficient algorithms. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (2000).]]
[21]
SRIVASTAVA, A. AND EUSTACE, A. 1994. ATOM: A system for building customized program analysis tools. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation (June 1994), pp. 196-205.]]
[22]
VITTER, J. S. AND SHRIVER, E.A.M. 1994. Algorithms for parallel memory I: Two-level memories. 12, 2-3, 110-147.]]

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics
ACM Journal of Experimental Algorithmics  Volume 7, Issue
2002
218 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/944618
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 December 2002
Published in JEA Volume 7

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2019)Complexity analysis and performance of double hashing sort algorithmJournal of the Egyptian Mathematical Society10.1186/s42787-019-0004-227:1Online publication date: 4-Apr-2019
  • (2016)Analyzing Cache MissesEncyclopedia of Algorithms10.1007/978-1-4939-2864-4_14(82-86)Online publication date: 22-Apr-2016
  • (2015)Analyzing Cache MissesEncyclopedia of Algorithms10.1007/978-3-642-27848-8_14-2(1-7)Online publication date: 18-Apr-2015
  • (2014)Patience is a virtueProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2593662(731-742)Online publication date: 18-Jun-2014
  • (2014)Analytical & empirical analysis of external sorting algorithms2014 International Conference on Data Mining and Intelligent Computing (ICDMIC)10.1109/ICDMIC.2014.6954224(1-6)Online publication date: Sep-2014
  • (2014)Fast sorting for exact OIT of complex scenesThe Visual Computer: International Journal of Computer Graphics10.1007/s00371-014-0956-z30:6-8(603-613)Online publication date: 1-Jun-2014
  • (2013)Toward a Theory of Algorithm-Architecture Co-designHigh Performance Computing for Computational Science - VECPAR 201210.1007/978-3-642-38718-0_2(4-8)Online publication date: 2013
  • (2010)Sorting and order statisticsAlgorithms and theory of computation handbook10.5555/1882757.1882760(3-3)Online publication date: 1-Feb-2010
  • (2009)A Partition-Merge Based Cache-Conscious Parallel Sorting Algorithm for CMP with Shared CacheProceedings of the 2009 International Conference on Parallel Processing10.1109/ICPP.2009.26(396-403)Online publication date: 22-Sep-2009
  • (2008)An experimental study of sorting and branch predictionACM Journal of Experimental Algorithmics10.1145/1227161.137059912(1-39)Online publication date: 12-Jun-2008
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media