[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/647257.720782guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Efficient Sorting Using Registers and Caches

Published: 05 September 2000 Publication History

Abstract

Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines.
A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-merge, which achieves better performance in practice over algorithms that are theoretically superior under the models. R-merge is designed to minimize memory stall cycles rather than cache misses, considering features common to many system designs.

References

[1]
A. Aggarwal, B. Alpern, A. K. Chandra, and M. Snir. A model for hierarchical memory. Proceedings of the 19th ACM Symposium on Theory of Computation, pages 305-314, 1987.
[2]
A. Aggarwal and J. S. Vitter. The Input/Output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, 1988.
[3]
J. Anderson, L. Berc, J. Dean, S. Ghemawat, M. Henzinger, S.-T. Leung, M. Vandevoorde, C. Waldspurger, and B. Weihl. Continuous profiling: Where have all the cycles gone? In Proceedings of the Sixteenth ACM Symposium on Operating System Principles (SOSP), October 1997.
[4]
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation, White Plains, NY, June 1990.
[5]
DEC. Programmer's Guide. Digital Unix Documentation Library. ATOM toolkit reference.
[6]
J. H. Edmondson, P. I. Rubinfeld, P. J. Bannon, B. J. Benschneider, D. Bernstein, R. W. Castelino, E. M. Cooper, D. E. Dever, D. R. Donchin, T. C. Fischer, A. K. Jain, S. Mehta, J. E. Meyer, R. P. Preston, V. Rajagopalan, C. Somanathan, S. A. Taylor, and G. M. Wolrich. Internal organization of the Alpha 21164, a 300- MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, 7(1):119-135, 1995.
[7]
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2 edition, 1995.
[8]
D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, Reading MA, second edition, 1998.
[9]
R. Ladner, J. Fix, and A. LaMarca. Cache performance analysis of traversals and random accesses. In Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1999.
[10]
M. S. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991.
[11]
A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997.
[12]
N. Rahman and R. Raman. Analysing cache effects in distribution sorting. In 3rd Workshop on Algorithm Engineering, 1999.
[13]
N. Rahman and R. Raman. Adapting radix sort to the memory hierarchy. In ALENEX, Workshop on Algorithm Engineering and Expermentation, 2000.
[14]
P. Sanders. Accessing multiple sequences through set associative caches. ICALP, 1999.
[15]
P. Sanders. Fast priority queues for cached memory. In ALENEX, Workshop on Algorithm Engineering and Expermentation. Max-Plank-Institut für Informatik, 1999.
[16]
S. Sen and S. Chatterjee. Towards a theory of cache-efficient algorithms. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2000.

Cited By

View all
  • (2002)An Out-of-Core Sorting Algorithm for Clusters with Processors at Different SpeedProceedings of the 16th International Parallel and Distributed Processing Symposium10.5555/645610.661390Online publication date: 15-Apr-2002
  • (2002)Efficient sorting using registers and cachesACM Journal of Experimental Algorithmics10.1145/944618.9446277(9)Online publication date: 31-Dec-2002
  1. Efficient Sorting Using Registers and Caches

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    WAE '00: Proceedings of the 4th International Workshop on Algorithm Engineering
    September 2000
    242 pages
    ISBN:3540425128

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Publication History

    Published: 05 September 2000

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 28 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2002)An Out-of-Core Sorting Algorithm for Clusters with Processors at Different SpeedProceedings of the 16th International Parallel and Distributed Processing Symposium10.5555/645610.661390Online publication date: 15-Apr-2002
    • (2002)Efficient sorting using registers and cachesACM Journal of Experimental Algorithmics10.1145/944618.9446277(9)Online publication date: 31-Dec-2002

    View Options

    View options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media