DOI: 10.1145/237090.237195

Compiler-directed page coloring for multiprocessors

Published: 01 September 1996

Abstract

This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased aggregate cache size available in a multiprocessor. This technique uses the compiler's knowledge of the access patterns of the parallelized applications to direct the operating system's virtual memory page mapping strategy. We demonstrate that this technique can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct-mapped or two-way set-associative caches. We also show that it is complementary to latency-hiding techniques such as prefetching.

We implemented compiler-directed page coloring in the SUIF parallelizing compiler and on two commercial operating systems. We applied the technique to the SPEC95fp benchmark suite, a representative set of numeric programs. We used the SimOS machine simulator to analyze the applications and isolate their performance bottlenecks. We also validated these results on a real machine, an eight-processor 350 MHz Digital AlphaServer. Compiler-directed page coloring leads to significant performance improvements for several applications. Overall, our technique improves the SPEC95fp rating for eight processors by 8% over Digital UNIX's page mapping policy and by 20% over page coloring, a standard page mapping policy. The SUIF compiler achieves a SPEC95fp ratio of 57.4, the highest ratio to date.
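As background for the abstract, the following is a minimal sketch of how a page's "color" is commonly defined for a physically indexed cache, and how a page mapping can be checked for conflicts. It is not the paper's algorithm: the function names and the example cache and page sizes are assumptions chosen for illustration only.

```python
# Illustrative sketch of generic page coloring arithmetic.
# All names and sizes here are hypothetical, not taken from the paper.

def num_colors(cache_bytes, page_bytes, associativity):
    """Number of distinct page colors in a physically indexed cache.

    Pages with the same color compete for the same group of cache sets.
    """
    return cache_bytes // (page_bytes * associativity)


def page_color(physical_page_number, colors):
    """Color of a physical page: its page number modulo the color count."""
    return physical_page_number % colors


def conflicts(physical_pages, colors, associativity):
    """Return {color: page_count} for colors that are oversubscribed.

    A color is oversubscribed when more pages map to it than the cache
    has ways, so those pages can evict one another.
    """
    counts = {}
    for ppn in physical_pages:
        c = page_color(ppn, colors)
        counts[c] = counts.get(c, 0) + 1
    return {c: n for c, n in counts.items() if n > associativity}
```

For example, under the assumed sizes of a 1 MiB direct-mapped cache and 8 KiB pages, `num_colors(1 << 20, 8 << 10, 1)` gives 128 colors. Physical pages 0, 128, and 256 then all have color 0 and conflict with one another, while pages 0, 1, and 2 have distinct colors. A page mapping policy that knows which pages a processor accesses together can use this kind of check to choose non-conflicting physical pages.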



Published In

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN: 0897917677
DOI: 10.1145/237090
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS96

Acceptance Rates

ASPLOS VII paper acceptance rate: 25 of 109 submissions (23%)
Overall acceptance rate: 535 of 2,713 submissions (20%)

Cited By

  • (2023) PinIt: Influencing OS Scheduling via Compiler-Induced Affinities. Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 87-98. DOI: 10.1145/3589610.3596279. Online publication date: 13-Jun-2023
  • (2022) Software-defined address mapping: a case on 3D memory. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 70-83. DOI: 10.1145/3503222.3507774. Online publication date: 28-Feb-2022
  • (2018) Reducing the second-level cache conflict misses using a set folding technique. The Journal of Supercomputing 74:2, pp. 970-993. DOI: 10.1007/s11227-017-2174-8. Online publication date: 1-Feb-2018
  • (2017) Locality-Aware Dynamic Task Graph Scheduling. 2017 46th International Conference on Parallel Processing (ICPP), pp. 70-80. DOI: 10.1109/ICPP.2017.16. Online publication date: Aug-2017
  • (2016) MARACAS: A Real-Time Multicore VCPU Scheduling Framework. 2016 IEEE Real-Time Systems Symposium (RTSS), pp. 179-190. DOI: 10.1109/RTSS.2016.026. Online publication date: Nov-2016
  • (2015) vCache. Proceedings of the 48th International Symposium on Microarchitecture, pp. 623-634. DOI: 10.1145/2830772.2830825. Online publication date: 5-Dec-2015
  • (2015) A Survey on Cache Management Mechanisms for Real-Time Embedded Systems. ACM Computing Surveys 48:2, pp. 1-36. DOI: 10.1145/2830555. Online publication date: 3-Nov-2015
  • (2015) Optimizing off-chip accesses in multicores. ACM SIGPLAN Notices 50:6, pp. 131-142. DOI: 10.1145/2813885.2737989. Online publication date: 3-Jun-2015
  • (2015) Optimizing off-chip accesses in multicores. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 131-142. DOI: 10.1145/2737924.2737989. Online publication date: 3-Jun-2015
  • (2015) Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 296-308. DOI: 10.1109/HPCA.2015.7056041. Online publication date: Feb-2015
