
Evaluation of NUMA Memory Management Through Modeling and Measurements

Published: 01 November 1992

Abstract

Dynamic page placement policies for NUMA (nonuniform memory access time) shared-memory architectures are explored using two approaches that complement each other in important ways. The authors measure the performance of parallel programs running on the experimental DUnX operating system kernel for the BBN GP1000, which supports a highly parameterized dynamic page placement policy. They also develop and apply an analytic model of memory system performance of a local/remote NUMA architecture based on approximate mean-value analysis techniques. The model is validated against experimental data obtained with DUnX while running a synthetic workload. The results of this validation show that, in general, model predictions are quite good. Experiments investigating the effectiveness of dynamic page-placement and, in particular, dynamic multiple-copy page placement, the cost of replication/coherency fault errors, and the cost of errors in deciding whether a page should move or be remotely referenced are described.
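The abstract names approximate mean-value analysis as the basis of the analytic model but gives no equations. As a rough illustration of that family of techniques only, the sketch below uses Schweitzer's fixed-point approximation for a single-class closed queueing network. It is not the authors' model: the function name `approximate_mva`, the choice of stations (for instance a local and a remote memory module), and every numeric value are assumptions made for this example.

```python
def approximate_mva(demands, n_customers, think_time=0.0, tol=1e-9, max_iter=10_000):
    """Schweitzer approximate MVA for a single-class closed queueing network.

    demands      -- service demand per station (hypothetical values, e.g.
                    time spent at a local and at a remote memory module)
    n_customers  -- number of circulating customers (e.g. processors)
    think_time   -- time spent outside the stations (e.g. computing)
    Returns (throughput, mean queue length per station).
    """
    if n_customers < 1 or not demands:
        raise ValueError("need at least one customer and one station")

    k = len(demands)
    queue = [n_customers / k] * k              # start from an even spread
    scale = (n_customers - 1) / n_customers    # Schweitzer's arrival estimate

    for _ in range(max_iter):
        # Residence time: own service plus the queue seen on arrival.
        residence = [d * (1.0 + scale * q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole network.
        throughput = n_customers / (think_time + sum(residence))
        # Little's law applied to each station.
        new_queue = [throughput * r for r in residence]
        if max(abs(a - b) for a, b in zip(new_queue, queue)) < tol:
            return throughput, new_queue
        queue = new_queue

    raise RuntimeError("fixed-point iteration did not converge")
```

With a single customer the approximation reduces to the exact result, since no queueing can occur; for larger populations it trades some accuracy for a cost that does not grow with the population size.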

Cited By

  • (2024) Toast: A Heterogeneous Memory Management System. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 53-65. doi:10.1145/3656019.3676944. Online publication date: 14-Oct-2024
  • (2021) Improving GHC Haskell NUMA profiling. Proceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing, pp. 1-12. doi:10.1145/3471873.3472974. Online publication date: 22-Aug-2021
  • (2016) Investigating the Performance of Hardware Transactions on a Multi-Socket Machine. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 121-132. doi:10.1145/2935764.2935796. Online publication date: 11-Jul-2016