research-article

Evaluation of NUMA Memory Management Through Modeling and Measurements

Authors:

R. P. LaRowe, Jr.,

M. A. Holliday

IEEE Transactions on Parallel and Distributed Systems, Volume 3, Issue 6

Pages 686 - 701

https://doi.org/10.1109/71.180624

Published: 01 November 1992

Abstract

Dynamic page placement policies for NUMA (nonuniform memory access time) shared-memory architectures are explored using two approaches that complement each other in important ways. The authors measure the performance of parallel programs running on the experimental DUnX operating system kernel for the BBN GP1000, which supports a highly parameterized dynamic page placement policy. They also develop and apply an analytic model of memory system performance of a local/remote NUMA architecture based on approximate mean-value analysis techniques. The model is validated against experimental data obtained with DUnX while running a synthetic workload. The results of this validation show that, in general, model predictions are quite good. Experiments investigating the effectiveness of dynamic page placement and, in particular, dynamic multiple-copy page placement, the cost of replication/coherency fault errors, and the cost of errors in deciding whether a page should move or be remotely referenced are described.

Published In

IEEE Transactions on Parallel and Distributed Systems Volume 3, Issue 6

November 1992

128 pages

ISSN:1045-9219

Copyright © 1992 IEEE. All Rights Reserved.

Publisher

IEEE Press

Cited By

Bailleu M, Stavrakakis D, Rocha R, Chakraborty S, Garg D, Bhatotia P (2024). Toast: A Heterogeneous Memory Management System. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 53-65. doi:10.1145/3656019.3676944. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3676944
MacGregor R, Trinder P, Loidl H, Keller G, Henriksen T (2021). Improving GHC Haskell NUMA profiling. Proceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing, pp. 1-12. doi:10.1145/3471873.3472974. Online publication date: 22-Aug-2021.
https://dl.acm.org/doi/10.1145/3471873.3472974
Brown T, Kogan A, Lev Y, Luchangco V, Scheideler C, Gilbert S (2016). Investigating the Performance of Hardware Transactions on a Multi-Socket Machine. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 121-132. doi:10.1145/2935764.2935796. Online publication date: 11-Jul-2016.
https://dl.acm.org/doi/10.1145/2935764.2935796
Agarwal N, Nellans D, Stephenson M, O'Connor M, Keckler S (2015). Page Placement Strategies for GPUs within Heterogeneous Memory Systems. ACM SIGARCH Computer Architecture News 43:1, pp. 607-618. doi:10.1145/2786763.2694381. Online publication date: 14-Mar-2015.
https://dl.acm.org/doi/10.1145/2786763.2694381
Agarwal N, Nellans D, Stephenson M, O'Connor M, Keckler S (2015). Page Placement Strategies for GPUs within Heterogeneous Memory Systems. ACM SIGPLAN Notices 50:4, pp. 607-618. doi:10.1145/2775054.2694381. Online publication date: 14-Mar-2015.
https://dl.acm.org/doi/10.1145/2775054.2694381
Agarwal N, Nellans D, Stephenson M, O'Connor M, Keckler S, Ozturk O, Ebcioglu K, Dwarkadas S (2015). Page Placement Strategies for GPUs within Heterogeneous Memory Systems. Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 607-618. doi:10.1145/2694344.2694381. Online publication date: 14-Mar-2015.
https://dl.acm.org/doi/10.1145/2694344.2694381
Pusukuri K, Gupta R, Bhuyan L, Amaral J, Torrellas J (2014). Shuffling. Proceedings of the 23rd international conference on Parallel architectures and compilation, pp. 289-300. doi:10.1145/2628071.2628074. Online publication date: 24-Aug-2014.
https://dl.acm.org/doi/10.1145/2628071.2628074
Fu W, Chen T, Wang C, Liu L (2014). Optimizing memory access traffic via runtime thread migration for on-chip distributed memory systems. The Journal of Supercomputing 69:3, pp. 1491-1516. doi:10.1007/s11227-014-1240-8. Online publication date: 1-Sep-2014.
https://dl.acm.org/doi/10.1007/s11227-014-1240-8
Dashti M, Fedorova A, Funston J, Gaud F, Lachaize R, Lepers B, Quema V, Roth M (2013). Traffic management. ACM SIGPLAN Notices 48:4, pp. 381-394. doi:10.1145/2499368.2451157. Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2499368.2451157
Dashti M, Fedorova A, Funston J, Gaud F, Lachaize R, Lepers B, Quema V, Roth M (2013). Traffic management. ACM SIGARCH Computer Architecture News 41:1, pp. 381-394. doi:10.1145/2490301.2451157. Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2490301.2451157
