[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
Simulation of multiprocessors: accuracy and performance
Publisher:
  • Stanford University
  • 408 Panama Mall, Suite 217
  • Stanford
  • CA
  • United States
Order Number:UMI Order No. GAX93-26475
Reflects downloads up to 17 Jan 2025Bibliometrics
Skip Abstract Section
Abstract

Simulation plays an important role in analyzing the performance of multiprocessor memory systems, but detailed simulation can consume enormous amounts of CPU time. This dissertation investigates three techniques that can reduce the time required to simulate a multiprocessor memory system: trace-driven simulation, simplifying approximations in memory-system simulators, and parallel simulation.

Trace-driven simulation allows simulation of multiple target systems from a single interpretation of a workload, recorded in a trace. It is often assumed that the trace is system-independent, but this assumption is violated by parallel workloads that have race-conditions or utilize dynamic scheduling. This dissertation measures and analyzes the impact of system-dependent traces on the accuracy of trace-driven simulations of multiprocessors.

Two such simplifying approximations for simulations of multiprocessor memory systems were investigated. Precise simulation requires that events originating from different parts of the system be simulated in the correct order. The investigation revealed that relaxing this constraint can double or triple simulator performance with little loss of accuracy. Precise simulation also requires simulating references to private data, which have little effect on performance. By neglecting private references, simulation was accelerated by up to 80 percent with little loss of accuracy.

To further increase speed, an existing simulator was parallelized to run on the DASH multiprocessor. The challenge was to preserve accuracy without serializing the simulation. Two approaches were evaluated: (1) ordering synchronization operations only and (2) relaxing the order of all events by a small amount. Both approaches yielded parallel speedups, but performance was limited by load imbalance. Overall, performance was comparable to the performance of hardware-assisted parallel simulators such as the Wisconsin Wind Tunnel.

Cited By

  1. Petit S, Sahuquillo J and Pont A Characterizing parallel workloads to reduce multiple writer overhead in shared virtual memory systems Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (261-268)
  2. ACM
    Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J and Heinrich M (2000). FLASH vs. (Simulated) FLASH, ACM SIGOPS Operating Systems Review, 34:5, (49-58), Online publication date: 1-Dec-2000.
  3. ACM
    Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J and Heinrich M (2000). FLASH vs. (Simulated) FLASH, ACM SIGARCH Computer Architecture News, 28:5, (49-58), Online publication date: 1-Dec-2000.
  4. ACM
    Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J and Heinrich M FLASH vs. (Simulated) FLASH Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, (49-58)
  5. ACM
    Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J and Heinrich M (2000). FLASH vs. (simulated) FLASH, ACM SIGPLAN Notices, 35:11, (49-58), Online publication date: 1-Nov-2000.
  6. Zhang Z, Cintra M and Torrellas J (1999). Excel-NUMA, IEEE Transactions on Computers, 48:2, (256-264), Online publication date: 1-Feb-1999.
  7. ACM
    Koufaty D and Torrellas J Comparing data forwarding and prefetching for communication-induced misses in shared-memory MPs Proceedings of the 12th international conference on Supercomputing, (53-60)
  8. ACM
    Olukotun K, Heinrich M and Ofelt D Digital system simulation Proceedings of the 35th annual Design Automation Conference, (658-663)
  9. Magnusson P Efficient instruction cache simulation and execution profiling with a threaded-code interpreter Proceedings of the 29th conference on Winter simulation, (1093-1100)
  10. ACM
    Richard J and Singh J Parallel hierarchical computation of specular radiosity Proceedings of the IEEE symposium on Parallel rendering, (59-69)
  11. Choi J and Park K Hybrid Full Map Directory Scheme for Distributed Shared Memory Multiprocessors Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  12. Torrie E, Martonosi M, Tseng C and Hall M (1996). Characterizing the Memory Behavior of Compiler-Parallelized Applications, IEEE Transactions on Parallel and Distributed Systems, 7:12, (1224-1237), Online publication date: 1-Dec-1996.
  13. Kozlov A and Singh J (2019). Parallel Implementations of Probabilistic Inference, Computer, 29:12, (33-40), Online publication date: 1-Dec-1996.
  14. ACM
    Legedza U and Weihl W (1996). Reducing synchronization overhead in parallel simulation, ACM SIGSIM Simulation Digest, 26:1, (86-95), Online publication date: 1-Jul-1996.
  15. Legedza U and Weihl W Reducing synchronization overhead in parallel simulation Proceedings of the tenth workshop on Parallel and distributed simulation, (86-95)
  16. ACM
    Martonosi M, Ofelt D and Heinrich M Integrating performance monitoring and communication in parallel computers Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (138-147)
  17. ACM
    Martonosi M, Ofelt D and Heinrich M (1996). Integrating performance monitoring and communication in parallel computers, ACM SIGMETRICS Performance Evaluation Review, 24:1, (138-147), Online publication date: 15-May-1996.
  18. ACM
    Holt C, Singh J and Hennessy J Application and architectural bottlenecks in large scale distributed shared memory machines Proceedings of the 23rd annual international symposium on Computer architecture, (134-145)
  19. ACM
    Holt C, Singh J and Hennessy J (1996). Application and architectural bottlenecks in large scale distributed shared memory machines, ACM SIGARCH Computer Architecture News, 24:2, (134-145), Online publication date: 1-May-1996.
  20. Park D and Saavedra R Trojan Proceedings of the 29th Annual Simulation Symposium (SS '96)
  21. Raynaud A, Zhang Z and Torrellas J Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
  22. Rajamony R and Cox A A Performance Debugger for Eliminating Excess Synchronization in Shared-Memory Parallel Programs Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
  23. ACM
    Martonosi M, Clark D and Mesarina M The SHRIMP performance monitor Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, (61-69)
  24. ACM
    Erlichson A, Nayfeh B, Singh J and Olukotun K The benefits of clustering in shared address space multiprocessors Proceedings of the 1995 ACM/IEEE conference on Supercomputing, (60-es)
  25. ACM
    Zhang Z and Torrellas J Speeding up irregular applications in shared-memory multiprocessors Proceedings of the 22nd annual international symposium on Computer architecture, (188-199)
  26. ACM
    Woo S, Ohara M, Torrie E, Singh J and Gupta A The SPLASH-2 programs Proceedings of the 22nd annual international symposium on Computer architecture, (24-36)
  27. Torrie E, Tseng C, Martonosi M and Hall M Evaluating the impact of advanced memory systems on compiler-parallelized codes Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, (204-213)
  28. ACM
    Zhang Z and Torrellas J (1995). Speeding up irregular applications in shared-memory multiprocessors, ACM SIGARCH Computer Architecture News, 23:2, (188-199), Online publication date: 1-May-1995.
  29. ACM
    Woo S, Ohara M, Torrie E, Singh J and Gupta A (1995). The SPLASH-2 programs, ACM SIGARCH Computer Architecture News, 23:2, (24-36), Online publication date: 1-May-1995.
  30. ACM
    Bedichek R Talisman Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, (14-24)
  31. ACM
    Bedichek R (1995). Talisman, ACM SIGMETRICS Performance Evaluation Review, 23:1, (14-24), Online publication date: 1-May-1995.
  32. ACM
    Heinrich M, Kuskin J, Ofelt D, Heinlein J, Baxter J, Singh J, Simoni R, Gharachorloo K, Nakahira D, Horowitz M, Gupta A, Rosenblum M and Hennessy J (1994). The performance impact of flexibility in the Stanford FLASH multiprocessor, ACM SIGOPS Operating Systems Review, 28:5, (274-285), Online publication date: 1-Dec-1994.
  33. ACM
    Woo S, Singh J and Hennessy J (1994). The performance advantages of integrating block data transfer in cache-coherent multiprocessors, ACM SIGOPS Operating Systems Review, 28:5, (219-229), Online publication date: 1-Dec-1994.
  34. ACM
    Heinlein J, Gharachorloo K, Dresser S and Gupta A (1994). Integration of message passing and shared memory in the Stanford FLASH multiprocessor, ACM SIGOPS Operating Systems Review, 28:5, (38-50), Online publication date: 1-Dec-1994.
  35. ACM
    Heinrich M, Kuskin J, Ofelt D, Heinlein J, Baxter J, Singh J, Simoni R, Gharachorloo K, Nakahira D, Horowitz M, Gupta A, Rosenblum M and Hennessy J The performance impact of flexibility in the Stanford FLASH multiprocessor Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, (274-285)
  36. ACM
    Woo S, Singh J and Hennessy J The performance advantages of integrating block data transfer in cache-coherent multiprocessors Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, (219-229)
  37. ACM
    Heinlein J, Gharachorloo K, Dresser S and Gupta A Integration of message passing and shared memory in the Stanford FLASH multiprocessor Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, (38-50)
  38. ACM
    Heinrich M, Kuskin J, Ofelt D, Heinlein J, Baxter J, Singh J, Simoni R, Gharachorloo K, Nakahira D, Horowitz M, Gupta A, Rosenblum M and Hennessy J (1994). The performance impact of flexibility in the Stanford FLASH multiprocessor, ACM SIGPLAN Notices, 29:11, (274-285), Online publication date: 1-Nov-1994.
  39. ACM
    Woo S, Singh J and Hennessy J (1994). The performance advantages of integrating block data transfer in cache-coherent multiprocessors, ACM SIGPLAN Notices, 29:11, (219-229), Online publication date: 1-Nov-1994.
  40. ACM
    Heinlein J, Gharachorloo K, Dresser S and Gupta A (1994). Integration of message passing and shared memory in the Stanford FLASH multiprocessor, ACM SIGPLAN Notices, 29:11, (38-50), Online publication date: 1-Nov-1994.
  41. Kuskin J, Ofelt D, Heinrich M, Heinlein J, Simoni R, Gharachorloo K, Chapin J, Nakahira D, Baxter J, Horowitz M, Gupta A, Rosenblum M and Hennessy J The Stanford FLASH multiprocessor Proceedings of the 21st annual international symposium on Computer architecture, (302-313)
  42. Nayfeh B and Olukotun K Exploring the design space for a shared-cache multiprocessor Proceedings of the 21st annual international symposium on Computer architecture, (166-175)
  43. ACM
    Kuskin J, Ofelt D, Heinrich M, Heinlein J, Simoni R, Gharachorloo K, Chapin J, Nakahira D, Baxter J, Horowitz M, Gupta A, Rosenblum M and Hennessy J (1994). The Stanford FLASH multiprocessor, ACM SIGARCH Computer Architecture News, 22:2, (302-313), Online publication date: 1-Apr-1994.
  44. ACM
    Nayfeh B and Olukotun K (1994). Exploring the design space for a shared-cache multiprocessor, ACM SIGARCH Computer Architecture News, 22:2, (166-175), Online publication date: 1-Apr-1994.
  45. ACM
    Kuskin J, Ofelt D, Heinrich M, Heinlein J, Simoni R, Gharachorloo K, Chapin J, Nakahira D, Baxter J, Horowitz M, Gupta A, Rosenblum M and Hennessy J (1994). The Stanford FLASH multiprocessor, ACM SIGARCH Computer Architecture News, 22:2, (302-313), Online publication date: 1-Apr-1994.
Contributors
  • Stanford University
Please enable JavaScript to view thecomments powered by Disqus.

Recommendations