SM-prof: a tool to visualise and find cache coherence performance bottlenecks in multiprocessor programs

Published: 01 May 1995

Abstract

Cache misses due to coherence actions are often the major source for performance degradation in cache coherent multiprocessors. It is often difficult for the programmer to take cache coherence into account when writing the program since the resulting access pattern is not apparent until the program is executed.

SM-prof is a performance analysis tool that addresses this problem by visualising the shared data access pattern in a diagram with links to the source code lines causing performance degrading access patterns. The execution of a program is divided into time slots and each data block is classified based on the accesses made to the block during a time slot. This enables the programmer to follow the execution over time and it is possible to track the exact position responsible for accesses causing many cache misses related to coherence actions.

Matrix multiplication and the MP3D application from SPLASH are used to illustrate the use of SM-prof. For MP3D, SM-prof revealed performance limitations that resulted in a performance improvement of over 75%.

The current implementation is based on program-driven simulation in order to achieve non-intrusive profiling. If a small perturbation of the program execution is acceptable, it is also possible to use software tracing techniques given that a data address can be related to the originating instruction.
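The time-slot classification described in the abstract can be illustrated with a minimal Python sketch. It is not SM-prof's implementation: the trace tuple layout, the function name `classify_blocks`, the default `block_size`, and the three labels (`private`, `read-shared`, `write-shared`) are assumptions chosen for illustration, and the abstract does not state which classes SM-prof actually uses.

```python
from collections import defaultdict


def classify_blocks(trace, slot_length, block_size=32):
    """Classify each (time slot, data block) pair from an access trace.

    trace: iterable of (time, cpu, op, address, source_line) tuples,
           where op is "R" (read) or "W" (write). This layout is hypothetical.
    Returns {(slot, block): (label, sorted source lines)}.
    """
    cpus = defaultdict(set)      # processors touching the block in the slot
    writers = defaultdict(set)   # processors writing the block in the slot
    lines = defaultdict(set)     # source lines responsible for the accesses

    for time, cpu, op, address, source_line in trace:
        key = (time // slot_length, address // block_size)
        cpus[key].add(cpu)
        if op == "W":
            writers[key].add(cpu)
        lines[key].add(source_line)

    result = {}
    for key in cpus:
        if len(cpus[key]) == 1:
            label = "private"        # one processor only: no coherence traffic
        elif not writers[key]:
            label = "read-shared"    # several readers, nobody writes
        else:
            label = "write-shared"   # several processors, at least one writer
        result[key] = (label, sorted(lines[key]))
    return result


# Hypothetical trace: two time slots of length 10, 32-byte blocks.
trace = [
    (0, 0, "W", 0, "mm.c:10"),
    (1, 0, "R", 0, "mm.c:11"),
    (2, 0, "R", 64, "mm.c:12"),
    (3, 1, "R", 64, "mm.c:12"),
    (12, 0, "W", 0, "mm.c:20"),
    (13, 1, "R", 4, "mm.c:21"),
]
classes = classify_blocks(trace, slot_length=10)
for (slot, block), (label, src) in sorted(classes.items()):
    print(slot, block, label, src)
```

Keeping the source lines alongside each label mirrors the idea of linking a degrading access pattern back to the code that caused it: in this sketch, block 0 is private in the first slot but becomes write-shared in the second, and the recorded lines point at the accesses responsible.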

Published In

ACM SIGMETRICS Performance Evaluation Review  Volume 23, Issue 1
May 1995
323 pages
ISSN:0163-5999
DOI:10.1145/223586

SIGMETRICS '95/PERFORMANCE '95: Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
May 1995
340 pages
ISBN:0897916956
DOI:10.1145/223587

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1995
Published in SIGMETRICS Volume 23, Issue 1

Qualifiers

  • Article

Article Metrics

  • Downloads (Last 12 months): 40
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 10 Dec 2024

Cited By

  • (2011) DeFT. ACM Transactions on Architecture and Code Optimization 8:2 (1-27). DOI: 10.1145/1970386.1970389. Online publication date: 22-Jun-2011
  • (2007) Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks. IEEE Transactions on Parallel and Distributed Systems 18:6 (818-834). DOI: 10.1109/TPDS.2007.1058. Online publication date: 1-Jun-2007
  • (2005) Cautious, machine-independent performance tuning for shared-memory multiprocessors. Euro-Par'96 Parallel Processing (106-113). DOI: 10.1007/3-540-61626-8_13. Online publication date: 8-Jun-2005
  • (2004) Detailed cache coherence characterization for OpenMP benchmarks. Proceedings of the 18th annual international conference on Supercomputing (287-297). DOI: 10.1145/1006209.1006250. Online publication date: 26-Jun-2004
  • (1999) Performance Tuning Software DSM Applications using Visualisation. The Journal of Supercomputing 13:3 (249-265). DOI: 10.1023/A:1008005003054. Online publication date: 1-May-1999
  • (1998) Visualisation for performance tuning of DVSM applications. Proceedings of the Thirty-First Hawaii International Conference on System Sciences (532-541). DOI: 10.1109/HICSS.1998.649249. Online publication date: 1998
  • (1997) Analytical Prediction of Performance for Cache Coherence Protocols. IEEE Transactions on Computers 46:11 (1155-1173). DOI: 10.1109/12.644291. Online publication date: 1-Nov-1997
  • (1996) A model for parallel simulation of distributed shared memory. Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems. DOI: 10.5555/823448.823570. Online publication date: 1-Feb-1996
  • (1996) A model for parallel simulation of distributed shared memory. Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (179-184). DOI: 10.1109/MASCOT.1996.501014. Online publication date: 1996
