Using Pin as a memory reference generator for multiprocessor simulation

Authors: Collin McCurdy, Charles Fischer

ACM SIGARCH Computer Architecture News, Volume 33, Issue 5

Pages 39 - 44

https://doi.org/10.1145/1127577.1127586

Published: 01 December 2005

Abstract

In this paper we describe how we used Pin to generate a multithreaded reference stream for simulating a multiprocessor on a uniprocessor. We have taken special care to model, as accurately as possible, the effects of cache coherence protocol state and of lock and barrier synchronization on the performance of multithreaded applications running on multiprocessor hardware. We first describe a simplified version of the algorithm, which uses semaphores to synchronize instrumented application threads and the simulator. We then describe modifications to that algorithm that model the microarchitectural features of the Itanium2 that affect the timing of memory reference issue. An experimental evaluation shows that, while our methods enable accurate simulation, the use of semaphores has a negative impact on the performance of the simulator.
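
The abstract mentions semaphores that synchronize instrumented application threads with the simulator but does not give the algorithm. The following Python sketch illustrates the general idea only, under stated assumptions: each application thread publishes its next memory reference and blocks on its own semaphore, and a simulator loop releases one thread at a time, here choosing the thread with the smallest simulated clock. The function `simulate`, the `traces` input format, the `latency` cost model, and the scheduling rule are all hypothetical and are not taken from the paper or from the Pin API.

```python
import threading

DONE = object()  # sentinel: the thread has no more references to issue


def simulate(traces, latency):
    """Interleave per-thread reference traces under simulator control.

    traces:  list of per-thread address lists (hypothetical input format)
    latency: function(thread_id, addr) -> simulated cycles (assumed cost model)
    Returns the global reference order as (thread_id, addr) tuples.
    """
    n = len(traces)
    pending = [None] * n                 # reference each thread wants to issue
    clock = [0] * n                      # per-thread simulated time
    posted = threading.Semaphore(0)      # signalled when a thread posts or finishes
    go = [threading.Semaphore(0) for _ in range(n)]  # one wakeup semaphore per thread
    order = []

    def app_thread(tid):
        # Stands in for an instrumented application thread.
        for addr in traces[tid]:
            pending[tid] = addr          # publish the reference
            posted.release()             # tell the simulator it is available
            go[tid].acquire()            # block until the simulator has handled it
        pending[tid] = DONE
        posted.release()                 # final signal: this thread is finished

    threads = [threading.Thread(target=app_thread, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()

    # Wait until every thread has either posted a reference or finished.
    for _ in range(n):
        posted.acquire()

    while True:
        ready = [t for t in range(n) if pending[t] is not DONE]
        if not ready:
            break
        # Assumed policy: issue the reference of the thread that is earliest
        # in simulated time, breaking ties by thread id.
        tid = min(ready, key=lambda t: (clock[t], t))
        addr = pending[tid]
        order.append((tid, addr))
        clock[tid] += latency(tid, addr)
        go[tid].release()                # let that thread run to its next reference
        posted.acquire()                 # wait for it to post again or finish

    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(simulate([[0x10, 0x20], [0x30]], lambda tid, addr: 1))
```

Because only one application thread runs between simulator decisions, the resulting order depends on the simulated clocks rather than on host scheduling. Every reference costs two semaphore operations per side, which is consistent with the abstract's remark that semaphores hurt simulator performance.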

Published In

ACM SIGARCH Computer Architecture News Volume 33, Issue 5

Special issue on the 2005 workshop on binary instrumentation and application

December 2005

93 pages

ISSN: 0163-5964

DOI: 10.1145/1127577

Publisher

Association for Computing Machinery

New York, NY, United States
