Efficient Characterization of Hidden Processor Memory Hierarchies
Pages 335 - 349
Abstract
A processor’s memory hierarchy has a major impact on the performance of running code. However, computing platforms that hide the actual hardware characteristics from both the end user and the tools that mediate execution, such as the compiler, the JIT, and the runtime system, are increasingly common, for example in large-scale cloud and cluster computing. Even worse, in such environments a single computation may run on a collection of processors with dissimilar characteristics. Without knowledge of the performance-critical parameters of the underlying system, it is difficult to improve performance by optimizing the code or adjusting runtime-system behavior, and application performance becomes harder to understand.
To address this problem, we have developed a suite of portable tools that efficiently derive many of the parameters of processor memory hierarchies, such as the number of levels and the effective capacity and latency of caches and TLBs, in a matter of seconds. The tools use a series of carefully considered experiments to produce and analyze cache response curves automatically. They are inexpensive enough to be used in a variety of contexts, including install-time, compile-time, or runtime adaptation, and in performance-understanding tools.
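As a rough illustration of what "producing and analyzing a cache response curve" can involve, the sketch below shows two generic building blocks. It is not the authors' tool or algorithm. The names `build_cycle` and `find_level_boundaries`, the `jump` threshold of 1.5, and the sample numbers are all hypothetical. The first function builds a single random cycle over a buffer (Sattolo's algorithm), the usual basis for a pointer-chasing access pattern that defeats simple prefetching. The second scans a latency-versus-working-set-size curve for sharp rises, which typically mark the effective capacity of a level. Real measurements would be taken in a compiled language, since interpreter overhead in Python would hide cache effects.

```python
import random


def build_cycle(n, seed=0):
    """Return a list `nxt` where following nxt[i] visits all n slots once.

    Uses Sattolo's algorithm, which yields a permutation with a single cycle.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, which is what guarantees one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def find_level_boundaries(sizes, latencies, jump=1.5):
    """Return the working-set sizes after which latency rises sharply.

    `sizes` and `latencies` are parallel lists describing a response curve.
    A rise of more than `jump` times the previous point is treated as a
    boundary between levels. The threshold is an assumption, not a value
    taken from the paper.
    """
    boundaries = []
    for i in range(1, len(sizes)):
        if latencies[i] > jump * latencies[i - 1]:
            boundaries.append(sizes[i - 1])
    return boundaries


# Synthetic curve (made-up numbers): sizes in KiB, latency in ns per access.
sizes_kib = [16, 32, 64, 128, 256, 512]
latency_ns = [1.0, 1.0, 4.0, 4.1, 12.0, 12.2]
boundaries = find_level_boundaries(sizes_kib, latency_ns)
```

On this synthetic curve the detector reports boundaries after 32 KiB and 128 KiB, the two points where latency steps up. A fixed ratio threshold is the simplest possible choice; measured curves are noisy, so a practical analysis would need smoothing or repeated trials.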
Information
Published In
Jun 2018
843 pages
ISBN:978-3-319-93712-0
DOI:10.1007/978-3-319-93713-7
© Springer International Publishing AG, part of Springer Nature 2018.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 11 June 2018
Qualifiers
- Article