DOI: 10.1145/2837476.2837477
research-article

Exana: an execution-driven application analysis tool for assisting productive performance tuning

Published: 27 October 2015 Publication History

Abstract

As modern memory subsystems have grown more complex, tuning application code for their deeper memory hierarchies has become critical to realizing their potential performance. However, such tuning has depended on time-consuming, empirical work done by hand by domain experts. To assist this performance tuning process, we have been developing an application analysis tool called Exana and have attempted to automate parts of the process. Taking already compiled executable binary code as input, Exana transparently analyzes program structures, data dependences, memory access characteristics, and cache hit/miss statistics across program execution. In this paper, we demonstrate the usefulness and productivity of these analyses and evaluate their overheads. After demonstrating that our analysis is feasible and useful for actual HPC application programs, we show that the overheads of Exana's analyses are much lower than those of existing architectural simulators.

Cited By

  • (2017) ExanaDBT. Proceedings of the Computing Frontiers Conference, pages 191–200. DOI: 10.1145/3075564.3077627. Online publication date: 15-May-2017
  • (2017) An Accurate Simulator of Cache-Line Conflicts to Exploit the Underlying Cache Performance. Euro-Par 2017: Parallel Processing, pages 119–133. DOI: 10.1007/978-3-319-64203-1_9. Online publication date: 1-Aug-2017

Published In

SEPS 2015: Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems
October 2015
70 pages
ISBN: 9781450339100
DOI: 10.1145/2837476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Performance tuning
  2. actual HPC applications
  3. dynamic binary translation
  4. transparent analysis

Qualifiers

  • Research-article

Conference

SPLASH '15