Efficient hardware-based nonintrusive dynamic application profiling

Published: 05 May 2011

Abstract

Application profiling, the process of monitoring an application to determine how frequently specific regions execute, is an essential step in the design process for many software and hardware systems. In hardware/software partitioning, profiling is often the critical step used to determine the critical kernels of an application. In this article, we present a nonintrusive dynamic application profiler (DAProf) that profiles an executing application by monitoring the application's short backward branches, function calls, and function returns. The resulting profile accurately characterizes the frequently executed loops within the application and breaks down loop executions versus loop iterations per execution. DAProf achieves an average accuracy of 98% for loop executions, 97% for average iterations per execution, and 95% for percentage of execution time. In addition, DAProf incurs as little as 11% area overhead compared to an ARM9 microprocessor. DAProf is well suited for rapidly profiling software applications and for dynamic optimization approaches, such as dynamic hardware/software partitioning, in which detailed loop execution information is needed to provide accurate performance estimates.
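
The distinction between loop executions and iterations per execution can be illustrated with a small software model. The sketch below is not DAProf's hardware design. It assumes a trace of (pc, target, taken) branch records, a hypothetical 256-byte threshold for what counts as a short backward branch, and a simple counting rule: each evaluation of a loop's closing branch is one iteration, and a not-taken outcome ends the current execution. Function calls and returns, which DAProf also monitors, are left out.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed threshold (in bytes) for a "short" backward branch.
# This value is illustrative and does not come from the article.
MAX_SBB_DISTANCE = 256


@dataclass
class LoopStats:
    executions: int = 0   # times the loop was entered
    iterations: int = 0   # total iterations over all executions
    active: bool = False  # True while an execution is in progress

    @property
    def avg_iterations(self) -> float:
        """Average iterations per execution (0.0 if never executed)."""
        return self.iterations / self.executions if self.executions else 0.0


def is_short_backward(pc: int, target: int) -> bool:
    """A branch whose target lies at most MAX_SBB_DISTANCE bytes before it."""
    return 0 < pc - target <= MAX_SBB_DISTANCE


def profile(trace):
    """Build per-loop statistics from (pc, target, taken) branch records.

    Loops are keyed by the address of their closing backward branch.
    """
    loops = defaultdict(LoopStats)
    for pc, target, taken in trace:
        if not is_short_backward(pc, target):
            continue  # forward or long branches are not loop candidates
        stats = loops[pc]
        if not stats.active:
            # First evaluation since the last exit: a new execution begins.
            stats.executions += 1
            stats.active = True
        stats.iterations += 1
        if not taken:
            # Falling through the closing branch ends this execution.
            stats.active = False
    return dict(loops)
```

For a trace in which the loop closed by the branch at 0x120 runs once for 3 iterations and once for 5, this model reports 2 executions, 8 total iterations, and an average of 4 iterations per execution. Keying loops by the address of the closing branch keeps nested loops separate without any extra state.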

      Published In

      ACM Transactions on Embedded Computing Systems  Volume 10, Issue 3
      April 2011
      205 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/1952522
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 May 2011
      Accepted: 01 November 2009
      Revised: 01 March 2009
      Received: 01 January 2008
      Published in TECS Volume 10, Issue 3

      Author Tags

      1. Profiling
      2. critical kernels
      3. dynamic hardware/software partitioning
      4. dynamic optimization
      5. embedded systems
      6. nonintrusive profiling

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2022) DynPath–Non-Intrusive Feature-Rich Hardware-Based Execution Path Profiler. IEEE Access 10, 116069-116086. DOI: 10.1109/ACCESS.2022.3218710. Online publication date: 2022
      • (2018) Non-Intrusive In-Situ Requirements Monitoring of Embedded System. ACM Transactions on Design Automation of Electronic Systems 23:5, 1-27. DOI: 10.1145/3206213. Online publication date: 20-Aug-2018
      • (2017) Performance impacts and limitations of hardware memory access trace collection. Proceedings of the Conference on Design, Automation & Test in Europe, 506-511. DOI: 10.5555/3130379.3130496. Online publication date: 27-Mar-2017
      • (2017) Performance impacts and limitations of hardware memory access trace collection. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 506-511. DOI: 10.23919/DATE.2017.7927041. Online publication date: Mar-2017
      • (2017) Non-intrusive dynamic profiler for multicore embedded systems. 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 500-505. DOI: 10.1109/ASPDAC.2017.7858372. Online publication date: 16-Jan-2017
      • (2014) Dynamic Analysis of Embedded Software Using Execution Replay. Proceedings of the 2014 IEEE 17th International Symposium on Object/Component-Oriented Real-Time Distributed Computing, 166-173. DOI: 10.1109/ISORC.2014.16. Online publication date: 10-Jun-2014
      • (2012) Programming strategies for runtime adaptability. 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 1-8. DOI: 10.1109/ReCoSoC.2012.6322875. Online publication date: Jul-2012
      • (2012) A comparison of the influence of different multi-core processors on the runtime overhead for application-level monitoring. Proceedings of the 2012 international conference on Multicore Software Engineering, Performance, and Tools, 42-53. DOI: 10.1007/978-3-642-31202-1_5. Online publication date: 31-May-2012
