[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Dynamic allocation for scratch-pad memory using compile-time decisions

Published: 01 May 2006 Publication History

Abstract

In this research, we propose a highly predictable, low overhead, and, yet, dynamic, memory-allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Primarily scratch pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that; (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct mapped cache shows that our method performs roughly as well as a cached architecture.

References

[1]
Ahmed, N., Mateev, N., and Pingali, K. 2000. Tiling imperfectly-nested loop nests. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, 31.]]
[2]
Anantharaman, S. and Pande, S. 1998. Compiler optimization for real time execution of loops on limited memory embedded systems. In Proc. of the 19th IEEE Real-Time Systems Symposium.]]
[3]
Angiolini, F., Benini, L., and Caprara, A. 2003. Polynomial-time algorithm for on-chip scratchpad memory partitioning. In Proceedings of the 2003 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM Press, New York, 318--326.]]
[4]
Angiolini, F., Menichelli, F., Ferrero, A., Benini, L., and Olivieri, M. 2004. A post-compiler approach to scratchpad mapping of code. In Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM Press, New York. 259--267.]]
[5]
Appel, A. W. and Ginsburg, M. 1998. Modern Compiler Implementation in C. Cambridge University Press, Cambridge.]]
[6]
Avissar, O., Barua, R., and Stewart, D. 2001. Heterogeneous memory management for embedded systems. In Proceedings of the ACM 2nd International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Atlanta, GA. (Also at http://www.ece.umd.edu/~barua.)]]
[7]
Avissar, O., Barua, R., and Stewart, D. 2002. An optimal memory allocation scheme for scratch-pad based embedded systems. ACM Transactions on Embedded Systems (TECS) 1, 1 (Sept.).]]
[8]
Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In Tenth International Symposium on Hardware/Software Codesign (CODES). Estes Park, Colorado, ACM Press, New York.]]
[9]
Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2001. The performance and energy consumption of three embedded real-time operating systems. In Proceedings of the 4th Workshop on Compiler and Architecture Support for Embedded Systems (CASES'01). Atlanta, GA. 203--210.]]
[10]
Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2003. The performance and energy consumption of embedded real-time operating systems. IEEE Transactions on Computers 52, 11 (Nov.), 1454--1469.]]
[11]
Belady, L. 1966. A study of replacement algorithms for virtual storage. In IBM Systems Journal 5, 78--101.]]
[12]
Bringmann, R. A. 1995. Compiler-controlled speculation. Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana, IL.]]
[13]
Chilimbi, T. M., Davidson, B., and Larus, J. 1999. Cache-conscious structure definition. In ACM SIGPLAN Notices. ACM Press, New York. 13--24.]]
[14]
Dinero Cache Simulator Revised. 2004. DineroIV Cache Simulator. http://www.cs.wisc.edu/markhill/DineroIV/.]]
[15]
Dominguez, A., Udayakumaran, S., and Barua, R. 2005. Heap data allocation to scratch-pad memory in embedded systems. Journal of Embedded Computing(JEC) 1, 4, IOS Press, Amsterdam, Netherlands.]]
[16]
Eisenbeis, C., Jalby, W. D., and Fran, C. 1990. A strategy for array management in local memory. In Technical Report 1262, INRIA, Domaine de Voluceau, France.]]
[17]
Francesco, P., Marchal, P., Atienza, D., Luca Benini, F. C., and Mendias, J. M. 2004. An integrated hardware/software approach for runtime scratchpad management. In Proceedings of the Design Automation Conference. ACM Press, New York. 238--243.]]
[18]
Goodwin, D. W. and Wilken, K. D. 1996. Optimal and near-optimal global register allocation using 0-1 integer programming. In Software-Practice and Experience. 929--965.]]
[19]
Hallnor, G. and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA). Vancouver, British Columbia, Canada.]]
[20]
Hiser, J. D. and Davidson, J. W. 2004. Embarc: An efficient memory bank assignment algorithm for retargetable compilers. In Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM Press, New York. 182--191.]]
[21]
ILOG Corporation. 2001. The CPLEX optimization suite. http://www.ilog.com/products/cplex/.]]
[22]
Janzen, J. 2001. Calculating memory system power for DDR SDRAM. In DesignLine Journal 10(2). Micron Technology Inc. http://www.micron.com/publications/designline.html.]]
[23]
Kandemir, M., Ramanujam, J., Irwin, M. J., Vijaykrishnan, N., Kadayif, I., and Parikh, A. 2001. Dynamic management of scratch-pad memory space. In Design Automation Conference. 690--695.]]
[24]
Keitel-Sculz, D. and Wehn, N. 2001. Embedded DRAM development technology, physical design, and application issues. In IEEE Design and Test of Computers.]]
[25]
Larson, E. and Austin, T. 2000. Compiler controlled value prediction using branch predictor based confidence. In Proceedings of the 33th Annual International Symposium on Microarchitecture (MICRO-33). IEEE Computer Society.]]
[26]
Lctes Panel. 2003. Compilation challenges for network processors. Industrial Panel, ACM Conference on Languages, Compilers and Tools for Embedded Systems (LCTES). Slides at http://www.cs.purdue.edu/s3/LCTES03/.]]
[27]
Lim, A. W., Liao, S.-W., and Lam, M. S. 2001. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In Proceedings of the eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM Press, New York. 103--112.]]
[28]
Luk, C.-K. and Mowry, T. C. 1998. Cooperative instruction prefetching in modern processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture. 182--194.]]
[29]
Micron-flash data sheet. 128Mb Q-Flash memory. Micron technology Inc. http://www.micron.com/products/nor/qflash/partlist.aspx.]]
[30]
Micron-datasheet. 2003. 128Mb DDR SDRAM data sheet. (Dual data-rate synchronous DRAM) Micron Technology Inc. http://www.micron.com/products/dram/ddrsdram/.]]
[31]
Moritz, C. A., Frank, M., and Amarasinghe, S. 2000. FlexCache: A framework for flexible compiler generated data caching. In The 2nd Workshop on Intelligent Memory Systems, Boston, MA.]]
[32]
Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip versus off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Transactions on Design Automation of Electronic Systems 5, 3 (July).]]
[33]
Schreiber, R. D. C. 2004. Near-optimal allocation of local memory arrays. In HPL-2004-24.]]
[34]
Sinha, A. and Chandrakasan, A. 2001. JouleTrack---A web based tool for software energy profiling. In Design Automation Conference. 220--225.]]
[35]
Sjodin, J. and Platen, C. V. 2001. Storage allocation for embedded processors. Compiler and Architecture Support for Embedded Computing Systems.]]
[36]
Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems.]]
[37]
Steinke, S., Grunwald, N., Wehmeyer, L., Banakar, R., Balakrishnan, M., and Marwedel, P. 2002a. Reducing energy consumption by dynamic copying of instructions onto on chip memory. In Proceedings of the 15th International Symposium on System Synthesis (ISSS) (Kyoto, Japan). ACM Press, New York.]]
[38]
Steinke, S., Wehmeyer, L., Lee, B., and Marwedel, P. 2002b. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 409.]]
[39]
Tanenbaum, A. S. 1998. Structured Computer Organization (4th Edition). Prentice Hall, Englewood, Cliffs, NJ.]]
[40]
Tiwari, V. and Lee, M. T. C. 1998. Power analysis of a 32-bit embedded microcontroller. VLSI Design Journal 7, 33.]]
[41]
Udayakumaran, S., Narahari, B., and Simha, R. 2002. Application specific memory partitioning for low power. In Proceedings of ACM COLP 2002 (Compiler and Operating Systems for Low Power. ACM Press, New York.]]
[42]
Udayakumaran, S. and Barua, R. 2003. Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). ACM Press, New York. 276--286.]]
[43]
Udayakumaran, S. and Barua, R. 2006. An integrated scratch-pad allocator for affine and non-affine code. In Design, Automation and Test in Europe (DATE). IEEE Computer Society.]]
[44]
Verma, M., Wehmeyer, L., and Marwedel, P. 2004a. Cache-aware scratchpad allocation algorithm. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society.]]
[45]
Verma, M., Wehmeyer, L., and Marwedel, P. 2004b. Dynamic overlay of scratch-pad memory for energy minimization. In International Conference on Hardware/Software Codesign and System Synthesis(CODES+ISSS) (Stockholm, Sweden). ACM Press, New York.]]
[46]
Wehmeyer, L. and Marwedel, P. 2004. Influence of onchip scratch-pad memories on wcet prediction. In Proceedings of the 4th International Workshop on Worst-Case Execution Time (WCET) Analysis.]]
[47]
Wehmeyer, L., Helmig, U., and Marwedel, P. 2004. Compiler-optimized usage of partitioned memories. In Proceedings of the 3rd Workshop on Memory Performance Issues (WMPI2004).]]
[48]
Wilton, S. and Jouppi, N. 1996. Cacti: An enhanced cache access and cycle time model. In IEEE Journal of Solid-State Circuits.]]

Cited By

View all
  • (2024)Compiler-Based Memory Encryption for Machine Learning on Commodity Low-Power DevicesProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641564(198-211)Online publication date: 17-Feb-2024
  • (2024)MuDP: multi-granularity data placement for uniform loops on SPM-DRAM architectures to minimize latencyFrontiers of Computer Science10.1007/s11704-023-3566-y19:5Online publication date: 22-Nov-2024
  • (2023)Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent ComputingACM Transactions on Architecture and Code Optimization10.1145/362952420:4(1-25)Online publication date: 20-Oct-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 5, Issue 2
May 2006
253 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1151074
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 01 May 2006
Published in TECS Volume 5, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Memory allocation
  2. compiler
  3. embedded systems
  4. scratch pad
  5. software caching
  6. software-managed cache

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)49
  • Downloads (Last 6 weeks)2
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Compiler-Based Memory Encryption for Machine Learning on Commodity Low-Power DevicesProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641564(198-211)Online publication date: 17-Feb-2024
  • (2024)MuDP: multi-granularity data placement for uniform loops on SPM-DRAM architectures to minimize latencyFrontiers of Computer Science10.1007/s11704-023-3566-y19:5Online publication date: 22-Nov-2024
  • (2023)Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent ComputingACM Transactions on Architecture and Code Optimization10.1145/362952420:4(1-25)Online publication date: 20-Oct-2023
  • (2023)Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN AcceleratorsProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3579990.3580017(224-235)Online publication date: 17-Feb-2023
  • (2023)GNN at the Edge: Cost-Efficient Graph Neural Network Processing Over Distributed Edge ServersIEEE Journal on Selected Areas in Communications10.1109/JSAC.2022.322942241:3(720-739)Online publication date: Mar-2023
  • (2022)CARL: Compiler Assigned Reference LeasingACM Transactions on Architecture and Code Optimization10.1145/349873019:1(1-28)Online publication date: 17-Mar-2022
  • (2022)MASTER: Reclamation of Hybrid Scratchpad Memory to Maximize Energy Saving in Multi-Core Edge SystemsIEEE Transactions on Sustainable Computing10.1109/TSUSC.2021.30494477:4(749-760)Online publication date: 1-Oct-2022
  • (2022)Optimizing data placement and size configuration for morphable NVM based SPM in embedded multicore systemsFuture Generation Computer Systems10.1016/j.future.2022.05.005135(270-282)Online publication date: Oct-2022
  • (2021)SMART: A Heterogeneous Scratchpad Memory Architecture for Superconductor SFQ-based Systolic CNN AcceleratorsMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480041(912-924)Online publication date: 18-Oct-2021
  • (2021)OptimizationEmbedded System Design10.1007/978-3-030-60910-8_7(349-379)Online publication date: 26-Jan-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media