Abstract
This paper defines a code placement problem, formulates it as an integer linear program (ILP), and proposes a heuristic algorithm for it, with the goal of reducing the total energy consumption of embedded processor systems comprising a CPU core and on-chip and off-chip memories. Our approach exploits a non-cacheable memory region to make more effective use of the cache memory and, as a result, reduces the number of off-chip accesses. Our algorithm simultaneously determines the code layout of the cacheable region, the scratchpad region, and the remaining non-cacheable region of the address space so as to minimize the total energy consumption of the processor system. Experiments using a commercial embedded processor and an off-chip SDRAM demonstrate that, compared with the best result achieved by the conventional approach, our algorithm reduces the energy consumption of the processor system by 23% without any performance degradation.
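The trade-off described above can be illustrated with a minimal Python sketch, assuming a deliberately simplified model: each code block has a fetch count, a size, and a cold-miss count; scratchpad fetches are cheapest, cached fetches pay an extra refill cost per miss, uncached fetches always go off-chip, and two cacheable blocks may evict each other (a pairwise conflict-miss term). All block names, energy constants, and the conflict model are hypothetical, and the search is plain exhaustive enumeration rather than the authors' ILP or heuristic.

```python
"""Toy model: place code blocks in scratchpad, cacheable, or uncached memory.

Hypothetical sketch. Energy constants, block data, and the pairwise
conflict-miss model are assumptions; the search is brute force, not the
paper's ILP or heuristic.
"""
from itertools import product

SPM, CACHED, UNCACHED = "spm", "cached", "uncached"
REGIONS = (SPM, CACHED, UNCACHED)

# Assumed per-event energies in arbitrary units.
E_SPM = 1.0       # one fetch from the on-chip scratchpad
E_HIT = 1.5       # one fetch served by the instruction cache
E_MISS = 40.0     # extra cost of one cache-line refill from off-chip memory
E_OFFCHIP = 10.0  # one uncached fetch straight from off-chip memory


def energy(placement, blocks, conflicts):
    """Total energy of one placement under the toy model.

    blocks[name] = (fetch_count, size_bytes, cold_misses_if_cached)
    conflicts[(a, b)] = extra misses when both a and b are cacheable
    """
    total = 0.0
    for name, region in placement.items():
        fetches, _size, cold_misses = blocks[name]
        if region == SPM:
            total += fetches * E_SPM
        elif region == UNCACHED:
            total += fetches * E_OFFCHIP  # bypasses the cache entirely
        else:
            total += fetches * E_HIT + cold_misses * E_MISS
    # Conflict misses arise only between pairs that both occupy the cache.
    for (a, b), misses in conflicts.items():
        if placement[a] == CACHED and placement[b] == CACHED:
            total += misses * E_MISS
    return total


def best_placement(blocks, conflicts, spm_capacity):
    """Enumerate all 3**n placements; feasible only for a handful of blocks."""
    names = sorted(blocks)
    best, best_energy = None, float("inf")
    for regions in product(REGIONS, repeat=len(names)):
        placement = dict(zip(names, regions))
        spm_used = sum(blocks[n][1] for n in names if placement[n] == SPM)
        if spm_used > spm_capacity:
            continue  # scratchpad capacity exceeded
        e = energy(placement, blocks, conflicts)
        if e < best_energy:
            best, best_energy = placement, e
    return best, best_energy


if __name__ == "__main__":
    # Hypothetical program: a hot loop, an interrupt handler, and init code.
    blocks = {
        "loop": (10_000, 256, 8),
        "isr": (5_000, 128, 4),
        "init": (200, 512, 16),
    }
    conflicts = {("init", "loop"): 50}
    print(best_placement(blocks, conflicts, spm_capacity=128))
    # ({'init': 'uncached', 'isr': 'spm', 'loop': 'cached'}, 22320.0)
    print(best_placement(blocks, {}, spm_capacity=128))
    # ({'init': 'cached', 'isr': 'spm', 'loop': 'cached'}, 21260.0)
```

In this toy instance, caching `init` on its own would be cheaper than leaving it uncached (940 versus 2000 units), but its 50 assumed conflict misses with `loop` add another 2000 units, so the non-cacheable region wins; removing the conflict term moves `init` back into the cache. Enumeration grows as 3^n, so at realistic program sizes an exact formulation such as an ILP, or a heuristic, is needed instead.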
Acknowledgements
This work is supported by VDEC, The University of Tokyo, in collaboration with Renesas Technology Corp., ROHM Co., Ltd., Toppan Printing Co., Ltd., Synopsys, Inc. and Cadence Design Systems, Inc. This work is also supported by the CREST ULP program of JST and by Grant-in-Aid for Scientific Research (A) 19200004.
Cite this article
Ishitobi, Y., Ishihara, T. & Yasuura, H. Code and Data Placement for Embedded Processors with Scratchpad and Cache Memories. J Sign Process Syst 60, 211–224 (2010). https://doi.org/10.1007/s11265-008-0306-3