research-article

SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems

Authors:

Prasenjit Chakraborty,

Preeti Ranjan Panda

CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Article No.: 21, Pages 1 - 10

Published: 29 September 2013

Abstract

Modern system architectures sometimes include scratch pad memories (SPM) in their memory hierarchy to take advantage of their simpler design, in an attempt to meet the system area, performance, and power budget. Systems employing SPM can be broadly categorized as: (a) cacheless systems with only SPM, (b) hybrid systems with both cache and SPM, and (c) reconfigurable systems in which local memory can be configured as cache, SPM, or a combination of the two. However, SPM-based systems require greater programming effort, mainly because software must explicitly allocate data and orchestrate data transfers. Tight product development cycles require faster development and porting of diverse applications to multiple SPM-based architectures. In this paper we present SPM-Sieve, a profile-based tool and framework for SPM-based architectures that generates partitioning decisions for the first-level memory in the system hierarchy and suggests a mapping of objects to the memory partitions, without resorting to detailed simulation of all configurations. It does so by natively executing an application and using a minimal target architecture specification, which not only provides early information influencing data organization in the application, but also provides a foundation for more sophisticated algorithms to produce optimized allocations. We demonstrate the utility and generality of SPM-Sieve by evaluating it on a large number of SPEC2000 benchmarks targeted at a 128KB first-level memory. We evaluate its effectiveness through simulation studies comparing the partition suggested by the tool against varying partition sizes, and observe that its suggestions are very competitive for SPM-based architectures with and without caches.
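To make the idea of a profile-driven partitioning decision concrete, the following is a minimal sketch and not the SPM-Sieve algorithm itself, which the abstract does not detail. It assumes per-object sizes and access counts are already available from a profiling run, substitutes a simple greedy access-density heuristic for object mapping, and scores each SPM/cache split of a 128KB first-level memory with a toy miss estimate. All names (`ObjectProfile`, `map_objects`, `estimated_misses`, `suggest_partition`), the 32KB search step, the cost model, and the profile data are hypothetical.

```python
from dataclasses import dataclass

KB = 1024
L1_BYTES = 128 * KB  # first-level memory size quoted in the abstract


@dataclass(frozen=True)
class ObjectProfile:
    name: str      # hypothetical data-object label
    size: int      # footprint in bytes
    accesses: int  # access count observed in a profiling run


def map_objects(profiles, spm_bytes):
    """Greedy stand-in for object mapping: densest objects go to SPM first."""
    ranked = sorted(profiles, key=lambda p: p.accesses / p.size, reverse=True)
    in_spm, free = [], spm_bytes
    for p in ranked:
        if p.size <= free:
            in_spm.append(p.name)
            free -= p.size
    return in_spm


def estimated_misses(profiles, in_spm, cache_bytes):
    """Toy cost model (an assumption): the cache's hit fraction equals the
    share of the non-SPM footprint that fits in the cache."""
    rest = [p for p in profiles if p.name not in in_spm]
    footprint = sum(p.size for p in rest)
    if footprint == 0:
        return 0.0
    hit_fraction = min(1.0, cache_bytes / footprint)
    return sum(p.accesses for p in rest) * (1.0 - hit_fraction)


def suggest_partition(profiles, step=32 * KB):
    """Score every SPM/cache split of L1_BYTES and return the cheapest one."""
    candidates = []
    for spm_bytes in range(0, L1_BYTES + 1, step):
        in_spm = map_objects(profiles, spm_bytes)
        misses = estimated_misses(profiles, in_spm, L1_BYTES - spm_bytes)
        candidates.append((misses, spm_bytes, in_spm))
    misses, spm_bytes, in_spm = min(candidates, key=lambda c: c[0])
    return spm_bytes, L1_BYTES - spm_bytes, in_spm, misses


# Made-up profile data, for illustration only.
PROFILES = [
    ObjectProfile("coeffs", 16 * KB, 900_000),
    ObjectProfile("frame", 96 * KB, 400_000),
    ObjectProfile("lut", 32 * KB, 300_000),
    ObjectProfile("log", 64 * KB, 10_000),
]

spm, cache, mapping, misses = suggest_partition(PROFILES)
print(f"SPM {spm // KB} KB, cache {cache // KB} KB, SPM objects: {mapping}")
```

On this made-up data the sketch settles on an even 64KB/64KB split: the two most densely accessed objects fit in the SPM, and growing the SPM further only shrinks the cache without capturing another hot object. A real tool would replace the toy miss estimate with a locality model derived from the profile, such as reuse distance.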

Cited By

Du J, Li R, Xiao Z, Tong Z, Zhang L (2017). Optimization of Data Allocation on CMP Embedded System with Data Migration. International Journal of Parallel Programming 45:4 (965-981). DOI: 10.1007/s10766-016-0436-3. Online publication date: 1-Aug-2017
https://dl.acm.org/doi/10.1007/s10766-016-0436-3

Index Terms

SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Information & Contributors

Information

Published In

CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

September 2013

247 pages

ISBN:9781479914005

Program Chairs:
Rodric Rabbah (IBM Research), Anand Raghunathan (Purdue University)

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ESWEEK'13: Ninth Embedded System Week

September 29 - October 4, 2013

Montreal, Quebec, Canada

Acceptance Rates

CASES '13 Paper Acceptance Rate 21 of 68 submissions, 31%;

Overall Acceptance Rate 52 of 230 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 1
Total Downloads: 117
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Jan 2025
