DOI: 10.5555/2555729.2555750
research-article

SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems

Published: 29 September 2013

Abstract

Modern system architectures sometimes include scratch pad memories (SPMs) in their memory hierarchy, exploiting their simpler design to help meet area, performance, and power budgets. Systems employing SPMs can be broadly categorized as: (a) cacheless systems with only SPM, (b) hybrid systems with both cache and SPM, and (c) reconfigurable systems in which local memory can be configured as cache, SPM, or a combination of the two. However, SPM-based systems demand greater programming effort, mainly because data allocation and data transfers must be orchestrated explicitly in software. Tight product development cycles require applications to be developed and ported quickly to multiple SPM-based architectures. In this paper we present SPM-Sieve, a profile-based tool and framework for SPM-based architectures that decides how to partition the first-level memory of the system hierarchy and suggests a mapping of data objects to the memory partitions, without resorting to detailed simulation of every configuration. SPM-Sieve natively executes the application and requires only a minimal specification of the target architecture; it thus provides early information that can guide data organization in the application, and serves as a foundation for more sophisticated algorithms that produce optimized allocations. We demonstrate the utility and generality of SPM-Sieve on a large number of SPEC2000 benchmarks targeting a 128 KB first-level memory. We evaluate its effectiveness through simulation studies that compare the partition suggested by the tool against a range of partition sizes, and observe that its suggestions are highly competitive for SPM-based architectures both with and without caches.
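
The abstract describes SPM-Sieve only at a high level. As a purely illustrative sketch (not SPM-Sieve's actual algorithm), the Python below shows the general shape of a profile-driven partitioning decision for a 128 KB first-level memory split between SPM and cache: a 0/1 knapsack maps profiled data objects to the SPM, and a deliberately crude cost model, standing in for detailed simulation, scores each candidate split. The `DataObject` record, the functions `map_to_spm`, `estimated_cycles`, and `suggest_partition`, the 16 KB step, and the latency values are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical per-object profile record gathered from a native run."""
    name: str
    size_kb: int   # object footprint in KB (assumed >= 1)
    accesses: int  # profiled access count


def map_to_spm(objects, spm_kb):
    """0/1 knapsack: pick the objects that maximize accesses served by an SPM of spm_kb KB."""
    # best[c] = (accesses captured, names chosen) using at most c KB of SPM
    best = [(0, ())] * (spm_kb + 1)
    for obj in objects:
        # Iterate capacities downward so each object is placed at most once.
        for c in range(spm_kb, obj.size_kb - 1, -1):
            prev_acc, prev_names = best[c - obj.size_kb]
            if prev_acc + obj.accesses > best[c][0]:
                best[c] = (prev_acc + obj.accesses, prev_names + (obj.name,))
    return set(best[spm_kb][1])


def estimated_cycles(objects, spm_kb, total_kb=128,
                     spm_lat=1, hit_lat=2, miss_lat=100):
    """Crude cost model (assumption): SPM accesses cost spm_lat; all other
    objects share the remaining cache, whose miss ratio is taken as the
    fraction of their combined footprint that does not fit in it."""
    in_spm = map_to_spm(objects, spm_kb)
    cache_kb = total_kb - spm_kb
    rest = [o for o in objects if o.name not in in_spm]
    footprint = sum(o.size_kb for o in rest)
    miss_ratio = max(0.0, 1.0 - cache_kb / footprint) if footprint else 0.0
    cycles = spm_lat * sum(o.accesses for o in objects if o.name in in_spm)
    per_access = (1.0 - miss_ratio) * hit_lat + miss_ratio * miss_lat
    cycles += per_access * sum(o.accesses for o in rest)
    return cycles


def suggest_partition(objects, total_kb=128, step_kb=16):
    """Sweep SPM sizes from 0 (cache only) to total_kb (SPM only) and return
    the SPM size with the lowest estimated cost."""
    sizes = range(0, total_kb + 1, step_kb)
    return min(sizes, key=lambda s: estimated_cycles(objects, s, total_kb))


if __name__ == "__main__":
    profile = [DataObject("A", 32, 1000), DataObject("B", 64, 10),
               DataObject("C", 16, 500), DataObject("D", 96, 50)]
    spm = suggest_partition(profile)
    print(spm, sorted(map_to_spm(profile, spm)))  # prints: 48 ['A', 'C']
```

Sweeping the SPM size from 0 to the full capacity covers the cache-only baseline, the hybrid splits, and the cacheless (SPM-only) configuration in one loop. A real tool would replace the capacity-ratio miss estimate with something better grounded, such as reuse-distance analysis.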

Cited By

  • (2017) Optimization of Data Allocation on CMP Embedded System with Data Migration. International Journal of Parallel Programming 45:4 (965-981). DOI: 10.1007/s10766-016-0436-3. Online publication date: 1-Aug-2017

Information

Published In

CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
September 2013
247 pages
ISBN:9781479914005

Publisher

IEEE Press

Author Tags

  1. memory allocation
  2. scratch pad memory
  3. software cache

Qualifiers

  • Research-article

Conference

ESWEEK'13
ESWEEK'13: Ninth Embedded System Week
September 29 - October 4, 2013
Montreal, Quebec, Canada

Acceptance Rates

CASES '13 paper acceptance rate: 21 of 68 submissions (31%)
Overall acceptance rate: 52 of 230 submissions (23%)
