Abstract
One of the most critical components determining the success of an MPSoC-based architecture is its on-chip memory. Scratch Pad Memory (SPM) is increasingly used in place of cache as the on-chip memory of embedded MPSoCs because it offers smaller chip area, lower power consumption, and better timing predictability. SPM can be organized as a Virtually Shared SPM (VS-SPM) architecture that combines the advantages of shared and private SPM. However, making effective use of the VS-SPM architecture depends strongly on two inter-dependent problems: variable partitioning and task scheduling. In this paper, we decouple these two problems and solve them in a phase-ordered manner. We propose two variable partitioning heuristics based on an initial schedule: High Access Frequency First (HAFF) variable partitioning and Global View Prediction (GVP) variable partitioning. We then present a loop pipeline scheduling algorithm, Rotation Scheduling with Variable Partitioning (RSVP), to improve overall throughput. Our experimental results on MiBench show that the average performance improvements over IDAS (Integrated Data Assignment with Scheduling) are 23.74% for HAFF and 31.91% for GVP on a four-core MPSoC. The average schedule length generated by RSVP is 25.96% shorter than that of list scheduling with optimal variable partitioning.
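The abstract names the HAFF heuristic but does not give its details, so the following is only a minimal sketch of what a "high access frequency first" partitioning could look like. It assumes per-core access counts and variable sizes are known from an initial schedule, and it greedily places the most frequently accessed variables first, preferring the SPM of the core that accesses each variable most often. The function name `haff_partition`, its parameters, the tie-breaking rule, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def haff_partition(
    access: Dict[str, List[int]],  # variable -> access count per core (assumed input)
    size: Dict[str, int],          # variable -> size in bytes (assumed input)
    capacity: List[int],           # SPM capacity of each core in bytes (assumed input)
) -> Dict[str, Optional[int]]:
    """Greedy sketch: place the most frequently accessed variables first."""
    free = list(capacity)
    placement: Dict[str, Optional[int]] = {}
    # Visit variables by descending total access count; ties are broken by name.
    for var in sorted(access, key=lambda v: (-sum(access[v]), v)):
        # Prefer the core that accesses the variable most, then the other cores.
        cores = sorted(range(len(free)), key=lambda c: (-access[var][c], c))
        chosen = next((c for c in cores if free[c] >= size[var]), None)
        if chosen is not None:
            free[chosen] -= size[var]
        # None means the variable does not fit on-chip and stays in main memory.
        placement[var] = chosen
    return placement


# Hypothetical two-core example: core 0 has 64 bytes of SPM, core 1 has 128.
access = {"a": [90, 5], "b": [10, 60], "c": [40, 30], "d": [1, 2]}
size = {"a": 64, "b": 64, "c": 64, "d": 64}
placement = haff_partition(access, size, capacity=[64, 128])
print(placement)
```

In this example, variable `c` is accessed mostly by core 0 but ends up in core 1's SPM because core 0's SPM is already full. In a virtually shared organization such a variable would still be reachable from core 0 as a remote SPM access, which is the kind of trade-off a partitioning heuristic has to weigh.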
Additional information
This work is partially supported by NSF CCR-0309461, NSF IIS-0513669, HK CERG B-Q60B, NSFC 60728206, and China Scholarship Council[2007]3020.
About this article
Cite this article
Zhang, L., Qiu, M., Tseng, WC. et al. Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory. J Sign Process Syst Sign Image Video Technol 58, 247–265 (2010). https://doi.org/10.1007/s11265-009-0362-3
DOI: https://doi.org/10.1007/s11265-009-0362-3