
Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs

Published: 16 January 2013

Abstract

Modern embedded systems, which often employ multiprocessor systems-on-chip (MPSoCs), need to support multiple time-constrained multimedia applications. Such systems need to be optimized for resource usage and energy consumption. It is well understood that a design-time approach cannot provide timing guarantees for all applications because it cannot cater for dynamism in applications. A runtime approach, however, demands substantial computation at runtime and hence may not lend itself well to constraint-aware mapping.
In this article, we present a hybrid approach for efficient mapping of applications in such systems. For each application to be supported in the system, the approach performs extensive design-space exploration (DSE) at design time to derive multiple design points representing throughput and energy consumption at different resource combinations. At runtime, one of these points is selected efficiently, depending upon the desired throughput, while optimizing for energy consumption and resource usage. While most existing DSE strategies consider a fixed multiprocessor platform architecture, our DSE considers a generic architecture, making the DSE results applicable to any target platform. All the compute-intensive analysis is performed during DSE, which leaves minimal computation for runtime. The approach is capable of handling dynamism in applications by considering their runtime aspects and providing timing guarantees.
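The runtime selection step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the `DesignPoint` layout, the `select_design_point` function, the processor-type names, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes each application comes with a small table of precomputed design points, and that the runtime manager picks the lowest-energy point that meets the desired throughput and fits the currently free resources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DesignPoint:
    """One precomputed design-time result (hypothetical layout)."""
    resources: Dict[str, int]  # processors needed per type, e.g. {"RISC": 2}
    throughput: float          # throughput guaranteed by this point
    energy: float              # energy consumption, arbitrary units


def fits(point: DesignPoint, available: Dict[str, int]) -> bool:
    """True if every resource the point needs is currently free."""
    return all(available.get(kind, 0) >= count
               for kind, count in point.resources.items())


def select_design_point(points: List[DesignPoint],
                        required_throughput: float,
                        available: Dict[str, int]) -> Optional[DesignPoint]:
    """Pick the feasible point with the lowest energy.

    Ties on energy are broken by the smaller total resource count.
    Returns None when no stored point satisfies the constraints.
    """
    feasible = [p for p in points
                if p.throughput >= required_throughput and fits(p, available)]
    if not feasible:
        return None
    return min(feasible,
               key=lambda p: (p.energy, sum(p.resources.values())))


# Illustrative table for one application; the values are made up.
POINTS = [
    DesignPoint({"RISC": 1}, throughput=10.0, energy=5.0),
    DesignPoint({"RISC": 2}, throughput=18.0, energy=7.0),
    DesignPoint({"RISC": 1, "DSP": 1}, throughput=20.0, energy=6.0),
    DesignPoint({"RISC": 2, "DSP": 2}, throughput=35.0, energy=9.0),
]

if __name__ == "__main__":
    chosen = select_design_point(POINTS, 15.0, {"RISC": 2, "DSP": 1})
    print(chosen)
```

Because the table is built offline, the runtime cost in this sketch is a single linear scan over a handful of entries, which is the kind of lightweight lookup the hybrid scheme aims for.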
The presented approach is used to carry out a DSE case study for models of real-life multimedia applications: H.263 decoder, H.263 encoder, MPEG-4 decoder, JPEG decoder, sample rate converter, and MP3 decoder. At runtime, the design points are used to map the applications on a heterogeneous MPSoC. Experimental results reveal that the proposed approach provides faster DSE, better design points, and efficient runtime mapping when compared to other approaches. In particular, we show that DSE is faster by 83% and runtime mapping is accelerated by 93% for some cases. Further, we study the scalability of the approach by considering applications with large numbers of tasks.

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 18, Issue 1
Special section on adaptive power management for energy and temperature-aware computing systems
January 2013
319 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2390191
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 January 2013
Accepted: 01 August 2012
Revised: 01 April 2012
Received: 01 September 2011
Published in TODAES Volume 18, Issue 1

Author Tags

  1. Multiprocessor systems-on-chip
  2. design-space exploration
  3. embedded systems
  4. energy consumption
  5. multimedia applications
  6. runtime mapping
  7. synchronous data-flow graphs
  8. throughput

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 13 Jan 2025

Citations

Cited By

  • (2024) Flexible Spatio-Temporal Energy-Efficient Runtime Management. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 777-784. DOI: 10.1109/ASP-DAC58780.2024.10473885. Online publication date: 22-Jan-2024
  • (2023) Generating Unified Platforms Using Multigranularity Domain DSE (MG-DmDSE) Exploiting Application Similarities. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:1, 280-293. DOI: 10.1109/TCAD.2022.3172373. Online publication date: Jan-2023
  • (2022) Run-Time Remapping Algorithm of Dataflow Actors on NoC-Based Heterogeneous MPSoCs. IEEE Transactions on Parallel and Distributed Systems 33:12, 3959-3976. DOI: 10.1109/TPDS.2022.3177957. Online publication date: 1-Dec-2022
  • (2022) mpsym: Improving Design-Space Exploration of Clustered Manycores With Arbitrary Topologies. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:6, 1592-1605. DOI: 10.1109/TCAD.2021.3102512. Online publication date: Jun-2022
  • (2020) Energy-efficient runtime resource management for adaptable multi-application mapping. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 909-914. DOI: 10.5555/3408352.3408558. Online publication date: 9-Mar-2020
  • (2020) Hybrid Application Mapping for Composable Many-Core Systems: Overview and Future Perspective. Journal of Low Power Electronics and Applications 10:4, 38. DOI: 10.3390/jlpea10040038. Online publication date: 17-Nov-2020
  • (2020) Energy-efficient Runtime Resource Management for Adaptable Multi-application Mapping. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 909-914. DOI: 10.23919/DATE48585.2020.9116381. Online publication date: Mar-2020
  • (2020) Adaptive Task Allocation and Scheduling on NoC-based Multicore Platforms with Multitasking Processors. ACM Transactions on Embedded Computing Systems 20:1, 1-26. DOI: 10.1145/3408324. Online publication date: 7-Dec-2020
  • (2020) Run-Time Enforcement of Non-Functional Application Requirements in Heterogeneous Many-Core Systems. Proceedings of the 25th Asia and South Pacific Design Automation Conference, 629-636. DOI: 10.1109/ASP-DAC47756.2020.9045536. Online publication date: 17-Jan-2020
  • (2020) Run-Time Enforcement of Non-functional Program Properties on MPSoCs. A Journey of Embedded and Cyber-Physical Systems, 125-149. DOI: 10.1007/978-3-030-47487-4_9. Online publication date: 31-Jul-2020