Abstract
The hardware/software (HW/SW) partitioning is a major concern in heterogeneous multi-processor system-on-a-chip design, where the large design space prohibits rapid identification of optimal HW/SW solutions to meet tight time-to-market constraints. In this paper, we propose graph reduction techniques to reduce the design space for HW/SW partitioning without sacrificing the partition quality. There are two major phases in the proposed approach: reducible sub-graph searching and sub-graph evaluation and reduction. In the former phase, we design a dynamic programming-based algorithm, named path flow algorithm (PFA), to identify reducible sub-graph candidates for directed acyclic graph (DAG) as most previous works use DAG as task graph model. We also propose algorithm DeLoop to transform an arbitrary directed graph into a DAG such that all reducible sub-graphs on the original graph can be detected by performing algorithm PFA on the DAG. Our approach overcomes the limitation of the existing approach by enabling the identification of candidate sub-graphs in arbitrary task graphs. In latter phase, we propose a reduction model which enables accurate estimation of task execution time on hardware and design a method to select candidate sub-graphs for reduction. Experimental results demonstrate that the proposed methods not only reduce the design space, but also notably improve the partitioning quality since hardware-parallel execution of tasks is taken into account in the proposed sub-graph reduction model.
Similar content being viewed by others
References
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Comput Gr Forum 8(1):3–11
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput. 10(2):188–192
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor theoretical properties and algorithms. Parallel Comput 21(11):1783–1805
Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference, pp 349–357
Arató P, Mann ZÁ, Orbán A (2005) Algorithmic aspects of hardware/software partitioning. ACM Trans Des Autom Electron Syst (TODAES) 10(1):136–156
Tahaee SA, Jahangir AH (2010) A polynomial algorithm for partitioning problems. ACM Trans Embed Comput Syst (TECS) 9(4):1–38. (Article 34)
Vahid F, Gajski DD, Gong J (1994) A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning. In: Proceedings of the conference on the European design automation (EURO-DAC), pp 214–219
Dick RP, Jha NK (1997) Mogac: a multiobjective genetic algorithm for the co-synthesis of hardware–software embedded systems, In: Proceedings of 1997 IEEE/ACM international conference on computer-aided design (ICCAD), pp 522–529
López-Vallejo M, López JC (2003) On the hardware–software partitioning problem: system modeling and partitioning techniques. ACM Trans Des Autom Electron Syst (TODAES) 8(3):269–297
Vahid F (2002) Partitioning sequential programs for cad using a three-step approach. ACM Trans Des Autom Electron Syst (TODAES) 7(3):413–429
Wu J, Srikanthan T, Tao J (2008) Algorithmic aspects for functional partitioning and scheduling in hardware/software co-design. Des Autom Embed Syst 12(4):345–375
Henkel J (1999) A low power hardware/software partitioning approach for core-based embedded systems. In: Proceedings of the 36th ACM/IEEE design automation conference (DAC), pp 122–127
Stitt G, Vahid F (2002) The energy advantages of microprocessor platforms with on-chip configurable logic. IEEE Des Test Comput 19(6):36–43
Mu J, Lysecky R (2009) Autonomous hardware/software partitioning and voltage/frequency scaling for low-power embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 15(1):1–20
Lysecky R (2007) Low-power warp processor for power efficient high-performance embedded systems. In: Proceedings of the design, automation and test in Europe conference and exhibition (DATE), pp 1–6
Banerjee S, Dutt N (2004) Efficient search space exploration for HW–SW partitioning. In: Proceedings of the IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis (CODES+ISSS), pp 122–127
Wu J, Srikanthan T, Chen G (2010) Algorithmic aspects of hardware/software partitioning: 1D search algorithms. IEEE Trans Comput 59(4):532–544
Tahaee SA, Jahangir AH, Habibi-Masouleh H (2008) Improving the performance of heuristic searches with judicious initial point selection. In: Proceedings of the 5th IEEE international symposium on embedded computing, pp 14–19
Wu J, Srikanthan T (2006) Low-complex dynamic programming algorithm for hardware/software partitioning. Inf Process Lett 98(2):41–46
Niemann R, Marwedel P (1997) An algorithm for hardware/software partitioning using mixed integer linear programming. Des Autom Embed Syst 2(2):165–193
Wu J, Wang P, Lam SK, Srikanthan T (2013) Efficient heuristic and tabu search for hardware/software partitioning. J Supercomput 66(1):118–134
Monshizadeh N, Trentelman HL, Camlibel MK (2014) Projection based model reduction of multi-agent systems using graph partitions. IEEE Trans Control Netw Syst 1(2):145–154
Shen C, Tong W (2013) A novel method of adopting graph reduction for resource management in parallel computing model. In: Proceedings of the 2013 IEEE international conference on green computing and communications and ieee internet of things and IEEE cyber, physical and social computing, pp 2218–2220
Song Y, Shi G (2012) Hierarchical graph reduction approach to symbolic circuit analysis with data sharing and cancellation-free properties. In: Proceedings of the 2012 17th Asia and South Pacific design automation conference (ASP-DAC), pp 541–546
Xu Y, Chu C (2009) GREMA: graph reduction based efficient mask assignment for double patterning technology. In: Proceedings of the IEEE/ACM international conference on computer-aided design (ICCAD), pp 601–606
Henke J, Ernst R (1995) A path-based technique for estimating hardware runtime in Hw/Sw-cosynthesis. In: Proceedings of the 8th international symposium on system synthesis, pp 116–121
Maciel P, Barros E, Rosenstiel W (1997) Computing communication cost by Petri nets for hardware/software codesign. In: Proceedings of the 8th international workshop on rapid system prototyping (RSP’97) shortening the path from specification to prototype, pp 44–56
Henkel J, Ernst R (1998) High-level estimation techniques for usage in hardware/software co-design. In: Proceedings of the Asia and South Pacific design automation conference (ASP-DAC), pp 353–360
Li Y, Henkel J (1998) A framework for estimating and minimizing energy dissipation of embedded HW/SW systems. In: Proceedings of the 35th design automation conference (DAC), pp 188–193
Junior M, Neto S, Maciel P, Lima R, Ribeiro A, Barreto R, Tavares E, Braga F (2006) Analyzing software performance and energy consumption of embedded systems by probabilistic modeling: an approach based on coloured Petri nets. In: Proceedings of the 27th international conference on applications and theory of Petri nets and other models of concurrency (ICATPN), Turku, pp.261–281 (pp 26–30)
Ye W, Ernst R, Benner T, Henkel J (1993) Fast timing analysis for hardware–software co-synthesis. In: Proceedings of the IEEE international conference on computer design: VLSI in computers and processors (ICCD), pp 452–457
Vahid F, Gajski DD (1995) Closeness metrics for system-level functional partitioning. In: Proceedings of the European design automation conference with EURO-VHDL (EURO-DAC), pp 328–333
Lam SK, Srikanthan T (2009) Rapid design of area-efficient custom instructions for reconfigurable embedded processing. J Syst Archit 55(1):1–14
Lam SK, Srikanthan T, Clarke CT (2011) Architecture-aware technique for mapping area-time efficient custom instructions onto FPGAs. IEEE Trans Comput 60(5):680–692
Chuong LM, Lam SK, Srikanthan T (2009) Area-time estimation of controller for porting C-based functions onto FPGA. In: Proceedings of the IEEE/IFIP international symposium on rapid system prototyping (RSP), pp 145–151
Zhao K, Bian J (2011) Instruction-level hardware/software partition through DFG exploration. In: Proceedings of the 15th international conference on computer supported cooperative work in design (CSCWD), pp 55–60
Wang D, Li S, Dou Y (2008) Collaborative hardware/software partition of coarse-grained reconfigurable system using evolutionary ant colony optimization. In: Proceedings of the 2008 Asia and South Pacific design automation conference (ASP-DAC), pp 679–684
Bouchhima A, Gerin P, Petrot F (2009) Automatic instrumentation of embedded software for high level hardware/software co-simulation. In: Proceedings of the 2009 Asia and South Pacific design automation conference (ASP-DAC), pp 546–551
Wolf W (1996) Object-oriented cosynthesis of distributed embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 1(3):301–314
Henkel J, Ernst R (2001) An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques. IEEE Trans Very Large Scale Integr (VLSI) Syst 9(2):273–289
Gibson D, Kumar R, McCurley KS, Tomkins A (2006) Dense subgraph extraction. In: Cook DJ, Holder LB (eds) Mining graph data. Wiley, Hoboken
Cuzzocrea A, Song IY (2014) Big graph analytics: the state of the art and future research agenda. In: Proceedings of 17th international workshop on data warehousing and OLAP. ACM, pp 99–101
Quamar A, Deshpande A, Lin J (2014) NScale: neighborhood-centric analytics on large graphs. Proc VLDB 7(13):1673–1676
Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 581–586
Youness H, Hassan M, Sakanushi K, Takeuchi Y, Imai M, Salem A, Wahdan AM, Moness M (2009) A high performance algorithm for scheduling and hardware–software partitioning on MPSoCs. In: Proceedings of the 4th international conference on design and technology of integrated systems in nanoscale era (DTIS’09), pp 71–76
Han H, Liu W, Wu J, Jiang G (2013) Efficient algorithm for hardware/software partitioning and scheduling on MPSoC. J Comput 8(1):61–68
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61173032, and the Doctoral Fund of Ministry of Education of China under Grant No. 20131201110002.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Jiang, G., Wu, J., Lam, SK. et al. Algorithmic aspects of graph reduction for hardware/software partitioning. J Supercomput 71, 2251–2274 (2015). https://doi.org/10.1007/s11227-015-1381-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-015-1381-4