Abstract
Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique for enhancing the performance and energy efficiency of embedded processors. However, executing custom instructions requires adding custom functional units to the base processor. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is a multi-cycle matrix of functional units with conditional execution capability. To generate more effective custom instructions, we extend them across basic blocks; to that end, we introduce multi-exit custom instructions and the intuition behind them. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limited hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped onto the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions enhance performance and energy efficiency by an average of 32% and 3%, respectively, compared to custom instructions limited to one basic block. A maximum speedup of 4.9 and an average speedup of 1.9, compared to a single-issue embedded processor, were achieved on the MiBench benchmark suite. The maximum and average energy savings are 56% and 22%, respectively. These performance and energy-efficiency gains are obtained at the cost of a 30% area overhead.
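As a rough illustration of why temporal partitioning makes a custom instruction mappable, the sketch below splits a data-flow graph into pieces that each fit a fixed number of processing elements. It is not the integrated mapping-temporal partitioning framework described above: it is a plain greedy cut of a topological order, it ignores connection limits, and the names `topological_order`, `greedy_partition`, `max_nodes`, and the example graph are all hypothetical.

```python
from collections import deque


def topological_order(succs):
    """Kahn's algorithm over a DAG given as {node: [successor, ...]}.

    Every node must appear as a key, even if it has no successors.
    """
    indeg = {n: 0 for n in succs}
    for n in succs:
        for s in succs[n]:
            indeg[s] += 1
    # Sort the initial ready set so the result is deterministic.
    ready = deque(sorted(n for n in succs if indeg[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(succs):
        raise ValueError("graph has a cycle")
    return order


def greedy_partition(succs, max_nodes):
    """Cut the topological order into chunks of at most max_nodes nodes.

    Assumption: max_nodes stands in for the number of processing
    elements available on the reconfigurable unit.
    """
    order = topological_order(succs)
    return [order[i:i + max_nodes] for i in range(0, len(order), max_nodes)]


# Hypothetical six-operation data-flow graph.
dfg = {"a": ["c"], "b": ["c"], "c": ["d", "e"],
       "d": ["f"], "e": ["f"], "f": []}
partitions = greedy_partition(dfg, max_nodes=4)
print(partitions)
```

Because the cut follows a topological order, every edge goes from a partition to the same or a later partition, so the pieces can be executed one after another. A framework that also accounts for connection limits would need to check each piece against the actual routing resources of the unit.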
Additional information
Earlier versions of this work appeared in Design Automation and Test in Europe (DATE), 2007, under the title “Generating and Executing Multi-Exit Custom Instructions for an Adaptive Extensible Processor”, and in the International Symposium on Low Power Electronics and Design (ISLPED), 2008, under the title “Enhancing Energy Efficiency of Processor-Based Embedded Systems through Post-Fabrication ISA Extension”.
About this article
Cite this article
Noori, H., Mehdipour, F., Inoue, K. et al. Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization. J Supercomput 60, 196–222 (2012). https://doi.org/10.1007/s11227-010-0505-0
DOI: https://doi.org/10.1007/s11227-010-0505-0