Abstract
Code generation for VLIW processors includes several optimization problems like code optimization, instruction scheduling, and register allocation. The high complexity of these problems usually does not allow the computation of the optimal solution. Instead, optimization techniques, e.g., based on heuristics, are used to find acceptable solutions in a reasonable time. List scheduling is a well known heuristic-based microcode compaction method, that bases its scheduling decisions on weights derived from dependency analysis of the input program. Additional information and methods have to be used in order to reach better code compaction. Also, more sophisticated code optimization and register allocation support better code compaction. In this paper, evolutionary algorithms are used as dynamic heuristics in code generation, which allows dynamic adaption to the given input program and target processor configuration. Three evolutionary algorithms for operation merging, instruction scheduling, and register allocation are presented and evaluated on an exemplary image processing application, which shows different processing characteristics in the subroutines. They outperform code generation based on static heuristics and allow compilation for restricted target architectures that cannot be handled by the static heuristics.
Similar content being viewed by others
References
Bahuleyan, J, Nagpal, R, Srikant, YN. (2010). Integrated energy-aware cyclic and acyclic scheduling for clustered VLIW processors. In: International symposium on parallel & distributed processing, Workshops and Phd Forum (IPDPSW).
Briggs, P. (1992). Register allocation via graph coloring. PhD thesis: Rice University.
Codina, JM, Sanchez, J, Gonzalez, A. (2001). A unified modulo scheduling and register allocation technique for clustered processors. In: Proceedings of international conference on parallel architectures and compilation techniques.
Eriksson, MV, Skoog, O, Kessler, CW. (2008). Optimal vs. heuristic integrated code generation for clustered VLIW architectures. In Proceedings of 11th international workshop on software & compilers for embedded systems (SCOPES ’08). New York: ACM.
Fisher, JA. (1979). Optimization of horizontal microcode within and beyond basic blocks: an application of processor scheduling with resources. Tech. rep.: NY Univ. Courant Mathem.and Comp.Lab.
Gerlach, L, Marquardt, D, Payá Vayá, G, Liu, S, Weißbrich, M, Doclo, S, Blume, H. (2017). Analyzing the trade-off between power consumption and beamforming algorithm performance using a hearing aid ASIP. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling and simulation.
Giesemann, F, Payá-Vayá, G., Blume, H. (2012). A Hardware/software environment for specializing dynamic reconfigurable generic vliw-simd asip architecture. In: ICT. OPEN 2012 conference.
Giesemann, F, Payá-Vayá, G., Blume, H, Limmer, M, Ritter, W. (2014). A comprehensive ASIC/FPGA prototyping environment for exploring embedded processing systems for advanced driver assistance applications. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling, and simulation.
Giesemann, F, Payá-Vayá G., Gerlach, L, Blume, H, Pflug, F, von Voigt, G. (2017). Using genetic algorithm approach to reduce register file pressure during instruction scheduling. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling and simulation.
Hennessy, JL, & Gross, T. (1983). Postpass code optimization of pipeline constraints. ACM Transactions on Programming Languages and Systems, 5(3), 422–448.
Äijö, T., Jääskeläinen, P., Elomaa, T, Kultala, H, Takala, J. (2015). Integer linear programming-based scheduling for Transport Triggered Architectures. ACM Transactions on Architecture and Code Optimization, 12(4), 1–22. https://doi.org/10.1145/2845082.
Kri, F, & Feeley, M. (2004). Genetic instruction scheduling and register allocation. In: XXIV international conference of the chilean computer science society.
Lee, C, Lee, JK, Hwang, T, Tsai, SC. (2003). Compiler optimization on VLIW instruction scheduling for low power. ACM Transaction on Design Automation of Electronic Systems, 8(2), 252–268.
Leupers, R, & Bashford, S. (2000). Graph-based code selection techniques for embedded processors. ACM Transactions on Design Automation of Electronic Systems, 5(4), 794–814.
Lin, YC, You, YP, Lee, JK. (2006). Register allocation for VLIW DSP processors with irregular register files. In: 12Th workshop on compilers for parallel computers (CPC 2006), Coruña, Spain.
Lorenz, M, Leupers, R, Marwedel, P, Drager, T, Fettweis, G. (2001). Low-energy DSP code generation using a genetic algorithm. In: Proceedings of the IEEE international conference on computer design: vlsi in computers and processors (ICCD), pp. 431–437. https://doi.org/10.1109/ICCD.2001.955062.
Lorenz, M, Wehmeyer, L, Dräger, T. (2002). Energy aware compilation for DSPs with SIMD instructions. In: Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES ’02, ACM Press, pp. 94–101. https://doi.org/10.1145/513829.513847.
Lozano, RC, Carlsson, M, Drejhammar, F, Schulte, C. (2012). Constraint-based register allocation and instruction scheduling. In Lecture notes in computer science. https://doi.org/10.1007/978-3-642-33558-7_54 (pp. 750–766). Berlin: Springer.
Lozano, RC, Carlsson, M, Blindell, GH, Schulte, C. (2014). Combinatorial spill code optimization and ultimate coalescing. In: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems - LCTES ’14, ACM Press, pp. 23–32. https://doi.org/10.1145/2597809.2597815.
Maldonado-Bascón, S., Lafuente-Arroyo, S, Gil-Jiménez, P., Gomez-Moreno, H, Lopez-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8(2), 264–278. https://doi.org/10.1109/TITS.2007.895311.
McGovern, A, Moss, E, Barto, AG. (2002). Building a basic block instruction scheduler with reinforcement learning and rollouts. Machine Learning, 49(2/3), 141–160. https://doi.org/10.1023/a:1017976211990.
Nian, C, Yanxiang, H, Yong, C, Ximi, L, Qian, L. (2012). PSO Based instruction scheduling for low power. In: International conference on computer distributed control and intelligent environmental monitoring (CDCIEM), IEEE. https://doi.org/10.1109/cdciem.2012.129.
Nicolau, A, Potasman, R, Wang, H. (1992). Register allocation, renaming and their impact on fine-grain parallelism. In: Languages and Compilers for Parallel Computing, Springer-Verlag, pp. 218–235. https://doi.org/10.1007/bfb0038667.
Parikh, A, Kim, S, Kandemir, M, Vijaykrishnan, N, Irwin, M J. (2004). Instruction scheduling for low power. Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 37(1), 129–149. https://doi.org/10.1023/b:vlsi.0000017007.28247.f6.
Payá-Vayá, G. (2011). Design and analysis of a generic VLIW processor for multimedia applications PhD thesis. Leibniz Universität Hannover: Institute of Microelectronic Systems.
Payá-Vayá, G, Martín Langerwerf, J, Pirsch, P. (2005). Rapanui: Rapid prototyping for media processor architecture exploration. In Hämäläinen, T D, Pimentel, A D, Takala, J, Vassiliadis, S (Eds.) Embedded computer systems: architectures, modeling, and simulation, lecture notes in computer science. https://doi.org/10.1007/11512622_5, (Vol. 3553 pp. 249–265). Berlin: Springer.
Payá-Vayá, G., Martín-Langerwerf, J., Taptimthong, P., Pirsch, P. (2007). Design space exploration of media processors: A parameterized scheduler. In: 2007 international conference on embedded computer systems: Architectures, Modeling and simulation, pp. 41–49. https://doi.org/10.1109/ICSAMOS.2007.4285732.
Payá-Vayá, G., Martín Langerwerf, J., Giesemann, F., Blume, H., Pirsch, P. (2009). Instruction merging to increase parallelism in VLIW architectures. In: International symposium on system-on-chip (SOC), pp. 143–146. https://doi.org/10.1109/SOCC.2009.5335660.
Payá-Vayá, G., Martín-langerwerf, J., Pirsch, P. (2009). A multi-shared register file structure for VLIW processors. Journal of Signal Processing Systems, 58(2), 215–231. https://doi.org/10.1007/s11265-009-0355-2.
Payá-Vayá, G, Martín-Langerwerf, J, Blume, H, Pirsch, P. (2010). A forwarding-sensitive instruction scheduling approach to reduce register file constraints in VLIW architectures. In: 21st IEEE international conference on application-specific systems, Architectures and processors (ASAP), IEEE. https://doi.org/10.1109/asap.2010.5541015.
She, D, He, Y, Mesman, B, Corporaal, H. (2012). Scheduling for register file energy minimization in explicit datapath architectures. In: Proceedings of the IEEE design, automation & test in europe conference & exhibition (DATE), pp. 388–393 . https://doi.org/10.1109/date.2012.6176502.
Su, X, Wu, H, Xue, J. (2017). An efficient WCET-aware instruction scheduling and register allocation approach for clustered VLIW processors. ACM Transactions on Embedded Computing Systems, 16(5s), 1–21. https://doi.org/10.1145/3126524.
Wang, M, Wang, Y, Liu, D, Qin, Z, Shao, Z. (2010). Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors. Journal of Systems and Software, 83(5), 772–785. https://doi.org/10.1016/j.jss.2009.11.727.
Wilken, K, Liu, J, Heffernan, M. (2000). Optimal instruction scheduling using integer programming. SIGPLAN Not, 35(5), 121–133. https://doi.org/10.1145/358438.349318.
Xiao, S, & Lai, EMK. (2005). Instruction scheduling of VLIW architectures for balanced power consumption. In: Proceedings of the conference on Asia South Pacific design automation (ASP-DAC ’05), ACM Press, pp. 824–829. https://doi.org/10.1145/1120725.1121027.
Xiao, S, & Lai, EMK. (2007). VLIW instruction scheduling for minimal power variation. ACM Transactions on Architecture and Code Optimization, 4(3), 18–es. https://doi.org/10.1145/1275937.1275942.
Yun, HS, & Kim, J. (2001). Power-aware modulo scheduling for high-performance VLIW processors. In: ISLPED ’01: proceedings of the 2001 international symposium on low power electronics and design (IEEE Cat. No.01TH8581), ACM, pp. 40–45. https://doi.org/10.1109/lpe.2001.945369.
Zeitlhofer, T, & Wess, B. (1999). Operation scheduling for parallel functional units using genetic algorithms. In: IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (Cat. No.99CH36258), vol. 4, pp. 1997–2000. https://doi.org/10.1109/ICASSP.1999.758319.
Zhong, S, Wei, J, Guo, W, Wang, Z. (2010). Instruction scheduling using genetic algorithm with taboo search for TTA-like processors. In: International conference on computer design and applications. IEEE. https://doi.org/10.1109/iccda.2010.5541389.
Acknowledgements
This work was partly funded by the German Research Council (DFG) under project number PA 2762/2-1.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Giesemann, F., Gerlach, L. & Payá-Vayá, G. Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers. J Sign Process Syst 92, 655–678 (2020). https://doi.org/10.1007/s11265-019-01493-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-019-01493-2