Abstract
In this paper we propose and evaluate a post-link optimization that increases instruction-level parallelism (ILP) by moving instructions from one basic block to the preceding blocks. The Grid Alu Processor used for the evaluations comprises many functional units that the original instruction stream does not fully allocate. The proposed technique uses these unallocated functional units to perform operations speculatively in advance.
The algorithm moves instructions to multiple predecessors of a source block. If necessary, it adds compensation code to allow the shifted instructions to work on unused registers, whose values will be copied into the original target registers at the time the speculation is resolved.
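The code motion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Instr` and `Block` types, the `hoist_first` helper, the `br` and `mov` opcode names, and the pool of free registers are all hypothetical and chosen only for illustration. The sketch also simplifies in two ways: it hoists only the first instruction of a block, and it always renames the destination, whereas the paper adds compensation code only if necessary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]          # target register, None for branches
    op: str                      # opcode name (hypothetical mnemonics)
    srcs: Tuple[str, ...] = ()   # source registers


@dataclass
class Block:
    name: str
    instrs: List[Instr] = field(default_factory=list)
    preds: List["Block"] = field(default_factory=list)


def hoist_first(block: Block, free_regs: List[str]) -> bool:
    """Speculatively move the first instruction of `block` into all predecessors.

    The hoisted copy writes to an unused register, so it cannot clobber a value
    that another successor of a predecessor still needs. A `mov` left at the
    top of `block` copies the result into the original target register once
    control flow has actually reached `block`.
    """
    if not block.instrs or not block.preds or not free_regs:
        return False
    instr = block.instrs[0]
    if instr.dest is None:       # assumption: branches are never hoisted
        return False

    tmp = free_regs.pop()
    for pred in block.preds:
        spec = Instr(tmp, instr.op, instr.srcs)
        # Insert before a terminating branch if the predecessor has one.
        ends_in_branch = bool(pred.instrs) and pred.instrs[-1].op == "br"
        pos = len(pred.instrs) - 1 if ends_in_branch else len(pred.instrs)
        pred.instrs.insert(pos, spec)

    # Compensation code: commit the speculative value in the source block.
    block.instrs[0] = Instr(instr.dest, "mov", (tmp,))
    return True
```

Because the first instruction of a block cannot depend on values produced earlier in that same block, its source registers hold the same values at the end of every predecessor, which is what makes this simplified case safe to move.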
Evaluations of the algorithm show a maximum speedup of a factor of 2.08 on the Grid Alu Processor compared to the unoptimized version of the same program, due to better exploitation of ILP and an optimized mapping of loops.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jahr, R., Shehan, B., Uhrig, S., Ungerer, T. (2011). Static Speculation as Post-link Optimization for the Grid Alu Processor. In: Guarracino, M.R., et al. Euro-Par 2010 Parallel Processing Workshops. Euro-Par 2010. Lecture Notes in Computer Science, vol 6586. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21878-1_18
DOI: https://doi.org/10.1007/978-3-642-21878-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21877-4
Online ISBN: 978-3-642-21878-1