Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Authors: Ghassan Shobaki, Maxim Shawabkeh, Najm Eldeen Abu Rmaileh

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 3

Article No.: 14, Pages 1 - 31

https://doi.org/10.1145/2512432

Published: 16 September 2013

Abstract

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques have been proposed to solve this problem. However, due to the inherently conflicting requirements of maximizing ILP and minimizing register pressure, heuristic techniques may produce poor schedules in many cases. If such cases occur in hot code, significant performance degradation may result. A few combinatorial optimization approaches have also been proposed, but none of them has been shown to solve large real-world instances within reasonable time. This article presents the first combinatorial algorithm that is efficient enough to optimally solve large instances of this problem (basic blocks with hundreds of instructions) within a few seconds per instance. The proposed algorithm uses branch-and-bound enumeration with a number of powerful pruning techniques to efficiently search the solution space. The search is based on a cost function that incorporates schedule length and register pressure. An implementation of the proposed scheduling algorithm has been integrated into the LLVM Compiler and evaluated using SPEC CPU 2006. On x86-64, with a time limit of 10ms per instruction, it optimally schedules 79% of the hot basic blocks in FP2006. Another 19% of the blocks are not optimally scheduled but are improved in cost relative to LLVM's heuristic. This improves the execution time of some benchmarks by up to 21%, with a geometric-mean improvement of 2.4% across the entire benchmark suite. With the use of precise latency information, the geometric-mean improvement is increased to 2.8%.
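
To make the idea of branch-and-bound enumeration over a combined cost concrete, here is a minimal sketch in Python. It is not the article's algorithm: the single-issue machine model, the one-value-per-instruction liveness model, the cost `length + pressure_weight * peak_pressure`, the simple lower bound, and all names (`schedule`, `deps`, `latency`, `pressure_weight`) are assumptions chosen for illustration. The abstract only states that the search uses a cost function incorporating schedule length and register pressure, together with pruning.

```python
from typing import Dict, List, Tuple


def schedule(
    deps: Dict[str, List[str]],
    latency: Dict[str, int],
    pressure_weight: int = 2,
) -> Tuple[int, List[str]]:
    """Return (cost, order) minimizing length + pressure_weight * peak pressure.

    Assumed model (hypothetical, for illustration only): one instruction
    issues per cycle, each instruction defines one value, and a value is
    live from its definition until its last user has been issued.
    """
    instrs = list(deps)
    users = {i: [j for j in instrs if i in deps[j]] for i in instrs}
    best: List = [float("inf"), []]  # best cost found so far and its order

    def search(order, issue, next_cycle, live, peak):
        remaining = len(instrs) - len(order)
        # Lower bound: each unscheduled instruction needs at least one more
        # cycle, and the peak pressure can never decrease.
        bound = next_cycle + remaining + pressure_weight * peak
        if bound >= best[0]:
            return  # prune: this subtree cannot beat the best schedule
        if remaining == 0:
            best[0], best[1] = bound, list(order)
            return
        for i in instrs:
            if i in issue or any(p not in issue for p in deps[i]):
                continue  # already scheduled, or a predecessor is missing
            # Issue as early as possible; stalls appear as gaps in cycles.
            cycle = max([next_cycle] + [issue[p] + latency[p] for p in deps[i]])
            issue[i] = cycle
            order.append(i)
            # Values whose users have all been issued are no longer live.
            new_live = {v for v in live if any(u not in issue for u in users[v])}
            if users[i]:
                new_live.add(i)
            search(order, issue, cycle + 1, new_live, max(peak, len(new_live)))
            order.pop()
            del issue[i]

    search([], {}, 0, set(), 0)
    return best[0], best[1]


# Two loads (latency 3) feed two adds, which feed a final instruction.
deps = {"a": [], "b": [], "c": ["a"], "d": ["b"], "e": ["c", "d"]}
latency = {"a": 3, "b": 3, "c": 1, "d": 1, "e": 1}
print(schedule(deps, latency))  # (10, ['a', 'b', 'c', 'd', 'e'])
```

In this toy instance, issuing both loads first hides their latency (length 6) at a peak pressure of 2, giving cost 6 + 2 * 2 = 10. Sequencing one chain after the other keeps the same peak pressure but stalls on each load, so the bound prunes those branches once the first schedule is found.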

Published In

ACM Transactions on Architecture and Code Optimization, Volume 10, Issue 3

September 2013

310 pages

ISSN: 1544-3566

EISSN: 1544-3973

DOI: 10.1145/2509420

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2013

Accepted: 01 March 2013

Revised: 01 February 2013

Received: 01 February 2012

Published in TACO Volume 10, Issue 3


Qualifiers

Research-article
Research
Refereed
