More Web Proxy on the site http://driver.im/

research-article

Open access

Graph transformations for register-pressure-aware instruction scheduling

Authors:

Ghassan Shobaki,

Justin Bassett,

Mark Heffernan,

Austin KerbowAuthors Info & Claims

CC 2022: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction

Pages 41 - 53

https://doi.org/10.1145/3497776.3517771

Published: 18 March 2022 Publication History

Abstract

This paper presents graph transformation algorithms for register-pressure-aware instruction scheduling. The proposed transformations add edges to the data dependence graph (DDG) to eliminate solutions that are either redundant or sub-optimal. Register-pressure-aware instruction scheduling aims at balancing two conflicting objectives: maximizing instruction-level parallelism (ILP) and minimizing register pressure (RP). Graph transformations have been previously proposed for the problem of maximizing ILP without considering RP, which is a problem of limited practical value. In the current paper, we extend that work by proposing graph transformations for the RP minimization objective, which is an important objective in practice. Various cost functions are considered for representing RP, and we show that the proposed transformations preserve optimality with respect to each of them. The proposed transformations are used to reduce the size of the solution space before applying a Branch-and-Bound (B&B) algorithm that exhaustively searches for an optimal solution. The proposed transformations and the B&B algorithm were implemented in the LLVM compiler, and their performance was evaluated experimentally on a CPU target and a GPU target. The SPEC CPU2017 floating-point benchmarks were used on the CPU and the PlaidML benchmarks were used on the GPU. The results show that the proposed transformations significantly reduce the compile time while giving approximately the same execution-time performance.

References

[1]

Gergö Barany and Andreas Krall. 2013. Optimal and Heuristic Global Code Motion for Minimal Spilling. In Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg. 21–40. isbn:9783642370502 https://doi.org/10.1007/978-3-642-37051-9_2

Digital Library

[2]

Beck J.C., Prosser P., Selensky E. 2002. Graph Transformations for the Vehicle Routing and Job Shop Scheduling Problems. In Lecture Notes in Computer Science, Corradini A., Ehrig H., Kreowski H.J., Rozenberg G. (Ed.) (ICGT 2002, Vol. 2505). Springer, Berlin, Heidelberg. 60–74. https://doi.org/10.1007/3-540-45832-8_7

[3]

Hong-Chich Chou and Chung-Ping Chung. 1995. An Optimal Instruction Scheduler for Superscalar Processor. IEEE Trans. Parallel Distrib. Syst., 6, 3 (1995), mar, 303–313. issn:1045-9219 https://doi.org/10.1109/71.372778

Digital Library

[4]

Keith Cooper and Linda Torczon. 2011. Engineering a Compiler (2 ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. isbn:9780120884780

[5]

Ł ukasz Domagał a, Duco van Amstel, Fabrice Rastello, and P. Sadayappan. 2016. Register Allocation and Promotion through Combined Instruction Scheduling and Loop Unrolling. In Proceedings of the 25th International Conference on Compiler Construction (CC 2016). Association for Computing Machinery, New York, NY, USA. 143–151. isbn:9781450342414 https://doi.org/10.1145/2892208.2892219

Digital Library

[6]

Ulrich Dorndorf, Toàn Phan Huy, and Erwin Pesch. 1999. A survey of interval capacity consistency tests for time-and resource-constrained scheduling. In Project Scheduling. Springer, 213–238. https://doi.org/10.1007/978-1-4615-5533-9_10

[7]

E. B. Fernandez and T. Lang. 1976. Scheduling as a Graph Transformation. IBM Journal of Research and Development, 20, 6 (1976), 551–559. https://doi.org/10.1147/rd.206.0551

Digital Library

[8]

J. R. Goodman and W.-C. Hsu. 1988. Code Scheduling and Register Allocation in Large Basic Blocks. In Proceedings of the 2nd International Conference on Supercomputing (ICS ’88). Association for Computing Machinery, New York, NY, USA. 442–452. isbn:0897912721 https://doi.org/10.1145/55364.55407

Digital Library

[9]

R. Govindarajan, Hongbo Yang, José Nelson Amaral, Chihong Zhang, and Guang R. Gao. 2003. Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures. IEEE Trans. Comput., 52, 1 (2003), Jan., 4–20. issn:0018-9340 https://doi.org/10.1109/TC.2003.1159750

Digital Library

[10]

Mark Heffernan and Kent Wilken. 2005. Data-Dependency Graph Transformations for Instruction Scheduling. J. of Scheduling, 8, 5 (2005), Oct., 427–451. issn:1094-6136 https://doi.org/10.1007/s10951-005-2862-8

Digital Library

[11]

Mark Heffernan, Kent Wilken, and Ghassan Shobaki. 2006. Data-Dependency Graph Transformations for Superblock Scheduling. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39). IEEE Computer Society, USA. 77–88. isbn:0769527329 https://doi.org/10.1109/MICRO.2006.16

Digital Library

[12]

Tatsushi Inagaki, Hideaki Komatsu, and Toshio Nakatani. 2003. Integrated Prepass Scheduling for a Java Just-In-Time Compiler on the IA-64 Architecture. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO ’03). IEEE Computer Society, USA. 159–168. isbn:076951913X

Digital Library

[13]

Intel. 2017. PlaidML machine learning benchmarks. https://github.com/plaidml/plaidbench##intel-corporation-machine-learning-benchmarks

[14]

Christoph W. Kessler. 1998. Scheduling Expression DAGs for Minimal Register Need. Computer Languages, 24, 1 (1998), 33–53. issn:0096-0551 https://doi.org/10.1016/S0096-0551(98)00002-2

Digital Library

[15]

Robert Klein. 2001. Scheduling of Resource Constrained Projects. Kluwer Academic Publishers, USA. isbn:079238637x

[16]

Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO ’04). IEEE Computer Society, USA. 75. isbn:0769521029 https://doi.org/10.1109/CGO.2004.1281665

[17]

Roberto Castañeda Lozano. 2018. Constraint-Based Register Allocation and Instruction Scheduling. Ph.D. Dissertation. KTH Royal Institute of Technology.

[18]

Roberto Castañeda Lozano, Mats Carlsson, Gabriel Hjort Blindell, and Christian Schulte. 2019. Combinatorial Register Allocation and Instruction Scheduling. ACM Trans. on Programming Languages and Systems, 41, 3 (2019), Article 17, July, 53 pages. issn:0164-0925 https://doi.org/10.1145/3332373

Digital Library

[19]

Roberto Castañeda Lozano and Christian Schulte. 2019. Survey on Combinatorial Register Allocation and Instruction Scheduling. Comput. Surveys, 52, 3 (2019), Article 62, June, 50 pages. issn:0360-0300 https://doi.org/10.1145/3200920

Digital Library

[20]

Abid Malik. 2008. Constraint Programming Techniques for Optimal Instruction Scheduling. Ph.D. Dissertation. University of Waterloo.

Digital Library

[21]

R. Montemanni, D. H. Smith, and L. M. Gambardella. 2008. A Heuristic Manipulation Technique for the Sequential Ordering Problem. Comput. Oper. Res., 35, 12 (2008), dec, 3931–3944. issn:0305-0548 https://doi.org/10.1016/j.cor.2007.05.003

Digital Library

[22]

C. V. Ramamoorthy, K. M. Chandy, and Mario J. Gonzalez. 1972. Optimal Scheduling Strategies in a Multiprocessor System. IEEE Trans. Comput., C-21, 2 (1972), 137–146. https://doi.org/10.1109/TC.1972.5008918

Digital Library

[23]

Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-Noël Pouchet, Atanas Rountev, and P. Sadayappan. 2018. Register Optimizations for Stencils on GPUs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 168–182. isbn:9781450349826 https://doi.org/10.1145/3178487.3178500

Digital Library

[24]

Ghassan Shobaki, Vahl Scott Gordon, Paul McHugh, Theodore Dubois, and Austin Kerbow. 2022. Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization. ACM Trans. Archit. Code Optim., 19, 2 (2022), Article 23, jan, 23 pages. issn:1544-3566 https://doi.org/10.1145/3505558

Digital Library

[25]

Ghassan Shobaki, Austin Kerbow, and Stanislav Mekhanoshin. 2020. Optimizing Occupancy and ILP on the GPU Using a Combinatorial Approach. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2020). Association for Computing Machinery, New York, NY, USA. 133–144. isbn:9781450370479 https://doi.org/10.1145/3368826.3377918

Digital Library

[26]

Ghassan Shobaki, Austin Kerbow, Christopher Pulido, and William Dobson. 2019. Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling. ACM Trans. Archit. Code Optim., 16, 1 (2019), Article 1, Feb., 30 pages. issn:1544-3566 https://doi.org/10.1145/3301489

Digital Library

[27]

Ghassan Shobaki, Laith Sakka, Najm Eldeen Abu Rmaileh, and Hasan Al-Hamash. 2015. Experimental Evaluation of Various Register-Pressure-Reduction Heuristics. Softw. Pract. Exper., 45, 11 (2015), Nov., 1497–1517. issn:0038-0644 https://doi.org/10.1002/spe.2297

Digital Library

[28]

Ghassan Shobaki, Maxim Shawabkeh, and Najm Eldeen Abu Rmaileh. 2013. Preallocation Instruction Scheduling with Register Pressure Minimization Using a Combinatorial Optimization Approach. ACM Trans. Archit. Code Optim., 10, 3 (2013), Article 14, Sept., 31 pages. issn:1544-3566 https://doi.org/10.1145/2512432

Digital Library

[29]

Oliver Sinnen. 2014. Reducing the solution space of optimal task scheduling. Computers & Operations Research, 43 (2014), 201–214. issn:0305-0548 https://doi.org/10.1016/j.cor.2013.09.004

Digital Library

[30]

Standard Performance Evaluation Corporation. 2017. SPEC CPU 2017. https://www.spec.org/cpu2017/

[31]

Sid-Ahmed-Ali Touati. 2005. Register Saturation in Instruction Level Parallelism. Intl. Journal of Parallel Prog., 33, 4 (2005), Aug., 393–449. issn:0885-7458 https://doi.org/10.1007/s10766-005-6466-x

Digital Library

[32]

Kent Wilken, Jack Liu, and Mark Heffernan. 2000. Optimal Instruction Scheduling Using Integer Programming. SIGPLAN Not., 35, 5 (2000), may, 121–133. issn:0362-1340 https://doi.org/10.1145/358438.349318

Digital Library

Cited By

Krolik AVerbrugge CHendren L(2023)rNdN: Fast Query Compilation for NVIDIA GPUsACM Transactions on Architecture and Code Optimization10.1145/360350320:3(1-25)Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3603503
Su CWang J(2022)A Branch-and-Bound Algorithm for Minimizing the Total Tardiness of Multiple DevelopersMathematics10.3390/math1007120010:7(1200)Online publication date: 6-Apr-2022
https://doi.org/10.3390/math10071200

Index Terms

Graph transformations for register-pressure-aware instruction scheduling
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Discrete space search
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization
This paper describes a new approach to register-pressure-aware instruction scheduling, using Ant Colony Optimization (ACO). ACO is a nature-inspired optimization technique that researchers have successfully applied to NP-hard sequencing problems like the ...
Data-Dependency Graph Transformations for Instruction Scheduling

This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the ...
Fast, frequency-based, integrated register allocation and instruction scheduling

Instruction scheduling and register allocation are two of the most important optimization phases in modern compilers as they have a significant impact on the quality of the generated code. Unfortunately, the objectives of these two optimizations are in ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CC 2022: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction

March 2022

253 pages

ISBN:9781450391832

DOI:10.1145/3497776

General Chairs:
Bernhard Egger
Seoul National University, South Korea
,
Aaron Smith
Microsoft, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 March 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

CC '22

Sponsor:

SIGPLAN

CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction

April 2 - 3, 2022

Seoul, South Korea

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
993
Total Downloads

Downloads (Last 12 months)352
Downloads (Last 6 weeks)38

Reflects downloads up to 14 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Krolik AVerbrugge CHendren L(2023)rNdN: Fast Query Compilation for NVIDIA GPUsACM Transactions on Architecture and Code Optimization10.1145/360350320:3(1-25)Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3603503
Su CWang J(2022)A Branch-and-Bound Algorithm for Minimizing the Total Tardiness of Multiple DevelopersMathematics10.3390/math1007120010:7(1200)Online publication date: 6-Apr-2022
https://doi.org/10.3390/math10071200

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents