[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3497776.3517771acmconferencesArticle/Chapter ViewAbstractPublication PagesccConference Proceedingsconference-collections
research-article
Open access

Graph transformations for register-pressure-aware instruction scheduling

Published: 18 March 2022 Publication History

Abstract

This paper presents graph transformation algorithms for register-pressure-aware instruction scheduling. The proposed transformations add edges to the data dependence graph (DDG) to eliminate solutions that are either redundant or sub-optimal. Register-pressure-aware instruction scheduling aims at balancing two conflicting objectives: maximizing instruction-level parallelism (ILP) and minimizing register pressure (RP). Graph transformations have been previously proposed for the problem of maximizing ILP without considering RP, which is a problem of limited practical value. In the current paper, we extend that work by proposing graph transformations for the RP minimization objective, which is an important objective in practice. Various cost functions are considered for representing RP, and we show that the proposed transformations preserve optimality with respect to each of them. The proposed transformations are used to reduce the size of the solution space before applying a Branch-and-Bound (B&B) algorithm that exhaustively searches for an optimal solution. The proposed transformations and the B&B algorithm were implemented in the LLVM compiler, and their performance was evaluated experimentally on a CPU target and a GPU target. The SPEC CPU2017 floating-point benchmarks were used on the CPU and the PlaidML benchmarks were used on the GPU. The results show that the proposed transformations significantly reduce the compile time while giving approximately the same execution-time performance.

References

[1]
Gergö Barany and Andreas Krall. 2013. Optimal and Heuristic Global Code Motion for Minimal Spilling. In Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg. 21–40. isbn:9783642370502 https://doi.org/10.1007/978-3-642-37051-9_2
[2]
Beck J.C., Prosser P., Selensky E. 2002. Graph Transformations for the Vehicle Routing and Job Shop Scheduling Problems. In Lecture Notes in Computer Science, Corradini A., Ehrig H., Kreowski H.J., Rozenberg G. (Ed.) (ICGT 2002, Vol. 2505). Springer, Berlin, Heidelberg. 60–74. https://doi.org/10.1007/3-540-45832-8_7
[3]
Hong-Chich Chou and Chung-Ping Chung. 1995. An Optimal Instruction Scheduler for Superscalar Processor. IEEE Trans. Parallel Distrib. Syst., 6, 3 (1995), mar, 303–313. issn:1045-9219 https://doi.org/10.1109/71.372778
[4]
Keith Cooper and Linda Torczon. 2011. Engineering a Compiler (2 ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. isbn:9780120884780
[5]
Ł ukasz Domagał a, Duco van Amstel, Fabrice Rastello, and P. Sadayappan. 2016. Register Allocation and Promotion through Combined Instruction Scheduling and Loop Unrolling. In Proceedings of the 25th International Conference on Compiler Construction (CC 2016). Association for Computing Machinery, New York, NY, USA. 143–151. isbn:9781450342414 https://doi.org/10.1145/2892208.2892219
[6]
Ulrich Dorndorf, Toàn Phan Huy, and Erwin Pesch. 1999. A survey of interval capacity consistency tests for time-and resource-constrained scheduling. In Project Scheduling. Springer, 213–238. https://doi.org/10.1007/978-1-4615-5533-9_10
[7]
E. B. Fernandez and T. Lang. 1976. Scheduling as a Graph Transformation. IBM Journal of Research and Development, 20, 6 (1976), 551–559. https://doi.org/10.1147/rd.206.0551
[8]
J. R. Goodman and W.-C. Hsu. 1988. Code Scheduling and Register Allocation in Large Basic Blocks. In Proceedings of the 2nd International Conference on Supercomputing (ICS ’88). Association for Computing Machinery, New York, NY, USA. 442–452. isbn:0897912721 https://doi.org/10.1145/55364.55407
[9]
R. Govindarajan, Hongbo Yang, José Nelson Amaral, Chihong Zhang, and Guang R. Gao. 2003. Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures. IEEE Trans. Comput., 52, 1 (2003), Jan., 4–20. issn:0018-9340 https://doi.org/10.1109/TC.2003.1159750
[10]
Mark Heffernan and Kent Wilken. 2005. Data-Dependency Graph Transformations for Instruction Scheduling. J. of Scheduling, 8, 5 (2005), Oct., 427–451. issn:1094-6136 https://doi.org/10.1007/s10951-005-2862-8
[11]
Mark Heffernan, Kent Wilken, and Ghassan Shobaki. 2006. Data-Dependency Graph Transformations for Superblock Scheduling. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 39). IEEE Computer Society, USA. 77–88. isbn:0769527329 https://doi.org/10.1109/MICRO.2006.16
[12]
Tatsushi Inagaki, Hideaki Komatsu, and Toshio Nakatani. 2003. Integrated Prepass Scheduling for a Java Just-In-Time Compiler on the IA-64 Architecture. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO ’03). IEEE Computer Society, USA. 159–168. isbn:076951913X
[13]
Intel. 2017. PlaidML machine learning benchmarks. https://github.com/plaidml/plaidbench##intel-corporation-machine-learning-benchmarks
[14]
Christoph W. Kessler. 1998. Scheduling Expression DAGs for Minimal Register Need. Computer Languages, 24, 1 (1998), 33–53. issn:0096-0551 https://doi.org/10.1016/S0096-0551(98)00002-2
[15]
Robert Klein. 2001. Scheduling of Resource Constrained Projects. Kluwer Academic Publishers, USA. isbn:079238637x
[16]
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO ’04). IEEE Computer Society, USA. 75. isbn:0769521029 https://doi.org/10.1109/CGO.2004.1281665
[17]
Roberto Castañeda Lozano. 2018. Constraint-Based Register Allocation and Instruction Scheduling. Ph.D. Dissertation. KTH Royal Institute of Technology.
[18]
Roberto Castañeda Lozano, Mats Carlsson, Gabriel Hjort Blindell, and Christian Schulte. 2019. Combinatorial Register Allocation and Instruction Scheduling. ACM Trans. on Programming Languages and Systems, 41, 3 (2019), Article 17, July, 53 pages. issn:0164-0925 https://doi.org/10.1145/3332373
[19]
Roberto Castañeda Lozano and Christian Schulte. 2019. Survey on Combinatorial Register Allocation and Instruction Scheduling. Comput. Surveys, 52, 3 (2019), Article 62, June, 50 pages. issn:0360-0300 https://doi.org/10.1145/3200920
[20]
Abid Malik. 2008. Constraint Programming Techniques for Optimal Instruction Scheduling. Ph.D. Dissertation. University of Waterloo.
[21]
R. Montemanni, D. H. Smith, and L. M. Gambardella. 2008. A Heuristic Manipulation Technique for the Sequential Ordering Problem. Comput. Oper. Res., 35, 12 (2008), dec, 3931–3944. issn:0305-0548 https://doi.org/10.1016/j.cor.2007.05.003
[22]
C. V. Ramamoorthy, K. M. Chandy, and Mario J. Gonzalez. 1972. Optimal Scheduling Strategies in a Multiprocessor System. IEEE Trans. Comput., C-21, 2 (1972), 137–146. https://doi.org/10.1109/TC.1972.5008918
[23]
Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-Noël Pouchet, Atanas Rountev, and P. Sadayappan. 2018. Register Optimizations for Stencils on GPUs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 168–182. isbn:9781450349826 https://doi.org/10.1145/3178487.3178500
[24]
Ghassan Shobaki, Vahl Scott Gordon, Paul McHugh, Theodore Dubois, and Austin Kerbow. 2022. Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization. ACM Trans. Archit. Code Optim., 19, 2 (2022), Article 23, jan, 23 pages. issn:1544-3566 https://doi.org/10.1145/3505558
[25]
Ghassan Shobaki, Austin Kerbow, and Stanislav Mekhanoshin. 2020. Optimizing Occupancy and ILP on the GPU Using a Combinatorial Approach. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2020). Association for Computing Machinery, New York, NY, USA. 133–144. isbn:9781450370479 https://doi.org/10.1145/3368826.3377918
[26]
Ghassan Shobaki, Austin Kerbow, Christopher Pulido, and William Dobson. 2019. Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling. ACM Trans. Archit. Code Optim., 16, 1 (2019), Article 1, Feb., 30 pages. issn:1544-3566 https://doi.org/10.1145/3301489
[27]
Ghassan Shobaki, Laith Sakka, Najm Eldeen Abu Rmaileh, and Hasan Al-Hamash. 2015. Experimental Evaluation of Various Register-Pressure-Reduction Heuristics. Softw. Pract. Exper., 45, 11 (2015), Nov., 1497–1517. issn:0038-0644 https://doi.org/10.1002/spe.2297
[28]
Ghassan Shobaki, Maxim Shawabkeh, and Najm Eldeen Abu Rmaileh. 2013. Preallocation Instruction Scheduling with Register Pressure Minimization Using a Combinatorial Optimization Approach. ACM Trans. Archit. Code Optim., 10, 3 (2013), Article 14, Sept., 31 pages. issn:1544-3566 https://doi.org/10.1145/2512432
[29]
Oliver Sinnen. 2014. Reducing the solution space of optimal task scheduling. Computers & Operations Research, 43 (2014), 201–214. issn:0305-0548 https://doi.org/10.1016/j.cor.2013.09.004
[30]
Standard Performance Evaluation Corporation. 2017. SPEC CPU 2017. https://www.spec.org/cpu2017/
[31]
Sid-Ahmed-Ali Touati. 2005. Register Saturation in Instruction Level Parallelism. Intl. Journal of Parallel Prog., 33, 4 (2005), Aug., 393–449. issn:0885-7458 https://doi.org/10.1007/s10766-005-6466-x
[32]
Kent Wilken, Jack Liu, and Mark Heffernan. 2000. Optimal Instruction Scheduling Using Integer Programming. SIGPLAN Not., 35, 5 (2000), may, 121–133. issn:0362-1340 https://doi.org/10.1145/358438.349318

Cited By

View all
  • (2023)rNdN: Fast Query Compilation for NVIDIA GPUsACM Transactions on Architecture and Code Optimization10.1145/360350320:3(1-25)Online publication date: 19-Jul-2023
  • (2022)A Branch-and-Bound Algorithm for Minimizing the Total Tardiness of Multiple DevelopersMathematics10.3390/math1007120010:7(1200)Online publication date: 6-Apr-2022

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CC 2022: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction
March 2022
253 pages
ISBN:9781450391832
DOI:10.1145/3497776
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 March 2022

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. branch and bound
  2. dominance
  3. graph transformations
  4. instruction scheduling
  5. register-pressure reduction

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

CC '22
Sponsor:

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)352
  • Downloads (Last 6 weeks)38
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)rNdN: Fast Query Compilation for NVIDIA GPUsACM Transactions on Architecture and Code Optimization10.1145/360350320:3(1-25)Online publication date: 19-Jul-2023
  • (2022)A Branch-and-Bound Algorithm for Minimizing the Total Tardiness of Multiple DevelopersMathematics10.3390/math1007120010:7(1200)Online publication date: 6-Apr-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media