article

Free access

Lifetime-sensitive modulo scheduling

Author:

Richard A. HuffAuthors Info & Claims

ACM SIGPLAN Notices, Volume 28, Issue 6

Pages 258 - 267

https://doi.org/10.1145/173262.155115

Published: 01 June 1993 Publication History

PDF eReader

Abstract

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results—when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure—indicate near-optimal performance.

References

[1]

J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the Tenth Annual ACM Symposium on Principles of Programming Languages, pages 177-189, Jan. 1983.

Digital Library

Google Scholar

[2]

G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cych'a-5 mini-supercomputer: Architecture and implementation. Journal of Supercomputing, 7(1/2), Jan. 1993.

Digital Library

Google Scholar

[3]

D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrating register allocation and instruction scheduling for RISCs. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Opera,ling Systems, pages 122-131, Apr. 1991.

Digital Library

Google Scholar

[4]

R. Cytron, I. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An efficient method of computing static single assignment form. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, pages 25-35, Jan. 1989.

Digital Library

Google Scholar

[5]

J. C. Dehnert, P. Y.-T Hsu, and I. P. Bratt. Overlapped loop support in the Cydra 5. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 26-38, Apr. 1989.

Digital Library

Google Scholar

[6]

J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. Journal of Supercomputing, 7(1/2), Jan. 1993.

Digital Library

Google Scholar

[7]

P. B. Gibbons and S. S. Muchnick. Efficient instruction scheduling for a pipelined architecture. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 11-16, 1986.

Digital Library

Google Scholar

[8]

J. R. Goodman and W.-C. Hsu. Code scheduling and register allocation in large basic blocks. In Proceedings of the 1988 International Conference on Supercomputing, pages 442-452, June 1988.

Digital Library

Google Scholar

[9]

M. S. Lain. A Systolic Array Optimizing Compiler. Kh. twer Academic Publishers, 1989.

Digital Library

Google Scholar

[10]

D. Landakov, S. Davidson, B. Shriver, and E W. Mallet. Local microcode compaction techniques. ACM Computing Surveys, pages 261-294, Sept. 1980.

Digital Library

Google Scholar

[11]

E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Saunders College Publishing, 1976.

Google Scholar

[12]

P.G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multifiow trace scheduling compiler. Journal of Supercomputing, 7(1}2), Jan. 1993.

Digital Library

Google Scholar

[13]

J. C. H. Park and M. S. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Laboratories, May 1991.

Google Scholar

[14]

S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard Journal, pages 39-45, June 1992.

Google Scholar

[15]

B. R. Rau. Data flow and dependence analysis for instruction level parallelism. In Fourth Workshop on Languages and Compilers for Parallel Computing, pages 236-250, Aug. 1991.

Digital Library

Google Scholar

[16]

B.R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing, 7(112), Jan. 1993.

Digital Library

Google Scholar

[17]

B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the 14th Annual Microprogramming Workshop, pages 183-197, Oct. 1981.

Digital Library

Google Scholar

[18]

B.R. Rau, M. Lee, P. Tu'umalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 283-299, June 1992.

Digital Library

Google Scholar

[19]

B.R. Rau, M. S. Schlansker, and P. Tlrumalai. Code generation schemas for modulo scheduled loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, Dec. 1992.

Digital Library

Google Scholar

[20]

B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer, pages 12-35, Jan. 1989.

Digital Library

Google Scholar

[21]

J.C. Tiernan. An efficient search algorithm to find the elementary circuits of a graph. Communications of the ACM, pages 722-726, Dec. 1970.

Digital Library

Google Scholar

[22]

P. Tlrumalai, M. Lee, and M. S. Schlansker. Parallelization of loops with exits on pipelined architectures, in IEEE Proceedings of Supercomputing ' 90, pages 200-212, Nov. 1990.

Digital Library

Google Scholar

[23]

N. J. Warter, J. W. Bockhaus, G. E. Haab, and K. Subramanian. Enhancexl modulo scheduling for loops with conditional branches. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 170-179, Dec. 1992.

Digital Library

Google Scholar

[24]

P. Wijaya and V. H. Allan. Incremental foresighted local compaction. In Proceedings of the 22nd Annual International Symposium on Microarchitecture, pages 163-171, Aug. 1989.

Digital Library

Google Scholar

[25]

M. E. Wolf and M. S. Lain. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991.

Digital Library

Google Scholar

Cited By

View all

Raskar SMonsalve Diaz JApplencourt TKumaran KGao GVieira MCardellini VDi Marco ATuma P(2023)Implementation of Dataflow Software Pipelining for Codelet ModelProceedings of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578244.3583734(161-172)Online publication date: 15-Apr-2023
https://dl.acm.org/doi/10.1145/3578244.3583734
Sittel PFiege NWickerson JZipf P(2022)Optimal and Heuristic Approaches to Modulo Scheduling With Rational Initiation Intervals in Hardware SynthesisIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.306032041:3(614-627)Online publication date: Mar-2022
https://doi.org/10.1109/TCAD.2021.3060320
Prihozhy ACasale-Brunet SBezati EMattavelli M(2020)Pipeline Synthesis and Optimization from Branched Feedback Dataflow ProgramsJournal of Signal Processing Systems10.1007/s11265-020-01568-5Online publication date: 11-Jul-2020
https://doi.org/10.1007/s11265-020-01568-5
Show More Cited By

Index Terms

Lifetime-sensitive modulo scheduling
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types

Recommendations

Lifetime-sensitive modulo scheduling
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many ...
Modulo scheduling for control-intensive general-purpose programs
Register Constrained Modulo Scheduling

Abstract--Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software ...

Reviews

Reviewer: Benjamin Rayborn Seyfarth

The author presents the concept of bidirectional slack-scheduling to improve the efficiency of register allocation for loops. Basically, he presents heuristics for determining whether to attempt to schedule an operation as early as possible or as late as possible in an attempt to reduce the register pressure within a loop. Then he uses a backtracking algorithm to perform instruction scheduling. The author also presents empirical results from incorporating his techniques into a FORTRAN compiler for a hypothetical very long instruction word (VLIW) machine similar to the Cydrome Cydra. His results indicate that his scheduling technique allows optimal scheduling for 96 percent of the loops in a set of sample code including 1525 loops, resulting in an 11 percent improvement over the Cydrome compiler. The author further states that the technique can be adapted to RISC machines. The paper is a nice blend of theory and practice. The reader is expected to be familiar with register allocation strategies and, for such readers, the paper is quite readable. For those needing to study some background material, this paper includes a sufficient bibliography for further exploration. Both compiler writers and theorists will find this paper useful.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 28, Issue 6

June 1993

313 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/173262

Editor:
Richard Wexelblat
IDA/CSED, Alexandria, VA

Issue’s Table of Contents

PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
August 1993
313 pages
ISBN:0897915984
DOI:10.1145/155090
Editor:
Robert Cartwright
Rice Univ., Houston, TX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993

Published in SIGPLAN Volume 28, Issue 6

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

203
Total Citations
View Citations
894
Total Downloads

Downloads (Last 12 months)108
Downloads (Last 6 weeks)13

Reflects downloads up to 15 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

View all

Raskar SMonsalve Diaz JApplencourt TKumaran KGao GVieira MCardellini VDi Marco ATuma P(2023)Implementation of Dataflow Software Pipelining for Codelet ModelProceedings of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578244.3583734(161-172)Online publication date: 15-Apr-2023
https://dl.acm.org/doi/10.1145/3578244.3583734
Sittel PFiege NWickerson JZipf P(2022)Optimal and Heuristic Approaches to Modulo Scheduling With Rational Initiation Intervals in Hardware SynthesisIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.306032041:3(614-627)Online publication date: Mar-2022
https://doi.org/10.1109/TCAD.2021.3060320
Prihozhy ACasale-Brunet SBezati EMattavelli M(2020)Pipeline Synthesis and Optimization from Branched Feedback Dataflow ProgramsJournal of Signal Processing Systems10.1007/s11265-020-01568-5Online publication date: 11-Jul-2020
https://doi.org/10.1007/s11265-020-01568-5
Wu WZhu QWang FLin RQi F(2020)Compiler Optimizing for Power Efficiency of On-Chip MemoryAdvanced Computer Architecture10.1007/978-981-15-8135-9_21(290-303)Online publication date: 5-Sep-2020
https://doi.org/10.1007/978-981-15-8135-9_21
Bahtat MBelkouch SElleaume PLe Gall P(2016)Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usageEURASIP Journal on Advances in Signal Processing10.1186/s13634-016-0336-02016:1Online publication date: 31-Mar-2016
https://doi.org/10.1186/s13634-016-0336-0
Eichenberger ADavidson EAbraham S(2016)Minimizing Register Requirements of a Modulo Schedule via Optimum Stage SchedulingInternational Journal of Parallel Programming10.1007/BF0335674424:2(103-132)Online publication date: 26-May-2016
https://doi.org/10.1007/BF03356744
Aiken ABanerjee UKejariwal ANicolau AAiken ABanerjee UKejariwal ANicolau A(2016)Software Pipelining by Kernel RecognitionInstruction Level Parallelism10.1007/978-1-4899-7797-7_7(167-203)Online publication date: 30-Nov-2016
https://doi.org/10.1007/978-1-4899-7797-7_7
Aiken ABanerjee UKejariwal ANicolau AAiken ABanerjee UKejariwal ANicolau A(2016)Modulo SchedulingInstruction Level Parallelism10.1007/978-1-4899-7797-7_6(133-165)Online publication date: 30-Nov-2016
https://doi.org/10.1007/978-1-4899-7797-7_6
Lee JLee JLee JPaek Y(2014)Improving performance of loops on DIAM-based VLIW architecturesACM SIGPLAN Notices10.1145/2666357.259782549:5(135-144)Online publication date: 12-Jun-2014
https://dl.acm.org/doi/10.1145/2666357.2597825
Lee JLee JLee JPaek YZhang YKulkarni P(2014)Improving performance of loops on DIAM-based VLIW architecturesProceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems10.1145/2597809.2597825(135-144)Online publication date: 12-Jun-2014
https://dl.acm.org/doi/10.1145/2597809.2597825
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Abstract

References

Cited By

Index Terms

Recommendations