[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Lifetime-sensitive modulo scheduling

Published: 01 June 1993 Publication History

Abstract

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results—when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure—indicate near-optimal performance.

References

[1]
J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the Tenth Annual ACM Symposium on Principles of Programming Languages, pages 177-189, Jan. 1983.
[2]
G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cych'a-5 mini-supercomputer: Architecture and implementation. Journal of Supercomputing, 7(1/2), Jan. 1993.
[3]
D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrating register allocation and instruction scheduling for RISCs. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Opera,ling Systems, pages 122-131, Apr. 1991.
[4]
R. Cytron, I. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An efficient method of computing static single assignment form. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, pages 25-35, Jan. 1989.
[5]
J. C. Dehnert, P. Y.-T Hsu, and I. P. Bratt. Overlapped loop support in the Cydra 5. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 26-38, Apr. 1989.
[6]
J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. Journal of Supercomputing, 7(1/2), Jan. 1993.
[7]
P. B. Gibbons and S. S. Muchnick. Efficient instruction scheduling for a pipelined architecture. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 11-16, 1986.
[8]
J. R. Goodman and W.-C. Hsu. Code scheduling and register allocation in large basic blocks. In Proceedings of the 1988 International Conference on Supercomputing, pages 442-452, June 1988.
[9]
M. S. Lain. A Systolic Array Optimizing Compiler. Kh. twer Academic Publishers, 1989.
[10]
D. Landakov, S. Davidson, B. Shriver, and E W. Mallet. Local microcode compaction techniques. ACM Computing Surveys, pages 261-294, Sept. 1980.
[11]
E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Saunders College Publishing, 1976.
[12]
P.G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multifiow trace scheduling compiler. Journal of Supercomputing, 7(1}2), Jan. 1993.
[13]
J. C. H. Park and M. S. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Laboratories, May 1991.
[14]
S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard Journal, pages 39-45, June 1992.
[15]
B. R. Rau. Data flow and dependence analysis for instruction level parallelism. In Fourth Workshop on Languages and Compilers for Parallel Computing, pages 236-250, Aug. 1991.
[16]
B.R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing, 7(112), Jan. 1993.
[17]
B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the 14th Annual Microprogramming Workshop, pages 183-197, Oct. 1981.
[18]
B.R. Rau, M. Lee, P. Tu'umalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 283-299, June 1992.
[19]
B.R. Rau, M. S. Schlansker, and P. Tlrumalai. Code generation schemas for modulo scheduled loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, Dec. 1992.
[20]
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer, pages 12-35, Jan. 1989.
[21]
J.C. Tiernan. An efficient search algorithm to find the elementary circuits of a graph. Communications of the ACM, pages 722-726, Dec. 1970.
[22]
P. Tlrumalai, M. Lee, and M. S. Schlansker. Parallelization of loops with exits on pipelined architectures, in IEEE Proceedings of Supercomputing ' 90, pages 200-212, Nov. 1990.
[23]
N. J. Warter, J. W. Bockhaus, G. E. Haab, and K. Subramanian. Enhancexl modulo scheduling for loops with conditional branches. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 170-179, Dec. 1992.
[24]
P. Wijaya and V. H. Allan. Incremental foresighted local compaction. In Proceedings of the 22nd Annual International Symposium on Microarchitecture, pages 163-171, Aug. 1989.
[25]
M. E. Wolf and M. S. Lain. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991.

Cited By

View all
  • (2023)Implementation of Dataflow Software Pipelining for Codelet ModelProceedings of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578244.3583734(161-172)Online publication date: 15-Apr-2023
  • (2022)Optimal and Heuristic Approaches to Modulo Scheduling With Rational Initiation Intervals in Hardware SynthesisIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.306032041:3(614-627)Online publication date: Mar-2022
  • (2020)Pipeline Synthesis and Optimization from Branched Feedback Dataflow ProgramsJournal of Signal Processing Systems10.1007/s11265-020-01568-5Online publication date: 11-Jul-2020
  • Show More Cited By

Recommendations

Reviews

Benjamin Rayborn Seyfarth

The author presents the concept of bidirectional slack-scheduling to improve the efficiency of register allocation for loops. Basically, he presents heuristics for determining whether to attempt to schedule an operation as early as possible or as late as possible in an attempt to reduce the register pressure within a loop. Then he uses a backtracking algorithm to perform instruction scheduling. The author also presents empirical results from incorporating his techniques into a FORTRAN compiler for a hypothetical very long instruction word (VLIW) machine similar to the Cydrome Cydra. His results indicate that his scheduling technique allows optimal scheduling for 96 percent of the loops in a set of sample code including 1525 loops, resulting in an 11 percent improvement over the Cydrome compiler. The author further states that the technique can be adapted to RISC machines. The paper is a nice blend of theory and practice. The reader is expected to be familiar with register allocation strategies and, for such readers, the paper is quite readable. For those needing to study some background material, this paper includes a sufficient bibliography for further exploration. Both compiler writers and theorists will find this paper useful.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 28, Issue 6
June 1993
313 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173262
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
    August 1993
    313 pages
    ISBN:0897915984
    DOI:10.1145/155090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993
Published in SIGPLAN Volume 28, Issue 6

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)108
  • Downloads (Last 6 weeks)13
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Implementation of Dataflow Software Pipelining for Codelet ModelProceedings of the 2023 ACM/SPEC International Conference on Performance Engineering10.1145/3578244.3583734(161-172)Online publication date: 15-Apr-2023
  • (2022)Optimal and Heuristic Approaches to Modulo Scheduling With Rational Initiation Intervals in Hardware SynthesisIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.306032041:3(614-627)Online publication date: Mar-2022
  • (2020)Pipeline Synthesis and Optimization from Branched Feedback Dataflow ProgramsJournal of Signal Processing Systems10.1007/s11265-020-01568-5Online publication date: 11-Jul-2020
  • (2020)Compiler Optimizing for Power Efficiency of On-Chip MemoryAdvanced Computer Architecture10.1007/978-981-15-8135-9_21(290-303)Online publication date: 5-Sep-2020
  • (2016)Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usageEURASIP Journal on Advances in Signal Processing10.1186/s13634-016-0336-02016:1Online publication date: 31-Mar-2016
  • (2016)Minimizing Register Requirements of a Modulo Schedule via Optimum Stage SchedulingInternational Journal of Parallel Programming10.1007/BF0335674424:2(103-132)Online publication date: 26-May-2016
  • (2016)Software Pipelining by Kernel RecognitionInstruction Level Parallelism10.1007/978-1-4899-7797-7_7(167-203)Online publication date: 30-Nov-2016
  • (2016)Modulo SchedulingInstruction Level Parallelism10.1007/978-1-4899-7797-7_6(133-165)Online publication date: 30-Nov-2016
  • (2014)Improving performance of loops on DIAM-based VLIW architecturesACM SIGPLAN Notices10.1145/2666357.259782549:5(135-144)Online publication date: 12-Jun-2014
  • (2014)Improving performance of loops on DIAM-based VLIW architecturesProceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems10.1145/2597809.2597825(135-144)Online publication date: 12-Jun-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media