More Web Proxy on the site http://driver.im/

Article

Graph-partitioning based instruction scheduling for clustered processors

Authors:

Josep M. Codina,

Jesús Sánchez,

Antonio GonzálezAuthors Info & Claims

MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture

Pages 150 - 159

Published: 01 December 2001 Publication History

Publisher Site Get Access

Abstract

This work presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase implemented through graph partitioning techniques followed by a scheduling phase that integrates register allocation and spill code generation. The graph partitioning scheme is shown to be very effective due to its global view of the whole code while the partition is generated. Results show a significant speedup when compared with previously proposed techniques. For some processor configuration the average speedup for the SPECfp95 is 23% with respect to the published scheme with the best performance. Besides, the proposed scheme is much faster (between 2-7 times, depending on the configuration).

References

[1]

A. Agarwal, M. S. Hrishikesh, S. W. Keckler and D. Burger, "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures", in Proc. of the 27th Int. Symp. on Computer Architecture, pp. 248-259, June 2000

Digital Library

[2]

E. Ayguadé, C. Barrado, A. González, J. Labarta, D. López, S. Moreno, D. Padua, F. Reig, Q. Riera and M. Valero, "Ictineo: a Tool for Research on ILP", in SC'96, Research Exhibit "Polaris at Work", 1996

[3]

A. Capitanio, D. Dytt and A. Nicolau, "Partitioned Register Files for VLIWs: A Preliminary Analysis of Tradeoffs", in Procs. of 25th. Int. Symp. on Microarchitecture (MICRO-25), pp. 192-300, 1992

Digital Library

[4]

J. M. Codina, J. Sánchez and A. González, "A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors", in Procs. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'2001), Sept. 2001

Digital Library

[5]

C. Ding, S. Carr and P. Sweany, "Modulo Scheduling with cache reuse information", in Procs. of Europar'97, pp. 1079-1083, August 1997

Digital Library

[6]

A. E. Eichenberger, E. S. Davidson and S. G. Abraham, "Optimum Module Schedules for Minimum Register Requirements", in Procs. of Supercomputing 95, pp.31-40, July 1995

Digital Library

[7]

A. E. Eichenberger and E. S. Davidson, "Stage Scheduling: a Technique to Reduce the Register Requirements of a Module Schedule", in Procs. of 28th. Int. Symp. on Microarchitecture (MICRO-28), pp.338-349, Nov. 1995

Digital Library

[8]

J. R. Ellis, "Bulldog: A Compiler for VLIW Architectures", MIT Press, pp. 180-184, 1986

Digital Library

[9]

P. Faraboschi, G. Brown, J. Fisher, G. Desoli and F. Homewood, "Lx: A Technology Platform for Customizable VLIW Embedded Processing", in Proc. of the 27th Int. Symp. on Computer Architecture, pp. 203-213, June 2000

Digital Library

[10]

M. M. Fernandes, J. Llosa and N. Topham, "Distributed Modulo Scheduling", in Procs. of Int. Symp. on High-Performance Computer Architecture (HPCA-5), pp. 130-134, Jan. 1999

Digital Library

[11]

C. M. Fiduccia, R. M. Mattheyses, "A Linear-Time Heuristic for Improving Network Partitions", in Proc. ogf the 19th IEEE Design Automation Conference, pp. 175-181, 1982

Digital Library

[12]

J. Fridman and Zvi Greefield, "The TigerSharc DSP Architecture", IEEE Micro, pp. 66-76, Jan-Feb. 2000

Digital Library

[13]

L. Gwennap, "Digital 21264 Sets New Standard", Microprocessor Report, 10(14), Oct. 1996

[14]

R. Govindarajan, E. R. Altman and G. R. Gao, "Minimal Register Requirements Under Resource-Constrained Software Pipelining", in Procs. of 27th. Int. Symp. on Microarchitecture (MICRO-27), pp.85-94, Nov. 1994

Digital Library

[15]

B. Hendrickson and R. Leland, The Chaco User's Guide version 2.0, Tech. ReportSAND95-2344, Sandia National Labs, Albuquerque, NM, 1995

[16]

R. Ho, K. Mai, and M. Horowitz, "The Future of Wires", in Procs. of the IEEE, pp. 490-504, April 2001

[17]

R. A. Huff, "Lifetime-Sensitive Modulo Scheduling", in Procs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'93), pp.318-328, 1993

Digital Library

[18]

S. Jain, "Circular Scheduling: a New Technique to Perform Software Pipelining", in Procs. of rocs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'91), pp.219-228, June 1991

Digital Library

[19]

S. Jang, S. Carr, P. Sweany and D. Kuras, "A Code Generation Framework for VLIW Architectures with Partitioned Register Banks", in Procs. of 3rd. Int. Conf. on Massively Parallel Computing Systems, April 1998

[20]

K. Kailas, K. Ebcioglu and A. Agrawala, "CARS: A New Code Generation Framework for CLustered ILP Processors", in Proc. 7th Int. Symp. on High-Performance Computer Architecture (HPCA-7), Jan. 2001

Digital Library

[21]

G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, Tech. Report 95-035, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, 1995

[22]

G. Karypis and V. Kumar, "Analysis of Multilevel Graph Partitioning", in Proc. of 7th Supercomputing Conference, 1995

Digital Library

[23]

B. W. Kernighan and S. Lin, "An Effective Heuristic Procedure for Partitioning Graphs" Bell Syst. Tech. J., pp. 291-307, 1970

[24]

M. S. Lam, "Software Pipelining: an Effective Scheduling Technique for VLIW Machines", in Procs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'88), pp.318-328, June 1988

Digital Library

[25]

J. Llosa, A. González, E. Ayguadé and M. Valero, "Swing Modulo Scheduling", in Procs. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'96), pp.80-86, Oct. 1996

Digital Library

[26]

MAP1000 unfolds at Equator", Microprocessor Report, 12(16), Dec. 1998

[27]

D. Matzke, "Will physical scalability sabotage performance gains?", IEEE Computer, 30,(9), pp. 37-39, September 1997

Digital Library

[28]

K. Mehlhorn and S. Naher, "LEDA: a library of efficient data structures and algorithms",ACM Communications, 38, pp. 96-102, 1995

Digital Library

[29]

E. Nystrom and A. E. Eichenberger, "Effective Cluster Assingment for Modulo Scheduling", in Procs. of 31th. Int. Symp. on Microarchitecture (MICRO-31), pp.103-114, 1998

Digital Library

[30]

E. Özer, S. Banerjia and T. M. Conte, "Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures", in Procs. of 31st Int. Symp. on Microarchitecture (MICRO-31), pp. 308-315, Nov. 1998

Digital Library

[31]

G. G. Pechanek and S. Vassiliadis, "The ManArray Embedded Processor Architecture", in Proc. of 26th. Euromicro Conference, pp. 348-355, Sept. 2000

[32]

B. R. Rau and C. D. Glaeser, "Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Perfornance Scientific Computing", in Procs. of Workshop on Microarchitecture (MICRO-14), pp.183-198, Oct. 1981

Digital Library

[33]

B. R. Rau, "Iterative Modulo Scheduling: an Algorithm for Software Pipelining Loops", in Procs. of 3Oth. Int. Symp. on Microarchitecture (MICRO-27), pp.63-74, Nov. 1994

Digital Library

[34]

J. Sánchez and A. González, "Cache Sensitive Modulo Scheduling", in Procs. of 3Oth. Int. Symp. on Microarchitecture (MICRO-30), pp. 338-348, Dec. 1997

Digital Library

[35]

J. Sánchez and A. González, "The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures", in Procs. of the 29th. Int. Conf. on Parallel Processing (ICPP-29), pp. 555-562, Aug. 2000

Digital Library

[36]

J. Sánchez and A. González, "Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture", in Procs. of 33th. Int. Symp. on Microarchitecture (MICRO-33), Dec. 2000

Digital Library

[37]

J. E. Smith, "Instruction-Level Distributed Processor", IEEE Computer, 34(4), pp. 59-65, April 2001

Digital Library

[38]

Y. Taur, "CMOS Scaling and Issues in Sub-0.25 µm Systems", in Desing of High-Performance Microprocessor Circuits, IEEE Press, pp. 27-45, 2001

[39]

Texas Instruments Inc., "TMS320C62x/67x CPU and Instruction Set Reference Guide", 1998

[40]

J. Wang and C. Eisenbeis, "Decomposed Software Pipelining: a New Approach to Exploit Instruction Level Parallelism for Loops Programs", in IFIP, Jan. 1993

Digital Library

Cited By

Naresh VGope DLipasti M(2017)The CUREACM Transactions on Embedded Computing Systems10.1145/312652716:5s(1-19)Online publication date: 27-Sep-2017
https://dl.acm.org/doi/10.1145/3126527
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGPLAN Notices10.1145/2248487.215101447:4(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2248487.2151014
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGARCH Computer Architecture News10.1145/2189750.215101440:1(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2189750.2151014
Show More Cited By

Graph-partitioning based instruction scheduling for clustered processors
1. Software and its engineering
  1. Software notations and tools
2. Theory of computation

Recommendations

Instruction scheduling and fetch mechanisms for clustered vliw processors
An evaluation of speculative instruction execution on simultaneous multithreaded processors

Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...
Retargetable instruction scheduling for pipelined processors

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture

December 2001

355 pages

ISBN:0769513697

General Chair:
Yale Patt
The University of Texas at Austin
,
Program Chairs:
Josh Fisher
Hewlett-Packard Laboratories
,
Paolo Faraboschi
Hewlett-Packard Laboratories
,
Publications Chair:
Kevin Skadron
University of Virginia

Copyright © Copyright (c) 2001 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 December 2001

Check for updates

Qualifiers

Article

Conference

MICRO-34

Sponsor:

SIGMICRO

MICRO-34: The 34th International Symposium on Microarchitecture

December 1 - 5, 2001

Texas, Austin

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

23
Total Citations
View Citations
495
Total Downloads

Downloads (Last 12 months)4
Downloads (Last 6 weeks)0

Reflects downloads up to 01 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Naresh VGope DLipasti M(2017)The CUREACM Transactions on Embedded Computing Systems10.1145/312652716:5s(1-19)Online publication date: 27-Sep-2017
https://dl.acm.org/doi/10.1145/3126527
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGPLAN Notices10.1145/2248487.215101447:4(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2248487.2151014
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGARCH Computer Architecture News10.1145/2189750.215101440:1(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2189750.2151014
Park YSeo SPark HCho HMahlke SHarris TScott M(2012)SIMD defragmenterProceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems10.1145/2150976.2151014(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2150976.2151014
Fan KPark HKudlur MMahlke SSoffa MDuesterwald E(2008)Modulo scheduling for highly customized datapaths to increase hardware reusabilityProceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization10.1145/1356058.1356075(124-133)Online publication date: 6-Apr-2008
https://dl.acm.org/doi/10.1145/1356058.1356075
Aleta ACodina JGonzalez AKaeli D(2007)Heterogeneous Clustered VLIW MicroarchitecturesProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2007.15(354-366)Online publication date: 11-Mar-2007
https://dl.acm.org/doi/10.1109/CGO.2007.15
He LJarvis SSpooner DNudd G(2006)Performance evaluation of scheduling applications with DAG topologies on multiclusters with independent local schedulersProceedings of the 20th international conference on Parallel and distributed processing10.5555/1898699.1898862(325-325)Online publication date: 25-Apr-2006
https://dl.acm.org/doi/10.5555/1898699.1898862
Nagpal RSrikant YMin SYi W(2006)Compiler-assisted leakage energy optimization for clustered VLIW architecturesProceedings of the 6th ACM & IEEE International conference on Embedded software10.1145/1176887.1176921(233-241)Online publication date: 22-Oct-2006
https://dl.acm.org/doi/10.1145/1176887.1176921
Chu MMahlke S(2006)Compiler-directed Data Partitioning for Multicluster ProcessorsProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2006.9(208-220)Online publication date: 26-Mar-2006
https://dl.acm.org/doi/10.1109/CGO.2006.9
Xiao ZLi WJenq JHaddad HOmicini AWainwright R(2005)On unit task linear-nonlinear two-cluster scheduling problemProceedings of the 2005 ACM symposium on Applied computing10.1145/1066677.1066839(713-717)Online publication date: 13-Mar-2005
https://dl.acm.org/doi/10.1145/1066677.1066839
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents