[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/563998.564019acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
Article

Graph-partitioning based instruction scheduling for clustered processors

Published: 01 December 2001 Publication History

Abstract

This work presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase implemented through graph partitioning techniques followed by a scheduling phase that integrates register allocation and spill code generation. The graph partitioning scheme is shown to be very effective due to its global view of the whole code while the partition is generated. Results show a significant speedup when compared with previously proposed techniques. For some processor configuration the average speedup for the SPECfp95 is 23% with respect to the published scheme with the best performance. Besides, the proposed scheme is much faster (between 2-7 times, depending on the configuration).

References

[1]
A. Agarwal, M. S. Hrishikesh, S. W. Keckler and D. Burger, "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures", in Proc. of the 27th Int. Symp. on Computer Architecture, pp. 248-259, June 2000
[2]
E. Ayguadé, C. Barrado, A. González, J. Labarta, D. López, S. Moreno, D. Padua, F. Reig, Q. Riera and M. Valero, "Ictineo: a Tool for Research on ILP", in SC'96, Research Exhibit "Polaris at Work", 1996
[3]
A. Capitanio, D. Dytt and A. Nicolau, "Partitioned Register Files for VLIWs: A Preliminary Analysis of Tradeoffs", in Procs. of 25th. Int. Symp. on Microarchitecture (MICRO-25), pp. 192-300, 1992
[4]
J. M. Codina, J. Sánchez and A. González, "A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors", in Procs. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'2001), Sept. 2001
[5]
C. Ding, S. Carr and P. Sweany, "Modulo Scheduling with cache reuse information", in Procs. of Europar'97, pp. 1079-1083, August 1997
[6]
A. E. Eichenberger, E. S. Davidson and S. G. Abraham, "Optimum Module Schedules for Minimum Register Requirements", in Procs. of Supercomputing 95, pp.31-40, July 1995
[7]
A. E. Eichenberger and E. S. Davidson, "Stage Scheduling: a Technique to Reduce the Register Requirements of a Module Schedule", in Procs. of 28th. Int. Symp. on Microarchitecture (MICRO-28), pp.338-349, Nov. 1995
[8]
J. R. Ellis, "Bulldog: A Compiler for VLIW Architectures", MIT Press, pp. 180-184, 1986
[9]
P. Faraboschi, G. Brown, J. Fisher, G. Desoli and F. Homewood, "Lx: A Technology Platform for Customizable VLIW Embedded Processing", in Proc. of the 27th Int. Symp. on Computer Architecture, pp. 203-213, June 2000
[10]
M. M. Fernandes, J. Llosa and N. Topham, "Distributed Modulo Scheduling", in Procs. of Int. Symp. on High-Performance Computer Architecture (HPCA-5), pp. 130-134, Jan. 1999
[11]
C. M. Fiduccia, R. M. Mattheyses, "A Linear-Time Heuristic for Improving Network Partitions", in Proc. ogf the 19th IEEE Design Automation Conference, pp. 175-181, 1982
[12]
J. Fridman and Zvi Greefield, "The TigerSharc DSP Architecture", IEEE Micro, pp. 66-76, Jan-Feb. 2000
[13]
L. Gwennap, "Digital 21264 Sets New Standard", Microprocessor Report, 10(14), Oct. 1996
[14]
R. Govindarajan, E. R. Altman and G. R. Gao, "Minimal Register Requirements Under Resource-Constrained Software Pipelining", in Procs. of 27th. Int. Symp. on Microarchitecture (MICRO-27), pp.85-94, Nov. 1994
[15]
B. Hendrickson and R. Leland, The Chaco User's Guide version 2.0, Tech. ReportSAND95-2344, Sandia National Labs, Albuquerque, NM, 1995
[16]
R. Ho, K. Mai, and M. Horowitz, "The Future of Wires", in Procs. of the IEEE, pp. 490-504, April 2001
[17]
R. A. Huff, "Lifetime-Sensitive Modulo Scheduling", in Procs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'93), pp.318-328, 1993
[18]
S. Jain, "Circular Scheduling: a New Technique to Perform Software Pipelining", in Procs. of rocs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'91), pp.219-228, June 1991
[19]
S. Jang, S. Carr, P. Sweany and D. Kuras, "A Code Generation Framework for VLIW Architectures with Partitioned Register Banks", in Procs. of 3rd. Int. Conf. on Massively Parallel Computing Systems, April 1998
[20]
K. Kailas, K. Ebcioglu and A. Agrawala, "CARS: A New Code Generation Framework for CLustered ILP Processors", in Proc. 7th Int. Symp. on High-Performance Computer Architecture (HPCA-7), Jan. 2001
[21]
G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, Tech. Report 95-035, Dept. of Computer Science, University of Minnesota, Minneapolis, MN, 1995
[22]
G. Karypis and V. Kumar, "Analysis of Multilevel Graph Partitioning", in Proc. of 7th Supercomputing Conference, 1995
[23]
B. W. Kernighan and S. Lin, "An Effective Heuristic Procedure for Partitioning Graphs" Bell Syst. Tech. J., pp. 291-307, 1970
[24]
M. S. Lam, "Software Pipelining: an Effective Scheduling Technique for VLIW Machines", in Procs. of Int. Conf. on Programming Languages Design and Implementation (PLDI'88), pp.318-328, June 1988
[25]
J. Llosa, A. González, E. Ayguadé and M. Valero, "Swing Modulo Scheduling", in Procs. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'96), pp.80-86, Oct. 1996
[26]
MAP1000 unfolds at Equator", Microprocessor Report, 12(16), Dec. 1998
[27]
D. Matzke, "Will physical scalability sabotage performance gains?", IEEE Computer, 30,(9), pp. 37-39, September 1997
[28]
K. Mehlhorn and S. Naher, "LEDA: a library of efficient data structures and algorithms",ACM Communications, 38, pp. 96-102, 1995
[29]
E. Nystrom and A. E. Eichenberger, "Effective Cluster Assingment for Modulo Scheduling", in Procs. of 31th. Int. Symp. on Microarchitecture (MICRO-31), pp.103-114, 1998
[30]
E. Özer, S. Banerjia and T. M. Conte, "Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures", in Procs. of 31st Int. Symp. on Microarchitecture (MICRO-31), pp. 308-315, Nov. 1998
[31]
G. G. Pechanek and S. Vassiliadis, "The ManArray Embedded Processor Architecture", in Proc. of 26th. Euromicro Conference, pp. 348-355, Sept. 2000
[32]
B. R. Rau and C. D. Glaeser, "Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Perfornance Scientific Computing", in Procs. of Workshop on Microarchitecture (MICRO-14), pp.183-198, Oct. 1981
[33]
B. R. Rau, "Iterative Modulo Scheduling: an Algorithm for Software Pipelining Loops", in Procs. of 3Oth. Int. Symp. on Microarchitecture (MICRO-27), pp.63-74, Nov. 1994
[34]
J. Sánchez and A. González, "Cache Sensitive Modulo Scheduling", in Procs. of 3Oth. Int. Symp. on Microarchitecture (MICRO-30), pp. 338-348, Dec. 1997
[35]
J. Sánchez and A. González, "The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures", in Procs. of the 29th. Int. Conf. on Parallel Processing (ICPP-29), pp. 555-562, Aug. 2000
[36]
J. Sánchez and A. González, "Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture", in Procs. of 33th. Int. Symp. on Microarchitecture (MICRO-33), Dec. 2000
[37]
J. E. Smith, "Instruction-Level Distributed Processor", IEEE Computer, 34(4), pp. 59-65, April 2001
[38]
Y. Taur, "CMOS Scaling and Issues in Sub-0.25 µm Systems", in Desing of High-Performance Microprocessor Circuits, IEEE Press, pp. 27-45, 2001
[39]
Texas Instruments Inc., "TMS320C62x/67x CPU and Instruction Set Reference Guide", 1998
[40]
J. Wang and C. Eisenbeis, "Decomposed Software Pipelining: a New Approach to Exploit Instruction Level Parallelism for Loops Programs", in IFIP, Jan. 1993

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
December 2001
355 pages
ISBN:0769513697

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 December 2001

Check for updates

Qualifiers

  • Article

Conference

MICRO-34
Sponsor:

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2017)The CUREACM Transactions on Embedded Computing Systems10.1145/312652716:5s(1-19)Online publication date: 27-Sep-2017
  • (2012)SIMD defragmenterACM SIGPLAN Notices10.1145/2248487.215101447:4(363-374)Online publication date: 3-Mar-2012
  • (2012)SIMD defragmenterACM SIGARCH Computer Architecture News10.1145/2189750.215101440:1(363-374)Online publication date: 3-Mar-2012
  • (2012)SIMD defragmenterProceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems10.1145/2150976.2151014(363-374)Online publication date: 3-Mar-2012
  • (2008)Modulo scheduling for highly customized datapaths to increase hardware reusabilityProceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization10.1145/1356058.1356075(124-133)Online publication date: 6-Apr-2008
  • (2007)Heterogeneous Clustered VLIW MicroarchitecturesProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2007.15(354-366)Online publication date: 11-Mar-2007
  • (2006)Performance evaluation of scheduling applications with DAG topologies on multiclusters with independent local schedulersProceedings of the 20th international conference on Parallel and distributed processing10.5555/1898699.1898862(325-325)Online publication date: 25-Apr-2006
  • (2006)Compiler-assisted leakage energy optimization for clustered VLIW architecturesProceedings of the 6th ACM & IEEE International conference on Embedded software10.1145/1176887.1176921(233-241)Online publication date: 22-Oct-2006
  • (2006)Compiler-directed Data Partitioning for Multicluster ProcessorsProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2006.9(208-220)Online publication date: 26-Mar-2006
  • (2005)On unit task linear-nonlinear two-cluster scheduling problemProceedings of the 2005 ACM symposium on Applied computing10.1145/1066677.1066839(713-717)Online publication date: 13-Mar-2005
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media