Article

Register Assignment for Software Pipelining with Partitioned Register Banks

IPDPS '00: Proceedings of the 14th International Symposium on Parallel and Distributed Processing

Page 211

Published: 01 May 2000

Abstract

Many techniques for increasing instruction-level parallelism (ILP) put increased pressure on the registers inside a CPU. These techniques allow more operations to occur simultaneously, at the cost of requiring more registers to hold the operands and results of those operations and, importantly, more ports on the register banks to allow concurrent access to the data. The cost of ports in gates varies as N^2, where N is the number of ports, and adding ports increases access time. One approach to reducing the number of ports on a register bank is to have multiple register banks with fewer ports, each attached to a subset of the available functional units. This reduces the number of ports needed per bank, but can slow execution when a needed value is not in an attached register bank, because copy operations must be inserted. There is therefore a circular dependence between assigning operations to functional units and assigning values to register banks. We describe an approach that produces good code by separating partitioning from scheduling and register assignment. Our method is independent of both the scheduling technique and the register assignment method used.
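
As a rough illustration of the trade-off described in the abstract, the Python sketch below estimates the per-bank port cost under the stated N^2 model and counts the copy operations that a given assignment of operations to clusters would require. It is not the paper's partitioning method. The function names, the example operations, the even split of ports across banks, and the rule of one copy per value per remote cluster are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical data-dependence edges: (producer, consumer) operation names.
Edge = Tuple[str, str]


def relative_port_cost(total_ports: int, banks: int) -> float:
    """Per-bank port cost relative to one monolithic bank, using the N^2 model.

    Assumes ports are split evenly across banks and ignores any ports
    added to support inter-bank copies.
    """
    ports_per_bank = total_ports / banks
    return (ports_per_bank ** 2) / (total_ports ** 2)


def count_copies(edges: List[Edge], cluster_of: Dict[str, int]) -> int:
    """Count copies needed for a given operation-to-cluster assignment.

    Assumes a value needs one copy per remote cluster that consumes it,
    so several consumers in the same remote cluster share one copy.
    """
    remote_uses = {
        (producer, cluster_of[consumer])
        for producer, consumer in edges
        if cluster_of[producer] != cluster_of[consumer]
    }
    return len(remote_uses)


# Illustrative dependence graph: a multiply feeding an add.
edges = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"), ("load_a", "add")]
same_cluster = {"load_a": 0, "load_b": 0, "mul": 0, "add": 0}
split = {"load_a": 0, "load_b": 1, "mul": 0, "add": 1}

print(relative_port_cost(total_ports=16, banks=2))  # 0.25
print(count_copies(edges, same_cluster))            # 0
print(count_copies(edges, split))                   # 3
```

In this sketch, splitting the operations across two clusters cuts the modeled per-bank port cost to a quarter but introduces three copies. This is the tension between functional-unit assignment and register-bank assignment that the abstract describes.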


Recommendations

Loop Transformations for Architectures with Partitioned Register Banks
LCTES '01: Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems

Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, processors can often exploit considerable instruction-level parallelism (ILP), and thus ...
Software register synchronization for super-scalar processors with partitioned register files


Information

Published In

IPDPS '00: Proceedings of the 14th International Symposium on Parallel and Distributed Processing

May 2000

ISBN: 0769505740

Publisher

IEEE Computer Society

United States

Qualifiers

Article


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 9
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Dec 2024


Citations

Cited By

Tang F, You I, Guo M, Guo S, Zheng L (2010). Balanced bipartite graph based register allocation for network processors in mobile and wireless networks. Mobile Information Systems 6:1 (65-83). DOI: 10.1155/2010/986192. Online publication date: 1-Jan-2010.
https://dl.acm.org/doi/10.1155/2010/986192

Carr S, Sweany P, Irwin M, Zhao W, Lavagno L, Mahlke S (2004). Automatic data partitioning for the agere payload plus network processor. Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems (238-247). DOI: 10.1145/1023833.1023867. Online publication date: 22-Sep-2004.
https://dl.acm.org/doi/10.1145/1023833.1023867

Qian Y, Carr S, Sweany P (2002). Optimizing Loop Performance for Clustered VLIW Architectures. Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (271-280). DOI: 10.5555/645989.674315. Online publication date: 22-Sep-2002.
https://dl.acm.org/doi/10.5555/645989.674315

Qian Y, Carr S, Sweany P (2002). Loop fusion for clustered VLIW architectures. ACM SIGPLAN Notices 37:7 (112-119). DOI: 10.1145/566225.513850. Online publication date: 19-Jun-2002.
https://dl.acm.org/doi/10.1145/566225.513850

Krishnamurthy G, Granston E, Stotzer E, Ebcioglu K, Pingali K, Nicolau A (2002). Affinity-based cluster assignment for unrolled loops. Proceedings of the 16th international conference on Supercomputing (107-116). DOI: 10.1145/514191.514209. Online publication date: 22-Jun-2002.
https://dl.acm.org/doi/10.1145/514191.514209

Qian Y, Carr S, Sweany P, Marwedel P, Devadas S (2002). Loop fusion for clustered VLIW architectures. Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems (112-119). DOI: 10.1145/513829.513850. Online publication date: 19-Jun-2002.
https://dl.acm.org/doi/10.1145/513829.513850

Huang X, Carr S, Sweany P, Bodik R, Sreedhar V (2001). Loop Transformations for Architectures with Partitioned Register Banks. Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems (48-55). DOI: 10.1145/384198.384206. Online publication date: 1-Aug-2001.
https://dl.acm.org/doi/10.1145/384198.384206

Huang X, Carr S, Sweany P, Hong S, Pande S (2001). Loop Transformations for Architectures with Partitioned Register Banks. Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems (48-55). DOI: 10.1145/384197.384206. Online publication date: 1-Aug-2001.
https://dl.acm.org/doi/10.1145/384197.384206

Huang X, Carr S, Sweany P (2001). Loop Transformations for Architectures with Partitioned Register Banks. ACM SIGPLAN Notices 36:8 (48-55). DOI: 10.1145/384196.384206. Online publication date: 1-Aug-2001.
https://dl.acm.org/doi/10.1145/384196.384206
