A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Krishnan Kailas, Manoj Franklin & Kemal Ebcioğlu

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2400)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

In clustered instruction-level parallel (ILP) processors, the function units are partitioned, and resources such as the register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors that exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme uses a small Caching Register Buffer (CRB), attached to the traditional partitioned local register file, to store copies of remote registers. We present an efficient code generation algorithm that schedules sendb operations on the fly. Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as a partitioned register file with infinite non-architected register space for keeping remote registers.
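
The idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' design: the class name Cluster, the helper crb_fill, the oldest-first eviction policy, and the form of the sendb function are all hypothetical, and the "windowed" CRB organization evaluated in the paper is not modeled here. The sketch shows one broadcast placing a copy of a register in every other cluster's CRB, after which repeated remote reads are served locally.

```python
from collections import OrderedDict


class Cluster:
    """Toy cluster: a local register file plus a small CRB for remote copies."""

    def __init__(self, cid, crb_entries=4):
        self.cid = cid
        self.local = {}            # local register file: name -> value
        self.crb = OrderedDict()   # CRB: (owner cluster id, name) -> copied value
        self.crb_entries = crb_entries

    def crb_fill(self, owner, reg, value):
        """Store a copy of a remote register in the CRB."""
        key = (owner, reg)
        self.crb.pop(key, None)              # refresh an existing copy, if any
        if len(self.crb) >= self.crb_entries:
            self.crb.popitem(last=False)     # ASSUMPTION: evict the oldest entry
        self.crb[key] = value

    def read(self, owner, reg):
        """Read a register; remote registers are served from the CRB copy."""
        if owner == self.cid:
            return self.local[reg]
        return self.crb[(owner, reg)]        # KeyError models a missing copy


def sendb(src, reg, clusters):
    """Hypothetical model of sendb: one broadcast fills every other CRB."""
    value = src.local[reg]
    for c in clusters:
        if c is not src:
            c.crb_fill(src.cid, reg, value)


clusters = [Cluster(i) for i in range(4)]
clusters[0].local["r5"] = 42
sendb(clusters[0], "r5", clusters)           # one operation instead of three copies
print([c.read(0, "r5") for c in clusters])   # prints [42, 42, 42, 42]
```

In this model, clusters 1 to 3 can reread r5 any number of times without further inter-cluster communication, which is the temporal locality the abstract refers to. A real scheme would also have to deal with stale copies when the owner rewrites the register, which this sketch ignores.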

Supported in part by U.S. National Science Foundation (NSF) through a CAREER grant (MIP 9702569) and a regular grant (CCR 0073582).

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Krishnan Kailas & Kemal Ebcioğlu
Department of ECE, University of Maryland, College Park, MD, USA
Manoj Franklin

Editor information

Editors and Affiliations

Fachbereich 17, Mathematik und Informatik, Universität Paderborn, Fürstenallee 11, 33102, Paderborn
Burkhard Monien & Rainer Feldmann

About this paper

Cite this paper

Kailas, K., Franklin, M., Ebcioğlu, K. (2002). A Register File Architecture and Compilation Scheme for Clustered ILP Processors. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_68

DOI: https://doi.org/10.1007/3-540-45706-2_68
Published: 20 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive
