Abstract
In clustered instruction-level parallel (ILP) processors, the function units are partitioned, and resources such as the register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors that exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme uses a small Caching Register Buffer (CRB), attached to the traditional partitioned local register file, to store copies of remote registers. We present an efficient code generation algorithm that schedules sendb operations on-the-fly. Detailed experimental results show that a windowed CRB with just 4 entries matches the performance of a partitioned register file with infinite non-architected register space for keeping remote registers.
Supported in part by the U.S. National Science Foundation (NSF) through a CAREER grant (MIP 9702569) and a regular grant (CCR 0073582).
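The abstract's idea of a small buffer of remote-register copies filled by a single broadcast can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's design: the `Cluster` class, the `sendb` function signature, the `(cluster id, register name)` keying, and the oldest-first eviction policy are all hypothetical choices made for illustration. The abstract does not specify the replacement policy or what "windowed" means.

```python
from collections import OrderedDict


class Cluster:
    """Toy cluster: a local register file plus a small buffer of remote copies."""

    def __init__(self, cid, crb_entries=4):
        self.cid = cid
        self.local = {}            # local register file: name -> value
        self.crb = OrderedDict()   # remote copies: (owner cid, name) -> value
        self.crb_entries = crb_entries

    def receive(self, owner, reg, value):
        # Install or refresh a remote copy. ASSUMPTION: evict the oldest
        # entry when the buffer is full; the source does not state a policy.
        key = (owner, reg)
        if key in self.crb:
            del self.crb[key]
        elif len(self.crb) >= self.crb_entries:
            self.crb.popitem(last=False)
        self.crb[key] = value

    def read(self, owner, reg):
        # Local registers come from the local file; remote ones must
        # already have a copy in the buffer (KeyError otherwise).
        if owner == self.cid:
            return self.local[reg]
        return self.crb[(owner, reg)]


def sendb(src, clusters, reg):
    """Broadcast one local register of `src` to every other cluster.

    Returns the number of communication operations issued (always 1),
    in contrast to one point-to-point copy per consuming cluster.
    """
    value = src.local[reg]
    for c in clusters:
        if c is not src:
            c.receive(src.cid, reg, value)
    return 1


# Usage: cluster 0 produces r1; one broadcast makes it readable everywhere.
clusters = [Cluster(i) for i in range(4)]
clusters[0].local["r1"] = 42
ops = sendb(clusters[0], clusters, "r1")
values = [c.read(0, "r1") for c in clusters]
```

In this model a later read of the same remote register by any cluster hits its local buffer and needs no further communication, which is the temporal-locality effect the abstract refers to.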
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kailas, K., Franklin, M., Ebcioğlu, K. (2002). A Register File Architecture and Compilation Scheme for Clustered ILP Processors. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_68
DOI: https://doi.org/10.1007/3-540-45706-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive