Vector prefetching

Published: 15 December 1995

Abstract

This paper focuses on extending the memory subsystem by integrating a prefetch buffer mechanism. Prefetching lets high-level application knowledge improve memory performance, which currently constrains the performance of most systems. While prefetching does not reduce the latency of memory accesses, it hides this latency by overlapping memory accesses with instruction execution.
The first prefetch operation to the buffer is initiated by an explicit fetch instruction. All further prefetch operations are issued automatically whenever a prefetched value is consumed. To efficiently support list and vector processing, the user can specify a stride value at the time the first prefetch operation is initiated.
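The behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class `PrefetchBuffer`, its methods `fetch` and `consume`, and the helper `strided_read` are hypothetical names, the buffer holds a single value, memory is a plain Python list, and prefetching simply stops at the end of that list. Timing is not modelled, so the sketch shows the access pattern (one explicit fetch, then one automatic prefetch per consumed value) rather than the latency hiding itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefetchBuffer:
    """Toy single-entry prefetch buffer with a programmable stride (hypothetical model)."""
    memory: List[int]
    next_addr: int = 0
    stride: int = 1
    value: Optional[int] = None
    valid: bool = False
    issued: int = 0  # number of prefetch operations actually issued

    def _issue(self) -> None:
        # Load the word at next_addr into the buffer, if it lies inside memory.
        # Stopping at the bounds is a simplification of this toy model.
        if 0 <= self.next_addr < len(self.memory):
            self.value = self.memory[self.next_addr]
            self.valid = True
            self.issued += 1
        else:
            self.value = None
            self.valid = False

    def fetch(self, addr: int, stride: int) -> None:
        # Explicit fetch instruction: sets the start address and the stride,
        # and initiates the first prefetch operation.
        self.next_addr = addr
        self.stride = stride
        self._issue()

    def consume(self) -> int:
        # Reading the buffered value automatically triggers the next prefetch.
        if not self.valid:
            raise RuntimeError("prefetch buffer is empty")
        result = self.value
        self.next_addr += self.stride
        self._issue()
        return result


def strided_read(memory: List[int], start: int, stride: int, count: int) -> List[int]:
    """Read `count` elements beginning at `start`, `stride` apart, via the buffer."""
    buf = PrefetchBuffer(memory)
    buf.fetch(start, stride)
    return [buf.consume() for _ in range(count)]
```

In this model a program walking every fourth element of a vector calls `fetch(start, 4)` once and then only `consume()`; the stride supplied with the first fetch determines every later prefetch address.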

Published In

ACM SIGARCH Computer Architecture News  Volume 23, Issue 5
Dec. 1995
44 pages
ISSN:0163-5964
DOI:10.1145/218328

Publisher

Association for Computing Machinery

New York, NY, United States
