Vector prefetching

Authors:

Michael K. Gschwind,

Thomas J. Pietsch

ACM SIGARCH Computer Architecture News, Volume 23, Issue 5

Pages 1 - 7

https://doi.org/10.1145/218328.218329

Published: 15 December 1995

Abstract

This paper focuses on extending the memory subsystem by integrating a prefetch buffer mechanism. Prefetching lets high-level application knowledge be used to increase memory performance, which currently constrains the performance of most systems. While prefetching does not reduce the latency of memory accesses, it hides this latency by overlapping memory access with instruction execution.

The first prefetch operation to the buffer is initiated by an explicit fetch instruction. All further prefetch operations are issued automatically whenever a prefetched value is consumed. To efficiently support list and vector processing, the user can specify a stride value at the time the first prefetch operation is initiated.
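The mechanism described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the abstract does not name the fetch instruction, the buffer depth, or the behavior at the end of an array, so the class `PrefetchBuffer`, its methods `fetch` and `consume`, the single-entry buffer, and the bounds check are all hypothetical. Memory latency is not modeled; the sketch only shows the order in which prefetches are issued.

```python
class PrefetchBuffer:
    """Toy model of a single-entry prefetch buffer with a programmable stride.

    All names here are hypothetical; the source abstract does not specify them.
    """

    def __init__(self, memory):
        self.memory = memory    # backing store, modeled as a list of words
        self.next_addr = None   # address of the next prefetch to issue
        self.stride = 0         # distance between consecutive prefetches
        self.value = None       # the word currently held in the buffer
        self.issued = []        # log of prefetched addresses, for inspection

    def _prefetch(self):
        # Issue one prefetch, then advance the address by the stride.
        addr = self.next_addr
        if 0 <= addr < len(self.memory):
            self.value = self.memory[addr]
            self.issued.append(addr)
        else:
            # Assumption: running past the array simply leaves the buffer empty.
            self.value = None
        self.next_addr = addr + self.stride

    def fetch(self, addr, stride=1):
        """Explicit fetch: starts the stream and fixes the stride."""
        self.next_addr = addr
        self.stride = stride
        self._prefetch()

    def consume(self):
        """Return the buffered word and automatically issue the next prefetch."""
        value = self.value
        self._prefetch()
        return value


# Walk one column of a 4x4 row-major matrix: elements are 4 words apart.
memory = list(range(100, 116))
buf = PrefetchBuffer(memory)
buf.fetch(0, stride=4)
column = [buf.consume() for _ in range(4)]
```

After the single explicit `fetch`, the loop never issues a prefetch itself: each `consume` triggers the next one, which is the property the abstract describes. In this run `column` holds the words at addresses 0, 4, 8, and 12.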

Published In

ACM SIGARCH Computer Architecture News Volume 23, Issue 5

Dec. 1995

44 pages

ISSN:0163-5964

DOI:10.1145/218328

Editor:
Doug DeGoot
Texas Instruments, Dallas, TX

Copyright © 1995 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
