Interleaved parallel schemes: improving memory throughput on supercomputers

Authors:

Jacques Lenfant

ACM SIGARCH Computer Architecture News, Volume 20, Issue 2

Pages 246 - 255

https://doi.org/10.1145/146628.140381

Published: 01 April 1992

Abstract

On many commercial supercomputers, several vector register processors share a global highly interleaved memory in a MIMD mode. When all the processors are working on a single vector loop, a significant part of the potential memory throughput may be wasted due to the asynchronism of the processors.

To limit this loss of memory throughput, a SIMD synchronization mode may be used for vector accesses to memory. However, a large part of the memory bandwidth may still be wasted when accessing vectors with an even stride.
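The even-stride problem can be made concrete with a small model. The sketch below assumes the simplest low-order interleaving (bank = address mod M, with M a power of two) and assumes each bank serves one request per memory cycle. It is not taken from the paper. Under these assumptions, a sweep of M consecutive vector elements at stride s touches only M / gcd(s, M) distinct banks, so every factor of two in the stride halves the usable bandwidth. The function names and the 16-bank configuration are hypothetical, chosen only for illustration.

```python
from math import gcd


def banks_touched(stride: int, num_banks: int, length: int) -> int:
    """Number of distinct banks hit by `length` elements at `stride`,
    assuming low-order interleaving: bank = address mod num_banks."""
    return len({(i * stride) % num_banks for i in range(length)})


def bandwidth_fraction(stride: int, num_banks: int) -> float:
    """Simplified model (an assumption, not the paper's): one request per
    bank per memory cycle, so num_banks requests spread evenly over d banks
    need num_banks / d cycles, giving an efficiency of d / num_banks."""
    return banks_touched(stride, num_banks, num_banks) / num_banks


if __name__ == "__main__":
    M = 16  # hypothetical number of banks
    for s in (1, 2, 3, 4, 8, 16):
        print(f"stride {s:2d}: {banks_touched(s, M, M):2d} banks, "
              f"{bandwidth_fraction(s, M):.1%} of peak "
              f"(M / gcd(s, M) = {M // gcd(s, M)})")
```

Odd strides such as 3 still reach every bank in this model. Strides 2, 4, 8, and 16 reach only half, a quarter, an eighth, and a sixteenth of the banks.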

In this paper, we present IPS, an interleaved parallel scheme, which ensures an equitable distribution of elements on a highly interleaved memory for a wide range of vector strides. We show how to organize access to memory so that unscrambling vectors from memory to the vector register processors requires a minimum number of passes through the interconnection network.
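The abstract does not give the IPS address mapping, so the sketch below does not reproduce it. It instead illustrates the general idea of a stride-insensitive placement using a simple XOR scheme, in the spirit of the XOR-schemes in reference [3]. In this scheme, the bank number is formed by XOR-ing the low k address bits with the next k bits. The functions `low_order_bank`, `xor_bank`, and `distinct_banks`, and the 8-bank configuration, are hypothetical. The sketch compares how many banks a sweep of 2^k elements touches under each placement.

```python
from typing import Callable


def low_order_bank(addr: int, k: int) -> int:
    """Conventional interleaving: bank = low k bits of the address."""
    return addr & ((1 << k) - 1)


def xor_bank(addr: int, k: int) -> int:
    """Illustrative XOR placement (not IPS): the low k bits of the address
    XOR-ed with bits k..2k-1 of the address."""
    mask = (1 << k) - 1
    return (addr ^ (addr >> k)) & mask


def distinct_banks(bank_fn: Callable[[int, int], int], stride: int, k: int) -> int:
    """Distinct banks touched by a sweep of 2**k elements from address 0."""
    n = 1 << k
    return len({bank_fn(i * stride, k) for i in range(n)})


if __name__ == "__main__":
    k = 3  # hypothetical: 2**3 = 8 banks
    for s in (1, 2, 4, 8, 16):
        print(f"stride {s:2d}: low-order {distinct_banks(low_order_bank, s, k)} banks, "
              f"xor {distinct_banks(xor_bank, s, k)} banks")
```

With 8 banks, the XOR placement spreads strides 1, 2, 4, and 8 over all 8 banks, where low-order interleaving reaches only 8, 4, 2, and 1 of them. Stride 16 still collapses to 4 banks. Placements like this are also not free: in this model, stride 3 touches only 5 of the 8 banks within a single 8-element sweep. This is why the range of strides a scheme handles well is a central design criterion.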

References

[1]

V.E. Benes, "Mathematical theory of connecting networks and telephone traffic", New York: Academic Press, 1968.

[2]

P. Budnick, D. Kuck, "The organization and use of parallel memories", IEEE Transactions on Computers, Dec. 1971.

[3]

J.M. Frailong, W. Jalby, J. Lenfant, "XOR-schemes: a flexible organization in parallel memories", Proceedings of the 1985 International Conference on Parallel Processing, Aug. 1985.

[4]

D.J. Kuck, R.A. Stokes, "The Burroughs Scientific Processor (BSP)", IEEE Transactions on Computers, May 1982.

[5]

D.T. Harper, J.R. Jump, "Performance evaluation of vector accesses in parallel memories using a skewed storage scheme", Proceedings of the 13th International Symposium on Computer Architecture, June 1986.

[6]

D.T. Harper, J.R. Jump, "Vector accesses in parallel memories using a skewed storage scheme", IEEE Transactions on Computers, Dec. 1987.

[7]

D.H. Lawrie, "Access and alignment of data in array computer", IEEE Transactions on Computers, Dec. 1975.

[8]

K.Y. Lee, "On the rearrangeability of a (2 log N - 1) stage permutation network", IEEE Transactions on Computers, May 1985.

[9]

J. Lenfant, "Parallel permutations of data: A Benes network control algorithm for frequently used permutations", IEEE Transactions on Computers, July 1978.

[10]

J. Lenfant, "A versatile mechanism to move data in an array processor", IEEE Transactions on Computers, June 1985.

[11]

D. Nassimi, S. Sahni, "A self-routing Benes network and permutation algorithms", IEEE Transactions on Computers, May 1981.

[12]

A. Norton, E. Melton, "A class of boolean linear transformations for conflict-free power-of-two stride access", Proceedings of the International Conference on Parallel Processing, 1987.

[13]

B. Rau, M. Schlansker, D. Yen, "The Cydra 5 stride-insensitive memory system", Proceedings of the International Conference on Parallel Processing, 1989.

[14]

A. Seznec, "An efficient routing control unit for the Sigma network Σ(4)", Proceedings of the 13th International Symposium on Computer Architecture, June 1986.

[15]

A. Seznec, "A new interconnection network for SIMD computers: the Sigma network Σ(N)", IEEE Transactions on Computers, July 1987.

[16]

H. Tamura, Y. ShinkM, F. Isobe, "The Supercomputer FACOM VP system", Fujitsu Sc. Tech. J., March 1985.

Information

Published In

ACM SIGARCH Computer Architecture News Volume 20, Issue 2

Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)

May 1992

429 pages

ISSN:0163-5964

DOI:10.1145/146628

Editor:
Allan Gottlieb
New York Univ., New York, NY

ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
May 1992
439 pages
ISBN:0897915097
DOI:10.1145/139669
Chairman:
Allan Gottlieb
New York Univ., New York, NY

Copyright © 1992 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1992

Published in SIGARCH Volume 20, Issue 2

Qualifiers

Article

Bibliometrics

Total citations: 23
Total downloads: 454
Downloads (last 12 months): 53
Downloads (last 6 weeks): 7

Reflects downloads up to 06 Jan 2025

Cited By

J. Jorda, A. M'zoughi, D. Litaize, M. Valero (1995), "Semi-linear and bi-base storage schemes classes", Proceedings of the 9th International Conference on Supercomputing, pp. 299-307, doi:10.1145/224538.224574. Online publication date: 3-Jul-1995.
https://dl.acm.org/doi/10.1145/224538.224574

J. Jorda, A. M'zoughi (2012), "Isomorphic Recursive Splitting", Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, pp. 574-580, doi:10.1109/ICPPW.2012.78. Online publication date: 10-Sep-2012.
https://dl.acm.org/doi/10.1109/ICPPW.2012.78

G. Jia, X. Li, C. Wang, X. Zhou, Z. Zhu (2012), "Memory Affinity", Proceedings of the 2012 IEEE International Conference on Cluster Computing, pp. 605-609, doi:10.1109/CLUSTER.2012.33. Online publication date: 24-Sep-2012.
https://dl.acm.org/doi/10.1109/CLUSTER.2012.33

C. Lin, C. Yang, K. King, J. Henkel, A. Keshavarzi, N. Chang, T. Ghani (2009), "PPT", Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 93-98, doi:10.1145/1594233.1594255. Online publication date: 19-Aug-2009.
https://dl.acm.org/doi/10.1145/1594233.1594255

C. Galuzzi, C. Gou, H. Calderón, G. Gaydadjiev, S. Vassiliadis (2008), "High-bandwidth Address Generation Unit", Journal of Signal Processing Systems 57(1), pp. 33-44, doi:10.1007/s11265-008-0174-x. Online publication date: 19-Jun-2008.
https://doi.org/10.1007/s11265-008-0174-x

A. Seznec, R. Espasa (2005), "Conflict-Free Accesses to Strided Vectors on a Banked Cache", IEEE Transactions on Computers 54(7), pp. 913-916, doi:10.1109/TC.2005.110. Online publication date: 1-Jul-2005.
https://dl.acm.org/doi/10.1109/TC.2005.110

M. Valero, M. Peiron, E. Ayguadé (2005), "Memory access synchronization in vector multiprocessors", Parallel Processing: CONPAR 94 - VAPP VI, pp. 414-425, doi:10.1007/3-540-58430-7_37. Online publication date: 3-Jun-2005.
https://doi.org/10.1007/3-540-58430-7_37

Z. Zhang, Z. Zhu, X. Zhang, A. Wolfe, M. Schlansker (2000), "A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality", Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, pp. 32-41, doi:10.1145/360128.360134. Online publication date: 1-Dec-2000.
https://dl.acm.org/doi/10.1145/360128.360134

A. del Corral, J. Llaberia, B. Clayton (1996), "Increasing the effective bandwidth of complex memory systems in multivector processors", Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, article 26-es, doi:10.1145/369028.369084. Online publication date: 17-Nov-1996.
https://dl.acm.org/doi/10.1145/369028.369084

M. Peiron, M. Valero, E. Ayguadé, T. Lang (1995), "Vector multiprocessors with arbitrated memory access", ACM SIGARCH Computer Architecture News 23(2), pp. 243-252, doi:10.1145/225830.224435. Online publication date: 1-May-1995.
https://dl.acm.org/doi/10.1145/225830.224435
