More Web Proxy on the site http://driver.im/

research-article

Scans as Primitive Parallel Operations

Author:

G. E. BlellochAuthors Info & Claims

IEEE Transactions on Computers, Volume 38, Issue 11

Pages 1526 - 1538

https://doi.org/10.1109/12.42122

Published: 01 November 1989 Publication History

Abstract

A study of the effects of adding two scan primitives as unit-time primitives to PRAM (parallel random access machine) models is presented. It is shown that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplifying the description of many algorithms, and are significantly easier to implement than memory references. It is argued that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. The author describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm, and a merging algorithm. These all run on an EREW (exclusive read, exclusive write) PRAM with the addition of two scan primitives and are either simpler or more efficient than their pure PRAM counterparts. The scan primitives have been implemented in microcode on the Connection Machine system, are available in PARIS (the parallel instruction set of the machine).

References

[1]

{1} M. Ajtai, J. Komlos, and E. Szemeredi, "An O(n log n) sorting network," in Proc. ACM Symp. Theory Comput., Apr. 1983, pp. 1-9.

Digital Library

[2]

{2} D. C. Allen, "The BBN multiprocessors: Butterfly and Monarch," in Proc. Princeton Conf. Supercomput. Stellar Dynamics, June 1986.

[3]

{3} B. Awerbuch and Y. Shiloach, "New connectivity and MSF algorithms for Ultracomputer and PRAM," in Proc. ACM Symp Theory Comput. , 1985, pp. 175-179.

[4]

{4} K. E. Batcher, "Sorting networks and their applications," in Proc. AFIPS Spring Joint Comput. Conf., 1968, pp. 307-314.

[5]

{5} C. Berge and A. Ghouila-Houri, Programming, Games, and Transportation Networks. New York: Wiley, 1965.

[6]

{6} H. J. Berliner, "A chronology of computer chess and its literature," Artif. Intell., vol. 10, no. 2, 1978.

[7]

{7} G. E. Blelloch, "Scan primitives and parallel vector models," Ph.D. dissertation, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, Nov. 1988.

[8]

{8} G. E. Blelloch and J. J. Little, "Parallel solutions to geometric problems on the scan model of computation," in Proc. Int. Conf. Parallel Processing, vol. 3, Aug. 1988, pp. 218-222.

[9]

{9} O. Boruvka, "O jistém problén minimálim," Práca Moravské Prírodovedecké Spolecnosti, vol. 3, pp. 37-58, 1926 (In Czech.).

[10]

{10} R. P. Brent and H. T. Kung, "The chip complexity of binary arithmetic," in Proc. ACM Symp. Theory Comput., 1980, pp. 190-200.

Digital Library

[11]

{11} R. Cole, "Parallel merge sort," in Proc. Symp. Foundations Comput. Sci., Oct. 1986, pp. 511-516.

[12]

{12} R. Cole and U. Vishkin, "Approximate scheduling, exact scheduling, and applications to parallel algorithms," in Proc. Symp. Foundations Comput. Sci., 1986, pp. 478-491.

[13]

{13} R. Cole and U. Vishkin, "Faster optimal parallel prefix sums and list ranking," Tech. Rep. Ultracomputer Note 117, New York Univ., Feb. 1987.

[14]

{14} C. C. Elgot and A. Robinson, "Random access stored program machines," J. ACM, vol. no. 4, pp. 365-399, 1964.

Digital Library

[15]

{15} F. E. Fich, "New bounds for parallel prefix circuits," in Proc. ACM Symp. Theory Comput., Apr. 1983, pp. 100-109.

Digital Library

[16]

{16} S. Fortune and J. Wyllie, "Parallelism in random access machines," in Proc. ACM Symp. Theory Comput., 1978, pp. 114-118.

Digital Library

[17]

{17} F. Gavril, "Merging with parallel processors," Commun. ACM, vol. 18, no. 10, pp. 588-591, 1975.

Digital Library

[18]

{18} H. Gazit, G. L. Miller, and S.-H. Teng, Optimal Tree Contraction in the EREW Model. New York: Plenum, 1988, pp. 139-156.

[19]

{19} L. M. Goldschlager, "A universal interconnection pattern for parallel computers," J. ACM, vol. 29, no. 3, pp. 1073-1086, 1982.

Digital Library

[20]

{20} A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer--Designing a MIMD, shared-memory parallel machine," IEEE Trans. Comput., vol. C-32, pp. 175-189, 1983.

[21]

{21} A. Gottlieb, B. D. Lubachevsky, and L. Rudolph, "Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors," ACM Trans. Programming Languages Syst., vol. 5, no. 2, Apr. 1983.

Digital Library

[22]

{22} W. Daniel Hillis, The Connection Machine. Cambridge, MA: MIT Press, 1985.

Digital Library

[23]

{23} D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate, "Computing connected components on parallel computers," Commun. ACM, vol. 22, no. 8, pp. 461-464, 1979.

Digital Library

[24]

{24} C. A. R. Hoare, "Quicksort," Comput. J., vol. 5, no. 1, pp. 10-15, 1962.

Digital Library

[25]

{25} K. E. Iverson, A Programming Language. New York: Wiley, 1962.

Digital Library

[26]

{26} D. E. Knuth, Sorting and Searching. Reading, MA: Addison-Wesley, 1973.

[27]

{27} C. P. Kruskal, L. Rudolph, and M. Snir, "The power of parallel prefix," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 180-185.

[28]

{28} R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.

Digital Library

[29]

{29} F. T. Leighton, "Tight bounds on the complexity of parallel sorting," in Proc. ACM Symp. Theory Comput., May 1984, pp. 71-80.

Digital Library

[30]

{30} C. E. Leiserson, "Area-efficient layouts (for VLSI)," in Proc. Symp. Foundations Comput. Sci., 1980.

[31]

{31} B. D. Lubachevsky and A. G. Greenberg, "Simple, efficient asynchronous parallel prefix algorithms," in Proc. Int. Conf. Parallel Processing, Aug. 1987, pp. 66-69.

[32]

{32} G. A. Mago, "A network of computers to execute reduction languages," Int. J. Comput. Inform. Sci., 1979.

[33]

{33} G. L. Miller and J. H. Reif, "Parallel tree contraction and its application," in Proc. Symp. Foundations Comput. Sci., Oct. 1985, pp. 478-489.

[34]

{34} W. M. Newman and R. F. Sproull, Principles of Interactive Computer Graphics. New York: McGraw-Hill, 1979.

Digital Library

[35]

{35} Y. Ofman, "On the algorithmic complexity of discrete functions," Cybern. Contr. Theory, Sov. Phys. Dokl., vol. 7, no. 7, pp. 589-591, Jan. 1963.

[36]

{36} G. F. Pfister and V. A. Norton, "'Hot spot' contention and combining in multistage interconnection networks," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 790-797.

[37]

{37} A. G. Ranade, "Fluent parallel computation," Ph.D. dissertation, Yale Univ., Dep. Comput. Sci., New Haven, CT, 1989.

Digital Library

[38]

{38} J. B. Salem, "<sup>*</sup>Render: A data parallel approach to polygon rendering," Tech. Rep. VZ88-2, Thinking Machines Corp., Jan. 1988.

[39]

{39} C. Savage and J. Ja'Ja', "Fast, efficient parallel algorithms for some graph problems," SIAM, vol. 10, no. 4, pp. 683-691, 1981.

[40]

{40} J. T. Schwartz, "Ultracomputers," ACM Trans. Programming Languages Syst., vol. 2, no. 4, pp. 484-521, Oct. 1980.

Digital Library

[41]

{41} C. L. Seitz, "The Cosmic Cube," Commun. ACM, vol. 28, no. 1, pp. 22-33, Jan. 1985.

Digital Library

[42]

{42} Y. Shiloach and U. Vishkin, "Finding the maximum, merging and sorting in a parallel computation model," J. Algorithms, vol. 2, no. 1, pp. 88-102, 1981.

[43]

{43} Y. Shiloach and U. Vishkin, "An O(log n) parallel connectivity algorithm," J. Algorithms, vol. 3, pp. 57-67, 1982.

Digital Library

[44]

{44} H. S. Stone, "Parallel processing with the perfect shuffle," IEEE Trans. Comput., vol. C-20, no. 2, pp. 53-161, 1971.

[45]

{45} R. E. Tarjan, Data Structures and Network Algorithms. Philadelphia, PA: SIAM, 1983.

Digital Library

[46]

{46} Thinking Machines Corp., "Connection Machine parallel instruction set (PARIS)," July 1986.

[47]

{47} C. D. Thompson and H. T. Kung, "Sorting on a mesh-connected parallel computer," Commun. ACM, vol. 20, pp. 263-271, 1977.

Digital Library

[48]

{48} J. C. Wyllie, "The complexity of parallel computations," Tech. Rep. TR-79-387, Dep. Comput. Sci., Cornell Univ., Ithaca, NY, Aug. 1979.

Digital Library

Cited By

Weißenberger ASchmidt B(2024)Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673067(856-865)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673067
Kwedlo W(2024)Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systemsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673057(1-10)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673057
de Wolff Ivan Balen DKeller GMcDonell T(2024)Zero-Overhead Parallel Scans for Multi-Core CPUsProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649248(52-61)Online publication date: 3-Mar-2024
https://dl.acm.org/doi/10.1145/3649169.3649248
Show More Cited By

Index Terms

Scans as Primitive Parallel Operations

Recommendations

Parallel Quicksort Using Fetch-And-Add

A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a ...
Parallel Binary Search

Two arrays of numbers sorted in nondecreasing order are given: an array A of size n and an array B of size m, where n>m. It is required to determine, for every element of A, the smallest element of B (if one exists) that is larger than or equal to it. ...
Time-Space Optimal Parallel Merging and Sorting

The authors present a parallel merging algorithm that, on an exclusive-read exclusive-write (EREW) parallel random-access machine (PRAM) with k processors merges two sorted lists of total length n in O(n/k+log n) time and constant extra space per ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Computers

IEEE Transactions on Computers Volume 38, Issue 11

November 1989

136 pages

ISSN:0018-9340

Editor:
Ming T. Liu
Ohio State Univ., Columbus, OH

Issue’s Table of Contents

Copyright © Copyright © 1989 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 1989

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

167
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 04 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Weißenberger ASchmidt B(2024)Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673067(856-865)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673067
Kwedlo W(2024)Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systemsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673057(1-10)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3673038.3673057
de Wolff Ivan Balen DKeller GMcDonell T(2024)Zero-Overhead Parallel Scans for Multi-Core CPUsProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649248(52-61)Online publication date: 3-Mar-2024
https://dl.acm.org/doi/10.1145/3649169.3649248
Yue ZWang YHe ZZeng HMcauley JWang DAngélica LLattanzi SMuñoz Medina AAkoglu LGionis AVassilvitskii S(2024)Linear Recurrent Units for Sequential RecommendationProceedings of the 17th ACM International Conference on Web Search and Data Mining10.1145/3616855.3635760(930-938)Online publication date: 4-Mar-2024
https://dl.acm.org/doi/10.1145/3616855.3635760
Marzolla MBirolo GD’Angelo GFariselli P(2024)Parallel intersection counting on shared-memory multiprocessors and GPUsFuture Generation Computer Systems10.1016/j.future.2024.05.039159:C(423-431)Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1016/j.future.2024.05.039
Ostrovsky B(2024)Privacy-Preserving DijkstraAdvances in Cryptology – CRYPTO 202410.1007/978-3-031-68400-5_3(74-110)Online publication date: 18-Aug-2024
https://dl.acm.org/doi/10.1007/978-3-031-68400-5_3
Bruun LLarsen UHinnerskov NOancea C(2023)Reverse-Mode AD of Multi-Reduce and Scan in FutharkProceedings of the 35th Symposium on Implementation and Application of Functional Languages10.1145/3652561.3652575(1-14)Online publication date: 29-Aug-2023
https://dl.acm.org/doi/10.1145/3652561.3652575
Copik MBöhringer RCalotoiu AHoefler TGallivan KNikolopoulos DBeivide RGallopoulos E(2023)FMI: Fast and Cheap Message Passing for Serverless FunctionsProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593718(373-385)Online publication date: 21-Jun-2023
https://dl.acm.org/doi/10.1145/3577193.3593718
Asharov GHamada KKikuchi RNof APinkas BTomida JMeng WJensen CCremers CKirda E(2023)Secure Statistical Analysis on Multiple Datasets: Join and Group-ByProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3623119(3298-3312)Online publication date: 15-Nov-2023
https://dl.acm.org/doi/10.1145/3576915.3623119
Wang XSong RXiao JLi TLi X(2023)Accelerating k-Shape Time Series Clustering Algorithm Using GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.329814834:10(2718-2734)Online publication date: 1-Oct-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3298148
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents