[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Scans as Primitive Parallel Operations

Published: 01 November 1989 Publication History

Abstract

A study of the effects of adding two scan primitives as unit-time primitives to PRAM (parallel random access machine) models is presented. It is shown that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplifying the description of many algorithms, and are significantly easier to implement than memory references. It is argued that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. The author describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm, and a merging algorithm. These all run on an EREW (exclusive read, exclusive write) PRAM with the addition of two scan primitives and are either simpler or more efficient than their pure PRAM counterparts. The scan primitives have been implemented in microcode on the Connection Machine system, are available in PARIS (the parallel instruction set of the machine).

References

[1]
{1} M. Ajtai, J. Komlos, and E. Szemeredi, "An O(n log n) sorting network," in Proc. ACM Symp. Theory Comput., Apr. 1983, pp. 1-9.
[2]
{2} D. C. Allen, "The BBN multiprocessors: Butterfly and Monarch," in Proc. Princeton Conf. Supercomput. Stellar Dynamics, June 1986.
[3]
{3} B. Awerbuch and Y. Shiloach, "New connectivity and MSF algorithms for Ultracomputer and PRAM," in Proc. ACM Symp Theory Comput. , 1985, pp. 175-179.
[4]
{4} K. E. Batcher, "Sorting networks and their applications," in Proc. AFIPS Spring Joint Comput. Conf., 1968, pp. 307-314.
[5]
{5} C. Berge and A. Ghouila-Houri, Programming, Games, and Transportation Networks. New York: Wiley, 1965.
[6]
{6} H. J. Berliner, "A chronology of computer chess and its literature," Artif. Intell., vol. 10, no. 2, 1978.
[7]
{7} G. E. Blelloch, "Scan primitives and parallel vector models," Ph.D. dissertation, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, Nov. 1988.
[8]
{8} G. E. Blelloch and J. J. Little, "Parallel solutions to geometric problems on the scan model of computation," in Proc. Int. Conf. Parallel Processing, vol. 3, Aug. 1988, pp. 218-222.
[9]
{9} O. Boruvka, "O jistém problén minimálim," Práca Moravské Prírodovedecké Spolecnosti, vol. 3, pp. 37-58, 1926 (In Czech.).
[10]
{10} R. P. Brent and H. T. Kung, "The chip complexity of binary arithmetic," in Proc. ACM Symp. Theory Comput., 1980, pp. 190-200.
[11]
{11} R. Cole, "Parallel merge sort," in Proc. Symp. Foundations Comput. Sci., Oct. 1986, pp. 511-516.
[12]
{12} R. Cole and U. Vishkin, "Approximate scheduling, exact scheduling, and applications to parallel algorithms," in Proc. Symp. Foundations Comput. Sci., 1986, pp. 478-491.
[13]
{13} R. Cole and U. Vishkin, "Faster optimal parallel prefix sums and list ranking," Tech. Rep. Ultracomputer Note 117, New York Univ., Feb. 1987.
[14]
{14} C. C. Elgot and A. Robinson, "Random access stored program machines," J. ACM, vol. no. 4, pp. 365-399, 1964.
[15]
{15} F. E. Fich, "New bounds for parallel prefix circuits," in Proc. ACM Symp. Theory Comput., Apr. 1983, pp. 100-109.
[16]
{16} S. Fortune and J. Wyllie, "Parallelism in random access machines," in Proc. ACM Symp. Theory Comput., 1978, pp. 114-118.
[17]
{17} F. Gavril, "Merging with parallel processors," Commun. ACM, vol. 18, no. 10, pp. 588-591, 1975.
[18]
{18} H. Gazit, G. L. Miller, and S.-H. Teng, Optimal Tree Contraction in the EREW Model. New York: Plenum, 1988, pp. 139-156.
[19]
{19} L. M. Goldschlager, "A universal interconnection pattern for parallel computers," J. ACM, vol. 29, no. 3, pp. 1073-1086, 1982.
[20]
{20} A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer--Designing a MIMD, shared-memory parallel machine," IEEE Trans. Comput., vol. C-32, pp. 175-189, 1983.
[21]
{21} A. Gottlieb, B. D. Lubachevsky, and L. Rudolph, "Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors," ACM Trans. Programming Languages Syst., vol. 5, no. 2, Apr. 1983.
[22]
{22} W. Daniel Hillis, The Connection Machine. Cambridge, MA: MIT Press, 1985.
[23]
{23} D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate, "Computing connected components on parallel computers," Commun. ACM, vol. 22, no. 8, pp. 461-464, 1979.
[24]
{24} C. A. R. Hoare, "Quicksort," Comput. J., vol. 5, no. 1, pp. 10-15, 1962.
[25]
{25} K. E. Iverson, A Programming Language. New York: Wiley, 1962.
[26]
{26} D. E. Knuth, Sorting and Searching. Reading, MA: Addison-Wesley, 1973.
[27]
{27} C. P. Kruskal, L. Rudolph, and M. Snir, "The power of parallel prefix," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 180-185.
[28]
{28} R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[29]
{29} F. T. Leighton, "Tight bounds on the complexity of parallel sorting," in Proc. ACM Symp. Theory Comput., May 1984, pp. 71-80.
[30]
{30} C. E. Leiserson, "Area-efficient layouts (for VLSI)," in Proc. Symp. Foundations Comput. Sci., 1980.
[31]
{31} B. D. Lubachevsky and A. G. Greenberg, "Simple, efficient asynchronous parallel prefix algorithms," in Proc. Int. Conf. Parallel Processing, Aug. 1987, pp. 66-69.
[32]
{32} G. A. Mago, "A network of computers to execute reduction languages," Int. J. Comput. Inform. Sci., 1979.
[33]
{33} G. L. Miller and J. H. Reif, "Parallel tree contraction and its application," in Proc. Symp. Foundations Comput. Sci., Oct. 1985, pp. 478-489.
[34]
{34} W. M. Newman and R. F. Sproull, Principles of Interactive Computer Graphics. New York: McGraw-Hill, 1979.
[35]
{35} Y. Ofman, "On the algorithmic complexity of discrete functions," Cybern. Contr. Theory, Sov. Phys. Dokl., vol. 7, no. 7, pp. 589-591, Jan. 1963.
[36]
{36} G. F. Pfister and V. A. Norton, "'Hot spot' contention and combining in multistage interconnection networks," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 790-797.
[37]
{37} A. G. Ranade, "Fluent parallel computation," Ph.D. dissertation, Yale Univ., Dep. Comput. Sci., New Haven, CT, 1989.
[38]
{38} J. B. Salem, "<sup>*</sup>Render: A data parallel approach to polygon rendering," Tech. Rep. VZ88-2, Thinking Machines Corp., Jan. 1988.
[39]
{39} C. Savage and J. Ja'Ja', "Fast, efficient parallel algorithms for some graph problems," SIAM, vol. 10, no. 4, pp. 683-691, 1981.
[40]
{40} J. T. Schwartz, "Ultracomputers," ACM Trans. Programming Languages Syst., vol. 2, no. 4, pp. 484-521, Oct. 1980.
[41]
{41} C. L. Seitz, "The Cosmic Cube," Commun. ACM, vol. 28, no. 1, pp. 22-33, Jan. 1985.
[42]
{42} Y. Shiloach and U. Vishkin, "Finding the maximum, merging and sorting in a parallel computation model," J. Algorithms, vol. 2, no. 1, pp. 88-102, 1981.
[43]
{43} Y. Shiloach and U. Vishkin, "An O(log n) parallel connectivity algorithm," J. Algorithms, vol. 3, pp. 57-67, 1982.
[44]
{44} H. S. Stone, "Parallel processing with the perfect shuffle," IEEE Trans. Comput., vol. C-20, no. 2, pp. 53-161, 1971.
[45]
{45} R. E. Tarjan, Data Structures and Network Algorithms. Philadelphia, PA: SIAM, 1983.
[46]
{46} Thinking Machines Corp., "Connection Machine parallel instruction set (PARIS)," July 1986.
[47]
{47} C. D. Thompson and H. T. Kung, "Sorting on a mesh-connected parallel computer," Commun. ACM, vol. 20, pp. 263-271, 1977.
[48]
{48} J. C. Wyllie, "The complexity of parallel computations," Tech. Rep. TR-79-387, Dep. Comput. Sci., Cornell Univ., Ithaca, NY, Aug. 1979.

Cited By

View all
  • (2024)Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673067(856-865)Online publication date: 12-Aug-2024
  • (2024)Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systemsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673057(1-10)Online publication date: 12-Aug-2024
  • (2024)Zero-Overhead Parallel Scans for Multi-Core CPUsProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649248(52-61)Online publication date: 3-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Computers
IEEE Transactions on Computers  Volume 38, Issue 11
November 1989
136 pages
ISSN:0018-9340
Issue’s Table of Contents

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 November 1989

Author Tags

  1. Connection Machine system
  2. EREW PRAM
  3. O(log n) factor
  4. PARIS
  5. algorithm design
  6. computational complexity
  7. instruction sets
  8. line-drawing algorithm
  9. memory references
  10. merging algorithm
  11. minimum-spanning-tree algorithm
  12. parallel algorithms
  13. parallel instruction set
  14. parallel programming
  15. parallel random access machine
  16. quicksort algorithm
  17. radix-sort algorithm
  18. scan primitives
  19. search problems
  20. sorting.
  21. unit-time primitives

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673067(856-865)Online publication date: 12-Aug-2024
  • (2024)Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systemsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673057(1-10)Online publication date: 12-Aug-2024
  • (2024)Zero-Overhead Parallel Scans for Multi-Core CPUsProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649248(52-61)Online publication date: 3-Mar-2024
  • (2024)Linear Recurrent Units for Sequential RecommendationProceedings of the 17th ACM International Conference on Web Search and Data Mining10.1145/3616855.3635760(930-938)Online publication date: 4-Mar-2024
  • (2024)Parallel intersection counting on shared-memory multiprocessors and GPUsFuture Generation Computer Systems10.1016/j.future.2024.05.039159:C(423-431)Online publication date: 1-Oct-2024
  • (2024)Privacy-Preserving DijkstraAdvances in Cryptology – CRYPTO 202410.1007/978-3-031-68400-5_3(74-110)Online publication date: 18-Aug-2024
  • (2023)Reverse-Mode AD of Multi-Reduce and Scan in FutharkProceedings of the 35th Symposium on Implementation and Application of Functional Languages10.1145/3652561.3652575(1-14)Online publication date: 29-Aug-2023
  • (2023)FMI: Fast and Cheap Message Passing for Serverless FunctionsProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593718(373-385)Online publication date: 21-Jun-2023
  • (2023)Secure Statistical Analysis on Multiple Datasets: Join and Group-ByProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3623119(3298-3312)Online publication date: 15-Nov-2023
  • (2023)Accelerating k-Shape Time Series Clustering Algorithm Using GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.329814834:10(2718-2734)Online publication date: 1-Oct-2023
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media