More Web Proxy on the site http://driver.im/

research-article

Public Access

A Lower Bound Technique for Communication in BSP

Authors:

Gianfranco Bilardi,

Michele Scquizzato,

Francesco SilvestriAuthors Info & Claims

ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 3

Article No.: 14, Pages 1 - 27

https://doi.org/10.1145/3181776

Published: 20 February 2018 Publication History

Abstract

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This article presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM.

References

[1]

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1990. Communication complexity of PRAMs. Theoret. Comput. Sci. 71, 1 (1990), 3--28.

Digital Library

[2]

Alok Aggarwal and Jeffrey S. Vitter. 1988. The input/output complexity of sorting and related problems. Comm. ACM 31, 9 (1988), 1116--1127.

Digital Library

[3]

Miklós Ajtai, János Komlós, and Endre Szemerédi. 1983. Sorting in c log n parallel steps. Combinatorica 3, 1 (1983), 1--19.

Digital Library

[4]

Grey Ballard, James Demmel, Andrew Gearhart, Ben Lipshitz, Yishai Oltchik, Oded Schwartz, and Sivan Toledo. 2016. Network topologies and inevitable contention. In Proceedings of the 1st International Workshop on Communication Optimizations in HPC (COMHPC’16). 39--52.

Digital Library

[5]

Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2011. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. Appl. 32, 3 (2011), 866--901.

[6]

Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2012. Graph expansion and communication costs of fast matrix multiplication. J. ACM 59, 6 (2012), Article 32.

Digital Library

[7]

Kenneth E. Batcher. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference, Vol. 32. 307--314.

Digital Library

[8]

Armin Bäumker, Wolfgang Dittrich, and Friedhelm Meyer auf der Heide. 1998. Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model. Theoret. Comput. Sci. 203, 2 (1998), 175--203.

Digital Library

[9]

Václav E. Beneš. 1964. Permutation groups, complexes, and rearrangeable connecting networks. Bell System Tech. J. 43 (1964), 1619--1640.

[10]

Sandeep N. Bhatt, Gianfranco Bilardi, and Geppino Pucci. 2008. Area-time tradeoffs for universal VLSI circuits. Theoret. Comput. Sci. 408, 2--3 (2008), 143--150.

Digital Library

[11]

Gianfranco Bilardi. 1989. Merging and sorting networks with the topology of the omega network. IEEE Trans. Comput. 38, 10 (1989), 1396--1403.

Digital Library

[12]

Gianfranco Bilardi and Carlo Fantozzi. 2011. New area-time lower bounds for the multidimensional DFT. In Proceedings of the 17th Computing: The Australasian Theory Symposium (CATS’11). 111--120.

Digital Library

[13]

Gianfranco Bilardi, Andrea Pietracaprina, and Paolo D’Alberto. 2000. On the space and access complexity of computation DAGs. In Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science (WG’00). 47--58.

Digital Library

[14]

Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 2007. Decomposable BSP: A bandwidth-latency model for parallel and hierarchical computation. In Handbook of Parallel Computing: Models, Algorithms and Applications. CRC Press, 277--315.

[15]

Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, Michele Scquizzato, and Francesco Silvestri. 2016. Network-oblivious algorithms. J. ACM 63, 1, Article 3 (2016).

Digital Library

[16]

Gianfranco Bilardi and Franco Preparata. 1986. Area-time lower-bound techniques with applications to sorting. Algorithmica 1, 1 (1986), 65--91.

Digital Library

[17]

Gianfranco Bilardi and Franco Preparata. 1999. Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Theory Comput. Syst. 32, 5 (1999), 531--559.

[18]

Gianfranco Bilardi, Michele Scquizzato, and Francesco Silvestri. 2012. A lower bound technique for communication on BSP with application to the FFT. In Proceedings of the 18th International European Conference on Parallel and Distributed Computing (Euro-Par’12). 676--687.

Digital Library

[19]

Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, and Julian Shun. 2016. Efficient algorithms with asymmetric read and write costs. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA’16). 14:1--14:18.

[20]

Rezaul A. Chowdhury, Vijaya Ramachandran, Francesco Silvestri, and Brandon Blakeley. 2013. Oblivious algorithms for multicores and networks of processors. J. Parallel Distrib. Comput. 73, 7 (2013), 911--925.

[21]

Richard Cole and Vijaya Ramachandran. 2012. Efficient resource oblivious algorithms for multicores with false sharing. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS’12). 201--214.

Digital Library

[22]

Richard Cole and Vijaya Ramachandran. 2017. Resource oblivious sorting on multicores. ACM Trans. Parallel Comput. 3, 4, Article 23 (2017).

Digital Library

[23]

James W. Cooley and John W. Tukey. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19 (1965), 297--301.

[24]

David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. 1996. LogP: A practical model of parallel computation. Commun. ACM 39, 11 (1996), 78--85.

Digital Library

[25]

Pilar de la Torre and Clyde P. Kruskal. 1996. Submachine locality in the bulk synchronous setting. In Proceedings of the 2nd International Conference on Parallel Processing (Euro-Par’96). 352--358.

Digital Library

[26]

Martin Dowd, Yehoshua Perl, Larry Rudolph, and Michael Saks. 1989. The periodic balanced sorting network. J. ACM 36, 4 (1989), 738--757.

Digital Library

[27]

Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 2012. Cache-oblivious algorithms. ACM Trans. Algorithms 8, 1, Article 4 (2012).

Digital Library

[28]

Michael T. Goodrich. 1999. Communication-efficient parallel sorting. SIAM J. Comput. 29, 2 (1999), 416--432.

Digital Library

[29]

Philip Hall. 1935. On representatives of subsets. J. London Math. Soc. 10, 1 (1935), 26--30.

[30]

Frederick C. Hennie. 1965. One-tape, off-line Turing machine computations. Info. Control 8, 6 (1965), 553--578.

[31]

Jia-Wei Hong and H. T. Kung. 1981. I/O complexity: The red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing (STOC’81). 326--333.

Digital Library

[32]

John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. 1977. On time versus space. J. ACM 24, 2 (1977), 332--337.

Digital Library

[33]

Dror Irony, Sivan Toledo, and Alexandre Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64, 9 (2004), 1017--1026.

Digital Library

[34]

Ben H. H. Juurlink and Harry A. G. Wijshoff. 1998. A quantitative comparison of parallel computation models. ACM Trans. Comput. Syst. 16, 3 (1998), 271--318.

Digital Library

[35]

Donald E. Knuth. 1973. The Art of Computer Programming, Volume III: Sorting and Searching (2nd ed.). Addison-Wesley.

[36]

Richard R. Koch, Frank T. Leighton, Bruce M. Maggs, Satish B. Rao, Arnold L. Rosenberg, and Eric J. Schwabe. 1997. Work-preserving emulations of fixed-connection networks. J. ACM 44, 1 (1997), 104--147.

Digital Library

[37]

Duncan H. Lawrie. 1975. Access and alignment of data in an array processor. IEEE Trans. Comput. C-24, 12 (1975), 1145--1155.

Digital Library

[38]

Frank T. Leighton. 1992. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann.

Digital Library

[39]

Philip D. MacKenzie and Vijaya Ramachandran. 1998. Computational bounds for fundamental problems on general-purpose parallel models. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98). 152--163.

Digital Library

[40]

Christos H. Papadimitriou and Jeffrey D. Ullman. 1987. A communication-time tradeoff. SIAM J. Comput. 16, 4 (1987), 639--646.

Digital Library

[41]

Michael S. Paterson and Carl E. Hewitt. 1970. Comparative schematology. In Proceedings of the Project MAC Conference on Concurrent Systems and Parallel Computation. 119--127.

Digital Library

[42]

Desh Ranjan, John Savage, and Mohammad Zubair. 2011. Strong I/O lower bounds for binomial and FFT computation graphs. In Proceedings of the 17th Annual International Conference on Computing and Combinatorics (COCOON’11). 134--145.

Digital Library

[43]

John E. Savage. 1995. Extending the hong-kung model to memory hierarchies. In Proceedings of the 1st Annual International Conference on Computing and Combinatorics (COCOON’95). 270--281.

Digital Library

[44]

John E. Savage. 1998. Models of Computation: Exploring the Power of Computing. Addison-Wesley Longman Publishing Co., Inc.

Digital Library

[45]

Michele Scquizzato and Francesco Silvestri. 2014. Communication lower bounds for distributed-memory computations. In Proceedings of the 31st Symposium on Theoretical Aspects of Computer Science (STACS’14). 627--638.

[46]

David C. Thompson. 1980. A Complexity Theory for VLSI. Ph.D. Dissertation. Carnegie Mellon University.

Digital Library

[47]

Alexandre Tiskin. 1998. The bulk-synchronous parallel random access machine. Theoret. Comput. Sci. 196, 1--2 (1998), 109--130.

Digital Library

[48]

Alexander Tiskin. 2011. BSP (bulk synchronous parallelism). In Encyclopedia of Parallel Computing. Springer, 192--199.

[49]

Leslie G. Valiant. 1990. A bridging model for parallel computation. Comm. ACM 33, 8 (1990), 103--111.

Digital Library

[50]

Jeffrey Scott Vitter. 2006. Algorithms and data structures for external memory. Found. Trends Theoret. Comput. Sci. 2, 4 (2006), 305--474.

Digital Library

[51]

Jean Vuillemin. 1983. A combinatorial limit to the computing power of VLSI circuits. IEEE Trans. Comput. 32, 3 (1983), 294--300.

Digital Library

[52]

Chuan-Lin Wu and Tse-Yun Feng. 1981. The universality of the shuffle-exchange network. IEEE Trans. Comput. 30, 5 (1981), 324--332.

Digital Library

[53]

Andrew Chi-Chih Yao. 1979. Some complexity questions related to distributive computing. In Proceedings of the 11th Annual ACM Symposium on Theory of Computing (STOC’79). 209--213.

Digital Library

Cited By

Böhnlein TPapp PYzelman AAgrawal KPetrank E(2024)Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offsProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660269(285-287)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3660269
Eyraud-Dubois LIooss GLangou JRastello FAgrawal KPetrank E(2024)Tightening I/O Lower Bounds through the Hourglass Dependency PatternProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659986(183-193)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659986
Nanongkai DScquizzato M(2022)Equivalence classes and conditional hardness in massively parallel computationsDistributed Computing10.1007/s00446-021-00418-235:2(165-183)Online publication date: 1-Apr-2022
https://dl.acm.org/doi/10.1007/s00446-021-00418-2
Show More Cited By

Index Terms

A Lower Bound Technique for Communication in BSP
1. Theory of computation
  1. Design and analysis of algorithms
  2. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Minimizing Communication in the Bitonic Sort

This paper presents bitonic sorting schemes for special-purpose parallel architectures such as sorting networks and for general-purpose parallel architectures such as SIMD and/or MIMD computers. First, bitonic sorting algorithms for shared-memory SIMD ...
Property Testing Lower Bounds via Communication Complexity
Selected papers from the 26th Annual IEEE Conference on Computational Complexity (CCC 2011)

We develop a new technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity. We give a simple scheme for reducing communication problems to testing problems, thus allowing us to ...
A lower bound technique for communication on BSP with application to the FFT
Euro-Par'12: Proceedings of the 18th international conference on Parallel Processing

Communication complexity is defined, within the Bulk Synchronous Parallel (BSP) model of computation, as the sum of the degrees of all the supersteps. A lower bound to the communication complexity is derived for a given class of DAG computations in ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Parallel Computing

ACM Transactions on Parallel Computing Volume 4, Issue 3

September 2017

120 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3175004

Editor:
Phillip B. Gibbons
Carnegie Mellon University, Pittsburgh, USA

Issue’s Table of Contents

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2018

Accepted: 01 November 2017

Revised: 01 October 2017

Received: 01 January 2015

Published in TOPC Volume 4, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

MIUR of Italy
ERC
NSF
University of Padova

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
428
Total Downloads

Downloads (Last 12 months)81
Downloads (Last 6 weeks)20

Reflects downloads up to 17 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Böhnlein TPapp PYzelman AAgrawal KPetrank E(2024)Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offsProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660269(285-287)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3660269
Eyraud-Dubois LIooss GLangou JRastello FAgrawal KPetrank E(2024)Tightening I/O Lower Bounds through the Hourglass Dependency PatternProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659986(183-193)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659986
Nanongkai DScquizzato M(2022)Equivalence classes and conditional hardness in massively parallel computationsDistributed Computing10.1007/s00446-021-00418-235:2(165-183)Online publication date: 1-Apr-2022
https://dl.acm.org/doi/10.1007/s00446-021-00418-2
Nissim RSchwartz O(2019)Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2019.00058(482-490)Online publication date: May-2019
https://doi.org/10.1109/IPDPS.2019.00058

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents