[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Public Access

A Lower Bound Technique for Communication in BSP

Published: 20 February 2018 Publication History

Abstract

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This article presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM.

References

[1]
Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1990. Communication complexity of PRAMs. Theoret. Comput. Sci. 71, 1 (1990), 3--28.
[2]
Alok Aggarwal and Jeffrey S. Vitter. 1988. The input/output complexity of sorting and related problems. Comm. ACM 31, 9 (1988), 1116--1127.
[3]
Miklós Ajtai, János Komlós, and Endre Szemerédi. 1983. Sorting in c log n parallel steps. Combinatorica 3, 1 (1983), 1--19.
[4]
Grey Ballard, James Demmel, Andrew Gearhart, Ben Lipshitz, Yishai Oltchik, Oded Schwartz, and Sivan Toledo. 2016. Network topologies and inevitable contention. In Proceedings of the 1st International Workshop on Communication Optimizations in HPC (COMHPC’16). 39--52.
[5]
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2011. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. Appl. 32, 3 (2011), 866--901.
[6]
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2012. Graph expansion and communication costs of fast matrix multiplication. J. ACM 59, 6 (2012), Article 32.
[7]
Kenneth E. Batcher. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference, Vol. 32. 307--314.
[8]
Armin Bäumker, Wolfgang Dittrich, and Friedhelm Meyer auf der Heide. 1998. Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model. Theoret. Comput. Sci. 203, 2 (1998), 175--203.
[9]
Václav E. Beneš. 1964. Permutation groups, complexes, and rearrangeable connecting networks. Bell System Tech. J. 43 (1964), 1619--1640.
[10]
Sandeep N. Bhatt, Gianfranco Bilardi, and Geppino Pucci. 2008. Area-time tradeoffs for universal VLSI circuits. Theoret. Comput. Sci. 408, 2--3 (2008), 143--150.
[11]
Gianfranco Bilardi. 1989. Merging and sorting networks with the topology of the omega network. IEEE Trans. Comput. 38, 10 (1989), 1396--1403.
[12]
Gianfranco Bilardi and Carlo Fantozzi. 2011. New area-time lower bounds for the multidimensional DFT. In Proceedings of the 17th Computing: The Australasian Theory Symposium (CATS’11). 111--120.
[13]
Gianfranco Bilardi, Andrea Pietracaprina, and Paolo D’Alberto. 2000. On the space and access complexity of computation DAGs. In Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science (WG’00). 47--58.
[14]
Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 2007. Decomposable BSP: A bandwidth-latency model for parallel and hierarchical computation. In Handbook of Parallel Computing: Models, Algorithms and Applications. CRC Press, 277--315.
[15]
Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, Michele Scquizzato, and Francesco Silvestri. 2016. Network-oblivious algorithms. J. ACM 63, 1, Article 3 (2016).
[16]
Gianfranco Bilardi and Franco Preparata. 1986. Area-time lower-bound techniques with applications to sorting. Algorithmica 1, 1 (1986), 65--91.
[17]
Gianfranco Bilardi and Franco Preparata. 1999. Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Theory Comput. Syst. 32, 5 (1999), 531--559.
[18]
Gianfranco Bilardi, Michele Scquizzato, and Francesco Silvestri. 2012. A lower bound technique for communication on BSP with application to the FFT. In Proceedings of the 18th International European Conference on Parallel and Distributed Computing (Euro-Par’12). 676--687.
[19]
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, and Julian Shun. 2016. Efficient algorithms with asymmetric read and write costs. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA’16). 14:1--14:18.
[20]
Rezaul A. Chowdhury, Vijaya Ramachandran, Francesco Silvestri, and Brandon Blakeley. 2013. Oblivious algorithms for multicores and networks of processors. J. Parallel Distrib. Comput. 73, 7 (2013), 911--925.
[21]
Richard Cole and Vijaya Ramachandran. 2012. Efficient resource oblivious algorithms for multicores with false sharing. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS’12). 201--214.
[22]
Richard Cole and Vijaya Ramachandran. 2017. Resource oblivious sorting on multicores. ACM Trans. Parallel Comput. 3, 4, Article 23 (2017).
[23]
James W. Cooley and John W. Tukey. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19 (1965), 297--301.
[24]
David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. 1996. LogP: A practical model of parallel computation. Commun. ACM 39, 11 (1996), 78--85.
[25]
Pilar de la Torre and Clyde P. Kruskal. 1996. Submachine locality in the bulk synchronous setting. In Proceedings of the 2nd International Conference on Parallel Processing (Euro-Par’96). 352--358.
[26]
Martin Dowd, Yehoshua Perl, Larry Rudolph, and Michael Saks. 1989. The periodic balanced sorting network. J. ACM 36, 4 (1989), 738--757.
[27]
Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 2012. Cache-oblivious algorithms. ACM Trans. Algorithms 8, 1, Article 4 (2012).
[28]
Michael T. Goodrich. 1999. Communication-efficient parallel sorting. SIAM J. Comput. 29, 2 (1999), 416--432.
[29]
Philip Hall. 1935. On representatives of subsets. J. London Math. Soc. 10, 1 (1935), 26--30.
[30]
Frederick C. Hennie. 1965. One-tape, off-line Turing machine computations. Info. Control 8, 6 (1965), 553--578.
[31]
Jia-Wei Hong and H. T. Kung. 1981. I/O complexity: The red-blue pebble game. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing (STOC’81). 326--333.
[32]
John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. 1977. On time versus space. J. ACM 24, 2 (1977), 332--337.
[33]
Dror Irony, Sivan Toledo, and Alexandre Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64, 9 (2004), 1017--1026.
[34]
Ben H. H. Juurlink and Harry A. G. Wijshoff. 1998. A quantitative comparison of parallel computation models. ACM Trans. Comput. Syst. 16, 3 (1998), 271--318.
[35]
Donald E. Knuth. 1973. The Art of Computer Programming, Volume III: Sorting and Searching (2nd ed.). Addison-Wesley.
[36]
Richard R. Koch, Frank T. Leighton, Bruce M. Maggs, Satish B. Rao, Arnold L. Rosenberg, and Eric J. Schwabe. 1997. Work-preserving emulations of fixed-connection networks. J. ACM 44, 1 (1997), 104--147.
[37]
Duncan H. Lawrie. 1975. Access and alignment of data in an array processor. IEEE Trans. Comput. C-24, 12 (1975), 1145--1155.
[38]
Frank T. Leighton. 1992. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann.
[39]
Philip D. MacKenzie and Vijaya Ramachandran. 1998. Computational bounds for fundamental problems on general-purpose parallel models. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98). 152--163.
[40]
Christos H. Papadimitriou and Jeffrey D. Ullman. 1987. A communication-time tradeoff. SIAM J. Comput. 16, 4 (1987), 639--646.
[41]
Michael S. Paterson and Carl E. Hewitt. 1970. Comparative schematology. In Proceedings of the Project MAC Conference on Concurrent Systems and Parallel Computation. 119--127.
[42]
Desh Ranjan, John Savage, and Mohammad Zubair. 2011. Strong I/O lower bounds for binomial and FFT computation graphs. In Proceedings of the 17th Annual International Conference on Computing and Combinatorics (COCOON’11). 134--145.
[43]
John E. Savage. 1995. Extending the hong-kung model to memory hierarchies. In Proceedings of the 1st Annual International Conference on Computing and Combinatorics (COCOON’95). 270--281.
[44]
John E. Savage. 1998. Models of Computation: Exploring the Power of Computing. Addison-Wesley Longman Publishing Co., Inc.
[45]
Michele Scquizzato and Francesco Silvestri. 2014. Communication lower bounds for distributed-memory computations. In Proceedings of the 31st Symposium on Theoretical Aspects of Computer Science (STACS’14). 627--638.
[46]
David C. Thompson. 1980. A Complexity Theory for VLSI. Ph.D. Dissertation. Carnegie Mellon University.
[47]
Alexandre Tiskin. 1998. The bulk-synchronous parallel random access machine. Theoret. Comput. Sci. 196, 1--2 (1998), 109--130.
[48]
Alexander Tiskin. 2011. BSP (bulk synchronous parallelism). In Encyclopedia of Parallel Computing. Springer, 192--199.
[49]
Leslie G. Valiant. 1990. A bridging model for parallel computation. Comm. ACM 33, 8 (1990), 103--111.
[50]
Jeffrey Scott Vitter. 2006. Algorithms and data structures for external memory. Found. Trends Theoret. Comput. Sci. 2, 4 (2006), 305--474.
[51]
Jean Vuillemin. 1983. A combinatorial limit to the computing power of VLSI circuits. IEEE Trans. Comput. 32, 3 (1983), 294--300.
[52]
Chuan-Lin Wu and Tse-Yun Feng. 1981. The universality of the shuffle-exchange network. IEEE Trans. Comput. 30, 5 (1981), 324--332.
[53]
Andrew Chi-Chih Yao. 1979. Some complexity questions related to distributive computing. In Proceedings of the 11th Annual ACM Symposium on Theory of Computing (STOC’79). 209--213.

Cited By

View all
  • (2024)Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offsProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660269(285-287)Online publication date: 17-Jun-2024
  • (2024)Tightening I/O Lower Bounds through the Hourglass Dependency PatternProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659986(183-193)Online publication date: 17-Jun-2024
  • (2022)Equivalence classes and conditional hardness in massively parallel computationsDistributed Computing10.1007/s00446-021-00418-235:2(165-183)Online publication date: 1-Apr-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Parallel Computing
ACM Transactions on Parallel Computing  Volume 4, Issue 3
September 2017
120 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3175004
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2018
Accepted: 01 November 2017
Revised: 01 October 2017
Received: 01 January 2015
Published in TOPC Volume 4, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Communication
  2. FFT
  3. lower bounds
  4. parallel computing
  5. sorting networks
  6. switching networks

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • MIUR of Italy
  • ERC
  • NSF
  • University of Padova

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)81
  • Downloads (Last 6 weeks)20
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offsProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660269(285-287)Online publication date: 17-Jun-2024
  • (2024)Tightening I/O Lower Bounds through the Hourglass Dependency PatternProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659986(183-193)Online publication date: 17-Jun-2024
  • (2022)Equivalence classes and conditional hardness in massively parallel computationsDistributed Computing10.1007/s00446-021-00418-235:2(165-183)Online publication date: 1-Apr-2022
  • (2019)Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2019.00058(482-490)Online publication date: May-2019

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media