[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Network-Oblivious Algorithms

Published: 30 March 2016 Publication History

Abstract

A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem’s input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the decomposable bulk synchronous parallel model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed.

References

[1]
Alok Aggarwal, Bowen Alpern, Ashok K. Chandra, and Marc Snir. 1987. A model for hierarchical memory. In Proceedings of the 19th ACM Symposium on Theory of Computing (STOC’87). 305--314.
[2]
Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1987. Hierarchical memory with block transfer. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science (FOCS’87). 204--216.
[3]
Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1990. Communication complexity of PRAMs. Theoretical Computer Science 71, 1, 3--28.
[4]
Alok Aggarwal and Jeffrey S. Vitter. 1988. The input/output complexity of sorting and related problems. Communications of the ACM 31, 9, 1116--1127.
[5]
Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. 2012. Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’12). 77--79.
[6]
Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2011. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications 32, 3, 866--901.
[7]
Armin Bäumker, Wolfgang Dittrich, and Friedhelm Meyer auf der Heide. 1998. Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model. Theoretical Computer Science 203, 2, 175--203.
[8]
Michael A. Bender, Roozbeh Ebrahimi, Jeremy T. Fineman, Golnaz Ghasemiesfeh, Rob Johnson, and Samuel McCauley. 2014. Cache-adaptive algorithms. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’14). 958--971.
[9]
Sandeep N. Bhatt, Gianfranco Bilardi, and Geppino Pucci. 2008. Area-time tradeoffs for universal VLSI circuits. Theoretical Computer Science 408, 2--3, 143--150.
[10]
Gianfranco Bilardi and Enoch Peserico. 2001. A characterization of temporal locality and its portability across memory hierarchies. In Proceedings of the 28th International Colloquium on Automata, Languages, and Programming (ICALP’01). 128--139.
[11]
Gianfranco Bilardi and Andrea Pietracaprina. 2011. Theoretical models of computation. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 1150--1158.
[12]
Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 1999. A quantitative measure of portability with application to bandwidth-latency models for parallel computing. In Proceedings of the 5th International Euro-Par Conference on Parallel Processing (Euro-Par’99). 543--551.
[13]
Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 2007a. Decomposable BSP: A bandwidth-latency model for parallel and hierarchical computation. In Handbook of Parallel Computing: Models, Algorithms and Applications, J. Reif and S. Rajasekaran (Eds.). CRC Press, Boca Raton, FL, 277--315.
[14]
Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. 2007b. Network-oblivious algorithms. In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS’07). 1--10.
[15]
Gianfranco Bilardi and Franco Preparata. 1995. Horizons of parallel computation. Journal of Parallel and Distributed Computing 27, 2, 172--182.
[16]
Gianfranco Bilardi and Franco Preparata. 1997. Processor-time tradeoffs under bounded-speed message propagation: Part I, upper bounds. Theory of Computing Systems 30, 6, 523--546.
[17]
Gianfranco Bilardi and Franco Preparata. 1999. Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Theory of Computing Systems 32, 5, 531--559.
[18]
Gianfranco Bilardi and Geppino Pucci. 2011. Universality in VLSI computation. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 2112--2118.
[19]
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2011. Scheduling irregular parallel computations on hierarchical caches. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). 355--366.
[20]
Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2010. Low depth cache-oblivious algorithms. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’10). 189--199.
[21]
Gerth S. Brodal and Rolf Fagerberg. 2003. On the limits of cache-obliviousness. In Proceedings of the 35th ACM Symposium on Theory of Computing (STOC’03). 307--315.
[22]
Rezaul A. Chowdhury and Vijaya Ramachandran. 2008. Cache-efficient dynamic programming algorithms for multicores. In Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’08). 207--216.
[23]
Rezaul A. Chowdhury, Vijaya Ramachandran, Francesco Silvestri, and Brandon Blakeley. 2013. Oblivious algorithms for multicores and networks of processors. Journal of Parallel and Distributed Computing 73, 7, 911--925.
[24]
Richard Cole and Vijaya Ramachandran. 2010. Resource oblivious sorting on multicores. In Proceedings of the 37th International Colloquium on Automata, Languages, and Programming (ICALP’10). 226--237.
[25]
Richard Cole and Vijaya Ramachandran. 2012a. Efficient resource oblivious algorithms for multicores with false sharing. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS’12). 201--214.
[26]
Richard Cole and Vijaya Ramachandran. 2012b. Revisiting the cache miss analysis of multithreaded algorithms. In Proceedings of the 10th Latin American Theoretical Informatics Symposium (LATIN’12). 172--183.
[27]
David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. 1996. LogP: A practical model of parallel computation. Communications of the ACM 39, 11, 78--85.
[28]
Pilar de la Torre and Clyde P. Kruskal. 1996. Submachine locality in the bulk synchronous setting. In Proceedings of the 2nd International Euro-Par Conference on Parallel Processing (Euro-Par’96). 352--358.
[29]
James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Ben Lipshitz, Oded Schwartz, and Omer Spillinger. 2013. Communication-optimal parallel recursive rectangular matrix multiplication. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS’13). 261--272.
[30]
Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 2012. Cache-oblivious algorithms. ACM Transactions on Algorithms 8, 1, Article No. 4.
[31]
Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations. In Proceedings of the 19th International Conference on Supercomputing (ICS’05). 361--366.
[32]
Matteo Frigo and Volker Strumpen. 2009. The cache complexity of multithreaded cache oblivious algorithms. Theory of Computing Systems 45, 2, 203--233.
[33]
Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. 1999. Can a shared-memory model serve as a bridging model for parallel computation? Theory of Computing Systems 32, 3, 327--359.
[34]
Kieran T. Herley. 2011. Network obliviousness. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 1298--1303.
[35]
Dror Irony, Sivan Toledo, and Alexandre Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing 64, 9, 1017--1026.
[36]
Joseph JáJá. 1992. An Introduction to Parallel Algorithms. Addison Wesley Longman.
[37]
Ben H. H. Juurlink and Harry A. G. Wijshoff. 1998. A quantitative comparison of parallel computation models. ACM Transactions on Computer Systems 16, 3, 271--318.
[38]
Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’10). 938--948.
[39]
Leslie Robert Kerr. 1970. The Effect of Algebraic Structure on the Computational Complexity of Matrix Multiplication. Ph.D. Dissertation. Cornell University.
[40]
Frank T. Leighton. 1985. Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers 34, 4, 344--354.
[41]
Frank T. Leighton. 1992. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann.
[42]
Charles E. Leiserson. 1985. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers 34, 10, 892--901.
[43]
Charles E. Leiserson and Bruce M. Maggs. 1988. Communication-efficient parallel algorithms for distributed random-access machines. Algorithmica 3, 1--4, 53--77.
[44]
Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, and Eli Upfal. 2012. Space-round tradeoffs for MapReduce computations. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). 235--244.
[45]
Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. 2006. Cache-oblivious simulation of parallel programs. In Proceedings of the 8th IEEE IPDPS Workshop on Advances in Parallel and Distributed Computational Models (APDCM’06). 1--8.
[46]
John E. Savage. 1998. Models of Computation: Exploring the Power of Computing. Addison Wesley Longman.
[47]
Michele Scquizzato and Francesco Silvestri. 2014. Communication lower bounds for distributed-memory computations. In Proceedings of the 31st Symposium on Theoretical Aspects of Computer Science (STACS’14). 627--638.
[48]
Francesco Silvestri. 2006. On the limits of cache-oblivious matrix transposition. In Proceedings of the 2nd Symposium on Trustworthy Global Computing (TGC’06). 233--243.
[49]
Francesco Silvestri. 2008. On the limits of cache-oblivious rational permutations. Theoretical Computer Science 402, 2--3, 221--233.
[50]
Harsha Vardhan Simhadri, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Aapo Kyrola. 2014. Experimental analysis of space-bounded schedulers. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’14). 30--41.
[51]
Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The Pochoir stencil compiler. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). 117--128.
[52]
Yuan Tang, Ronghui You, Haibin Kan, Jesmin Jahan Tithi, Pramod Ganapathi, and Rezaul A. Chowdhury. 2015. Cache-oblivious wavefront: Improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). 205--214.
[53]
Alexandre Tiskin. 1998. The bulk-synchronous parallel random access machine. Theoretical Computer Science 196, 1--2, 109--130.
[54]
Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103--111.
[55]
Leslie G. Valiant. 2011. A bridging model for multi-core computing. Journal of Computer and System Sciences 77, 1, 154--166.

Cited By

View all
  • (2021)Processor-Aware Cache-Oblivious Algorithms✱Proceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472506(1-10)Online publication date: 9-Aug-2021
  • (2020)Balanced Partitioning of Several Cache-Oblivious AlgorithmsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400214(575-577)Online publication date: 6-Jul-2020
  • (2018)A Lower Bound Technique for Communication in BSPACM Transactions on Parallel Computing10.1145/31817764:3(1-27)Online publication date: 20-Feb-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 63, Issue 1
March 2016
353 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/2893450
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 March 2016
Accepted: 01 August 2015
Revised: 01 July 2015
Received: 01 April 2014
Published in JACM Volume 63, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Communication
  2. models of computation
  3. network locality
  4. oblivious algorithms
  5. parallel algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • University of Padova
  • European Research Council
  • MIUR of Italy under project AMANDA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Processor-Aware Cache-Oblivious Algorithms✱Proceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472506(1-10)Online publication date: 9-Aug-2021
  • (2020)Balanced Partitioning of Several Cache-Oblivious AlgorithmsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400214(575-577)Online publication date: 6-Jul-2020
  • (2018)A Lower Bound Technique for Communication in BSPACM Transactions on Parallel Computing10.1145/31817764:3(1-27)Online publication date: 20-Feb-2018
  • (2017)Bounding Cache Miss Costs of Multithreaded Computations Under General SchedulersProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087572(351-362)Online publication date: 24-Jul-2017
  • (2017)Two-level main memory co-designJournal of Parallel and Distributed Computing10.1016/j.jpdc.2016.12.009102:C(213-228)Online publication date: 1-Apr-2017

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media