More Web Proxy on the site http://driver.im/

research-article

Network-Oblivious Algorithms

Authors:

Gianfranco Bilardi,

Andrea Pietracaprina,

Michele Scquizzato,

Francesco SilvestriAuthors Info & Claims

Journal of the ACM (JACM), Volume 63, Issue 1

Article No.: 3, Pages 1 - 36

https://doi.org/10.1145/2812804

Published: 30 March 2016 Publication History

Abstract

A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem’s input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the decomposable bulk synchronous parallel model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed.

References

[1]

Alok Aggarwal, Bowen Alpern, Ashok K. Chandra, and Marc Snir. 1987. A model for hierarchical memory. In Proceedings of the 19th ACM Symposium on Theory of Computing (STOC’87). 305--314.

Digital Library

[2]

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1987. Hierarchical memory with block transfer. In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science (FOCS’87). 204--216.

Digital Library

[3]

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. 1990. Communication complexity of PRAMs. Theoretical Computer Science 71, 1, 3--28.

Digital Library

[4]

Alok Aggarwal and Jeffrey S. Vitter. 1988. The input/output complexity of sorting and related problems. Communications of the ACM 31, 9, 1116--1127.

Digital Library

[5]

Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz. 2012. Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’12). 77--79.

Digital Library

[6]

Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2011. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications 32, 3, 866--901.

[7]

Armin Bäumker, Wolfgang Dittrich, and Friedhelm Meyer auf der Heide. 1998. Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model. Theoretical Computer Science 203, 2, 175--203.

Digital Library

[8]

Michael A. Bender, Roozbeh Ebrahimi, Jeremy T. Fineman, Golnaz Ghasemiesfeh, Rob Johnson, and Samuel McCauley. 2014. Cache-adaptive algorithms. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’14). 958--971.

Digital Library

[9]

Sandeep N. Bhatt, Gianfranco Bilardi, and Geppino Pucci. 2008. Area-time tradeoffs for universal VLSI circuits. Theoretical Computer Science 408, 2--3, 143--150.

Digital Library

[10]

Gianfranco Bilardi and Enoch Peserico. 2001. A characterization of temporal locality and its portability across memory hierarchies. In Proceedings of the 28th International Colloquium on Automata, Languages, and Programming (ICALP’01). 128--139.

Digital Library

[11]

Gianfranco Bilardi and Andrea Pietracaprina. 2011. Theoretical models of computation. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 1150--1158.

[12]

Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 1999. A quantitative measure of portability with application to bandwidth-latency models for parallel computing. In Proceedings of the 5th International Euro-Par Conference on Parallel Processing (Euro-Par’99). 543--551.

Digital Library

[13]

Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. 2007a. Decomposable BSP: A bandwidth-latency model for parallel and hierarchical computation. In Handbook of Parallel Computing: Models, Algorithms and Applications, J. Reif and S. Rajasekaran (Eds.). CRC Press, Boca Raton, FL, 277--315.

[14]

Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. 2007b. Network-oblivious algorithms. In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS’07). 1--10.

[15]

Gianfranco Bilardi and Franco Preparata. 1995. Horizons of parallel computation. Journal of Parallel and Distributed Computing 27, 2, 172--182.

Digital Library

[16]

Gianfranco Bilardi and Franco Preparata. 1997. Processor-time tradeoffs under bounded-speed message propagation: Part I, upper bounds. Theory of Computing Systems 30, 6, 523--546.

Digital Library

[17]

Gianfranco Bilardi and Franco Preparata. 1999. Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Theory of Computing Systems 32, 5, 531--559.

[18]

Gianfranco Bilardi and Geppino Pucci. 2011. Universality in VLSI computation. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 2112--2118.

[19]

Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2011. Scheduling irregular parallel computations on hierarchical caches. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). 355--366.

Digital Library

[20]

Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2010. Low depth cache-oblivious algorithms. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’10). 189--199.

Digital Library

[21]

Gerth S. Brodal and Rolf Fagerberg. 2003. On the limits of cache-obliviousness. In Proceedings of the 35th ACM Symposium on Theory of Computing (STOC’03). 307--315.

Digital Library

[22]

Rezaul A. Chowdhury and Vijaya Ramachandran. 2008. Cache-efficient dynamic programming algorithms for multicores. In Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’08). 207--216.

Digital Library

[23]

Rezaul A. Chowdhury, Vijaya Ramachandran, Francesco Silvestri, and Brandon Blakeley. 2013. Oblivious algorithms for multicores and networks of processors. Journal of Parallel and Distributed Computing 73, 7, 911--925.

[24]

Richard Cole and Vijaya Ramachandran. 2010. Resource oblivious sorting on multicores. In Proceedings of the 37th International Colloquium on Automata, Languages, and Programming (ICALP’10). 226--237.

Digital Library

[25]

Richard Cole and Vijaya Ramachandran. 2012a. Efficient resource oblivious algorithms for multicores with false sharing. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS’12). 201--214.

Digital Library

[26]

Richard Cole and Vijaya Ramachandran. 2012b. Revisiting the cache miss analysis of multithreaded algorithms. In Proceedings of the 10th Latin American Theoretical Informatics Symposium (LATIN’12). 172--183.

Digital Library

[27]

David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. 1996. LogP: A practical model of parallel computation. Communications of the ACM 39, 11, 78--85.

Digital Library

[28]

Pilar de la Torre and Clyde P. Kruskal. 1996. Submachine locality in the bulk synchronous setting. In Proceedings of the 2nd International Euro-Par Conference on Parallel Processing (Euro-Par’96). 352--358.

Digital Library

[29]

James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Ben Lipshitz, Oded Schwartz, and Omer Spillinger. 2013. Communication-optimal parallel recursive rectangular matrix multiplication. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS’13). 261--272.

Digital Library

[30]

Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 2012. Cache-oblivious algorithms. ACM Transactions on Algorithms 8, 1, Article No. 4.

Digital Library

[31]

Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations. In Proceedings of the 19th International Conference on Supercomputing (ICS’05). 361--366.

Digital Library

[32]

Matteo Frigo and Volker Strumpen. 2009. The cache complexity of multithreaded cache oblivious algorithms. Theory of Computing Systems 45, 2, 203--233.

Digital Library

[33]

Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. 1999. Can a shared-memory model serve as a bridging model for parallel computation? Theory of Computing Systems 32, 3, 327--359.

[34]

Kieran T. Herley. 2011. Network obliviousness. In Encyclopedia of Parallel Computing, D. A. Padua (Ed.). Springer, 1298--1303.

Digital Library

[35]

Dror Irony, Sivan Toledo, and Alexandre Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing 64, 9, 1017--1026.

Digital Library

[36]

Joseph JáJá. 1992. An Introduction to Parallel Algorithms. Addison Wesley Longman.

Digital Library

[37]

Ben H. H. Juurlink and Harry A. G. Wijshoff. 1998. A quantitative comparison of parallel computation models. ACM Transactions on Computer Systems 16, 3, 271--318.

Digital Library

[38]

Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’10). 938--948.

Digital Library

[39]

Leslie Robert Kerr. 1970. The Effect of Algebraic Structure on the Computational Complexity of Matrix Multiplication. Ph.D. Dissertation. Cornell University.

[40]

Frank T. Leighton. 1985. Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers 34, 4, 344--354.

Digital Library

[41]

Frank T. Leighton. 1992. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann.

Digital Library

[42]

Charles E. Leiserson. 1985. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers 34, 10, 892--901.

Digital Library

[43]

Charles E. Leiserson and Bruce M. Maggs. 1988. Communication-efficient parallel algorithms for distributed random-access machines. Algorithmica 3, 1--4, 53--77.

Digital Library

[44]

Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, and Eli Upfal. 2012. Space-round tradeoffs for MapReduce computations. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). 235--244.

Digital Library

[45]

Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. 2006. Cache-oblivious simulation of parallel programs. In Proceedings of the 8th IEEE IPDPS Workshop on Advances in Parallel and Distributed Computational Models (APDCM’06). 1--8.

Digital Library

[46]

John E. Savage. 1998. Models of Computation: Exploring the Power of Computing. Addison Wesley Longman.

Digital Library

[47]

Michele Scquizzato and Francesco Silvestri. 2014. Communication lower bounds for distributed-memory computations. In Proceedings of the 31st Symposium on Theoretical Aspects of Computer Science (STACS’14). 627--638.

[48]

Francesco Silvestri. 2006. On the limits of cache-oblivious matrix transposition. In Proceedings of the 2nd Symposium on Trustworthy Global Computing (TGC’06). 233--243.

Digital Library

[49]

Francesco Silvestri. 2008. On the limits of cache-oblivious rational permutations. Theoretical Computer Science 402, 2--3, 221--233.

Digital Library

[50]

Harsha Vardhan Simhadri, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Aapo Kyrola. 2014. Experimental analysis of space-bounded schedulers. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’14). 30--41.

Digital Library

[51]

Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The Pochoir stencil compiler. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). 117--128.

Digital Library

[52]

Yuan Tang, Ronghui You, Haibin Kan, Jesmin Jahan Tithi, Pramod Ganapathi, and Rezaul A. Chowdhury. 2015. Cache-oblivious wavefront: Improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). 205--214.

Digital Library

[53]

Alexandre Tiskin. 1998. The bulk-synchronous parallel random access machine. Theoretical Computer Science 196, 1--2, 109--130.

Digital Library

[54]

Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103--111.

Digital Library

[55]

Leslie G. Valiant. 2011. A bridging model for multi-core computing. Journal of Computer and System Sciences 77, 1, 154--166.

Digital Library

Cited By

Tang YGao W(2021)Processor-Aware Cache-Oblivious Algorithms✱Proceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472506(1-10)Online publication date: 9-Aug-2021
https://dl.acm.org/doi/10.1145/3472456.3472506
Tang YScheideler CSpear M(2020)Balanced Partitioning of Several Cache-Oblivious AlgorithmsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400214(575-577)Online publication date: 6-Jul-2020
https://dl.acm.org/doi/10.1145/3350755.3400214
Bilardi GScquizzato MSilvestri F(2018)A Lower Bound Technique for Communication in BSPACM Transactions on Parallel Computing10.1145/31817764:3(1-27)Online publication date: 20-Feb-2018
https://dl.acm.org/doi/10.1145/3181776
Show More Cited By

Index Terms

Network-Oblivious Algorithms
1. Theory of computation
  1. Design and analysis of algorithms
  2. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Low depth cache-oblivious algorithms
SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (...
Parallelizing Subroutines in Sequential Programs

An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran ...
Pipelined Data Parallel Algorithms-I: Concept and Modeling

The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM

Journal of the ACM Volume 63, Issue 1

March 2016

353 pages

ISSN:0004-5411

EISSN:1557-735X

DOI:10.1145/2893450

Editor:
Éva Tardos
Cornell University

Issue’s Table of Contents

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 March 2016

Accepted: 01 August 2015

Revised: 01 July 2015

Received: 01 April 2014

Published in JACM Volume 63, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

University of Padova
European Research Council
MIUR of Italy under project AMANDA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
518
Total Downloads

Downloads (Last 12 months)13
Downloads (Last 6 weeks)0

Reflects downloads up to 17 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Tang YGao W(2021)Processor-Aware Cache-Oblivious Algorithms✱Proceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472506(1-10)Online publication date: 9-Aug-2021
https://dl.acm.org/doi/10.1145/3472456.3472506
Tang YScheideler CSpear M(2020)Balanced Partitioning of Several Cache-Oblivious AlgorithmsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400214(575-577)Online publication date: 6-Jul-2020
https://dl.acm.org/doi/10.1145/3350755.3400214
Bilardi GScquizzato MSilvestri F(2018)A Lower Bound Technique for Communication in BSPACM Transactions on Parallel Computing10.1145/31817764:3(1-27)Online publication date: 20-Feb-2018
https://dl.acm.org/doi/10.1145/3181776
Cole RRamachandran VScheideler CHajiaghayi M(2017)Bounding Cache Miss Costs of Multithreaded Computations Under General SchedulersProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087572(351-362)Online publication date: 24-Jul-2017
https://dl.acm.org/doi/10.1145/3087556.3087572
Bender MBerry JHammond SScott Hemmert KMcCauley SMoore BMoseley BPhillips CResnick DRodrigues A(2017)Two-level main memory co-designJournal of Parallel and Distributed Computing10.1016/j.jpdc.2016.12.009102:C(213-228)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1016/j.jpdc.2016.12.009

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents