[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1152154.1152189acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
Article

Program generation for the all-pairs shortest path problem

Published: 16 September 2006 Publication History

Abstract

A recent trend in computing are domain-specific program generators, designed to alleviate the effort of porting and reoptimizing libraries for fast-changing and increasingly complex computing platforms. Examples include ATLAS, SPIRAL, and the codelet generator in FFTW. Each of these generators produces highly optimized source code directly from a problem specification. In this paper, we extend this list by a program generator for the well-known Floyd-Warshall (FW) algorithm that solves the all-pairs shortest path problem, which is important in a wide range of engineering applications.As the first contribution, we derive variants of the FW algorithm that make it possible to apply many of the optimization techniques developed for matrix-matrix multiplication. The second contribution is the actual program generator, which uses tiling, loop unrolling, and SIMD vectorization combined with a hill climbing search to produce the best code (float or integer) for a given platform.Using the program generator, we demonstrate a speedup over a straightforward single-precision implementation of up to a factor of 1.3 on Pentium 4 and 1.8 on Athlon 64. Use of 4-way vectorization further improves the performance by another factor of up to 5.7 on Pentium 4 and 3.0 on Athlon 64. For data type short integers, 8-way vectorization provides a speed-up of up to 4.6 on Pentium 4 and 5.0 on Athlon 64 over the best scalar code.

References

[1]
G. Baumgartner, A. Auer, D. E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. J. Harrison, S. Hirata, S. Krishanmoorthy, S. Krishnan, C.-C. Lam, Q. Lu, M. Nooijen, R. M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov. Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models. Proceedings of the IEEE, 93(2):276--292, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
[2]
D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, Upper Saddle River, 1992.
[3]
P. Bientinesi, J. A. Gunnels, M. E. Myers, E. Quintana-Orti, and R. van de Geijn. The science of deriving dense linear algebra algorithms. ACM Transactions on Mathematical Software, 31(1):1--26, March 2005.
[4]
C. Demetrescu and G. F. Italiano. A new approach to dynamic all pairs shortest paths. In STOC '03: Proceedings of the 35th ACM Symposium on Theory of Computing, pages 159--166, New York, NY, USA, 2003. ACM Press.
[5]
J. Demmel, J. J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick. Self adapting linear algebra algorithms and software. Proceedings of the IEEE, 93(2):342--357, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
[6]
S. E. Dreyfus and A. M. Law. The Art and Theory of Dynamic Programming. Academic Press, New York, NY, 1977.
[7]
M. Elkin. Computing almost shortest paths. ACM Transactions on Algorithms, 1(2):283--323, 2005.
[8]
M. Frigo. A fast Fourier transform compiler. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), pages 169--180, 1999.
[9]
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216--231, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
[10]
A. V. Goldberg and C. Harrelson. Computing the shortest path: A search meets graph theory. In SODA '05: Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms, pages 156--165, Philadelphia, PA, USA, 2005.
[11]
J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn. Flame: Formal linear algebra methods environment. TOMS, 27(4):422--455, December 2001.
[12]
E.-J. Im, K. Yelick, and R. Vuduc. Sparsity: Optimization framework for sparse matrix kernels. Int'l Journal of High Performance Computing Applications, 18(1), 2004.
[13]
Intel Corporation. Intel C++ compiler documentation, 2005. Document number 304968-005.
[14]
R. Johnsonbaugh and M. Kalin. A graph generation package. www.depaul.edu/~rjohnson/source.
[15]
J. Moy. OSPF version 2. RFC 2328, April 1998.
[16]
J.-S. Park, M. Penner, and V. K. Prasanna. Optimizing graph algorithms for improved cache performance. IEEE Transactions on Parallel and Distributed Systems, 15(9):769--782, 2004.
[17]
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005. Special issue on "Program Generation, Optimization, and Adaptation".
[18]
G. Venkataraman, S. Sahni, and S. Mukhopadhyaya. A blocked all-pairs shortest-paths algorithm. J. Experimental Algorithms, 8:2.2, 2003.
[19]
R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1-2):3--35, 2001. Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000.
[20]
K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS. Proceedings of the IEEE, 93(2):358--386, 2005. Special issue on "Program Generation, Optimization, and Adaptation".

Cited By

View all
  • (2024)Enhanced OpenMP Algorithm to Compute All-Pairs Shortest Path on X86 ArchitecturesComputer Science – CACIC 202310.1007/978-3-031-62245-8_4(46-61)Online publication date: 23-Jun-2024
  • (2023)Path Merging Based Betweenness Centrality Algorithm in Delay Tolerant NetworksIEEE Journal on Selected Areas in Communications10.1109/JSAC.2023.331007141:10(3133-3145)Online publication date: Oct-2023
  • (2023)Galliot: Path Merging Based Betweenness Centrality Algorithm on GPUIEEE INFOCOM 2023 - IEEE Conference on Computer Communications10.1109/INFOCOM53939.2023.10229018(1-10)Online publication date: 17-May-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques
September 2006
308 pages
ISBN:159593264X
DOI:10.1145/1152154
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Floyd-Warshall algorithm
  2. SIMD vectorization
  3. blocking
  4. empirical search
  5. tiling

Qualifiers

  • Article

Conference

PACT06
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)1
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enhanced OpenMP Algorithm to Compute All-Pairs Shortest Path on X86 ArchitecturesComputer Science – CACIC 202310.1007/978-3-031-62245-8_4(46-61)Online publication date: 23-Jun-2024
  • (2023)Path Merging Based Betweenness Centrality Algorithm in Delay Tolerant NetworksIEEE Journal on Selected Areas in Communications10.1109/JSAC.2023.331007141:10(3133-3145)Online publication date: Oct-2023
  • (2023)Galliot: Path Merging Based Betweenness Centrality Algorithm on GPUIEEE INFOCOM 2023 - IEEE Conference on Computer Communications10.1109/INFOCOM53939.2023.10229018(1-10)Online publication date: 17-May-2023
  • (2023)Network Robustness Improvement Based on Alternative Paths ConsiderationIntelligent Transport Systems10.1007/978-3-031-49379-9_10(179-193)Online publication date: 12-Dec-2023
  • (2021)A Fresh Look at Zones and OctagonsACM Transactions on Programming Languages and Systems10.1145/345788543:3(1-51)Online publication date: 3-Sep-2021
  • (2020)Deriving parametric multi-way recursive divide-and-conquer dynamic programming algorithms using polyhedral compilersProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377916(317-329)Online publication date: 22-Feb-2020
  • (2020)Comparison of the Non-Blocked and Blocked Floyd-Warshall Algorithm with Regard to Speedup and Energy Saving on an Embedded GPU2020 19th International Symposium INFOTEH-JAHORINA (INFOTEH)10.1109/INFOTEH48170.2020.9066330(1-5)Online publication date: Mar-2020
  • (2020)Efficient parallel implementations to compute the diameter of a graphConcurrency and Computation: Practice and Experience10.1002/cpe.596335:11Online publication date: 16-Aug-2020
  • (2019)Efficient GPU Implementations to Compute the Diameter of a Graph2019 Seventh International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2019.00020(102-111)Online publication date: Nov-2019
  • (2018)From Bitcoin to Bitcoin CashProceedings of the 1st Workshop on Cryptocurrencies and Blockchains for Distributed Systems10.1145/3211933.3211947(77-81)Online publication date: 15-Jun-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media