[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3225058.3225117acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Topology-induced Enhancement of Mappings

Published: 13 August 2018 Publication History

Abstract

In this paper we propose a new method to enhance a mapping μ(·) of a parallel application's computational tasks to the processing elements (PEs) of a parallel computer. The idea behind our method TiMEr is to enhance such a mapping by drawing on the observation that many topologies take the form of a partial cube. This class of graphs includes all rectangular and cubic meshes, any such torus with even extensions in each dimension, all hypercubes, and all trees.
Following previous work, we represent the parallel application and the parallel computer by graphs Ga and Gp, respectively. Gp being a partial cube allows us to label its vertices, the PEs, by bitvectors such that the cost of exchanging one unit of information between any two vertices of Gp amounts to the Hamming distance between their labels. By transferring these bitvectors to the vertex set of Ga via μ-1(·) and extending them to be unique on Ga, we can enhance μ(·) by swapping labels on Ga in a new way. Pairs of swapped labels are local w. r. t. the PEs, but not w. r. t. Ga. Moreover, permutations of the bitvectors' entries give rise to a plethora of hierarchies on the PEs. Through these hierarchies we turn TiMEr into a hierarchical method for improving μ(·) that is complementary to state-of-the-art methods for computing μ(·) in the first place.
In our experiments we use TiMEr to enhance mappings of complex networks onto rectangular meshes and tori with 256 and 512 nodes, as well as hypercubes with 256 nodes. It turns out that common quality measure of mappings derived from state-of-the-art tools, such as Scotch and KaHIP, can be improved up to 34%.

References

[1]
Eric Aubanel. 2009. Resource-Aware Load Balancing of Parallel Applications. In Handbook of Research on Grid Technologies and Utility Computing: Concepts for Managing Large-Scale Applications, Emmanuel Udoh and Frank Zhigang Wang (Eds.). Information Science Reference - Imprint of: IGI Publishing, 12--21.
[2]
C. Bichot and P. Siarry (Eds.). 2011. Graph Partitioning. Wiley.
[3]
B. Brandfass, T. Alrutz, and T. Gerhold. 2013. Rank Reordering for MPI Communication Optimization. Computers & Fluids 80, 0 (2013), 372--380.
[4]
Aydin Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz. 2016. Recent Advances in Graph Partitioning. In Algorithm Engineering -Selected Results and Surveys, Lasse Kliemann and Peter Sanders (Eds.). Lecture Notes in Computer Science, Vol. 9220. 117--158.
[5]
Siew Yin Chan, Teck Chaw Ling, and Eric Aubanel. 2012. The Impact of Heterogeneous Multi-Core Clusters on Graph Partitioning: An Empirical Study. Cluster Computing 15, 3 (2012), 281--302.
[6]
Cédric Chevalier and Ilya Safro. 2009. Comparison of Coarsening Schemes for Multilevel Graph Partitioning. In Learning and Intelligent Optimization, Third International Conference, LION 3, Trento, Italy, January 14-18, 2009. Selected Papers (Lecture Notes in Computer Science), Thomas Stützle (Ed.), Vol. 5851. Springer, 191--205.
[7]
C. Clauss, S. Lankes, P. Reble, and T. Bemmerl. 2011. Evaluation and improvements of programming models for the Intel SCC many-core processor. In 2011 International Conference on High Performance Computing Simulation. 525--532.
[8]
M. Deveci, K. Kaya, B. Uçar, and U. V. Catalyurek. 2015. Fast and High Quality Topology-Aware Task Mapping. In 2015 IEEE International Parallel and Distributed Processing Symposium. 197--206.
[9]
D. Eppstein. 2011. Recognizing Partial Cubes in Quadratic Time. J. Graph Algorithms Appl. 15, 2 (2011), 269--293.
[10]
Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co.
[11]
R. Glantz, H. Meyerhenke, and A. Noe. 2015. Algorithms for Mapping Parallel Processes onto Grid and Torus Architectures. In 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2015, Turku, Finland. 236--243.
[12]
T. Hoefler and M. Snir. 2011. Generic Topology Mapping Strategies for Large-scale Parallel Architectures. In ACM International Conference on Supercomputing (ICS'11). ACM, 75--85.
[13]
W. Imrich. 1993. A simple O(mn) algorithm for recognizing Hamming graphs. Bull. Inst. Comb. Appl (1993), 45--56.
[14]
E. Jeannot, G. Mercier, and F. Tessier. 2013. Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques. IEEE Transactions on Parallel and Distributed Systems PP, 99 (2013), 1--1.
[15]
G. Karypis and V. Kumar. 1998. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359--392.
[16]
B. W. Kernighan and S. Lin. 1970. An efficient heuristic procedure for partitioning graphs. Bell Systems Technical Journal 49 (1970), 291--307.
[17]
Young Man Kim and Ten-Hwang Lai. 1991. The Complexity of Congestion-1 Embedding in a Hypercube. Journal of Algorithms 12, 2 (1991), 246--280.
[18]
Shad Kirmani, JeongHyung Park, and Padma Raghavan. 2017. An embedded sectioning scheme for multiprocessor topology-aware mapping of irregular applications. IJHPCA 31, 1 (2017), 91--103.
[19]
Henning Meyerhenke, Burkhard Monien, and Thomas Sauerwald. 2009. A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions. J. Parallel and Distributed Computing 69, 9 (2009), 750--761.
[20]
Vitaly Osipov and Peter Sanders. 2010. n-Level Graph Partitioning. In Algorithms - ESA 2010, 18th Annual European Symposium, Liverpool, UK, September 6-8, 2010. Proceedings, Part I (Lecture Notes in Computer Science), Mark de Berg and Ulrich Meyer (Eds.), Vol. 6346. Springer, 278--289.
[21]
S. Ovchinnikov. 2008. Partial cubes: Structures, characterizations, and constructions. Discrete Mathematics 308 (2008), 5597--5621.
[22]
François Pellegrini. 1994. Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs. In Scalable High-Performance Computing Conference (SHPCC). IEEE, 486--493.
[23]
François Pellegrini. 2007. Scotch and libScotch 5.0 User's Guide. Technical Report. LaBRI, Université Bordeaux I.
[24]
François Pellegrini. 2011. Static Mapping of Process Graphs. In Graph Partitioning, Charles-Edmond Bichot and Patrick Siarry (Eds.). John Wiley & Sons, Chapter 5, 115--136.
[25]
Xinyu Que, Fabio Checconi, Fabrizio Petrini, and John A. Gunnels. 2015. Scalable Community Detection with the Louvain Algorithm. In 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, Hyderabad, India, May 25-29, 2015. IEEE Computer Society, 28--37.
[26]
I. Safro, P. Sanders, and Ch. Schulz. 2012. Advanced Coarsening Schemes for Graph Partitioning. In Proc. 11th Int. Symp. on Experimental Algorithms. Springer, 369--380.
[27]
P. Sanders and C. Schulz. 2013. High Quality Graph Partitioning. In Proc. of the 10th DIMACS Impl. Challenge Workshop: Graph Partitioning and Graph Clustering. AMS, 1--17.
[28]
Peter Sanders and Christian Schulz. 2013. KaHIP v0.53 - Karlsruhe High Quality Partitioning - User Guide. CoRR abs/1311.1714 (2013).
[29]
C. Schulz and J. L. Träff. 2017. Better Process Mapping and Sparse Quadratic Assignment. CoRR abs/1702.04164 (2017).
[30]
Roman Trobec and Gregor Kosec. 2015. Parallel Scientific Computing. Theory, Algorithms, and Applications of Mesh Based and Meshless Methods. Springer Intl. Publ.
[31]
Bora Ucar, Cevdet Aykanat, Kamer Kaya, and Murat Ikinci. 2006. Task Assignment in Heterogeneous Computing Systems. J. Parallel and Distrib. Comput. 66, 1 (2006), 32--46.
[32]
Chris Walshaw and Mark Cross. 2001. Multilevel Mesh Partitioning for Heterogeneous Communication Networks. Future Generation Comp. Syst. 17, 5 (2001), 601--623.
[33]
Hao Yu, I-Hsin Chung, and Jose Moreira. 2006. Topology Mapping for Blue Gene/L Supercomputer. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06). ACM, New York, NY, USA.

Cited By

View all
  • (2021)An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical ArchitecturesEuro-Par 2021: Parallel Processing10.1007/978-3-030-85665-6_11(167-182)Online publication date: 1-Sep-2021
  • (2020)Distributing Sparse Matrix/Graph Applications in Heterogeneous Clusters - an Experimental Study2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00021(72-81)Online publication date: Dec-2020

Index Terms

  1. Topology-induced Enhancement of Mappings

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
    August 2018
    945 pages
    ISBN:9781450365109
    DOI:10.1145/3225058
    © 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    In-Cooperation

    • University of Oregon: University of Oregon

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 August 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Hamming distance
    2. multi-hierarchical mapping
    3. parallel communication optimization
    4. partial cube

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP 2018

    Acceptance Rates

    ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
    Overall Acceptance Rate 91 of 313 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 15 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2021)An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical ArchitecturesEuro-Par 2021: Parallel Processing10.1007/978-3-030-85665-6_11(167-182)Online publication date: 1-Sep-2021
    • (2020)Distributing Sparse Matrix/Graph Applications in Heterogeneous Clusters - an Experimental Study2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00021(72-81)Online publication date: Dec-2020

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media