Relaxing SIMD control flow constraints using loop transformations

Authors:

Reinhard v. Hanxleden,

Ken Kennedy

ACM SIGPLAN Notices, Volume 27, Issue 7

Pages 188 - 199

https://doi.org/10.1145/143103.143133

Published: 01 July 1992

Abstract

Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop whose iteration count varies between iterations of the outer loop. When this kind of loop nest runs on a SIMD machine, the SIMD-inherent restriction to a single program counter common to all processors causes a performance degradation relative to comparable MIMD implementations. This problem is not due to limited parallelism or bad load balance; it is purely a problem of control flow.
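The gap described above can be made concrete with a small step-count model. This is a simplified sketch that is not taken from the paper: the function names, the `trip_counts` layout, and the sample numbers are illustrative assumptions, and it assumes every processor owns the same number of outer iterations and that one inner-loop body costs one step.

```python
def simd_steps(trip_counts):
    """Steps for the unflattened nest on a lockstep SIMD machine.

    trip_counts[p][k] is the inner-loop trip count of the k-th outer
    iteration assigned to processor p (hypothetical layout). All
    processors share one program counter, so each round of outer
    iterations lasts as long as its longest inner loop.
    """
    rounds = zip(*trip_counts)          # group the k-th iteration of every processor
    return sum(max(r) for r in rounds)  # sum of per-round maxima


def mimd_steps(trip_counts):
    """Steps when every processor runs independently (MIMD)."""
    return max(sum(row) for row in trip_counts)  # slowest processor decides


# Assumed sample workload: perfectly balanced (6 inner iterations per
# processor), but the long inner loops fall in different rounds.
counts = [
    [1, 4, 1],
    [4, 1, 1],
    [1, 1, 4],
]

print(simd_steps(counts), mimd_steps(counts))
```

In this sample every processor has the same total work, so the load is balanced, yet the lockstep version needs twice as many steps because each round waits for its longest inner loop.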

This paper presents a loop transformation, which we call loop flattening, that overcomes this limitation by letting each processor advance to the next loop iteration containing useful computation, if there is such an iteration for the given processor. We study a concrete example derived from a molecular dynamics code and compare performance results for flattened and unflattened versions of this kernel on two SIMD machines, the CM-2 and the DECmpp 12000. We then evaluate loop flattening from the compiler's perspective in terms of applicability, cost, profitability, and safety. We conclude by arguing that loop flattening, whether performed by the programmer or by the compiler, introduces negligible overhead and can significantly improve the performance of scientific codes for solving irregular problems.
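The idea of flattening can be sketched as a lockstep simulation. This is a minimal illustration under stated assumptions, not the paper's algorithm or code: the helper names (`nested`, `next_useful`, `flattened_lockstep`) and the sample trip counts are hypothetical, and the loop body is reduced to recording which `(i, j)` pair each processor executes.

```python
def nested(counts):
    """Reference: (i, j) pairs visited by the original loop nest on one processor."""
    return [(i, j) for i, n in enumerate(counts) for j in range(n)]


def next_useful(counts, i):
    """First outer index >= i whose inner loop has at least one iteration."""
    while i < len(counts) and counts[i] == 0:
        i += 1
    return i


def flattened_lockstep(per_proc_counts):
    """Run the flattened loop on all processors in lockstep.

    per_proc_counts[p][i] is the inner trip count of outer iteration i on
    processor p. Returns (steps, traces), where traces[p] lists the
    (i, j) pairs processor p executed, in order.
    """
    # Each processor keeps its own (i, j) position instead of sharing one.
    state = [[next_useful(c, 0), 0] for c in per_proc_counts]
    traces = [[] for _ in per_proc_counts]
    steps = 0
    # One single loop: continue while any processor still has work.
    while any(s[0] < len(c) for s, c in zip(state, per_proc_counts)):
        steps += 1
        for p, (s, c) in enumerate(zip(state, per_proc_counts)):
            i, j = s
            if i >= len(c):
                continue  # this processor is finished and is masked out
            traces[p].append((i, j))  # stands in for the loop body
            j += 1
            if j == c[i]:
                # Inner loop done: move straight to the next useful iteration.
                i, j = next_useful(c, i + 1), 0
            s[0], s[1] = i, j
    return steps, traces


# Assumed sample workload, including empty inner loops.
work = [[1, 4, 1], [4, 1, 1], [0, 2, 4]]
steps, traces = flattened_lockstep(work)
print(steps)
```

Each processor visits exactly the pairs the original nest would visit, in the same order, while the number of lockstep steps drops to the largest per-processor total rather than the sum of per-round maxima.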

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 27, Issue 7

July 1992

352 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/143103

Editor:
Richard Wexelblat
IDA/CDED, Alexandria, VA

PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
July 1992
352 pages
ISBN:0897914759
DOI:10.1145/143095
Chairman:
Stuart I. Feldman
Bell Communications Research, Morristown, NJ
Editor:
Richard L. Wexelblat

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 1992

Published in SIGPLAN Volume 27, Issue 7

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 21
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Dec 2024

Citations

Cited By

Carminati A, Starke R, de Oliveira R (2017). Combining loop unrolling strategies and code predication to reduce the worst-case execution time of real-time software. Applied Computing and Informatics 13:2 (184-193). Online publication date: Jul-2017.
https://doi.org/10.1016/j.aci.2017.03.002
Sanders P (1994). Emulating MIMD Behavior on SIMD Machines. Massively Parallel Processing Applications and Development (313-320). Online publication date: 1994.
https://doi.org/10.1016/B978-0-444-81784-6.50042-7
Philippsen M, Warschko T, Tichy W, Herter C, Heinz E, Lukowicz P (1994). Project Triton: Towards Improved Programmability of Parallel Computers. The Interaction of Compilation Technology and Computer Architecture (249-281). Online publication date: 1994.
https://doi.org/10.1007/978-1-4615-2684-1_10
Mustafa D, Alkhasawneh R, Obeidat F, Shatnawi A (2024). MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access 12 (34354-34377). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3372990
Rocha R, Petoumenos P, Franke B, Bhatotia P, O'Boyle M (2022). Loop Rolling for Code Size Reduction. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (217-229). Online publication date: 2-Apr-2022.
https://doi.org/10.1109/CGO53902.2022.9741256
Hughes C (2015). Single-Instruction Multiple-Data Execution. Synthesis Lectures on Computer Architecture 10:1 (1-121). Online publication date: 27-May-2015.
https://doi.org/10.2200/S00647ED1V01Y201505CAC032
Haine C, Aumage O, Petit E, Barthou D (2015). Exploring and Evaluating Array Layout Restructuring for SIMDization. Languages and Compilers for Parallel Computing (351-366). Online publication date: 1-May-2015.
https://doi.org/10.1007/978-3-319-17473-0_23
Ren B, Mytkowicz T, Agrawal G (2014). A Portable Optimization Engine for Accelerating Irregular Data-Traversal Applications on SIMD Architectures. ACM Transactions on Architecture and Code Optimization 11:2 (1-31). Online publication date: 1-Jun-2014.
https://dl.acm.org/doi/10.1145/2632215
Govindaraju V, Nowatzki T, Sankaralingam K, Fensch C, O'Boyle M, Seznec A, Bodin F (2013). Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques (341-352). Online publication date: 7-Oct-2013.
https://dl.acm.org/doi/10.5555/2523721.2523767
Ren B, Poutanen T, Mytkowicz T, Schulte W, Agrawal G, Larus J (2013). SIMD parallelization of applications that traverse irregular data structures. Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (1-10). Online publication date: 23-Feb-2013.
https://dl.acm.org/doi/10.1109/CGO.2013.6494989