More Web Proxy on the site http://driver.im/

research-article

Efficient SIMD code generation for irregular kernels

Authors:

Hwansoo HanAuthors Info & Claims

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

Pages 55 - 64

https://doi.org/10.1145/2145816.2145824

Published: 25 February 2012 Publication History

Abstract

Array indirection causes several challenges for compilers to utilize single instruction, multiple data (SIMD) instructions. Disjoint memory references, arbitrarily misaligned memory references, and dependence cycles in loops are main challenges to handle for SIMD compilers. Due to those challenges, existing SIMD compilers have excluded loops with array indirection from their candidate loops for SIMD vectorization. However, addressing those challenges is inevitable, since many important compute-intensive applications extensively use array indirection to reduce memory and computation requirements. In this work, we propose a method to generate efficient SIMD code for loops containing indirected memory references. We extract both inter- and intra-iteration parallelism, taking data reorganization overhead into consideration. We also optimally place data reorganization code in order to amortize the reorganization overhead through the performance gain of SIMD vectorization. Experiments on four array indirection kernels, which are extracted from real-world scientific applications, show that our proposed method effectively generates SIMD code for irregular kernels with array indirection. Compared to the existing SIMD vectorization methods, our proposed method significantly improves the performance of irregular kernels by 91%, on average.

References

[1]

R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, 2002.

Digital Library

[2]

R. Barik, J. Zhao, and V. Sarkar. Efficient selection of vector instructions using dynamic programming. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pages 201--212, 2010.

Digital Library

[3]

H. Chang and W. Sung. Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware. In Proceedings of the 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES '08, pages 167--176, 2008.

Digital Library

[4]

R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. J. Parallel Distrib. Comput., 22: 462--478, Sep. 1994.

Digital Library

[5]

K. Diefendorff, P. K. Dubey, R. Hochsprung, and H. Scales. AltiVec extension to PowerPC accelerates media processing. IEEE Micro, 20: 85--95, Mar./Apr. 2000.

Digital Library

[6]

A. E. Eichenberger, P. Wu, and K. O'Brien. Vectorization for SIMD architectures with alignment constraints. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, PLDI '04, pages 82--93, 2004.

Digital Library

[7]

T. Grosser, H. Zheng, R. A, A. Simburger, A. Grosslinger, and L.-N. Pouchet. Polly - polyhedral optimization in llvm. In First International Workshop on Polyhedral Compilation Techniques (IMPACT'11), 2011.

[8]

M. Gschwind, H. P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in Cell's multicore architecture. IEEE Micro, 26: 10--24, Mar. 2006.

Digital Library

[9]

J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34: 1--17, Sep. 2006.

Digital Library

[10]

A. Krall and S. Lelait. Compilation techniques for multimedia processors. Int. J. Parallel Program., 28: 347--361, Aug. 2000.

[11]

S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pages 145--156, 2000.

Digital Library

[12]

S. Larsen, R. Rabbah, and S. Amarasinghe. Exploiting vector parallelism in software pipelined loops. In Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, MICRO 38, pages 119--129, 2005.

Digital Library

[13]

C. Lattner. Macroscopic Data Structure Analysis and Optimization. PhD thesis, Computer Science Dept., University of Illinois at Urbana-Champaign, Urbana, IL, May 2005. {online} http://llvm.cs.uiuc.edu.

Digital Library

[14]

C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), Palo Alto, California, Mar 2004.

Digital Library

[15]

R. Leupers. Code selection for media processors with SIMD instructions. In Proceedings of the conference on Design, Automation and Test in Europe, DATE '00, pages 4--8, 2000.

Digital Library

[16]

D. Naishlos, M. Biberstein, S. Ben-David, and A. Zaks. Vectorizing for a SIMdD DSP architecture. In Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES '03, pages 2--11, 2003.

Digital Library

[17]

D. Nuzman, I. Rosen, and A. Zaks. Auto-vectorization of interleaved data for SIMD. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '06, pages 132--143, 2006.

Digital Library

[18]

I. Pryanishnikov, A. Krall, T. U. Wien, and N. Horspool. Pointer alignment analysis for processors with SIMD instructions. In Proceedings of the 5th Workshop on Media and Streaming Processors, pages 50--57, 2003.

[19]

G. Ren, P. Wu, and D. Padua. A preliminary study on the vectorization of multimedia applications for multimedia extensions. In Languages and Compilers for Parallel Computing, volume 2958 of Lecture Notes in Computer Science, pages 420--435. 2004.

[20]

G. Ren, P. Wu, and D. Padua. Optimizing data permutations for SIMD devices. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '06, pages 118--131, 2006.

Digital Library

[21]

I. Rosen, D. Nuzman, and A. Zaks. Loop-aware SLP in GCC. In Proceedings of GCC Developers' Summit, pages 131--142, 2007.

[22]

J. Shalf, S. Dosanjh, and J. Morrison. Exascale computing technology challenges. In Proc. International Meeting on High Performance Computing for Computational Science, volume 6449 of Lecture Notes in Computer Science, pages 1--25, 2011.

Digital Library

[23]

N. Sreraman and R. Govindarajan. A vectorizing compiler for multimedia extensions. Int. J. Parallel Program., 28: 363--400, Aug. 2000.

[24]

R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1 (2): 146--160, 1972.

Digital Library

[25]

D. Walls. How to use the restrict qualifier in C. Sun Microsystems, Sun Developer Network (SDN), March 2006. {online} http://developers.sun.com/.

[26]

P. Wu, A. E. Eichenberger, A. Wang, and P. Zhao. An integrated simdization framework using virtual vectors. In Proceedings of the 19th annual International Conference on Supercomputing, ICS '05, pages 169--178, 2005.

Digital Library

Cited By

Liu BLaird ATsang WMahjour BDehnavi MKloeckner AMoreira J(2022)Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-VectorizationProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569663(439-450)Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569663
Jiang PXia YAgrawal G(2020)Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative SpeculationACM Transactions on Parallel Computing10.1145/33997147:3(1-26)Online publication date: 21-Jun-2020
https://dl.acm.org/doi/10.1145/3399714
Liu YHong DWu JFu SHsu W(2019)Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary TranslationACM Transactions on Architecture and Code Optimization10.1145/330148816:1(1-24)Online publication date: 13-Feb-2019
https://dl.acm.org/doi/10.1145/3301488
Show More Cited By

Index Terms

Efficient SIMD code generation for irregular kernels
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Efficient SIMD code generation for irregular kernels
PPOPP '12

Array indirection causes several challenges for compilers to utilize single instruction, multiple data (SIMD) instructions. Disjoint memory references, arbitrarily misaligned memory references, and dependence cycles in loops are main challenges to ...
Distributed memory code generation for mixed Irregular/Regular computations
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Many applications feature a mix of irregular and regular computational structures. For example, codes using adaptive mesh refinement (AMR) typically use a collection of regular blocks, where the number of blocks and the relationship between blocks is ...
Distributed memory code generation for mixed Irregular/Regular computations
PPoPP '15

Many applications feature a mix of irregular and regular computational structures. For example, codes using adaptive mesh refinement (AMR) typically use a collection of regular blocks, where the number of blocks and the relationship between blocks is ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

February 2012

352 pages

ISBN:9781450311601

DOI:10.1145/2145816

General Chair:
J. Ramanujam
Louisiana State University, USA
,
Program Chair:
P. Sadayappan
The Ohio State University, USA

ACM SIGPLAN Notices Volume 47, Issue 8
PPOPP '12
August 2012
334 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2370036
Issue’s Table of Contents

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 February 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PPoPP '12

Sponsor:

SIGPLAN

PPoPP '12: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 25 - 29, 2012

Louisiana, New Orleans, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

44
Total Citations
View Citations
901
Total Downloads

Downloads (Last 12 months)30
Downloads (Last 6 weeks)1

Reflects downloads up to 19 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Liu BLaird ATsang WMahjour BDehnavi MKloeckner AMoreira J(2022)Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-VectorizationProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569663(439-450)Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569663
Jiang PXia YAgrawal G(2020)Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative SpeculationACM Transactions on Parallel Computing10.1145/33997147:3(1-26)Online publication date: 21-Jun-2020
https://dl.acm.org/doi/10.1145/3399714
Liu YHong DWu JFu SHsu W(2019)Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary TranslationACM Transactions on Architecture and Code Optimization10.1145/330148816:1(1-24)Online publication date: 13-Feb-2019
https://dl.acm.org/doi/10.1145/3301488
Pohl ACosenza BJuurlink B(2018)Control Flow Vectorization for ARM NEONProceedings of the 21st International Workshop on Software and Compilers for Embedded Systems10.1145/3207719.3207721(66-75)Online publication date: 28-May-2018
https://dl.acm.org/doi/10.1145/3207719.3207721
GAO WHAN LZHAO RLI YLIU J(2017)Insufficient Vectorization: A New Method to Exploit Superword Level ParallelismIEICE Transactions on Information and Systems10.1587/transinf.2016EDP7236E100.D:1(91-106)Online publication date: 2017
https://doi.org/10.1587/transinf.2016EDP7236
Dai DZhang WChen Y(2017)POSTERACM SIGPLAN Notices10.1145/3155284.301903752:8(439-440)Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3019037
Awan AHamidouche KHashmi JPanda D(2017)S-CaffeACM SIGPLAN Notices10.1145/3155284.301876952:8(193-205)Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3018769
Ren BKrishnamoorthy SAgrawal KKulkarni M(2017)Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel ProgramsACM SIGPLAN Notices10.1145/3155284.301876352:8(117-130)Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3018763
Acar UBen-David NRainey M(2017)Contention in Structured ConcurrencyACM SIGPLAN Notices10.1145/3155284.301876252:8(75-88)Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3018762
Jiang PAgrawal G(2017)Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative SpeculationACM SIGPLAN Notices10.1145/3155284.301876052:8(179-191)Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3018760
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents