Article

Vapor SIMD: Auto-vectorize once, run everywhere

Authors:

Kevin Williams,

Ayal Zaks

CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Pages 151 - 160

Published: 02 April 2011

Abstract

Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage. Our method leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms. We demonstrate the effectiveness of our design using a set of kernels that exercise innermost loop, outer loop, as well as straight-line code vectorization, all automatically extracted by the common offline compilation stage. This results in performance comparable to that provided by specialized monolithic offline compilers. Our framework is implemented using open-source tools and standards, thereby promoting interoperability and extendibility.
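The split between a generic offline stage and a lightweight online stage can be illustrated with a small sketch. This is not the bytecode format or API described in the paper; the `Target` class, `lanes_for`, and `run_vector_add` are hypothetical names introduced only for illustration. The assumption is that the loop shape (a vector body plus a scalar epilogue) is fixed once, independently of the target, and that the only decision left to the online stage is how many lanes the target's vector register holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a SIMD target, as seen by the online stage."""
    name: str
    vector_bits: int  # 0 models a target without SIMD support


def lanes_for(target, elem_bits):
    """Online decision: number of elements per vector register (at least 1)."""
    return max(1, target.vector_bits // elem_bits)


def run_vector_add(a, b, target, elem_bits=32):
    """Compute c[i] = a[i] + b[i] with a width-agnostic loop structure.

    The loop shape stands in for what an offline stage would decide once;
    only the lane count `vl` is chosen per target.
    """
    n = len(a)
    vl = lanes_for(target, elem_bits)
    c = [0] * n
    i = 0
    # Vector body: each iteration models one SIMD add over `vl` lanes.
    while i + vl <= n:
        c[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    # Scalar epilogue for the elements that do not fill a whole vector.
    while i < n:
        c[i] = a[i] + b[i]
        i += 1
    return c


if __name__ == "__main__":
    a = list(range(10))
    b = [10 * x for x in a]
    for t in (Target("scalar", 0), Target("128-bit", 128), Target("256-bit", 256)):
        print(t.name, lanes_for(t, 32), run_vector_add(a, b, t))
```

Every target produces the same result; only the number of elements processed per iteration of the vector body differs. A real online stage would also have to handle alignment and target-specific idioms, which this sketch leaves out.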

Information

Published In

CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

April 2011, 324 pages

ISBN: 9781612843568

Sponsors: SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher: IEEE Computer Society, United States

Qualifiers: Article

Acceptance Rates

CGO '11 Paper Acceptance Rate: 28 of 105 submissions, 27%

Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 31
Total Downloads: 318
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 1

Reflects downloads up to 02 Mar 2025

Cited By

Thuerck D, Weber N, Bifulco R (2021). Flynn’s Reconciliation. ACM Transactions on Architecture and Code Optimization 18:3 (1-26). Online publication date: 8-Jun-2021.
https://dl.acm.org/doi/10.1145/3458357
Şuşu A (2020). A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad Memory. ACM Transactions on Embedded Computing Systems 19:6 (1-30). Online publication date: 3-Oct-2020.
https://dl.acm.org/doi/10.1145/3406536
Latifis I, Parashar K, Dimitroulakos G, Cappelle H, Lezos C, Masselos K, Catthoor F (2020). A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems 19:6 (1-27). Online publication date: 3-Oct-2020.
https://dl.acm.org/doi/10.1145/3391898
Zhou R, Wort G, Erdős M, Jones T, Sartor J, Naik M, Rossbach C (2019). The janus triad: exploiting parallelism through dynamic binary modification. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (88-100). Online publication date: 14-Apr-2019.
https://dl.acm.org/doi/10.1145/3313808.3313812
Şuşu A (2019). Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor. Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing (1-8). Online publication date: 16-Feb-2019.
https://dl.acm.org/doi/10.1145/3303117.3306166
Liu Y, Hong D, Wu J, Fu S, Hsu W (2019). Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 16:1 (1-24). Online publication date: 13-Feb-2019.
https://dl.acm.org/doi/10.1145/3301488
Pérard-Gayot A, Membarth R, Slusallek P, Moll S, Leißa R, Hack S (2018). A Data Layout Transformation for Vectorizing Compilers. Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing (1-8). Online publication date: 24-Feb-2018.
https://dl.acm.org/doi/10.1145/3178433.3178440
Hong D, Liu Y, Fu S, Wu J, Hsu W (2018). Improving SIMD Parallelism via Dynamic Binary Translation. ACM Transactions on Embedded Computing Systems 17:3 (1-27). Online publication date: 12-Feb-2018.
https://dl.acm.org/doi/10.1145/3173456
Spampinato D, Fabregat-Traver D, Bientinesi P, Püschel M, Knoop J, Schordan M, Johnson T, O'Boyle M (2018). Program generation for small-scale linear algebra applications. Proceedings of the 2018 International Symposium on Code Generation and Optimization (327-339). Online publication date: 24-Feb-2018.
https://dl.acm.org/doi/10.1145/3168812
Manilov S, Franke B, Magrath A, Andrieu C (2016). Free Rider. ACM Transactions on Embedded Computing Systems 16:2 (1-24). Online publication date: 12-Dec-2016.
https://dl.acm.org/doi/10.1145/2990194
