DOI: 10.5555/2190025.2190062
Article

Vapor SIMD: Auto-vectorize once, run everywhere

Published: 02 April 2011

Abstract

Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage. Our method leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms. We demonstrate the effectiveness of our design using a set of kernels that exercise innermost loop, outer loop, as well as straight-line code vectorization, all automatically extracted by the common offline compilation stage. This results in performance comparable to that provided by specialized monolithic offline compilers. Our framework is implemented using open-source tools and standards, thereby promoting interoperability and extendibility.
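
As a rough illustration of the two-stage split described in the abstract, the following Python sketch separates a one-time, target-independent vectorization step from a cheap per-target lowering step. It is not the paper's implementation: `VectorLoop`, `Target`, `offline_vectorize`, and `online_lower` are hypothetical names, the "bytecode" is just a Python object, and SIMD lanes are simulated with list slices.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VectorLoop:
    """Hypothetical target-independent output of the offline stage."""
    op: Callable[[float, float], float]  # elementwise operation of the loop body


@dataclass(frozen=True)
class Target:
    """Hypothetical target description consumed by the online stage."""
    name: str
    lanes: int  # elements per SIMD register; 1 models a target without SIMD


def offline_vectorize(op: Callable[[float, float], float]) -> VectorLoop:
    # Assumption: the expensive analyses (dependence testing, idiom
    # recognition) would run here, once, without knowing the vector width.
    return VectorLoop(op=op)


def online_lower(loop: VectorLoop, target: Target) -> Callable[[List[float], List[float]], List[float]]:
    # Lightweight step: only the vector width is fixed here; no analysis is redone.
    if target.lanes < 1:
        raise ValueError("a target needs at least one lane")
    lanes = target.lanes

    def kernel(a: List[float], b: List[float]) -> List[float]:
        n = len(a)
        out = [0.0] * n
        main = n - n % lanes  # part of the iteration space covered by full vectors
        for i in range(0, main, lanes):
            # One simulated vector operation processes `lanes` elements.
            out[i:i + lanes] = [loop.op(x, y)
                                for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        for i in range(main, n):
            # Scalar epilogue for the leftover elements.
            out[i] = loop.op(a[i], b[i])
        return out

    return kernel
```

Calling `offline_vectorize` once and then `online_lower` with, say, `Target("narrow", 4)` and `Target("wide", 8)` yields two kernels from the same intermediate result. Real targets also differ in alignment rules and special idioms, which this sketch leaves out.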



    Information

    Published In

    CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
    April 2011
    324 pages
    ISBN: 9781612843568

    Publisher

    IEEE Computer Society, United States


    Qualifiers

    • Article

    Acceptance Rates

    CGO '11 Paper Acceptance Rate: 28 of 105 submissions, 27%
    Overall Acceptance Rate: 312 of 1,061 submissions, 29%


    Cited By

    • (2021) Flynn’s Reconciliation. ACM Transactions on Architecture and Code Optimization 18:3 (1-26). DOI: 10.1145/3458357. Online publication date: 8-Jun-2021
    • (2020) A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad Memory. ACM Transactions on Embedded Computing Systems 19:6 (1-30). DOI: 10.1145/3406536. Online publication date: 3-Oct-2020
    • (2020) A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems 19:6 (1-27). DOI: 10.1145/3391898. Online publication date: 3-Oct-2020
    • (2019) The janus triad: exploiting parallelism through dynamic binary modification. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (88-100). DOI: 10.1145/3313808.3313812. Online publication date: 14-Apr-2019
    • (2019) Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor. Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing (1-8). DOI: 10.1145/3303117.3306166. Online publication date: 16-Feb-2019
    • (2019) Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 16:1 (1-24). DOI: 10.1145/3301488. Online publication date: 13-Feb-2019
    • (2018) A Data Layout Transformation for Vectorizing Compilers. Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing (1-8). DOI: 10.1145/3178433.3178440. Online publication date: 24-Feb-2018
    • (2018) Improving SIMD Parallelism via Dynamic Binary Translation. ACM Transactions on Embedded Computing Systems 17:3 (1-27). DOI: 10.1145/3173456. Online publication date: 12-Feb-2018
    • (2018) Program generation for small-scale linear algebra applications. Proceedings of the 2018 International Symposium on Code Generation and Optimization (327-339). DOI: 10.1145/3168812. Online publication date: 24-Feb-2018
    • (2016) Free Rider. ACM Transactions on Embedded Computing Systems 16:2 (1-24). DOI: 10.1145/2990194. Online publication date: 12-Dec-2016
