DOI: 10.1145/3078633.3081029
research-article

Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions

Published: 21 June 2017

Abstract

A growing number of modern processors support non-contiguous SIMD data accesses. However, the translation of such instructions has been largely overlooked in Dynamic Binary Translation (DBT). For example, the popular QEMU dynamic binary translator emulates strided guest memory instructions with a sequence of scalar instructions, leaving significant room for performance improvement when the host machine provides SIMD instructions. Structured loads/stores, such as VLDn/VSTn in ARM NEON, are one type of strided SIMD data access instruction (for example, VLD3 de-interleaves packed RGB pixels into separate R, G and B registers). They are widely used in signal processing, multimedia, mathematical and 2D matrix transposition applications. Efficient translation of structured loads/stores is therefore critical when migrating ARM executables to other ISAs. It is also challenging: not only is the translation of structured loads/stores itself non-trivial, but the differences between guest and host register configurations must also be taken into account. In this work, we present the design and implementation of structured load/store translation in DBT, covering both target code generation and efficient SIMD register mapping. Our register mapping mechanisms are not limited to structured loads/stores; they can be extended to ordinary SIMD instructions. On a set of OpenCV benchmarks, our QEMU-based system achieves a maximum speedup of 5.41x and an average speedup of 2.93x. On a set of BLAS benchmarks, it achieves a maximum speedup of 2.19x and an average speedup of 1.63x.
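To make the notion of a structured load concrete, the following Python sketch models the semantics of an ARM NEON VLDn/VSTn pair as plain per-element loops, roughly the kind of scalar sequence the abstract describes QEMU emitting for strided guest accesses. The function names `vld_n`/`vst_n` and the list-of-lanes register model are hypothetical illustrations; they ignore element width, alignment, addressing modes and base-register writeback, and are neither the paper's translator nor QEMU's code.

```python
from typing import List, Sequence


def vld_n(mem: Sequence[int], n: int, lanes: int) -> List[List[int]]:
    """Hypothetical scalar model of a NEON VLDn-style structured load.

    Reads n * lanes consecutive elements and de-interleaves them: element k of
    structure i is written to lane i of register k. Element width, alignment,
    addressing modes and writeback are deliberately not modelled.
    """
    if n < 1 or lanes < 1:
        raise ValueError("n and lanes must be positive")
    if len(mem) < n * lanes:
        raise ValueError("memory too short for n * lanes elements")
    regs = [[0] * lanes for _ in range(n)]
    for i in range(lanes):      # structure index -> lane index
        for k in range(n):      # element within the structure -> register index
            regs[k][i] = mem[n * i + k]   # one scalar move per element
    return regs


def vst_n(regs: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical scalar model of the matching VSTn-style store (re-interleave)."""
    n = len(regs)
    lanes = len(regs[0]) if regs else 0
    mem = [0] * (n * lanes)
    for i in range(lanes):
        for k in range(n):
            mem[n * i + k] = regs[k][i]
    return mem


# Usage: 8 packed RGB pixels laid out as R0 G0 B0 R1 G1 B1 ...
pixels = list(range(24))
r, g, b = vld_n(pixels, n=3, lanes=8)
print(r)                           # [0, 3, 6, 9, 12, 15, 18, 21]
print(vst_n([r, g, b]) == pixels)  # True
```

Each register produced by `vld_n` is simply a strided slice of memory (`mem[k : n * lanes : n]`), which is what makes VLDn a strided access. The model performs `n * lanes` separate element moves; a translator targeting a host with SIMD instructions could, in principle, realize the same permutation with a few vector loads and shuffles instead. How the paper generates such host code, and how it maps guest SIMD registers onto host registers, is not reproduced by this sketch.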

Cited By

  • (2021) Effective exploitation of SIMD resources in cross-ISA virtualization. In Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 84–97. DOI: 10.1145/3453933.3454016. Online publication date: 7-Apr-2021.
  • (2020) More with Less – Deriving More Translation Rules with Less Training Data for DBTs Using Parameterization. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 415–426. DOI: 10.1109/MICRO50266.2020.00043. Online publication date: Oct-2020.
  • (2019) Unleashing the power of learning. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pages 77–89. DOI: 10.5555/3358807.3358815. Online publication date: 10-Jul-2019.

Information

Published In

LCTES 2017: Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems
June 2017
120 pages
ISBN:9781450350303
DOI:10.1145/3078633
  • General Chair: Vijay Nagarajan
  • Program Chair: Zili Shao
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2017

Author Tags

  1. Dynamic Binary Translation
  2. Register Mapping
  3. SIMD
  4. Structured Load/Store

Qualifiers

  • Research-article

Conference

LCTES '17

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 2
Reflects downloads up to 19 Dec 2024

Citations

  • (2019) Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator. Journal of Systems Architecture: the EUROMICRO Journal, 98(C):173–190. DOI: 10.1016/j.sysarc.2019.07.008. Online publication date: 1-Sep-2019.
  • (2023) Application of Speech Recognition Translator based on Evolutionary Multi-objective Optimization Algorithm. In 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT), pages 1–6. DOI: 10.1109/EASCT59475.2023.10392501. Online publication date: 20-Oct-2023.
