DOI: 10.1145/3078659.3079071
short-paper

Enabling zero-copy OpenMP offloading on the PULP many-core accelerator

Published: 12 June 2017

Abstract

Many-core heterogeneous designs are now widely available in embedded systems. Initiatives such as HSA push for a model where the host processor and the accelerator(s) communicate via coherent Unified Virtual Memory (UVM). In this paper, we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain that enables: i) the automatic generation of host and accelerator binaries from a single, high-level OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This approach enables up to 4x faster execution compared to traditional copy-based offload mechanisms.
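As a rough illustration of the single-source offload style the abstract refers to, the following sketch uses only the standard OpenMP 4.0 `target` construct and `map` clause. The function name `scale_offload` and the kernel itself are hypothetical and are not taken from the paper. In a copy-based implementation, the `map` clause typically causes the array to be copied to and from accelerator memory. With shared virtual memory, a runtime could in principle pass the host pointer instead, but how the paper's toolchain handles this is not shown on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the paper): scale n floats in place. */
void scale_offload(float *data, int n, float factor)
{
    assert(data != NULL || n == 0);

    /* Standard OpenMP 4.0 offload: the region below is compiled for the
     * accelerator when an offload-capable toolchain is used. The map clause
     * makes data[0..n) visible to the device; whether that means a copy or
     * a shared mapping depends on the implementation (assumption). */
    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        data[i] *= factor;
    }
}
```

When built without OpenMP support the pragmas are ignored, and when no offload device is available the region falls back to the host, so the same source remains usable for functional checks.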


Published In

SCOPES '17: Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems
June 2017
100 pages
ISBN: 9781450350396
DOI: 10.1145/3078659
Editor: Sander Stuijk
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Heterogeneous embedded many-core System on Chip
  2. Lightweight IOMMU
  3. OpenMP4.0
  4. Parallel Programming Models

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SCOPES '17

Acceptance Rates

SCOPES '17 Paper Acceptance Rate 6 of 9 submissions, 67%;
Overall Acceptance Rate 38 of 79 submissions, 48%

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 15 Jan 2025.

Cited By

  • (2021) A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment. 2021 24th Euromicro Conference on Digital System Design (DSD), 9-17. DOI: 10.1109/DSD53832.2021.00011. Online publication date: Sep-2021
  • (2020) Mixed-data-model heterogeneous compilation and OpenMP offloading. Proceedings of the 29th International Conference on Compiler Construction, 119-131. DOI: 10.1145/3377555.3377891. Online publication date: 22-Feb-2020
  • (2019) DISHM: A Zero-Copy Intra-Node Communication Approach in Large Scale Simulation. 2019 IEEE 19th International Conference on Communication Technology (ICCT), 578-582. DOI: 10.1109/ICCT46805.2019.8947207. Online publication date: Oct-2019
  • (2018) HERO. Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, 1-6. DOI: 10.1145/3295816.3295821. Online publication date: 4-Nov-2018
