DOI: 10.1145/3152821.3152880
research-article

Exploiting Parallelism on GPUs and FPGAs with OmpSs

Published: 09 September 2017

Abstract

This paper presents the OmpSs approach to heterogeneous programming on GPU and FPGA accelerators. The OmpSs programming model is based on the Mercurium compiler and the Nanos++ runtime. Applications are annotated with compiler directives that specify task-based parallelism. The Mercurium compiler transforms the code to exploit parallelism on the SMP host cores and to spawn work on CUDA/OpenCL devices and FPGA accelerators. For CUDA/OpenCL devices, the programmer only needs to insert the annotations and provide the kernel function, which is compiled by the native CUDA/OpenCL compiler. For FPGAs, OmpSs uses the High-Level Synthesis tools from FPGA vendors to generate the IP configurations for the FPGA. We report the performance that OmpSs obtains on the matrix multiply benchmark on the Xilinx Zynq Ultrascale+.
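As a rough illustration of the annotation style the abstract describes, the sketch below marks one tile of a blocked matrix multiply as a task that may be offloaded. It is not the authors' benchmark code. The directive clauses follow OmpSs syntax as we understand it, and the device name `fpga`, the tile size `BS`, the guard macro `USE_OMPSS`, and all function names are assumptions made for this example. Without `USE_OMPSS` defined, the file builds with any C compiler and runs serially.

```c
#include <assert.h>
#include <stddef.h>

#define BS 4 /* tile edge length; hypothetical value chosen for the sketch */

/*
 * One tile update: c += a * b, where each tile holds BS*BS contiguous floats.
 * Under an OmpSs toolchain every call to this function would become a task.
 * USE_OMPSS is a hypothetical macro the user defines to enable the directives.
 */
#ifdef USE_OMPSS
#pragma omp target device(fpga) copy_deps
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
#endif
void matmul_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                c[i * BS + j] += a[i * BS + k] * b[k * BS + j];
}

/*
 * Multiply two matrices of nb x nb tiles stored tile by tile.
 * The in/inout clauses above would let the runtime order tasks that
 * update the same C tile; tasks on different C tiles may run in parallel.
 */
void matmul_tiled(int nb, const float *A, const float *B, float *C)
{
    assert(nb > 0);
    const size_t tile = (size_t)BS * BS;
    for (int i = 0; i < nb; i++)
        for (int j = 0; j < nb; j++)
            for (int k = 0; k < nb; k++)
                matmul_block(&A[((size_t)i * nb + k) * tile],
                             &B[((size_t)k * nb + j) * tile],
                             &C[((size_t)i * nb + j) * tile]);
#pragma omp taskwait /* wait for all spawned tasks; a no-op in a serial build */
}

/* Self-check: multiplying by the identity must return B. Returns 1 on success. */
int matmul_demo(void)
{
    enum { NB = 2, TILE = BS * BS, N = NB * NB * TILE };
    float A[N] = {0}, B[N], C[N] = {0};

    for (int t = 0; t < NB; t++)      /* diagonal tiles of A are identity tiles */
        for (int d = 0; d < BS; d++)
            A[(t * NB + t) * TILE + d * BS + d] = 1.0f;
    for (int i = 0; i < N; i++)
        B[i] = (float)(i + 1);

    matmul_tiled(NB, A, B, C);

    for (int i = 0; i < N; i++)
        if (C[i] != B[i])
            return 0;
    return 1;
}
```

Calling `matmul_demo()` exercises the tiled loop on a 2 x 2 grid of tiles. In a real OmpSs build, a kernel or FPGA IP generated for `matmul_block` would stand in for the host loop body.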



Information & Contributors

Information

Published In

ANDARE '17: Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems
September 2017
35 pages
ISBN:9781450353632
DOI:10.1145/3152821
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. GPU
  3. OmpSs Programming Model
  4. Parallelism

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Spanish Ministerio de Economía y Competitividad

Conference

ANDARE '17

Acceptance Rates

ANDARE '17 Paper Acceptance Rate 3 of 4 submissions, 75%;
Overall Acceptance Rate 3 of 4 submissions, 75%

Cited By

  • (2024) General-purpose data stream processing on heterogeneous architectures with WindFlow. Journal of Parallel and Distributed Computing 184:C. DOI: 10.1016/j.jpdc.2023.104782. Online publication date: 1-Feb-2024
  • (2022) OmpSs-2 and OpenACC Interoperation. 2022 Workshop on Accelerator Programming Using Directives (WACCPD), 11-21. DOI: 10.1109/WACCPD56842.2022.00007. Online publication date: Nov-2022
  • (2021) Particle-In-Cell Simulation Using Asynchronous Tasking. Euro-Par 2021: Parallel Processing, 482-498. DOI: 10.1007/978-3-030-85665-6_30. Online publication date: 25-Aug-2021
  • (2020) Design and Preliminary Evaluation of OpenACC Compiler for FPGA with OpenCL and Stream Processing DSL. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 10-16. DOI: 10.1145/3373271.3373274. Online publication date: 15-Jan-2020
  • (2020) HePREM: A Predictable Execution Model for GPU-based Heterogeneous SoCs. IEEE Transactions on Computers, 1-1. DOI: 10.1109/TC.2020.2980520. Online publication date: 2020
  • (2019) First Steps in Porting the LFRic Weather and Climate Model to the FPGAs of the EuroExa Architecture. Scientific Programming 2019. DOI: 10.1155/2019/7807860. Online publication date: 13-Oct-2019
  • (2018) Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming. Evolving OpenMP for Evolving Architectures, 96-110. DOI: 10.1007/978-3-319-98521-3_7. Online publication date: 29-Aug-2018
