research-article

Zero-copy I/O processing for low-latency GPU computing

Authors:

Jason Aumiller,

Scott Brandt

ICCPS '13: Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems

Pages 170 - 178

https://doi.org/10.1145/2502524.2502548

Published: 08 April 2013

Abstract

Cyber-physical systems (CPS) aim to monitor and control complex real-world phenomena where the computational cost and real-time constraints could be a major challenge. Many-core hardware accelerators such as graphics processing units (GPUs) promise to enhance computation, leveraging the data parallelism often found in real-world scenarios of CPS, but performance is limited by the overhead of the data transfer between the host and the device memory. For example, plasma control in the HBT-EP Tokamak device at Columbia University [11, 18] must execute the control algorithm in a few microseconds, but may take tens of microseconds to copy the data set between the host and the device memory. This paper presents a zero-copy I/O processing scheme that maps the I/O address space of the system to the virtual address space of the compute device, allowing sensors and actuators to transfer data to and from the compute device directly. Experiments using the plasma control system show a 33% reduction in computational cost, and microbenchmarks with more generic matrix operations show a 34% reduction, while in both cases, effective data throughput remains at least as good as the current best performers.
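The latency argument in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the function names and all timing values are hypothetical, chosen only to match the orders of magnitude the abstract mentions (a few microseconds of compute, tens of microseconds of copying). It also assumes, optimistically, that direct device access to I/O data adds no cost of its own, so it does not reproduce the measured 33% and 34% reductions.

```python
def cycle_time_us(compute_us, copy_in_us=0.0, copy_out_us=0.0):
    """Latency of one control cycle: stage inputs, compute, stage outputs."""
    return copy_in_us + compute_us + copy_out_us


def transfer_share(compute_us, copy_in_us, copy_out_us):
    """Fraction of the cycle spent moving data rather than computing."""
    total = cycle_time_us(compute_us, copy_in_us, copy_out_us)
    return (copy_in_us + copy_out_us) / total


# Hypothetical values (assumptions, not measurements from the paper).
COMPUTE_US = 5.0    # control algorithm on the device
COPY_IN_US = 10.0   # host-to-device copy of sensor data
COPY_OUT_US = 10.0  # device-to-host copy of actuator data

# Copy-based scheme: data is staged through host memory in both directions.
copy_based = cycle_time_us(COMPUTE_US, COPY_IN_US, COPY_OUT_US)

# Zero-copy scheme: the explicit copies disappear from the cycle.
# Assumption: direct access to mapped I/O memory costs nothing extra.
zero_copy = cycle_time_us(COMPUTE_US)

print(f"copy-based cycle: {copy_based} us")
print(f"zero-copy cycle:  {zero_copy} us")
print(f"transfer share:   {transfer_share(COMPUTE_US, COPY_IN_US, COPY_OUT_US):.0%}")
```

Under these assumed numbers the copies dominate the cycle, which is why removing them is the main lever for low-latency control loops. In a real system, accesses to mapped memory have their own cost, so actual gains are smaller than this toy model suggests.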

References

[1] G. Elliott and J. Anderson. Globally Scheduled Real-Time Multiprocessor Systems with GPUs. Real-Time Systems, 48(1):34-74, 2012.

[2] G. Elliott and J. Anderson. Robust Real-Time Multiprocessor Interrupt Handling Motivated by GPUs. In Proc. of the Euromicro Conference on Real-Time Systems, pages 267-276, 2012.

[3] M. Hirabayashi, S. Kato, M. Edahiro, and Y. Sugiyama. Toward GPU-accelerated traffic simulation and its real-time challenge. In Proc. of the International Workshop on Real-Time and Distributed Computing in Emerging Applications, 2012.

[4] T. Jablin, P. Prabhu, J. Jablin, N. Johnson, S. Beard, and D. August. Automatic CPU-GPU communication management and optimization. In Proc. of the ACM Conference on Programming Language Design and Implementation, 2011.

[5] S. Kato, K. Lakshmanan, Y. Ishikawa, and R. Rajkumar. Resource Sharing in GPU-accelerated Windowing Systems. In Proc. of the IEEE Real-Time and Embedded Technology and Applications Symposium, pages 191-200, 2011.

[6] S. Kato, K. Lakshmanan, A. Kumar, M. Kelkar, Y. Ishikawa, and R. Rajkumar. RGEM: A Responsive GPGPU Execution Model for Runtime Engines. In Proc. of the IEEE Real-Time Systems Symposium, pages 57-66, 2011.

[7] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments. In Proc. of the USENIX Annual Technical Conference, 2011.

[8] S. Kato, M. McThrow, C. Maltzahn, and S. Brandt. Gdev: First-Class GPU Resource Management in the Operating System. In Proc. of the USENIX Annual Technical Conference, 2012.

[9] C. Liu, J. Li, W. Huang, J. Rubio, E. Speight, and X. Lin. Power-Efficient Time-Sensitive Mapping in Heterogeneous Systems. In Proc. of the International Conference on Parallel Architectures and Compilation Techniques, 2012.

[10] R. Mangharam and A. Saba. Anytime Algorithms for GPU Architectures. In Proc. of the IEEE Real-Time Systems Symposium, pages 47-56, 2011.

[11] D. Maurer, J. Bialek, P. Byrne, B. DeBono, J. Levesque, B. Q. Li, et al. The high beta tokamak-extended pulse magnetohydrodynamic mode control research program. Plasma Physics and Controlled Fusion, 53, 2011.

[12] M. McNaughton, C. Urmson, J. Dolan, and J.-W. Lee. Motion Planning for Autonomous Driving with a Conformal Spatiotemporal Lattice. In Proc. of the IEEE International Conference on Robotics and Automation, pages 4889-4895, 2011.

[13] Mellanox. NVIDIA GPUDirect Technology--Accelerating GPU-based Systems, 2010.

[14] P. Michel, J. Chestnutt, S. Kagami, K. Nishiwaki, J. Kuffner, and T. Kanade. GPU-accelerated Real-Time 3D Tracking for Humanoid Locomotion and Stair Climbing. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 463-469, 2007.

[15] NVIDIA. NVIDIA's next generation CUDA computer architecture: Fermi, 2009.

[16] NVIDIA. CUDA C Programming Guide Version 4.2, 2012.

[17] NVIDIA. NVIDIA GeForce GTX 680: The fastest, most efficient GPU ever built, 2012.

[18] N. Rath, J. Bialek, P. Byrne, B. DeBono, J. Levesque, B. Li, M. Mauel, D. Maurer, G. Navratil, and D. Shiraki. High-speed, multi-input, multi-output control using GPU processing in the HBT-EP tokamak. Fusion Engineering and Design, 2012.

[19] C. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: Operating system abstractions to manage GPUs as compute devices. In Proc. of the ACM Symposium on Operating Systems Principles, 2011.

Cited By

Verheijen P, Haghi M, Lazar M, Goswami D (2023). Parallel Shooting Sequential Quadratic Programming for Nonlinear MPC Problems. 2023 IEEE Conference on Control Technology and Applications (CCTA), pp. 605-611. Online publication date: 16-Aug-2023.
https://doi.org/10.1109/CCTA54093.2023.10252893
Tsog N, Mubeen S, Sjödin M, Bruhn F (2021). A Trade-Off between Computing Power and Energy Consumption of On-Board Data Processing in GPU Accelerated In-Orbit Space Systems. Transactions of the Japan Society for Aeronautical and Space Sciences, Aerospace Technology Japan, 19:5, pp. 700-708. Online publication date: 2021.
https://doi.org/10.2322/tastj.19.700
Rohloff A, Allen Z, Lin K, Okrend J, Nie C, Liu Y, Tseng H (2021). OpenUVR: an Open-Source System Framework for Untethered Virtual Reality Applications. 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 223-236. Online publication date: May-2021.
https://doi.org/10.1109/RTAS52030.2021.00026

Index Terms

Zero-copy I/O processing for low-latency GPU computing
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments

Recommendations

Enabling zero-copy OpenMP offloading on the PULP many-core accelerator
SCOPES '17: Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems

Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we ...

Many-core GPU computing with NVIDIA CUDA
ICS '08: Proceedings of the 22nd annual international conference on Supercomputing

In the past, graphics processors were special-purpose hardwired application accelerators, suitable only for conventional graphics applications. Modern GPUs are fully programmable, massively parallel floating point processors. In this talk I will ...

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
CLUSTER '10: Proceedings of the 2010 IEEE International Conference on Cluster Computing

In this paper, we describe our experiment developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system, the largest GPU-accelerated system ever attempted before. An adaptive optimization framework is ...

Information & Contributors

Information

Published In

ICCPS '13: Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems

April 2013

278 pages

ISBN:9781450319966

DOI:10.1145/2502524

General Chair: Chenyang Lu (Washington University)
Program Chairs: P. R. Kumar (Texas A&M University), R. Stoleru (Texas A&M University)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE-CS\TCRT: TC on Real-Time Systems
SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICCPS '13

Sponsor:

IEEE-CS\TCRT
SIGBED

ICCPS '13: ACM/IEEE 4th International Conference on Cyber-Physical Systems

April 8 - 11, 2013

Philadelphia, Pennsylvania

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 444
Downloads (Last 12 months): 51
Downloads (Last 6 weeks): 9

Reflects downloads up to 15 Jan 2025

Citations

Cited By

Zhang Y, Xing Z, Wang Y, Chen L, Wang Q, Zhu Y (2019). Optimization Methods for Computing System in Mobile CPS. Proceedings of the 2nd International Conference on Big Data Technologies, pp. 300-305. Online publication date: 28-Aug-2019.
https://dl.acm.org/doi/10.1145/3358528.3358551
Suzuki Y, Yamada H, Kato S, Kono K (2018). Cooperative GPGPU Scheduling for Consolidating Server Workloads. IEICE Transactions on Information and Systems, E101.D:12, pp. 3019-3037. Online publication date: 1-Dec-2018.
https://doi.org/10.1587/transinf.2018EDP7027
Tseng H, Zhao Q, Zhou Y, Gahagan M, Swanson S (2018). Morpheus. ACM SIGOPS Operating Systems Review, 52:1, pp. 71-83. Online publication date: 28-Aug-2018.
https://dl.acm.org/doi/10.1145/3273982.3273989
Suzuki Y, Yamada H, Kato S, Kono K (2017). GLoop. Proceedings of the 2017 Symposium on Cloud Computing, pp. 80-93. Online publication date: 24-Sep-2017.
https://dl.acm.org/doi/10.1145/3127479.3132023
Zhong R, Wang M, Chen Z, Liu L, Liu Y, Zhang J, Zhang L, Moscibroda T (2017). On Building a Programmable Wireless High-Quality Virtual Reality System Using Commodity Hardware. Proceedings of the 8th Asia-Pacific Workshop on Systems, pp. 1-7. Online publication date: 2-Sep-2017.
https://dl.acm.org/doi/10.1145/3124680.3124723
Mu D, Cicotti P, Cui Y, Lee E, Chen P, Hart D (2017). A Buffering Approach to Manage I/O in a Normalized Cross-Correlation Earthquake Detection Code for Large Seismic Datasets. Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact, pp. 1-6. Online publication date: 9-Jul-2017.
https://dl.acm.org/doi/10.1145/3093338.3093382
Suzuki Y, Fujii Y, Azumi T, Nishio N, Kato S (2017). Real-Time GPU Resource Management with Loadable Kernel Modules. IEEE Transactions on Parallel and Distributed Systems, 28:6, pp. 1715-1727. Online publication date: 1-Jun-2017.
https://dl.acm.org/doi/10.1109/TPDS.2016.2630697