research-article

Open access

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs

Authors:

Ignacio Laguna,

Katarzyna Swirydowicz,

Ganesh Gopalakrishnan

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing

Pages 59 - 71

https://doi.org/10.1145/3588195.3592991

Published: 07 August 2023

Abstract

Floating-point exceptions that occur during numerical computations can seriously threaten the validity of the computed results if they are not caught and diagnosed. Unfortunately, on NVIDIA GPUs, which are today's most widely used type of GPU and which have no hardware exception traps, this task must be carried out in software. Given the prevalence of closed-source kernels, efficient binary-level exception tracking is essential. It is also important to know how exceptions flow through the code, whether they alter the code's behavior, and whether they can be detected at the program outputs or are killed inside program flow paths.
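To make the idea of software-side detection and exception flow concrete, here is a minimal Python sketch. It is not GPU-FPX's implementation: the helper names (`to_fp32`, `classify_fp32`, `trace`) and the three-step pipeline are hypothetical, and binary32 arithmetic is only emulated on the CPU. It classifies each intermediate result by its IEEE 754 bit pattern and shows an INF that turns into a NaN and is then killed by a comparison before it reaches the output.

```python
import math
import struct


def to_fp32(value: float) -> float:
    """Round a Python float to IEEE 754 binary32 (overflow becomes +/-inf)."""
    try:
        return struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        # struct refuses finite values that do not fit in binary32.
        return math.copysign(math.inf, value)


def classify_fp32(value: float) -> str:
    """Return "NaN", "INF", "SUB" (subnormal), or "OK" for a binary32 value."""
    bits = struct.unpack("<I", struct.pack("<f", to_fp32(value)))[0]
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        return "NaN" if mantissa else "INF"
    if exponent == 0 and mantissa:
        return "SUB"
    return "OK"


def trace(ops, x: float):
    """Apply each (name, function) step and record the class of its result."""
    report = []
    for name, fn in ops:
        x = to_fp32(fn(x))
        report.append((name, classify_fp32(x)))
    return x, report


# Hypothetical pipeline: overflow creates INF, INF - INF creates NaN,
# and a comparison against NaN is false, so the select returns 0.0.
pipeline = [
    ("mul", lambda v: v * v),
    ("sub", lambda v: v - v),
    ("select", lambda v: v if v > 0.0 else 0.0),
]

result, report = trace(pipeline, 1e30)
for step, kind in report:
    print(f"{step}: {kind}")
print("output:", result)
```

In this sketch the final output looks like an ordinary value, so checking only the program output would miss both exceptions; that is the situation the abstract describes as an exception being killed inside a flow path.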

In this paper, we introduce GPU-FPX, a low-overhead tool that enables a deep understanding of the origin and flow of exceptions and of how code optimizations modify them. We measure GPU-FPX's performance on 151 widely used GPU programs from HPC and ML, detecting 26 serious exceptions that were previously unreported. Our results show that GPU-FPX is 16× faster, in terms of geometric-mean runtime, than the only comparable prior tool, while also helping debug a larger class of codes more effectively.
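As an illustration of how a geometric-mean speedup is typically computed, the following Python sketch uses made-up per-program runtimes. The numbers are hypothetical and are not measurements from the paper, and the paper's exact methodology may differ.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical runtimes (seconds) of three programs under two tools.
prior_tool_runtime = [40.0, 250.0, 10.0]
new_tool_runtime = [5.0, 10.0, 2.0]

# Per-program speedup of the new tool over the prior tool.
speedups = [p / n for p, n in zip(prior_tool_runtime, new_tool_runtime)]

# The geometric mean of the ratios equals the ratio of the geometric means.
mean_speedup = geometric_mean(speedups)
ratio_of_means = geometric_mean(prior_tool_runtime) / geometric_mean(new_tool_runtime)

print(f"geometric-mean speedup: {mean_speedup:.2f}x")
```

The geometric mean is the usual choice for summarizing ratios across many benchmarks because a single very slow or very fast program cannot dominate it the way it would dominate an arithmetic mean.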

Cited By

Wang J, Hsu J, Qin Z (2024). A Comprehensive Analysis of Nvidia's Technological Innovations, Market Strategies, and Future Prospects. International Journal of Information Technologies and Systems Approach 17:1 (1-16). Online publication date: 24-May-2024.
https://doi.org/10.4018/IJITSA.344423

Tran A, Laguna I, Gopalakrishnan G, Mencagli G, Dazzi P, Lowenthal D, Badia R (2024). FPBOXer: Efficient Input-Generation for Targeting Floating-Point Exceptions in GPU Programs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (83-93). Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3625549.3658660

Laguna I, Tran A, Gopalakrishnan G (2023). Finding inputs that trigger floating-point exceptions in heterogeneous computing via Bayesian optimization. Parallel Computing 117:C. Online publication date: 1-Sep-2023.
https://dl.acm.org/doi/10.1016/j.parco.2023.103042

Index Terms

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools
  2. Software organization and properties
    1. Extra-functional properties
      1. Software safety

Information & Contributors

Information

Published In

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing

August 2023

350 pages

ISBN: 9798400701559

DOI: 10.1145/3588195

General Chair: Ali R. Butt (Virginia Tech, USA)

Program Chairs: Ningfang Mi (Northeastern University, USA), Kyle Chard (University of Chicago & Argonne National Laboratory, USA)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HPDC '23

HPDC '23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing

June 16 - 23, 2023

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Total Citations: 3

Total Downloads: 715

Downloads (Last 12 months): 533

Downloads (Last 6 weeks): 130

Reflects downloads up to 11 Dec 2024
