Characterizing GPU Overclocking Faults

Eldad Zuberi & Avishai Wool

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12972)

Included in the following conference series:

European Symposium on Research in Computer Security

Abstract

Graphics Processing Units (GPUs) are powerful parallel processors that are becoming common on computers. They are used in many high-performance tasks such as crypto-mining and neural-network training. It is common to overclock a GPU to gain performance; however, this practice may introduce calculation faults. In our work, we lay the foundations for exploiting these faults by characterizing their formation and structure. We find that temperature is a contributing factor to the fault rate, but is not the sole cause. We also find that faults are a byte-wide phenomenon: individual bit-flips are rare. Surprisingly, we find that the vast majority of byte faults are in fact byte-flips: all 8 bits are simultaneously negated. Finally, we find strong evidence that faults are triggered by memory-remnant reads at an alignment of a 32-byte memory transaction size.
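
To make the fault taxonomy above concrete, the following minimal Python sketch labels the difference between an expected and an observed byte as a byte-flip (all 8 bits negated), a single bit-flip, or some other corruption, and counts faults per 32-byte transaction. It is an illustration only, not the authors' measurement code: the function names, the sample buffer, and the assumption that the buffer starts on a 32-byte boundary are all hypothetical.

```python
from collections import Counter

TRANSACTION_BYTES = 32  # memory transaction size named in the abstract


def classify_byte_fault(expected: int, observed: int) -> str:
    """Label the difference between one expected and one observed byte."""
    mask = (expected ^ observed) & 0xFF  # set bits mark the corrupted positions
    if mask == 0:
        return "ok"
    if mask == 0xFF:
        return "byte-flip"  # all 8 bits negated
    if bin(mask).count("1") == 1:
        return "bit-flip"  # exactly one bit negated
    return "other"


def summarize_faults(expected: bytes, observed: bytes):
    """Count fault kinds and the 32-byte transactions that contain them.

    Assumes (hypothetically) that offset 0 lies on a 32-byte boundary.
    """
    kinds = Counter()
    transactions = Counter()
    for offset, (e, o) in enumerate(zip(expected, observed)):
        kind = classify_byte_fault(e, o)
        if kind != "ok":
            kinds[kind] += 1
            transactions[offset // TRANSACTION_BYTES] += 1
    return kinds, transactions


# Hypothetical 64-byte buffer with two injected corruptions.
expected = bytes(range(64))
observed = bytearray(expected)
observed[5] ^= 0xFF   # byte-flip inside transaction 0
observed[40] ^= 0x04  # single bit-flip inside transaction 1
kinds, transactions = summarize_faults(expected, bytes(observed))
print(dict(kinds))         # → {'byte-flip': 1, 'bit-flip': 1}
print(dict(transactions))  # → {0: 1, 1: 1}
```

XOR-ing the expected and observed values isolates exactly the bits that changed, so a mask of 0xFF identifies a byte-flip regardless of the original data.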

References

1. Agoyan, M., Dutertre, J., Mirbaha, A., Naccache, D., Ribotta, A., Tria, A.: Single-bit DFA using multiple-byte laser fault injection. In: 2010 IEEE International Conference on Technologies for Homeland Security (HST), pp. 113–119 (2010)
2. Agoyan, M., Dutertre, J.-M., Naccache, D., Robisson, B., Tria, A.: When clocks fail: on critical paths and clock faults. In: Gollmann, D., Lanet, J.-L., Iguchi-Cartigny, J. (eds.) CARDIS 2010. LNCS, vol. 6035, pp. 182–193. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12510-2_13
3. ArchWiki. NVIDIA/Tips and tricks. https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks
4. Barenghi, A., Bertoni, G.M., Breveglieri, L., Pellicioli, M., Pelosi, G.: Low voltage fault attacks to AES. In: 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 7–12. IEEE (2010)
5. Bialas, P., Strzelecki, A.: Benchmarking the cost of thread divergence in CUDA. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds.) PPAM 2015. LNCS, vol. 9573, pp. 570–579. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32149-3_53
6. Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Kaliski, B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 513–525. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0052259
7. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of checking cryptographic protocols for faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 37–51. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-69053-0_4
8. Nvidia developer forum. Unified Memory vs Pinned Memory. https://forums.developer.nvidia.com/t/unified-memory-vs-pinned-host-memory-vs-gpu-global-memory/34640
9. Dusart, P., Letourneux, G., Vivolo, O.: Differential fault analysis on A.E.S. In: Zhou, J., Yung, M., Han, Y. (eds.) ACNS 2003. LNCS, vol. 2846, pp. 293–306. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45203-4_23
10. Ekbote, B., Hire, V., Mahajan, P., Sisodia, J.: Blockchain based remittances and mining using CUDA. In: 2017 International Conference On Smart Technologies for Smart Nation (SmartTechCon), pp. 908–911. IEEE (2017)
11. Nvidia Forum. Run CUDA on dedicated GPU. https://forums.developer.nvidia.com/t/solved-run-cuda-on-dedicated-nvidia-gpu-while-connecting-monitors-to-intel-hd-graphics-is-this-possible/47690/2/
12. Gawande, N.A., Daily, J.A., Siegel, C., Tallent, N.R., Vishnu, A.: Scaling deep learning workloads: NVIDIA DGX-1/Pascal and intel knights landing. Future Gener. Comput. Syst. 108, 1162–1172 (2020)
13. Giraud, C.: DFA on AES. In: Dobbertin, H., Rijmen, V., Sowa, A. (eds.) AES 2004. LNCS, vol. 3373, pp. 27–41. Springer, Heidelberg (2005). https://doi.org/10.1007/11506447_4
14. Gratchoff, J.: Proving the wild jungle jump. Technical report, University of Amsterdam (2015). https://homepages.staff.os3.nl/~delaat/rp/2014-2015/p48/report.pdf
15. Harris, M.: Unified Memory in CUDA 6. https://developer.nvidia.com/blog/unified-memory-in-cuda-6/
16. integralfx. DDR4 Overclocking Guide. https://github.com/integralfx/MemTestHelper/blob/master/DDR4
17. Jiang, Z.H., Fei, Y., Kaeli, D.: A complete key recovery timing attack on a GPU. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 394–405. IEEE (2016)
18. Kemal. Scripts to overclock & start bitcoin miners on boot. https://gist.github.com/disq/995082
19. Kovacs, B.: Nvidia overclock scripts. https://github.com/brandonkovacs/nvidia-overclock-scripts
20. Landaverde, R., Zhang, T., Coskun, A.K., Herbordt, M.: An investigation of unified memory access performance in CUDA. In: 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2014)
21. Lapid, B., Wool, A.: Cache-attacks on the ARM TrustZone implementations of AES-256 and AES-256-GCM via GPU-based analysis. In: Cid, C., Jacobson Jr. M. (eds.) SAC 2018. LNCS, vol. 11349, pp. 235–256. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10970-7_11
22. Lee, S., Kim, Y., Kim, J., Kim, J.: Stealing webpages rendered on your browser by exploiting GPU vulnerabilities. In: 2014 IEEE Symposium on Security and Privacy, pp. 19–33. IEEE (2014)
23. Liao, N., Cui, X., Liao, K., Wang, T., Yu, D., Cui, X.: Improving DFA attacks on AES with unknown and random faults. Sci. China Inf. Sci. 60(4), 1–14 (2016). https://doi.org/10.1007/s11432-016-0071-7
24. Liu, Y., Cui, X., Cao, J., Zhang, X.: A hybrid fault model for differential fault attack on AES. In: 2017 IEEE 12th International Conference on ASIC (ASICON), pp. 784–787. IEEE (2017)
25. Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In: 2007 IEEE International Conference on Signal Processing and Communications, pp. 65–68. IEEE (2007)
26. Moro, N., Dehbaoui, A., Heydemann, K., Robisson, B., Encrenaz, E.: Electromagnetic fault injection: towards a fault model on a 32-bit microcontroller. In: 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, pp. 77–88. IEEE (2013)
27. Murakami, T., Kasahara, R., Saito, T.: An implementation and its evaluation of password cracking tool parallelized on GPGPU. In: 2010 10th International Symposium on Communications and Information Technologies, pp. 534–538. IEEE (2010)
28. Nvidia. Cuda-C-Programming-Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html/
29. Nvidia. Nvidia System Management Interface. https://developer.nvidia.com/nvidia-system-management-interface/
30. Nvidia. Using the nvidia-settings Utility. https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/nvidiasettings.html/
31. Nvidia. Everything you need to know about unified memory. https://on-demand.gputechconf.com/gtc/2018/presentation/s8430-everything-you-need-to-know-about-unified-memory.pdf, 2018
32. Piret, G., Quisquater, J.-J.: A differential fault attack technique against SPN structures, with application to the AES and Khazad. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 77–88. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_7
33. Gerardo Ravago. CUDA bitcoin miner. https://github.com/geedo0/cuda_bitcoin_miner
34. Jan S. CUDA-AES. https://github.com/franneck94/CUDA-AES
35. Sabbagh, M., Fei, Y., Kaeli, D.: A novel GPU overdrive fault attack. In: 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1–6 (2020)
36. Selmane, N., Guilley, S., Danger, J.: Practical setup time violation attacks on AES. In: 2008 Seventh European Dependable Computing Conference, pp. 91–96 (2008)
37. Online tech tips. How to overclock your GPU safely to boost performance. https://www.online-tech-tips.com/computer-tips/overclock-gpu-safely-boost-performance/
38. George Thessalonikefs. Electromagnetic fault injection characterization. Master’s thesis, University of Amsterdam (2014). https://homepages.staff.os3.nl/~delaat/rp/2013-2014/p67/report.pdf
39. Timmers, N., Mune, C.: Escalating privileges in Linux using voltage fault injection. In: 2017 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 1–8 (2017)
40. Timmers, N., Spruyt, A., Witteman, M.: Controlling PC on ARM using fault injection. In: 2016 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 25–35. IEEE (2016)
41. Ville Timonen. GPU Burn. https://github.com/wilicc/gpu-burn
42. Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., Moshovos, A.: Demystifying GPU microarchitecture through microbenchmarking. In: 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp. 235–246. IEEE (2010)
43. Zhu, Z., Kim, S., Rozhanski, Y., Hu, Y., Witchel, E., Silberstein, M.: Understanding the security of discrete GPUs. In: Proceedings of the General Purpose GPUs, pp. 1–11 (2017)

Author information

Authors and Affiliations

School of Electrical Engineering, Tel Aviv University, 69978, Ramat Aviv, Israel
Eldad Zuberi & Avishai Wool

Corresponding authors

Correspondence to Eldad Zuberi or Avishai Wool.

Editor information

Editors and Affiliations

Purdue University, West Lafayette, IN, USA
Elisa Bertino
National Research Center for Applied Cybersecurity ATHENE, Fraunhofer Institute for Secure Information Technology SIT, Darmstadt, Germany
Haya Shulman
National Research Center for Applied Cybersecurity ATHENE, Technische Universität Darmstadt, Fraunhofer Institute for Secure Information Technology SIT, Darmstadt, Germany
Michael Waidner

A Appendix

A.1 CUDA basics

CUDA kernels are declared using the \(\mathtt{\_\_global\_\_}\) declaration specifier and can be invoked using the syntax in Algorithm 2. Kernels are executed in blocks, where each block consists of multiple threads. The parameters numBlocks and threadsPerBlock specify the execution configuration. Each thread that executes the kernel is given unique thread/block IDs that are accessible within the kernel through built-in variables. All threads of a block reside on the same processor core and must share the memory resources of that core. Therefore, the number of threads per block is limited (up to 1024 on current GPUs). Instructions are issued and executed in groups of 32 threads, called warps.
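
The index arithmetic behind this launch model can be sketched in plain Python. The block below is a sequential emulation, not CUDA code: it assumes a one-dimensional grid, and the helper names (launch, vec_add, warp_of) are hypothetical. It shows how a global element index is derived from the block ID, block size, and thread ID, how the block count is chosen by ceiling division, and how a thread's warp follows from its ID.

```python
WARP_SIZE = 32                # threads issued together, per the text above
MAX_THREADS_PER_BLOCK = 1024  # per-block limit quoted above


def launch(kernel, num_blocks, threads_per_block, *args):
    """Sequentially emulate a 1-D launch with the given execution configuration."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def vec_add(block_idx, block_dim, thread_idx, a, b, out):
    """Kernel body: each emulated thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # guard threads past the end of the data
        out[i] = a[i] + b[i]


def warp_of(thread_idx):
    """Warp number of a thread within its (1-D) block."""
    return thread_idx // WARP_SIZE


n = 100
a = list(range(n))
b = [10 * x for x in a]
out = [None] * n
threads_per_block = 64
num_blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
launch(vec_add, num_blocks, threads_per_block, a, b, out)
print(num_blocks, out[99], warp_of(33))  # → 2 1089 1
```

On a real GPU the threads run in parallel. The emulation visits them one at a time, which gives the same result here only because each thread writes a distinct element.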

Thread blocks are required to execute independently: it must be possible to execute them in any order, in parallel or in series. Threads within a block can cooperate by sharing data through shared memory and by synchronizing their execution to coordinate memory accesses. Synchronization points can be declared using intrinsic functions, e.g., \(\mathtt{\_\_syncthreads()}\).
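
The role of a synchronization point can be illustrated with a host-side analogy. In the Python sketch below, threading.Barrier stands in for the block-wide barrier and a plain list stands in for shared memory. The function name run_block and the block size are hypothetical, and the sketch models only the ordering guarantee, not GPU execution.

```python
import threading

THREADS_PER_BLOCK = 8  # hypothetical, kept small for readability


def run_block(values):
    """Emulate one thread block that reverses `values` through shared memory."""
    n = len(values)
    shared = [None] * n             # stands in for per-block shared memory
    result = [None] * n
    barrier = threading.Barrier(n)  # plays the role of the block-wide barrier

    def thread_body(tid):
        shared[tid] = values[tid]          # phase 1: each thread writes its own slot
        barrier.wait()                     # nobody continues until all writes are done
        result[tid] = shared[n - 1 - tid]  # phase 2: read a slot another thread wrote

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


print(run_block(list(range(THREADS_PER_BLOCK))))  # → [7, 6, 5, 4, 3, 2, 1, 0]
```

Without the barrier, a thread could reach phase 2 before its partner has written its slot and would then read a stale value. That is the kind of memory-access hazard the synchronization point exists to prevent.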

A.2 Future Work

Future work involves leveraging the characterization of faults presented in this paper towards the development of efficient tailored exploitation algorithms and methods. Examples include:

Breaking Cryptographic Calculations Implemented on GPUs. One can speculate that the byte-flip phenomenon could be combined with the work of Sabbagh et al. [35]. As their work relies on exploiting an instrumented AES, our characterization might enable the attack to target non-instrumented kernels, as well as reduce the number of messages required to break the encryption. It also seems that byte-flips may be used to improve attacks on public-key calculations performed on a GPU.

Faulty Instructions. During our tests we observed that as the fault rate increased, the graphics card occasionally stopped responding (API calls failed), crashed, or responded extremely slowly. We also received kernel crashes with error codes such as “An illegal instruction was encountered” and “Invalid program counter”. This suggests that the GPU is vulnerable not only to data corruption but also to instruction corruption [14, 26, 38, 39, 40], since code-registers (apart from data-registers) are also vulnerable to the faults caused by overclocking.

The knowledge in this paper may allow an attacker to develop code that triggers precise and predictable faults, effectively allowing them to hide a malicious instruction in legitimate code. To design this, the attacker could create a more “prone-to-errors” region of the code (e.g., by performing many loops in a specific alignment). The attacker also knows that the fault value is likely to be a byte-flip. By studying GPU opcodes and their inverses, the attacker can then craft their own instruction in the misread CUDA code. A similar technique could be used to leverage the faults to modify the Program Counter register.

Other GPUs. Our tests were conducted on an Nvidia GPU; similar work can be carried out to characterize the faults on other GPUs.

About this paper

Cite this paper

Zuberi, E., Wool, A. (2021). Characterizing GPU Overclocking Faults. In: Bertino, E., Shulman, H., Waidner, M. (eds) Computer Security – ESORICS 2021. ESORICS 2021. Lecture Notes in Computer Science, vol 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_6

DOI: https://doi.org/10.1007/978-3-030-88418-5_6
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88417-8
Online ISBN: 978-3-030-88418-5
eBook Packages: Computer Science, Computer Science (R0)
