Abstract
Graphics Processing Units (GPUs) are powerful parallel processors that are becoming common on computers. They are used in many high-performance tasks such as crypto-mining and neural-network training. It is common to overclock a GPU to gain performance; however, this practice may introduce calculation faults. In our work, we lay the foundations for exploiting these faults by characterizing their formation and structure. We find that temperature is a contributing factor to the fault rate but is not the sole cause. We also find that faults are a byte-wide phenomenon: individual bit-flips are rare. Surprisingly, we find that the vast majority of byte faults are in fact byte-flips, in which all 8 bits are simultaneously negated. Finally, we find strong evidence that faults are triggered by memory-remnant reads aligned to the 32-byte memory transaction size.
References
Agoyan, M., Dutertre, J., Mirbaha, A., Naccache, D., Ribotta, A., Tria, A.: Single-bit DFA using multiple-byte laser fault injection. In: 2010 IEEE International Conference on Technologies for Homeland Security (HST), pp. 113–119 (2010)
Agoyan, M., Dutertre, J.-M., Naccache, D., Robisson, B., Tria, A.: When clocks fail: on critical paths and clock faults. In: Gollmann, D., Lanet, J.-L., Iguchi-Cartigny, J. (eds.) CARDIS 2010. LNCS, vol. 6035, pp. 182–193. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12510-2_13
ArchWiki. NVIDIA/Tips and tricks. https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks
Barenghi, A., Bertoni, G.M., Breveglieri, L., Pellicioli, M., Pelosi, G.: Low voltage fault attacks to AES. In: 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 7–12. IEEE (2010)
Bialas, P., Strzelecki, A.: Benchmarking the cost of thread divergence in CUDA. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds.) PPAM 2015. LNCS, vol. 9573, pp. 570–579. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32149-3_53
Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Kaliski, B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 513–525. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0052259
Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of checking cryptographic protocols for faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 37–51. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-69053-0_4
Nvidia developer forum. Unified Memory vs Pinned Memory. https://forums.developer.nvidia.com/t/unified-memory-vs-pinned-host-memory-vs-gpu-global-memory/34640
Dusart, P., Letourneux, G., Vivolo, O.: Differential fault analysis on A.E.S. In: Zhou, J., Yung, M., Han, Y. (eds.) ACNS 2003. LNCS, vol. 2846, pp. 293–306. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45203-4_23
Ekbote, B., Hire, V., Mahajan, P., Sisodia, J.: Blockchain based remittances and mining using CUDA. In: 2017 International Conference On Smart Technologies for Smart Nation (SmartTechCon), pp. 908–911. IEEE (2017)
Nvidia Forum. Run CUDA on dedicated GPU. https://forums.developer.nvidia.com/t/solved-run-cuda-on-dedicated-nvidia-gpu-while-connecting-monitors-to-intel-hd-graphics-is-this-possible/47690/2/
Gawande, N.A., Daily, J.A., Siegel, C., Tallent, N.R., Vishnu, A.: Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. Future Gener. Comput. Syst. 108, 1162–1172 (2020)
Giraud, C.: DFA on AES. In: Dobbertin, H., Rijmen, V., Sowa, A. (eds.) AES 2004. LNCS, vol. 3373, pp. 27–41. Springer, Heidelberg (2005). https://doi.org/10.1007/11506447_4
Gratchoff, J.: Proving the wild jungle jump. Technical report, University of Amsterdam (2015). https://homepages.staff.os3.nl/~delaat/rp/2014-2015/p48/report.pdf
Harris, M.: Unified Memory in CUDA 6. https://developer.nvidia.com/blog/unified-memory-in-cuda-6/
integralfx. DDR4 Overclocking Guide. https://github.com/integralfx/MemTestHelper/blob/master/DDR4
Jiang, Z.H., Fei, Y., Kaeli, D.: A complete key recovery timing attack on a GPU. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 394–405. IEEE (2016)
Kemal. Scripts to overclock & start bitcoin miners on boot. https://gist.github.com/disq/995082
Kovacs, B.: Nvidia overclock scripts. https://github.com/brandonkovacs/nvidia-overclock-scripts
Landaverde, R., Zhang, T., Coskun, A.K., Herbordt, M.: An investigation of unified memory access performance in CUDA. In: 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6. IEEE (2014)
Lapid, B., Wool, A.: Cache-attacks on the ARM TrustZone implementations of AES-256 and AES-256-GCM via GPU-based analysis. In: Cid, C., Jacobson Jr., M. (eds.) SAC 2018. LNCS, vol. 11349, pp. 235–256. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10970-7_11
Lee, S., Kim, Y., Kim, J., Kim, J.: Stealing webpages rendered on your browser by exploiting GPU vulnerabilities. In: 2014 IEEE Symposium on Security and Privacy, pp. 19–33. IEEE (2014)
Liao, N., Cui, X., Liao, K., Wang, T., Yu, D., Cui, X.: Improving DFA attacks on AES with unknown and random faults. Sci. China Inf. Sci. 60(4), 1–14 (2016). https://doi.org/10.1007/s11432-016-0071-7
Liu, Y., Cui, X., Cao, J., Zhang, X.: A hybrid fault model for differential fault attack on AES. In: 2017 IEEE 12th International Conference on ASIC (ASICON), pp. 784–787. IEEE (2017)
Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In: 2007 IEEE International Conference on Signal Processing and Communications, pp. 65–68. IEEE (2007)
Moro, N., Dehbaoui, A., Heydemann, K., Robisson, B., Encrenaz, E.: Electromagnetic fault injection: towards a fault model on a 32-bit microcontroller. In: 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, pp. 77–88. IEEE (2013)
Murakami, T., Kasahara, R., Saito, T.: An implementation and its evaluation of password cracking tool parallelized on GPGPU. In: 2010 10th International Symposium on Communications and Information Technologies, pp. 534–538. IEEE (2010)
Nvidia. Cuda-C-Programming-Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html/
Nvidia. Nvidia System Management Interface. https://developer.nvidia.com/nvidia-system-management-interface/
Nvidia. Using the nvidia-settings Utility. https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/nvidiasettings.html/
Nvidia. Everything you need to know about unified memory (2018). https://on-demand.gputechconf.com/gtc/2018/presentation/s8430-everything-you-need-to-know-about-unified-memory.pdf
Piret, G., Quisquater, J.-J.: A differential fault attack technique against SPN structures, with application to the AES and Khazad. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 77–88. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_7
Ravago, G.: CUDA bitcoin miner. https://github.com/geedo0/cuda_bitcoin_miner
Jan S. CUDA-AES. https://github.com/franneck94/CUDA-AES
Sabbagh, M., Fei, Y., Kaeli, D.: A novel GPU overdrive fault attack. In: 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1–6 (2020)
Selmane, N., Guilley, S., Danger, J.: Practical setup time violation attacks on AES. In: 2008 Seventh European Dependable Computing Conference, pp. 91–96 (2008)
Online tech tips. How to overclock your GPU safely to boost performance. https://www.online-tech-tips.com/computer-tips/overclock-gpu-safely-boost-performance/
Thessalonikefs, G.: Electromagnetic fault injection characterization. Master’s thesis, University of Amsterdam (2014). https://homepages.staff.os3.nl/~delaat/rp/2013-2014/p67/report.pdf
Timmers, N., Mune, C.: Escalating privileges in Linux using voltage fault injection. In: 2017 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 1–8 (2017)
Timmers, N., Spruyt, A., Witteman, M.: Controlling PC on ARM using fault injection. In: 2016 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 25–35. IEEE (2016)
Timonen, V.: GPU Burn. https://github.com/wilicc/gpu-burn
Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., Moshovos, A.: Demystifying GPU microarchitecture through microbenchmarking. In: 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp. 235–246. IEEE (2010)
Zhu, Z., Kim, S., Rozhanski, Y., Hu, Y., Witchel, E., Silberstein, M.: Understanding the security of discrete GPUs. In: Proceedings of the General Purpose GPUs, pp. 1–11 (2017)
A Appendix
A.1 CUDA basics
CUDA kernels are declared using the \(\mathtt{\_\_global\_\_}\) declaration specifier and can be invoked using the syntax in Algorithm 2. Kernels are executed in blocks, where each block consists of multiple threads. The parameters numBlocks and threadsPerBlock specify the execution configuration. Each thread that executes the kernel is given unique thread and block IDs that are accessible within the kernel through built-in variables. All threads of a block reside on the same processor core and must share the memory resources of that core. Therefore, the number of threads per block is limited (up to 1024 on current GPUs). Instructions are issued and executed in groups of 32 threads, called warps.
Thread blocks are required to execute independently: it must be possible to execute them in any order, in parallel or in series. Threads within a block can cooperate by sharing data through shared memory and by synchronizing their execution to coordinate memory accesses. Synchronization points can be declared using intrinsic functions, e.g., \(\mathtt{\_\_syncthreads()}\).
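As an illustration of the concepts above, the following is a minimal sketch of a kernel declaration and launch. It is not taken from our experiments: the kernel name reverseTiles, the helper runReverseTiles, and the block size of 256 are assumptions chosen for the example. Each block reverses its own tile of the input through shared memory, so blocks remain independent while threads within a block cooperate.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 256;  // illustrative choice, below the 1024-thread limit

// __global__ marks a kernel: it runs on the GPU and is launched from host code.
__global__ void reverseTiles(const int* in, int* out) {
    __shared__ int tile[kThreadsPerBlock];  // memory shared by the threads of one block

    int local  = threadIdx.x;                            // thread ID within its block
    int global = blockIdx.x * blockDim.x + threadIdx.x;  // unique ID across the grid

    tile[local] = in[global];
    __syncthreads();  // every thread's write must land before any thread reads the tile

    out[global] = tile[blockDim.x - 1 - local];
}

// Hypothetical host-side driver: returns true if every tile came back reversed.
bool runReverseTiles(int numBlocks) {
    const int n = numBlocks * kThreadsPerBlock;
    const size_t bytes = n * sizeof(int);

    std::vector<int> hostIn(n), hostOut(n, -1);
    for (int i = 0; i < n; ++i) hostIn[i] = i;

    int *devIn = nullptr, *devOut = nullptr;
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, bytes) != cudaSuccess) { cudaFree(devIn); return false; }
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    // Execution configuration: <<<numBlocks, threadsPerBlock>>>
    reverseTiles<<<numBlocks, kThreadsPerBlock>>>(devIn, devOut);

    // cudaGetLastError reports launch failures; the blocking copy then waits
    // for the kernel to finish before reading its results.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    if (err != cudaSuccess) return false;

    // With hostIn[i] = i, element t of block b must hold the mirrored index.
    for (int b = 0; b < numBlocks; ++b)
        for (int t = 0; t < kThreadsPerBlock; ++t)
            if (hostOut[b * kThreadsPerBlock + t] !=
                b * kThreadsPerBlock + (kThreadsPerBlock - 1 - t))
                return false;
    return true;
}

int main() {
    bool ok = runReverseTiles(4);
    std::printf("%s\n", ok ? "tiles reversed" : "failed");
    return ok ? 0 : 1;
}
```

Because no block reads another block's tile, the result does not depend on the order in which blocks are scheduled, which is the independence requirement stated above.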
A.2 Future Work
Future work involves leveraging the characterization of faults presented in this paper towards the development of efficient tailored exploitation algorithms and methods. Examples include:
Breaking Cryptographic Calculations Implemented on GPUs. One can speculate that the byte-flip phenomenon may be combined with the work of Sabbagh et al. [35]. As their work relies on exploiting an instrumented AES, our characterization might enable the attack to target non-instrumented kernels and to reduce the number of messages required to break the encryption. It also seems that byte-flips may be used to improve attacks on public-key calculations performed on a GPU.
Faulty Instructions. During our tests we observed that as the fault rate increased, the graphics card occasionally stopped responding (API calls failed), crashed, or became extremely slow. We also received kernel crashes with error codes such as “An illegal instruction was encountered” and “Invalid program counter”. This suggests that the GPU is vulnerable not only to data corruption but also to instruction corruption [14, 26, 38, 39, 40], since code registers (in addition to data registers) are also exposed to the faults caused by overclocking.
The knowledge in this paper may allow an attacker to develop code that triggers precise and predictable faults, effectively allowing the attacker to hide malicious instructions in legitimate code. To design this, the attacker could create a more error-prone region of the code (e.g., by performing many loops at a specific alignment). The attacker also knows that the fault value is likely to be a byte-flip. By studying GPU opcodes and their inverses, the attacker can then craft a chosen command in the misread CUDA code. A similar technique could be used to leverage the faults to modify the Program Counter register.
Other GPUs. Our tests were conducted on an Nvidia GPU; similar work can be carried out to characterize the faults on other GPUs.
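Any of the directions above starts from the same bookkeeping: comparing an expected buffer with an observed one and classifying each differing byte. The following is a minimal sketch of such a classifier, not the tooling used in our experiments. The names FaultStats, classifyFaults, and classifySynthetic are hypothetical, the faults are synthetic values injected by the host purely for illustration (they are not measured data), and the buffer is assumed to start on a 32-byte boundary so that the index modulo 32 gives the position within a memory transaction.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr int kTransactionBytes = 32;  // memory transaction size discussed in the paper

struct FaultStats {
    unsigned int byteFlips;                     // all 8 bits negated (XOR mask 0xFF)
    unsigned int otherFaults;                   // any other non-zero XOR mask
    unsigned int perOffset[kTransactionBytes];  // faults by offset within a transaction
};

// One thread per byte: XOR expected with observed and classify the mask.
__global__ void classifyFaults(const uint8_t* expected, const uint8_t* observed,
                               unsigned int n, FaultStats* stats) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    uint8_t diff = static_cast<uint8_t>(expected[i] ^ observed[i]);
    if (diff == 0) return;  // byte is intact

    if (diff == 0xFF) atomicAdd(&stats->byteFlips, 1u);
    else              atomicAdd(&stats->otherFaults, 1u);
    // Assumes the buffer base is 32-byte aligned (cudaMalloc guarantees more than that).
    atomicAdd(&stats->perOffset[i % kTransactionBytes], 1u);
}

// Hypothetical driver: injects three synthetic faults and classifies them.
bool classifySynthetic(FaultStats* result) {
    const unsigned int n = 1024;
    std::vector<uint8_t> expected(n), observed;
    for (unsigned int i = 0; i < n; ++i) expected[i] = static_cast<uint8_t>(i * 7u);
    observed = expected;
    observed[64]  ^= 0xFF;  // synthetic byte-flip at transaction offset 0
    observed[96]  ^= 0xFF;  // synthetic byte-flip at transaction offset 0
    observed[130] ^= 0x04;  // synthetic single bit-flip at transaction offset 2

    uint8_t *devExp = nullptr, *devObs = nullptr;
    FaultStats* devStats = nullptr;
    bool ok = cudaMalloc(&devExp, n) == cudaSuccess &&
              cudaMalloc(&devObs, n) == cudaSuccess &&
              cudaMalloc(&devStats, sizeof(FaultStats)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(devExp, expected.data(), n, cudaMemcpyHostToDevice);
        cudaMemcpy(devObs, observed.data(), n, cudaMemcpyHostToDevice);
        cudaMemset(devStats, 0, sizeof(FaultStats));

        const int threads = 256;
        classifyFaults<<<(n + threads - 1) / threads, threads>>>(devExp, devObs, n, devStats);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(result, devStats, sizeof(FaultStats),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devExp);    // cudaFree accepts null pointers
    cudaFree(devObs);
    cudaFree(devStats);
    return ok;
}

int main() {
    FaultStats stats{};
    if (!classifySynthetic(&stats)) return 1;
    std::printf("byte-flips: %u, other faults: %u\n", stats.byteFlips, stats.otherFaults);
    for (int o = 0; o < kTransactionBytes; ++o)
        if (stats.perOffset[o] > 0)
            std::printf("offset %d: %u fault(s)\n", o, stats.perOffset[o]);
    return 0;
}
```

A per-offset histogram of this kind is one simple way to check whether faults cluster at particular positions within a 32-byte transaction; on real data the observed buffer would come from an overclocked run rather than from host-side injection.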
© 2021 Springer Nature Switzerland AG
Cite this paper
Zuberi, E., Wool, A. (2021). Characterizing GPU Overclocking Faults. In: Bertino, E., Shulman, H., Waidner, M. (eds) Computer Security – ESORICS 2021. ESORICS 2021. Lecture Notes in Computer Science(), vol 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_6
DOI: https://doi.org/10.1007/978-3-030-88418-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88417-8
Online ISBN: 978-3-030-88418-5
eBook Packages: Computer Science (R0)