Abstract
In modern systems CPUs as well as GPUs are equipped with multi-level memory architectures, where different levels of the hierarchy vary in latency and capacity. Therefore, various memory access models were studied. Such a model can be seen as an interface abstracting the user from the physical architecture details. In this paper we present a general and uniform GPU computation and memory access model based on bounded inhibitor Petri nets (PNs). Its effectiveness is demonstrated by comparing its throughputs to practical computational experiments performed with the usage of Nvidia GPU with CUDA architecture.
Our PN model is consistent with the workflow of multithreaded GPU streaming multiprocessors. It models a selection and execution of instructions for each warp. The three types of instructions included in the model are: the arithmetic operation, the access to the shared memory and the access to the global memory. For a given algorithm the model allows to check how efficient the parallelization is, and whether a different organization of threads will improve performance.
The accuracy of our model was tested with different kernels. As the preliminary experiments we used the matrix multiplication program and stability example created by Nvidia, and as the main experiment a binary version of the least significant digit radix sort algorithm. We created three implementations of the algorithm using CUDA architecture, differing in the usage of shared and global memory as well as organization of calculations. For each implementation the PN model was used and the results of experiments are presented in the work.
This research has been supported by the Polish National Science Center through grant No. 2013/09/D/ST6/03928.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Note that in the case of bounded nets the use of inhibitors is not necessary, one can provide an equivalent (with more complex structure) net without inhibitors.
References
Chiola, G., Donatelli, S., Franceschinis, G.: Priorities, inhibitor arcs and concurrency in P/T nets. In: Proceedings of ICATPN, vol. 91, pp. 182–205 (1991)
Nvidia Corporation. CUDA. http://www.nvidia.com/object/cuda_home_new.html
Nvidia Corporation. CUDA. Best practice guide version 8.0.61 (2017)
Corporation, Nvidia: CUDA. C programming guide version 7.5 (2017)
Duvanenko, V.J.: Algorithm improvement through performance measurement (2009). http://www.drdobbs.com/architecture-and-design/algorithm-improvement-through-performanc/220000504
Grama, A.: Introduction to Parallel Computing. Pearson Education, London (2003)
Hong, S., Kim, H.: An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In: ACM SIGARCH Computer Architecture News, vol. 37, pp. 152–163. ACM (2009)
Hwang, K., Jotwani, N.: Advanced Computer Architecture, 3rd edn. McGraw-Hill Education, New York (2011)
Janicki, R., Koutny, M.: Structure of concurrency. Theoret. Comput. Sci. 112(1), 5–52 (1993)
Jensen, K., Kristensen, L.M.: Coloured Petri Nets - Modelling and Validation of Concurrent Systems. Springer, Heidelberg (2009). https://doi.org/10.1007/b95112
Luebke, D., Owens, J., Roberts, M., Lee, C.-H.: Intro to Parallel Programming - Online Course. Nvidia Corporation, Santa Clara
Ma, L., Agrawal, K., Chamberlain, R.D.: A memory access model for highly-threaded many-core architectures. Future Gen. Comput. Syst. 30, 202–215 (2014)
Ma, L., Chamberlain, R.D., Agrawal, K.: Performance modeling for highly-threaded many-core GPUs. In: 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 84–91. IEEE (2014)
Madougou, S., Varbanescu, A.L., de Laat, C.: Using colored petri nets for GPGPU performance modeling. In: Proceedings of the ACM International Conference on Computing Frontiers, pp. 240–249. ACM (2016)
Mukherjee, R.: A performance prediction model for the CUDA GPGPU platform. Ph.D. thesis, International Institute of Information Technology Hyderabad, India (2010)
Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989)
Patterson, D.A.: Computer Architecture: A Quantitative Approach. Elsevier, Amsterdam (2011)
Ratzer, A.V., et al.: CPN tools for editing, simulating, and analysing coloured petri nets. In: van der Aalst, W.M.P., Best, E. (eds.) ICATPN 2003. LNCS, vol. 2679, pp. 450–462. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44919-1_28
Reisig, W.: Petri Nets: An Introduction, vol. 4. Springer, Heidelberg (2012)
Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2009, pp. 1–10. IEEE (2009)
Seward, H.E.: Information sorting in the application of electronic digital computers to business operations, Master Thesis, MIT (1954)
Shuaiwen, S., Chunyi, S., Barry, R., Kirk, C.: A simplified and accurate model of power-performance efficiency on emergent GPU architecure. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 673–686 (2013)
Storti, D., Yurtoglu, M.: CUDA for Engineers: An Introduction to High-performance Parallel Computing. Addison-Wesley Professional, Boston (2015)
Westergaard, M., (Eric) Verbeek, H.M.W.: CPN Tools official webpage. Eindhoven University of Technology
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Cheng, S.-T., Hung, Y.: Estimation of job execution time in mapreduce framework over GPU clusters. In: The Fifth International Conference on Performance, Safety and Robustness in Complex Systems and Applications, PESARO 2015 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Gogolińska, A., Mikulski, Ł., Piątkowski, M. (2018). GPU Computations and Memory Access Model Based on Petri Nets. In: Koutny, M., Kristensen, L., Penczek, W. (eds) Transactions on Petri Nets and Other Models of Concurrency XIII. Lecture Notes in Computer Science(), vol 11090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58381-4_7
Download citation
DOI: https://doi.org/10.1007/978-3-662-58381-4_7
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58380-7
Online ISBN: 978-3-662-58381-4
eBook Packages: Computer ScienceComputer Science (R0)