GPU Computations and Memory Access Model Based on Petri Nets

Anna Gogolińska¹⁶,
Łukasz Mikulski¹⁶ &
Marcin Piątkowski¹⁶

Part of the book series: Lecture Notes in Computer Science ((TOPNOC,volume 11090))

415 Accesses
5 Citations

Abstract

In modern systems CPUs as well as GPUs are equipped with multi-level memory architectures, where different levels of the hierarchy vary in latency and capacity. Therefore, various memory access models were studied. Such a model can be seen as an interface abstracting the user from the physical architecture details. In this paper we present a general and uniform GPU computation and memory access model based on bounded inhibitor Petri nets (PNs). Its effectiveness is demonstrated by comparing its throughputs to practical computational experiments performed with the usage of Nvidia GPU with CUDA architecture.

Our PN model is consistent with the workflow of multithreaded GPU streaming multiprocessors. It models a selection and execution of instructions for each warp. The three types of instructions included in the model are: the arithmetic operation, the access to the shared memory and the access to the global memory. For a given algorithm the model allows to check how efficient the parallelization is, and whether a different organization of threads will improve performance.

The accuracy of our model was tested with different kernels. As the preliminary experiments we used the matrix multiplication program and stability example created by Nvidia, and as the main experiment a binary version of the least significant digit radix sort algorithm. We created three implementations of the algorithm using CUDA architecture, differing in the usage of shared and global memory as well as organization of calculations. For each implementation the PN model was used and the results of experiments are presented in the work.

This research has been supported by the Polish National Science Center through grant No. 2013/09/D/ST6/03928.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

eBook: GBP 12.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Scalable Parallelization of Stencils Using MODA

Heterogeneous parallel_for Template for CPU–GPU Chips

Article 31 January 2018

A Methodology Approach to Compare Performance of Parallel Programming Models for Shared-Memory Architectures

Notes

1.
Note that in the case of bounded nets the use of inhibitors is not necessary, one can provide an equivalent (with more complex structure) net without inhibitors.

References

Chiola, G., Donatelli, S., Franceschinis, G.: Priorities, inhibitor arcs and concurrency in P/T nets. In: Proceedings of ICATPN, vol. 91, pp. 182–205 (1991)
Google Scholar
Nvidia Corporation. CUDA. http://www.nvidia.com/object/cuda_home_new.html
Nvidia Corporation. CUDA. Best practice guide version 8.0.61 (2017)
Google Scholar
Corporation, Nvidia: CUDA. C programming guide version 7.5 (2017)
Google Scholar
Duvanenko, V.J.: Algorithm improvement through performance measurement (2009). http://www.drdobbs.com/architecture-and-design/algorithm-improvement-through-performanc/220000504
Grama, A.: Introduction to Parallel Computing. Pearson Education, London (2003)
Google Scholar
Hong, S., Kim, H.: An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In: ACM SIGARCH Computer Architecture News, vol. 37, pp. 152–163. ACM (2009)
Google Scholar
Hwang, K., Jotwani, N.: Advanced Computer Architecture, 3rd edn. McGraw-Hill Education, New York (2011)
Google Scholar
Janicki, R., Koutny, M.: Structure of concurrency. Theoret. Comput. Sci. 112(1), 5–52 (1993)
Article MathSciNet Google Scholar
Jensen, K., Kristensen, L.M.: Coloured Petri Nets - Modelling and Validation of Concurrent Systems. Springer, Heidelberg (2009). https://doi.org/10.1007/b95112
Book MATH Google Scholar
Luebke, D., Owens, J., Roberts, M., Lee, C.-H.: Intro to Parallel Programming - Online Course. Nvidia Corporation, Santa Clara
Google Scholar
Ma, L., Agrawal, K., Chamberlain, R.D.: A memory access model for highly-threaded many-core architectures. Future Gen. Comput. Syst. 30, 202–215 (2014)
Article Google Scholar
Ma, L., Chamberlain, R.D., Agrawal, K.: Performance modeling for highly-threaded many-core GPUs. In: 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 84–91. IEEE (2014)
Google Scholar
Madougou, S., Varbanescu, A.L., de Laat, C.: Using colored petri nets for GPGPU performance modeling. In: Proceedings of the ACM International Conference on Computing Frontiers, pp. 240–249. ACM (2016)
Google Scholar
Mukherjee, R.: A performance prediction model for the CUDA GPGPU platform. Ph.D. thesis, International Institute of Information Technology Hyderabad, India (2010)
Google Scholar
Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989)
Article Google Scholar
Patterson, D.A.: Computer Architecture: A Quantitative Approach. Elsevier, Amsterdam (2011)
Google Scholar
Ratzer, A.V., et al.: CPN tools for editing, simulating, and analysing coloured petri nets. In: van der Aalst, W.M.P., Best, E. (eds.) ICATPN 2003. LNCS, vol. 2679, pp. 450–462. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44919-1_28
Chapter Google Scholar
Reisig, W.: Petri Nets: An Introduction, vol. 4. Springer, Heidelberg (2012)
Google Scholar
Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2009, pp. 1–10. IEEE (2009)
Google Scholar
Seward, H.E.: Information sorting in the application of electronic digital computers to business operations, Master Thesis, MIT (1954)
Google Scholar
Shuaiwen, S., Chunyi, S., Barry, R., Kirk, C.: A simplified and accurate model of power-performance efficiency on emergent GPU architecure. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 673–686 (2013)
Google Scholar
Storti, D., Yurtoglu, M.: CUDA for Engineers: An Introduction to High-performance Parallel Computing. Addison-Wesley Professional, Boston (2015)
Google Scholar
Westergaard, M., (Eric) Verbeek, H.M.W.: CPN Tools official webpage. Eindhoven University of Technology
Google Scholar
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Article Google Scholar
Cheng, S.-T., Hung, Y.: Estimation of job execution time in mapreduce framework over GPU clusters. In: The Fifth International Conference on Performance, Safety and Robustness in Complex Systems and Applications, PESARO 2015 (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Toruń, Poland
Anna Gogolińska, Łukasz Mikulski & Marcin Piątkowski

Authors

Anna Gogolińska
View author publications
You can also search for this author in PubMed Google Scholar
Łukasz Mikulski
View author publications
You can also search for this author in PubMed Google Scholar
Marcin Piątkowski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anna Gogolińska .

Editor information

Editors and Affiliations

Newcastle University, Newcastle upon Tyne, UK
Maciej Koutny
Western Norway University of Applied Sciences, Bergen, Norway
Lars Michael Kristensen
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Wojciech Penczek

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Gogolińska, A., Mikulski, Ł., Piątkowski, M. (2018). GPU Computations and Memory Access Model Based on Petri Nets. In: Koutny, M., Kristensen, L., Penczek, W. (eds) Transactions on Petri Nets and Other Models of Concurrency XIII. Lecture Notes in Computer Science(), vol 11090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58381-4_7

Download citation

DOI: https://doi.org/10.1007/978-3-662-58381-4_7
Published: 21 November 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58380-7
Online ISBN: 978-3-662-58381-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics