Abstract
Stencil computations are common in High Performance Computing (HPC) applications; they consist of a pattern that repeats the same calculation over every point of a data domain. The Finite-Difference Method is one example of a stencil computation, and it is used to solve real-world problems governed by Partial Differential Equations in diverse areas such as electromagnetics, fluid dynamics, and geophysics. Although a large body of literature on optimizing this class of applications is available, evaluating and optimizing their performance on different HPC architectures remains a challenge. In this work, we implemented the 7-point Jacobi stencil in a source-to-source transformation framework (BOAST) to evaluate its performance on different HPC architectures. The results show that the same source code can be executed on current architectures with improved performance, and that this approach helps programmers develop applications without depending on hardware-specific features.
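To make the "same calculation over every point" pattern concrete, the following is a minimal plain-C sketch of one sweep of a 7-point Jacobi stencil on a 3D grid. Each interior point is updated as out(i,j,k) = c0 * in(i,j,k) + c1 * (sum of its six face neighbours in the i, j and k directions), reading only from `in` and writing only to `out`, which is what makes it a Jacobi (not Gauss-Seidel) update. The row-major flattened layout, the coefficients `c0` and `c1`, and the names `idx3`, `jacobi7_sweep` and `jacobi7_run` are illustrative assumptions; this is not the BOAST-generated code or the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linear index of point (i, j, k) in an
   nx * ny * nz grid stored in row-major order (k varies fastest). */
static inline size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;
}

/* One Jacobi sweep of a 7-point stencil (illustrative sketch).
   Every interior point of `out` is computed only from values of `in`,
   with weight c0 for the centre and c1 for each of the six face
   neighbours. Boundary points of `out` are left untouched. */
void jacobi7_sweep(const double *in, double *out,
                   size_t nx, size_t ny, size_t nz,
                   double c0, double c1) {
    assert(nx >= 3 && ny >= 3 && nz >= 3); /* need at least one interior point */
    for (size_t i = 1; i < nx - 1; i++)
        for (size_t j = 1; j < ny - 1; j++)
            for (size_t k = 1; k < nz - 1; k++) {
                out[idx3(i, j, k, ny, nz)] =
                    c0 * in[idx3(i, j, k, ny, nz)] +
                    c1 * (in[idx3(i - 1, j, k, ny, nz)] + in[idx3(i + 1, j, k, ny, nz)] +
                          in[idx3(i, j - 1, k, ny, nz)] + in[idx3(i, j + 1, k, ny, nz)] +
                          in[idx3(i, j, k - 1, ny, nz)] + in[idx3(i, j, k + 1, ny, nz)]);
            }
}

/* Run `steps` sweeps, ping-ponging between buffers a and b.
   Both buffers must hold the same boundary values, because the sweep
   never writes boundary points. Returns the buffer with the latest result. */
double *jacobi7_run(double *a, double *b, size_t nx, size_t ny, size_t nz,
                    double c0, double c1, int steps) {
    for (int s = 0; s < steps; s++) {
        jacobi7_sweep(a, b, nx, ny, nz, c0, c1);
        double *tmp = a; /* swap roles: the new result becomes the next input */
        a = b;
        b = tmp;
    }
    return a;
}
```

With c0 = -6 and c1 = 1 the sweep computes the standard discrete Laplacian on a unit-spaced grid. This naive triple loop is the kind of baseline that a source-to-source framework would typically rewrite into architecture-specific variants (different loop orders, blocking, or vectorization) while the high-level description stays the same.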
Acknowledgments
This work has been supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS). This research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. It was also supported by Intel under the Modern Code project, and by the PETROBRAS oil company under Ref. 2016/00133-9. We also thank RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Martínez, V., Serpa, M.S., Pavan, P.J., Padoin, E.L., Navaux, P.O.A. (2019). Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_16
DOI: https://doi.org/10.1007/978-3-030-16205-4_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16204-7
Online ISBN: 978-3-030-16205-4
eBook Packages: Computer Science, Computer Science (R0)