Abstract
This paper provides an overview and an evaluation of the Cetus source-to-source compiler infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler for research in automatic parallelization of C programs. Since then, Cetus has been used for many additional program-transformation tasks and serves as a compiler infrastructure for numerous projects in the US and internationally. Recently, the National Science Foundation has supported the development of Cetus into a community resource. The compiler has gone through several iterations of benchmark studies and implementations of the techniques that improve the parallel performance of these programs. These efforts have resulted in a system that compares favorably with state-of-the-art parallelizers, such as Intel’s ICC. A key limitation of advanced optimizing compilers is their lack of runtime information, such as the program input data. We discuss and evaluate several techniques that support dynamic optimization decisions. Finally, as there is an extensive body of proposed compiler analyses and transformations for parallelization, the question arises of how important the individual techniques are. This paper evaluates the impact of the individual Cetus techniques on overall program performance.
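To make the kind of transformation discussed above concrete, the sketch below shows a serial C loop alongside a hand-written OpenMP version of the general form an automatic parallelizer such as Cetus can emit. The pragma, the `if (n > 10000)` runtime threshold, and the function names are illustrative assumptions for this sketch, not Cetus's actual output for any particular benchmark.

```c
#include <stdio.h>

#define N 100000

/* Serial loop: every iteration writes a distinct element of a[],
 * so the iterations are independent and the loop is a candidate
 * for automatic parallelization. */
void scale_serial(double *a, const double *b, int n, double s)
{
    for (int i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* Illustrative parallelized form: an OpenMP work-sharing pragma
 * plus an if clause that defers the parallel-vs-serial decision
 * to runtime, once the trip count n is known. The 10000-iteration
 * threshold is a made-up example value, not a Cetus default. */
void scale_parallel(double *a, const double *b, int n, double s)
{
    #pragma omp parallel for if (n > 10000)
    for (int i = 0; i < n; i++)
        a[i] = s * b[i];
}

int main(void)
{
    static double a[N], b[N];
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    scale_parallel(a, b, N, 2.0);
    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

Compiled with OpenMP enabled (e.g., `gcc -fopenmp`), the loop runs in parallel only when the runtime trip count exceeds the threshold; without the flag, the pragma is ignored and the code remains a correct serial program. This mirrors the idea of deferring optimization decisions to runtime, when information such as the input size is available.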
Cite this article
Bae, H., Mustafa, D., Lee, J.W. et al. The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation. Int J Parallel Prog 41, 753–767 (2013). https://doi.org/10.1007/s10766-012-0211-z