An Empirical Study of Computation-Intensive Loops for Identifying and Classifying Loop Kernels: Full Research Paper

Published: 17 April 2017
DOI: 10.1145/3030207.3030217

Abstract

The process of performance tuning is time-consuming and costly even if it is carried out automatically. It is crucial to learn from the experience of experts. Our long-term goal is to construct a database of facts extracted from specific performance tuning histories of computation-intensive applications such that we can search the database for promising optimization patterns that fit a given kernel.
In this study, as a significant step toward our goal, we explored a thousand computation-intensive applications in terms of the distribution of kernel classes, each of which is related to expected efficiency and specific tuning patterns. To statistically estimate the distribution of the kernel classes, 100 loops were randomly sampled and then manually classified by experienced performance engineers. The results indicate that 50-70% of the kernels are memory-bound and hence difficult to run efficiently on modern scalar processors. In addition, based on the classification results, we constructed experimental classifiers for identifying loop kernels and for predicting kernel classes, which achieved cross-validated classification accuracies of 81% and 65%, respectively.
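
As a rough illustration of the second task (predicting a kernel class from loop-level features), the sketch below trains a support-vector classifier on synthetic per-loop features and reports cross-validated accuracy, mirroring the evaluation protocol described above. The feature set, class labels, and use of scikit-learn are assumptions made for illustration; the paper's actual features and classifier implementation are not reproduced here.

```python
# Hedged sketch: cross-validated prediction of kernel classes from
# hypothetical per-loop features (NOT the authors' actual feature set).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Each row is one sampled loop kernel; columns are illustrative static
# features, e.g. [array references, FP operations, loop nest depth,
# stride-1 accesses, indirect accesses]. Labels stand in for the
# expert-assigned kernel classes, e.g. 0 = memory-bound, 1 = compute-bound,
# 2 = other.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(100, 5)).astype(float)  # 100 sampled loops
y = rng.integers(0, 3, size=100)                       # manually assigned classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)              # 5-fold cross-validation
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```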

Cited By

  • Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches? In Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 78-88, May 2019. DOI: 10.1109/IPDPS.2019.00019
  • Japanese Autotuning Research: Autotuning Languages and FFT. Proceedings of the IEEE, 106(11):2056-2067, November 2018. DOI: 10.1109/JPROC.2018.2870284

Published In

ICPE '17: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering
April 2017
450 pages
ISBN:9781450344043
DOI:10.1145/3030207

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computation-intensive application
  2. fortran parser
  3. kernel classification
  4. kernel prediction
  5. software performance tuning

Qualifiers

  • Research-article

Funding Sources

  • Japan Society for the Promotion of Science (JSPS)

Conference

ICPE '17

Acceptance Rates

ICPE '17 Paper Acceptance Rate: 27 of 83 submissions, 33%
Overall Acceptance Rate: 252 of 851 submissions, 30%
