[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2503210.2503296acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Precimonious: tuning assistant for floating-point precision

Published: 17 November 2013 Publication History

Abstract

Given the variety of numerical errors that can occur, floating-point programs are difficult to write, test and debug. One common practice employed by developers without an advanced background in numerical analysis is using the highest available precision. While more robust, this can degrade program performance significantly. In this paper we present Precimonious, a dynamic program analysis tool to assist developers in tuning the precision of floating-point programs. Precimonious performs a search on the types of the floating-point program variables trying to lower their precision subject to accuracy constraints and performance goals. Our tool recommends a type instantiation that uses lower precision while producing an accurate enough answer without causing exceptions. We evaluate Precimonious on several widely used functions from the GNU Scientific Library, two NAS Parallel Benchmarks, and three other numerical programs. For most of the programs analyzed, Precimonious reduces precision, which results in performance improvements as high as 41%.

References

[1]
D. An, R. Blue, M. Lam, S. Piper, and G. Stoker. Fpinst: Floating point error analysis using dyninst, 2008. URL http://www.freearrow.com/downloads/files/fpinst.pdf.
[2]
M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov. Accelerating scientific computations with mixed precision algorithms. Computer Physics Communications, 180(12):2526--2533, 2009.
[3]
D. H. Bailey. Resolving numerical anomalies in scientific computation, 2008. URL http://www.davidhbailey.com/dhbpapers/numerical-bugs.pdf.
[4]
D. H. Bailey, T. Harris, W. Saphir, R. van der Wijngaart, A. Woo, and M. Yarrow. The NAS Parallel Benchmarks 2.0, 1995.
[5]
E. T. Barr, T. Vo, V. Le, and Z. Su. Automatic detection of floating-point exceptions. In R. Giacobazzi and R. Cousot, editors, POPL, pages 549--560. ACM, 2013.
[6]
F. Benz, A. Hildebrandt, and S. Hack. A dynamic program analysis to find floating-point accuracy problems. In J. Vitek, H. Lin, and F. Tip, editors, PLDI, pages 453--462. ACM, 2012.
[7]
J. Bilmes, K. Asanović, C. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Proceedings of the International Conference on Supercomputing, Vienna, Austria, July 1997. ACM SIGARC. see http://www.icsi.berkeley.edu/~bilmes/phipac.
[8]
A. W. Brown, P. H. J. Kelly, and W. Luk. Profiling floating point value ranges for reconfigurable implementation. In Proceedings of the 1st HiPEAC Workshop on Reconfigurable Computing, pages 6--16, 2007.
[9]
A. Buttari, J. Dongarra, J. Kurzak, P. Luszczek, and S. Tomov. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64--bit accuracy. ACM Trans. Math. Softw., 34(4):17:1--17:22, July 2008. URL http://doi.acm.org/10.1145/1377596.1377597.
[10]
G. P. Contributors. GSL - GNU scientific library - GNU project - free software foundation (FSF). http://www.gnu.org/software/gsl/, 2010. URL http://www.gnu.org/software/gsl/.
[11]
E. Darulova and V. Kuncak. Trustworthy numerical computation in scala. In C. V. Lopes and K. Fisher, editors, OOPSLA, pages 325--344. ACM, 2011.
[12]
F. de Dinechin, C. Lauter, and G. Melquiond. Certifying the floating-point implementation of an elementary function using gappa. IEEE Trans. Comput., 60(2):242--253, Feb. 2011. URL http://dx.doi.org/10.1109/TC.2010.128.
[13]
D. Delmas, E. Goubault, S. Putot, J. Souyris, K. Tekkal, and F. Védrine. Towards an industrial use of fluctuat on safety-critical avionics software. In Proceedings of the 14th International Workshop on Formal Methods for Industrial Critical Systems, FMICS '09, pages 53--69, Berlin, Heidelberg, 2009. Springer-Verlag. URL http://dx.doi.org/10.1007/978-3-642-04570-7_6.
[14]
M. Frigo. A Fast Fourier Transform compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.
[15]
D. Goldberg. What every computer scientist should know about floating point arithmetic. ACM Computing Surveys, 23(1):5--48, 1991.
[16]
Y. Hida, X. S. Li, and D. H. Bailey. Algorithms for quad-double precision floating point arithmetic. In Proceedings of the 15th IEEE Symposium on Computer Arithmetic, ARITH '01, pages 155--, Washington, DC, USA, 2001. IEEE Computer Society. URL http://dl.acm.org/citation.cfm?id=872021.872445.
[17]
N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002.
[18]
W. Kahan. A Mathematical Curiosity, 2013.
[19]
M. O. Lam, J. K. Hollingsworth, and G. W. Stewart. Dynamic floating-point cancellation detection. In 1st International Workshop on High-performance Infrastructure for Scalable Tools, 2011.
[20]
M. O. Lam, J. K. Hollingsworth, B. R. de Supinski, and M. P. LeGendre. Automatically adapting programs for mixed-precision floating-point computation. In A. D. Malony, M. Nemirovsky, and S. P. Midkiff, editors, ICS, pages 369--378. ACM, 2013.
[21]
C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), Palo Alto, California, Mar 2004.
[22]
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Softw., 28(2):152--205, June 2002. URL http://doi.acm.org/10.1145/567806.567808.
[23]
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.
[24]
T. Ravitch. LLVM Whole-Program Wrapper @ONLINE, Mar. 2011. URL https://github.com/travitch/whole-program-llvm.
[25]
K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'05), pages 263--272. ACM, 2005.
[26]
I. C. Society. IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754--2008, Aug. 2008. URL http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935.
[27]
R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. of SciDAC 2005, J. of Physics: Conference Series. Institute of Physics Publishing, June 2005.
[28]
C. Whaley. Automatically Tuned Linear Algebra Software (ATLAS). math-atlas.sourceforge.net, 2012.
[29]
A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Trans. Software Eng., 28(2):183--200, 2002.

Cited By

View all
  • (2024)A Survey on Design Space Exploration Approaches for Approximate Computing SystemsElectronics10.3390/electronics1322444213:22(4442)Online publication date: 13-Nov-2024
  • (2024)FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point DataProceedings of the VLDB Endowment10.14778/3648160.364818017:6(1418-1431)Online publication date: 1-Feb-2024
  • (2024)Arfa: An Agile Regime-Based Floating-Point Optimization Approach for Rounding ErrorsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680378(1516-1528)Online publication date: 11-Sep-2024
  • Show More Cited By

Recommendations

Reviews

Sanzheng Qiao

Accuracy and error tolerance are important requirements for most software. A common practice for achieving the desirable accuracy and avoiding potential numerical problems, such as overflowunderflow, is to use high precision for all of the floating-point variables in a program. High precision comes at a cost: increased execution time. This is especially true when the extra precision such as double double is unavailable in hardware and implemented in software; the extra cost can be substantial. Very often, only a portion of the variables and computation need to be in high precision for the desirable accuracy. However, manually identifying the variables and computation that can use lower precision while satisfying the given error tolerance is impractical, if not impossible. The tool Precimonious presented in this paper automates the tuning of the floating-point precision of the variables of a program. Specifically, given a high-precision program that satisfies the error tolerance, this tool identifies the variables that can be declared in lower precision while achieving the same accuracy without degrading the performance (execution time). The basic idea behind the tool is as follows. It searches for the program variables whose precision (types) can be lowered, for example, from double double to double; suggests a type configuration mapping variables to types; transforms the original program by applying the type configuration; tests the transformed program to determine if the type configuration is valid in terms of the two criteria, accuracy (error tolerance) and performance (execution time); and repeats until a valid type configuration is found. The authors tested the tool on programs that use GNU scientific library and NAS parallel benchmarks. They reported speedup ranging from 0 to 41.7 percent, with more cases of significant speedup within a larger error threshold (10-4) than a smaller threshold (10-10). Tuning floating-point precision to improve performance is of great interest. This work is an ongoing research project rather than a user-friendly software package. Source code (a C program in the tests) is first translated into LLVM bitcode. Mapping the type configuration, which is in LLVM bitcode, to the source code is preformed manually. The type configuration suggested by the tool is valid only for the testing input data, and is not guaranteed for all possible inputs. The user should provide a set of representative input data. The tool only changes variable declaration and switches function calls. Tuning the precision of intermediate results is a challenging future work. Despite the limitations of the tool, it is of great interest for researchers and developers. The Precimonious source code and all of the data and results presented in the paper are available under BSD license at https:github.comcorvette-berkeleyprecimonious. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair:
  • William Gropp,
  • Program Chair:
  • Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. delta-debugging algorithm
  2. dynamic program analysis
  3. floating-point arithmetic
  4. mixed precision
  5. program optimization

Qualifiers

  • Research-article

Funding Sources

Conference

SC13
Sponsor:

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)100
  • Downloads (Last 6 weeks)17
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Survey on Design Space Exploration Approaches for Approximate Computing SystemsElectronics10.3390/electronics1322444213:22(4442)Online publication date: 13-Nov-2024
  • (2024)FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point DataProceedings of the VLDB Endowment10.14778/3648160.364818017:6(1418-1431)Online publication date: 1-Feb-2024
  • (2024)Arfa: An Agile Regime-Based Floating-Point Optimization Approach for Rounding ErrorsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680378(1516-1528)Online publication date: 11-Sep-2024
  • (2024)VCFloat2: Floating-Point Error Analysis in CoqProceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3636501.3636953(14-29)Online publication date: 9-Jan-2024
  • (2024)Implementation and Synthesis of Math Library FunctionsProceedings of the ACM on Programming Languages10.1145/36328748:POPL(942-969)Online publication date: 5-Jan-2024
  • (2024)A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine ProgramsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638484(55-67)Online publication date: 2-Mar-2024
  • (2024)Predicting Performance and Accuracy of Mixed-Precision Programs for Precision TuningProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3623338(1-13)Online publication date: 20-May-2024
  • (2024)Interleaved Execution of Approximated CUDA Kernels in Iterative Applications2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP62718.2024.00017(60-67)Online publication date: 20-Mar-2024
  • (2024)Using reactive links to propagate changes across engineering modelsSoftware and Systems Modeling10.1007/s10270-024-01186-wOnline publication date: 10-Jun-2024
  • (2024)Rigorous Floating-Point Round-Off Error Analysis in PRECiSA 4.0Formal Methods10.1007/978-3-031-71177-0_2(20-38)Online publication date: 9-Sep-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media