[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

A trustworthy mechanized formalization of R

Published: 06 April 2020 Publication History

Abstract

The R programming language is very popular for developing statistical software and data analysis, thanks to rich libraries, concise and expressive syntax, and support for interactive programming. Yet, the semantics of R is fairly complex, contains many subtle corner cases, and is not formally specified. This makes it difficult to reason about R programs. In this work, we develop a big-step operational semantics for R in the form of an interpreter written in the Coq proof assistant. We ensure the trustworthiness of the formalization by introducing a monadic encoding that allows the Coq interpreter, CoqR, to be in direct visual correspondence with the reference R interpreter, GNU R. Additionally, we provide a testing framework that supports systematic comparison of CoqR and GNU R. In its current state, CoqR covers the nucleus of the R language as well as numerous additional features, making it pass a significant number of realistic test cases from the GNU R and FastR projects. To exercise the formal specification, we prove in Coq the preservation of memory invariants in selected parts of the interpreter. This work is an important first step towards a robust environment for formal verification of R programs.

References

[1]
Karthikeyan Bhargavan, Antoine Delignat-Lavaud, and Sergio Maffeis. 2013. Language-Based Defenses Against Untrusted Browser Origins. In Usenix security symposium.
[2]
Martin Bodin, Arthur Charguéraud, Daniele Filaretti, Philippa Gardner, Sergio Maffeis, Daiva Naudinien, Alan Schmitt, and Gareth Smith. 2014. A Trusted Mechanised JavaScript Specification. In POPL.
[3]
Patrick Burns. 2011. The R Inferno.
[4]
Arthur Charguéraud, Alan Schmitt, and Thomas Wood. 2018. JSExplain: A Double Debugger for JavaScript. In The web conference.
[5]
ECMA International. 2010. Test262. https://github.com/tc39/test262 .
[6]
Philippa Gardner, Sergio Maffeis, and Gareth Smith. 2012. Towards a Program Logic for JavaScript. In POPL.
[7]
Filippo Ghibellini. 2017. Dynamic test generation for R packages. Bachelor’s Thesis.
[8]
Google. {n. d.} R Style Guide. Retrieved 2018 from https://google. github.io/styleguide/Rguide.xml .
[9]
Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi. 2010. The Essence of JavaScript. In ECOOP.
[10]
Ross Ihaka and Robert Gentleman. 1996. R: a Language for Data Analysis and Graphics. Journal of computational and graphical statistics.
[11]
Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. 2012. Validating LR(1) Parsers. In ESOP.
[12]
Tomas Kalibera, Petr Maj, Floreal Morandat, and Jan Vitek. 2014. A Fast Abstract Syntax Tree Interpreter for R. In Virtual execution environments.
[13]
Robbert Krebbers and Freek Wiedijk. 2011. A Formalization of the C99 Standard in HOL, Isabelle and Coq. In Calculemus/mkm.
[14]
Xavier Leroy. 2009. Formal Verification of a Realistic Compiler. Communications of the acm.
[15]
Xavier Leroy. 2014. How much is a mechanized proof worth, certification-wise? In Principles in Practice.
[16]
Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2008. An Operational Semantics for JavaScript. In APLAS.
[17]
Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2009. Isolating JavaScript with Filters, Rewriting, and Wrappers. In ESORICS.
[18]
Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2010. Object Capabilities and Isolation of Untrusted Web Applications. In SP. IEEE.
[19]
Petr Maj, Tomas Kalibera, and Jan Vitek. 2013. TestR: R Language Test Driven Specification. In The R User Conference, UseR!
[20]
Jonathan McPherson. 2014. Debugging in R. In The R User Conference, UseR!
[21]
Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. Evaluating the design of the R language. In ECOOP.
[22]
Mozilla. 2013. Mozilla Automated JavaScript Tests. https://developer. mozilla . org / en - US / docs / SpiderMonkey / Running _ Automated _ JavaScript_Tests .
[23]
Daejun Park, Andrei Stefnescu, and Grigore Rou. 2015. KJS: A Complete Formal Semantics of JavaScript. In PLDI.
[24]
Joe Gibbs Politz, Matthew J. Carroll, Benjamin S. Lerner, Justin Pombrio, and Shriram Krishnamurthi. 2012. A Tested Semantics for Getters, Setters, and eval in JavaScript. DLS.
[25]
R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/ .
[26]
R Core Team. 2000. R Language Definition. R foundation for statistical computing.
[27]
R Core Team. {n. d.} The Comprehensive R Archive Network. Retrieved 2018 from https://cran.r-project.org/ .
[28]
Gregor Richards, Christian Hammer, Brian Burg, and Jan Vitek. 2011. The eval that Men Do. A large-scale study of the use of eval in javascript applications. In ECOOP.
[29]
Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, and Jasvir Nagra. 2011. Automated Analysis of Security-Critical JavaScript APIs. In SP.
[30]
The Coq development team. 1984. the Coq Proof Assistant. Retrieved 2018 from https://coq.inria.fr/ .
[31]
Luke Tierney, Gabe Becker, and Tomas Kalibera. 2017. ALTREP and Other Things. In R-devel.
[32]
Roman Tsegelskyi and Jan Vitek. 2014. TestR: Generating Unit Tests for R Internals. In The R User Conference, UseR!
[33]
Philip Wadler. 1992. Comprehending Monads. Mathematical structures in computer science.
[34]
Thomas Wuerthinger. 2012. Truffle: A Self-Optimizing Runtime System.
[35]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In PLDI.

Cited By

View all
  • (2023)An Executable Semantics for Faster Development of Optimizing Python CompilersProceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3623476.3623529(15-28)Online publication date: 23-Oct-2023
  • (2020)Validating Formal Semantics by Property-Based Cross-TestingProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462200(150-161)Online publication date: 2-Sep-2020
  • (2020)A mechanized formalization of GraphQLProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3372885.3373822(201-214)Online publication date: 20-Jan-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 53, Issue 8
DLS '18
August 2018
100 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3393673
Issue’s Table of Contents
  • cover image ACM Conferences
    DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages
    October 2018
    100 pages
    ISBN:9781450360302
    DOI:10.1145/3276945
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2020
Published in SIGPLAN Volume 53, Issue 8

Check for updates

Author Tags

  1. Coq
  2. R
  3. Testing infrastructure

Qualifiers

  • Article

Funding Sources

  • Millenium Institute for Foundational Research on Data (IMFD Chile).

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)An Executable Semantics for Faster Development of Optimizing Python CompilersProceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3623476.3623529(15-28)Online publication date: 23-Oct-2023
  • (2020)Validating Formal Semantics by Property-Based Cross-TestingProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462200(150-161)Online publication date: 2-Sep-2020
  • (2020)A mechanized formalization of GraphQLProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3372885.3373822(201-214)Online publication date: 20-Jan-2020
  • (2019)Automatic and scalable detection of logical errors in functional programming assignmentsProceedings of the ACM on Programming Languages10.1145/33606143:OOPSLA(1-30)Online publication date: 10-Oct-2019
  • (2019)Reflection-aware static regression test selectionProceedings of the ACM on Programming Languages10.1145/33606133:OOPSLA(1-29)Online publication date: 10-Oct-2019
  • (2019)On the design, implementation, and use of laziness in RProceedings of the ACM on Programming Languages10.1145/33605793:OOPSLA(1-27)Online publication date: 10-Oct-2019
  • (2019)Reliable and fast DWARF-based stack unwindingProceedings of the ACM on Programming Languages10.1145/33605723:OOPSLA(1-24)Online publication date: 10-Oct-2019
  • (2019)A path to DOT: formalizing fully path-dependent typesProceedings of the ACM on Programming Languages10.1145/33605713:OOPSLA(1-29)Online publication date: 10-Oct-2019
  • (2019)A fault-tolerant programming model for distributed interactive applicationsProceedings of the ACM on Programming Languages10.1145/33605703:OOPSLA(1-29)Online publication date: 10-Oct-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media