More Web Proxy on the site http://driver.im/

article

A trustworthy mechanized formalization of R

Authors:

Éric TanterAuthors Info & Claims

ACM SIGPLAN Notices, Volume 53, Issue 8

Pages 13 - 24

https://doi.org/10.1145/3393673.3276946

Published: 06 April 2020 Publication History

Abstract

The R programming language is very popular for developing statistical software and data analysis, thanks to rich libraries, concise and expressive syntax, and support for interactive programming. Yet, the semantics of R is fairly complex, contains many subtle corner cases, and is not formally specified. This makes it difficult to reason about R programs. In this work, we develop a big-step operational semantics for R in the form of an interpreter written in the Coq proof assistant. We ensure the trustworthiness of the formalization by introducing a monadic encoding that allows the Coq interpreter, CoqR, to be in direct visual correspondence with the reference R interpreter, GNU R. Additionally, we provide a testing framework that supports systematic comparison of CoqR and GNU R. In its current state, CoqR covers the nucleus of the R language as well as numerous additional features, making it pass a significant number of realistic test cases from the GNU R and FastR projects. To exercise the formal specification, we prove in Coq the preservation of memory invariants in selected parts of the interpreter. This work is an important first step towards a robust environment for formal verification of R programs.

References

[1]

Karthikeyan Bhargavan, Antoine Delignat-Lavaud, and Sergio Maffeis. 2013. Language-Based Defenses Against Untrusted Browser Origins. In Usenix security symposium.

Digital Library

[2]

Martin Bodin, Arthur Charguéraud, Daniele Filaretti, Philippa Gardner, Sergio Maffeis, Daiva Naudinien, Alan Schmitt, and Gareth Smith. 2014. A Trusted Mechanised JavaScript Specification. In POPL.

Digital Library

[3]

Patrick Burns. 2011. The R Inferno.

[4]

Arthur Charguéraud, Alan Schmitt, and Thomas Wood. 2018. JSExplain: A Double Debugger for JavaScript. In The web conference.

Digital Library

[5]

ECMA International. 2010. Test262. https://github.com/tc39/test262 .

[6]

Philippa Gardner, Sergio Maffeis, and Gareth Smith. 2012. Towards a Program Logic for JavaScript. In POPL.

Digital Library

[7]

Filippo Ghibellini. 2017. Dynamic test generation for R packages. Bachelor’s Thesis.

[8]

Google. {n. d.} R Style Guide. Retrieved 2018 from https://google. github.io/styleguide/Rguide.xml .

[9]

Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi. 2010. The Essence of JavaScript. In ECOOP.

Digital Library

[10]

Ross Ihaka and Robert Gentleman. 1996. R: a Language for Data Analysis and Graphics. Journal of computational and graphical statistics.

[11]

Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. 2012. Validating LR(1) Parsers. In ESOP.

Digital Library

[12]

Tomas Kalibera, Petr Maj, Floreal Morandat, and Jan Vitek. 2014. A Fast Abstract Syntax Tree Interpreter for R. In Virtual execution environments.

Digital Library

[13]

Robbert Krebbers and Freek Wiedijk. 2011. A Formalization of the C99 Standard in HOL, Isabelle and Coq. In Calculemus/mkm.

Digital Library

[14]

Xavier Leroy. 2009. Formal Verification of a Realistic Compiler. Communications of the acm.

Digital Library

[15]

Xavier Leroy. 2014. How much is a mechanized proof worth, certification-wise? In Principles in Practice.

[16]

Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2008. An Operational Semantics for JavaScript. In APLAS.

Digital Library

[17]

Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2009. Isolating JavaScript with Filters, Rewriting, and Wrappers. In ESORICS.

Digital Library

[18]

Sergio Maffeis, John C. Mitchell, and Ankur Taly. 2010. Object Capabilities and Isolation of Untrusted Web Applications. In SP. IEEE.

Digital Library

[19]

Petr Maj, Tomas Kalibera, and Jan Vitek. 2013. TestR: R Language Test Driven Specification. In The R User Conference, UseR!

[20]

Jonathan McPherson. 2014. Debugging in R. In The R User Conference, UseR!

[21]

Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. Evaluating the design of the R language. In ECOOP.

[22]

Mozilla. 2013. Mozilla Automated JavaScript Tests. https://developer. mozilla . org / en - US / docs / SpiderMonkey / Running _ Automated _ JavaScript_Tests .

[23]

Daejun Park, Andrei Stefnescu, and Grigore Rou. 2015. KJS: A Complete Formal Semantics of JavaScript. In PLDI.

Digital Library

[24]

Joe Gibbs Politz, Matthew J. Carroll, Benjamin S. Lerner, Justin Pombrio, and Shriram Krishnamurthi. 2012. A Tested Semantics for Getters, Setters, and eval in JavaScript. DLS.

[25]

R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/ .

[26]

R Core Team. 2000. R Language Definition. R foundation for statistical computing.

[27]

R Core Team. {n. d.} The Comprehensive R Archive Network. Retrieved 2018 from https://cran.r-project.org/ .

[28]

Gregor Richards, Christian Hammer, Brian Burg, and Jan Vitek. 2011. The eval that Men Do. A large-scale study of the use of eval in javascript applications. In ECOOP.

Digital Library

[29]

Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, and Jasvir Nagra. 2011. Automated Analysis of Security-Critical JavaScript APIs. In SP.

Digital Library

[30]

The Coq development team. 1984. the Coq Proof Assistant. Retrieved 2018 from https://coq.inria.fr/ .

[31]

Luke Tierney, Gabe Becker, and Tomas Kalibera. 2017. ALTREP and Other Things. In R-devel.

[32]

Roman Tsegelskyi and Jan Vitek. 2014. TestR: Generating Unit Tests for R Internals. In The R User Conference, UseR!

[33]

Philip Wadler. 1992. Comprehending Monads. Mathematical structures in computer science.

[34]

Thomas Wuerthinger. 2012. Truffle: A Self-Optimizing Runtime System.

[35]

Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In PLDI.

Digital Library

Cited By

Melançon OFeeley MSerrano MSaraiva JDegueule TScott E(2023)An Executable Semantics for Faster Development of Optimizing Python CompilersProceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3623476.3623529(15-28)Online publication date: 23-Oct-2023
https://dl.acm.org/doi/10.1145/3623476.3623529
Bereczky PHorpácsi DKőszegi JSzeier SThompson S(2020)Validating Formal Semantics by Property-Based Cross-TestingProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462200(150-161)Online publication date: 2-Sep-2020
https://dl.acm.org/doi/10.1145/3462172.3462200
Díaz TOlmedo FTanter ÉBlanchette JHriţcu C(2020)A mechanized formalization of GraphQLProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3372885.3373822(201-214)Online publication date: 20-Jan-2020
https://dl.acm.org/doi/10.1145/3372885.3373822
Show More Cited By

Index Terms

A trustworthy mechanized formalization of R
1. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Application specific development environments
2. Theory of computation
  1. Semantics and reasoning
    1. Program semantics
      1. Denotational semantics

Recommendations

A trustworthy mechanized formalization of R
DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages

The R programming language is very popular for developing statistical software and data analysis, thanks to rich libraries, concise and expressive syntax, and support for interactive programming. Yet, the semantics of R is fairly complex, contains many ...
A mechanized formalization of GraphQL
CPP 2020: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs

GraphQL is a novel language for specifying and querying web APIs, allowing clients to flexibly and efficiently retrieve data of interest. The GraphQL language specification is unfortunately only available in prose, making it hard to develop robust ...
A List-machine Benchmark for Mechanized Metatheory

We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 53, Issue 8

DLS '18

August 2018

100 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3393673

Editor:
Jens Palsberg
University of California, Los Angeles

Issue’s Table of Contents

DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages
October 2018
100 pages
ISBN:9781450360302
DOI:10.1145/3276945
Program Chair:
Tim Felgentreff
Oracle Labs, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2020

Published in SIGPLAN Volume 53, Issue 8

Check for updates

Author Tags

Qualifiers

Article

Funding Sources

Millenium Institute for Foundational Research on Data (IMFD Chile).

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
182
Total Downloads

Downloads (Last 12 months)9
Downloads (Last 6 weeks)0

Reflects downloads up to 03 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Melançon OFeeley MSerrano MSaraiva JDegueule TScott E(2023)An Executable Semantics for Faster Development of Optimizing Python CompilersProceedings of the 16th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3623476.3623529(15-28)Online publication date: 23-Oct-2023
https://dl.acm.org/doi/10.1145/3623476.3623529
Bereczky PHorpácsi DKőszegi JSzeier SThompson S(2020)Validating Formal Semantics by Property-Based Cross-TestingProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462200(150-161)Online publication date: 2-Sep-2020
https://dl.acm.org/doi/10.1145/3462172.3462200
Díaz TOlmedo FTanter ÉBlanchette JHriţcu C(2020)A mechanized formalization of GraphQLProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs10.1145/3372885.3373822(201-214)Online publication date: 20-Jan-2020
https://dl.acm.org/doi/10.1145/3372885.3373822
Song DLee MOh H(2019)Automatic and scalable detection of logical errors in functional programming assignmentsProceedings of the ACM on Programming Languages10.1145/33606143:OOPSLA(1-30)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360614
Shi AHadzi-Tanovic MZhang LMarinov DLegunsen O(2019)Reflection-aware static regression test selectionProceedings of the ACM on Programming Languages10.1145/33606133:OOPSLA(1-29)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360613
Goel AVitek J(2019)On the design, implementation, and use of laziness in RProceedings of the ACM on Programming Languages10.1145/33605793:OOPSLA(1-27)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360579
Bastian TKell SZappa Nardelli F(2019)Reliable and fast DWARF-based stack unwindingProceedings of the ACM on Programming Languages10.1145/33605723:OOPSLA(1-24)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360572
Rapoport MLhoták O(2019)A path to DOT: formalizing fully path-dependent typesProceedings of the ACM on Programming Languages10.1145/33605713:OOPSLA(1-29)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360571
Mogk RDrechsler JSalvaneschi GMezini M(2019)A fault-tolerant programming model for distributed interactive applicationsProceedings of the ACM on Programming Languages10.1145/33605703:OOPSLA(1-29)Online publication date: 10-Oct-2019
https://dl.acm.org/doi/10.1145/3360570

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents