More Web Proxy on the site http://driver.im/

research-article

Public Access

Continuously reasoning about programs using differential Bayesian inference

Authors:

Mukund Raghothaman,

Mayur NaikAuthors Info & Claims

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 561 - 575

https://doi.org/10.1145/3314221.3314616

Published: 08 June 2019 Publication History

Abstract

Programs often evolve by continuously integrating changes from multiple programmers. The effective adoption of program analysis tools in this continuous integration setting is hindered by the need to only report alarms relevant to a particular program change. We present a probabilistic framework, Drake, to apply program analyses to continuously evolving programs. Drake is applicable to a broad range of analyses that are based on deductive reasoning. The key insight underlying Drake is to compute a graph that concisely and precisely captures differences between the derivations of alarms produced by the given analysis on the program before and after the change. Performing Bayesian inference on the graph thereby enables to rank alarms by likelihood of relevance to the change. We evaluate Drake using Sparrow—a static analyzer that targets buffer-overrun, format-string, and integer-overflow errors—on a suite of ten widely-used C programs each comprising 13k–112k lines of code. Drake enables to discover all true bugs by inspecting only 30 alarms per benchmark on average, compared to 85 (3× more) alarms by the same ranking approach in batch mode, and 118 (4× more) alarms by a differential approach based on syntactic masking of alarms which also misses 4 of the 26 bugs overall.

Supplementary Material

WEBM File (p561-heo.webm)

Download
88.07 MB

References

[1]

Serge Abiteboul, Richard Hull, and Victor Vianu. 1994. Foundations of Databases: The Logical Level (1st ed.). Pearson.

Digital Library

[2]

Thomas Ball and Sriram Rajamani. 2002. The SLAM Project: Debugging System Software via Static Analysis. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2002) . ACM, 1–3.

Digital Library

[3]

Gilles Barthe, Juan Manuel Crespo, and César Kunz. 2011. Relational Verification Using Product Programs. In Formal Methods (FM 2011). Springer, 200–214.

Digital Library

[4]

Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. Commun. ACM 53, 2 (Feb. 2010), 66–75.

Digital Library

[5]

Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérome Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2003. A Static Analyzer for Large Safety-critical Software. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2003) . ACM, 196–207.

Digital Library

[6]

Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA 2009) . ACM, 243–262.

Digital Library

[7]

Cristiano Calcagno, Dino Distefano, Jeremy Dubreil, Dominik Gabi, Pieter Hooimeijer, Martino Luca, Peter O’Hearn, Irene Papakonstantinou, Jim Purbrick, and Dulma Rodriguez. 2015. Moving Fast with Software Verification. In NASA Formal Method Symposium. Springer, 3–11.

[8]

Ann Campbell and Patroklos Papapetrou. 2013. SonarQube in Action (1st ed.). Manning Publications Co.

Digital Library

[9]

Michael Clarkson and Fred Schneider. 2010. Hyperproperties. Journal of Computer Security 18, 6 (Sept. 2010), 1157–1210.

Digital Library

[10]

MITRE Corporation. 2015. CVE-2015-1345. https://cve.mitre. org/cgi-bin/cvename.cgi?name=CVE-2015-1345 .

[11]

MITRE Corporation. 2015. CVE-2015-8106. https://cve.mitre. org/cgi-bin/cvename.cgi?name=CVE-2015-8106 .

[12]

MITRE Corporation. 2017. CVE-2017-16938. https://cve.mitre. org/cgi-bin/cvename.cgi?name=CVE-2017-16938 .

[13]

MITRE Corporation. 2018. CVE-2018-10372. https://cve.mitre. org/cgi-bin/cvename.cgi?name=CVE-2018-10372 .

[14]

Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL 1977) . ACM, 238–252.

Digital Library

[15]

Paul Eggert. 2010. sort: Commit 14ad7a2. http://git.savannah. gnu.org/cgit/coreutils.git/commit/?id=14ad7a2 . sort: Fix very-unlikely buffer overrun when merging to input file.

[16]

Manuel Fähndrich and Francesco Logozzo. 2010. Static Contract Checking with Abstract Interpretation. In Proceedings of the International Conference on Formal Verification of Object-Oriented Software (FoVeOOS 2010) . Springer, 10–30.

Digital Library

[17]

Benny Godlin and Ofer Strichman. 2009. Regression Verification. In Proceedings of the 46th Annual Design Automation Conference (DAC 2009) . ACM, 466–471.

Digital Library

[18]

Assaf Gordon. 2018. sed: Commit 007a417. http://git.savannah. gnu.org/cgit/sed.git/commit/?id=007a417 . sed: Fix heap buffer overflow from multiline EOL regex optimization.

[19]

GrammaTech. 2005. CodeSonar. https://www.grammatech.com/ products/codesonar .

[20]

Chris Hawblitzel, Ming Kawaguchi, Shuvendu K. Lahiri, and Henrique Rebêlo. 2013. Towards Modularly Comparing Programs Using Automated Theorem Provers. In Proceedings of the International Conference on Automated Deduction (CADE 24) . Springer, 282–299.

Digital Library

[21]

Kihong Heo, Hakjoo Oh, and Kwangkeun Yi. 2017. Machine-learningguided Selectively Unsound Static Analysis. In Proceedings of the 39th International Conference on Software Engineering (ICSE 2017) . IEEE Press, 519–529.

Digital Library

[22]

David Hovemeyer and William Pugh. 2004. Finding Bugs is Easy. SIGPLAN Notices 39, OOPSLA (Dec. 2004), 92–106.

Digital Library

[23]

James Hunt and Douglas McIlroy. 1976. An Algorithm for Differential File Comparison . Technical Report. Bell Laboratories.

[24]

Daniel Jackson and David Ladd. 1994. Semantic Diff: A Tool for Summarizing the Effects of Modifications. In Proceedings of the International Conference on Software Maintenance (ICSM 1994) . 243–252.

Digital Library

[25]

Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Proceedings of the International Conference on Computer Aided Verification (CAV 2016) . Springer, 422– 430.

[26]

Yungbum Jung, Jaehwang Kim, Jaeho Shin, and Kwangkeun Yi. 2005. Taming False Alarms From a Domain-unaware C Analyzer by a Bayesian Statistical Post Analysis. In Static Analysis: 12th International Symposium (SAS 2005) . Springer, 203–217.

Digital Library

[27]

Jeehoon Kang, Yoonseung Kim, Youngju Song, Juneyoung Lee, Sanghoon Park, Mark Dongyeon Shin, Yonghyun Kim, Sungkeun Cho, Joonwon Choi, Chung-Kil Hur, and Kwangkeun Yi. 2018. Crellvm: Verified Credible Compilation for LLVM. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018) . ACM, 631–645.

Digital Library

[28]

Ming Kawaguchi, Shuvendu Lahiri, and Henrique Rebelo. 2010. Conditional Equivalence . Technical Report. Microsoft Research. https://www.microsoft.com/en-us/research/publication/ conditional-equivalence/

[29]

Miryung Kim and David Notkin. 2009. Discovering and Representing Systematic Code Changes. In Proceedings of the 31st International Conference on Software Engineering (ICSE 2009) . IEEE Computer Society, 309–319.

Digital Library

[30]

Ugur Koc, Parsa Saadatpanah, Jeffrey Foster, and Adam Porter. 2017. Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools. In Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2017) . ACM, 35–42.

Digital Library

[31]

Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques . The MIT Press.

Digital Library

[32]

Ted Kremenek and Dawson Engler. 2003. Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations. In Static Analysis: 10th International Symposium (SAS 2003). Springer, 295–315.

Digital Library

[33]

Ted Kremenek, Andrew Ng, and Dawson Engler. 2007. A Factor Graph Model for Software Bug Finding. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI 2007) . Morgan Kaufmann, 2510–2516.

Digital Library

[34]

Shuvendu Lahiri, Chris Hawblitzel, Ming Kawaguchi, and Henrique Rebêlo. 2012. SymDiff: A Language-Agnostic Semantic Diff Tool for Imperative Programs. In Proceedings of the International Conference on Computer Aided Verification (CAV 2012) . Springer, 712–717.

Digital Library

[35]

Shuvendu Lahiri, Kenneth McMillan, Rahul Sharma, and Chris Hawblitzel. 2013. Differential Assertion Checking. In Proceedings of the 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013) . ACM, 345–355.

Digital Library

[36]

Shuvendu Lahiri, Kapil Vaswani, and C. A. R. Hoare. 2010. Differential Static Analysis: Opportunities, Applications, and Challenges. In Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research (FoSER 2010) . ACM, 201–204.

Digital Library

[37]

Wei Le and Mary Lou Soffa. 2010. Path-based Fault Correlations. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2010) . ACM, 307–316.

Digital Library

[38]

Woosuk Lee, Wonchan Lee, and Kwangkeun Yi. 2012. Sound Nonstatistical Clustering of Static Analysis Alarms. In Verification, Model Checking, and Abstract Interpretation: 13th International Conference (VMCAI 2012) . Springer, 299–314.

Digital Library

[39]

Benjamin Livshits, Aditya Nori, Sriram Rajamani, and Anindya Banerjee. 2009. Merlin: Specification Inference for Explicit Information Flow Problems. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2009) . ACM, 75–86.

Digital Library

[40]

Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Guyer, Uday Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM 58, 2 (Jan. 2015), 44–46.

Digital Library

[41]

Francesco Logozzo, Shuvendu K. Lahiri, Manuel Fähndrich, and Sam Blackshear. 2014. Verification Modulo Versions: Towards Usable Verification. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014) . ACM, 294–304.

Digital Library

[42]

Magnus Madsen and Anders Møller. 2014. Sparse Dataflow Analysis with Pointers and Reachability. In Static Analysis. Springer, 201–218.

[43]

Jim Meyering. 2018. tar: Commit b531801. http://git.savannah. gnu.org/cgit/tar.git/commit/?id=b531801 . One-top-level: Avoid a heap-buffer-overflow.

[44]

Gianluca Mezzetti, Anders Møller, and Martin Toldam Torp. 2018. Type Regression Testing to Detect Breaking Changes in Node.js Libraries. In Proceedings of the 32nd European Conference on Object-Oriented Programming (ECOOP 2018), Vol. 109. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 7:1–7:24.

[45]

Joris Mooij. 2010. libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models. Journal of Machine Learning Research 11 (Aug. 2010), 2169–2173.

Digital Library

[46]

Mayur Naik. 2006. Chord: A Program Analysis Platform for Java. https://github.com/pag-lab/jchord .

[47]

Aleksandar Nanevski, Anindya Banerjee, and Deepak Garg. 2011. Verification of Information Flow and Access Control Policies with Dependent Types. In Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP 2011) . IEEE Computer Society, 165–179.

Digital Library

[48]

Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. Design and Implementation of Sparse Global Analyses for C-like Languages. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI 2012) . ACM, 229–238.

Digital Library

[49]

Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. The Sparrow static analyzer. https://github.com/ropas/ sparrow .

[50]

Peter O’Hearn. 2018. Continuous Reasoning: Scaling the Impact of Formal Methods. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2018) . ACM, 13–25.

Digital Library

[51]

Nimrod Partush and Eran Yahav. 2013. Abstract Semantic Differencing for Numerical Programs. In Proceedings of the International Static Analysis Symposium (SAS 2013) . Springer, 238–258.

[52]

Suzette Person, Matthew Dwyer, Sebastian Elbaum, and Corina Pˇasˇareanu. 2008. Differential Symbolic Execution. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2008) . ACM, 226–237.

Digital Library

[53]

Mukund Raghothaman, Sulekha Kulkarni, Kihong Heo, and Mayur Naik. 2018. User-guided Program Reasoning Using Bayesian Inference. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018) . ACM, 722–735.

Digital Library

[54]

David Ramos and Dawson Engler. 2011. Practical, Low-effort Equivalence Verification of Real Code. In Proceedings of the International Conference on Computer Aided Verification (CAV 2011) . Springer, 669– 685.

Digital Library

[55]

Tim Rühsen. 2018. wget: Commit b3ff8ce. http://git.savannah. gnu.org/cgit/wget.git/commit/?id=b3ff8ce . src/ftp-ls.c ( ftp_parse_vms_ls): Fix heap-buffer-overflow.

[56]

Tim Rühsen. 2018. wget: Commit f0d715b. http://git.savannah. gnu.org/cgit/wget.git/commit/?id=f0d715b . src/ftp-ls.c ( ftp_parse_vms_ls): Fix heap-buffer-overflow.

[57]

Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan. 2018. Lessons from Building Static Analysis Tools at Google. Commun. ACM 61, 4 (March 2018), 58–66.

Digital Library

[58]

Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018) . ACM, 693–706.

Digital Library

[59]

Yannis Smaragdakis, George Kastrinis, and George Balatsouras. 2014. Introspective Analysis: Context-sensitivity, Across the Board. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014) . ACM, 485–495.

Digital Library

[60]

Marcelo Sousa, Isil Dillig, and Shuvendu Lahiri. 2018. Verified Threeway Program Merge. 2, OOPSLA (2018), 165:1–165:29.

Digital Library

[61]

Yulei Sui and Jingling Xue. 2016. SVF: Interprocedural Static Value-flow Analysis in LLVM. In Proceedings of the 25th International Conference on Compiler Construction (CC 2016) . ACM, 265–266.

Digital Library

[62]

Chungha Sung, Shuvendu Lahiri, Constantin Enea, and Chao Wang. 2018. Datalog-based Scalable Semantic Diffing of Concurrent Programs. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018) . ACM, 656–666.

Digital Library

Cited By

Ge XFang CLi XSun WWu DZhai JLin SZhao ZLiu YChen Z(2024)Machine Learning for Actionable Warning Identification: A Comprehensive SurveyACM Computing Surveys10.1145/369635257:2(1-35)Online publication date: 19-Sep-2024
https://dl.acm.org/doi/10.1145/3696352
Zhang YShi YZhang X(2024)Learning Abstraction Selection for Bayesian Program AnalysisProceedings of the ACM on Programming Languages10.1145/36498458:OOPSLA1(954-982)Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649845
Wang XZhao L(2023)APICAD: Augmenting API Misuse Detection through Specifications from Code and Documents2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00032(245-256)Online publication date: May-2023
https://doi.org/10.1109/ICSE48619.2023.00032
Show More Cited By

Index Terms

Continuously reasoning about programs using differential Bayesian inference
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
      1. Bayesian networks
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software evolution
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Automated static analysis

Recommendations

User-guided program reasoning using Bayesian inference
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation

Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false ...
User-guided program reasoning using Bayesian inference
PLDI '18

Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false ...
Effective Program Reasoning Using Bayesian Inference

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2019

1162 pages

ISBN:9781450367127

DOI:10.1145/3314221

General Chair:
Kathryn S. McKinley
Google, USA
,
Program Chair:
Kathleen Fisher
Tufts University, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2019

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

PLDI '19

Sponsor:

SIGPLAN

PLDI '19: 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 22 - 26, 2019

AZ, Phoenix, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
930
Total Downloads

Downloads (Last 12 months)161
Downloads (Last 6 weeks)23

Reflects downloads up to 10 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ge XFang CLi XSun WWu DZhai JLin SZhao ZLiu YChen Z(2024)Machine Learning for Actionable Warning Identification: A Comprehensive SurveyACM Computing Surveys10.1145/369635257:2(1-35)Online publication date: 19-Sep-2024
https://dl.acm.org/doi/10.1145/3696352
Zhang YShi YZhang X(2024)Learning Abstraction Selection for Bayesian Program AnalysisProceedings of the ACM on Programming Languages10.1145/36498458:OOPSLA1(954-982)Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649845
Wang XZhao L(2023)APICAD: Augmenting API Misuse Detection through Specifications from Code and Documents2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00032(245-256)Online publication date: May-2023
https://doi.org/10.1109/ICSE48619.2023.00032
Liu PLu YYang WPan M(2023)VALAR: Streamlining Alarm Ranking in Static Analysis with Value-Flow Assisted Active Learning2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00098(1940-1951)Online publication date: 11-Sep-2023
https://doi.org/10.1109/ASE56229.2023.00098
Kang WSon BHeo KYin HStavrou ACremers CShi E(2022)TRACERProceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security10.1145/3548606.3560664(1695-1708)Online publication date: 7-Nov-2022
https://dl.acm.org/doi/10.1145/3548606.3560664
Kim HRaghothaman MHeo KDwyer MDamian DZeller A(2022)Learning probabilistic models for static analysis alarmsProceedings of the 44th International Conference on Software Engineering10.1145/3510003.3510098(1282-1293)Online publication date: 21-May-2022
https://dl.acm.org/doi/10.1145/3510003.3510098
Muske TSerebrenik A(2022)Survey of Approaches for Postprocessing of Static Analysis AlarmsACM Computing Surveys10.1145/349452155:3(1-39)Online publication date: 3-Feb-2022
https://dl.acm.org/doi/10.1145/3494521
Muske TSerebrenik A(2022)Classification and Ranking of Delta Static Analysis Alarms2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)10.1109/SCAM55253.2022.00029(197-207)Online publication date: Oct-2022
https://doi.org/10.1109/SCAM55253.2022.00029
Dura AReichenbach CSöderberg E(2021)JavaDL: automatically incrementalizing Java bug pattern detectionProceedings of the ACM on Programming Languages10.1145/34855425:OOPSLA(1-31)Online publication date: 15-Oct-2021
https://dl.acm.org/doi/10.1145/3485542
Chen THeo KRaghothaman MSpinellis DGousios GChechik MDi Penta M(2021)Boosting static analysis accuracy with instrumented test executionsProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3468264.3468626(1154-1165)Online publication date: 20-Aug-2021
https://dl.acm.org/doi/10.1145/3468264.3468626
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents