More Web Proxy on the site http://driver.im/

research-article

Open access

SynGuar: guaranteeing generalization in programming by example

Authors:

Teodora Baluta,

Aashish Kolluri,

Prateek SaxenaAuthors Info & Claims

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 677 - 689

https://doi.org/10.1145/3468264.3468621

Published: 18 August 2021 Publication History

Abstract

Programming by Example (PBE) is a program synthesis paradigm in which the synthesizer creates a program that matches a set of given examples. In many applications of such synthesis (e.g., program repair or reverse engineering), we are to reconstruct a program that is close to a specific target program, not merely to produce some program that satisfies the seen examples. In such settings, we wish that the synthesized program generalizes well, i.e., has as few errors as possible on the unobserved examples capturing the target function behavior. In this paper, we propose the first framework (called SynGuar) for PBE synthesizers that guarantees to achieve low generalization error with high probability. Our main contribution is a procedure to dynamically calculate how many additional examples suffice to theoretically guarantee generalization. We show how our techniques can be used in 2 well-known synthesis approaches: PROSE and STUN (synthesis through unification), for common string-manipulation program benchmarks. We find that often a few hundred examples suffice to provably bound generalization error below 5% with high (≥ 98%) probability on these benchmarks. Further, we confirm this empirically: SynGuar significantly improves the accuracy of existing synthesizers in generating the right target programs. But with fewer examples chosen arbitrarily, the same baseline synthesizers (without SynGuar) overfit and lose accuracy.

References

[1]

2019. SyGuS-Comp 2019. https://sygus.org/comp/2019/

[2]

2021. Supplementary Material. https://github.com/HALOCORE/SynGuar

[3]

Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive program synthesis. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-642-39799-8_67

[4]

Zeyuan Allen-Zhu, Yuanzhi Li, and Yingyu Liang. 2019. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. In Conference on Neural Information Processing Systems (NeurIPS).

[5]

Rajeev Alur, Pavol Černý, and Arjun Radhakrishna. 2015. Synthesis Through Unification. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21668-3_10

[6]

Shengwei An, Rishabh Singh, Sasa Misailovic, and Roopsha Samanta. 2019. Augmented example-based synthesis using relational perturbation properties. In Principles of Programming Languages (POPL). https://doi.org/10.1145/3371124

Digital Library

[7]

Matej Balog, Alexander L Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. Deepcoder: Learning to write programs. In International Conference on Learning Representations (ICLR).

[8]

Guy Blanc, Jane Lange, and Li-Yang Tan. 2020. Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations. In Innovations in Theoretical Computer Science (ITCS). https://doi.org/10.4230/LIPIcs.ITCS.2020.44

[9]

Tim Blazytko, Moritz Schlögel, Cornelius Aschermann, Ali Abbasi, Joel Frank, Simon Wörner, and Thorsten Holz. 2020. AURORA: Statistical Crash Analysis for Automated Root Cause Explanation. In USENIX Security.

[10]

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. 1987. Occam’s razor. Information processing letters, 24, 6 (1987), 377–380. https://doi.org/10.1016/0020-0190(87)90114-1

Digital Library

[11]

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. 1989. Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM (JACM), 36 (1989), https://doi.org/10.1145/76359.76371

Digital Library

[12]

Olivier Bousquet and André Elisseeff. 2002. Stability and Generalization. Journal of Machine Learning Research (JMLR), 2 (2002).

[13]

Xinyun Chen, Chang Liu, and Dawn Song. 2019. Execution-Guided Neural Program Synthesis. In International Conference on Learning Representations (ICLR).

[14]

William W. Cohen. 1994. Pac-learning recursive logic programs: efficient algorithms. Journal of Artificial Intelligence Research (JAIR), 2, 1 (1994), 501–539. https://doi.org/10.1613/jair.97

[15]

William W. Cohen. 1994. Pac-learning recursive logic programs: negative results. Journal of Artificial Intelligence Research (JAIR), 2, 1 (1994), 541–573. https://doi.org/10.1613/jair.1917

[16]

Tamraparni Dasu and Theodore Johnson. 2003. Exploratory Data Mining and Data Cleaning. 479, John Wiley & Sons. https://doi.org/10.1002/0471448354

[17]

Sašo Džeroski, Stephen Muggleton, and Stuart Russell. 1992. PAC-learnability of determinate logic programs. In Annual Workshop on Computational Learning Theory (COLT). 128–135. https://doi.org/10.1145/130385.130399

Digital Library

[18]

Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: neural program learning under noisy I/O. In International Conference on Machine Learning (ICML).

Digital Library

[19]

Dana Drachsler-Cohen, Sharon Shoham, and Eran Yahav. 2017. Synthesis with Abstract Examples. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-63387-9_13

[20]

Samuel Drews, Aws Albarghouthi, and Loris D’Antoni. 2019. Efficient Synthesis with Probabilistic Constraints. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-030-25540-4_15

[21]

Kevin Ellis and Sumit Gulwani. 2017. Learning to Learn Programs from Examples: Going Beyond Program Structure. In International Joint Conferences on Artificial Intelligence Organization (IJCAI). https://doi.org/10.24963/ijcai.2017/227

[22]

P. Ezudheen, Daniel Neider, Deepak D’Souza, Pranav Garg, and P. Madhusudan. 2018. Horn-ICE Learning for Synthesizing Invariants and Contracts. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). https://doi.org/10.1145/3276501

Digital Library

[23]

Xiang Gao, Shraddha Barke, Arjun Radhakrishna, Gustavo Soares, Sumit Gulwani, Alan Leung, Nachiappan Nagappan, and Ashish Tiwari. 2020. Feedback-Driven Semi-Supervised Synthesis of Program Transformations. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). https://doi.org/10.1145/3428287

Digital Library

[24]

Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Commun. ACM, https://doi.org/10.1145/3318162

Digital Library

[25]

Peter Grünwald. 2007. The Minimum Description Length Principle. isbn:9780262256292 https://doi.org/10.7551/mitpress/4643.001.0001

[26]

Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. In Principles of Programming Languages (POPL). 317–330. https://doi.org/10.1145/1926385.1926423

Digital Library

[27]

Sumit Gulwani. 2016. Programming by Examples - and its applications in Data Wrangling. Verification and Synthesis of Correct and Secure Systems, https://doi.org/10.3233/978-1-61499-627-9-137

[28]

Moritz Hardt, Benjamin Recht, and Yoram Singer. 2016. Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. In International Conference on Machine Learning (ICML).

[29]

David Haussler. 1992. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100 (1992), https://doi.org/10.1016/0890-5401(92)90010-D

Digital Library

[30]

Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. 2010. Oracle-guided component-based program synthesis. In International Conference on Software Engineering (ICSE). https://doi.org/10.1145/1806799.1806833

Digital Library

[31]

Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question Selection for Interactive Program Synthesis. In Programming Language Design and Implementation (PLDI). 1143–1158. https://doi.org/10.1145/3385412.3386025

Digital Library

[32]

K. Kawaguchi and J. Huang. 2019. Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes. In Annual Allerton Conference on Communication, Control, and Computing (Allerton). https://doi.org/10.1109/ALLERTON.2019.8919696

Digital Library

[33]

Larissa Laich, Pavol Bielik, and Martin Vechev. 2020. Guiding Program Synthesis by Learning to Generate Examples. In International Conference on Learning Representations (ICLR).

[34]

Tessa Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Machine Learning, 53, 1 (2003), 111–156. https://doi.org/10.1023/A:1025671410623

Digital Library

[35]

Tessa A. Lau, Pedro Domingos, and Daniel S. Weld. 2000. Version Space Algebra and its Application to Programming by Demonstration. In International Conference on Machine Learning (ICML).

[36]

Ben London. 2017. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent. In Conference on Neural Information Processing Systems (NeurIPS).

[37]

Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In User Interface Software & Technology (UIST). 291–301. https://doi.org/10.1145/2807442.2807459

Digital Library

[38]

Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury. 2018. Symbolic execution with existential second-order constraints. In ESEC/FSE. 389–399. https://doi.org/10.1145/3236024.3236049

Digital Library

[39]

Tom M Mitchell. 1982. Generalization as search. Artificial Intelligence, 18, 2 (1982), 203–226. https://doi.org/10.1016/0004-3702(82)90040-6

[40]

Wenlong Mou, Liwei Wang, Xiyu Zhai, and Kai Zheng. 2018. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints. In Conference on Learning Theory, PMLR. 605–638.

[41]

Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nathan Srebro. 2018. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. In International Conference on Learning Representations (ICLR).

[42]

Hila Peleg and Nadia Polikarpova. 2020. Perfect is the Enemy of Good: Best-Effort Program Synthesis. In European Conference on Object-Oriented Programming (ECOOP). https://doi.org/10.4230/LIPIcs.ECOOP.2020.2

[43]

A. Pensia, V. Jog, and P. Loh. 2018. Generalization Error Bounds for Noisy, Iterative Algorithms. In International Symposium on Information Theory (ISIT). https://doi.org/10.1109/ISIT.2018.8437571

Digital Library

[44]

Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. 2014. Test-Driven Synthesis. In Programming Language Design and Implementation (PLDI). https://doi.org/10.1145/2594291.2594297

Digital Library

[45]

Illia Polosukhin and Alexander Skidanov. 2018. Neural Program Search: Solving Programming Tasks from Description and Examples. In ICLR workshop.

[46]

Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: a framework for inductive program synthesis. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). 50, 107–126. https://doi.org/10.1145/2858965.2814310

Digital Library

[47]

Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from "Big Code". In Principles of Programming Languages (POPL). https://doi.org/10.1145/2676726.2677009

Digital Library

[48]

Andrew Reynolds, Haniel Barbosa, Andres Nötzli, Clark Barrett, and Cesare Tinelli. 2019. cvc4sy: Smart and Fast Term Enumeration for Syntax-Guided Synthesis. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-030-25543-5_5

[49]

Andrew Reynolds, Morgan Deters, Viktor Kuncak, Cesare Tinelli, and Clark Barrett. 2015. Counterexample-Guided Quantifier Instantiation for Synthesis in SMT. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21668-3_12

[50]

Omar Rivasplata, Emilio Parrado-Hernández, John Shawe-Taylor, Shiliang Sun, and Csaba Szepesvári. 2018. PAC-Bayes Bounds for Stable Algorithms with Instance-Dependent Priors. In Conference on Neural Information Processing Systems (NeurIPS).

[51]

Ronald L. Rivest. 1987. Learning Decision Lists. Machine Learning, 2 (1987), https://doi.org/10.1023/A:1022607331053

Digital Library

[52]

Jean E. Sammet. 1966. The Use of English as a Programming Language. Commun. ACM, https://doi.org/10.1145/365230.365274

Digital Library

[53]

Shiqi Shen, Shweta Shinde, Soundarya Ramesh, Abhik Roychoudhury, and Prateek Saxena. 2019. Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints. In Network and Distributed Systems Security (NDSS). https://doi.org/10.14722/ndss.2019.23530

[54]

Eui Chul Shin, Illia Polosukhin, and Dawn Song. 2018. Improving Neural Program Synthesis with Inferred Execution Traces. In Conference on Neural Information Processing Systems (NeurIPS).

[55]

Rishabh Singh and Sumit Gulwani. 2015. Predicting a Correct Program in Programming by Example. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21690-4_23

[56]

David Canfield Smith. 1975. PYGMALION: A Creative Programming Environment. Stanford University. https://doi.org/10.21236/ada016811

[57]

Phillip D Summers. 1977. A methodology for LISP program construction from examples. Journal of the ACM (JACM), https://doi.org/10.1145/321992.322002

Digital Library

[58]

L. G. Valiant. 1984. A Theory of the Learnable. Commun. ACM, 27 (1984), https://doi.org/10.1145/1968.1972

Digital Library

[59]

Vladimir Vapnik. 2000. The Nature of Statistical Learning Theory. Springer science & business media. https://doi.org/10.1007/978-1-4757-3264-1

[60]

Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive Query Synthesis from Input-Output Examples. In International Conference on Management of Data (SIGMOD). 1631–1634. https://doi.org/10.1145/3035918.3058738

Digital Library

[61]

Amit Zohar and Lior Wolf. 2018. Automatic Program Synthesis of Long Programs with a Learned Garbage Collector. In Conference on Neural Information Processing Systems (NeurIPS).

[62]

Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. 2020. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning, 109 (2020), 1–26. https://doi.org/10.1007/s10994-019-05839-6

Index Terms

SynGuar: guaranteeing generalization in programming by example
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages

Recommendations

Programming by Example Made Easy
Programming by example (PBE) is an emerging programming paradigm that automatically synthesizes programs specified by user-provided input-output examples. Despite the convenience for end-users, implementing PBE tools often requires strong expertise in ...
Can reactive synthesis and syntax-guided synthesis be friends?
SPLASH Companion 2021: Companion Proceedings of the 2021 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

While reactive synthesis and syntax-guided synthesis (SyGuS) have seen enormous progress in recent years, combining the two approaches has remained a challenge. In this work, we present the synthesis of reactive programs from Temporal Stream Logic ...
Optimizing synthesis with metasketches
POPL '16

Many advanced programming tools---for both end-users and expert developers---rely on program synthesis to automatically generate implementations from high-level specifications. These tools often need to employ tricky, custom-built synthesis algorithms ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 2021

1690 pages

ISBN:9781450385626

DOI:10.1145/3468264

General Chairs:
Diomidis Spinellis
Athens University of Economics and Business, Greece
,
Georgios Gousios
Facebook, Netherlands / Delft University of Technology, Netherlands
,
Program Chairs:
Marsha Chechik
University of Toronto, Canada
,
Massimiliano Di Penta
University of Sannio, Italy

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 August 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Conference

ESEC/FSE '21

Sponsor:

SIGSOFT

ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 23 - 28, 2021

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
445
Total Downloads

Downloads (Last 12 months)115
Downloads (Last 6 weeks)28

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten