[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3468264.3468621acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article
Open access

SynGuar: guaranteeing generalization in programming by example

Published: 18 August 2021 Publication History

Abstract

Programming by Example (PBE) is a program synthesis paradigm in which the synthesizer creates a program that matches a set of given examples. In many applications of such synthesis (e.g., program repair or reverse engineering), we are to reconstruct a program that is close to a specific target program, not merely to produce some program that satisfies the seen examples. In such settings, we wish that the synthesized program generalizes well, i.e., has as few errors as possible on the unobserved examples capturing the target function behavior. In this paper, we propose the first framework (called SynGuar) for PBE synthesizers that guarantees to achieve low generalization error with high probability. Our main contribution is a procedure to dynamically calculate how many additional examples suffice to theoretically guarantee generalization. We show how our techniques can be used in 2 well-known synthesis approaches: PROSE and STUN (synthesis through unification), for common string-manipulation program benchmarks. We find that often a few hundred examples suffice to provably bound generalization error below 5% with high (≥ 98%) probability on these benchmarks. Further, we confirm this empirically: SynGuar significantly improves the accuracy of existing synthesizers in generating the right target programs. But with fewer examples chosen arbitrarily, the same baseline synthesizers (without SynGuar) overfit and lose accuracy.

References

[1]
2019. SyGuS-Comp 2019. https://sygus.org/comp/2019/
[2]
2021. Supplementary Material. https://github.com/HALOCORE/SynGuar
[3]
Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive program synthesis. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-642-39799-8_67
[4]
Zeyuan Allen-Zhu, Yuanzhi Li, and Yingyu Liang. 2019. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. In Conference on Neural Information Processing Systems (NeurIPS).
[5]
Rajeev Alur, Pavol Černý, and Arjun Radhakrishna. 2015. Synthesis Through Unification. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21668-3_10
[6]
Shengwei An, Rishabh Singh, Sasa Misailovic, and Roopsha Samanta. 2019. Augmented example-based synthesis using relational perturbation properties. In Principles of Programming Languages (POPL). https://doi.org/10.1145/3371124
[7]
Matej Balog, Alexander L Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. Deepcoder: Learning to write programs. In International Conference on Learning Representations (ICLR).
[8]
Guy Blanc, Jane Lange, and Li-Yang Tan. 2020. Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations. In Innovations in Theoretical Computer Science (ITCS). https://doi.org/10.4230/LIPIcs.ITCS.2020.44
[9]
Tim Blazytko, Moritz Schlögel, Cornelius Aschermann, Ali Abbasi, Joel Frank, Simon Wörner, and Thorsten Holz. 2020. AURORA: Statistical Crash Analysis for Automated Root Cause Explanation. In USENIX Security.
[10]
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. 1987. Occam’s razor. Information processing letters, 24, 6 (1987), 377–380. https://doi.org/10.1016/0020-0190(87)90114-1
[11]
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. 1989. Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM (JACM), 36 (1989), https://doi.org/10.1145/76359.76371
[12]
Olivier Bousquet and André Elisseeff. 2002. Stability and Generalization. Journal of Machine Learning Research (JMLR), 2 (2002).
[13]
Xinyun Chen, Chang Liu, and Dawn Song. 2019. Execution-Guided Neural Program Synthesis. In International Conference on Learning Representations (ICLR).
[14]
William W. Cohen. 1994. Pac-learning recursive logic programs: efficient algorithms. Journal of Artificial Intelligence Research (JAIR), 2, 1 (1994), 501–539. https://doi.org/10.1613/jair.97
[15]
William W. Cohen. 1994. Pac-learning recursive logic programs: negative results. Journal of Artificial Intelligence Research (JAIR), 2, 1 (1994), 541–573. https://doi.org/10.1613/jair.1917
[16]
Tamraparni Dasu and Theodore Johnson. 2003. Exploratory Data Mining and Data Cleaning. 479, John Wiley & Sons. https://doi.org/10.1002/0471448354
[17]
Sašo Džeroski, Stephen Muggleton, and Stuart Russell. 1992. PAC-learnability of determinate logic programs. In Annual Workshop on Computational Learning Theory (COLT). 128–135. https://doi.org/10.1145/130385.130399
[18]
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: neural program learning under noisy I/O. In International Conference on Machine Learning (ICML).
[19]
Dana Drachsler-Cohen, Sharon Shoham, and Eran Yahav. 2017. Synthesis with Abstract Examples. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-63387-9_13
[20]
Samuel Drews, Aws Albarghouthi, and Loris D’Antoni. 2019. Efficient Synthesis with Probabilistic Constraints. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-030-25540-4_15
[21]
Kevin Ellis and Sumit Gulwani. 2017. Learning to Learn Programs from Examples: Going Beyond Program Structure. In International Joint Conferences on Artificial Intelligence Organization (IJCAI). https://doi.org/10.24963/ijcai.2017/227
[22]
P. Ezudheen, Daniel Neider, Deepak D’Souza, Pranav Garg, and P. Madhusudan. 2018. Horn-ICE Learning for Synthesizing Invariants and Contracts. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). https://doi.org/10.1145/3276501
[23]
Xiang Gao, Shraddha Barke, Arjun Radhakrishna, Gustavo Soares, Sumit Gulwani, Alan Leung, Nachiappan Nagappan, and Ashish Tiwari. 2020. Feedback-Driven Semi-Supervised Synthesis of Program Transformations. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). https://doi.org/10.1145/3428287
[24]
Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Commun. ACM, https://doi.org/10.1145/3318162
[25]
Peter Grünwald. 2007. The Minimum Description Length Principle. isbn:9780262256292 https://doi.org/10.7551/mitpress/4643.001.0001
[26]
Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. In Principles of Programming Languages (POPL). 317–330. https://doi.org/10.1145/1926385.1926423
[27]
Sumit Gulwani. 2016. Programming by Examples - and its applications in Data Wrangling. Verification and Synthesis of Correct and Secure Systems, https://doi.org/10.3233/978-1-61499-627-9-137
[28]
Moritz Hardt, Benjamin Recht, and Yoram Singer. 2016. Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. In International Conference on Machine Learning (ICML).
[29]
David Haussler. 1992. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100 (1992), https://doi.org/10.1016/0890-5401(92)90010-D
[30]
Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. 2010. Oracle-guided component-based program synthesis. In International Conference on Software Engineering (ICSE). https://doi.org/10.1145/1806799.1806833
[31]
Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question Selection for Interactive Program Synthesis. In Programming Language Design and Implementation (PLDI). 1143–1158. https://doi.org/10.1145/3385412.3386025
[32]
K. Kawaguchi and J. Huang. 2019. Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes. In Annual Allerton Conference on Communication, Control, and Computing (Allerton). https://doi.org/10.1109/ALLERTON.2019.8919696
[33]
Larissa Laich, Pavol Bielik, and Martin Vechev. 2020. Guiding Program Synthesis by Learning to Generate Examples. In International Conference on Learning Representations (ICLR).
[34]
Tessa Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Machine Learning, 53, 1 (2003), 111–156. https://doi.org/10.1023/A:1025671410623
[35]
Tessa A. Lau, Pedro Domingos, and Daniel S. Weld. 2000. Version Space Algebra and its Application to Programming by Demonstration. In International Conference on Machine Learning (ICML).
[36]
Ben London. 2017. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent. In Conference on Neural Information Processing Systems (NeurIPS).
[37]
Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In User Interface Software & Technology (UIST). 291–301. https://doi.org/10.1145/2807442.2807459
[38]
Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury. 2018. Symbolic execution with existential second-order constraints. In ESEC/FSE. 389–399. https://doi.org/10.1145/3236024.3236049
[39]
Tom M Mitchell. 1982. Generalization as search. Artificial Intelligence, 18, 2 (1982), 203–226. https://doi.org/10.1016/0004-3702(82)90040-6
[40]
Wenlong Mou, Liwei Wang, Xiyu Zhai, and Kai Zheng. 2018. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints. In Conference on Learning Theory, PMLR. 605–638.
[41]
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nathan Srebro. 2018. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. In International Conference on Learning Representations (ICLR).
[42]
Hila Peleg and Nadia Polikarpova. 2020. Perfect is the Enemy of Good: Best-Effort Program Synthesis. In European Conference on Object-Oriented Programming (ECOOP). https://doi.org/10.4230/LIPIcs.ECOOP.2020.2
[43]
A. Pensia, V. Jog, and P. Loh. 2018. Generalization Error Bounds for Noisy, Iterative Algorithms. In International Symposium on Information Theory (ISIT). https://doi.org/10.1109/ISIT.2018.8437571
[44]
Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. 2014. Test-Driven Synthesis. In Programming Language Design and Implementation (PLDI). https://doi.org/10.1145/2594291.2594297
[45]
Illia Polosukhin and Alexander Skidanov. 2018. Neural Program Search: Solving Programming Tasks from Description and Examples. In ICLR workshop.
[46]
Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: a framework for inductive program synthesis. In Object-Oriented Programming, Systems, Languages & Applications (OOPSLA). 50, 107–126. https://doi.org/10.1145/2858965.2814310
[47]
Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from "Big Code". In Principles of Programming Languages (POPL). https://doi.org/10.1145/2676726.2677009
[48]
Andrew Reynolds, Haniel Barbosa, Andres Nötzli, Clark Barrett, and Cesare Tinelli. 2019. cvc4sy: Smart and Fast Term Enumeration for Syntax-Guided Synthesis. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-030-25543-5_5
[49]
Andrew Reynolds, Morgan Deters, Viktor Kuncak, Cesare Tinelli, and Clark Barrett. 2015. Counterexample-Guided Quantifier Instantiation for Synthesis in SMT. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21668-3_12
[50]
Omar Rivasplata, Emilio Parrado-Hernández, John Shawe-Taylor, Shiliang Sun, and Csaba Szepesvári. 2018. PAC-Bayes Bounds for Stable Algorithms with Instance-Dependent Priors. In Conference on Neural Information Processing Systems (NeurIPS).
[51]
Ronald L. Rivest. 1987. Learning Decision Lists. Machine Learning, 2 (1987), https://doi.org/10.1023/A:1022607331053
[52]
Jean E. Sammet. 1966. The Use of English as a Programming Language. Commun. ACM, https://doi.org/10.1145/365230.365274
[53]
Shiqi Shen, Shweta Shinde, Soundarya Ramesh, Abhik Roychoudhury, and Prateek Saxena. 2019. Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints. In Network and Distributed Systems Security (NDSS). https://doi.org/10.14722/ndss.2019.23530
[54]
Eui Chul Shin, Illia Polosukhin, and Dawn Song. 2018. Improving Neural Program Synthesis with Inferred Execution Traces. In Conference on Neural Information Processing Systems (NeurIPS).
[55]
Rishabh Singh and Sumit Gulwani. 2015. Predicting a Correct Program in Programming by Example. In Computer-Aided Verification (CAV). https://doi.org/10.1007/978-3-319-21690-4_23
[56]
David Canfield Smith. 1975. PYGMALION: A Creative Programming Environment. Stanford University. https://doi.org/10.21236/ada016811
[57]
Phillip D Summers. 1977. A methodology for LISP program construction from examples. Journal of the ACM (JACM), https://doi.org/10.1145/321992.322002
[58]
L. G. Valiant. 1984. A Theory of the Learnable. Commun. ACM, 27 (1984), https://doi.org/10.1145/1968.1972
[59]
Vladimir Vapnik. 2000. The Nature of Statistical Learning Theory. Springer science & business media. https://doi.org/10.1007/978-1-4757-3264-1
[60]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive Query Synthesis from Input-Output Examples. In International Conference on Management of Data (SIGMOD). 1631–1634. https://doi.org/10.1145/3035918.3058738
[61]
Amit Zohar and Lior Wolf. 2018. Automatic Program Synthesis of Long Programs with a Learned Garbage Collector. In Conference on Neural Information Processing Systems (NeurIPS).
[62]
Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. 2020. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning, 109 (2020), 1–26. https://doi.org/10.1007/s10994-019-05839-6

Index Terms

  1. SynGuar: guaranteeing generalization in programming by example

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    August 2021
    1690 pages
    ISBN:9781450385626
    DOI:10.1145/3468264
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 August 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. Generalization
    2. Program Synthesis
    3. Sample Complexity

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '21
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 445
      Total Downloads
    • Downloads (Last 12 months)115
    • Downloads (Last 6 weeks)28
    Reflects downloads up to 05 Mar 2025

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media