[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Synthesis of data completion scripts using finite tree automata

Published: 12 October 2017 Publication History

Abstract

In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of missing value imputation in data science and derived data computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise. This paper presents a synthesis technique for automating data completion tasks using programming-by-example (PBE) and a very lightweight sketching approach. Given a formula sketch (e.g., AVG(?1, ?2)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data and a novel synthesis algorithm that can generate DSL programs that are consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm that is based on finite tree automata (FTA). The use of FTAs in the learning algorithm leads to a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called DACE and evaluate it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely Prose and Sketch.

References

[1]
Parosh A Abdulla, Ahmed Bouajjani, Lukáš Holík, Lisa Kaati, and Tomáš Vojnar. 2008. Composed bisimulation for tree automata. In International Conference on Implementation and Application of Automata. Springer, 212–222.
[2]
Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis (CAV). Springer-Verlag, 934–950.
[3]
Rajeev Alur, Pavol Čern`y, and Arjun Radhakrishna. 2015. Synthesis through unification. In International Conference on Computer Aided Verification . Springer, 163–179.
[4]
James Bornholt, Emina Torlak, Dan Grossman, and Luis Ceze. 2016. Optimizing Synthesis with Metasketches (POPL). ACM, 775–788.
[5]
Julien Cristau, Christof Löding, and Wolfgang Thomas. 2005. Deterministic Automata on Unranked Trees (FCT). SpringerVerlag, 68–79.
[6]
Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In PLDI. ACM, 422–436.
[7]
John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-output Examples (PLDI). ACM, 229–239.
[8]
Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples (POPL). ACM, 317–330.
[9]
Haruo Hosoya and Benjamin C. Pierce. 2003. XDuce: A Statically Typed XML Processing Language. ACM Trans. Internet Technol. 3, 2 (2003), 117–148.
[10]
Bishoksan Kafle and John P. Gallagher. 2015. Tree Automata-Based Refinement with Application to Horn Clause Verification (VMCAI 2015) . Springer-Verlag New York, Inc., 209–226.
[11]
Kevin Knight and Jonathan May. 2009. Applications of weighted automata in natural language processing. In Handbook of Weighted Automata . Springer, 571–596.
[12]
Tessa Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Mach. Learn. 53, 1-2 (2003), 111–156.
[13]
A Solar Lezama. 2008. Program synthesis by sketching. Ph.D. Dissertation.
[14]
Parthasarathy Madhusudan. 2011. Synthesizing Reactive Programs. In Computer Science Logic. 428–442.
[15]
Wim Martens and Joachim Niehren. 2005. Minimizing Tree Automata for Unranked Trees. Springer Berlin Heidelberg, 232–246.
[16]
Jonathan May and Kevin Knight. 2008. A Primer on Tree Automata Software for Natural Language Processing. (2008).
[17]
Tom M Mitchell. 1982. Generalization as search. Artificial intelligence 18, 2 (1982), 203–226.
[18]
Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed Program Synthesis (PLDI). ACM, 619–630.
[19]
Michael Pardowitz, Bernhard Glaser, and Rüdiger Dillmann. 2007. Learning Repetitive Robot Programs from Demonstrations Using Version Space Algebra. In Proceedings of the 13th IASTED International Conference on Robotics and Applications (RA) . ACTA Press, 394–399.
[20]
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types (PLDI) . ACM, 522–538.
[21]
Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis (OOPSLA). ACM, 107–126.
[22]
Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic Superoptimization (ASPLOS). 305–316.
[23]
Rishabh Singh and Sumit Gulwani. 2012. Synthesizing number transformations from input-output examples (CAV). Springer, 634–651.
[24]
Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis (PLDI). ACM, 326–340.
[25]
Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodik, Vijay Saraswat, and Sanjit Seshia. 2007. Sketching Stencils (PLDI). ACM, 167–178.
[26]
Armando Solar-Lezama, Rodric Rabbah, Rastislav Bodík, and Kemal Ebcioğlu. 2005. Programming by Sketching for Bit-streaming Programs (PLDI). ACM, 281–294.
[27]
Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs (ASPLOS). ACM, 404–415.
[28]
James W Thatcher and Jesse B Wright. 1968. Generalized finite automata theory with an application to a decision problem of second-order logic. Theory of Computing Systems 2, 1 (1968), 57–81.
[29]
Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. 2013. TRANSIT: specifying protocols with concolic snippets (PLDI). 287–296.
[30]
Xinyu Wang, Sumit Gulwani, and Rishabh Singh. 2016. FIDEX: Filtering Spreadsheet Data using Examples (OOPSLA). ACM, 195–213.
[31]
Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. 2016. Synthesizing Transformations on Hierarchically Structured Data (PLDI). ACM, 508–521.

Cited By

View all
  • (2024)Efficient Bottom-Up Synthesis for Programs with Local VariablesProceedings of the ACM on Programming Languages10.1145/36328948:POPL(1540-1568)Online publication date: 5-Jan-2024
  • (2024)Relational Synthesis of Recursive Programs via Constraint Annotated Tree AutomataComputer Aided Verification10.1007/978-3-031-65633-0_3(41-63)Online publication date: 24-Jul-2024
  • (2023)Saggitarius: A DSL for Specifying Grammatical DomainsProceedings of the ACM on Programming Languages10.1145/36228697:OOPSLA2(2023-2051)Online publication date: 16-Oct-2023
  • Show More Cited By

Index Terms

  1. Synthesis of data completion scripts using finite tree automata

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Programming Languages
    Proceedings of the ACM on Programming Languages  Volume 1, Issue OOPSLA
    October 2017
    1786 pages
    EISSN:2475-1421
    DOI:10.1145/3152284
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 October 2017
    Published in PACMPL Volume 1, Issue OOPSLA

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Data imputation
    2. Finite tree automata
    3. Program synthesis
    4. Programming-by-example

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)108
    • Downloads (Last 6 weeks)16
    Reflects downloads up to 14 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Efficient Bottom-Up Synthesis for Programs with Local VariablesProceedings of the ACM on Programming Languages10.1145/36328948:POPL(1540-1568)Online publication date: 5-Jan-2024
    • (2024)Relational Synthesis of Recursive Programs via Constraint Annotated Tree AutomataComputer Aided Verification10.1007/978-3-031-65633-0_3(41-63)Online publication date: 24-Jul-2024
    • (2023)Saggitarius: A DSL for Specifying Grammatical DomainsProceedings of the ACM on Programming Languages10.1145/36228697:OOPSLA2(2023-2051)Online publication date: 16-Oct-2023
    • (2023)Inductive Program Synthesis Guided by Observational Program SimilarityProceedings of the ACM on Programming Languages10.1145/36228307:OOPSLA2(912-940)Online publication date: 16-Oct-2023
    • (2023)Programming by Example Made EasyACM Transactions on Software Engineering and Methodology10.1145/360718533:1(1-36)Online publication date: 7-Jul-2023
    • (2023)Trace-Guided Inductive Synthesis of Recursive Functional ProgramsProceedings of the ACM on Programming Languages10.1145/35912557:PLDI(860-883)Online publication date: 6-Jun-2023
    • (2023)Better Together: Unifying Datalog and Equality SaturationProceedings of the ACM on Programming Languages10.1145/35912397:PLDI(468-492)Online publication date: 6-Jun-2023
    • (2023)Languages with Decidable Learning: A Meta-theoremProceedings of the ACM on Programming Languages10.1145/35860327:OOPSLA1(143-171)Online publication date: 6-Apr-2023
    • (2022)Searching entangled program spacesProceedings of the ACM on Programming Languages10.1145/35476226:ICFP(23-51)Online publication date: 31-Aug-2022
    • (2022)Improving IDE code inspections with tree automataProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3540250.3559081(1814-1815)Online publication date: 7-Nov-2022
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media