DOI: 10.1145/3397481.3450680
research-article

Data-centric disambiguation for data transformation with programming-by-example

Published: 14 April 2021

Abstract

Programming-by-example (PBE) can be a powerful tool for reducing manual work in repetitive data transformation tasks. However, a small number of examples often leaves ambiguity, which may cause the system to transform the data in undesirable ways. This ambiguity can be resolved by allowing the user to edit the synthesized programs directly; however, this is difficult for non-programmers. Here, we present a novel approach: data-centric disambiguation for data transformation, where users resolve the ambiguity by examining and modifying the output rather than the program. The key idea is to focus on the given set of data the user wants to transform instead of pursuing the synthesized program’s generality or completeness. Our system provides visualization and interaction methods that allow users to efficiently examine and fix the transformed outputs, which is much simpler than understanding and modifying the program itself. The user study suggests that our system can help non-programmers process data more easily and efficiently.
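
To make the ambiguity concrete, the following is a minimal Python sketch and not the system described in the paper. It assumes a tiny, hand-written pool of candidate programs (the names `first_word`, `first_five_chars`, and `first_three_chars` are hypothetical), keeps those consistent with a single user example, and then flags the rows of the user's own data where the surviving candidates disagree. Those rows are the outputs a user would need to examine and fix.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[str], str]

# Hypothetical candidate programs. A real PBE engine would synthesize
# these from a DSL; they are hard-coded here purely for illustration.
CANDIDATES: Dict[str, Program] = {
    "first_word": lambda s: s.split()[0],
    "first_five_chars": lambda s: s[:5],
    "first_three_chars": lambda s: s[:3],
}


def consistent_programs(examples: List[Tuple[str, str]]) -> Dict[str, Program]:
    """Keep only the candidates that reproduce every input/output example."""
    return {
        name: prog
        for name, prog in CANDIDATES.items()
        if all(prog(inp) == out for inp, out in examples)
    }


def ambiguous_rows(rows: List[str], programs: Dict[str, Program]) -> Dict[str, List[str]]:
    """Map each row on which the programs disagree to its distinct outputs."""
    flagged: Dict[str, List[str]] = {}
    for row in rows:
        outputs = {prog(row) for prog in programs.values()}
        if len(outputs) > 1:
            flagged[row] = sorted(outputs)
    return flagged


# One example is given: "Alice Johnson" should become "Alice".
examples = [("Alice Johnson", "Alice")]
rows = ["Alice Johnson", "Bob Lee", "Charlotte Webb"]

programs = consistent_programs(examples)
flagged = ambiguous_rows(rows, programs)

for row, outputs in flagged.items():
    print(f"{row!r}: candidate outputs {outputs}")
```

Two candidates survive the single example, yet they produce different outputs for "Bob Lee" and "Charlotte Webb". Choosing the intended output on those rows resolves the ambiguity for this dataset without the user ever reading a program.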

Cited By

  • (2023) Improving Oracle-Guided Inductive Synthesis by Efficient Question Selection. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (819-847). DOI: 10.1145/3586055. Online publication date: 6-Apr-2023
  • (2023) On the Design of AI-powered Code Assistants for Notebooks. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-16). DOI: 10.1145/3544548.3580940. Online publication date: 19-Apr-2023

Published In

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces
April 2021
618 pages
ISBN: 9781450380171
DOI: 10.1145/3397481

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Programming-by-example
2. User interface
3. Visualization

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• JST CREST

Conference

IUI '21

Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%

Upcoming Conference

IUI '25

Article Metrics

• Downloads (last 12 months): 8
• Downloads (last 6 weeks): 0

Reflects downloads up to 04 Jan 2025
