DOI: 10.1145/3025171.3025227
research-article
Open access

Analyza: Exploring Data with Conversation

Published: 07 March 2017

Abstract

We describe Analyza, a system that helps lay users explore data. Analyza has been used within two large real-world systems. The first is a question-and-answer feature in a spreadsheet product. The second provides convenient access to a revenue/inventory database for a large sales force. Both user bases consist of users who do not necessarily have coding skills, demonstrating Analyza's ability to democratize access to data. We discuss the key design decisions in implementing this system: for instance, how to mix structured and natural language modalities, how to use conversation to disambiguate and simplify querying, how to rely on the "semantics" of the data to compensate for the lack of syntactic structure, and how to efficiently curate the data.


    Published In

    IUI '17: Proceedings of the 22nd International Conference on Intelligent User Interfaces
    March 2017
    654 pages
    ISBN: 9781450343480
    DOI: 10.1145/3025171
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. exploratory data analysis
    2. natural language

    Qualifiers

    • Research-article

    Conference

    IUI '17

    Acceptance Rates

    IUI '17 Paper Acceptance Rate 63 of 272 submissions, 23%;
    Overall Acceptance Rate 746 of 2,811 submissions, 27%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 232
    • Downloads (last 6 weeks): 20
    Reflects downloads up to 30 Dec 2024

    Cited By

    • (2024) Grounding with Structure: Exploring Design Variations of Grounded Human-AI Collaboration in a Natural Language Interface. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-27). DOI: 10.1145/3686902. Online publication date: 8-Nov-2024
    • (2024) SlopeSeeker: A Search Tool for Exploring a Dataset of Quantifiable Trends. Proceedings of the 29th International Conference on Intelligent User Interfaces (817-836). DOI: 10.1145/3640543.3645208. Online publication date: 18-Mar-2024
    • (2024) DataDive: Supporting Readers' Contextualization of Statistical Statements with Data Exploration. Proceedings of the 29th International Conference on Intelligent User Interfaces (623-639). DOI: 10.1145/3640543.3645155. Online publication date: 18-Mar-2024
    • (2024) Measures in SQL. Companion of the 2024 International Conference on Management of Data (31-40). DOI: 10.1145/3626246.3653374. Online publication date: 9-Jun-2024
    • (2024) DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-17). DOI: 10.1145/3613904.3642639. Online publication date: 11-May-2024
    • (2024) XNLI: Explaining and Diagnosing NLI-Based Visual Data Analysis. IEEE Transactions on Visualization and Computer Graphics 30:7 (3813-3827). DOI: 10.1109/TVCG.2023.3240003. Online publication date: Jul-2024
    • (2024) AI-Assisted Analytics – An Automated Approach to Data Visualization. Advances in Conceptual Modeling (343-358). DOI: 10.1007/978-3-031-75599-6_24. Online publication date: 29-Oct-2024
    • (2023) Olio: A Semantic Search Interface for Data Repositories. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-16). DOI: 10.1145/3586183.3606806. Online publication date: 29-Oct-2023
    • (2023) Follow the Successful Herd: Towards Explanations for Improved Use and Mental Models of Natural Language Systems. Proceedings of the 28th International Conference on Intelligent User Interfaces (220-239). DOI: 10.1145/3581641.3584088. Online publication date: 27-Mar-2023
    • (2023) COLDECO: An End User Spreadsheet Inspection Tool for AI-Generated Code. 2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (82-91). DOI: 10.1109/VL-HCC57772.2023.00017. Online publication date: 3-Oct-2023
