DOI: 10.1145/3332165.3347940

Tea: A High-level Language and Runtime System for Automating Statistical Analysis

Published: 17 October 2019

Abstract

Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.
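
To make the test-selection step concrete, here is a minimal Python sketch of narrowing a set of candidate tests by checking each test's preconditions against what is known about the data. It is not Tea's API or implementation: the property names, the `CandidateTest` class, and the `valid_tests` function are hypothetical, the preconditions are simplified, and a plain subset check stands in for the constraint solving the abstract describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """A statistical test plus the properties it needs (hypothetical names)."""
    name: str
    requires: frozenset


# Illustrative, simplified preconditions for three two-group comparisons.
CANDIDATES = [
    CandidateTest("Student's t-test", frozenset(
        {"two_groups", "independent_samples", "continuous_outcome",
         "normal", "equal_variance"})),
    CandidateTest("Welch's t-test", frozenset(
        {"two_groups", "independent_samples", "continuous_outcome",
         "normal"})),
    CandidateTest("Mann-Whitney U", frozenset(
        {"two_groups", "independent_samples", "ordinal_outcome"})),
]


def valid_tests(properties):
    """Return the names of candidate tests whose preconditions all hold."""
    known = set(properties)
    # A continuous outcome can also be ranked, so treat it as ordinal too.
    if "continuous_outcome" in known:
        known.add("ordinal_outcome")
    return [t.name for t in CANDIDATES if t.requires <= known]


# Properties taken from a (hypothetical) study design.
design = {"two_groups", "independent_samples", "continuous_outcome"}

# Parametric assumptions hold: parametric and non-parametric tests are valid.
print(valid_tests(design | {"normal", "equal_variance"}))

# Normality is not met: only the non-parametric test remains.
print(valid_tests(design))
```

In this sketch, dropping the `normal` property removes both t-tests from the result, which mirrors the fallback to non-parametric tests described above. A real system would establish properties such as normality from the data rather than take them as given.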

    Published In

    UIST '19: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology
    October 2019
    1229 pages
    ISBN: 9781450368162
    DOI: 10.1145/3332165
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated statistical analysis
    2. constraint-based system
    3. data science
    4. declarative programming language
    5. pre-registration
    6. reproducibility
    7. statistical analysis

    Qualifiers

    • Research-article

    Conference

    UIST '19

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%
