Research article
DOI: 10.1145/3313831.3376729

What's Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities

Published: 23 April 2020

Abstract

Computational notebooks, such as Azure, Databricks, and Jupyter, are a popular, interactive paradigm for data scientists to author code, analyze data, and interleave visualizations, all within a single document. Nevertheless, as data scientists incorporate more of their activities into notebooks, they encounter unexpected difficulties, or pain points, that impact their productivity and disrupt their workflow. Through a systematic, mixed-methods study using semi-structured interviews (n=20) and a survey (n=156) with data scientists, we catalog nine pain points when working with notebooks. Our findings suggest that data scientists face numerous pain points throughout the entire workflow, from setting up notebooks to deploying to production, across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.



Published In

CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems
April 2020
10688 pages
ISBN: 9781450367080
DOI: 10.1145/3313831
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Honorable Mention

Author Tags

  1. challenges
  2. computational notebooks
  3. data science
  4. interviews
  5. pain points
  6. survey

Qualifiers

  • Research-article

Conference

CHI '20

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
