DOI: 10.1145/3301275.3302307
research-article

Avoiding drill-down fallacies with VisPilot: assisted exploration of data subsets

Published: 17 March 2019

Abstract

As datasets continue to grow in size and complexity, exploring multi-dimensional datasets remains challenging for analysts. A common operation during this exploration is drill-down: understanding the behavior of data subsets by progressively adding filters. While widely used, drill-downs can lead to inductive fallacies when confounding factors are not given careful attention. Specifically, an analyst may be "deceived" into thinking that a deviation in trend is attributable to a local change, when in fact it is a more general phenomenon; we term this the drill-down fallacy. One way to avoid falling prey to drill-down fallacies is to exhaustively explore all potential drill-down paths, which quickly becomes infeasible on complex datasets with many attributes. We present VisPilot, an accelerated visual data exploration tool that guides analysts through the key insights in a dataset while avoiding drill-down fallacies. Our user study results show that VisPilot helps analysts discover interesting visualizations, understand attribute importance, and predict unseen visualizations better than other multi-dimensional data analysis baselines.
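To make the drill-down fallacy concrete, here is a minimal sketch on hypothetical toy data. It is not VisPilot's algorithm: the records, the column names, and the `yes_share` helper are all assumptions introduced for illustration. The subset with two filters looks unusual against the whole dataset, but it is identical to its one-filter parent, so the second filter explains none of the deviation.

```python
from collections import Counter

# Hypothetical toy records (gender, region, vote); not data from the paper.
COLUMNS = ("gender", "region", "vote")
ROWS = (
    [("F", "North", "yes")] * 40 + [("F", "North", "no")] * 10
    + [("F", "South", "yes")] * 40 + [("F", "South", "no")] * 10
    + [("M", "North", "yes")] * 10 + [("M", "North", "no")] * 40
    + [("M", "South", "yes")] * 10 + [("M", "South", "no")] * 40
)


def yes_share(rows, **filters):
    """Return the fraction of 'yes' votes among rows matching every filter."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    matched = [
        row for row in rows
        if all(row[index[col]] == value for col, value in filters.items())
    ]
    if not matched:
        raise ValueError("no rows match the given filters")
    counts = Counter(row[index["vote"]] for row in matched)
    return counts["yes"] / len(matched)


overall = yes_share(ROWS)                            # no filters
parent = yes_share(ROWS, gender="F")                 # one filter
child = yes_share(ROWS, gender="F", region="North")  # two filters

# Compared with the whole dataset, the child subset looks like a deviation.
print(f"child vs overall: {child - overall:+.2f}")
# Compared with its parent, the deviation vanishes: adding region="North"
# changed nothing, so the trend is a property of gender="F" in general.
print(f"child vs parent:  {child - parent:+.2f}")
```

An analyst who compares only `child` with `overall` might attribute the gap to the region filter. Checking the subset against each of its parents first is one simple guard, though doing so for every drill-down path by hand is exactly what becomes infeasible as the number of attributes grows.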

Supplementary Material

MP4 File (p186-lee.mp4)

Published In

IUI '19: Proceedings of the 24th International Conference on Intelligent User Interfaces
March 2019
713 pages
ISBN:9781450362726
DOI:10.1145/3301275
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. drill-down data analysis
  2. exploratory data analysis
  3. visualization recommendation

Qualifiers

  • Research-article

Funding Sources

  • Microsoft
  • Toyota Research Institute
  • Siebel Energy Institute
  • 3M
  • Adobe
  • Google

Conference

IUI '19

Acceptance Rates

IUI '19 Paper Acceptance Rate 71 of 282 submissions, 25%;
Overall Acceptance Rate 746 of 2,811 submissions, 27%

Cited By

  • (2024) Inferring Visualization Intent from Conversation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1184-1194. DOI: 10.1145/3627673.3679589. Online publication date: 21-Oct-2024
  • (2024) Socrates: Data Story Generation via Adaptive Machine-Guided Elicitation of User Feedback. IEEE Transactions on Visualization and Computer Graphics 30:1, 131-141. DOI: 10.1109/TVCG.2023.3327363. Online publication date: 1-Jan-2024
  • (2024) InkSight: Leveraging Sketch Interaction for Documenting Chart Findings in Computational Notebooks. IEEE Transactions on Visualization and Computer Graphics 30:1, 944-954. DOI: 10.1109/TVCG.2023.3327170. Online publication date: 1-Jan-2024
  • (2024) Calliope-Net: Automatic Generation of Graph Data Facts via Annotated Node-Link Diagrams. IEEE Transactions on Visualization and Computer Graphics 30:1, 562-572. DOI: 10.1109/TVCG.2023.3326925. Online publication date: 1-Jan-2024
  • (2024) VisCollage: Annotative Collages for Organizing Data Event Charts. 2024 IEEE 17th Pacific Visualization Conference (PacificVis), 262-271. DOI: 10.1109/PacificVis60374.2024.00036. Online publication date: 23-Apr-2024
  • (2024) Qutaber: task-based exploratory data analysis with enriched context awareness. Journal of Visualization 27:3, 503-520. DOI: 10.1007/s12650-024-00975-1. Online publication date: 11-Mar-2024
  • (2023) Tendency on the Application of Drill-Down Analysis in Scientific Studies: A Systematic Review. Technologies 11:4, (112). DOI: 10.3390/technologies11040112. Online publication date: 13-Aug-2023
  • (2023) NetworkNarratives: Data Tours for Visual Network Exploration and Analysis. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3544548.3581452. Online publication date: 19-Apr-2023
  • (2023) Open Questions About the Visualization of Sociodemographic Data. 2023 IEEE Workshop on Visualization for Social Good (VIS4Good), 16-20. DOI: 10.1109/VIS4Good60218.2023.00010. Online publication date: 22-Oct-2023
  • (2023) A Unified Comparison of User Modeling Techniques for Predicting Data Interaction and Detecting Exploration Bias. IEEE Transactions on Visualization and Computer Graphics 29:1, 483-492. DOI: 10.1109/TVCG.2022.3209476. Online publication date: Jan-2023
