DOI: 10.1145/3318464.3380574
research-article
Public Access

IDEBench: A Benchmark for Interactive Data Exploration

Published: 31 May 2020

Abstract

In recent years, many query processing techniques have been developed to better support interactive data exploration (IDE) of large structured datasets. To evaluate and compare how well database engines support such workloads, experimenters have mostly used self-designed evaluation procedures rather than established benchmarks. In this paper, we argue that this is because the workloads and metrics of popular analytical benchmarks such as TPC-H and TPC-DS were designed for traditional performance reporting scenarios and do not capture distinctive IDE characteristics. Guided by the findings of several user studies, we present a new benchmark called IDEBench, designed to evaluate database engines based on common IDE workflows and metrics that matter to the end user. We demonstrate the applicability of IDEBench through a number of experiments with five different database engines, and we present and discuss our findings.

Supplementary Material

MP4 File (3318464.3380574.mp4)
Presentation Video

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database benchmark
  2. interactive data exploration
  3. visual analytics

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 248
  • Downloads (last 6 weeks): 30

Reflects downloads up to 27 Jan 2025.

Cited By

  • (2024) PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression. Proceedings of the VLDB Endowment 17:6 (1432-1445). DOI: 10.14778/3648160.3648181. Online publication date: 1-Feb-2024
  • (2024) Optimizing Dataflow Systems for Scalable Interactive Visualization. Proceedings of the ACM on Management of Data 2:1 (1-25). DOI: 10.1145/3639276. Online publication date: 26-Mar-2024
  • (2024) Demonstration of ElasticNotebook: Migrating Live Computational Notebook States. Companion of the 2024 International Conference on Management of Data (540-543). DOI: 10.1145/3626246.3654752. Online publication date: 9-Jun-2024
  • (2024) Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration. IEEE Transactions on Knowledge and Data Engineering 36:11 (6532-6546). DOI: 10.1109/TKDE.2023.3341451. Online publication date: Nov-2024
  • (2024) QHB+: Accelerated Configuration Optimization for Automated Performance Tuning of Spark SQL Applications. IEEE Access 12 (60138-60148). DOI: 10.1109/ACCESS.2024.3391333. Online publication date: 2024
  • (2023) ElasticNotebook: Enabling Live Migration for Computational Notebooks. Proceedings of the VLDB Endowment 17:2 (119-133). DOI: 10.14778/3626292.3626296. Online publication date: 1-Oct-2023
  • (2023) Lightweight Materialization for Fast Dashboards Over Joins. Proceedings of the ACM on Management of Data 1:4 (1-27). DOI: 10.1145/3626735. Online publication date: 12-Dec-2023
  • (2023) LAQy: Efficient and Reusable Query Approximations via Lazy Sampling. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589319. Online publication date: 20-Jun-2023
  • (2023) BlinkViz: Fast and Scalable Approximate Visualization on Very Large Datasets using Neural-Enhanced Mixed Sum-Product Networks. Proceedings of the ACM Web Conference 2023 (1734-1742). DOI: 10.1145/3543507.3583411. Online publication date: 30-Apr-2023
  • (2023) S/C: Speeding up Data Materialization with Bounded Memory. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1981-1994). DOI: 10.1109/ICDE55515.2023.00393. Online publication date: Apr-2023
