
AccuStripes: Visual exploration and comparison of univariate data distributions using color and binning

Published: 18 July 2024 Publication History

Abstract

Understanding and analyzing univariate distributions of data in terms of their shapes as well as their specific characteristics, such as gaps, spikes, or outliers, is crucial in many scientific disciplines. In this paper, we propose a design space composed of the visual channels position and color for representing accumulated distributions. The designs combine color-coded stripes with density lines. The width and coloring of the stripes are based on the applied binning technique. In a crowd-sourced experiment, we explore a subspace, called the AccuStripes (i.e., “accumulated stripes”) design space, consisting of nine representations. These AccuStripes designs integrate three composition strategies (color only, overlay, filled curve) with three binning techniques: one uniform method (UB) and two adaptive methods, namely Bayesian Blocks (BB) and Jenks’ Natural Breaks (NB). We evaluate the accuracy, efficiency, and confidence ratings of the nine AccuStripes designs for structural estimation and comparison tasks. Across all study tasks, the overlay composition was found to be the most accurate and the one preferred by observers. Furthermore, the results demonstrate that while no binning method performed best in both identification and comparison, detection of structures was most accurate with adaptive binning. For validation, we compared the best AccuStripes design, i.e., the overlay composition, to line charts. Our results show that the AccuStripes design outperformed the line charts in accuracy for all study tasks.
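To make the difference between uniform and adaptive binning concrete, the following is a minimal sketch, not the authors' implementation. It contrasts equal-width bin edges with natural-breaks edges computed by a Fisher-style dynamic program that minimizes the within-class sum of squared deviations. The function names `uniform_edges` and `natural_breaks_edges` and the sample data are assumptions made for illustration. Bayesian Blocks is omitted here.

```python
from typing import List, Sequence


def uniform_edges(values: Sequence[float], k: int) -> List[float]:
    """Equal-width bin edges over the data range (k bins, k + 1 edges)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k)] + [hi]


def natural_breaks_edges(values: Sequence[float], k: int) -> List[float]:
    """Natural-breaks edges via dynamic programming (illustrative sketch).

    Splits the sorted values into k contiguous classes so that the total
    within-class sum of squared deviations (SSD) is minimal.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the SSD of any slice xs[i:j] in constant time.
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total SSD when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i

    # Walk back through the stored cut positions to recover class starts.
    starts = []
    j = n
    for c in range(k, 0, -1):
        j = cut[c][j]
        starts.append(j)
    starts.reverse()  # starts[0] == 0

    return [xs[s] for s in starts] + [xs[-1]]


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 50, 51, 52]  # three clusters with gaps
    print("uniform:", uniform_edges(data, 3))
    print("natural breaks:", natural_breaks_edges(data, 3))
```

On this clustered sample, the equal-width edges fall at 1, 18, 35, and 52, so the first bin merges two clusters and the middle bin is empty. The natural-breaks edges fall at 1, 10, 50, and 52, so each bin starts where a cluster starts. The dynamic program is cubic in the worst case, which is acceptable for a sketch but not for large datasets.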


Highlights

Novel design space to represent and compare univariate data distributions.
A subspace is introduced combining color-coded stripes with density lines.
The stripes’ width and coloring are defined by adaptive and uniform binning methods.
Evaluation and validation of the subspace through two crowd-sourced studies.
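As a rough illustration of how bin edges could drive stripe width and coloring, the sketch below turns a list of edges into one stripe per bin. It is an assumption-laden example, not the paper's encoding: the `stripes` function, its dictionary keys, and the choice to map color intensity to normalized density (count divided by relative width) are all hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def stripes(values: Sequence[float], edges: Sequence[float]) -> List[Dict[str, float]]:
    """Derive one stripe per bin from sorted bin edges (hypothetical encoding).

    width     -> share of the full data range covered by the bin
    intensity -> bin density scaled to [0, 1]; assumed here to drive the color
    """
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        # Locate the bin; clamp so the right-most edge is inclusive and
        # out-of-range values fall into the first or last bin.
        idx = max(0, min(bisect_right(edges, v) - 1, k - 1))
        counts[idx] += 1

    span = edges[-1] - edges[0]
    widths = [(edges[i + 1] - edges[i]) / span for i in range(k)]
    density = [c / w if w > 0 else 0.0 for c, w in zip(counts, widths)]
    peak = max(density) or 1.0  # avoid division by zero for empty input

    return [
        {
            "start": edges[i],
            "end": edges[i + 1],
            "width": widths[i],
            "count": counts[i],
            "intensity": density[i] / peak,
        }
        for i in range(k)
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 50, 51, 52]
    for s in stripes(data, [1, 10, 50, 52]):
        print(s)
```

With adaptive edges, narrow stripes mark dense regions and wide stripes mark sparse ones, so width and color carry related information about density. With uniform edges, all widths are equal and only the color varies.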

Published In

Computers and Graphics, Volume 119, Issue C
Apr 2024
407 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Visual analysis
  2. Univariate data distributions
  3. Adaptive binning
  4. Crowd-sourced experiment

Qualifiers

  • Research-article
