Abstract
Automatic visualization generates meaningful visualizations to support data analysis and pattern finding for novice or casual users who are unfamiliar with visualization design. Current automatic visualization approaches rely mainly on aggregation and filtering to extract patterns from the original data. However, these limited data transformations fail to capture complex patterns such as clusters and correlations. Although recent advances in feature engineering open the way to more kinds of automatic data transformations, the auto-generated transformations lack explainability: it is unclear how the resulting patterns relate to the original features. To tackle these challenges, we propose a novel explainable recommendation approach that covers an extended range of data transformations in automatic visualization. We summarize the space of feasible data transformations through a literature review, and we derive measures of the explainability of transformation operations from a pilot study. We design a recommendation algorithm that computes optimal transformations, which reveal specified types of patterns while maintaining explainability. We demonstrate the effectiveness of our approach through two cases and a user study.
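Abstract (translated from the Chinese)
Automatic visualization techniques generate meaningful visualizations for users who are unfamiliar with visualization design, supporting their data analysis and pattern discovery needs. Currently, mainstream automatic visualization methods use aggregation and filtering to extract pattern information from the original data. However, these limited data transformations cannot capture complex patterns such as clusters and correlations. Although recent advances in feature engineering make a broader range of automatic data transformations possible, their results lack explainability, so the patterns found after transformation cannot be linked to the features of the original data. To address these challenges, we propose a novel explainable recommendation method for a broad range of data transformation types in automatic visualization. We summarize the space of feasible data transformations by reviewing the literature, and we summarize measures of transformation explainability through a pilot study. Our recommendation algorithm computes optimal data transformations that reveal pattern information in the data while maintaining explainability. Use cases in real-world scenarios and a user study validate the effectiveness of our method.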
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
Ziliang WU designed the research and drafted the paper. Wei CHEN, Yuxin MA, and Jiazhi XIA helped organize the paper. Ziliang WU, Tong XU, and Lei LV implemented the system. Fan YAN and Zhonghao QIAN collected the data. Wei CHEN revised and finalized the paper.
Ethics declarations
Ziliang WU, Wei CHEN, Yuxin MA, Tong XU, Fan YAN, Lei LV, Zhonghao QIAN, and Jiazhi XIA declare that they have no conflict of interest.
Additional information
Project supported by the National Natural Science Foundation of China (No. 62132017) and the Fundamental Research Funds for the Central Universities, China (No. 226202200235).
About this article
Cite this article
Wu, Z., Chen, W., Ma, Y. et al. Explainable data transformation recommendation for automatic visualization. Front Inform Technol Electron Eng 24, 1007–1027 (2023). https://doi.org/10.1631/FITEE.2200409
DOI: https://doi.org/10.1631/FITEE.2200409