Abstract
Air quality analysis helps analysts understand the state of atmospheric pollution and its changing trends, providing robust data and theoretical support for developing and implementing environmental policies. Air quality data are typically represented as multivariate time series, which are challenging to analyze because of their large volume, high dimensionality, and lack of labeled information. Analysts often struggle to discover internal relationships and patterns within the data, and existing data mining and exploration methods still leave significant room for improvement, particularly in reducing perceptual burden and improving efficiency. To assist analysts in atmospheric pollution analysis, we propose an air quality visualization scheme based on feature extraction from multivariate time series data. We combine the automated data modeling capability of deep learning with intuitive data visualization to help analysts explore and analyze complex air quality datasets. To extract features of air quality data effectively, we recast multivariate time series feature extraction as a self-supervised deep learning task and propose a feature extraction method for multivariate time series called CTDCN. Finally, we design and implement a visualization and analysis system for air quality multivariate time series. The system offers rich visualization views, allows users to change data modeling parameters, and supports interactive analysis and insight extraction across multiple views. It helps analysts discover potential information and patterns in air quality data, providing a foundation for informed decision-making. Extensive experiments on UEA public datasets confirm CTDCN’s superior feature extraction capabilities, while case studies and user studies validate the effectiveness and practicality of our visualization approach.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 62272071 and U1836114.
Appendix A: Results of the classification experiment
See Table 3.
About this article
Cite this article
Luo, X., Jiang, R., Yang, B. et al. Air quality visualization analysis based on multivariate time series data feature extraction. J Vis 27, 567–584 (2024). https://doi.org/10.1007/s12650-024-00981-3
DOI: https://doi.org/10.1007/s12650-024-00981-3