Granulation of Large Temporal Databases: An Allan Variance Approach

Published: 15 October 2022

Abstract

As Big Data comes to dominate scientific and engineering applications, the ability to conduct complex data analyses quickly and efficiently has become increasingly important. Large volumes of data drive ever-growing storage requirements and magnify problems with query response times. In this work, we propose a novel methodology for granulation and data reduction of large temporal databases that addresses both issues simultaneously. Whereas prior data reduction techniques rely on heuristics or can be computationally intensive, our work borrows the concept of Allan variance (AVAR) from signal processing and sensor characterization to reduce the size of temporal databases efficiently and systematically. Specifically, we use AVAR to determine the temporal window length over which data remains relevant. Large temporal databases are then granulated using the AVAR-determined window length, and averaging over each resulting granule produces aggregate information for that granule, yielding significant data reduction. Query performance and data quality are evaluated on existing standard datasets as well as on two large datasets containing temporal information for vehicular and weather data. Our results demonstrate that the AVAR-based data reduction approach is efficient and maintains data quality, while improving query execution times by an order of magnitude compared to three existing clustering-based data reduction methods.
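
The pipeline described in the abstract (estimate Allan variance, pick a window length, granulate, average) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes regularly sampled scalar data, uses the simple non-overlapping Allan variance estimator, and assumes the window is chosen as the candidate with the smallest AVAR. The function names `block_means`, `allan_variance`, `choose_window`, and `granulate` are hypothetical.

```python
from statistics import fmean


def block_means(values, m):
    """Average consecutive, non-overlapping blocks of m samples.

    A trailing partial block (fewer than m samples) is dropped.
    """
    k = len(values) // m
    return [fmean(values[i * m:(i + 1) * m]) for i in range(k)]


def allan_variance(values, m):
    """Non-overlapping Allan variance for a window of m samples.

    AVAR(m) = 1 / (2 (K - 1)) * sum_k (mean_{k+1} - mean_k)^2,
    where K is the number of full windows.
    """
    means = block_means(values, m)
    if len(means) < 2:
        raise ValueError("need at least two full windows")
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sum(sq_diffs) / (2 * len(sq_diffs))


def choose_window(values, candidates):
    """Assumed selection rule: the candidate window with the smallest AVAR.

    Ties are resolved in favor of the earliest candidate in the list.
    """
    return min(candidates, key=lambda m: allan_variance(values, m))


def granulate(values, m):
    """Replace each full window of m samples with its mean (one granule)."""
    return block_means(values, m)
```

For example, `granulate([1, 2, 3, 4, 5, 6], 2)` returns `[1.5, 3.5, 5.5]`, reducing six samples to three granules. In a database setting the same averaging would typically be expressed as a grouped aggregate over time buckets of the chosen length; that mapping is an assumption here, not something the abstract specifies.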

Published In

SN Computer Science, Volume 4, Issue 1
Dec 2022
1427 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 15 October 2022
Accepted: 20 August 2022
Received: 09 March 2022

Author Tags

  1. Big data
  2. Data reduction
  3. Temporal granulation
  4. Allan variance

Qualifiers

  • Research-article
