
research-article

Public Access

Granulation of Large Temporal Databases: An Allan Variance Approach

Authors:

Lorina Sinanaj,

Satya Prasad Maddipatla,

Niket Kathiriya,

Kshitij Jerath

SN Computer Science, Volume 4, Issue 1

https://doi.org/10.1007/s42979-022-01397-2

Published: 15 October 2022

Abstract

As the use of Big Data begins to dominate various scientific and engineering applications, the ability to conduct complex data analyses quickly and efficiently has become increasingly important. The availability of large amounts of data results in ever-growing storage requirements and magnifies issues related to query response times. In this work, we propose a novel methodology for granulation and data reduction of large temporal databases that can address both issues simultaneously. While prior data reduction techniques rely on heuristics or may be computationally intensive, our work borrows the concept of Allan variance (AVAR) from the fields of signal processing and sensor characterization to reduce the size of temporal databases efficiently and systematically. Specifically, we use Allan variance to determine the temporal window length over which data remains relevant. Large temporal databases are then granulated using the AVAR-determined window length. Averaging within each resulting granule produces aggregate information for that granule and yields significant data reduction. Query performance and data quality are evaluated on existing standard datasets, as well as on two large datasets that include temporal information for vehicular and weather data. Our results demonstrate that the AVAR-based data reduction approach is efficient and maintains data quality, while yielding an order-of-magnitude improvement in query execution times compared to three existing clustering-based data reduction methods.
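The pipeline described in the abstract (estimate the Allan variance over candidate window lengths, pick a window, then granulate and average) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the basic non-overlapping Allan variance estimator for regularly sampled data, AVAR(m) = 1/(2(K−1)) Σ (ȳ_{k+1} − ȳ_k)², where ȳ_k are the means of K consecutive, non-overlapping windows of m samples; it tries only power-of-two window lengths; and it assumes the window is taken at the minimum of the Allan variance curve. The helper names `allan_variance`, `choose_window`, and `granulate`, and the `min_windows` cutoff, are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def block_means(values: Sequence[float], m: int) -> List[float]:
    """Means of consecutive, non-overlapping windows of m samples (partial tail dropped)."""
    k = len(values) // m
    return [fmean(values[i * m:(i + 1) * m]) for i in range(k)]


def allan_variance(values: Sequence[float], m: int) -> float:
    """Non-overlapping Allan variance for an averaging window of m samples."""
    means = block_means(values, m)
    if len(means) < 2:
        raise ValueError("need at least two full windows")
    # Squared differences between successive window means; there are K - 1 of them.
    diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sum(diffs) / (2 * len(diffs))


def choose_window(values: Sequence[float], min_windows: int = 8) -> int:
    """Hypothetical selection rule: the power-of-two window with the smallest AVAR.

    Windows that leave fewer than `min_windows` full windows are skipped, because
    the estimate becomes unreliable when it rests on very few window means.
    """
    candidates = []
    m = 1
    while len(values) // m >= min_windows:
        candidates.append(m)
        m *= 2
    if not candidates:
        raise ValueError("series too short for the requested min_windows")
    # min() keeps the first (shortest) window if several tie.
    return min(candidates, key=lambda w: allan_variance(values, w))


def granulate(
    timestamps: Sequence[int], values: Sequence[float], m: int
) -> List[Tuple[int, float]]:
    """Replace each run of m consecutive samples by (first timestamp, mean value).

    A shorter trailing run, if any, becomes its own granule.
    """
    granules = []
    for start in range(0, len(values), m):
        granules.append((timestamps[start], fmean(values[start:start + m])))
    return granules


if __name__ == "__main__":
    # Synthetic series: alternating +/-1 "noise" riding on a slow linear drift.
    ts = list(range(64))
    ys = [(-1) ** i + 0.01 * i for i in ts]
    m = choose_window(ys)
    print("window:", m)               # window: 2
    print(len(granulate(ts, ys, m)))  # 32
```

In the synthetic series, short windows are dominated by the alternating component and long windows by the drift, so the Allan variance minimum falls at an intermediate length (two samples here). Each granule then stores one averaged value in place of its raw samples, which is where the data reduction comes from.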

Information

Published In

SN Computer Science Volume 4, Issue 1

Dec 2022

1427 pages

EISSN: 2661-8907

© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 15 October 2022

Accepted: 20 August 2022

Received: 09 March 2022

Funding Sources

Division of Computer and Network Systems
