
Living on the Edge: Data Transmission, Storage, and Analytics in Continuous Sensing Environments

Published: 08 July 2021

Abstract

Voluminous time-series data streams produced in continuous sensing environments impose challenges pertaining to ingestion, storage, and analytics. In this study, we present a holistic approach based on data sketching to address these issues. We propose a hyper-sketching algorithm that combines discretization and frequency-based sketching to produce compact representations of the multi-feature, time-series data streams. We generate an ensemble of data sketches to make effective use of capabilities at the resource-constrained edge devices, the links over which data are transmitted, and the server pool where this data must be stored. The data sketches can be queried to construct datasets that are amenable to processing using popular analytical engines. We include several performance benchmarks using real-world data from different domains to profile the suitability of our design decisions. The proposed methodology can achieve up to ∼13× and ∼2,207× reduction in data transfer and energy consumption at edge devices. We observe up to a ∼50% improvement in analytical job completion times in addition to the significant improvements in disk and network I/O.
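
The combination of discretization and frequency counting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' hyper-sketching algorithm: the feature names, bin boundaries, and function names below are assumptions made purely for illustration, and an exact Counter stands in for a bounded-memory frequency sketch.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin boundaries per feature (assumed for illustration only).
BIN_EDGES = {
    "temperature": [0.0, 10.0, 20.0, 30.0],
    "humidity": [25.0, 50.0, 75.0],
}


def discretize(observation):
    """Map each feature value to the index of the bin it falls into.

    Features are visited in sorted name order so that every observation
    yields a tuple with the same layout.
    """
    return tuple(
        bisect_right(BIN_EDGES[name], observation[name])
        for name in sorted(BIN_EDGES)
    )


def sketch_segment(observations):
    """Summarize one time segment as counts of discretized feature tuples.

    An exact Counter is used here; a real deployment on a constrained
    device would likely swap in an approximate frequency structure.
    """
    return Counter(discretize(obs) for obs in observations)


# Four readings from one (made-up) time segment.
segment = [
    {"temperature": 21.3, "humidity": 40.1},
    {"temperature": 21.9, "humidity": 41.7},
    {"temperature": 22.4, "humidity": 43.0},
    {"temperature": 30.5, "humidity": 80.2},
]
sketch = sketch_segment(segment)
```

Nearby readings collapse into the same bin tuple, so the segment is represented by a small set of (tuple, count) pairs rather than by every raw observation. The saving depends on how coarse the bins are, which is a trade-off against reconstruction accuracy.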

      Published In

      ACM Transactions on Internet of Things, Volume 2, Issue 3
      August 2021
      197 pages
      EISSN: 2577-6207
      DOI: 10.1145/3474396
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 July 2021
      Accepted: 01 February 2021
      Revised: 01 August 2020
      Received: 01 April 2019
      Published in TIOT Volume 2, Issue 3

      Author Tags

      1. Data sketches
      2. Internet-of-Things
      3. edge computing
      4. streaming systems
      5. temporal data

      Qualifiers

      • Research-article
      • Research
      • Refereed
