[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3097983.3098092acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Anarchists, Unite: Practical Entropy Approximation for Distributed Streams

Published: 04 August 2017 Publication History

Abstract

Entropy is a fundamental property of data and a key metric in many scientific and engineering fields. Entropy estimation has been extensively studied, but almost always under the assumption that there is a single data stream, seen in its entirety by one node running the estimation algorithm. Multiple distributed data sources are becoming increasingly common, however, with applications in signal processing, computer science, medicine, physics, and more. Centralizing all data can be infeasible, for example in networks of battery or bandwidth limited sensors, so entropy estimation in distributed streams requires new, communication-efficient approaches.
We propose a practical communication-efficient algorithm for continuously approximating the entropy of distributed streams, with deterministic, user-defined error bounds. Unlike previous streaming methods, it supports deletions and variable-sized time-based sliding windows, while still avoiding communication when possible. Moreover, it optionally incorporates a state-of-the-art entropy sketch, allowing for both bandwidth reduction and monitoring very high dimensional problems. Finally, it provides the approximation to all nodes, rather than to a centralized location, which is important in settings such as wireless sensor networks.
Evaluation on several public datasets from real application domains shows that our adaptive algorithm can often reduce the number of messages by two orders of magnitude, compared to centralizing all data in one node.

References

[1]
2016. Technical Assistance Document for the Reporting of Daily Air Quality -- the Air Quality Index (AQI). (May 2016). https://www.airnow.gov/index.cfm?action=pubs.index
[2]
R. E. Anderson. 2004. Entropy of EEG during anaesthetic induction: a comparative study with propofol or nitrous oxide as sole agent. British Journal of Anaesthesia 92, 2 (2004), 167--170.
[3]
C. Arackaparambil, S. Bratus, J. Brody, and A. Shubina. 2010. Distributed Moni- toring of Conditional Entropy for Anomaly Detection in Streams. In 2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and PhD Forum (IPDPSW).
[4]
Chrisil Arackaparambil, Joshua Brody, and Amit Chakrabarti. 2009. Functional Monitoring Without Monotonicity. In Proc. ICALP '09.
[5]
Martin Arlitt and Tai Jin. 1998. 1998 World Cup Web Site Access Logs. (1998). http://ita.ee.lbl.gov/html/contrib/WorldCup.html
[6]
Christoph Bandt and Bernd Pompe. 2002. Permutation Entropy: A Natural Complexity Measure for Time Series. Phys. Rev. Lett. 88, 17 (2002), 174102.
[7]
Przemyslaw Berezinski, Bartosz Jasiul, and Marcin Szpyrka. 2015. An Entropy- Based Network Anomaly Detection Method. Entropy 17, 4 (2015), 2367--2408.
[8]
Lakshminath Bhuvanagiri and Sumit Ganguly. 2006. Estimating Entropy over Data Streams. In Proceedings of the 14th Conference on Annual European Symposium - Volume 14 (ESA'06).
[9]
J. Bruhn. 2006. Depth of anaesthesia monitoring: what's available, what's validated and what's next? British Journal of Anaesthesia 97, 1 (2006), 85--94.
[10]
G.C. Calafiore and L. El Ghaoui. 2014. Optimization Models. Cambridge University Press.
[11]
Amit Chakrabarti, Khanh Do Ba, and S. Muthukrishnan. 2006. Estimating Entropy and Entropy Norm on Data Streams. In Proc. STACS '06.
[12]
Wan-Li Cheng, Yu-Chih Kuo, Pay-Liam Lin, Ken-Hui Chang, Yu-Song Chen, Tso-Mei Lin, and Ruth Huang. 2004. Revised air quality index derived from an entropy function. Atmospheric Environment 38, 3 (2004), 383--391.
[13]
Peter Clifford and Ioana Cosma. 2013. A simple sketching algorithm for entropy estimation over streaming data. In Proc. AISTATS '13, Vol. 31.
[14]
Graham Cormode. 2013. The Continuous Distributed Monitoring Model. SIGMOD Rec. 42, 1 (2013), 5--14.
[15]
Graham Cormode, S. Muthukrishnan, Ke Yi, and Qin Zhang. 2012. Continuous Sampling from Distributed Streams. J. ACM 59, 2, Article 10 (2012), 10:1--10:25 pages.
[16]
Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-Embedded Modeling Language for Convex Optimization. JMLR 17, 83 (2016), 1--5.
[17]
Richard Klaus Ellerkmann, Vidal-Markus Liermann, Thorsten Michael Alves, Ingobert Wenningmann, Sascha Kreuer, Wolfram Wilhelm, Heiko Roepcke, Andreas Hoeft, and Jörgen Bruhn. 2004. Spectral Entropy and Bispectral Index as Measures of the Electroencephalographic Effects of Sevoflurane. Anesthesiology 101, 6 (2004), 1275--1282.
[18]
M. Ermes, J. Parkka, J. Mantyjarvi, and I. Korhonen. 2008. Detection of Daily Activities and Sports With Wearable Sensors in Controlled and Uncontrolled Conditions. IEEE Transactions on Information Technology in Biomedicine 12, 1 (2008), 20--26.
[19]
Arik Friedman, Izchak Sharfman, Daniel Keren, and Assaf Schuster. 2014. Privacy- Preserving Distributed Stream Monitoring. In 21st Annual Network and Distributed System Security Symposium (NDSS '14).
[20]
Moshe Gabel, Daniel Keren, and Assaf Schuster. 2014. Communication-efficient Distributed Variance Monitoring and Outlier Detection for Multivariate Time Series. In Proc. IPDPS '14.
[21]
Moshe Gabel, Daniel Keren, and Assaf Schuster. 2015. Monitoring Least Squares Models of Distributed Streams. In Proc. KDD '15.
[22]
S. García, M. Grill, J. Stiborek, and A. Zunino. 2014. An Empirical Comparison of Botnet Detection Methods. Computers & Security 45 (2014), 100--123.
[23]
Minos Garofalakis, Daniel Keren, and Vasilis Samoladas. 2013. Sketch-based Geometric Monitoring of Distributed Stream Queries. Proc. VLDB Endow. 6, 10 (2013), 937--948.
[24]
Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Izchak Sharfman, and Assaf Schuster. 2012. Prediction-based Geometric Monitoring over Distributed Data Streams. In Proc. SIGMOD '12 .
[25]
Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Izchak Sharfman, and Assaf Schuster. 2014. Distributed Geometric Query Monitoring Using Prediction Models. TODS 39, 2 (2014), 16:1--16:42.
[26]
Nikos Giatrakos, Yannis Kotidis, Antonios Deligiannakis, Vasilis Vassalos, and Yannis Theodoridis. 2013. In-network Approximate Computation of Outliers with Quality Guarantees. Inf. Syst. 38, 8 (2013), 1285--1308.
[27]
Nicholas J. A. Harvey, Jelani Nelson, and Krzysztof Onak. 2008. Sketching and Streaming Entropy via Approximation Theory. In Proc. FOCS '08.
[28]
Siu-Wai Ho and Sergio Verdú. 2015. Convexity/Concavity of Renyi Entropy and α-mutual Information. In 2015 IEEE International Symposium on Information Theory (ISIT).
[29]
Piotr Indyk. 2006. Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation. J. ACM 53, 3 (2006), 307--323.
[30]
Michael Kamp, Mario Boley, Daniel Keren, Assaf Schuster, and Izchak Sharfman. 2014. Communication-Efficient Distributed Online Prediction by Dynamic Model Synchronization. In Proc. ECML PKDD '14.
[31]
N. Kannathal, Min Lim Choo, U. Rajendra Acharya, and P.K. Sadasivan. 2005. Entropies for detection of epilepsy in EEG. Computer Methods and Programs in Biomedicine 80, 3 (2005), 187--194.
[32]
Daniel Keren, Guy Sagy, Amir Abboud, David Ben-David, Assaf Schuster, Izchak Sharfman, and Antonios Deligiannakis. 2014. Geometric Monitoring of Heterogeneous Streams. IEEE TKDE 26, 8 (2014), 1890--1903.
[33]
Daniel Keren, Izchak Sharfman, Assaf Schuster, and Avishay Livne. 2012. Shape Sensitive Geometric Monitoring. IEEE TKDE 24, 8 (2012), 1520--1535.
[34]
Arnon Lazerson, Moshe Gabel, Daniel Keren, and Assaf Schuster. 2017. One for All and All for One: Simultaneous Approximation of Multiple Functions over Distributed Streams. In Proc. DEBS '17.
[35]
Arnon Lazerson, Daniel Keren, and Assaf Schuster. 2016. Lightweight Monitoring of Distributed Streams. In Proc. KDD '16.
[36]
Arnon Lazerson, Izchak Sharfman, Daniel Keren, Assaf Schuster, Minos Garo- falakis, and Vasilis Samoladas. 2015. Monitoring Distributed Streams Using Convex Decompositions. Proc. VLDB Endow. 8, 5 (2015), 545--556.
[37]
S. Muthukrishnan. 2003. Data Streams: Algorithms and Applications. In Proc. SODA '03.
[38]
Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David Garcia-Soriano, Nicolas Kourtellis, and Marco Serafini. 2015. The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines. In Proc. ICDE '15.
[39]
Ilya Nemenman, William Bialek, and Rob de Ruyter van Steveninck. 2004. En- tropy and information in neural spike trains: Progress on the sampling problem. Physical Review E 69, 5 (2004).
[40]
George Nychis, Vyas Sekar, David G. Andersen, Hyong Kim, and Hui Zhang. 2008. An Empirical Evaluation of Entropy-based Traffic Anomaly Detection. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08).
[41]
Liam Paninski. 2003. Estimation of entropy and mutual information. Neural computation 15, 6 (2003), 1191--1253.
[42]
Philippe Renevey and Andrzej Drygajlo. 2001. Entropy Based Voice Activity Detection in Very Noisy Conditions. In EUROSPEECH 2001 Scandinavia, 7th Eu- ropean Conference on Speech Communication and Technology, 2nd INTERSPEECH Event. 1887--1890.
[43]
Guy Sagy, Daniel Keren, Izchak Sharfman, and Assaf Schuster. 2010. Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation. Proc. VLDB Endow. 4, 2 (2010), 46--57.
[44]
Nicola Scafetta and Paolo Grigolini. 2002. Scaling detection in time series: Diffusion entropy analysis. Phys. Rev. E 66, 3 (2002), 036130.
[45]
Izchak Sharfman, Assaf Schuster, and Daniel Keren. 2007. A geometric approach to monitoring threshold functions over distributed data streams. TODS 32, 4 (2007), 23.
[46]
Adwitiya Sinha and Daya Krishan Lobiyal. 2013. Performance evaluation of data aggregation for cluster-based wireless sensor network. Human-centric Computing and Information Sciences 3, 1 (2013).
[47]
Ran Wolff. 2015. Distributed Convex Thresholding. In Proc. PODC '15 .
[48]
David P. Woodruff and Qin Zhang. 2012. Tight Bounds for Distributed Functional Monitoring. In Proc. STOC '12.
[49]
Gal Yehuda, Daniel Keren, and Islam Akaria. 2017. Monitoring Properties of Large, Distributed, Dynamic Graphs. In Proc. IPDPS '17.
[50]
Haiquan (Chuck) Zhao, Ashwin Lall, Mitsunori Ogihara, Oliver Spatscheck, Jia Wang, and Jun Xu. 2007. A Data Streaming Algorithm for Estimating Entropies of OD Flows. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC '07)

Cited By

View all
  • (2022)Communication Efficient Algorithms for Bounding and Approximating the Empirical Entropy in Distributed SystemsEntropy10.3390/e2411161124:11(1611)Online publication date: 5-Nov-2022
  • (2022)AutoMon: Automatic Distributed Monitoring for Arbitrary Multivariate FunctionsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517866(310-324)Online publication date: 10-Jun-2022
  • (2021)Efficient Approximate Algorithms for Empirical Entropy and Mutual InformationProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457255(274-286)Online publication date: 9-Jun-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data mining
  2. distributed streams
  3. entropy estimation

Qualifiers

  • Research-article

Funding Sources

  • Max & Rachel Javit fund
  • European Union's Horizon 2020 Research And Innovation Programme

Conference

KDD '17
Sponsor:

Acceptance Rates

KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)1
Reflects downloads up to 20 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Communication Efficient Algorithms for Bounding and Approximating the Empirical Entropy in Distributed SystemsEntropy10.3390/e2411161124:11(1611)Online publication date: 5-Nov-2022
  • (2022)AutoMon: Automatic Distributed Monitoring for Arbitrary Multivariate FunctionsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517866(310-324)Online publication date: 10-Jun-2022
  • (2021)Efficient Approximate Algorithms for Empirical Entropy and Mutual InformationProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457255(274-286)Online publication date: 9-Jun-2021
  • (2021)Sketchy With a Chance of Adoption: Can Sketch-Based Telemetry Be Ready for Prime Time?2021 IEEE 7th International Conference on Network Softwarization (NetSoft)10.1109/NetSoft51509.2021.9492582(9-16)Online publication date: 28-Jun-2021
  • (2021)A Distance-Based Scheme for Reducing Bandwidth in Distributed Geometric Monitoring2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00105(1164-1175)Online publication date: Apr-2021
  • (2019)Fast Approximation of Empirical Entropy via SubsamplingProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3292500.3330938(658-667)Online publication date: 25-Jul-2019
  • (2019)Online Linear Models for Edge ComputingMachine Learning and Knowledge Discovery in Databases10.1007/978-3-030-46150-8_38(645-661)Online publication date: 16-Sep-2019
  • (2018)Lightweight Monitoring of Distributed StreamsACM Transactions on Database Systems10.1145/322611343:2(1-37)Online publication date: 31-Jul-2018
  • (2018)Violation Resolution in Distributed Stream NetworksComputational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation10.1007/978-981-13-0716-4_13(144-171)Online publication date: 11-Sep-2018

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media