[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Persistent Sketch: A Memory-Efficient and Robust Algorithm for Finding Top-k Persistent Flows

  • Conference paper
  • First Online:
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14492))

Abstract

Finding top-k persistent flows in high-speed network traffic is crucial for applications like click-fraud detection and covert attacker detection. The prior studies either do not separate persistent and non-persistent flows during online traffic processing and waste significant space to record numerous non-persistent flows, or only realize unstable separation that is not robust to flow frequency. We proposes Persistent Sketch (PE-Sketch), the first memory-efficient and robust algorithm for finding top-k persistent flows. The basic idea is accurately separating persistent flows and then tracking them. Because it is difficult to perform separation by persistence directly, PE-Sketch introduces the concept of event sampling to sample the persistence increment events (each flow’s first arrival in every time window) with a pre-defined probability, where the number of sampled events is proportional to flow persistence. Then we design a memory-efficient candidate matrix to accurately separate and track the flows with the most sampled events, i.e., persistent flows. With the two key techniques, we find persistent flows regardless of their frequencies, attaining robust and accurate estimation results. Experimental results demonstrate that, compared to the state-of-the-art (On-Off Sketch), PE-Sketch is robust, and it can improve the precision by up to 15.6 times and reduce the error by up to 2 orders of magnitude when using the same space.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 49.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 64.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Lahiri, B., Chandrashekar, J., Tirthapura, S.: Space-efficient tracking of persistent items in a massive data stream. In: Proceedings of the 5th ACM International Conference on Distributed Event-Based System, pp. 255–266 (2011)

    Google Scholar 

  2. Oentaryo, R., et al.: Detecting click fraud in online advertising: a data mining approach. J. Mach. Learn. Res. 15(1), 99–140 (2014)

    MathSciNet  Google Scholar 

  3. Zhu, T., Meng, Y., Hu, H., Zhang, X., Xue, M., Zhu, H.: Dissecting click fraud autonomy in the wild. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 271–286 (2021)

    Google Scholar 

  4. Xiao, Q., Qiao, Y., Zhen, M., Chen, S.: Estimating the persistent spreads in high-speed networks. In: 2014 IEEE 22nd International Conference on Network Protocols, pp. 131–142 (2014)

    Google Scholar 

  5. Xu, Z., Wang, X., Zhang, Y.: Towards persistent detection of DDoS attacks in NDN: a sketch-based approach. IEEE Trans. Dependable Secure Comput. 20, 1–17 (2022)

    Google Scholar 

  6. Zhou, Y., Zhou, Y., Chen, M., Chen, S.: Persistent spread measurement for big network data based on register intersection. SIGMETRICS Perform. Eval. Rev. 45(1), 67 (2017)

    Article  Google Scholar 

  7. Javed, M., Paxson, V.: Detecting stealthy, distributed SSH brute-forcing. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, New York, NY, USA, pp. 85–96 (2013)

    Google Scholar 

  8. Ghafir, I., Přenosil, V.: Advanced persistent threat attack detection: an overview. Int. J. Adv. Comput. Netw. Secur. 4(4), 50–54 (2014)

    Google Scholar 

  9. Wang, P., Jia, P., Tao, J., Guan, X.: Detecting a variety of long-term stealthy user behaviors on high speed links. IEEE Trans. Knowl. Data Eng. 31(10), 1912–1925 (2019)

    Article  Google Scholar 

  10. Albanese, M., Jajodia, S., Venkatesan, S.: Defending from stealthy botnets using moving target defenses. IEEE Secur. Priv. 16(1), 92–97 (2018)

    Article  Google Scholar 

  11. Alshamrani, A., Myneni, S., Chowdhary, A., Huang, D.: A survey on advanced persistent threats: techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 21(2), 1851–1877 (2019)

    Article  Google Scholar 

  12. Zhong, Z., Yan, S., Li, Z., Tan, D., Yang, T., Cui, B.: BurstSketch: finding bursts in data streams. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2375–2383 (2021)

    Google Scholar 

  13. Chen, L., Phan, R.C.W., Chen, Z., Huang, D.: Persistent items tracking in large data streams based on adaptive sampling. In: IEEE Conference on Computer Communications, pp. 1948–1957 (2022)

    Google Scholar 

  14. Dai, H., Shahzad, M., Liu, A.X., Zhong, Y.: Finding persistent items in data streams. Proc. VLDB Endow. 10(4), 289–300 (2016)

    Article  Google Scholar 

  15. Zhang, Y., et al.: On-off sketch: a fast and accurate sketch on persistence. In: Proceedings of the VLDB Endowment, vol. 14, pp. 128–140 (2020)

    Google Scholar 

  16. Du, Y., et al.: Short-term memory sampling for spread measurement in high-speed networks. In: IEEE Conference on Computer Communications, pp. 470–479 (2022)

    Google Scholar 

  17. Du, Y., Huang, H., Sun, Y.E., Chen, S., Gao, G., Wu, X.: Self-adaptive sampling based per-flow traffic measurement. IEEE/ACM Trans. Netw. 31(3), 1010–1025 (2023)

    Article  Google Scholar 

  18. Sun, Y.E., Huang, H., Ma, C., Chen, S., Du, Y., Xiao, Q.: Online spread estimation with non-duplicate sampling. In: IEEE Conference on Computer Communications, pp. 2440–2448 (2020)

    Google Scholar 

  19. Huang, H., et al.: Spread estimation with non-duplicate sampling in high-speed networks. IEEE/ACM Trans. Networking 29(5), 2073–2086 (2021)

    Article  Google Scholar 

  20. Huang, H., et al.: An efficient k-persistent spread estimator for traffic measurement in high-speed networks. IEEE/ACM Trans. Networking 28(4), 1463–1476 (2020)

    Article  Google Scholar 

  21. Zhou, Y., et al.: Cold filter: a meta-framework for faster and more accurate stream processing. In: Proceedings of the 2018 International Conference on Management of Data, pp. 741–756 (2018)

    Google Scholar 

  22. Zhao, B., Li, X., Tian, B., Mei, Z., Wu, W.: DHS: adaptive memory layout organization of sketch slots for fast and accurate data stream processing. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2285–2293 (2021)

    Google Scholar 

  23. Yang, T., et al.: HeavyKeeper: an accurate algorithm for finding top-\(k\) elephant flows. IEEE/ACM Trans. Networking 27(5), 1845–1858 (2019)

    Article  Google Scholar 

  24. Yang, T., Gong, J., Zhang, H., Zou, L., Shi, L., Li, X.: HeavyGuardian: separate and guard hot items in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2584–2593 (2018)

    Google Scholar 

  25. Ting, D.: Data sketches for disaggregated subset sum and frequent item estimation. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1129–1140 (2018)

    Google Scholar 

  26. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

    Article  Google Scholar 

  27. CAIDA: Anonymized Internet Traces 2016. https://catalog.caida.org/dataset/passive_2016_pcap. Accessed 20 June 2022

  28. Powers, D.M.W.: Applications and Explanations of Zipf’s Law. In: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, pp. 151–160 (1998)

    Google Scholar 

  29. Appleby, A.: Murmurhash. https://sites.google.com/site/murmurhash/. Accessed 9 June 2022

Download references

Acknowledgements

The work of Yu-E Sun and He Huang is supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62332013, Grant 62072322, and Grant U20A20182, and in part by the Open Project of Tongji University Embedded System and Service Computing of Ministry of Education of China under Grant ESSCKF 2022-05. The work of Yang Du is supported in part by NSFC under Grant 62202322 and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20210706. The work of Jia Liu is supported by NSFC under Grant 62072231.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Yu-E Sun or Yang Du .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Sun, Z., Sun, YE., Du, Y., Liu, J., Huang, H. (2024). Persistent Sketch: A Memory-Efficient and Robust Algorithm for Finding Top-k Persistent Flows. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14492. Springer, Singapore. https://doi.org/10.1007/978-981-97-0811-6_2

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-0811-6_2

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0810-9

  • Online ISBN: 978-981-97-0811-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics