Persistent Sketch: A Memory-Efficient and Robust Algorithm for Finding Top-k Persistent Flows

Ziqi Sun¹⁰,
Yu-E Sun^11,12,
Yang Du¹⁰,
Jia Liu¹³ &
…
He Huang¹⁰

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14492))

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

652 Accesses

Abstract

Finding top-k persistent flows in high-speed network traffic is crucial for applications like click-fraud detection and covert attacker detection. The prior studies either do not separate persistent and non-persistent flows during online traffic processing and waste significant space to record numerous non-persistent flows, or only realize unstable separation that is not robust to flow frequency. We proposes Persistent Sketch (PE-Sketch), the first memory-efficient and robust algorithm for finding top-k persistent flows. The basic idea is accurately separating persistent flows and then tracking them. Because it is difficult to perform separation by persistence directly, PE-Sketch introduces the concept of event sampling to sample the persistence increment events (each flow’s first arrival in every time window) with a pre-defined probability, where the number of sampled events is proportional to flow persistence. Then we design a memory-efficient candidate matrix to accurately separate and track the flows with the most sampled events, i.e., persistent flows. With the two key techniques, we find persistent flows regardless of their frequencies, attaining robust and accurate estimation results. Experimental results demonstrate that, compared to the state-of-the-art (On-Off Sketch), PE-Sketch is robust, and it can improve the precision by up to 15.6 times and reduce the error by up to 2 orders of magnitude when using the same space.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 51.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 64.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Lahiri, B., Chandrashekar, J., Tirthapura, S.: Space-efficient tracking of persistent items in a massive data stream. In: Proceedings of the 5th ACM International Conference on Distributed Event-Based System, pp. 255–266 (2011)
Google Scholar
Oentaryo, R., et al.: Detecting click fraud in online advertising: a data mining approach. J. Mach. Learn. Res. 15(1), 99–140 (2014)
MathSciNet Google Scholar
Zhu, T., Meng, Y., Hu, H., Zhang, X., Xue, M., Zhu, H.: Dissecting click fraud autonomy in the wild. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 271–286 (2021)
Google Scholar
Xiao, Q., Qiao, Y., Zhen, M., Chen, S.: Estimating the persistent spreads in high-speed networks. In: 2014 IEEE 22nd International Conference on Network Protocols, pp. 131–142 (2014)
Google Scholar
Xu, Z., Wang, X., Zhang, Y.: Towards persistent detection of DDoS attacks in NDN: a sketch-based approach. IEEE Trans. Dependable Secure Comput. 20, 1–17 (2022)
Google Scholar
Zhou, Y., Zhou, Y., Chen, M., Chen, S.: Persistent spread measurement for big network data based on register intersection. SIGMETRICS Perform. Eval. Rev. 45(1), 67 (2017)
Article Google Scholar
Javed, M., Paxson, V.: Detecting stealthy, distributed SSH brute-forcing. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, New York, NY, USA, pp. 85–96 (2013)
Google Scholar
Ghafir, I., Přenosil, V.: Advanced persistent threat attack detection: an overview. Int. J. Adv. Comput. Netw. Secur. 4(4), 50–54 (2014)
Google Scholar
Wang, P., Jia, P., Tao, J., Guan, X.: Detecting a variety of long-term stealthy user behaviors on high speed links. IEEE Trans. Knowl. Data Eng. 31(10), 1912–1925 (2019)
Article Google Scholar
Albanese, M., Jajodia, S., Venkatesan, S.: Defending from stealthy botnets using moving target defenses. IEEE Secur. Priv. 16(1), 92–97 (2018)
Article Google Scholar
Alshamrani, A., Myneni, S., Chowdhary, A., Huang, D.: A survey on advanced persistent threats: techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 21(2), 1851–1877 (2019)
Article Google Scholar
Zhong, Z., Yan, S., Li, Z., Tan, D., Yang, T., Cui, B.: BurstSketch: finding bursts in data streams. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2375–2383 (2021)
Google Scholar
Chen, L., Phan, R.C.W., Chen, Z., Huang, D.: Persistent items tracking in large data streams based on adaptive sampling. In: IEEE Conference on Computer Communications, pp. 1948–1957 (2022)
Google Scholar
Dai, H., Shahzad, M., Liu, A.X., Zhong, Y.: Finding persistent items in data streams. Proc. VLDB Endow. 10(4), 289–300 (2016)
Article Google Scholar
Zhang, Y., et al.: On-off sketch: a fast and accurate sketch on persistence. In: Proceedings of the VLDB Endowment, vol. 14, pp. 128–140 (2020)
Google Scholar
Du, Y., et al.: Short-term memory sampling for spread measurement in high-speed networks. In: IEEE Conference on Computer Communications, pp. 470–479 (2022)
Google Scholar
Du, Y., Huang, H., Sun, Y.E., Chen, S., Gao, G., Wu, X.: Self-adaptive sampling based per-flow traffic measurement. IEEE/ACM Trans. Netw. 31(3), 1010–1025 (2023)
Article Google Scholar
Sun, Y.E., Huang, H., Ma, C., Chen, S., Du, Y., Xiao, Q.: Online spread estimation with non-duplicate sampling. In: IEEE Conference on Computer Communications, pp. 2440–2448 (2020)
Google Scholar
Huang, H., et al.: Spread estimation with non-duplicate sampling in high-speed networks. IEEE/ACM Trans. Networking 29(5), 2073–2086 (2021)
Article Google Scholar
Huang, H., et al.: An efficient k-persistent spread estimator for traffic measurement in high-speed networks. IEEE/ACM Trans. Networking 28(4), 1463–1476 (2020)
Article Google Scholar
Zhou, Y., et al.: Cold filter: a meta-framework for faster and more accurate stream processing. In: Proceedings of the 2018 International Conference on Management of Data, pp. 741–756 (2018)
Google Scholar
Zhao, B., Li, X., Tian, B., Mei, Z., Wu, W.: DHS: adaptive memory layout organization of sketch slots for fast and accurate data stream processing. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2285–2293 (2021)
Google Scholar
Yang, T., et al.: HeavyKeeper: an accurate algorithm for finding top-\(k\) elephant flows. IEEE/ACM Trans. Networking 27(5), 1845–1858 (2019)
Article Google Scholar
Yang, T., Gong, J., Zhang, H., Zou, L., Shi, L., Li, X.: HeavyGuardian: separate and guard hot items in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2584–2593 (2018)
Google Scholar
Ting, D.: Data sketches for disaggregated subset sum and frequent item estimation. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1129–1140 (2018)
Google Scholar
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Article Google Scholar
CAIDA: Anonymized Internet Traces 2016. https://catalog.caida.org/dataset/passive_2016_pcap. Accessed 20 June 2022
Powers, D.M.W.: Applications and Explanations of Zipf’s Law. In: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, pp. 151–160 (1998)
Google Scholar
Appleby, A.: Murmurhash. https://sites.google.com/site/murmurhash/. Accessed 9 June 2022

Download references

Acknowledgements

The work of Yu-E Sun and He Huang is supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62332013, Grant 62072322, and Grant U20A20182, and in part by the Open Project of Tongji University Embedded System and Service Computing of Ministry of Education of China under Grant ESSCKF 2022-05. The work of Yang Du is supported in part by NSFC under Grant 62202322 and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20210706. The work of Jia Liu is supported by NSFC under Grant 62072231.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Ziqi Sun, Yang Du & He Huang
School of Rail Transportation, Soochow University, Suzhou, 215137, China
Yu-E Sun
Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, 201804, China
Yu-E Sun
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Jia Liu

Authors

Ziqi Sun
View author publications
You can also search for this author in PubMed Google Scholar
Yu-E Sun
View author publications
You can also search for this author in PubMed Google Scholar
Yang Du
View author publications
You can also search for this author in PubMed Google Scholar
Jia Liu
View author publications
You can also search for this author in PubMed Google Scholar
He Huang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Yu-E Sun or Yang Du .

Editor information

Editors and Affiliations

Royal Melbourne Institute of Technology, Melbourne, VIC, Australia
Zahir Tari
Tianjin University, Tianjin, China
Keqiu Li
University of Arizona, Tucson, AZ, USA
Hongyi Wu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sun, Z., Sun, YE., Du, Y., Liu, J., Huang, H. (2024). Persistent Sketch: A Memory-Efficient and Robust Algorithm for Finding Top-k Persistent Flows. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14492. Springer, Singapore. https://doi.org/10.1007/978-981-97-0811-6_2

Download citation

DOI: https://doi.org/10.1007/978-981-97-0811-6_2
Published: 27 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0810-9
Online ISBN: 978-981-97-0811-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics