Abstract
Finding top-k persistent flows in high-speed network traffic is crucial for applications like click-fraud detection and covert attacker detection. The prior studies either do not separate persistent and non-persistent flows during online traffic processing and waste significant space to record numerous non-persistent flows, or only realize unstable separation that is not robust to flow frequency. We proposes Persistent Sketch (PE-Sketch), the first memory-efficient and robust algorithm for finding top-k persistent flows. The basic idea is accurately separating persistent flows and then tracking them. Because it is difficult to perform separation by persistence directly, PE-Sketch introduces the concept of event sampling to sample the persistence increment events (each flow’s first arrival in every time window) with a pre-defined probability, where the number of sampled events is proportional to flow persistence. Then we design a memory-efficient candidate matrix to accurately separate and track the flows with the most sampled events, i.e., persistent flows. With the two key techniques, we find persistent flows regardless of their frequencies, attaining robust and accurate estimation results. Experimental results demonstrate that, compared to the state-of-the-art (On-Off Sketch), PE-Sketch is robust, and it can improve the precision by up to 15.6 times and reduce the error by up to 2 orders of magnitude when using the same space.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Lahiri, B., Chandrashekar, J., Tirthapura, S.: Space-efficient tracking of persistent items in a massive data stream. In: Proceedings of the 5th ACM International Conference on Distributed Event-Based System, pp. 255–266 (2011)
Oentaryo, R., et al.: Detecting click fraud in online advertising: a data mining approach. J. Mach. Learn. Res. 15(1), 99–140 (2014)
Zhu, T., Meng, Y., Hu, H., Zhang, X., Xue, M., Zhu, H.: Dissecting click fraud autonomy in the wild. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 271–286 (2021)
Xiao, Q., Qiao, Y., Zhen, M., Chen, S.: Estimating the persistent spreads in high-speed networks. In: 2014 IEEE 22nd International Conference on Network Protocols, pp. 131–142 (2014)
Xu, Z., Wang, X., Zhang, Y.: Towards persistent detection of DDoS attacks in NDN: a sketch-based approach. IEEE Trans. Dependable Secure Comput. 20, 1–17 (2022)
Zhou, Y., Zhou, Y., Chen, M., Chen, S.: Persistent spread measurement for big network data based on register intersection. SIGMETRICS Perform. Eval. Rev. 45(1), 67 (2017)
Javed, M., Paxson, V.: Detecting stealthy, distributed SSH brute-forcing. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, New York, NY, USA, pp. 85–96 (2013)
Ghafir, I., Přenosil, V.: Advanced persistent threat attack detection: an overview. Int. J. Adv. Comput. Netw. Secur. 4(4), 50–54 (2014)
Wang, P., Jia, P., Tao, J., Guan, X.: Detecting a variety of long-term stealthy user behaviors on high speed links. IEEE Trans. Knowl. Data Eng. 31(10), 1912–1925 (2019)
Albanese, M., Jajodia, S., Venkatesan, S.: Defending from stealthy botnets using moving target defenses. IEEE Secur. Priv. 16(1), 92–97 (2018)
Alshamrani, A., Myneni, S., Chowdhary, A., Huang, D.: A survey on advanced persistent threats: techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 21(2), 1851–1877 (2019)
Zhong, Z., Yan, S., Li, Z., Tan, D., Yang, T., Cui, B.: BurstSketch: finding bursts in data streams. In: Proceedings of the 2021 International Conference on Management of Data, pp. 2375–2383 (2021)
Chen, L., Phan, R.C.W., Chen, Z., Huang, D.: Persistent items tracking in large data streams based on adaptive sampling. In: IEEE Conference on Computer Communications, pp. 1948–1957 (2022)
Dai, H., Shahzad, M., Liu, A.X., Zhong, Y.: Finding persistent items in data streams. Proc. VLDB Endow. 10(4), 289–300 (2016)
Zhang, Y., et al.: On-off sketch: a fast and accurate sketch on persistence. In: Proceedings of the VLDB Endowment, vol. 14, pp. 128–140 (2020)
Du, Y., et al.: Short-term memory sampling for spread measurement in high-speed networks. In: IEEE Conference on Computer Communications, pp. 470–479 (2022)
Du, Y., Huang, H., Sun, Y.E., Chen, S., Gao, G., Wu, X.: Self-adaptive sampling based per-flow traffic measurement. IEEE/ACM Trans. Netw. 31(3), 1010–1025 (2023)
Sun, Y.E., Huang, H., Ma, C., Chen, S., Du, Y., Xiao, Q.: Online spread estimation with non-duplicate sampling. In: IEEE Conference on Computer Communications, pp. 2440–2448 (2020)
Huang, H., et al.: Spread estimation with non-duplicate sampling in high-speed networks. IEEE/ACM Trans. Networking 29(5), 2073–2086 (2021)
Huang, H., et al.: An efficient k-persistent spread estimator for traffic measurement in high-speed networks. IEEE/ACM Trans. Networking 28(4), 1463–1476 (2020)
Zhou, Y., et al.: Cold filter: a meta-framework for faster and more accurate stream processing. In: Proceedings of the 2018 International Conference on Management of Data, pp. 741–756 (2018)
Zhao, B., Li, X., Tian, B., Mei, Z., Wu, W.: DHS: adaptive memory layout organization of sketch slots for fast and accurate data stream processing. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2285–2293 (2021)
Yang, T., et al.: HeavyKeeper: an accurate algorithm for finding top-\(k\) elephant flows. IEEE/ACM Trans. Networking 27(5), 1845–1858 (2019)
Yang, T., Gong, J., Zhang, H., Zou, L., Shi, L., Li, X.: HeavyGuardian: separate and guard hot items in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2584–2593 (2018)
Ting, D.: Data sketches for disaggregated subset sum and frequent item estimation. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1129–1140 (2018)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
CAIDA: Anonymized Internet Traces 2016. https://catalog.caida.org/dataset/passive_2016_pcap. Accessed 20 June 2022
Powers, D.M.W.: Applications and Explanations of Zipf’s Law. In: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, pp. 151–160 (1998)
Appleby, A.: Murmurhash. https://sites.google.com/site/murmurhash/. Accessed 9 June 2022
Acknowledgements
The work of Yu-E Sun and He Huang is supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62332013, Grant 62072322, and Grant U20A20182, and in part by the Open Project of Tongji University Embedded System and Service Computing of Ministry of Education of China under Grant ESSCKF 2022-05. The work of Yang Du is supported in part by NSFC under Grant 62202322 and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20210706. The work of Jia Liu is supported by NSFC under Grant 62072231.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, Z., Sun, YE., Du, Y., Liu, J., Huang, H. (2024). Persistent Sketch: A Memory-Efficient and Robust Algorithm for Finding Top-k Persistent Flows. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14492. Springer, Singapore. https://doi.org/10.1007/978-981-97-0811-6_2
Download citation
DOI: https://doi.org/10.1007/978-981-97-0811-6_2
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0810-9
Online ISBN: 978-981-97-0811-6
eBook Packages: Computer ScienceComputer Science (R0)