[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

INT-MC: Low-Overhead In-Band Network-Wide Telemetry Based on Matrix Completion

Published: 13 December 2024 Publication History

Abstract

In-band Network Telemetry (INT) enhances real-time, high-resolution network monitoring capabilities by incorporating fine-grained internal state information into packets. Utilizing INT for network-wide visualization can significantly bolster network management and operation. Although existing studies have made significant contributions, they have not been able to simultaneously meet the following two objectives: 1) Comprehensive visibility of the network's internal state, which refers to obtaining information on all switch ports and their corresponding link states; 2) Low measurement overhead, which involves reducing measurement bandwidth overhead and minimizing the impact of the measurement process on the network. We designed INT-MC to meet both of these objectives. INT-MC is an efficient and cost-effective network-wide telemetry solution based on matrix completion. By modeling link metadata as a matrix and leveraging its low-rank property, INT-MC selectively measures certain links. It then employs matrix completion algorithms to infer information about unmeasured links, thereby achieving low-overhead network-wide telemetry. Designing paths to cover the target links involves an NP-complete problem, and the high computational complexity may delay the measurement tasks. To circumvent this, we propose an improved scheme based on Eulerian digraph decomposition, transforming the path calculation problem into a high-information path selection problem, significantly reducing computational costs.
We have implemented an INT-MC prototype within the NSFNet topology, consisting of 14 Tofino switches and 10 end-hosts, and conducted extensive experiments and evaluations. The results indicate that INT-MC incurs only 16% of the measurement overhead compared to existing network-wide telemetry solutions, while achieving nearly identical accuracy. Even under high-frequency measurements of 20 times per second, the bandwidth overhead of INT-MC is approximately 0.075% of the total bandwidth, exerting minimal impact on the network.

References

[1]
Behnaz Arzani, Selim Ciraci, Luiz Chamon, Yibo Zhu, Hongqiang Harry Liu, Jitu Padhye, Boon Thau Loo, and Geoff Outhred. 2018. 007: Democratically finding the cause of packet drops. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 419--435.
[2]
Wei Bai, Li Chen, Kai Chen, Dongsu Han, Chen Tian, and Hao Wang. 2015. $$Information-Agnostic$$ flow scheduling for commodity data centers. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). 455--468.
[3]
Ran Ben Basat, Sivaramakrishnan Ramanathan, Yuliang Li, Gianni Antichi, Minian Yu, and Michael Mitzenmacher. 2020. PINT: Probabilistic in-band network telemetry. In Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 662--680.
[4]
Ding Ma Bhaskar Chinni. [n.,d.]. Broadcom's In-band Flow Analyzer (IFA). https://www.broadcom.com/blog/network-performance-anomaly-detection-with-in-band-flow-analyzer
[5]
Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. 2010. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, Vol. 20, 4 (2010), 1956--1982.
[6]
Emmanuel Candes and Benjamin Recht. 2012. Exact matrix completion via convex optimization. Commun. ACM, Vol. 55, 6 (2012), 111--119.
[7]
Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal component analysis? Journal of the ACM (JACM), Vol. 58, 3 (2011), 1--37.
[8]
Emmanuel J Candes and Yaniv Plan. 2010. Matrix completion with noise. Proc. IEEE, Vol. 98, 6 (2010), 925--936.
[9]
Justin Cappos, Ivan Beschastnikh, Arvind Krishnamurthy, and Tom Anderson. 2009. Seattle: a platform for educational cloud computing. In Proceedings of the 40th ACM technical symposium on Computer science education. 111--115.
[10]
Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. 2015. Completing any low-rank matrix, provably. The Journal of Machine Learning Research, Vol. 16, 1 (2015), 2999--3034.
[11]
Yong Chen, Zhi-Zhong Chen, Curtis Kennedy, Guohui Lin, Yao Xu, and An Zhang. 2024. Approximating the directed path partition problem. Information and Computation, Vol. 297 (2024), 105150.
[12]
Alexander L. Chistov and Dima Grigoriev. 1984. Complexity of Quantifier Elimination in the Theory of Algebraically Closed Fields. Springer-Verlag (1984).
[13]
Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. Planetlab: an overlay testbed for broad-coverage services. ACM SIGCOMM Computer Communication Review, Vol. 33, 3 (2003), 3--12.
[14]
Rogério Le ao Santos De Oliveira, Christiane Marie Schweitzer, Ailton Akira Shinoda, and Ligia Rodrigues Prete. 2014. Using mininet for emulation and prototyping software-defined networks. In 2014 IEEE Colombian conference on communications and computing (COLCOM). Ieee, 1--6.
[15]
Nathaniel Dean. 1986. What is the smallest number of dicycles in a dicycle decomposition of an Eulerian digraph? Journal of graph theory, Vol. 10, 3 (1986), 299--308.
[16]
Lei Deng, Haifeng Zheng, Xiao-Yang Liu, Xinxin Feng, and Zhizhang David Chen. 2020. Network latency estimation with leverage sampling for personal devices: An adaptive tensor completion approach. IEEE/ACM Transactions on Networking, Vol. 28, 6 (2020), 2797--2808.
[17]
Petros Drineas, Malik Magdon-Ismail, Michael W Mahoney, and David P Woodruff. 2012. Fast approximation of matrix coherence and statistical leverage. The Journal of Machine Learning Research, Vol. 13, 1 (2012), 3475--3506.
[18]
Herbert Fleischner. 1990. Eulerian graphs and related topics. Elsevier.
[19]
Krishna P Gummadi, Stefan Saroiu, and Steven D Gribble. 2002. King: Estimating latency between arbitrary internet end hosts. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment. 5--18.
[20]
Nguyen Viet Ha, Tran Nguyen Tuan Kiet, Bui Ngoc Thanh Binh, Nguyen Minh Tri, Tran Thi Thao Nguyen, and Masato Tsuru. 2024. Real-Time In-Band Network Link Loss Detection With Programmable Data Plane. In 2024 16th International Conference on Knowledge and Smart Technology (KST). IEEE, 167--172.
[21]
Nikhil Handigol, Brandon Heller, Vimalkumar Jeyakumar, David Mazières, and Nick McKeown. 2014. I know what your packet did last hop: Using packet histories to troubleshoot networks. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). 71--85.
[22]
Yao Hu, Debing Zhang, Jieping Ye, Xuelong Li, and Xiaofei He. 2012. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE transactions on pattern analysis and machine intelligence, Vol. 35, 9 (2012), 2117--2130.
[23]
S. Anubolu R. Manur H. Holbrook A. Ghanwani D. Cai H. Ou Y. Li X. Wang J. Kumar, S. Anubolu. 2024. Inband Flow Analyzer. Internet-Draft. https://datatracker.ietf.org/doc/draft-kumar-ippm-ifa/
[24]
Prateek Jain, Raghu Meka, and Inderjit Dhillon. 2010. Guaranteed rank minimization via singular value projection. Advances in Neural Information Processing Systems, Vol. 23 (2010).
[25]
Kai Jin, Kun Xie, Xin Wang, Jiazheng Tian, Gaogang Xie, Jigang Wen, and Kenli Li. 2022. Low cost online network traffic measurement with subspace-based matrix completion. IEEE Transactions on Network Science and Engineering, Vol. 10, 1 (2022), 53--67.
[26]
Raj Joshi, Cha Hwan Song, Xin Zhe Khooi, Nishant Budhdev, Ayush Mishra, Mun Choon Chan, and Ben Leong. 2023. Masking Corruption Packet Losses in Datacenter Networks with Link-local Retransmission. In Proceedings of the ACM SIGCOMM 2023 Conference. 288--304.
[27]
Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. Hula: Scalable load balancing using programmable data planes. In Proceedings of the Symposium on SDN Research. 1--12.
[28]
Changhoon Kim, Anirudh Sivaraman, Naga Katta, Antonin Bas, Advait Dixit, Lawrence J Wobker, et al. 2015. In-band network telemetry via programmable dataplanes. In ACM SIGCOMM, Vol. 15. 1--2.
[29]
Youngho Kim, Dongeun Suh, and Sangheon Pack. 2018. Selective in-band network telemetry for overhead reduction. In 2018 IEEE 7th International Conference on Cloud Networking (CloudNet). IEEE, 1--3.
[30]
Jonathan Ledlie, Paul Gardner, and Margo I Seltzer. 2007. Network Coordinates in the Wild. In NSDI, Vol. 7. 299--311.
[31]
Rongpeng Li, Zhifeng Zhao, Jianchao Zheng, Chengli Mei, Yueming Cai, and Honggang Zhang. 2017. The learning and prediction of application-level traffic data in cellular networks. IEEE Transactions on Wireless Communications, Vol. 16, 6 (2017), 3899--3912.
[32]
Yuliang Li, Rui Miao, Hongqiang Harry Liu, Yan Zhuang, Fei Feng, Lingbo Tang, Zheng Cao, Ming Zhang, Frank Kelly, Mohammad Alizadeh, et al. 2019. HPCC: High precision congestion control. In Proceedings of the ACM Special Interest Group on Data Communication. 44--58.
[33]
Jörg Liebeherr, Almut Burchard, and Florin Ciucu. 2012. Delay bounds in communication networks with heavy-tailed and self-similar traffic. IEEE Transactions on Information Theory, Vol. 58, 2 (2012), 1010--1024.
[34]
Shih-Chun Lin, Pu Wang, Ian F Akyildiz, and Min Luo. 2017. Delay-based maximum power-weight scheduling with heavy-tailed traffic. IEEE/ACM Transactions on Networking, Vol. 25, 4 (2017), 2540--2555.
[35]
Kefei Liu, Zhuo Jiang, Jiao Zhang, Haoran Wei, Xiaolong Zhong, Lizhuang Tan, Tian Pan, and Tao Huang. 2023. Hostping: Diagnosing Intra-host Network Bottlenecks in RDMA Servers. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). USENIX Association, Boston, MA, 15--29. https://www.usenix.org/conference/nsdi23/presentation/liu-kefei
[36]
Congcong Miao, Zhizhen Zhong, Ying Zhang, Kunling He, Fangchao Li, Minggang Chen, Yiren Zhao, Xiang Li, Zekun He, Xianneng Zou, et al. 2023. FlexWAN: Software Hardware Co-design for Cost-Effective and Resilient Optical Backbones. In Proceedings of the ACM SIGCOMM 2023 Conference. 319--332.
[37]
A. Morton. 2016. RFC 7799: Active and Passive Metrics and Methods. Internet Requests for Comments. https://doi.org/10.17487/RFC7799
[38]
Srinivas Narayana, Mina Tahmasbi, Jennifer Rexford, and David Walker. 2016. Compiling path queries. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 207--222.
[39]
P4 Language Consortium. 2018. Behavioral Model. https://github.com/p4lang/behavioral-model.
[40]
Tian Pan, Enge Song, Zizheng Bian, Xingchen Lin, Xiaoyu Peng, Jiao Zhang, Tao Huang, Bin Liu, and Yunjie Liu. 2019. Int-path: Towards optimal path planning for in-band network-wide telemetry. In IEEE INFOCOM 2019-IEEE Conference On Computer Communications. IEEE, 487--495.
[41]
Bernard Péroche. 1984. NP-completeness of some problems of partitioning and covering in graphs. Discrete applied mathematics, Vol. 8, 2 (1984), 195--208.
[42]
Matthew Roughan, Yin Zhang, Walter Willinger, and Lili Qiu. 2011. Spatio-temporal compressive sensing and internet traffic matrices (extended version). IEEE/ACM Transactions on Networking, Vol. 20, 3 (2011), 662--676.
[43]
Arjun Roy, Hongyi Zeng, Jasmeet Bagga, George Porter, and Alex C Snoeren. 2015. Inside the social network's (datacenter) network. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. 123--137.
[44]
Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, Shan Muthukrishnan, and Jennifer Rexford. 2017. Heavy-hitter detection entirely in the data plane. In Proceedings of the Symposium on SDN Research. 164--176.
[45]
George Steiner. 2003. On the k-path partition of graphs. Theoretical Computer Science, Vol. 290, 3 (2003), 2147--2155.
[46]
Carl A Sunshine. 1977. Source routing in computer networks. ACM SIGCOMM Computer Communication Review, Vol. 7, 1 (1977), 29--33.
[47]
S. Bhandari B. Gafni M. Spiegel T. Mizrahi, F. Brockners. 2022. In-situ OAM Loopback and Active Flags. Internet-Draft. https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-flags-08
[48]
Shaofei Tang, Sicheng Zhao, Xiaoqin Pan, and Zuqing Zhu. 2022. How to Use In-Band Network Telemetry Wisely: Network-Wise Orchestration of Sel-INT. IEEE/ACM Transactions on Networking, Vol. 31, 1 (2022), 421--435.
[49]
The Abilene Observatory. 2004. The Abilene Observatory Data Collections. Available online. [Online]. Available: http://www.cs.utexas.edu/ yzhang/research/AbileneTM/.
[50]
Steve Uhlig, Bruno Quoitin, Jean Lepropre, and Simon Balon. 2006. Providing public intradomain traffic matrices to the research community. ACM SIGCOMM Computer Communication Review, Vol. 36, 1 (2006), 83--86.
[51]
Cheng Wang and Kun Xie. 2023. HPETC: History Priority Enhanced Tensor Completion For Network Distance Measurement. IEEE Transactions on Parallel and Distributed Systems (2023).
[52]
Zhongxiang Wei, Ye Tian, Wei Chen, Liyuan Gu, and Xinming Zhang. 2023. DUNE: Improving accuracy for sketch-INT network measurement systems. In IEEE INFOCOM 2023-IEEE Conference on Computer Communications. IEEE, 1--10.
[53]
Wikipedia. 2023. Path cover. https://en.wikipedia.org/wiki/Path_cover
[54]
Bernard Wong, Aleksandrs Slivkins, and Emin Gün Sirer. 2005. Meridian: A lightweight network location service without virtual coordinates. ACM SIGCOMM Computer Communication Review, Vol. 35, 4 (2005), 85--96.
[55]
Kun Xie, Jiazheng Tian, Gaogang Xie, Guangxing Zhang, and Dafang Zhang. 2021. Low cost sparse network monitoring based on block matrix completion. In IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 1--10.
[56]
Kun Xie, Lele Wang, Xin Wang, Gaogang Xie, and Jigang Wen. 2017. Low cost and high accuracy data gathering in WSNs with matrix completion. IEEE Transactions on Mobile Computing, Vol. 17, 7 (2017), 1595--1608.
[57]
Kun Xie, Lele Wang, Xin Wang, Gaogang Xie, Guangxing Zhang, Dongliang Xie, and Jigang Wen. 2015. Sequential and adaptive sampling for matrix completion in network monitoring systems. In 2015 IEEE conference on computer communications (infocom). IEEE, 2443--2451.
[58]
Kaicheng Yang, Sheng Long, Qilong Shi, Yuanpeng Li, Zirui Liu, Yuhan Wu, Tong Yang, and Zhengyi Jia. 2023. Sketchint: Empowering int with towersketch for per-flow per-switch measurement. IEEE Transactions on Parallel and Distributed Systems (2023).
[59]
Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig. 2018. Elastic sketch: Adaptive and fast network-wide measurements. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 561--575.
[60]
Chen-Yu Yen, Soheil Abbasloo, and H Jonathan Chao. 2023. Computers Can Learn from the Heuristic Designs and Master Internet Congestion Control. In Proceedings of the ACM SIGCOMM 2023 Conference. 255--274.
[61]
Qianchen Yuan, Fuliang Li, Tian Pan, Yuhua Lai, Yetao Gu, and Xingwei Wang. 2022. INT-Segment: MTU-Adaptive Single-Path In-Band Network-Wide Telemetry. In 2022 IEEE 30th International Conference on Network Protocols (ICNP). IEEE, 1--11.
[62]
Qianchen Yuan, Fuliang Li, Tian Pan, Yifan Xu, and Xingwei Wang. 2022. INT-react: An O (E) Path Planner for Resilient Network-Wide Telemetry Over Megascale Networks. In 2022 IEEE 30th International Conference on Network Protocols (ICNP). IEEE, 1--11.
[63]
Ellen W Zegura, Kenneth L Calvert, and Samrat Bhattacharjee. 1996. How to model an internetwork. In Proceedings of IEEE INFOCOM'96. Conference on Computer Communications, Vol. 2. IEEE, 594--602.
[64]
Yin Zhang, Matthew Roughan, Walter Willinger, and Lili Qiu. 2009. Spatio-temporal compressive sensing and internet traffic matrices. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication. 267--278.
[65]
Yikai Zhao, Kaicheng Yang, Zirui Liu, Tong Yang, Li Chen, Shiyi Liu, Naiqian Zheng, Ruixin Wang, Hanbo Wu, Yi Wang, et al. 2021. $$LightGuardian$$: A $$full-visibility$$, lightweight, in-band telemetry system using sketchlets. In 18th USENIX symposium on networked systems design and implementation (NSDI 21). 991--1010.
[66]
Zibin Zheng and Michael R Lyu. 2008. Ws-dream: A distributed reliability assessment mechanism for web services. In 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). IEEE, 392--397.
[67]
Rui Zhu, Bang Liu, Di Niu, Zongpeng Li, and Hong Vicky Zhao. 2016. Network latency estimation for personal devices: A matrix completion approach. IEEE/ACM Transactions on Networking, Vol. 25, 2 (2016), 724--737.

Index Terms

  1. INT-MC: Low-Overhead In-Band Network-Wide Telemetry Based on Matrix Completion

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
    Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 8, Issue 3
    POMACS
    December 2024
    588 pages
    EISSN:2476-1249
    DOI:10.1145/3708555
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 December 2024
    Published in POMACS Volume 8, Issue 3

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. in-band network telemetry
    2. matrix completion
    3. network measurement
    4. network-wide telemetry

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 161
      Total Downloads
    • Downloads (Last 12 months)161
    • Downloads (Last 6 weeks)161
    Reflects downloads up to 16 Jan 2025

    Other Metrics

    Citations

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media