DOI: 10.1145/3230543.3230552

Synchronized network snapshots

Published: 07 August 2018 Publication History

Abstract

When monitoring a network, operators rarely have a fine-grained and complete view of the network's state. Instead, today's network monitoring tools generally measure only a single device or path at a time; whole-network metrics are a composition of these independent measurements, i.e., an afterthought. Such tools fail to fully answer a wide range of questions. Is my load balancing algorithm taking advantage of all available paths evenly? How much of my network is concurrently loaded? Is application traffic synchronized? These types of concurrent network behavior are challenging to capture at fine granularity because they involve coordination across the entire network. At the same time, understanding them is essential to the design of network switches, architectures, and protocols.
This paper presents the design of a Synchronized Network Snapshot protocol. The goal of our primitive is the collection of a network-wide set of measurements. To ensure that the measurements are meaningful, our design guarantees they are both causally consistent and approximately synchronous. We demonstrate with a Wedge100BF implementation the feasibility of our approach as well as its many potential uses.
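
To make "causally consistent" concrete, here is a minimal sketch in Python. It is not the protocol from the paper; it illustrates the classic piggybacking idea from the distributed-snapshot literature, in which every packet carries the sender's snapshot epoch and a receiver records its local state before counting any packet from a newer epoch. The Node class, its counters, and the method names are hypothetical, and the "approximately synchronous" part of the design (clock-driven initiation) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical switch with two live counters and per-epoch snapshots."""
    name: str
    epoch: int = 0        # latest snapshot epoch this node has joined
    sent: int = 0         # live counter: packets sent so far
    received: int = 0     # live counter: packets received so far
    snapshots: dict = field(default_factory=dict)  # epoch -> (sent, received)

    def _advance(self, epoch: int) -> None:
        # Record local state once per epoch, before any event that
        # belongs to that epoch is counted.
        while self.epoch < epoch:
            self.epoch += 1
            self.snapshots[self.epoch] = (self.sent, self.received)

    def initiate(self, epoch: int) -> None:
        """Start (or join) a snapshot locally, e.g. on a timer."""
        self._advance(epoch)

    def send(self) -> int:
        """Count a sent packet and return the epoch stamped on it."""
        self.sent += 1
        return self.epoch

    def receive(self, packet_epoch: int) -> None:
        """Join the packet's epoch first if it is newer, then count it."""
        self._advance(packet_epoch)
        self.received += 1


# Usage: a sends p1, takes snapshot 1, then sends p2. The network
# reorders the packets so that b sees p2 before p1.
a, b = Node("a"), Node("b")
p1 = a.send()
a.initiate(1)
p2 = a.send()
b.receive(p2)   # newer epoch: b snapshots before counting p2
b.receive(p1)   # old-epoch packet that was in flight during the snapshot
print(a.snapshots[1], b.snapshots[1])
```

In this run b's snapshot never counts a packet that a sent after its own snapshot, so the recorded receive count cannot exceed the recorded send count. Had b instead snapshotted on its own after both arrivals, it would have recorded two received packets against one sent, which is the kind of inconsistency a causally consistent snapshot rules out.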



Information & Contributors

Information

Published In

SIGCOMM '18: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
August 2018
604 pages
ISBN: 9781450355674
DOI: 10.1145/3230543
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. network snapshots
  2. whole-network measurement

Qualifiers

  • Research-article

Conference

SIGCOMM '18: ACM SIGCOMM 2018 Conference
August 20 - 25, 2018
Budapest, Hungary

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 176
  • Downloads (last 6 weeks): 33
Reflects downloads up to 16 Dec 2024.

Citations

Cited By

  • (2024) P4runpro: Enabling Runtime Programmability for RMT Programmable Switches. Proceedings of the ACM SIGCOMM 2024 Conference, 921-937. DOI: 10.1145/3651890.3672230. Online publication date: 4-Aug-2024
  • (2024) Distributed Network Telemetry With Resource Efficiency and Full Accuracy. IEEE/ACM Transactions on Networking 32:3, 1857-1872. DOI: 10.1109/TNET.2023.3327345. Online publication date: Jun-2024
  • (2024) RIDS: Towards Advanced IDS via RNN Model and Programmable Switches Co-Designed Approaches. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 591-600. DOI: 10.1109/INFOCOM52122.2024.10621290. Online publication date: 20-May-2024
  • (2024) Scalable Network Tomography for Dynamic Spectrum Access. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 2209-2218. DOI: 10.1109/INFOCOM52122.2024.10621172. Online publication date: 20-May-2024
  • (2023) Cowbird: Freeing CPUs to Compute by Offloading the Disaggregation of Memory. Proceedings of the ACM SIGCOMM 2023 Conference, 1060-1073. DOI: 10.1145/3603269.3604833. Online publication date: 10-Sep-2023
  • (2023) Vulnerabilities and Attacks of Inter-device Coordination in Programmable Networks. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), 01-10. DOI: 10.1109/IWQoS57198.2023.10188714. Online publication date: 19-Jun-2023
  • (2023) Aigis: Full-Coverage And Low-Overhead Mitigating Against Amplified Reflection DDoS Attacks. GLOBECOM 2023 - 2023 IEEE Global Communications Conference, 1711-1716. DOI: 10.1109/GLOBECOM54140.2023.10437875. Online publication date: 4-Dec-2023
  • (2022) Causal network telemetry. Proceedings of the 5th International Workshop on P4 in Europe, 46-52. DOI: 10.1145/3565475.3569084. Online publication date: 9-Dec-2022
  • (2022) PrintQueue. Proceedings of the ACM SIGCOMM 2022 Conference, 516-529. DOI: 10.1145/3544216.3544257. Online publication date: 22-Aug-2022
  • (2022) A survey on TCP enhancements using P4-programmable devices. Computer Networks: The International Journal of Computer and Telecommunications Networking 212:C. DOI: 10.1016/j.comnet.2022.109030. Online publication date: 27-Jun-2022
