Abstract
Group communication services (GCSs) are becoming increasingly important as a wide range of promising applications has emerged to serve millions of users distributed across the world. However, it is challenging to make such services fault-tolerant and scalable enough to meet the heavy demand of users in a distributed network (DN). Although many reliable group communication protocols have been designed to address this challenge and to accommodate changes in the network, they are often costly or require complicated strategies to handle the service interruptions caused by node departures or link failures, which limits their practicality. In this paper, we present two schemes to address these challenges. The first is a location-aware replication scheme called NS, which places replicas in a dispersed fashion so that the services on nodes gain immunity to failures of different patterns (e.g., network partition and single-point failure) while keeping replication overhead low. The second is a novel failure recovery scheme that exploits the independence between service recovery and structure recovery in the time domain to achieve quick failure recovery. Our simulation results indicate that the two proposed schemes outperform existing schemes and simple alternative schemes in service success rate, recovery latency, and communication cost.
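The idea of placing replicas "in a dispersed fashion" can be illustrated with a small sketch. This is not the NS scheme itself, whose details the abstract does not give; it is a generic greedy farthest-point placement, assuming each node has a hypothetical 2-D coordinate standing in for its network location. The function name `place_replicas` and the node names are illustrative assumptions.

```python
import math

def place_replicas(owner, candidates, k):
    """Greedily choose up to k replica holders that are far from the
    owner and from each other (illustrative only, not the NS scheme).

    owner:      (x, y) coordinate of the node whose service is replicated
    candidates: dict mapping node name -> (x, y) coordinate (assumed model)
    """
    pool = dict(candidates)   # copy so the caller's dict is not modified
    anchors = [owner]         # positions the next replica should avoid
    chosen = []
    for _ in range(min(k, len(pool))):
        # Pick the candidate whose nearest anchor is as far away as possible;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(
            sorted(pool),
            key=lambda name: min(math.dist(pool[name], a) for a in anchors),
        )
        chosen.append(best)
        anchors.append(pool.pop(best))
    return chosen

nodes = {"a": (1, 0), "b": (10, 0), "c": (0, 10), "d": (1, 1)}
replicas = place_replicas((0, 0), nodes, k=2)
```

Spreading replicas this way means a failure confined to one region (for example, a partition that cuts off the owner's neighbourhood) is less likely to take out every copy, while `k` bounds the replication overhead.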
Additional information
This work is partially supported by a National Science Foundation (NSF) grant from the CISE NetSE Program and the CyberTrust Cross-Cutting Program of the USA, an IBM faculty award, an IBM SUR grant, a grant from the Intel Research Council, the National Basic Research 973 Program of China under Grant No. 2009CB320805, the National Natural Science Foundation of China under Grant No. 61170188, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011803, and the Fundamental Research Funds for the Central Universities of China. The first author was supported by the China Scholarship Council (CSC) and performed part of the work as a visiting Ph.D. candidate from 2007 to 2009 at the Distributed Data intensive Systems Lab (DiSL) at the Georgia Institute of Technology.
Cite this article
Wang, YH., Zhou, Z., Liu, L. et al. Fault Tolerance and Recovery for Group Communication Services in Distributed Networks. J. Comput. Sci. Technol. 27, 298–312 (2012). https://doi.org/10.1007/s11390-012-1224-1
DOI: https://doi.org/10.1007/s11390-012-1224-1