Abstract
Distributed systems provide geographically distributed resources for large-scale applications while managing large volumes of data. In this context, replicating data across several sites of the system is an effective way to achieve good performance. A number of data replication strategies have been proposed in the literature. Data popularity is one of the most important parameters these strategies take into account: it is derived by analyzing the history of data access patterns and is used to predict future data requests. However, measuring data popularity is challenging because several factors contribute to its evaluation. In this paper, we propose a new adaptive measurement of data popularity in distributed systems. The proposed measurement covers all the factors considered in previous work. It also takes into account new factors that address the dynamic nature of the system, so that it can adapt to any access pattern. We show that using our measurement improves the performance of replication strategies and makes it possible to use data popularity in new contexts of replication management.
Cite this article
Hamdeni, C., Hamrouni, T. & Ben Charrada, F. Adaptive measurement method for data popularity in distributed systems. Cluster Comput 19, 1801–1818 (2016). https://doi.org/10.1007/s10586-016-0637-y