Adaptive measurement method for data popularity in distributed systems

Cluster Computing

Abstract

Distributed systems provide geographically distributed resources for large-scale applications while managing large volumes of data. In this context, replicating data across several sites of the system is an effective way to achieve good performance. A number of data replication strategies have been proposed in the literature, and data popularity is one of the most important parameters they take into account. A popularity measure analyzes the history of data accesses and provides predictions of future data requests. However, measuring data popularity is challenging because several factors contribute to its evaluation. In this paper, a new adaptive measurement of data popularity in distributed systems is proposed. The proposed measurement covers all the factors considered in previous work in the literature. It also takes into account new factors that deal with the dynamic nature of the system, so that it can adapt to any access pattern. We show that using our measurement improves the performance of replication strategies, while making it possible to use the data popularity parameter in new contexts in replication management.
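
To make the notion of a popularity measure concrete, the sketch below scores files from an access log with an exponentially decayed access count, so that recent requests weigh more than old ones. This is a generic illustration and not the measurement proposed in the paper, whose definition is not given in the abstract. The function names, the `half_life` parameter, and the sample log are all assumptions introduced here.

```python
from collections import defaultdict


def decayed_popularity(accesses, now, half_life):
    """Score each file by its accesses, weighting recent ones more heavily.

    accesses:  iterable of (file_id, timestamp) pairs, with timestamp <= now.
    half_life: age at which an access counts half as much as a fresh one
               (hypothetical tuning parameter, same unit as the timestamps).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    scores = defaultdict(float)
    for file_id, timestamp in accesses:
        age = now - timestamp
        # Each access contributes 1 when fresh and halves every half_life.
        scores[file_id] += 0.5 ** (age / half_life)
    return dict(scores)


def rank_by_popularity(scores):
    """Return file ids from most to least popular (ties broken by id)."""
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical access log: file "a" was requested three times long ago,
# file "b" twice very recently.
log = [("a", 0), ("a", 0), ("a", 0), ("b", 9), ("b", 10)]
scores = decayed_popularity(log, now=10, half_life=5)
ranking = rank_by_popularity(scores)
```

With a plain access count, file "a" (three requests) would outrank file "b" (two requests). With the decay applied, "a" scores 0.75 and "b" scores about 1.87, so "b" ranks first. This is the kind of sensitivity to a changing access pattern that a popularity measure used for replication decisions needs to capture.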

Author information

Corresponding author

Correspondence to T. Hamrouni.

About this article

Cite this article

Hamdeni, C., Hamrouni, T. & Ben Charrada, F. Adaptive measurement method for data popularity in distributed systems. Cluster Comput 19, 1801–1818 (2016). https://doi.org/10.1007/s10586-016-0637-y

  • DOI: https://doi.org/10.1007/s10586-016-0637-y