research-article

Data popularity measurements in distributed systems

Published: 01 September 2016

Abstract

Distributed systems continue to be a promising area of research, particularly in terms of providing efficient data access and maximum data availability for large-scale applications. To improve the performance of distributed systems, several data replication strategies have been proposed to ensure reliability and data transfer speed and to allow data to be accessed efficiently from multiple locations. Data popularity is one of the most important parameters taken into consideration when designing data replication strategies: it assesses how often the data is requested by the sites of the system. In this paper, the importance of considering the data popularity parameter in replication management is highlighted. Different strategies are then identified, and the way each of them relies on the data popularity parameter is illustrated. The different ways of calculating data popularity are then studied, which allows us to determine which factors are considered when assessing data popularity. These calculation methods are classified into four categories, and each category is critically discussed. Finally, some important directions for future work are discussed, pointing toward possible solutions for a more effective assessment of data popularity.
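The abstract defines data popularity as a measure of how much the sites of the system request a data item, and notes that popularity can be calculated in different ways. As a rough illustration of why the choice matters, the sketch computes three simple measures over a hypothetical request log: a windowed access count, the number of distinct requesting sites, and an exponentially time-decayed count that favors recent requests (temporal locality). The `Request` record, the function names, and the `half_life` parameter are assumptions made for this example; they are not the paper's four categories or the formula of any particular replication strategy.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Request:
    """Hypothetical access record: which site requested which file, and when (seconds)."""
    file_id: str
    site_id: str
    timestamp: float


def access_count_popularity(log, window_start, window_end):
    """Popularity as the raw number of requests per file in [window_start, window_end)."""
    return Counter(r.file_id for r in log if window_start <= r.timestamp < window_end)


def distinct_site_popularity(log):
    """Popularity as the number of distinct sites that requested each file."""
    sites = defaultdict(set)
    for r in log:
        sites[r.file_id].add(r.site_id)
    return {file_id: len(s) for file_id, s in sites.items()}


def decayed_popularity(log, now, half_life):
    """Popularity as an exponentially time-decayed request count.

    A request made `half_life` seconds before `now` weighs half as much as a
    request made at `now`, so recently requested files score higher.
    """
    decay_rate = math.log(2) / half_life
    scores = defaultdict(float)
    for r in log:
        age = now - r.timestamp
        if age >= 0:  # ignore requests stamped after `now`
            scores[r.file_id] += math.exp(-decay_rate * age)
    return dict(scores)


if __name__ == "__main__":
    # Illustrative log: file A is requested three times early on, file B once, recently.
    log = [
        Request("A", "s1", 0.0),
        Request("A", "s1", 10.0),
        Request("A", "s2", 20.0),
        Request("B", "s3", 95.0),
    ]
    print(access_count_popularity(log, 0.0, 100.0))  # Counter({'A': 3, 'B': 1})
    print(distinct_site_popularity(log))             # {'A': 2, 'B': 1}
    decayed = decayed_popularity(log, now=100.0, half_life=10.0)
    print(max(decayed, key=decayed.get))             # B
```

With this log, file A ranks first by raw count and by distinct sites, while B ranks first by the decayed score because its single request is recent. That divergence between access volume and recency is the kind of trade-off any popularity measure has to settle.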

Information

Published In

Journal of Network and Computer Applications, Volume 72, Issue C
September 2016
171 pages

Publisher

Academic Press Ltd.

United Kingdom

Author Tags

  1. Access pattern
  2. Data popularity
  3. Distributed system
  4. Replication strategy
  5. Temporal locality

Cited By

  • (2023) The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge. IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 10, pp. 2811-2826. DOI: 10.1109/TPDS.2023.3305460. Online publication date: 1-Oct-2023.
  • (2023) When Deduplication Meets Migration: An Efficient and Adaptive Strategy in Distributed Storage Systems. IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 10, pp. 2749-2766. DOI: 10.1109/TPDS.2023.3299309. Online publication date: 1-Oct-2023.
  • (2022) Jingwei: An Efficient and Adaptable Data Migration Strategy for Deduplicated Storage Systems. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, pp. 1659-1668. DOI: 10.1109/INFOCOM48880.2022.9796954. Online publication date: 2-May-2022.
  • (2021) Combining task scheduling and data replication for SLA compliance and enhancement of provider profit in clouds. Applied Intelligence, vol. 51, no. 10, pp. 7494-7516. DOI: 10.1007/s10489-021-02267-9. Online publication date: 1-Oct-2021.
  • (2018) Evaluation of site availability exploitation towards performance optimization in data grids. Cluster Computing, vol. 21, no. 4, pp. 1967-1980. DOI: 10.1007/s10586-018-2836-1. Online publication date: 1-Dec-2018.
  • (2016) Adaptive measurement method for data popularity in distributed systems. Cluster Computing, vol. 19, no. 4, pp. 1801-1818. DOI: 10.1007/s10586-016-0637-y. Online publication date: 1-Dec-2016.
