CN101751309B - Optimized transcript distributing method in data grid - Google Patents
- Publication number
- CN101751309B
- Authority
- CN
- China
- Prior art keywords
- copy
- node
- resource
- territory
- transcript
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an optimized replica distribution method in a data grid. It belongs to the field of grid computing and concerns grid systems, in particular the distribution of replicas in data grid systems. The invention provides a distributed, dynamic, self-adaptive replica distribution method for data grids that aims to reduce the access cost for the party requesting a resource. The method comprises computing the number of replicas from resource response times, and determining suitable replica locations by jointly considering factors such as load, node performance, and real-time bandwidth, so that replica distribution dynamically adapts to changes in access requests and network conditions. Compared with existing methods that simply increase the number of replicas to improve data access performance, the invention evaluates the number of replicas and their optimal locations from a system-wide perspective, thereby effectively balancing the number of replicas against replica maintenance cost. The method is suitable not only for data grid systems with read-only resources but particularly for data grid systems with read-write data resources.
Description
Technical field
The present invention relates to an optimized replica distribution method in data grids. It belongs to the field of grid computing (Grid) and specifically concerns grid systems, in particular the distribution of replicas in data grid systems.
Background art
As a new network computing platform, the grid aims to build dynamic virtual organizations over a distributed, heterogeneous, and autonomous network resource environment, and within them to achieve resource sharing and resource collaboration across autonomous domains. A data grid is the application and realization of grid technology in data management. Oriented toward heterogeneous wide-area environments, it integrates the many data resources distributed across the network, hides the heterogeneity of the underlying physical resources, and provides general and reliable data services to upper-layer applications, thereby realizing an integrated architecture for accessing, storing, and serving distributed, heterogeneous, massive data.
Replication is an important technique for improving the data availability, data access performance, and fault tolerance of data grid systems. On the one hand, nodes in a data grid are highly dynamic and may join or leave the system at any time, so replication improves data availability and system fault tolerance. On the other hand, replicating frequently accessed data resources and providing multiple replicas reduces access latency, eliminates hotspot bottlenecks, achieves load balancing, and effectively improves grid system performance.
Replica creation strategies in data grids can be divided into dynamic and static strategies. Static strategies do not consider actual runtime conditions: replicas are created in advance on several designated nodes, and such strategies are seldom used. Dynamic strategies better satisfy the dynamic and large-scale requirements of data grid systems.
Dynamic replica creation can be further divided into centralized and distributed strategies. Napster and Gnutella are typical systems that use centralized replica management; because of the single point of failure and poor scalability, this approach is difficult to apply in large-scale distributed systems. The more mature distributed strategies are the six dynamic replication strategies proposed by Ranganathan and Foster on the basis of the Data Grid: No Replication, Best Client, Cascading Replication, Plain Caching, Caching plus Cascading Replication, and Fast Spread. These strategies mainly target the system characteristics of the Data Grid and create replicas through a hierarchical replication process; they do not consider the distribution of source data, network communication capacity, node storage capacity, or the cost of replica updates. Another class of dynamic strategies is based on economic models: these mainly use an economic model to judge whether creating a data replica will bring "profit" to the local system, and accordingly decide whether to create the replica locally.
In addition, existing data grid systems usually assume that the shared data resources are static and read-only; an update to a data resource, if any, appears as a new read-only resource. Consequently, most replication techniques create a replica on every node along the path of a resource query, or at both ends of the path, considering only the number of replicas and not differences in their placement.
For read-only data resources, more replicas yield better performance. In data grid systems with writable data resources, however, these methods bring unnecessary resource duplication and high replica-consistency maintenance costs. Consider, for example, a data grid system whose main application is sharing and producing multimedia audio and video. Supported by the grid system services, all nodes form a high-capacity virtual media storage system; each node provides storage for other nodes and can also download multimedia files and take part in producing and updating them, so that people in different geographic locations can cooperate on multimedia production tasks. In such a system, because data consistency must be maintained across replicas, simply pursuing more replicas greatly increases the cost of consistency maintenance.
Therefore, for data grid systems with read-write data resources, the location and number of replicas directly affect resource access performance. Replica distribution must be optimized from a system-wide perspective, determining an appropriate number of replicas and their placement, so as to balance the overhead of replica maintenance against the access performance gained from multiple replicas and to improve overall data access performance.
To better explain the invention, the network structure of the data grid system is described as follows:
(1) All nodes in the network are connected in a fully distributed manner and, according to differences in network bandwidth and processing power, are divided into super nodes and ordinary nodes. Super nodes are nodes with good performance, high bandwidth, and long online time; in addition to sharing their own resources and accessing other resources in the system, they handle the management functions that keep the system running normally. Under the management of super nodes, the network forms a hierarchical tree: the leaf nodes are ordinary nodes and all other nodes are super nodes. In particular, all nodes within a tree of the hierarchy form a domain, and the root of that tree is called the domain super node.
Fig. 1 shows an example network structure with super nodes. For ease of presentation, Fig. 1 shows only the management relationships between super nodes and ordinary nodes and omits the fully distributed connections between nodes. Each node set enclosed by dotted lines in Fig. 1 is a domain.
(2) shared resource of each super node ordinary node that local shared resources (LSR) and it are managed is born the responsibility of replica management, is responsible to define the copy Distribution Strategy of optimization.All data resource set notes of super node P management are LR, wherein LR={LR
0, LR
1..., LR
n, n is the number of the data resource be in charge of of super node P.
(3) Every node in the network maintains a local replica catalog that stores replica information about its own shared resources. A super node's local replica catalog contains, besides replica information about its local shared resources, the replica catalog information of all the nodes it manages. The replica management service is deployed on super nodes and records all information about replica accesses, including which node initiated each request, which replica was accessed, the resource response time, and the node's real-time network bandwidth. These data provide a sound basis for optimizing replica distribution.
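A super node's local replica catalog, as described in (3), could be modeled as a set of replica holders per resource plus a log of access records. The sketch below is a minimal illustration only: the class names `AccessRecord` and `ReplicaCatalog` and all field names are hypothetical, not structures defined by the patent. The usage lines register the four replicas of $LR_j$ shown in Fig. 3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRecord:
    """One replica access logged by the replica management service (hypothetical fields)."""
    resource: str         # e.g. "LR_j"
    requester: str        # node that initiated the request
    domain: str           # domain of the requesting node
    replica_node: str     # node holding the replica that was accessed
    response_time: float  # resource response time
    bandwidth: float      # requester's real-time network bandwidth


@dataclass
class ReplicaCatalog:
    """Local replica catalog kept by a super node (hypothetical structure)."""
    replicas: dict[str, set[str]] = field(default_factory=dict)  # resource -> nodes holding a replica
    accesses: list[AccessRecord] = field(default_factory=list)

    def add_replica(self, resource: str, node: str) -> None:
        self.replicas.setdefault(resource, set()).add(node)

    def log_access(self, record: AccessRecord) -> None:
        self.accesses.append(record)

    def accesses_for(self, resource: str) -> list[AccessRecord]:
        # All logged accesses to any replica of the given resource.
        return [r for r in self.accesses if r.resource == resource]


# Replicas of LR_j as placed in Fig. 3.
catalog = ReplicaCatalog()
for node in ("A1", "A12", "A2", "B11"):
    catalog.add_replica("LR_j", node)
```

Keeping the raw access records, rather than only running averages, lets the super node recompute per-domain statistics for any unit time interval.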
Summary of the invention
The objective of the invention is to solve the problems of high replica-consistency maintenance cost and poor access performance that arise when existing replication techniques are applied to data grids whose shared resources are read-write. To this end, it provides an optimized replica distribution method that computes the number and locations of replicas from nodes' access patterns for data resources, dynamically adapting to changes in data access requests and network conditions. The method effectively balances the number of replicas against replica maintenance cost; it applies to data grid systems with read-only resources and is particularly suited to data grid systems with read-write data resources.
The optimized replica distribution method in data grids of the present invention comprises the following steps, as shown in Fig. 2:
Step 1. Determine, across the entire grid system, all domains in which replicas need to be placed.
The rule is: if the time taken by nodes in a domain to access a data resource $LR_j$ ($0 \le j < n$) managed by super node P exceeds the system-wide average response time of $LR_j$, a replica must be created in that domain.
(1) Super node P computes the system-wide average response time of data resource $LR_j$. The average response time (ART) of $LR_j$ is defined as the ratio of the sum of the response times of all accesses to all replicas of $LR_j$ within the unit time interval to the number of those accesses:
$$ART_j = \frac{1}{Time_j} \sum_{k=1}^{Time_j} RT_k \qquad (1)$$
where $Time_j$ is the number of times all replicas of $LR_j$ are accessed within the unit time interval, and $RT_k$ is the response time of the k-th access to a replica of $LR_j$ within that interval.
(2) Using the access records for all replicas of $LR_j$ kept in its local replica catalog, super node P computes, for each domain $D_k$ ($k = 0, 1, \ldots, m$, where m is the number of domains in the system), the domain average response time of accesses to $LR_j$ by nodes in that domain:
$$ART_{j,k} = \frac{\sum_{P_i \in D_k} RT_{j,i}}{Time_{j,k}} \qquad (2)$$
where $RT_{j,i}$ is the response time of node $P_i$ in domain $D_k$ accessing $LR_j$ (the sum runs over every such access), and $Time_{j,k}$ is the number of times all nodes in $D_k$ access $LR_j$ within the unit time.
(3) If the average time for nodes in domain $D_k$ to access $LR_j$ (the domain average response time $ART_{j,k}$) is greater than the system-wide average response time of $LR_j$ ($ART_j$), then $D_k$ has too few replicas and a replica must be created in $D_k$.
Step 2. In each domain that needs replicas, determine the number of replicas to create.
The number of replicas to create is computed from the actual response times of all requests for $LR_j$ in domain $D_k$.
Notation: within the unit time, count denotes the number of replicas of $LR_j$ accessed by nodes in $D_k$; $\Delta count$ denotes the number of new replicas needed; $T_{Avg}$ denotes the mean access time of nodes in the domain to $LR_j$; and $T_{Low}$ denotes the minimum access time of nodes in the domain to $LR_j$.
The purpose of adding replicas is to reduce the mean access time $T_{Avg}$ to $LR_j$ within $D_k$ so that it comes as close as possible to the minimum access time $T_{Low}$ within $D_k$. The larger the gap between $T_{Low}$ and $T_{Avg}$, the more replicas must be created. The number of new replicas should therefore satisfy formula 3, from which the number of replicas to create, $\Delta count$, is derived as formula 4.
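Formulas 3 and 4 are not reproduced in this text, so the sketch below substitutes an assumed model purely for illustration; it is not the patent's formula. It supposes that spreading the domain's requests over count + Δcount replicas scales the mean access time to $T_{Avg} \cdot count / (count + \Delta count)$, and returns the smallest Δcount for which that estimate no longer exceeds $T_{Low}$. The function name `extra_replicas` and the example values are hypothetical.

```python
def extra_replicas(count: int, t_avg: float, t_low: float) -> int:
    """Hypothetical stand-in for formula 4 (NOT the patent's formula).

    Assumed model: after adding `delta` replicas, the domain's mean access time
    becomes t_avg * count / (count + delta). Return the smallest non-negative
    delta for which that estimate is <= t_low.
    """
    if count <= 0 or t_low <= 0:
        raise ValueError("count and t_low must be positive")
    delta = 0
    # Compare cross-multiplied values to avoid dividing inside the loop.
    while t_avg * count > t_low * (count + delta):
        delta += 1
    return delta


# Hypothetical values: 2 replicas accessed, mean 150, minimum 100.
print(extra_replicas(2, 150.0, 100.0))  # 1
print(extra_replicas(1, 250.0, 100.0))  # 2
```

Whatever the exact formula, it shares the property stated above: a wider gap between $T_{Avg}$ and $T_{Low}$ yields more new replicas, and no replicas are added when $T_{Avg}$ already equals $T_{Low}$.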
Step 3. In each domain that needs replicas, determine suitable locations for them.
In a wide-area network, limited bandwidth and node dynamism make network transmission delay the most important factor affecting data access performance; by comparison, the node's own CPU and I/O processing time is negligible. A replication factor (RF) is therefore defined to measure how suitable a node is for holding a replica of $LR_j$. The replication factor $RF_i$ of node $P_i$ is computed from $Memory_i$, the free memory of $P_i$; filesize, the space occupied by $LR_j$; and $AvgBW_i$, the average available bandwidth of $P_i$.
Because super node P does not hold information about all nodes within domain $D_k$, determining replica locations within $D_k$ is delegated to the domain super node of $D_k$. Node P sends a request message notifying the domain super node of $D_k$ to determine where to place the replicas.
After receiving the request message, the domain super node of $D_k$ does the following. Every node in $D_k$ that does not hold a replica of $LR_j$ is a candidate. The domain super node computes the replication factor of each candidate; the larger a node's replication factor, the more suitable that node is for holding the replica. It selects the $\Delta count$ candidates with the largest replication factors as replica creation sites. By the definition of the replication factor, placing replicas on these nodes effectively reduces the resource access time. The domain super node of $D_k$ then returns the addresses of these $\Delta count$ nodes to super node P.
Step 4. Create replicas at the designated locations.
Super node P creates replicas on the $\Delta count$ selected nodes, completing the optimized replica distribution.
Beneficial effects
The dynamic, self-adaptive replica distribution method provided by the invention computes the number of new replicas from the average response time of the resource and computes replica placement according to changes in access requests and network conditions, achieving an effective balance between reducing resource response time and reducing replica-consistency maintenance cost. Its rationale, advantages, and effects are as follows:
(1) Resource accesses within the unit time are tallied, and for each domain the average response time of accesses to the resource by nodes in that domain is computed. The domain average response time comprehensively reflects how poorly the domain's nodes are served by the data resource and how heavily they access it. Distributing replicas to domains with a high average response time can greatly reduce the cost of data access.
(2) Computing the number of new replicas from the average response time of accesses by nodes in the domain effectively controls the number of replicas and reduces the cost of consistency maintenance.
(3) The concept of a replication factor is introduced to compute the best locations for replicas from user access characteristics and nodes' real-time bandwidth data, achieving optimized replica distribution.
(4) The method applies to any data grid system with super nodes: not only systems with read-only data resources but especially systems with read-write data resources. It has a wide range of applications and remedies the inability of existing grid replication techniques to handle read-write data resources.
(5) It is simple to operate and does not depend on a central control node; the whole process is completed through simple coordination between super nodes.
(6) The method has a low running cost: most operations are completed locally on super nodes and consume little wide-area bandwidth.
Description of the drawings
Fig. 1 is a schematic diagram of the network structure of a data grid system with super nodes;
Fig. 2 is the operational flowchart of the replica distribution method;
Fig. 3 is a schematic diagram of the replica distribution of resource $LR_j$;
Fig. 4 is an example of accesses to resource $LR_j$ by nodes in the grid system within the unit time.
Embodiment
The invention will be further described below in conjunction with the drawings and specific embodiments.
Each super node is responsible for replica management of its own shared resources and of the shared resources of the ordinary nodes it manages, and defines the optimized replica distribution strategy for them. The super node periodically records access to all replicas of the data resources it manages, including which node in which domain sent each request, which replica was accessed, and the resource response time. These replica access records are stored in the super node's local replica catalog.
For example, the replica distribution of resource $LR_j$ is shown in Fig. 3, which depicts two domains whose domain super nodes are node A and node B. $LR_j$ has four replicas, placed on nodes A1, A12, A2, and B11. $LR_j$ is a shared resource of node A12; because A12 is an ordinary node, its parent node P is responsible for replica management of $LR_j$ and records all replica information of $LR_j$ in its local replica catalog, as shown by the dotted lines in Fig. 3.
Fig. 4 shows accesses to $LR_j$ by nodes in the grid system within the unit time. Each arrowed line segment represents an access request for $LR_j$: the start of the segment indicates the domain in which the request originates, the arrowhead points to the specific replica accessed, and the number beside the segment is the response time.
Suppose that a replica of data resource $LR_j$ managed by super node P needs to be created. Super node P then performs the following operations:
1. Determine, across the entire grid system, all domains in which replicas need to be placed.
From the response times of accesses to $LR_j$ within the unit time, node P computes by formula 1 the system-wide average response time of $LR_j$: $ART_j = 118$.
By formula 2, the average response time of accesses to $LR_j$ by nodes in domain A is $ART_{j,A} = 60$; the average for domain B is $ART_{j,B} = 140$; and the average for domain C is $ART_{j,C} = 200$.
Because $ART_{j,A} < ART_j$, $ART_{j,B} > ART_j$, and $ART_{j,C} > ART_j$, replicas of $LR_j$ must be created in domains B and C.
2. In each domain that needs replicas, determine the number of replicas to create.
The number of replicas to create in the example of Fig. 4 is computed by formula 4. Nodes in domain B accessed 2 replicas, and the number of replicas to create in domain B is 2. Nodes in domain C accessed 1 replica, and the number of replicas to create in domain C is 1.
3. In each domain that needs replicas, determine suitable locations for them.
Node P sends request messages notifying domain super node B of domain B and domain super node C of domain C to determine where to place the replicas. Because nodes B and C hold information about all nodes in their domains, each computes the replication factor (RF) of placing $LR_j$ on each of its candidate nodes. Node B selects the 2 nodes in its domain with the largest replication factors as replica creation sites, and node C selects the 1 node in its domain with the largest replication factor. Finally, nodes B and C return the addresses of the selected nodes to super node P.
4. Based on the node information returned by domain super nodes B and C, node P creates two new replicas on the designated nodes in domain B and one new replica on the designated node in domain C.
Claims (1)
1. An optimized replica distribution method in a data grid, characterized in that: in a data grid system with super nodes, super node P first monitors all the resources it manages in real time; when a replica needs to be created for resource $LR_j$, super node P tallies the accesses to $LR_j$ within the unit time interval and computes the system-wide average response time of $LR_j$ and the average response time of accesses to $LR_j$ by all nodes in each domain. Specifically, the formula
$$ART_j = \frac{1}{Time_j} \sum_{k=1}^{Time_j} RT_k$$
is used to compute the system-wide average response time $ART_j$ of $LR_j$, where $Time_j$ is the number of times all replicas of $LR_j$ are accessed within the unit time interval and $RT_k$ is the response time of the k-th access to a replica of $LR_j$ within that interval; the formula
$$ART_{j,k} = \frac{\sum_{P_i \in D_k} RT_{j,i}}{Time_{j,k}}$$
is used to compute the average response time $ART_{j,k}$ of accesses to $LR_j$ by nodes in domain $D_k$, where $RT_{j,i}$ is the response time of node $P_i$ in $D_k$ accessing $LR_j$ (the sum runs over every such access) and $Time_{j,k}$ is the number of times all nodes in $D_k$ access $LR_j$ within the unit time; if $ART_{j,k}$ for some domain $D_k$ is greater than $ART_j$, then $D_k$ has too few replicas of $LR_j$ and a replica must be created in $D_k$; here $0 \le j < n$, where n is the number of resources in the whole data grid system; $k = 0, 1, \ldots, m$, where m is the number of domains in the data grid system; and $i = 0, 1, \ldots, t$, where t is the number of nodes in domain $D_k$;
Second, in each domain $D_k$ in which replicas must be created, let count denote the number of replicas of $LR_j$ accessed by nodes in $D_k$ within the unit time; the number of new replicas $\Delta count$ required in $D_k$ is then computed from count, $T_{Avg}$, and $T_{Low}$, where $T_{Avg}$ is the mean access time of nodes in $D_k$ to $LR_j$ and $T_{Low}$ is the minimum access time of nodes in $D_k$ to $LR_j$;
Then, the domain super node of each domain $D_k$ in which replicas must be created determines suitable replica locations: every node in $D_k$ that does not hold a replica of $LR_j$ is a candidate node, and the domain super node computes the replication factor $RF_i$ of each candidate node $P_i$ from $Memory_i$, the free memory of $P_i$; filesize, the space occupied by $LR_j$; and $AvgBW_i$, the average available bandwidth of $P_i$. The larger a node's replication factor, the more suitable that node is for holding the replica; the $\Delta count$ nodes with the largest replication factors are selected as the optimal nodes for placing replicas;
Finally, replicas are created on the selected optimal nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102654216A CN101751309B (en) | 2009-12-28 | 2009-12-28 | Optimized transcript distributing method in data grid |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101751309A CN101751309A (en) | 2010-06-23 |
CN101751309B true CN101751309B (en) | 2011-06-29 |
Family ID: 42478319
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102654216A (CN101751309B, Expired - Fee Related) | 2009-12-28 | 2009-12-28 | Optimized transcript distributing method in data grid |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101751309B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102137141B (en) | 2010-10-11 | 2014-01-01 | 华为技术有限公司 | Data storage control method and data storage control device |
CN102156730B (en) * | 2011-04-07 | 2013-03-20 | 江苏省电力公司 | File storage dynamic aggregation based optimization method |
CN102375893A (en) * | 2011-11-17 | 2012-03-14 | 浪潮(北京)电子信息产业有限公司 | Distributed file system and method for establishing duplicate copy |
CN102497394B (en) * | 2011-11-28 | 2014-01-15 | 中国科学院研究生院 | Duplicate file placement method in content distribution network based on optimized model |
CN102801772B (en) * | 2012-03-07 | 2015-05-27 | 武汉理工大学 | DCell network-oriented energy-saving copy placement method for cloud computing environment |
CN102984280B (en) * | 2012-12-18 | 2015-05-20 | 北京工业大学 | Data backup system and method for social cloud storage network application |
CN103095812B (en) * | 2012-12-29 | 2016-04-13 | 华中科技大学 | A kind of copy creating method based on user's request response time |
CN103491128B (en) * | 2013-06-13 | 2016-08-10 | 中国科学院大学 | The optimization laying method of popular Resource Replica in a kind of peer-to-peer network |
CN107465706B (en) * | 2016-06-06 | 2021-06-18 | 中国船舶工业系统工程研究院 | Distributed data object storage device based on wireless communication network |
CN108319618B (en) * | 2017-01-17 | 2022-05-06 | 阿里巴巴集团控股有限公司 | Data distribution control method, system and device of distributed storage system |
CN106911777A (en) * | 2017-02-24 | 2017-06-30 | 郑州云海信息技术有限公司 | A kind of data processing method and server |
CN108924203B (en) * | 2018-06-25 | 2021-07-27 | 深圳市金蝶天燕云计算股份有限公司 | Data copy self-adaptive distribution method, distributed computing system and related equipment |
CN114095573B (en) * | 2021-10-19 | 2023-11-28 | 陕西悟空云信息技术有限公司 | Content copy placement method of CDN-P2P network based on edge cache |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101022396A (en) * | 2007-03-15 | 2007-08-22 | 上海交通大学 | Grid data duplicate management system |
CN101187931A (en) * | 2007-12-12 | 2008-05-28 | 浙江大学 | Distribution type file system multi-file copy management method |
CN101340458A (en) * | 2008-07-09 | 2009-01-07 | 南京邮电大学 | Grid data copy generation method based on time and space limitation |
Also Published As
Publication number | Publication date |
---|---|
CN101751309A (en) | 2010-06-23 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C17 | Cessation of patent right | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2011-06-29; termination date: 2012-12-28 |