CN107124306A

CN107124306A - Content delivery network server optimization dispositions method under network function virtualized environment

Info

Publication number: CN107124306A
Application number: CN201710270020.4A
Authority: CN
Inventors: 孙罡; 杨广华; 廖丹; 虞红芳; 孙健
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2017-09-01
Anticipated expiration: 2037-04-24
Also published as: CN107124306B

Abstract

Aiming at the defects of the existing CDN copy placement scheme, the present invention proposes a content delivery network server optimization deployment method under the network function virtualization environment. The invention utilizes the spectral clustering model, taking into account the data and topological characteristics of the physical network. The adjacency matrix is established by using the topological relationship of the underlying network and the traffic demand of the nodes, and the physical network nodes are divided into several sets by using the spectral clustering algorithm. Then traverse the nodes in each set, and calculate the cost required for it to serve as the replica center node to provide content services for the rest of the set. Select the node with the lowest cost and use it as the placement node of the replica server to reduce the cost.

Description

Optimal deployment method of content delivery network server in network function virtualization environment

技术领域technical field

本发明属于网络技术领域，具体涉及网络功能虚拟化环境下的内容交付网络服务器优化部署方法。The invention belongs to the field of network technology, and in particular relates to a content delivery network server optimization deployment method under a network function virtualization environment.

背景技术Background technique

随着网络技术的发展，尤其是移动互联网技术的快速发展，使得网络内容从文字图片内容的传播逐渐变为视频形式传播。这使得网络流量呈现出爆炸式增长，增大了网络发生拥堵的可能。网络的拥堵导致用户的体验下降，从而使得内容提供商损失了大量的客户，盈利减少。With the development of network technology, especially the rapid development of mobile Internet technology, the dissemination of network content has gradually changed from text and image content to video dissemination. This results in explosive growth of network traffic and increases the possibility of network congestion. Network congestion leads to a decline in user experience, which causes content providers to lose a large number of customers and reduce profits.

而CDN(内容交付网络)的发展，使得内容副本服务器可以部署在离用户较近的边缘网络上，使得用户可以在较短的时间内获取网络内容，增加了用户体验。如图一(a)所示为传统网络，主要分为三个部件：服务器、通信链路和终端用户。在传统的网络中，终端用户请求服务器内容或服务器为用户提供服务，要经过的通信链路较长。当终端用户较多的时候，网络上的流量较大，容易造成拥塞，同时只有一个服务器为用户服务，也很有可能造成服务器负载过大导致宕机。如图一(b)所示在CDN网络中，主要的部件有四个：服务器、通信链路、副本服务器和终端用户。其中服务器是提供内容或者内容更新删除的源服务器，所有的网络内容都从此服务器流入网络。而直接与此服务器相连的是副本服务器，且副本服务器的位置在离终端用户较为接近。而用户所需的内容直接由较近的副本服务器来提供服务，无需跨过大量的网络设备到达原始服务器索取内容。因此可以提高服务质量，减小用户的等待延迟。而CDN技术中一个关键问题就是CDN内容副本服务器的放置问题，现有的副本放置算法都是基于专用的CDN物理服务器来进行部署。而专用的服务器需要耗费大量的人力来部署。同时设备的维护以及更新需要大量的人力物力。老旧设备如果处理不好，也会造成环境污染。同时现有的CDN副本服务器放置算法有明显的缺陷，不能兼顾底层网络的拓扑以用户的请求流量数据来部署。因此及无法有效的降低核心网络流量，提高用户体验。With the development of CDN (Content Delivery Network), the content replica server can be deployed on the edge network closer to the user, so that the user can obtain the network content in a relatively short period of time, which increases the user experience. As shown in Figure 1(a), the traditional network is mainly divided into three components: server, communication link and end user. In a traditional network, end users request server content or servers provide services to users, and the communication link to go through is long. When there are many end users, the traffic on the network is large, which is likely to cause congestion. At the same time, there is only one server serving users, which may also cause excessive load on the server and cause downtime. As shown in Figure 1(b), in the CDN network, there are four main components: server, communication link, replica server and end user. The server is a source server that provides content or updates and deletes content, and all network content flows into the network from this server. The replica server is directly connected to this server, and the location of the replica server is relatively close to the end user. The content required by the user is directly served by the closer copy server, without going across a large number of network devices to the original server to obtain the content. Therefore, the service quality can be improved and the user's waiting delay can be reduced. A key issue in CDN technology is the placement of CDN content replica servers. Existing replica placement algorithms are deployed based on dedicated CDN physical servers. Dedicated servers require a lot of manpower to deploy. At the same time, the maintenance and updating of equipment requires a lot of manpower and material resources. If the old equipment is not handled well, it will also cause environmental pollution. At the same time, the existing CDN replica server placement algorithm has obvious defects, and cannot take into account the topology of the underlying network and deploy it based on user request traffic data. Therefore, it is impossible to effectively reduce core network traffic and improve user experience.

在现有的CDN副本放置算法的研究中，一种是将CDN副本服务器的放置问题抽象成工厂选址问题。然后利用随机算法将其进行放置。具体步骤为：首先确定承建CDN网络的物理拓扑结构，确定各个节点的流量需求以及不同节点间的带宽；然后在所有物理节点中随机选取适当的节点来放置副本服务器，计算各个节点到最优(距离最近或负载均衡等度量)的副本中心的成本总和。重复上述步骤10次，取成本总和最小的那次作为最终的解决方案。理论依据证明，在重复10次左右，随机算法的结果较为稳定，效果较好。但是依然存在以下不足：(1)随机算法具有较大的波动性，所产生的解决方案具有很大的局限性。(2)随机算法对物理网络的拓扑结构不敏感，不能根据不同的网络拓扑结构进行调整，算法的拓扑敏感性较差。(3)随机算法对物理网络的节点流量需求，以及带宽资源不敏感，不能根据实际的底层网络需求进行调整，算法的数据敏感性较差。In the existing research on CDN replica placement algorithm, one is to abstract the placement problem of CDN replica server into the problem of factory location selection. It is then placed using a random algorithm. The specific steps are: first determine the physical topology of the CDN network, determine the traffic requirements of each node and the bandwidth between different nodes; then randomly select appropriate nodes from all physical nodes to place the replica server, and calculate each node to the optimal ( The sum of the costs of the replica centers for metrics such as closest distance or load balancing. Repeat the above steps 10 times, and take the one with the smallest sum of costs as the final solution. Theoretical evidence proves that the results of the random algorithm are more stable and the effect is better when it is repeated about 10 times. But there are still the following deficiencies: (1) The random algorithm has a large volatility, and the resulting solutions have great limitations. (2) The random algorithm is not sensitive to the topological structure of the physical network, and cannot be adjusted according to different network topological structures, and the topology sensitivity of the algorithm is poor. (3) The random algorithm is not sensitive to the node traffic requirements of the physical network and bandwidth resources, and cannot be adjusted according to the actual underlying network requirements, and the data sensitivity of the algorithm is poor.

另一种对CDN副本放置的方式是：基于贪心算法的CDN副本放置方案，利用迭代的思想降低了放置算法的复杂度。算法的具体过程如下：遍历所有物理节点，计算每个物理节点作为副本中心节点为其他节点服务所消耗的成本总和。选取成本总和最小的节点作为副本中心节点。迭代上述过程，直到所有的节点均放置完成。该放置方案的缺点体现在：(1)贪心算法很容易陷入局部最优解。由于先放置的副本中心对后放置的副本中心具有影响，所以很难得到全局性的最优。(2)贪心算法对底层网络的数据是敏感的，但对底层网络的拓扑敏感性较低。Another way to place CDN copies is: a greedy algorithm-based CDN copy placement scheme, which reduces the complexity of the placement algorithm by using the idea of iteration. The specific process of the algorithm is as follows: traverse all physical nodes, and calculate the sum of the costs consumed by each physical node as a copy center node serving other nodes. The node with the smallest sum of costs is selected as the replica center node. Iterate the above process until all nodes are placed. The disadvantages of this placement scheme are as follows: (1) The greedy algorithm is easy to fall into the local optimal solution. Since the replica center placed first has an influence on the replica center placed later, it is difficult to obtain a global optimum. (2) The greedy algorithm is sensitive to the data of the underlying network, but is less sensitive to the topology of the underlying network.

发明内容Contents of the invention

本发明的发明目的在于：针对现有CDN副本放置方案的缺陷，提出了一种NFV(网络功能虚拟化，Network Function Virtualization)环境下的CDN网络副本服务器放置方法。本发明利用谱聚类模型，兼顾了物理网络的数据和拓扑特性。利用底层网络的拓扑关系以及节点的流量需求建立邻接矩阵，利用谱聚类算法将物理网络节点划分成几个集合。然后遍历每个集合中的节点，计算其作为副本中心节点来为集合中其余节点提供内容服务所需成本。选取成本最小的节点，使其作为副本服务器的放置节点，以此来降低成本。The object of the present invention is to propose a CDN network replica server placement method under the NFV (Network Function Virtualization) environment for the defects of the existing CDN replica placement scheme. The invention utilizes the spectral clustering model, taking into account the data and topological characteristics of the physical network. The adjacency matrix is established by using the topological relationship of the underlying network and the traffic demand of the nodes, and the physical network nodes are divided into several sets by using the spectral clustering algorithm. Then traverse the nodes in each set, and calculate the cost required for it to serve as the replica center node to provide content services for the rest of the set. Select the node with the lowest cost and use it as the placement node of the replica server to reduce the cost.

本发明的网络功能虚拟化环境下的内容交付网络服务器优化部署方法，包括下列步骤：步骤1：基于承建内容交付网络的物理拓扑结构，将物理节点中的服务器作为虚拟节点，确定待部署的虚拟网络拓扑结构；The content delivery network server optimization deployment method in the network function virtualization environment of the present invention includes the following steps: Step 1: Based on the physical topology structure of the content delivery network, the servers in the physical nodes are used as virtual nodes, and the virtual nodes to be deployed are determined. Network Topology;

步骤2：根据虚拟网络拓扑结构和虚拟节点的节点流量需求计算相似度矩阵W：Step 2: Calculate the similarity matrix W according to the virtual network topology and node traffic requirements of virtual nodes:

计算任意两个虚拟节点之间的相似度w_ij：若两个虚拟节点之间有链路连接(即虚拟节点之间存直连关系)，则相似度w_ij＝(|d_i-d_j|)^2.5,i≠j；否则，相似度w_ij＝0；其中i、j为虚拟节点标识符，d_i、d_j表示不同虚拟节点的节点流量需求；Calculate the similarity w _ij between any two virtual nodes: if there is a link between the two virtual nodes (that is, there is a direct connection between the virtual nodes), then the similarity w _ij =(|d _i -d _j |) ^2.5 , i≠j; otherwise, the similarity w _ij =0; where i, j are virtual node identifiers, d _i , d _j represent the node flow requirements of different virtual nodes;

由相似度w_ij得到相似度矩阵n表示虚拟节点数目；Get the similarity matrix from the similarity w _ij n represents the number of virtual nodes;

步骤3：根据公式D＝dia(dr₁,dr₂,...,dr_n)得到度数矩阵D，由L＝D-W得到拉普拉斯矩阵L，其中dia为对角符号，对角元 Step 3: According to the formula D=dia(dr ₁ ,dr ₂ ,...,dr _n ), the degree matrix D is obtained, and the Laplace matrix L is obtained by L=DW, wherein dia is a diagonal symbol, and the diagonal elements

并对拉普拉斯矩阵L进行归一化处理，得到归一化的拉普拉斯矩阵L_sym：L_ysm＝D^-1/ ²LD^-1/2；And normalize the Laplacian matrix L to obtain a normalized Laplacian matrix L _sym : _Lysm = D ^-1/ ² LD ^-1/2 ;

步骤4：计算L_ysm的n个特征值及特征向量，将前k个最小特征值对应的特征向量u₁,...,u_k按列排列形成矩阵U_n×k＝(u₁,...,u_k)，其中k表示待部署的副本服务器数目；Step 4: Calculate the n eigenvalues and eigenvectors of L _ysm , arrange the eigenvectors u ₁ ,...,u _k corresponding to the first k smallest eigenvalues in columns to form a matrix U _n×k =(u ₁ ,. ..,u _k ), where k represents the number of replica servers to be deployed;

对矩阵U_n×k按行归一化得到矩阵T＝(t_ij)_n×k,其中矩阵元素 Normalize the matrix U _n×k by row to get the matrix T=(t _ij ) _n×k , where the matrix elements

对矩阵T按行取得向量y₁,y₂,…,y_n，其中向量y_i的下标i对应矩阵T的第i行，且i＝1,…,n；Obtain the vectors y ₁ , y ₂ ,...,y _n for the matrix T by row, where the subscript i of the vector y _i corresponds to the i-th row of the matrix T, and i=1,...,n;

步骤5：对y₁,...,y_n进行k均值聚类处理，得到k个聚类结果为C₁,C₂,...,C_k(其中每个集合中的元素均是代表y向量的下标值)，由每个聚类结果包含的虚拟节点标识符得到k个节点聚类集合A_m,m＝1,…,k，即A_m＝{i:i∈C_i},m＝1,…,k；Step 5: Carry out k-means clustering processing on y ₁ ,...,y _n , and obtain k clustering results as C ₁ ,C ₂ ,...,C _k (the elements in each set represent The subscript value of the y vector), from the virtual node identifier contained in each clustering result to obtain k node clustering sets A _m , m=1,...,k, that is, A _m ={i:i∈C _i } ,m=1,...,k;

如：聚类结果中向量y₁,y₄,y₇∈C₁，则聚类集合A₁由虚拟节点v₁,v₄,v₇组成。For example: the vectors y ₁ , y ₄ , y ₇ ∈ C ₁ in the clustering result, then the clustering set A ₁ is composed of virtual nodes v ₁ , v ₄ , v ₇ .

步骤6：对于每个节点聚类集合A_m，遍历A_m中的每个虚拟节点v_r，计算虚拟节点v_r作为中心节点的成本总和其中c_ir表示从节点v_i到节点v_r的链路长度，即节点v_i到节点v_r的跳数，p为单位带宽成本；选取成本总和cost最小的虚拟节点作为每个A_m的副本中心节点。Step 6: For each node clustering set A _m , traverse each virtual node v _r in A _m and calculate the sum of the costs of the virtual node v _r as the central node where c _ir represents the link length from node v _i to node v _r , that is, the number of hops from node v _i to node v _r , and p is the unit bandwidth cost; select the virtual node with the smallest total cost as the copy of each A _m central node.

由于采用了上述技术方案，本发明的有益效果是：Owing to adopted above-mentioned technical scheme, the beneficial effect of the present invention is:

(1)降低部署成本。本发明充分考虑了网络的拓扑和数据信息。可以根据网络的拓扑和数据信息有效的选择合理的节点来作为副本中心节点来兼顾拓扑和流量需求信息从而降低成本。(1) Reduce deployment costs. The invention fully considers the topology and data information of the network. According to the topology and data information of the network, a reasonable node can be effectively selected as the replica center node to take into account the topology and traffic demand information to reduce costs.

(2)降低服务延迟。本发明利用谱聚类算法来将网络拓扑进行分割，更能充分的考虑到网络的延迟特性，因此可以将副本中心的选取分散到离用户更近的节点以降低网络延迟。(2) Reduce service delay. The present invention uses a spectrum clustering algorithm to divide the network topology, which can fully consider the delay characteristics of the network, so the selection of the copy center can be distributed to nodes closer to the user to reduce the network delay.

(3)降低核心网络流量。由于本发明更能有效的分散副本中心选取的位置到离用户更近的节点上，所以CDN网络和用户间的流量需求只在副本中心到用户之间来回传递，减小了核心网络的流量，从而特高了网络的健壮性减低了拥堵发生的可能。(3) Reduce core network traffic. Since the present invention can more effectively disperse the location selected by the replica center to a node closer to the user, the traffic demand between the CDN network and the user is only transmitted back and forth between the replica center and the user, reducing the traffic of the core network. Thereby, the robustness of the network is extremely high and the possibility of congestion is reduced.

附图说明Description of drawings

图1是传统网络、CDN网络模型图。Figure 1 is a model diagram of a traditional network and a CDN network.

图2是本发明的图例网络的虚拟化过程示意图。Fig. 2 is a schematic diagram of the virtualization process of the example network of the present invention.

图3是本发明的聚类处理流程示意图。Fig. 3 is a schematic diagram of the clustering processing flow of the present invention.

具体实施方式detailed description

为使本发明的目的、技术方案和优点更加清楚，下面结合实施方式和附图，对本发明作进一步地详细描述。In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the implementation methods and accompanying drawings.

本发明的网络功能虚拟化环境下的内容交付网络服务器优化部署方法的实施包括NFV环境下的CDN网络模型化、谱聚类算法划分子图和副本中心选取，具体为：The implementation of the content delivery network server optimization deployment method in the network function virtualization environment of the present invention includes CDN network modeling in the NFV environment, spectral clustering algorithm division of sub-graphs and replica center selection, specifically:

(1)NFV环境下的CDN网络模型(1) CDN network model in NFV environment

图2(a)所示为实际的物理网络拓扑，包含通用的服务器、大型数据中心和云数据中心等计算节点以及路由器交换机等功能节点。在网络虚拟化(NV)环境中，物理网络的一切节点和带宽都可以虚拟化为计算资源和带宽资源，如图2(b)所示。而NFV技术可以在不同的计算节点部署相同的网络功能软件或者在同一计算节点上部署不同的网络功能软件。在本专利中，我们主要是在不同的计算节点上部署相同的网络功能。因此原问题在底层的物理网络上部署副本服务器转变为在抽象的上部署副本服务器。这样做可以屏蔽掉底层的不同网络实体，而专注于副本服务器的选取。Figure 2(a) shows the actual physical network topology, including computing nodes such as general servers, large data centers and cloud data centers, and functional nodes such as routers and switches. In the network virtualization (NV) environment, all nodes and bandwidth of the physical network can be virtualized into computing resources and bandwidth resources, as shown in Figure 2(b). The NFV technology can deploy the same network function software on different computing nodes or deploy different network function software on the same computing node. In this patent, we mainly deploy the same network function on different computing nodes. Therefore, the original problem of deploying a replica server on the underlying physical network is transformed into deploying a replica server on an abstract network. Doing so can shield different underlying network entities and focus on the selection of replica servers.

(2)谱聚类算法划分子图(2) Spectral clustering algorithm to divide subgraphs

令G＝(V,E)表示图1(b)中的网络拓扑图。其中V表示节点聚类集合，E表示链路结合。d₁,d₂,...,d_n表示节点聚类集合中各个节点的流量需求，下标n表示总的节点个数。如果节点和节点之间有链路连接，利用公式(1)计算的相似度，否则相似度均设为0。Let G = (V, E) represent the network topology in Figure 1(b). Among them, V represents the clustering set of nodes, and E represents the combination of links. d ₁ ,d ₂ ,...,d _n represent the traffic demand of each node in the node clustering set, and the subscript n represents the total number of nodes. If there is a link connection between nodes, use the similarity calculated by formula (1), otherwise the similarity is set to 0.

w_ij＝(|d_i-d_j|)^2.5,(i≠j) (1)w _ij ＝(|d _i -d _j |) ^2.5 ,(i≠j) (1)

利用公式(2)求的相似度的邻接矩阵The adjacency matrix of similarity calculated by formula (2)

矩阵D＝dia(dr₁,dr₂,...,dr_n)为度数矩阵，其中dr_i(i＝1,…,n)由公式(3)求出Matrix D=dia(dr ₁ ,dr ₂ ,...,dr _n ) is a degree matrix, where dr _i (i=1,...,n) is obtained by formula (3)

dia为对角符号，表示对角矩阵，其余矩阵项为零。dia is the diagonal symbol, which means a diagonal matrix, and the remaining matrix entries are zero.

利用公式(4)求得拉普拉斯矩阵L：Use the formula (4) to obtain the Laplacian matrix L:

L＝D-W (4)L=D-W (4)

利用公式(5)求得拉普拉斯矩阵L_sym Use formula (5) to find the Laplacian matrix L _sym

L_ysm＝D^-1/2LD^-1/2 (5)L _ysm = D ^-1/2 LD ^-1/2 (5)

利用matlab求得L_ysm的所有特征值λ₁,λ₂,...,λ_n其中λ₁≤λ₂≤...λ_n及其对应的特征向量u₁,u₂,...,u_n(特征向量为列向量u_i＝(u_1,i,u_2,i,...,u_n,i)^T)。取特征值λ₁,...,λ_k(k为要放置的副本服务器个数，此值应小于n)对应的特征向量u₁,...,u_k。将此k个特征向量按列排列形成矩阵U_n×k＝(u₁,...,u_k)。求矩阵U按行所得归一化的矩阵T＝(t_ij)_n×k,其中矩阵元素t_ij按照公式(6)求得Use matlab to obtain all eigenvalues λ ₁ , λ ₂ ,...,λ _n of L _ysm where λ ₁ ≤λ ₂ ≤...λ _n and their corresponding eigenvectors u ₁ ,u ₂ ,..., u _n (the feature vector is the column vector u _i =(u _1,i ,u _2,i ,...,u _n,i ) ^T ). Take the eigenvectors u ₁ ,...,u _k corresponding to the eigenvalues λ ₁ ,...,λ _k (k is the number of replica servers to be placed, this value should be less than n). The k eigenvectors are arranged in columns to form a matrix U _n×k =(u ₁ ,...,u _k ). Calculate matrix U and get normalized matrix T=(t _ij ) _n×k by row, where matrix element t _ij is obtained according to formula (6)

对矩阵T按行取得向量y₁,y₂,...,y_i,...,y_n，其中向量下标i对应矩阵T的第i行。利用k-mean算法对y₁,...,y_n进行聚类。Obtain the vectors y ₁ , y ₂ ,...,y _i ,...,y _n of the matrix T row by row, where the vector subscript i corresponds to the i-th row of the matrix T. Use the k-mean algorithm to cluster y ₁ ,...,y _n .

其中k-mean算法的步骤如下在n个向量中，首先随机选取k个向量作为中心。The steps of the k-mean algorithm are as follows. Among the n vectors, first randomly select k vectors as the centers.

②遍历其他非中心向量，按照公式(7)计算非中心向量到k个中心向量的距离②Traverse other non-central vectors, and calculate the distance from non-central vectors to k central vectors according to formula (7)

③对每个非中心向量，将此非中心向量与距离最小的中心向量划为一类。得到聚类结果C₁',...,C'_k，其中每个集合中的元素均是代表y向量的下标值。③ For each non-central vector, classify this non-central vector and the central vector with the smallest distance into one category. The clustering results C ₁ ',...,C' _k are obtained, where the elements in each set are subscript values representing the y vector.

对每个聚类结果，重新计算出一个中心向量，此向量到聚类结果中其他节点的距离之和最小。因此得到了k个新的中心向量。For each clustering result, a center vector is recalculated, and the sum of distances from this vector to other nodes in the clustering result is the smallest. Thus k new center vectors are obtained.

⑤重复步骤②-，直到新更新的k个中心与更新前的中心距离差小于。⑤Repeat step ②- until the distance difference between the newly updated k centers and the center before updating is less than .

⑥得到k-means聚类法对向量y₁,...,y_n的聚类结果为C₁,C₂,...,C_k(其中每个集合中的元素均是代表y向量的下标值)⑥ Obtain the k-means clustering method for the clustering results of vectors y ₁ ,...,y _n as C ₁ ,C ₂ ,...,C _k (the elements in each set represent the y vector subscript value)

通过k-means算法后，我们得到了聚类结果C₁,...,C_k，其中每个集合中的元素也与网络拓扑G＝(V,E)中节点的下标相对应。利用公式(8)求得网络拓扑的聚类集合，即节点聚类集合A₁,A₂,...,A_k：After the k-means algorithm, we get the clustering results C ₁ ,...,C _k , where the elements in each set also correspond to the subscripts of the nodes in the network topology G=(V,E). Use the formula (8) to obtain the clustering set of the network topology, that is, the node clustering set A ₁ , A ₂ ,...,A _k :

A_m＝{i:i∈C_i},m＝1,…,k (8)A _m ={i:i∈C _i }, m=1,...,k (8)

到此为止，图的聚类结束，我们得到了不同的聚类集合，及A₁,...,A_k。接下来要在每个聚类集合中选取一个合适的节点作为副本中心节点来为所有其他需求节点服务。So far, the graph clustering is over, and we get different cluster sets, and A ₁ ,...,A _k . Next, select a suitable node in each cluster set as the replica center node to serve all other demand nodes.

(3)副本中心选取(3) copy center selection

对于每个聚类集合，遍历聚类集合的每个节点，计算其为为集合中其他节点服务的成本总和cost，具体计算见公式(9)：For each clustering set, traverse each node of the clustering set, and calculate it as the sum cost of serving other nodes in the set, see formula (9) for the specific calculation:

d_i表示聚类集合A_m中节点v_i的流量需求，v_r表示，c_ir表示从节点v_i到节点v_j的链路长度(用跳数表示)，p为单位带宽成本。d _i represents the traffic demand of node v _i in clustering set _Am , v _r represents, c _ir represents the link length from node v _i to node v _j (indicated by the number of hops), and p is the unit bandwidth cost.

选取成本总和cost最小的节点作为每个聚类集合的副本中心节点。Select the node with the smallest cost sum cost as the replica center node of each clustering set.

综上，本发明将虚拟CDN网络副本服务器的放置问题与聚类问题相结合更能有效的解决虚拟CDN网络的放置。To sum up, the present invention combines the placement problem of virtual CDN network copy servers with the clustering problem to more effectively solve the virtual CDN network placement problem.

Claims

1. the content delivery network server optimization dispositions method under network function virtualized environment, it is characterised in that including under Row step：Step 1：Based on the physical topological structure for undertaking the construction of content delivery network, using the server in physical node as virtual Node, it is determined that virtual network topology to be disposed；

Step 2：Similarity matrix W is calculated according to the node flow demand of virtual network topology and dummy node：

Calculate the similarity w between any two dummy node_ij：If having link connection, similarity between two dummy nodes w_ij=(| d_i-d_j|)^2.5,i≠j；Otherwise, similarity w_ij=0；Wherein i, j are virtual node identifiers, d_i、d_jRepresent different void Intend the node flow demand of node；

By similarity w_ijObtain similarity matrixN represents dummy node number；

Step 3：According to formula D=dia (dr₁,dr₂,...,dr_n) degree matrix D is obtained, Laplce's square is obtained by L=D-W Battle array L, wherein dia are diagonal symbol, diagonal element

And Laplacian Matrix L is normalized, obtain normalized Laplacian Matrix L_sym：L_ysm=D^-1/2LD^-1/2；

Step 4：Calculate L_ysmN characteristic value and characteristic vector, by the corresponding characteristic vector u of preceding k minimal eigenvalue₁,..., u_kBy row arrangement form matrix U_n×k=(u₁,...,u_k), wherein k represents replica server number to be disposed；

To matrix U_n×kMatrix T=(t are obtained by row normalization_ij)_n×k, wherein matrix element

Vector y is obtained by row to matrix T₁,y₂,…,y_n, wherein vector y_iSubscript i homographies T the i-th row, and i=1 ..., n；

Step 5：To y₁,...,y_nK mean cluster processing is carried out, k cluster result is obtained for C₁,C₂,...,C_k, by each cluster As a result the virtual node identifiers included obtain k node clustering set A_m, m=1 ..., k；

Step 6：For each node clustering set A_m, travel through A_mIn each dummy node v_r, calculate dummy node v_rIn The cost summation of heart nodeWherein c_irRepresent from node v_iTo node v_rLinkage length, that is, save Point v_iTo node v_rHop count, p be unit bandwidth cost；

Choose the minimum dummy nodes of cost summation cost and be used as each node clustering set A_mCopy Centroid.

2. the method as described in claim 1, it is characterised in that in step 5, to y₁,...,y_nCarry out k mean clusters processing tool Body is：

1. in n vector y₁,...,y_nIn, k vector is randomly selected as center vector；

2. in vectorial y₁,...,y_nIn, non-central vector is calculated to the distance of k center vector

3. to each non-central vector, divide its center vector minimum with distance into a class, obtain cluster result C₁',..., C'_k, wherein the element in each cluster result represents y_iSubscript value；

4. to each cluster result C'_m, recalculate a center vector, the center vector to cluster result C'_mIn it is non-in Heart vector it is minimum apart from sum, obtain k new center vectors；

5. repeat step 2. -4., until k center vector newly updating with the range difference of center vector before updating is less than threshold epsilon.

3. method as claimed in claim 2, it is characterised in that the value of threshold epsilon is ε≤0.001.