CN116841718A - Physical resource scheduling method and scheduler based on Kubernetes
- Publication number
- CN116841718A (application number CN202210286835.2A)
- Authority
- CN
- China
- Prior art keywords
- pod
- training
- current
- information
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of cloud computing, and in particular to a Kubernetes-based physical resource scheduling method and scheduler.
Background Art
Kubernetes is a distributed system engine for managing containerized applications across multiple hosts in a cloud platform. It provides functions such as physical resource scheduling, automated deployment and operation, service discovery, elastic scaling, and high availability.
Physical resource scheduling is implemented by the Kubernetes scheduler, whose core task is to select a suitable node (Node) from the cluster for each work unit (Pod). At present, the Kubernetes scheduler usually goes through three stages when scheduling physical resources: Node pre-selection, Node prioritization, and Node selection.
The Node pre-selection stage mainly checks a Node's hardware resources against the Requests configuration of the Pod, which is also called static resource checking. The Requests configuration used by this check is an up-front estimate of CPU, memory, and other resources, set when objects such as Deployment and StatefulSet are configured. Because programs behave differently at run time, the Requests configuration can diverge from actual usage: judged by the Requests configuration alone, a Node may appear to have plenty of physical resources available, while in fact the Pods running on it already occupy a large share of those resources and what remains is scarce.
Consequently, when a large number of Pods need to be scheduled in Kubernetes, the Requests-based pre-selection easily causes such seemingly idle Nodes to be judged the preferred result. Many Pods are then scheduled onto that Node, and its physical resources become abnormal. In severe cases the Node may even go offline abnormally, which harms the operational safety of Kubernetes as a whole and reduces the stability of the cluster.
Summary of the Invention
The present invention provides a Kubernetes-based physical resource scheduling method and scheduler to overcome the defects of the prior art.
The present invention provides a Kubernetes-based physical resource scheduling method, comprising:
obtaining scheduling information in a Kubernetes cluster, the scheduling information including Pod information of a Pod to be scheduled and load information of each node;
traversing the nodes, inputting the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtaining evaluation information, output by the scheduling evaluation model, for the case in which the Pod to be scheduled is scheduled to the current node;
scheduling the Pod to be scheduled based on the evaluation information corresponding to each node;
wherein the scheduling evaluation model is trained on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes Pod information of a Pod sample and load information of the node sample to which the Pod sample was scheduled.
According to the Kubernetes-based physical resource scheduling method provided by the present invention, the evaluation information label corresponding to each training sample is determined as follows:
performing cluster analysis on the training samples with a density-based clustering algorithm to obtain a plurality of clusters;
labeling the training samples contained in the plurality of clusters collectively, based on the sample features in those clusters, to obtain the evaluation information label corresponding to each training sample.
According to the Kubernetes-based physical resource scheduling method provided by the present invention, performing cluster analysis on the training samples with the density-based clustering algorithm to obtain a plurality of clusters comprises:
traversing the training samples and judging whether the current training sample is a core sample based on the number of other training samples within its preset neighborhood; if the current training sample is a core sample, storing it in a core sample set;
traversing each core sample in the core sample set and determining, based on the degree of membership between the current core sample and the other training samples within its preset neighborhood, whether those other training samples belong to the same initial cluster as the current core sample;
after the initial cluster corresponding to each core sample in the core sample set has been determined, calculating the silhouette coefficient between the initial clusters corresponding to the core samples;
adjusting the preset parameters of the density-based clustering algorithm, repeating the traversal to obtain a plurality of silhouette coefficients, and determining the plurality of clusters based on the plurality of silhouette coefficients.
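The role of the silhouette coefficient in the parameter search can be illustrated with a minimal sketch. It assumes Euclidean distance, the standard definition of the silhouette value, and that the parameter setting with the highest coefficient is kept; the function names and the dictionary of candidate labelings are hypothetical and are not taken from the patent.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels, dist=euclidean):
    """Mean silhouette value; labels[i] is the cluster id of points[i]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_best_labeling(points, labelings):
    """Return the key (a parameter setting) whose labeling scores highest.

    `labelings` maps a hypothetical parameter description to the cluster
    labels produced by one run of the clustering with those parameters.
    """
    return max(labelings, key=lambda k: silhouette_coefficient(points, labelings[k]))
```

For four points forming two well-separated pairs, the labeling that keeps each pair together scores close to 1, while a labeling that splits the pairs scores below 0, so the former would be retained.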
According to the Kubernetes-based physical resource scheduling method provided by the present invention, the other training samples within the preset neighborhood of the current training sample are determined as follows:
calculating the Hausdorff distance between the current training sample and each of the training samples;
if the Hausdorff distance is smaller than the preset radius of the preset neighborhood, determining that the training sample corresponding to that Hausdorff distance is one of the other training samples within the preset neighborhood of the current training sample.
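The neighborhood test and the core-sample check can be sketched as follows. The sketch assumes that each training sample is represented as a finite set of points (for example, one point per group of features), which is what makes a Hausdorff distance meaningful. The threshold `min_neighbors` and all function names are hypothetical; the text only states that the decision depends on the number of neighbors.

```python
import math


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(math.dist(x, y) for y in ys) for x in xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))


def neighbors(samples, i, radius):
    """Indices of the other samples whose distance to samples[i] is below radius."""
    return [
        j for j, s in enumerate(samples)
        if j != i and hausdorff(samples[i], s) < radius
    ]


def core_samples(samples, radius, min_neighbors):
    """Indices of samples with at least min_neighbors neighbors (assumed rule)."""
    return [
        i for i in range(len(samples))
        if len(neighbors(samples, i, radius)) >= min_neighbors
    ]
```

The strict comparison `< radius` follows the wording "smaller than the preset radius".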
According to the Kubernetes-based physical resource scheduling method provided by the present invention, the degree of membership between the current core sample and the other training samples within its preset neighborhood is determined as follows:
traversing each other training sample within the preset neighborhood of the current core sample, and determining a number of adjacent samples that are close to the current other training sample and lie within the preset neighborhood of the current core sample;
determining the degree of membership between the current other training sample and the current core sample based on the mean, maximum, and minimum of the distances between the current other training sample and those adjacent samples.
According to the Kubernetes-based physical resource scheduling method provided by the present invention, the scheduling evaluation model is trained as follows:
based on the training samples and the evaluation information label corresponding to each training sample in the training sample set, iteratively training a plurality of classification decision tree models with a loss function that contains a regularization term, to obtain the scheduling evaluation model.
According to the Kubernetes-based physical resource scheduling method provided by the present invention, during the iterative training, the loss function used in the current round is determined from the first-order and second-order gradients of the loss function used in the previous round.
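A regularized loss that is updated from first-order and second-order gradients has the same shape as the objective commonly used in second-order gradient boosting of decision trees. The following is a sketch under that assumption only; the symbols ($l$, $f_t$, $g_i$, $h_i$, $\gamma$, $\lambda$, $T$, $w_j$) are illustrative and do not appear in this section of the source.

```latex
% Objective in round t, expanded to second order around the previous prediction
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}
  \left[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \right] + \Omega(f_t)

% First- and second-order gradients of the loss at the previous round's prediction
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}

% Regularization term for a tree with T leaves and leaf weights w_j
\Omega(f_t) = \gamma T + \tfrac{1}{2}\, \lambda \sum_{j=1}^{T} w_j^2
```

Under these assumptions, minimizing the expanded objective over the weight of leaf $j$ gives $w_j^{*} = -G_j / (H_j + \lambda)$, where $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the samples that fall into that leaf.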
The present invention also provides a scheduler, comprising:
an information acquisition module, configured to obtain scheduling information in a Kubernetes cluster, the scheduling information including Pod information of a Pod to be scheduled and load information of each node;
an evaluation module, configured to traverse the nodes, input the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtain evaluation information, output by the scheduling evaluation model, for the case in which the Pod to be scheduled is scheduled to the current node;
a scheduling module, configured to schedule the Pod to be scheduled based on the evaluation information corresponding to each node;
wherein the scheduling evaluation model is trained on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes Pod information of a Pod sample and load information of the node sample to which the Pod sample was scheduled.
The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the Kubernetes-based physical resource scheduling methods described above.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the Kubernetes-based physical resource scheduling methods described above.
The present invention also provides a computer program product, comprising a computer program which, when executed by a processor, implements any of the Kubernetes-based physical resource scheduling methods described above.
In the Kubernetes-based physical resource scheduling method and scheduler provided by the present invention, scheduling information in the Kubernetes cluster is obtained first, the scheduling information including Pod information of the Pod to be scheduled and load information of each node. The nodes are then traversed, and the Pod information of the Pod to be scheduled and the load information of the current node are input into the scheduling evaluation model to obtain the evaluation information, output by the model, for the case in which the Pod is scheduled to the current node. Finally, the Pod to be scheduled is scheduled based on the evaluation information corresponding to each node. Because the method takes the load information of the current node into account, it avoids the abnormal physical resource conditions on a node that arise when only hardware resources are considered, ensures the safe operation of the Kubernetes cluster, and improves its stability. In addition, by using the scheduling evaluation model, the method makes the scheduling of the Pod more objective, improves scheduling results and efficiency, and allows physical resources to be fully utilized.
Brief Description of the Drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from the drawings described below without creative effort.
FIG. 1 is a schematic flow chart of the Kubernetes-based physical resource scheduling method provided by the present invention;
FIG. 2 is a schematic structural diagram of the scheduler provided by the present invention;
FIG. 3 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
At present, physical resource scheduling in Kubernetes is performed mainly by either the native Kubernetes solution or improved variants of it.
The native solution examines the resource usage of the Nodes in the current Kubernetes cluster, applies a series of selection operations to find a Node on which the new Pod can be deployed, and deploys the new Pod to that Node.
The process by which Kubernetes selects a suitable Node for a Pod comprises three stages: Node pre-selection, Node prioritization, and Node selection.
The Node pre-selection stage checks each Node against a series of pre-selection rules and filters out the Nodes that do not qualify. These rules are mainly mandatory checks that verify whether a host can provide the resources the Pod requires. If no host satisfies one of the rules, the Pod is suspended until some host does. The resources checked in the pre-selection stage are mostly incompressible resources (such as memory and disk space), which generally cannot be reclaimed unless the Pod exits.
The Node prioritization stage ranks the pre-selected Nodes in order to find the Node best suited to run the Pod object. A series of priority functions scores each Node, and the scores are combined with preset weights to give the Node's final score. The prioritization stage mainly concerns compressible resources, such as CPU cycles and disk I/O bandwidth. These resources can be limited and reclaimed, so a Pod can reduce its usage of them rather than be killed.
The Node selection stage picks the Node with the highest priority in the ranking to run the Pod. If more than one Node has the highest priority, one of them is chosen at random. After the scheduling process binds the Pod to the selected Node, Kubernetes starts the Pod on that Node.
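The three stages described above can be sketched in a few lines. This is an illustration only, not the Kubernetes implementation: the dictionary keys (`free_mem`, `free_disk`, `mem`, `disk`), the priority functions, and the weights are hypothetical.

```python
import random


def filter_nodes(nodes, pod):
    """Pre-selection: keep only Nodes that can hold the incompressible resources."""
    return [
        n for n in nodes
        if n["free_mem"] >= pod["mem"] and n["free_disk"] >= pod["disk"]
    ]


def score_node(node, priority_fns, weights):
    """Prioritization: weighted sum of the scores given by each priority function."""
    return sum(w * fn(node) for fn, w in zip(priority_fns, weights))


def select_node(nodes, pod, priority_fns, weights, rng=random):
    """Selection: highest total score wins; ties are broken at random."""
    feasible = filter_nodes(nodes, pod)
    if not feasible:
        return None  # no Node qualifies, so the Pod stays suspended
    scored = [(score_node(n, priority_fns, weights), n) for n in feasible]
    best = max(score for score, _ in scored)
    top = [n for score, n in scored if score == best]
    return rng.choice(top)
```

A caller would pass priority functions such as `lambda n: 1 - n["cpu_util"]` together with one weight per function; a `None` result corresponds to the suspended Pod in the pre-selection description.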
When the Kubernetes scheduler performs Node pre-selection, its judgment of hardware resources is computed from static resources. The main steps are as follows:
1) Obtain information about the Pods already deployed on each Node.
2) Obtain the static configuration information of the deployed Pods.
3) Calculate the Node's total resource usage from the Requests information in the Pods' configuration.
4) Calculate each Node's score from its remaining resources, giving a higher score to Nodes with more resources remaining.
5) Perform the pre-selection using the Node score from the previous step together with the scores from the other pre-selection functions.
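Steps 1) to 4) can be sketched as follows. The sketch assumes a score equal to the average free fraction of CPU and memory scaled to a maximum of 10, which is similar in spirit to a least-requested scoring rule; the data layout (`allocatable`, `requests`) and the function names are hypothetical. Note that only declared Requests are consulted, never measured usage.

```python
def requested_total(pods, resource):
    """Sum of the declared Requests for one resource over the Pods on a Node."""
    return sum(p["requests"].get(resource, 0) for p in pods)


def static_score(node, pods_on_node, resources=("cpu", "memory"), max_score=10):
    """Score a Node by the average free fraction derived from Requests only."""
    fractions = []
    for r in resources:
        capacity = node["allocatable"][r]
        used = requested_total(pods_on_node, r)
        fractions.append(max(capacity - used, 0) / capacity)
    return max_score * sum(fractions) / len(fractions)


# Example: CPU in millicores, memory in MiB (units are an assumption).
node = {"allocatable": {"cpu": 4000, "memory": 8192}}
pods = [
    {"requests": {"cpu": 500, "memory": 1024}},
    {"requests": {"cpu": 500, "memory": 1024}},
]
score = static_score(node, pods)  # 75% of each resource is unrequested
```

Because the score depends only on the declared Requests, a Node whose Pods actually consume far more memory than they requested would receive the same score as a genuinely idle one.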
When this solution schedules Pods to Nodes, the Requests configuration used by the static resource check may differ from actual usage. In a cluster that has been running for some time, the following situation can therefore arise: according to the Requests configuration a node appears relatively idle, but the Pods running on it actually occupy a large amount of memory, and the memory available to the node's operating system is scarce.
Because of this divergence between configuration and actual operation, when a large number of Pods need to be scheduled in Kubernetes, the Requests-based pre-selection easily causes such seemingly idle Nodes to be judged the preferred result. Many Pods are then scheduled onto that Node, and the actual system resources become abnormal. In severe cases the Node may even go offline abnormally, which harms the operational safety of Kubernetes as a whole.
The improved variants build on the native Kubernetes solution by additionally considering each Node's current resource usage and historical Pod scheduling information. With such a variant, the Node chosen for a Pod is the best available at the moment of scheduling. However, an application's resource usage is uncertain while it runs, so the usage of a scheduled application may change later and destabilize the cluster.
To this end, an embodiment of the present invention provides a Kubernetes-based physical resource scheduling method.
FIG. 1 is a schematic flow chart of a Kubernetes-based physical resource scheduling method provided in an embodiment of the present invention. As shown in FIG. 1, the method includes:
S1, obtaining scheduling information in a Kubernetes cluster, the scheduling information including Pod information of a Pod to be scheduled and load information of each node;
S2, traversing the nodes, inputting the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtaining evaluation information, output by the scheduling evaluation model, for the case in which the Pod to be scheduled is scheduled to the current node;
S3, scheduling the Pod to be scheduled based on the evaluation information corresponding to each node;
wherein the scheduling evaluation model is trained on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes Pod information of a Pod sample and load information of the node sample to which the Pod sample was scheduled.
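Steps S1 to S3 amount to scoring every node with the evaluation model and acting on the scores. The sketch below assumes that the evaluation information is a single numeric score and that the node with the highest score is chosen; the function name `schedule_pod`, the feature layout, and the stand-in `toy_evaluate` are hypothetical and take the place of the trained model.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def schedule_pod(
    pod_info: Features,
    node_loads: Dict[str, Features],
    evaluate: Callable[[Features, Features], float],
) -> Optional[str]:
    """Score the Pod against every node (S2) and return the best node name (S3)."""
    scores = {name: evaluate(pod_info, load) for name, load in node_loads.items()}
    if not scores:
        return None  # no nodes to evaluate
    return max(scores, key=scores.get)


def toy_evaluate(pod_info: Features, load: Features) -> float:
    """Stand-in for the trained model: headroom left on the tightest resource."""
    return 1.0 - max(p + l for p, l in zip(pod_info, load))


# S1 is represented here by already-normalized feature vectors (CPU, memory).
chosen = schedule_pod(
    [0.2, 0.1],
    {"node-a": [0.7, 0.5], "node-b": [0.3, 0.4]},
    toy_evaluate,
)
```

In a real deployment, `evaluate` would wrap the trained scheduling evaluation model, and the feature vectors would come from the preprocessed scheduling information.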
Specifically, the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention is executed by a Kubernetes scheduler. The Kubernetes scheduler may be deployed in a server, which may be a local server or a cloud server; the local server may be, for example, a computer. The embodiment of the present invention does not specifically limit this.
Step S1 is executed first to obtain the scheduling information in the Kubernetes cluster. A Kubernetes cluster includes multiple Pods and multiple nodes. A Pod is the smallest basic unit that Kubernetes creates or deploys; it represents a work unit running on the cluster and may contain multiple container processes. A node is a host capable of running Pods, and each node has its own operating system.
The scheduling information may include Pod information of the Pod to be scheduled and load information of each node. The Pod to be scheduled is the Pod for which a target node must be determined. The Pod information may include the Pod's deployment information, run-time information, and the historical resource usage of the applications running in the Pod. The nodes are the nodes in the Kubernetes cluster, and the load information may include a node's name, IP address, CPU utilization, number of CPU cores, memory capacity, disk utilization, disk capacity, and I/O rate.
The scheduling information obtained from the Kubernetes cluster contains many attributes, some of which may be unrelated to determining the evaluation information or may even interfere with it, so the scheduling information may be preprocessed. The preprocessing may include: duplicate data deletion, unique-value data deletion, imputation of missing numerical values, categorical feature encoding, and standardization.
1)重复数据删除操作。在调度信息采集的过程中,存在重复爬取的情况,需要删除这部分数据。通过识别发现多条重复值,则直接删除这部分重复数据。此外,调度信息中存在的如IP地址、节点名称等无用的数据,也可删除。1) De-duplicate data operation. In the process of collecting scheduling information, there are cases of repeated crawling, and this part of the data needs to be deleted. If multiple duplicate values are found through identification, this part of the duplicate data is directly deleted. In addition, useless data such as IP addresses and node names in the scheduling information can also be deleted.
2)唯一值数据删除操作。由于唯一值数据通常是一些id属性,这些属性并不能刻画样本自身的分布规律,所以简单地删除这些数据即可。2) Unique value data deletion operation: Since unique value data are usually some id attributes, these attributes cannot describe the distribution law of the sample itself, so simply delete these data.
3)数值型缺失值插补操作。对于调度信息中的数值型数据,如果数值型数据的距离是可度量的,则使用该数值型数据的有效值的平均值来插补缺失的值;如果数值型数据的距离是不可度量的,则使用该数值型数据的有效值的众数来插补缺失的值。也可采用多重插补,多重插补认为待插补的值是随机的,实践上通常是估计出待插补的值,再加上不同的噪声,形成多组可选插补值,根据某种选择依据,选取最合适的插补值。3) Interpolation operation of missing values of numerical type. For the numerical data in the scheduling information, if the distance of the numerical data is measurable, the average value of the valid value of the numerical data is used to interpolate the missing value; if the distance of the numerical data is not measurable, the mode of the valid value of the numerical data is used to interpolate the missing value. Multiple interpolation can also be used. Multiple interpolation assumes that the values to be interpolated are random. In practice, the values to be interpolated are usually estimated, and different noises are added to form multiple groups of optional interpolation values. The most appropriate interpolation value is selected according to a certain selection basis.
4) Categorical feature encoding. The categorical features in the scheduling information are unordered categorical variables and should be converted into matrices. One-hot encoding is used for features with fewer than 10 categories, and frequency encoding is used for features with more than 10 categories.
5) Numerical standardization. In the embodiment of the present invention, min-max standardization may be used to transform the scheduling information into the interval [0, 1].
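As an illustration of operations 3) and 4) above, the following is a minimal sketch, not the implementation of the embodiment. The function names `impute` and `encode_categorical`, the `measurable` flag, and the treatment of exactly 10 categories (frequency encoding) are assumptions made for the example.

```python
from collections import Counter
from statistics import mean

def impute(values, measurable=True):
    """Fill None entries with the mean (measurable distance) or the mode of the valid values."""
    valid = [v for v in values if v is not None]
    fill = mean(valid) if measurable else Counter(valid).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def encode_categorical(values, one_hot_limit=10):
    """One-hot encode features with fewer than `one_hot_limit` categories, else frequency-encode."""
    categories = sorted(set(values))
    if len(categories) < one_hot_limit:
        # One row per sample, one column per category.
        return [[1 if v == c else 0 for c in categories] for v in values]
    counts = Counter(values)
    # Assumption: a single column holding the relative frequency of the sample's category.
    return [[counts[v] / len(values)] for v in values]

cpu_usage = impute([0.2, None, 0.6])
qos_class = encode_categorical(["Burstable", "Guaranteed", "Burstable"])
```

Here the missing CPU value is replaced by the mean of the two valid values, and the three-sample categorical column becomes a 3 x 2 one-hot matrix.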
Step S2 is then executed: each node in the Kubernetes cluster is traversed, the node traversed at the current moment is taken as the current node, and the Pod information of the Pod to be scheduled and the load information of the current node are input into the scheduling evaluation model, which outputs the evaluation information for scheduling the Pod to the current node. The scheduling evaluation model characterizes how suitable it is to schedule the Pod to the current node. The evaluation information is usually expressed as a score: the higher the score, the more suitable the node. The scheduling evaluation model may be a machine-learning decision tree model or a machine-learning neural network model, which is not specifically limited here.
The scheduling evaluation model can be obtained by training an initial scheduling evaluation model with a training sample set and the evaluation information labels corresponding to the training samples in the set, in combination with a loss function. The training sample set may include multiple training samples, each of which includes the Pod information of a Pod sample and the load information of the node sample to which that Pod sample is scheduled. Each training sample corresponds to an evaluation information label, which characterizes how suitable it is to schedule the Pod sample in that training sample to the corresponding node. The label can be obtained by manual labeling, and reference information such as the scheduling patterns of the Pod samples can also be introduced during manual labeling, which is not specifically limited here.
Before the initial scheduling evaluation model is trained with the training sample set and the corresponding evaluation information labels, each training sample can be preprocessed. The preprocessing operations may likewise include duplicate data deletion, unique-value data deletion, numerical missing-value imputation, categorical feature encoding, and standardization. In the standardization operation, the training samples x_1, x_2, …, x_n are linearly transformed so that each result y_i falls into the interval [0, 1]. The conversion function is as follows:
$$y_i = \frac{x_i - \min_{1 \le j \le n} x_j}{\max_{1 \le j \le n} x_j - \min_{1 \le j \le n} x_j}$$
where n is the number of training samples.
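The min-max standardization described above can be sketched as follows. The function name `min_max_scale` and the handling of a constant feature (all values mapped to 0) are assumptions of this example.

```python
def min_max_scale(xs):
    """Linearly map the values of one feature into the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Assumption: a constant feature carries no information and is mapped to 0.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

scaled = min_max_scale([2, 4, 6])
```

The smallest value is mapped to 0, the largest to 1, and intermediate values keep their relative spacing.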
In step S2, the load information of every node in the Kubernetes cluster, together with the Pod information of the Pod to be scheduled, can be used as input to the scheduling evaluation model, which yields the evaluation information for scheduling the Pod to each node in the cluster.
Finally, step S3 is executed: the Pod to be scheduled is scheduled according to the evaluation information corresponding to each node. Here, the evaluation information corresponding to a node is the evaluation information for scheduling the Pod to that node. The Pod can be scheduled directly to, that is, deployed on, the node with the highest evaluation information.
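Steps S2 and S3 amount to scoring every node and selecting the highest score. The sketch below illustrates this under stated assumptions: `schedule_pod` is a hypothetical helper, and `toy_score` is a hand-written stand-in for the trained scheduling evaluation model (it scores a node by its remaining headroom), not the model of the embodiment.

```python
from typing import Callable, Dict, Sequence

def schedule_pod(pod_info: Sequence[float],
                 node_loads: Dict[str, Sequence[float]],
                 score: Callable[[Sequence[float], Sequence[float]], float]) -> str:
    """Score the Pod against every node (S2) and return the best-scoring node (S3)."""
    scores = {name: score(pod_info, load) for name, load in node_loads.items()}
    return max(scores, key=scores.get)

def toy_score(pod, load):
    # Hypothetical score: smallest remaining capacity fraction after placing the Pod.
    return min(1.0 - l - p for p, l in zip(pod, load))

best = schedule_pod([0.2, 0.1],
                    {"node-a": [0.7, 0.3], "node-b": [0.4, 0.5]},
                    toy_score)
```

With these inputs, node-a would be left with little headroom on its first resource, so node-b receives the higher score and is selected.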
The Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention first obtains the scheduling information of the Kubernetes cluster, which includes the Pod information of the Pod to be scheduled and the load information of each node; it then traverses the nodes and inputs the Pod information of the Pod to be scheduled and the load information of the current node into the scheduling evaluation model, which outputs the evaluation information for scheduling the Pod to the current node; finally, it schedules the Pod based on the evaluation information corresponding to each node. Because the method takes the load information of the current node into account, it avoids the abnormal physical-resource conditions on a node that arise when only hardware resources are considered, ensures safe operation of the Kubernetes cluster, and improves the cluster's stability. In addition, by using the scheduling evaluation model, the method makes the scheduling of the Pod more objective, improves the scheduling effect and efficiency, and allows physical resources to be fully utilized.
During scheduling, the scheduling evaluation model is used to make a predictive evaluation of each node's suitability, the nodes are ranked by their evaluation information, and the most suitable node is selected to deploy the Pod. Scheduling is thus based not only on the current state of the system but also on the predicted future fit between the Pod and the node, which yields the best scheduling result.
On the basis of the above embodiment, in the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention, the evaluation information label corresponding to each training sample is determined as follows:
performing cluster analysis on the training samples based on a density clustering algorithm to obtain multiple clusters;
collectively labeling the training samples contained in the multiple clusters based on the sample features of the clusters, to obtain the evaluation information label corresponding to each training sample.
Specifically, in the embodiment of the present invention, when the evaluation information labels corresponding to the training samples are determined, cluster analysis can first be performed on the training samples with a density clustering algorithm to obtain multiple clusters. A density clustering algorithm is a density-based clustering algorithm, that is, an algorithm that clusters a data set according to how densely it is distributed in space. Density clustering algorithms include the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm and the Ordering Points To Identify the Clustering Structure (OPTICS) algorithm, among others.
Performing cluster analysis on the training samples amounts to classifying them: the training samples that belong to the same category form one cluster, so cluster analysis of the training samples yields multiple clusters.
Cluster analysis reveals the allocation pattern of each node sample during resource scheduling. The scheduling relationships between Pod samples and node samples within the same cluster are similar to one another, whereas those in different clusters have low similarity.
Features can then be extracted from the training samples in each cluster to obtain the sample features of that cluster, and the training samples contained in each cluster can be collectively labeled on the basis of those features. In other words, because all training samples in a cluster belong to the same category, they can all be given the same evaluation information label. The training samples in the training sample set therefore do not have to be labeled manually one by one, which improves labeling efficiency.
The collective labeling can be carried out manually, with the sample features of each cluster serving as the basis for labeling. Unlike traditional manual labeling, manual labeling performed after the clusters are obtained lets labelers label each cluster as a whole, based on the clusters produced by the cluster analysis. Because the allocation characteristics within a cluster are similar, the scheduling situation of each node in the cluster is similar and so are the evaluation information labels. This gives the evaluation information labels a better frame of reference, removes the need for labelers to review items one by one, and greatly reduces labeling time.
The content of the collective labeling can be the number of the Node to which each Pod is best suited to be scheduled. If the Kubernetes cluster contains N Nodes in total, the evaluation information labels can be expressed as [1, 2, 3, …, N].
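Collective labeling can be pictured as assigning one label per cluster and propagating it to every member. The sketch below assumes that cluster membership is given as a list of cluster ids, that noise samples carry the id -1, and that noise samples stay unlabeled; the helper name `propagate_labels` is hypothetical.

```python
def propagate_labels(cluster_ids, cluster_labels):
    """Give every training sample the label chosen for its cluster.

    cluster_ids:    cluster id of each training sample (-1 marks a noise sample)
    cluster_labels: mapping from cluster id to a node number in 1..N
    """
    # Samples whose cluster has no label (e.g. noise) remain None.
    return [cluster_labels.get(c) for c in cluster_ids]

labels = propagate_labels([0, 0, 1, -1, 1], {0: 2, 1: 3})
```

Two labeling decisions (one per cluster) are enough here to label four of the five samples.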
In the embodiment of the present invention, a density clustering algorithm is used to perform cluster analysis on the training samples and obtain multiple clusters, which reveals the Pod resource scheduling patterns more intuitively. Collectively labeling the training samples in the clusters on the basis of the clusters' sample features improves labeling efficiency, saves labeling time, and reduces labor costs.
On the basis of the above embodiment, in the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention, performing cluster analysis on the training samples based on the density clustering algorithm to obtain multiple clusters includes:
S4, traversing the training samples and determining, based on the number of other training samples within a preset neighborhood of the current training sample, whether the current training sample is a core sample; if the current training sample is a core sample, storing it in a core sample set;
S5, traversing each core sample in the core sample set and determining, based on the membership between the current core sample and the other training samples within its preset neighborhood, whether those other training samples belong to the same initial cluster as the current core sample;
S6, after the initial clusters corresponding to the core samples in the core sample set have been determined, calculating the silhouette coefficients between the initial clusters corresponding to the core samples;
S7, adjusting the preset parameters of the density clustering algorithm, repeating the traversal actions to obtain multiple silhouette coefficients, and determining the multiple clusters based on the multiple silhouette coefficients.
Specifically, in the embodiment of the present invention, the density clustering algorithm used to perform cluster analysis on the training samples and obtain multiple clusters is the DBSCAN algorithm. Step S4 is executed first: the training samples are traversed, the training sample traversed at the current moment is taken as the current training sample, and the number of other training samples within the preset neighborhood of the current training sample is determined by calculating the distances between the current training sample and all the other training samples. The preset radius ε of the preset neighborhood can be set as needed and is not specifically limited here.
It can be understood that the distance between the current training sample and the other training samples can be calculated as a Euclidean distance. However, because the Euclidean distance is relatively simple to compute, it often fails to represent the true distance between training samples. In particular, in the scenario of the embodiment of the present invention, where there are many training samples that differ considerably from one another, the Euclidean distance cannot fully express their distance characteristics. To address this, the embodiment of the present invention can instead calculate the Hausdorff distance, which remedies the incomplete distance measurement of the original DBSCAN algorithm.
Next, whether the current training sample is a core sample is determined from the number of other training samples within its preset neighborhood. A quantity threshold MinPts can be introduced for this purpose, and the number of other training samples within the preset neighborhood of the current training sample is compared with it. If that number is greater than or equal to the quantity threshold, the current training sample is determined to be a core sample; otherwise, it is determined not to be a core sample. It can be understood that a core sample may serve as a cluster center of the cluster analysis, that is, the center of one of the clusters obtained subsequently.
When the current training sample is determined to be a core sample, it can be stored in the core sample set. After all training samples have been traversed, the training samples that are core samples are known, and all of them are stored in the core sample set.
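Step S4 can be sketched as follows. The example assumes that each training sample is a numeric feature vector and uses the Euclidean distance (`math.dist`) as the default metric; any other distance function can be passed in through `dist`. The function name `find_core_samples` and the strict comparison against the radius are assumptions of the example.

```python
import math

def find_core_samples(samples, eps, min_pts, dist=math.dist):
    """Return the indices of samples with at least `min_pts` other samples within radius `eps`."""
    core = []
    for i, s in enumerate(samples):
        # Count the other training samples inside the preset neighborhood of sample i.
        neighbours = sum(1 for j, t in enumerate(samples)
                         if j != i and dist(s, t) < eps)
        if neighbours >= min_pts:
            core.append(i)
    return core

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
core_set = find_core_samples(points, eps=1.5, min_pts=2)
```

The three points near the origin each have two neighbors within the radius and become core samples, while the isolated point does not.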
Step S5 is then executed: each core sample in the core sample set is traversed, the core sample traversed at the current moment is taken as the current core sample, and the membership between the current core sample and the other training samples within its preset neighborhood is calculated.
The original DBSCAN algorithm treats every training sample equally, but real training samples may contain noise points, which can distort the construction of the clusters and reduce the generalization ability of the cluster analysis. In the embodiment of the present invention, more attention needs to be paid to these noise samples so that they are distinguished from the training samples in the clusters. Labelers can then better discern differences in the scheduling relationships among the training samples, which improves labeling efficiency.
Because clusters overlap, some training samples at the edge of a cluster do not necessarily belong to a single cluster in practice: they may belong to several clusters or be noise samples, which makes their category fuzzy and uncertain. The category of a training sample near the center of a cluster is far more certain than that of a training sample at the edge, so the training samples in the training sample set cannot all be treated equally during clustering. The membership function is based on fuzzy set theory. Unlike traditional clustering algorithms, which merely decide whether a training sample belongs to a cluster, it calculates the degree to which a training sample belongs to a cluster, so the weight of each training sample can be controlled. A membership function is therefore introduced into the DBSCAN algorithm to eliminate the influence of noise samples.
The membership function can be chosen as needed. A traditional distance-based membership function can be chosen, but such a function only considers the relationship between a training sample and the center of its category, that is, the core sample, and ignores the effect of the sample distribution on the membership. In a Kubernetes cluster, given the cluster's original node scheduling characteristics and experience, the real data are dispersed. On this basis, the membership function can be improved to take the dispersed distribution of the Pod samples into account, giving a KNN membership function. Computing the membership with the KNN membership function improves the noise resistance of the cluster analysis algorithm.
Next, the membership between the current core sample and the other training samples within its preset neighborhood is used to determine whether those other training samples belong to the same initial cluster as the current core sample. A membership threshold can be introduced for this purpose, and the membership between the current core sample and each other training sample within its preset neighborhood is compared with the threshold. If the membership between one of these other training samples and the current core sample is less than the membership threshold, that training sample is determined not to belong to the same initial cluster as the current core sample. Otherwise, it is determined to belong to the same initial cluster, and it is stored in that initial cluster together with the current core sample.
Step S6 is then executed: after the initial clusters corresponding to the core samples in the core sample set have been determined, the silhouette coefficients between those initial clusters are calculated. The initially chosen preset parameters of the density clustering algorithm, such as the preset radius ε of the preset neighborhood, the quantity threshold MinPts, and the membership threshold, do not necessarily produce suitable clusters. The silhouette coefficients between the initial clusters are therefore calculated so that the preset radius and quantity threshold that give the best clustering effect can be found by adjusting the parameters according to the silhouette coefficient.
Here, the silhouette coefficient can be expressed by the following formula:
$$SC_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where a_i is the average distance from training sample i to all other training samples in its own initial cluster, and b_i is the minimum distance from training sample i to all training samples in the other initial clusters.
Each training sample corresponds to a silhouette coefficient SC, whose value lies between -1 and 1. When SC is negative, the distance from training sample i to the training samples in other initial clusters is smaller than its distance to the other training samples in its own initial cluster, which indicates that the two initial clusters overlap and that the clustering effect is poor. The larger the SC value, the higher the clustering quality and the better the clustering effect.
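The per-sample silhouette coefficient can be sketched as follows, using a_i as the mean distance to the other samples of the same initial cluster and b_i as the minimum distance to samples of other initial clusters, as defined above. The quotient (b_i - a_i) / max(a_i, b_i) is the standard silhouette form and is assumed here; the function name `silhouette`, the Euclidean default metric, and the score 0 for degenerate cases are also assumptions.

```python
import math
from statistics import mean

def silhouette(samples, labels, dist=math.dist):
    """Per-sample silhouette coefficient (b_i - a_i) / max(a_i, b_i)."""
    scores = []
    for i, (s, c) in enumerate(zip(samples, labels)):
        same = [dist(s, t) for j, (t, d) in enumerate(zip(samples, labels))
                if d == c and j != i]
        other = [dist(s, t) for t, d in zip(samples, labels) if d != c]
        if not same or not other:
            # Assumption: singleton clusters, or a single cluster overall, score 0.
            scores.append(0.0)
            continue
        a, b = mean(same), min(other)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

sc = silhouette([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])
```

In this example every sample is 1 away from its cluster mate and at least 5 away from the other cluster, so each coefficient is (5 - 1) / 5.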
Finally, step S7 is executed: the preset parameters of the density clustering algorithm are adjusted and the traversal actions, that is, steps S4 to S6 above, are repeated to obtain multiple silhouette coefficients. The multiple clusters are determined from these silhouette coefficients.
Here, the preset parameters may include the preset radius ε of the preset neighborhood, the quantity threshold MinPts, and the membership threshold.
When the multiple clusters are determined from the multiple silhouette coefficients, the initial clusters corresponding to the largest silhouette coefficient can be selected and used as the final result of the cluster analysis, that is, as the final clusters.
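The parameter adjustment of step S7 can be pictured as a search over candidate parameter values that keeps the clustering with the largest silhouette score. In the sketch below, `select_parameters`, `cluster`, and `quality` are hypothetical names: `cluster` stands for one run of steps S4 and S5, and `quality` for a single score per run (for example the mean silhouette coefficient, which is an assumption because the text does not state how per-sample coefficients are aggregated). An exhaustive grid is also an assumption; the text only says that the parameters are adjusted.

```python
from itertools import product

def select_parameters(grid, cluster, quality):
    """Try every parameter combination and keep the one with the highest quality score."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        clusters = cluster(**params)   # one clustering run with these parameters
        score = quality(clusters)      # e.g. mean silhouette coefficient of the run
        if best is None or score > best[0]:
            best = (score, params, clusters)
    return best

# Toy stand-ins so the sketch runs on its own; they are not a real clustering.
toy_cluster = lambda eps, min_pts: (eps, min_pts)
toy_quality = lambda c: -abs(c[0] - 0.5) - abs(c[1] - 3)
best_score, best_params, best_clusters = select_parameters(
    {"eps": [0.3, 0.5, 0.7], "min_pts": [2, 3]}, toy_cluster, toy_quality)
```

The membership threshold and the number of neighbors can be searched in the same way by adding them as further keys of the grid.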
In the embodiment of the present invention, the membership and the silhouette coefficient are used to adjust the preset parameters so that the optimal parameters are selected automatically, which overcomes the inaccuracy of manually set parameters in the traditional DBSCAN algorithm.
On the basis of the above embodiment, in the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention, the other training samples within the preset neighborhood of the current training sample are determined as follows:
calculating the Hausdorff distance between the current training sample and each of the training samples;
if the Hausdorff distance is smaller than the preset radius of the preset neighborhood, determining that the training sample corresponding to that Hausdorff distance is one of the other training samples within the preset neighborhood of the current training sample.
Specifically, in the embodiment of the present invention, the other training samples within the preset neighborhood of the current training sample can be determined by calculating the Hausdorff distance between the current training sample and each training sample. The Hausdorff distance can represent the true distance between training samples and makes the distance measurement more comprehensive.
After the Hausdorff distance between the current training sample and each training sample has been calculated, it can be determined whether each of these distances is smaller than the preset radius. If it is, the corresponding training sample lies within the preset neighborhood of the current training sample, that is, it is one of the other training samples in that neighborhood.
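The Hausdorff distance is defined between two point sets, so the sketch below assumes that each training sample is represented as a set of feature points; this representation, the Euclidean ground distance, and the names `hausdorff` and `neighbourhood` are assumptions of the example rather than details given in the text.

```python
import math

def hausdorff(a, b, dist=math.dist):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def neighbourhood(current, samples, eps):
    """Indices of the other samples whose Hausdorff distance to `current` is below `eps`."""
    return [i for i, s in enumerate(samples)
            if i != current and hausdorff(samples[current], s) < eps]

sample_a = [(0, 0), (1, 0)]
sample_b = [(0, 0), (4, 0)]
sample_c = [(0, 0), (1, 1)]
d_ab = hausdorff(sample_a, sample_b)
near_a = neighbourhood(0, [sample_a, sample_c, sample_b], eps=2)
```

The point (4, 0) of `sample_b` is 3 away from the nearest point of `sample_a`, so the two samples are 3 apart and `sample_b` falls outside a neighborhood of radius 2, whereas `sample_c` is 1 away and falls inside it.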
In the embodiment of the present invention, using the Hausdorff distance gives a better measure of the similarity between training samples and improves the robustness of the density clustering algorithm.
On the basis of the above embodiment, in the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention, the membership between the current core sample and the other training samples within its preset neighborhood is determined as follows:
traversing each other training sample within the preset neighborhood of the current core sample, and determining several adjacent samples that are close to the current other training sample and lie within the preset neighborhood of the current core sample;
determining the membership between the current other training sample and the current core sample based on the mean, maximum, and minimum of the distances between the current other training sample and the adjacent samples.
Specifically, in the embodiment of the present invention, although a KNN-based membership function can take the spatial distribution of the training samples into account, it only considers the distances to a few surrounding training sample points and fails to reflect certain special cases or the overall situation. The embodiment therefore proposes a membership calculation method based on fuzzy judgment, namely the K-nearest-neighbor membership calculation method, which reduces the influence of noise samples.
First, each other training sample within the preset neighborhood of the current core sample is traversed, and the one traversed at the current moment is taken as the current other training sample. Several adjacent samples that are close to the current other training sample and lie within the preset neighborhood of the current core sample are then determined. That is, for the current other training sample x_i, the k nearest samples in the same category are found, forming the set D_i = {d_1, d_2, ..., d_k}, where d_j (j = 1, 2, ..., k) is the distance from x_i to the j-th adjacent sample and k is the number of neighbors, which can be set as needed and is not specifically limited here.
It can be understood that the number of neighbors k is also one of the preset parameters of the density clustering algorithm. After each round in which the silhouette coefficients between the initial clusters corresponding to the core samples are calculated, not only the preset radius ε, the quantity threshold MinPts, and the membership threshold but also the number of neighbors k needs to be adjusted.
Then, the membership between the current other training sample and the current core sample is determined from the mean, maximum, and minimum of the distances between the current other training sample and the adjacent samples. That is, the membership of the current other training sample x_i can be calculated by the following formula:
其中,dai为当前其他训练样本xi到集合Di的平均距离,dmax、dmin分别为dai中的最大值、最小值;θ为用于控制隶属度下限的参数,θ<1且为足够小的正数;f为控制隶属度函数变化趋势的参数,为定值。Among them, dai is the average distance from the current other training samples xi to the set Di , dmax and dmin are the maximum and minimum values in dai respectively; θ is the parameter used to control the lower limit of membership, θ<1 and is a sufficiently small positive number; f is the parameter that controls the changing trend of the membership function, which is a constant.
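The membership formula itself is not reproduced in the text above. As a minimal sketch, the code below assumes one form that agrees with the stated definitions, namely $\mu(x_i)=1-(1-\theta)\left(\frac{da_i-d_{min}}{d_{max}-d_{min}}\right)^f$, which yields 1 for the densest sample and θ for the sparsest. This form, and the helper names `meanKNNDistance` and `membership`, are assumptions for illustration rather than the formula of the embodiment.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// meanKNNDistance returns da_i: the mean of the k smallest values in dists,
// where dists holds the distances from one sample to its candidate neighbors.
func meanKNNDistance(dists []float64, k int) float64 {
	sorted := append([]float64(nil), dists...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	if k > len(sorted) {
		k = len(sorted)
	}
	if k == 0 {
		return 0
	}
	sum := 0.0
	for _, d := range sorted[:k] {
		sum += d
	}
	return sum / float64(k)
}

// membership applies the ASSUMED formula
//   mu = 1 - (1-theta) * ((da-dmin)/(dmax-dmin))^f
// theta is the lower bound of the membership degree, f shapes the curve.
func membership(da, dmin, dmax, theta, f float64) float64 {
	if dmax == dmin {
		return 1 // every sample is equally dense
	}
	return 1 - (1-theta)*math.Pow((da-dmin)/(dmax-dmin), f)
}

func main() {
	da := meanKNNDistance([]float64{0.4, 0.1, 0.3, 0.9}, 2) // mean of 0.1 and 0.3
	fmt.Println(da, membership(da, 0.1, 0.5, 0.05, 2))
}
```

A sample whose membership degree reaches the membership threshold would then be treated as belonging to the same initial cluster as the current core sample.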
Thereafter, once every other training sample in the preset neighborhood of the current core sample has been traversed, the noise samples can be judged again: it is determined whether the distance from a noise sample to each training sample in a generated initial cluster is less than or equal to the preset radius of the preset neighborhood, and if so, the noise sample is assigned to that initial cluster.
In the embodiment of the present invention, the K-nearest-neighbor membership degree draws on fuzzy-set knowledge while also fully accounting for the influence of the surrounding sample points on a sample, which can effectively improve the accuracy of DBSCAN.
On the basis of the above embodiment, the flow of the improved DBSCAN clustering algorithm adopted in the embodiment of the present invention is as follows:
Input: training sample set D, preset radius ε, quantity threshold MinPts, membership threshold F, number of nearest neighbors K.
Output: cluster analysis result and noise samples.
1) Randomly select an unprocessed training sample from the training sample set D;
2) Calculate the Hausdorff distance between the selected training sample and the other training samples in D. If the number of other training samples in the preset neighborhood of the selected sample meets the quantity threshold, the selected sample is a core sample: it is placed in the core sample set seed and assigned to an initial cluster C. Otherwise, return to step 1);
3) Traverse each core sample in seed, and use the calculated K-nearest-neighbor membership degree between sample points to judge whether noise samples can be merged into an existing initial cluster C;
4) Traverse the entire training sample set D to perform DBSCAN clustering;
5) Calculate the silhouette coefficient SC after clustering and adjust the preset parameters, namely the preset radius ε, the quantity threshold MinPts, the membership threshold F and the number of nearest neighbors K;
6) Repeat steps 4) and 5) and compare the SC values obtained;
7) Select the initial clusters obtained under the optimal SC in step 6) as the output cluster analysis result, that is, the clusters.
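The core of steps 1) to 4) can be sketched with a plain DBSCAN pass. The code below is a simplified illustration under stated assumptions: it uses Euclidean distance on 2-D points instead of the Hausdorff distance, counts a point as its own neighbor, and omits the membership-based merging and the parameter search of steps 5) to 7). The type `point` and the functions `neighbors` and `dbscan` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ X, Y float64 }

func dist(a, b point) float64 { return math.Hypot(a.X-b.X, a.Y-b.Y) }

// neighbors returns the indices of all points within eps of pts[i], including i itself.
func neighbors(pts []point, i int, eps float64) []int {
	var out []int
	for j := range pts {
		if dist(pts[i], pts[j]) <= eps {
			out = append(out, j)
		}
	}
	return out
}

// dbscan labels every point with a cluster id (0, 1, ...) or -1 for noise.
func dbscan(pts []point, eps float64, minPts int) []int {
	const unvisited, noise = -2, -1
	labels := make([]int, len(pts))
	for i := range labels {
		labels[i] = unvisited
	}
	cluster := 0
	for i := range pts {
		if labels[i] != unvisited {
			continue
		}
		seeds := neighbors(pts, i, eps)
		if len(seeds) < minPts {
			labels[i] = noise // not a core sample; may be claimed as a border point later
			continue
		}
		labels[i] = cluster
		for q := 0; q < len(seeds); q++ { // seeds grows while the cluster expands
			j := seeds[q]
			if labels[j] == noise {
				labels[j] = cluster // former noise becomes a border point
			}
			if labels[j] != unvisited {
				continue
			}
			labels[j] = cluster
			if nb := neighbors(pts, j, eps); len(nb) >= minPts {
				seeds = append(seeds, nb...) // j is itself a core sample
			}
		}
		cluster++
	}
	return labels
}

func main() {
	pts := []point{{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}, {50, 50}}
	fmt.Println(dbscan(pts, 1.5, 3))
}
```

In the improved flow, this pass would be repeated for several settings of ε, MinPts, F and K, keeping the result with the best silhouette coefficient.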
On the basis of the above embodiment, in which the improved DBSCAN clustering algorithm is adopted, the scheduling evaluation model is trained as follows:
Based on the training samples and the evaluation information label corresponding to each training sample in the training sample set, a plurality of classification decision tree models are iteratively trained with a loss function containing a regularization term to obtain the scheduling evaluation model.
Specifically, in the embodiment of the present invention, the scheduling evaluation model can be obtained by iteratively training a plurality of classification decision tree models on the training sample set and the evaluation information label corresponding to each training sample, using a loss function containing a regularization term. The classification decision tree model may be a CART classification decision tree model.
Here, the iterative training follows the Boosting idea: a series of CART classification decision tree models serve as weak learners (with a low degree of fit), multiple individual learners are trained according to the negative gradient of the loss function, and they are integrated by gradient boosting into an accurate and reliable strong learner (with a fit close to the true values).
Iterative training starts from a constant prediction. Each iteration generates one CART classification decision tree to fit the residual of the existing model, and the accuracy of the existing model is improved over multiple rounds of iteration. In addition, considering problems such as class imbalance and high dimensionality in the labeled training sample set, a regularization term is added to the loss function during iterative training to penalize model complexity, which improves the generalization ability of the model and prevents overfitting. Second-order derivatives are used to make the approximation of the loss function more accurate, and ensemble learning mitigates the class imbalance problem.
On the basis of the above embodiment, in the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention, during the iterative training, the loss function used in the current round of iterative training is determined based on the first-order gradient and the second-order gradient of the loss function used in the previous round of iterative training.
Specifically, in the embodiment of the present invention, assume during iterative training that the training sample set contains a training samples with b features, so that it can be expressed as $D=\{(x_i,y_i)\}$, $x_i\in\mathbb{R}^b$, $y_i\in\mathbb{R}$, where $x_i$ denotes the features of the i-th training sample in the training sample set and $y_i$ denotes the evaluation information label corresponding to the i-th training sample.
The CART classification decision tree models are combined using an additive model and a forward stagewise algorithm. The forward stagewise algorithm optimizes while new models are being added; specifically, each newly added model fits the residual left by the previously fitted model. In terms of the algorithm model, the Boosting Tree is an additive model of decision trees, so the final predicted value of K trees is:
$$\hat{y}_i=\sum_{k=1}^{K} f_k(x_i),\quad f_k\in F$$
$$F=\{f(x)=\omega_{q(x)}\},\quad q:\mathbb{R}^b\to T,\quad \omega\in\mathbb{R}^T$$
where each function $f_k$ corresponds to an independent tree structure q and leaf weights ω; q maps a training sample to the corresponding leaf label; each leaf node of each CART tree corresponds to a continuous score, that is, a weight, and the score of the i-th leaf node is $\omega_i$; T is the number of leaf nodes; F is the set of CART classification decision tree models; and $\omega_{q(x)}$ is the predicted value obtained by each CART classification decision tree model. For each training sample, each CART classification decision tree model sorts the sample into a leaf node according to its own classification rules, and the final predicted value is obtained by summing the scores ω of the corresponding leaves.
The loss function of the model can be expressed as:
$$L=\sum_{i=1}^{n} l(y_i,\hat{y}_i)$$
where n is the number of training samples (denoted a above).
The prediction accuracy of the model is determined jointly by its bias and variance. The loss function represents the bias of the model; to keep the variance small, a regularization term is added to the objective function to prevent overfitting. Therefore, in order to learn the set of functions in the model, the objective function of the model is defined as:
$$Obj=\sum_{i=1}^{n} l(y_i,\hat{y}_i)+\sum_{k=1}^{K}\Omega(f_k),\qquad \Omega(f)=\gamma T+\frac{1}{2}\lambda\lVert\omega\rVert^{2}$$
where $l$ is a differentiable convex loss function that measures the gap between the predicted value and the label value, and $\Omega(f_k)$ is a regularization term that penalizes the complexity of the model. This additional regularization term smooths the weights of the leaf nodes and avoids overfitting. γ and λ are regularization parameters that control model complexity: the larger their values, the less likely the model is to overfit. T is the number of leaf nodes in the tree.
The model needs to obtain $f_k$ for every tree. Because the algorithm is based on the Boosting Tree model, all trees cannot be obtained at once; the t-th tree can only be obtained through iterative training. First, the initial boosting tree is defined as $\hat{y}_i^{(0)}=0$; the model obtained at the t-th iteration is then:
$$\hat{y}_i^{(t)}=\hat{y}_i^{(t-1)}+f_t(x_i)$$
After t iterations and accumulation, the objective function is:
$$Obj^{(t)}=\sum_{i=1}^{n} l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega(f_t)+\text{constant}$$
A second-order Taylor expansion is then applied to the objective function of each training step. Using the second-order information makes convergence faster and more accurate, because the first derivative gives the gradient direction and the second derivative describes how the gradient changes; combining the two approximates the true loss function more precisely. The objective function after the second-order Taylor expansion is:
$$Obj^{(t)}\approx\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t)+\text{constant}$$
where $g_i$ and $h_i$ are the first-order and second-order gradients of the loss function $l\left(y_i,\hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$, respectively. Removing the constant terms gives the simplified objective function of the t-th training round:
$$\widetilde{Obj}^{(t)}=\sum_{i=1}^{n}\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t)$$
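To make the role of the first-order and second-order gradients concrete, the sketch below computes $g_i$ and $h_i$ for one leaf and the leaf weight that minimizes the simplified objective. Two assumptions are made that the text does not state: the loss is taken to be the logistic loss, for which $g_i=p_i-y_i$ and $h_i=p_i(1-p_i)$ with $p_i$ the sigmoid of the current score, and the regularization term is taken to be $\gamma T+\frac{1}{2}\lambda\lVert\omega\rVert^2$, which gives the closed-form weight $\omega^{*}=-G/(H+\lambda)$ with $G=\sum g_i$ and $H=\sum h_i$ over the samples in the leaf. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(z float64) float64 { return 1 / (1 + math.Exp(-z)) }

// gradHess returns g_i and h_i of the logistic loss (an assumed loss)
// with respect to the raw score from the previous round.
func gradHess(y, score float64) (g, h float64) {
	p := sigmoid(score)
	return p - y, p * (1 - p)
}

// leafWeight returns -G/(H+lambda), the weight that minimizes
// sum(g_i*w + 0.5*h_i*w^2) + 0.5*lambda*w^2 over the samples in one leaf.
func leafWeight(ys, scores []float64, lambda float64) float64 {
	var G, H float64
	for i := range ys {
		g, h := gradHess(ys[i], scores[i])
		G += g
		H += h
	}
	return -G / (H + lambda)
}

func main() {
	// Three samples in one leaf, all with previous score 0 (p = 0.5).
	ys := []float64{1, 1, 0}
	scores := []float64{0, 0, 0}
	fmt.Println(leafWeight(ys, scores, 1.0))
}
```

A larger λ shrinks the leaf weight toward zero, which is how the regularization term limits the contribution of any single tree.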
The final output of the model is the accumulation of the outputs of this series of weak learners. This training mode not only retains the optimal feature splits of the CART classification decision tree but also, through cumulative approximation, avoids the tendency of decision trees to overfit, finally yielding a scheduling evaluation model that both fits the training data well and generalizes sufficiently.
By tuning the training results, the final scheduling evaluation model can predict the node number of the most suitable Node to which the current Pod should be assigned, and this prediction result can be used to continue iteratively optimizing the scheduler.
After training of the scheduling evaluation model is completed, the required data continue to be collected and, together with the original data, are used as new training data to keep training the scheduling evaluation model and improve its prediction accuracy.
In summary, the Kubernetes-based physical resource scheduling method provided in the embodiment of the present invention adopts an artificial intelligence algorithm. When Kubernetes schedules the Pods of an application, it no longer schedules simply according to static resource allocation, but makes a comprehensive evaluation that combines historical resource usage, the history of the application to which the Pod belongs, the current resource usage of the system, and a prediction of future resource usage, so as to obtain an optimal result. Its advantages are as follows:
1) It solves the problem of Pods being scheduled in a concentrated manner, which the static scheduling mechanism of the native scheduler may cause.
2) It introduces an artificial intelligence algorithm for comprehensive node evaluation, which considers both historical cluster resource usage and an estimate of possible future usage, so the scheduling result is more accurate.
3) It supports deployment through the Kubernetes service mechanism, which prevents the scheduler from becoming a single point of failure.
4) It introduces the new scheduling mechanism through native Kubernetes mechanisms, without intrusive modification of Kubernetes itself, and the code is fully decoupled.
As shown in FIG. 2, on the basis of the above embodiment, an embodiment of the present invention provides a scheduler, including:
an information acquisition module 21, configured to obtain scheduling information in the Kubernetes cluster, the scheduling information including Pod information of the Pod to be scheduled and load information of each node;
an evaluation module 22, configured to traverse the nodes, input the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtain the evaluation information, output by the scheduling evaluation model, for the case where the Pod to be scheduled is scheduled to the current node;
a scheduling module 23, configured to schedule the Pod to be scheduled based on the evaluation information corresponding to each node;
wherein the scheduling evaluation model is trained based on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes the Pod information of a Pod sample and the load information of the node sample to which the Pod sample was scheduled.
On the basis of the above embodiment, the scheduler provided in the embodiment of the present invention includes a clustering module, configured to:
perform cluster analysis on the training samples based on a density clustering algorithm to obtain a plurality of clusters;
collectively label the training samples contained in the plurality of clusters based on the sample features in the clusters, to obtain the evaluation information label corresponding to each training sample.
On the basis of the above embodiment, in the scheduler provided in the embodiment of the present invention, the clustering module is configured to:
traverse the training samples and, based on the number of other training samples in the preset neighborhood of the current training sample, judge whether the current training sample is a core sample; if the current training sample is a core sample, store it in the core sample set;
traverse each core sample in the core sample set and, based on the membership degree between the current core sample and the other training samples in its preset neighborhood, determine whether those other training samples belong to the same initial cluster as the current core sample;
after the initial clusters corresponding to the core samples in the core sample set are determined, calculate the silhouette coefficient between the initial clusters corresponding to the core samples;
adjust the preset parameters of the density clustering algorithm, repeat the traversal, calculate a plurality of silhouette coefficients, and determine the plurality of clusters based on the plurality of silhouette coefficients.
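The silhouette coefficient used to compare parameter settings can be sketched as follows. For each clustered sample, a is its mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the sample's score is (b - a) / max(a, b); the overall coefficient is the mean of these scores. The code is a minimal sketch that assumes noise is marked with a negative label and is skipped, and that a caller-supplied function returns the distance between two samples; `silhouette` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// silhouette returns the mean silhouette coefficient of all samples whose
// label is >= 0. d(i, j) must return the distance between samples i and j.
func silhouette(labels []int, d func(i, j int) float64) float64 {
	total, count := 0.0, 0
	for i, li := range labels {
		if li < 0 {
			continue // noise samples are ignored in this sketch
		}
		sum := map[int]float64{} // summed distance from i to each cluster
		cnt := map[int]int{}     // number of other samples per cluster
		for j, lj := range labels {
			if j == i || lj < 0 {
				continue
			}
			sum[lj] += d(i, j)
			cnt[lj]++
		}
		count++
		if cnt[li] == 0 {
			continue // singleton cluster contributes a score of 0
		}
		a := sum[li] / float64(cnt[li])
		b := math.Inf(1)
		for c, s := range sum {
			if c != li {
				b = math.Min(b, s/float64(cnt[c]))
			}
		}
		if math.IsInf(b, 1) {
			continue // only one cluster exists: score 0
		}
		if m := math.Max(a, b); m > 0 {
			total += (b - a) / m
		}
	}
	if count == 0 {
		return 0
	}
	return total / float64(count)
}

func main() {
	xs := []float64{0, 1, 10, 11}
	d := func(i, j int) float64 { return math.Abs(xs[i] - xs[j]) }
	fmt.Println(silhouette([]int{0, 0, 1, 1}, d)) // well-separated labelling
	fmt.Println(silhouette([]int{0, 1, 0, 1}, d)) // poor labelling scores lower
}
```

The parameter setting whose clustering yields the highest coefficient would be retained.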
On the basis of the above embodiment, the scheduler provided in the embodiment of the present invention further includes a distance calculation module, configured to:
calculate the Hausdorff distance between the current training sample and each of the training samples;
if the Hausdorff distance is less than the preset radius of the preset neighborhood, determine that the training sample corresponding to that Hausdorff distance is one of the other training samples in the preset neighborhood of the current training sample.
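The Hausdorff distance between two point sets A and B is the larger of the two directed distances, where the directed distance from A to B is the greatest distance from any point of A to its nearest point in B. The sketch below assumes that each training sample is represented as a set of equal-length feature vectors and that the point-to-point metric is Euclidean; both are assumptions for illustration, and `vec`, `directed`, `hausdorff` and `inNeighborhood` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

type vec []float64

// euclid returns the Euclidean distance between two vectors of equal length.
func euclid(a, b vec) float64 {
	sum := 0.0
	for i := range a {
		diff := a[i] - b[i]
		sum += diff * diff
	}
	return math.Sqrt(sum)
}

// directed returns max over a in A of (min over b in B of euclid(a, b)).
func directed(A, B []vec) float64 {
	worst := 0.0
	for _, a := range A {
		nearest := math.Inf(1)
		for _, b := range B {
			nearest = math.Min(nearest, euclid(a, b))
		}
		worst = math.Max(worst, nearest)
	}
	return worst
}

// hausdorff returns the symmetric Hausdorff distance between A and B.
func hausdorff(A, B []vec) float64 {
	return math.Max(directed(A, B), directed(B, A))
}

// inNeighborhood reports whether B lies in the preset neighborhood of A,
// using the strict "less than the preset radius" test from the text.
func inNeighborhood(A, B []vec, radius float64) bool {
	return hausdorff(A, B) < radius
}

func main() {
	A := []vec{{0, 0}, {1, 0}}
	B := []vec{{0, 0}, {4, 0}}
	fmt.Println(hausdorff(A, B), inNeighborhood(A, B, 3.5))
}
```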
On the basis of the above embodiment, the scheduler provided in the embodiment of the present invention further includes a membership calculation module, configured to:
traverse each other training sample in the preset neighborhood of the current core sample, and determine several neighboring samples that are close to the current other training sample and lie within the preset neighborhood of the current core sample;
determine the membership degree between the current other training sample and the current core sample based on the mean, maximum and minimum of the distances between the current other training sample and the neighboring samples.
On the basis of the above embodiment, the scheduler provided in the embodiment of the present invention further includes a training module, configured to:
iteratively train a plurality of classification decision tree models with a loss function containing a regularization term, based on the training samples and the evaluation information label corresponding to each training sample in the training sample set, to obtain the scheduling evaluation model.
On the basis of the above embodiment, in the scheduler provided in the embodiment of the present invention, during the iterative training, the loss function used in the current round of iterative training is determined based on the first-order gradient and the second-order gradient of the loss function used in the previous round of iterative training.
On the basis of the above embodiment, the scheduler provided in the embodiment of the present invention allows a custom priority function to be added to the prioritization process when scheduling a Pod. During prioritization, the function in the custom scheduler is called over HTTP to obtain the corresponding prioritization result, and the prioritization decision is made by combining that result with its preset weight.
The following describes the work involved in Kubernetes to ensure that the scheduler can perform scheduling:
The Kubernetes API server provides a Watch mechanism through which peripheral components obtain various kinds of information in the Kubernetes cluster. The Pod Cache uses this mechanism to obtain changes to Pods in the cluster, organizes the data, and records it in memory for use by the priority function.
Therefore, before the scheduling evaluation model is used for evaluation, the real-time monitoring data required for node evaluation must be obtained from the database.
Kubernetes has a cache module that obtains real-time monitoring data through the Watch mechanism and stores it in memory. A) The cache module watches the change information of Pods in the cluster, stores it, and provides a query interface. B) Listening for Pod changes is started through the following steps:
a) When the program starts, a parameter is set that specifies the location of the Kubernetes config file in the running environment.
b) Based on the information in the config file, a Kubernetes client is created, and a SharedInformerFactory is generated through that client.
c) Core().V1().Pods() is called on the Factory to generate the corresponding Informer and start the Watch action.
d) Three functions, PodAdd, PodUpdate and PodDelete, are provided to process the information returned by the Watch.
The data structures of the cache module are defined as follows:
a) Because the structure corev1.Pod, in which watched changes are returned, already contains fairly detailed Pod information, this structure is simply wrapped and named PodInfo.
b) HostPodCache is defined. It contains a map of type map[types.UID]*PodInfo that stores the information of all valid Pods on one Node.
c) PodInfoCache is defined. It contains a map of type map[string]*HostPodCache that stores the Pod information of all Nodes in the cluster. The key of the map is the name of the Node, which is obtained from the Kubernetes cluster.
The cached information in the cache module is created and maintained as follows:
a) The program defines one globally unique PodInfoCache instance, which remains valid for the entire run of the program and whose data is continuously maintained.
b) When the Watch action takes effect, the API server returns the information of all Pods currently in the cluster in one batch, and the module uses the handler functions to organize this Pod information and store it in the previously defined PodInfoCache structure.
c) When a Pod in the cluster changes, the module receives the information sent by the API server, selects the handler function that matches the kind of change, and updates the Pod information in the cached data structure.
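The cache structures and the three handlers described above can be sketched with the standard library alone. In this sketch, `Pod` is a trimmed, hypothetical stand-in for corev1.Pod (only the UID, name and node name are kept), string keys replace types.UID, and the mutex and the `PodsOnNode` query helper are assumptions added for illustration; in a real deployment the handlers would be registered on the Pod Informer.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical, trimmed stand-in for corev1.Pod.
type Pod struct {
	UID      string
	Name     string
	NodeName string
}

// PodInfo wraps the Pod object, as described in the text.
type PodInfo struct{ Pod Pod }

// HostPodCache holds all valid Pods on one Node, keyed by Pod UID.
type HostPodCache struct{ Pods map[string]*PodInfo }

// PodInfoCache holds the Pods of every Node, keyed by Node name.
type PodInfoCache struct {
	mu    sync.RWMutex
	Hosts map[string]*HostPodCache
}

func NewPodInfoCache() *PodInfoCache {
	return &PodInfoCache{Hosts: make(map[string]*HostPodCache)}
}

// put and remove assume the caller already holds c.mu.
func (c *PodInfoCache) put(p Pod) {
	if p.NodeName == "" {
		return // not yet bound to a Node
	}
	host, ok := c.Hosts[p.NodeName]
	if !ok {
		host = &HostPodCache{Pods: make(map[string]*PodInfo)}
		c.Hosts[p.NodeName] = host
	}
	host.Pods[p.UID] = &PodInfo{Pod: p}
}

func (c *PodInfoCache) remove(p Pod) {
	host, ok := c.Hosts[p.NodeName]
	if !ok {
		return
	}
	delete(host.Pods, p.UID)
	if len(host.Pods) == 0 {
		delete(c.Hosts, p.NodeName)
	}
}

func (c *PodInfoCache) PodAdd(p Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.put(p)
}

func (c *PodInfoCache) PodUpdate(oldPod, newPod Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.remove(oldPod) // the Pod may have moved to another Node
	c.put(newPod)
}

func (c *PodInfoCache) PodDelete(p Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.remove(p)
}

// PodsOnNode is a hypothetical query helper for the priority function.
func (c *PodInfoCache) PodsOnNode(node string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if host, ok := c.Hosts[node]; ok {
		return len(host.Pods)
	}
	return 0
}

func main() {
	cache := NewPodInfoCache()
	cache.PodAdd(Pod{UID: "u1", Name: "web-0", NodeName: "node-a"})
	fmt.Println(cache.PodsOnNode("node-a"))
}
```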
The priority function used by the scheduler when scheduling a Pod has the following format, which meets the format requirements for priority functions in the scheduler extension mechanism provided by Kubernetes: Func func(pod v1.Pod, nodes []v1.Node) (*schedulerapi.HostPriorityList, error).
The above priority function is implemented as follows:
a) Create a schedulerapi.HostPriorityList to hold the returned results.
b) Obtain the current static resource data of the cluster, the dynamic monitoring data, the Pod cache data and the real-time load data of all nodes.
c) Traverse the incoming list of cluster Nodes: after preprocessing, input the current node information, the information of the Pod to be scheduled, the related application information and the other collected information into the evaluation model to obtain the evaluation result of this Node for the current Pod, and save it in the result set.
d) Return the result set, which contains the score of every Node in the incoming list.
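Steps a) to d) can be sketched as below. The types `Pod`, `Node`, `HostPriority` and `HostPriorityList` are local, simplified stand-ins for the Kubernetes types named in the text, and `Evaluator` is a hypothetical hook for the scheduling evaluation model. The sketch also assumes that the model returns a value in [0, 1] and that scores are mapped to an integer range of 0 to 10; data collection and preprocessing from step b) are left out.

```go
package main

import (
	"fmt"
	"math"
)

// Simplified, hypothetical stand-ins for v1.Pod, v1.Node and the
// schedulerapi.HostPriority types.
type Pod struct {
	Name       string
	CPURequest float64
}

type Node struct {
	Name    string
	CPULoad float64 // fraction of CPU currently in use
}

type HostPriority struct {
	Host  string
	Score int
}

type HostPriorityList []HostPriority

// Evaluator is a hypothetical hook for the scheduling evaluation model.
// It is assumed to return a value in [0, 1]; higher means more suitable.
type Evaluator func(pod Pod, node Node) (float64, error)

// prioritize scores every node for the given pod and returns the result set.
func prioritize(pod Pod, nodes []Node, eval Evaluator) (*HostPriorityList, error) {
	result := make(HostPriorityList, 0, len(nodes)) // step a)
	for _, n := range nodes {                       // step c)
		s, err := eval(pod, n)
		if err != nil {
			return nil, fmt.Errorf("evaluate node %s: %w", n.Name, err)
		}
		score := int(math.Round(s * 10)) // map to the assumed 0..10 range
		if score < 0 {
			score = 0
		}
		if score > 10 {
			score = 10
		}
		result = append(result, HostPriority{Host: n.Name, Score: score})
	}
	return &result, nil // step d)
}

func main() {
	// Toy evaluator: prefer nodes with more idle CPU.
	eval := func(p Pod, n Node) (float64, error) { return 1 - n.CPULoad, nil }
	nodes := []Node{{Name: "node-a", CPULoad: 0.2}, {Name: "node-b", CPULoad: 0.9}}
	list, err := prioritize(Pod{Name: "web-0", CPURequest: 0.5}, nodes, eval)
	fmt.Println(*list, err)
}
```

Keeping the model behind a function value makes it straightforward to swap the toy evaluator for a call to the trained model.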
Specifically, the functions of the modules in the scheduler provided in the embodiment of the present invention correspond one-to-one to the operation flow of the steps in the above method embodiments, and the effects achieved are the same. For details, refer to the above embodiments; they are not repeated here.
FIG. 3 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 310, a communications interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communications interface 320 and the memory 330 communicate with one another through the communication bus 340. The processor 310 can call the logic instructions in the memory 330 to execute the Kubernetes-based physical resource scheduling method provided in the above embodiments, the method including: obtaining scheduling information in the Kubernetes cluster, the scheduling information including Pod information of the Pod to be scheduled and load information of each node; traversing the nodes, inputting the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtaining the evaluation information, output by the scheduling evaluation model, for the case where the Pod to be scheduled is scheduled to the current node; and scheduling the Pod to be scheduled based on the evaluation information corresponding to each node; wherein the scheduling evaluation model is trained based on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes the Pod information of a Pod sample and the load information of the node sample to which the Pod sample was scheduled.
In addition, when the logic instructions in the memory 330 are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the Kubernetes-based physical resource scheduling method provided in the above embodiments, the method including: obtaining scheduling information in the Kubernetes cluster, the scheduling information including Pod information of the Pod to be scheduled and load information of each node; traversing the nodes, inputting the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtaining the evaluation information, output by the scheduling evaluation model, for the case where the Pod to be scheduled is scheduled to the current node; and scheduling the Pod to be scheduled based on the evaluation information corresponding to each node; wherein the scheduling evaluation model is trained based on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes the Pod information of a Pod sample and the load information of the node sample to which the Pod sample was scheduled.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the Kubernetes-based physical resource scheduling method provided in the above embodiments, the method including: obtaining scheduling information in the Kubernetes cluster, the scheduling information including Pod information of the Pod to be scheduled and load information of each node; traversing the nodes, inputting the Pod information of the Pod to be scheduled and the load information of the current node into a scheduling evaluation model, and obtaining the evaluation information, output by the scheduling evaluation model, for the case where the Pod to be scheduled is scheduled to the current node; and scheduling the Pod to be scheduled based on the evaluation information corresponding to each node; wherein the scheduling evaluation model is trained based on a training sample set and the evaluation information label corresponding to each training sample in the training sample set, and each training sample includes the Pod information of a Pod sample and the load information of the node sample to which the Pod sample was scheduled.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210286835.2A CN116841718A (en) | 2022-03-22 | 2022-03-22 | Physical resource scheduling method and scheduler based on Kubernetes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116841718A (en) | 2023-10-03 |
Family
ID=88163953
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210286835.2A | Pending | CN116841718A (en) | 2022-03-22 | 2022-03-22 | Physical resource scheduling method and scheduler based on Kubernetes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116841718A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118312331A (en) * | 2024-06-11 | 2024-07-09 | 湖南科技大学 | Cloud-edge cooperative calculation dynamic regulation and control method |
CN118312331B (en) * | 2024-06-11 | 2024-08-06 | 湖南科技大学 | Cloud-edge cooperative calculation dynamic regulation and control method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |