
CN114529004A - Quantum clustering method based on nearest neighbor KNN and improved wave function - Google Patents

Quantum clustering method based on nearest neighbor KNN and improved wave function

Info

Publication number
CN114529004A
Authority
CN
China
Prior art keywords
nearest neighbor
wave function
calculating
parameters
quantum clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210151032.6A
Other languages
Chinese (zh)
Inventor
陈云霞
朱家晓
王聪
林坤松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210151032.6A priority Critical patent/CN114529004A/en
Publication of CN114529004A publication Critical patent/CN114529004A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides a quantum clustering method based on the nearest neighbor KNN and an improved wave function. The method comprises: acquiring the raw data of a group of sample points to be classified and normalizing it; determining the input parameter of the quantum clustering model based on the nearest neighbor KNN; calculating the wave function parameters of all sample points, the wave function parameters comprising the scale parameter and the shape parameter of the distribution obeyed by the wave function; calculating the potential energy surface of the quantum clustering; and determining the number of classes and the classification boundaries from the calculated potential energy surface. The proposed method inherits all the advantages of the quantum clustering method and is better suited to classifying data that obey a Weibull distribution, providing a new option for data classification. At the same time, the input parameter of the quantum clustering model can be computed without any manually specified parameters and without classification labels for the sample data, so the method is highly practical and accurate.

Description

Quantum clustering method based on nearest neighbor KNN and improved wave function

Technical Field

The invention belongs to the technical field of clustering and relates to the classification of data, in particular to a quantum clustering method based on the nearest neighbor KNN and an improved wave function.

Background

The quantum clustering method is a clustering model built on quantum mechanics. Its core idea is that particles tend to settle where the potential energy is lowest, and particles of the same class gather around the same potential energy minimum. Cluster analysis can therefore be performed by computing the potential energy surface on which the particles sit: each minimum of the potential energy surface represents a cluster center, the number of minima gives the number of clusters, and each particle is finally assigned to a cluster center according to its potential energy value. Compared with other clustering methods, quantum clustering does not require the number of clusters to be given in advance, is more sensitive to changes in data density, is insensitive to small changes in its input parameter, and produces a completely deterministic classification result for a given parameter setting. These advantages have earned quantum clustering wide attention since it was first proposed.
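For reference only, the sketch below illustrates this classical construction with a Gaussian wave function in one dimension (the formulation summarized above, in the style of Horn and Gottlieb's quantum clustering); the function name, the kernel width sigma and the grid are illustrative assumptions and not part of the present invention, which replaces the Gaussian kernel with a Weibull-based wave function as described later.

```python
# Reference sketch of classical Gaussian quantum clustering (NOT the improved
# Weibull wave function of this invention): the wave function is a sum of
# Gaussian kernels centred on the samples, and the potential is recovered from
# the Schrodinger equation so that cluster centres sit at its minima.
import numpy as np

def gaussian_qc_potential(samples, sigma, grid):
    """samples: 1-D data points; sigma: kernel width; grid: uniformly spaced 1-D grid."""
    samples = np.asarray(samples, dtype=float)[:, None]
    grid = np.asarray(grid, dtype=float)
    psi = np.exp(-(grid[None, :] - samples) ** 2 / (2.0 * sigma ** 2)).sum(axis=0)
    h = grid[1] - grid[0]
    lap = np.gradient(np.gradient(psi, h), h)   # finite-difference second derivative of psi
    v = 0.5 * sigma ** 2 * lap / psi            # V(x) - E for the Gaussian wave function
    return v - v.min()                          # shift so that min V = 0 (choice of E)
```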

Quantum clustering assumes that the wave function obeys a Gaussian distribution; in some cases, however, the data to be classified may not follow a normal distribution. Moreover, although quantum clustering is often described as a parameter-free method, it still needs an input model parameter, which is currently computed either with the KNN method or with a "pattern search" method. The nearest neighbor KNN method can compute the input parameter without sample labels, but it still requires the number of neighbors to be specified manually, which prevents quantum clustering from being truly parameter-free. The "pattern search" method no longer needs manually specified parameters, but it requires sample classification labels, which contradicts the core characteristic of quantum clustering as unsupervised learning. Therefore, to make the method suitable for classifying data that follow the Weibull distribution common in engineering, and to turn quantum clustering into a genuinely parameter-free method, it is urgent and necessary to seek a quantum clustering method based on the nearest neighbor KNN and an improved wave function that determines the input parameter of the quantum clustering model without manually specified parameters and without sample classification labels, and that offers a new option for data classification.

Summary of the Invention

To address the above defects in the prior art, the present invention proposes a quantum clustering method based on the nearest neighbor KNN and an improved wave function. The method comprises: acquiring the raw data of a group of sample points to be classified and normalizing it; determining the input parameter of the quantum clustering model based on the nearest neighbor KNN; calculating the wave function parameters of all sample points, the wave function parameters comprising the scale parameter and the shape parameter of the distribution obeyed by the wave function; calculating the potential energy surface of the quantum clustering; and determining the number of classes and the classification boundaries from the calculated potential energy surface. The proposed method inherits all the advantages of the quantum clustering method, is better suited to classifying data that obey a Weibull distribution, and provides a new option for data classification, while the input parameter of the quantum clustering model can be computed without any manually specified parameters and without classification labels for the sample data, making the method practical and accurate.

The present invention provides a quantum clustering method based on the nearest neighbor KNN and an improved wave function, which comprises the following steps:

S1. Obtain the raw data of a group of sample points to be classified and normalize it;

S2. Determine the input parameter of the quantum clustering model based on the nearest neighbor KNN, the input parameter being the variance s_weibull of the distribution obeyed by the wave function;

S21. Compute the nearest neighbor KNN result s_knn^K for every possible number of neighbors K;

S22. Compute the increment Δs_knn^K of the nearest neighbor KNN results;

S23. Find the number of neighbors corresponding to the maximum increment of the nearest neighbor KNN results: locate the maximum value among the increments Δs_knn^K and record the corresponding number of neighbors K_max;

S24. Take the nearest neighbor KNN result corresponding to this number of neighbors as the input parameter of the quantum clustering model: from all the nearest neighbor KNN results s_knn^K obtained in step S21, select the result s_knn^(K_max) corresponding to the K_max obtained in step S23 and use it as the input parameter of the quantum clustering model, s_weibull = s_knn^(K_max);

S3. According to the input parameter s_weibull provided by step S2, calculate the wave function parameters of all sample points, the wave function parameters comprising the scale parameter and the shape parameter of the distribution obeyed by the wave function;

S31. Obtain the i-th normalized sample point x̃_i from step S1 and the input parameter s_weibull provided by step S2;

S32. Compute the quotient of the i-th sample point x̃_i and the input parameter s_weibull, and obtain the shape parameter β_i corresponding to x̃_i from the relation between this quotient and the shape parameter:

Figure BDA00035038076500000211

where Γ(·) denotes the gamma function;

S33. Calculate the scale parameter η_i corresponding to the i-th sample point x̃_i:

Figure BDA00035038076500000213

S34. Repeat steps S31 to S33 to calculate the shape parameter β_i and the scale parameter η_i corresponding to every sample point;

S4. Calculate the potential energy surface of the quantum clustering;

S41. Calculate the wave function ψ_wei(x):

Figure BDA0003503807650000031

S42. Calculate the potential energy surface function V_wei(x):

Figure BDA0003503807650000032

where E denotes the energy constant set by the quantum clustering model;

S5. Determine the number of classes and the classification boundaries from the potential energy surface calculated in step S4: the number of classes equals the number of minima of the potential energy surface function V_wei(x), and the classification boundaries are jointly determined by the potential energy saddle points and the potential energy maxima between the minima.

Further, step S21 specifically comprises the following steps:

S211. For the normalized data x̃_i of the i-th sample point to be classified, find the K data points x̃_(i,1), …, x̃_(i,K) nearest to that sample point;

S212. Based on the normalized data of the sample points to be classified, calculate the average distance y_i^K between the i-th sample point to be classified and its K nearest sample points, namely

y_i^K = (1/K) · Σ_{j=1,…,K} ‖x̃_i − x̃_(i,j)‖

S213. Repeat steps S211 to S212 to calculate the average distances y_1^K, …, y_N^K corresponding to all the sample points to be classified, i = 1 to i = N, where N denotes the number of sample points to be classified;

S214. Calculate the average of the distance averages y_i^K of all the sample points to be classified, and take it as the nearest neighbor KNN result s_knn^K for K neighbors, namely:

s_knn^K = (1/N) · Σ_{i=1,…,N} y_i^K

S215. Repeat steps S211 to S214 to calculate the nearest neighbor KNN results s_knn^K for all numbers of neighbors from K = 1 to K = N−1;

Step S22 specifically comprises the following steps:

S221. From all the calculated nearest neighbor KNN results s_knn^K, calculate the increment Δs_knn^K of the nearest neighbor KNN result for K neighbors, namely:

Δs_knn^K = s_knn^(K+1) − s_knn^K

S222. Repeat step S221 to calculate the nearest neighbor KNN result increments Δs_knn^K for all numbers of neighbors from K = 1 to K = N−2;

Further, step S1 specifically comprises the following steps:

S11. Obtain the raw data X = [x_1, x_2, x_3, …, x_N] of a group of sample points to be classified;

S12. Find the maximum value x_max and the minimum value x_min in the raw data X of the sample points to be classified;

S13. Set the maximum value x̃_max and the minimum value x̃_min of the normalized data;

S14. Using the min-max normalization method, calculate the normalized data x̃_1, …, x̃_N of the sample points to be classified, where x̃_i is:

x̃_i = x̃_min + (x_i − x_min) · (x̃_max − x̃_min) / (x_max − x_min)

Preferably, the shape parameter calculated in step S32 is required to satisfy β_i > 2.

Preferably, in step S2 the wave function of the quantum clustering model obeys a Weibull distribution.

Preferably, in step S32 the shape parameter β_i corresponding to the i-th sample point x̃_i is obtained by numerical solution.

Preferably, the normalization method in step S14 comprises min-max normalization or zero-mean z-score normalization.

Compared with the prior art, the technical effects of the present invention are as follows:

1. Compared with other existing clustering methods, the quantum clustering method based on the nearest neighbor KNN and an improved wave function designed by the present invention inherits all the advantages of the quantum clustering method: classification is achieved without giving the number of clusters in advance, the method is more sensitive to changes in data density than other clustering methods, small changes in its input parameter do not affect the classification result, and the classification result is completely deterministic for a given parameter setting. Compared with existing quantum clustering methods, the proposed method, as a complement to quantum clustering, is better suited to classifying data that obey a Weibull distribution and provides a new option for data classification.

2. The quantum clustering method based on the nearest neighbor KNN and an improved wave function designed by the present invention traverses the nearest neighbor KNN results over the possible numbers of neighbors, determines the number of neighbors from the maximum increment of the nearest neighbor KNN results, and uses the corresponding nearest neighbor KNN result as the input parameter of the quantum clustering model, which is simple, practical and accurate. The input parameter of the quantum clustering model is computed without any manually specified parameters and without classification labels for the sample data, a substantial improvement over existing ways of specifying the parameters of quantum clustering.

Brief Description of the Drawings

Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings.

Fig. 1 is a flowchart of the quantum clustering method based on the nearest neighbor KNN and an improved wave function according to the present invention;

Fig. 2 is a flowchart of determining the input parameter of the quantum clustering model according to the present invention;

Fig. 3 is an example of three classes of randomly generated one-dimensional sample data according to the present invention;

Fig. 4 is an example of the three classes of randomly generated one-dimensional sample data after normalization according to the present invention;

Fig. 5 is a plot of the sample data divided by the input parameter for different Weibull shape parameters according to the present invention;

Fig. 6 is an example of the quantum clustering potential energy curve calculated by the present invention;

Fig. 7 is an example of two classes of randomly generated one-dimensional sample data according to the present invention;

Fig. 8 is an example of the two classes of randomly generated one-dimensional sample data after normalization according to the present invention;

Fig. 9 is an example of the nearest neighbor KNN results for different numbers of neighbors according to the present invention;

Fig. 10 is an example of the nearest neighbor KNN increments for different numbers of neighbors according to the present invention;

Fig. 11 is an example of the quantum clustering potential energy curve calculated from the input parameter provided by the present invention.

Detailed Description of the Embodiments

The present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention and do not limit the invention. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that, in the absence of conflict, the embodiments in the present application and the features of the embodiments may be combined with one another. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows the quantum clustering method based on the nearest neighbor KNN and an improved wave function of the present invention, which comprises the following steps.

S1. Obtain the raw data of a group of sample points to be classified and normalize it.

S11. Obtain the raw data X = [x_1, x_2, x_3, …, x_N] of a group of sample points to be classified.

S12. Find the maximum value x_max and the minimum value x_min in the raw data X of the sample points to be classified.

S13. Set the maximum value x̃_max and the minimum value x̃_min of the normalized data.

S14. Using the min-max normalization method, calculate the normalized data x̃_1, …, x̃_N of the sample points to be classified, where x̃_i is:

x̃_i = x̃_min + (x_i − x_min) · (x̃_max − x̃_min) / (x_max − x_min)    (1)

The normalization method includes, but is not limited to, min-max normalization or zero-mean z-score normalization.
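A minimal sketch of this normalization step is given below; the target range end-points (lo, hi) are illustrative values, since the values used in the embodiment appear only in the original figures.

```python
# Minimal sketch of step S1/S14: min-max normalization of the raw samples into
# a target range [lo, hi]; the default range here is an illustrative assumption.
import numpy as np

def normalize_min_max(x, lo=0.1, hi=1.0):
    x = np.asarray(x, dtype=float)
    return lo + (x - x.min()) * (hi - lo) / (x.max() - x.min())
```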

S2. Determine the input parameter of the quantum clustering model based on the nearest neighbor KNN; in the quantum clustering model the wave function obeys a Weibull distribution, and the input parameter is the variance s_weibull of the distribution obeyed by the wave function, as shown in Fig. 2.

S21. Compute the nearest neighbor KNN result s_knn^K for every possible number of neighbors K.

S211. For the normalized data x̃_i of the i-th sample point to be classified, find the K data points x̃_(i,1), …, x̃_(i,K) nearest to that sample point.

S212. Based on the normalized data of the sample points to be classified, calculate the average distance y_i^K between the i-th sample point to be classified and its K nearest sample points, namely

y_i^K = (1/K) · Σ_{j=1,…,K} ‖x̃_i − x̃_(i,j)‖

S213. Repeat steps S211 to S212 to calculate the average distances y_1^K, …, y_N^K corresponding to all the sample points to be classified, i = 1 to i = N, where N denotes the number of sample points to be classified.

S214. Calculate the average of the distance averages y_i^K of all the sample points to be classified, and take it as the nearest neighbor KNN result s_knn^K for K neighbors, namely:

s_knn^K = (1/N) · Σ_{i=1,…,N} y_i^K

S215. Repeat steps S211 to S214 to calculate the nearest neighbor KNN results s_knn^K for all numbers of neighbors from K = 1 to K = N−1.

S22. Compute the increment Δs_knn^K of the nearest neighbor KNN results.

S221. From all the calculated nearest neighbor KNN results s_knn^K, calculate the increment Δs_knn^K of the nearest neighbor KNN result for K neighbors, namely:

Δs_knn^K = s_knn^(K+1) − s_knn^K

S222. Repeat step S221 to calculate the nearest neighbor KNN result increments Δs_knn^K for all numbers of neighbors from K = 1 to K = N−2.

S23. Find the number of neighbors corresponding to the maximum increment of the nearest neighbor KNN results: locate the maximum value among the increments Δs_knn^K and record the corresponding number of neighbors K_max.

S24. Take the nearest neighbor KNN result corresponding to this number of neighbors as the input parameter of the quantum clustering model: from all the nearest neighbor KNN results s_knn^K obtained in step S21, select the result s_knn^(K_max) corresponding to the K_max obtained in step S23 and use it as the input parameter of the quantum clustering model, s_weibull = s_knn^(K_max).
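A minimal sketch of step S2 for one-dimensional data follows; the function name and the use of absolute differences as the distance are illustrative assumptions.

```python
# Sketch of step S2: for each K, compute the mean K-nearest-neighbour distance
# s_knn^K, take the largest increment between consecutive K as K_max, and use
# the corresponding s_knn^(K_max) as the input parameter s_weibull.
import numpy as np

def knn_input_parameter(x_norm):
    """x_norm: 1-D array of the N normalized sample points from step S1."""
    x_norm = np.asarray(x_norm, dtype=float)
    n = len(x_norm)
    # Sorted pairwise distances; column 0 is each point's distance to itself (0).
    d = np.sort(np.abs(x_norm[:, None] - x_norm[None, :]), axis=1)
    # s_knn[K-1] = mean over all points of the average distance to their K nearest neighbours.
    s_knn = np.array([d[:, 1:k + 1].mean() for k in range(1, n)])
    inc = np.diff(s_knn)                 # inc[K-1] = s_knn^(K+1) - s_knn^K, K = 1 .. N-2
    k_max = int(np.argmax(inc)) + 1      # number of neighbours with the largest increment
    s_weibull = s_knn[k_max - 1]         # nearest neighbour KNN result used as input parameter
    return k_max, s_weibull
```

Applied to data such as the two-class example in the embodiments below, a routine of this kind selects K_max and s_weibull directly from the data, with no manually chosen number of neighbors.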

S3. According to the input parameter s_weibull provided by step S2, calculate the wave function parameters of all sample points; the wave function parameters comprise the scale parameter and the shape parameter of the distribution obeyed by the wave function.

S31. Obtain the i-th normalized sample point x̃_i from step S1 and the input parameter s_weibull provided by step S2.

S32. Compute the quotient of the i-th sample point x̃_i and the input parameter s_weibull, and obtain the shape parameter β_i corresponding to x̃_i by numerically solving the relation between this quotient and the shape parameter:

Figure BDA0003503807650000077

where Γ(·) denotes the gamma function.

The calculated shape parameter is required to satisfy β_i > 2.

S33. Calculate the scale parameter η_i corresponding to the i-th sample point x̃_i:

Figure BDA0003503807650000079

S34. Repeat steps S31 to S33 to calculate the shape parameter β_i and the scale parameter η_i corresponding to every sample point.
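The exact quotient-shape relation of step S32 is reproduced only as an image in the original, so the sketch below rests on an explicit assumption: the Weibull mean is matched to the sample value x̃_i and the Weibull standard deviation to s_weibull. It should be read as an illustration of the numerical solution for β_i and η_i, not as the patented formula.

```python
# Hypothetical sketch of step S3 under a moment-matching ASSUMPTION (the actual
# relation in the patent is shown only as an image): Weibull mean = x_i and
# Weibull standard deviation = s_weibull, solved numerically for beta_i > 2.
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

def weibull_params(x_i, s_weibull):
    """Return (beta_i, eta_i) for one normalized sample point x_i."""
    ratio = x_i / s_weibull   # quotient used in step S32

    # For Weibull(eta, beta): mean/std = Gamma(1+1/b) / sqrt(Gamma(1+2/b) - Gamma(1+1/b)^2),
    # a function of the shape parameter b alone.
    def f(b):
        g1, g2 = gamma(1.0 + 1.0 / b), gamma(1.0 + 2.0 / b)
        return g1 / np.sqrt(g2 - g1 ** 2) - ratio

    # Numerical root search restricted to beta > 2, as the method requires;
    # raises ValueError when the ratio is not reachable inside this bracket.
    beta_i = brentq(f, 2.0 + 1e-6, 50.0)
    eta_i = x_i / gamma(1.0 + 1.0 / beta_i)   # scale parameter from the mean relation
    return beta_i, eta_i
```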

S4. Calculate the potential energy surface of the quantum clustering.

S41. Calculate the wave function ψ_wei(x):

Figure BDA00035038076500000710

S42. Calculate the potential energy surface function V_wei(x):

Figure BDA00035038076500000711

where E denotes the energy constant set by the quantum clustering model.

S5. Determine the number of classes and the classification boundaries from the potential energy surface calculated in step S4: the number of classes equals the number of minima of the potential energy surface function V_wei(x), and the classification boundaries are jointly determined by the potential energy saddle points and the potential energy maxima between the minima.
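Since the wave function and potential formulas of steps S41-S42 appear only as images in the original, the sketch below does not re-derive them; it only illustrates step S5 for one-dimensional data, taking a potential curve already evaluated on a grid and reading off the cluster count and boundaries. The function name and the grid-based extremum search are illustrative assumptions.

```python
# Illustrative sketch of step S5 in one dimension: given V(x) sampled on an
# ascending grid (output of step S4, not recomputed here), the local minima give
# the number of classes and the local maxima between them give the boundaries.
import numpy as np

def classify_from_potential(grid, v, samples):
    """grid, v: 1-D arrays of x values and potential values; samples: points to label."""
    grid, v = np.asarray(grid, dtype=float), np.asarray(v, dtype=float)
    idx = np.arange(1, len(v) - 1)
    minima = idx[(v[idx] < v[idx - 1]) & (v[idx] < v[idx + 1])]
    maxima = idx[(v[idx] > v[idx - 1]) & (v[idx] > v[idx + 1])]

    # Classification boundaries: potential maxima lying between the outermost minima.
    boundaries = np.array([grid[m] for m in maxima
                           if minima.size and grid[minima[0]] < grid[m] < grid[minima[-1]]])
    labels = np.searchsorted(np.sort(boundaries), samples)   # class index of each sample
    return len(minima), boundaries, labels
```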

The present invention is further described in detail below in conjunction with specific cases.

In one specific embodiment, Fig. 3 shows three classes of one-dimensional data samples randomly generated from three different sets of Weibull distribution parameters, used to illustrate the calculation flow of the method. In Fig. 3, the first class contains 50 data points, the second class contains 30 data points, and the third class contains 20 data points.

S1. Acquire the 100 data points to be classified in Fig. 3 and normalize them.

Set the maximum value x̃_max and the minimum value x̃_min of the normalized data.

Calculate the normalized data according to formula (1); the normalized data are shown in Fig. 4.

S2. Provide the input parameter s of the quantum clustering model; the provided input parameter s is the variance s_weibull of the Weibull distribution obeyed by the wave function.

The K value of the nearest neighbor KNN method is set to 25, and the provided input parameter is determined accordingly.

S3. According to the input parameter s_weibull provided in step S2, calculate the wave function parameters of all sample points.

In step S32 of step S3, the shape parameter β_i corresponding to the sample point x̃_i needs to be obtained by numerical solution, and the calculated shape parameter is required to satisfy β_i > 2; letting f_i denote the quantity defined by the relation in step S32 (Figure BDA0003503807650000084), the curve of f_i as a function of x_i is shown in Fig. 5.

S4. Calculate the potential energy surface of the quantum clustering; the curve of the calculated potential energy surface function as a function of x is shown in Fig. 6.

S5. Determine the number of classes and the classification boundaries from the potential energy surface calculated in step S4. As shown in Fig. 6, the potential energy curve contains three minima, so this group of data is divided into three classes, and the classification boundaries are determined by the maxima between the minima. It can also be seen from Fig. 6 that three data points of the second class are misclassified while all other data are classified correctly, which shows that the method achieves high classification accuracy.

In another specific embodiment, Fig. 7 shows two classes of randomly generated one-dimensional data samples, used to illustrate the calculation flow of step S2, i.e. determining the input parameter of the quantum clustering model based on the nearest neighbor KNN. In Fig. 7, the first class contains 50 data points and the second class contains 30 data points.

S1. Obtain the values of all data samples in Fig. 7 and normalize them.

The maximum value in the data samples is 5.7283 and the minimum value is 0.6202.

Calculate the normalized data according to formula (1); the normalized data are shown in Fig. 8.

S21. Calculate the KNN results for the number of neighbors K from 1 to 79 in turn. Fig. 9 shows the calculated nearest neighbor KNN results for the different numbers of neighbors K.

S22. Calculate the increments of the nearest neighbor KNN results. Fig. 10 shows the curve of the calculated nearest neighbor KNN increment as a function of the number of neighbors K.

S23. Find the maximum value of the nearest neighbor KNN increment and the corresponding number of neighbors K_max. As shown in Fig. 10, the increment of the nearest neighbor KNN result reaches its maximum at K = 29; that is, Δs_knn^29 is the largest of the nearest neighbor KNN result increments over all numbers of neighbors K.

S24. According to K_max = 29 obtained in step S23, take the corresponding nearest neighbor KNN result s_knn^29 calculated in step S21 as the input parameter of the quantum clustering model, s_weibull = 0.1108.

Fig. 11 shows the quantum clustering potential energy curve calculated from this input parameter. The number of minima of the potential energy curve represents the number of classes, and the sample data belonging to the same minimum constitute one class obtained by the quantum clustering.

Compared with other existing clustering methods, the quantum clustering method based on the nearest neighbor KNN and an improved wave function designed by the present invention inherits all the advantages of the quantum clustering method: classification is achieved without giving the number of clusters in advance, the method is more sensitive to changes in data density than other clustering methods, small changes in its input parameter do not affect the classification result, and the classification result is completely deterministic for a given parameter setting. Compared with existing quantum clustering methods, the proposed method, as a complement to quantum clustering, is better suited to classifying data that obey a Weibull distribution and provides a new option for data classification. The method traverses the nearest neighbor KNN results over the possible numbers of neighbors, determines the number of neighbors from the maximum increment of the nearest neighbor KNN results, and uses the corresponding nearest neighbor KNN result as the input parameter of the quantum clustering model; it is simple, practical and accurate, computes the input parameter of the quantum clustering model without any manually specified parameters and without classification labels for the sample data, and is a substantial improvement over existing ways of specifying the parameters of quantum clustering.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified or equivalently replaced, and any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims (7)

1. A quantum clustering method based on the nearest neighbor KNN and an improved wave function, characterized by comprising the following steps:
S1. acquiring the raw data of a group of sample points to be classified and normalizing it;
S2. determining the input parameter of the quantum clustering model based on the nearest neighbor KNN, wherein the input parameter is the variance s_weibull of the distribution obeyed by the wave function;
S21. calculating the nearest neighbor KNN result s_knn^K for every possible number of neighbors K;
S22. calculating the increment Δs_knn^K of the nearest neighbor KNN results;
S23. finding the number of neighbors corresponding to the maximum increment of the nearest neighbor KNN results: locating the maximum value among the increments Δs_knn^K and recording the corresponding number of neighbors K_max;
S24. taking the nearest neighbor KNN result corresponding to this number of neighbors as the input parameter of the quantum clustering model: from all the nearest neighbor KNN results s_knn^K obtained in step S21, selecting the result s_knn^(K_max) corresponding to the K_max obtained in step S23 and using it as the input parameter of the quantum clustering model, s_weibull = s_knn^(K_max);
S3. according to the input parameter s_weibull provided in step S2, calculating the wave function parameters of all sample points, wherein the wave function parameters comprise the scale parameter and the shape parameter of the distribution obeyed by the wave function;
S31. obtaining the i-th normalized sample point x̃_i from step S1 and the input parameter s_weibull provided in step S2;
S32. calculating the quotient of the i-th sample point x̃_i and the input parameter s_weibull, and obtaining the shape parameter β_i corresponding to x̃_i from the relation between this quotient and the shape parameter:
Figure FDA00035038076400000111
wherein Γ(·) denotes the gamma function;
S33. calculating the scale parameter η_i corresponding to the i-th sample point x̃_i:
Figure FDA00035038076400000113
S34. repeating steps S31 to S33 to calculate the shape parameter β_i and the scale parameter η_i corresponding to every sample point;
S4. calculating the potential energy surface of the quantum clustering;
S41. calculating the wave function ψ_wei(x):
Figure FDA00035038076400000114
S42. calculating the potential energy surface function V_wei(x):
Figure FDA0003503807640000021
wherein E denotes the energy constant set by the quantum clustering model;
S5. determining the number of classes and the classification boundaries from the potential energy surface calculated in step S4, wherein the number of classes equals the number of minima of the potential energy surface function V_wei(x), and the classification boundaries are jointly determined by the potential energy saddle points and the potential energy maxima between the minima.
2. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein step S21 specifically comprises the following steps:
S211. for the normalized data x̃_i of the i-th sample point to be classified, finding the K data points x̃_(i,1), …, x̃_(i,K) nearest to that sample point;
S212. based on the normalized data of the sample points to be classified, calculating the average distance y_i^K between the i-th sample point to be classified and its K nearest sample points, namely
y_i^K = (1/K) · Σ_{j=1,…,K} ‖x̃_i − x̃_(i,j)‖
S213. repeating steps S211 to S212 to calculate the average distances y_1^K, …, y_N^K corresponding to all the sample points to be classified, i = 1 to i = N, wherein N denotes the number of sample points to be classified;
S214. calculating the average of the distance averages y_i^K of all the sample points to be classified as the nearest neighbor KNN result s_knn^K for K neighbors, namely:
s_knn^K = (1/N) · Σ_{i=1,…,N} y_i^K
S215. repeating steps S211 to S214 to calculate the nearest neighbor KNN results s_knn^K for all numbers of neighbors from K = 1 to K = N−1;
and wherein step S22 specifically comprises the following steps:
S221. from all the calculated nearest neighbor KNN results s_knn^K, calculating the increment Δs_knn^K of the nearest neighbor KNN result for K neighbors, namely:
Δs_knn^K = s_knn^(K+1) − s_knn^K
S222. repeating step S221 to calculate the nearest neighbor KNN result increments Δs_knn^K for all numbers of neighbors from K = 1 to K = N−2.
3. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein step S1 specifically comprises the following steps:
S11. acquiring the raw data X = [x_1, x_2, x_3, …, x_N] of a group of sample points to be classified;
S12. finding the maximum value x_max and the minimum value x_min in the raw data X of the sample points to be classified;
S13. setting the maximum value x̃_max and the minimum value x̃_min of the normalized data;
S14. calculating the normalized data x̃_1, …, x̃_N of the sample points to be classified based on the min-max normalization method, wherein x̃_i is:
x̃_i = x̃_min + (x_i − x_min) · (x̃_max − x̃_min) / (x_max − x_min).
4. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein the shape parameter calculated in step S32 is required to satisfy β_i > 2.
5. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein the wave function of the quantum clustering model in step S2 obeys a Weibull distribution.
6. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein in step S32 the shape parameter β_i corresponding to the i-th sample point x̃_i is obtained by numerical solution.
7. The quantum clustering method based on the nearest neighbor KNN and an improved wave function according to claim 1, wherein the normalization method in step S14 comprises min-max normalization or zero-mean z-score normalization.
CN202210151032.6A 2022-02-14 2022-02-14 Quantum clustering method based on nearest neighbor KNN and improved wave function Pending CN114529004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151032.6A CN114529004A (en) 2022-02-14 2022-02-14 Quantum clustering method based on nearest neighbor KNN and improved wave function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210151032.6A CN114529004A (en) 2022-02-14 2022-02-14 Quantum clustering method based on nearest neighbor KNN and improved wave function

Publications (1)

Publication Number Publication Date
CN114529004A true CN114529004A (en) 2022-05-24

Family

ID=81622734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151032.6A Pending CN114529004A (en) 2022-02-14 2022-02-14 Quantum clustering method based on nearest neighbor KNN and improved wave function

Country Status (1)

Country Link
CN (1) CN114529004A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118014094A (en) * 2024-04-09 2024-05-10 国开启科量子技术(安徽)有限公司 Quantum computing method, quantum circuit, device and medium for determining function classification


Similar Documents

Publication Publication Date Title
CN111914090A (en) Method and device for enterprise industry classification identification and characteristic pollutant identification
CN111046977A (en) Data preprocessing method based on EM algorithm and KNN algorithm
CN114004259B (en) Radar signal density peak clustering method based on improved community combination
CN113569920B (en) Second neighbor anomaly detection method based on automatic coding
WO2018006631A1 (en) User level automatic segmentation method and system
CN117574274B (en) A PSO-XGBoost system construction method with hybrid feature screening and hyperparameter optimization
CN117971808B (en) Intelligent construction method for enterprise data standard hierarchical relationship
CN112183237A (en) Automatic white blood cell classification method based on color space adaptive threshold segmentation
CN114529004A (en) Quantum clustering method based on nearest neighbor KNN and improved wave function
CN113762151A (en) A fault data processing method, system and fault prediction method
CN112365060A (en) Preprocessing method for power grid internet of things perception data
CN115269855B (en) Paper fine-grained multi-label labeling method and device based on pre-training encoder
Zheng Improved K-means clustering algorithm based on dynamic clustering
CN113724197B (en) Thread fitability determination method based on meta-learning
CN116204647A (en) Method and device for establishing target comparison learning model and text clustering
Mishra et al. Efficient intelligent framework for selection of initial cluster centers
CN114816979A (en) A Software Defect Prediction Method Based on Cluster Analysis and Decision Tree Algorithm
JP2022131443A (en) Inference program and inference method
CN113111774A (en) Radar signal modulation mode identification method based on active incremental fine adjustment
Ding et al. Evolutionary Parameter-Free Clustering Algorithm
CN112906434A (en) Information processing apparatus, information processing method, and computer program
CN119205051B (en) A human resources information management system based on big data
Boyang et al. A design method of RBF neural network based on KNN-DPC
CN116701622B (en) A three-branch text clustering method based on improved game theory rough set
CN114677550B (en) Rapid image pixel screening method based on sparse discrimination K-means

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination