CN104598720B

CN104598720B - Chemical mechanical polishing (CMP) time setting method based on clustering and multi-task learning

Info

Publication number: CN104598720B
Application number: CN201410805040.3A
Authority: CN
Inventors: 刘民; 段运强; 董明宇; 郝井华
Original assignee: Tsinghua University
Current assignee: Zhengda Industrial Biotechnology (shanghai) Co Ltd
Priority date: 2014-12-23
Filing date: 2014-12-23
Publication date: 2018-04-10
Anticipated expiration: 2034-12-23
Also published as: CN104598720A

Abstract

The chemical mechanical polishing (CMP) time setting method based on clustering and multi-task learning belongs to the fields of automatic control, information technology and advanced manufacturing. The method builds an inverse model that takes the process indices and the product state as inputs and outputs the polishing time, the key factor affecting the process indices, and uses this model to optimize the polishing time setting. Because the production data cover many product varieties with little data per variety, similar varieties are first clustered according to their product characteristics, and a multi-task learning method based on common parameter extraction is then used to build the models within each cluster. The learned model parameters are split into a common part shared by all varieties in a cluster and a private part specific to each variety.

Description

A chemical mechanical polishing time setting method based on clustering and multi-task learning

Technical field

The invention belongs to the fields of automatic control, information technology and advanced manufacturing. In modern microelectronics manufacturing with mixed multi-variety production, the CMP time is set manually, which leads to a high rework rate and low efficiency of the whole production line. To address this, the invention adopts an optimal setting method based on an inverse model: the process indices and product state are taken as inputs, and the polishing time, the key factor affecting the process indices, is taken as the output. Because the production data contain many product varieties with few samples per variety, a CMP time setting method based on clustering and multi-task learning is proposed; it optimizes the CMP time setting and improves the productivity of the CMP process.

Background

Chemical mechanical polishing is a key process in microelectronics manufacturing and affects the efficiency of the entire production line. Because modern microelectronics manufacturing produces many varieties in small batches, manufacturers often run mixed multi-variety production, and in this mode the traditional run-to-run (RtR) optimization control method struggles to achieve good results. RtR uses the production information of the previous batch, or of several previous batches, to guide the processing of the next batch; it generally assumes that products of the same variety are processed consecutively on one tool, or that several tools process a small number of varieties in a regular cycle. In mixed multi-variety production, however, a tool may process any variety in succession, and varieties and tools differ from one another, so traditional RtR performs poorly. As a result, the polishing time is still set from operator experience and pilot (send-ahead) wafer tests; whenever a nonconforming product appears, the whole batch must be reworked, which keeps production efficiency low. Optimizing the CMP processing time setting is therefore an urgent need.

The traditional approach to optimizing operating parameters first models the process index and then optimizes the operating parameters on top of this index model. The index model is a forward model with the product state and operating parameters as inputs and the process index as output; when data are scarce, however, it is hard to build an accurate index model, and the resulting operating optimization is poor. Borrowing the RtR idea of using historical batches to guide future production, and given that CMP data cover many varieties with little data per variety, the invention builds an inverse model with the process indices and product state as inputs and the polishing time, the key factor affecting the process indices, as output, and uses it to optimize the polishing time setting. On this basis, a CMP time setting method based on clustering and multi-task learning is proposed.

Summary of the invention

To optimize the CMP time setting in a mixed multi-variety production environment, the invention builds an inverse model with the process indices and product state as inputs and the polishing time, the key factor affecting the process indices, as output, and uses it to optimize the polishing time setting. To handle the many product varieties and the small amount of data per variety in real production data, a two-step modeling method based on clustering and multi-task learning is proposed, which treats the clustering step and the parameter learning step of traditional clustered multi-task learning separately. The variety clustering step assumes that the data of each variety follow a multivariate Gaussian distribution and estimates its mean vector and covariance matrix by maximum likelihood. From these estimates, the Bhattacharyya distance measures how similar two multivariate Gaussian distributions are, and computing it between the distributions of all pairs of varieties yields a similarity matrix. Taking the similarity matrix as input, the affinity propagation algorithm clusters the varieties. In addition, so that every cluster contains enough samples even when individual varieties are small, the number of samples of each variety is embedded into the clustering process as prior knowledge.

Within each cluster, the model parameters are computed with the proposed multi-task learning algorithm based on common parameter extraction. To cope with the small number of samples in each individual task, the algorithm decomposes the model parameters into a common part and a private part and learns both simultaneously; the common part compensates for the model bias caused by insufficient sample data.

The CMP time setting method based on clustering and multi-task learning is characterized in that it is implemented on a computer by the following steps:

Step (1): Data preparation

The optimal setting model in this method uses four process-index and product-state variables as the model input row vector x: the material removal rate, the incoming wafer thickness, the post-polish thickness of the sampled pilot wafer, and the post-polish thickness of the sampled lot wafers. The CMP time is the model output y. The relation between model input and output is assumed to be:

$$y = \mathbf{x}\mathbf{w} + \delta$$

where the column vector w contains the model parameters to be determined and δ is noise.

Without loss of generality, suppose there are m product varieties. The inputs of the $N_i$ samples of the i-th variety form the matrix $X_i \in \mathbb{R}^{N_i \times d}$, whose j-th row $\mathbf{x}_{i(j)}$ is the input vector of the j-th sample of the i-th variety; d is the number of model inputs, and d = 4 in this method. The outputs of the $N_i$ samples of the i-th variety form the column vector $\mathbf{y}_i \in \mathbb{R}^{N_i}$, whose j-th element $y_{i(j)}$ is the output of the j-th sample of the i-th variety, i.e. its CMP time. For later use, X denotes the matrix obtained by stacking the input matrices $X_1, X_2, \dots, X_{m-1}, X_m$ vertically, and Y denotes the column vector obtained by stacking the output vectors $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_{m-1}, \mathbf{y}_m$ vertically; both X and Y have $N = \sum_{i=1}^{m} N_i$ rows, where N is the total number of samples. The column vector I holds the label of the variety to which each sample belongs, with values in {1, 2, ..., m-1, m}.
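The stacking of per-variety data into X, Y and the label vector I can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation; the function name `stack_varieties` and the dictionary input format are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

def stack_varieties(data: Dict[int, Tuple[Sequence[Sequence[float]], Sequence[float]]]):
    """Stack per-variety (X_i, y_i) vertically into X, Y and the label vector I.

    `data` maps a variety label i in {1, ..., m} to (X_i, y_i): X_i has N_i rows
    of d inputs, y_i holds the N_i CMP times.
    """
    X: List[List[float]] = []
    Y: List[float] = []
    I: List[int] = []
    for label in sorted(data):                 # keep varieties in label order 1..m
        X_i, y_i = data[label]
        if len(X_i) != len(y_i):
            raise ValueError(f"variety {label}: X_i and y_i differ in length")
        X.extend([list(row) for row in X_i])
        Y.extend(y_i)
        I.extend([label] * len(y_i))           # one variety label per sample
    return X, Y, I
```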

Step (2): Compute the similarity matrix of the varieties

Assuming that the data of each variety follow a different multivariate Gaussian distribution, the probability density of each variety is estimated by maximum likelihood. For the i-th variety, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ of its multivariate Gaussian distribution are:

$$\hat{\mu}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} \mathbf{z}_{i(j)}, \qquad \hat{\Sigma}_i = \frac{1}{N_i-1}\sum_{j=1}^{N_i}\big(\mathbf{z}_{i(j)}-\hat{\mu}_i\big)^{\mathsf T}\big(\mathbf{z}_{i(j)}-\hat{\mu}_i\big), \qquad i = 1, \dots, m$$

where $Z_i = [X_i, \mathbf{y}_i] \in \mathbb{R}^{N_i \times (d+1)}$ and its j-th row $\mathbf{z}_{i(j)}$ contains the 4 input variables and 1 output variable of the j-th sample (since $\mathbf{z}_{i(j)}$ is a row vector, the transpose on the first factor makes each term a $(d+1)\times(d+1)$ outer product).

The similarity between varieties is measured by the Bhattacharyya distance between their multivariate Gaussian distributions, defined as:

$$d_B\big(p(x), q(x)\big) = -\ln \int \sqrt{p(x)\,q(x)}\,dx$$

Under the multivariate Gaussian assumption the Bhattacharyya distance has a closed form. For two multivariate Gaussian distributions $G_1 \sim N(\mu_1, \Sigma_1)$ and $G_2 \sim N(\mu_2, \Sigma_2)$:

$$d_B(G_1, G_2) = \frac{1}{8}(\mu_1-\mu_2)^{\mathsf T}\Gamma^{-1}(\mu_1-\mu_2) + \frac{1}{2}\ln\Big(|\Sigma_1|^{-1/2}\,|\Sigma_2|^{-1/2}\,|\Gamma|\Big), \qquad \Gamma = \frac{\Sigma_1+\Sigma_2}{2}$$

where |A| denotes the determinant of matrix A.

Applying this dissimilarity to the estimated mean vectors $\hat{\mu}_i$ and covariance matrices $\hat{\Sigma}_i$ of every pair of varieties gives the similarity matrix. Because the affinity propagation clustering algorithm used next takes similarities as input, each dissimilarity is negated: $s(i,k) = -d_B(G_i, G_k)$.
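A minimal pure-Python sketch of Step (2), assuming each variety's data $Z_i = [X_i, \mathbf{y}_i]$ is given as a list of rows with at least two rows. All function names are hypothetical. The small `ridge` added to each covariance diagonal is an assumption not stated in the text: with very few samples, $\hat{\Sigma}_i$ can be singular and its log-determinant undefined.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def fit_gaussian(Z: Sequence[Sequence[float]], ridge: float = 1e-6) -> Tuple[List[float], Matrix]:
    """Mean and 1/(N_i - 1) covariance of the rows of Z_i = [X_i, y_i] (needs N_i >= 2)."""
    n, p = len(Z), len(Z[0])
    mu = [sum(row[c] for row in Z) / n for c in range(p)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in Z) / (n - 1)
            for b in range(p)] for a in range(p)]
    for a in range(p):
        cov[a][a] += ridge        # assumption: keeps tiny-sample covariances invertible
    return mu, cov

def solve_and_logdet(A: Matrix, rhs: Sequence[float]) -> Tuple[List[float], float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting; also return ln|det A|."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    logdet = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        if M[piv][c] == 0.0:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        logdet += math.log(abs(M[c][c]))
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x, logdet

def bhattacharyya(mu1, S1, mu2, S2) -> float:
    """Closed-form Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)."""
    p = len(mu1)
    gamma = [[(S1[a][b] + S2[a][b]) / 2 for b in range(p)] for a in range(p)]
    diff = [x - y for x, y in zip(mu1, mu2)]
    sol, logdet_g = solve_and_logdet(gamma, diff)       # sol = Gamma^{-1} (mu1 - mu2)
    _, logdet_1 = solve_and_logdet(S1, [0.0] * p)
    _, logdet_2 = solve_and_logdet(S2, [0.0] * p)
    quad = sum(d * s for d, s in zip(diff, sol))
    return quad / 8 + 0.5 * (logdet_g - 0.5 * logdet_1 - 0.5 * logdet_2)

def similarity_matrix(groups: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """s(i, k) = -d_B(G_i, G_k); the diagonal is left at 0 for the AP preferences."""
    params = [fit_gaussian(Z) for Z in groups]
    m = len(params)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for k in range(i + 1, m):
            S[i][k] = S[k][i] = -bhattacharyya(*params[i], *params[k])
    return S
```

Solving with Gaussian elimination gives both $\Gamma^{-1}(\mu_1-\mu_2)$ and the log-determinants without forming an explicit inverse, which is the numerically safer choice for the closed form above.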

Step (3): Product feature clustering based on affinity propagation

Affinity propagation is a clustering algorithm based on information accumulation: the cluster centers (exemplars) are determined from the information accumulated at each point. Using the similarity matrix, it computes two kinds of messages between points i and k, the responsibility r(i,k) and the availability a(i,k):

$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k}\big\{a(i,k') + s(i,k')\big\}$$

$$a(i,k) \leftarrow \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\}, \quad i \neq k; \qquad a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\}$$

Initialize a(i,k) = 0, then update r(i,k) and a(i,k) iteratively with the formulas above until convergence.
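The message updates can be sketched as a plain-Python affinity propagation routine. The `damping` factor, the iteration limits and the stop rule (exemplar set unchanged for `conv_iter` iterations) are common choices in affinity propagation implementations, assumed here rather than taken from the text; the function name is hypothetical. The preference vector is passed in and placed on the diagonal of the similarity matrix, as the text describes for the preferences.

```python
from typing import List, Sequence

def affinity_propagation(S: Sequence[Sequence[float]], preference: Sequence[float],
                         damping: float = 0.5, max_iter: int = 200,
                         conv_iter: int = 15) -> List[int]:
    """Return, for every point i, the index of the exemplar (cluster centre) it joins."""
    n = len(S)
    if n == 1:
        return [0]
    s = [list(row) for row in S]
    for i in range(n):
        s[i][i] = preference[i]            # preferences replace the diagonal of S
    r = [[0.0] * n for _ in range(n)]      # responsibilities
    a = [[0.0] * n for _ in range(n)]      # availabilities, initialised to 0
    last: List[int] = []
    stable = 0
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if last and stable >= conv_iter:
            break
    if not last:
        raise RuntimeError("no exemplar found; adjust the preferences")
    # each exemplar labels itself; every other point joins its most similar exemplar
    return [i if i in last else max(last, key=lambda k: s[i][k]) for i in range(n)]
```

Damping averages each new message with the previous one, which is the usual way to suppress oscillations in these updates.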

In affinity propagation, a preference vector expresses the prior likelihood of each task (variety) becoming a cluster center; during the iterations the preferences replace the diagonal elements of the similarity matrix and thereby influence the choice of cluster centers. Varieties with more samples will later yield more accurate models and are therefore better suited as cluster centers. To use the sample count as a prior in the clustering, the preference vector of the affinity propagation algorithm is set as follows.

Let the preference vector be $p = [p_1, p_2, \dots, p_{m-1}, p_m]$:

where $N_i$ is the number of samples of each task, and L is a threshold such that varieties with more than L samples are more likely to become cluster centers. Typical values are a = 0.005, b = 2000 and L = 50.
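A short sketch of the preference rule, transcribing literally the piecewise form that appears in claim 1 (where the threshold is written $L_1$), with the typical values a = 0.005 and L = 50 as defaults. The constant b = 2000 listed with those values does not appear in that claim formula, so it is not used here; the function name is hypothetical.

```python
from typing import List, Sequence

def preferences_from_claim(N: Sequence[int], a: float = 0.005, L: int = 50) -> List[float]:
    """Piecewise preference rule as written in claim 1:

    p_i = (N_i - max_j N_j) * a   if N_i > L
    p_i = -L * a                  otherwise
    """
    n_max = max(N)
    return [(n_i - n_max) * a if n_i > L else -L * a for n_i in N]
```

The resulting list can be passed directly as the `preference` argument of an affinity propagation routine.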

Step (4): Multi-task learning based on common parameter extraction

Clustering yields L clusters (here L is the number of clusters, not the sample threshold of Step (3)). The varieties in each cluster are modeled with the multi-task learning algorithm based on common parameter extraction, whose main idea is to split the model parameters of each variety into two parts: shared parameters and private parameters. The shared parameters are the part common to the models of all varieties in a cluster and are represented by a column vector $\mathbf{u} \in \mathbb{R}^d$; the private parameters are the part specific to each variety's model and are represented by a column vector $\mathbf{v}^i \in \mathbb{R}^d$. If a cluster contains r varieties, the r column vectors $\mathbf{v}^i$ form the r-column matrix V. The algorithm learns both parts from the data in each cluster and combines them into the final model parameters; for the i-th task, the final model parameters are:

$$\mathbf{w}^i = \mathbf{u} + \mathbf{v}^i$$

Before learning, the data are normalized; the model parameters λ₁, λ₂, λ₃ are then set, and the common parameter vector u and the private parameter matrix V are randomly initialized.

The iterative procedure is as follows. Here X denotes the matrix obtained by stacking the input matrices $X_1, X_2, \dots, X_{r-1}, X_r$ of the r varieties in a cluster vertically, and Y denotes the column vector obtained by stacking their output vectors $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_{r-1}, \mathbf{y}_r$ vertically. At the k-th iteration, compute:

In the above formula:

Update $u_k$ and $V_k$ from $p_k$, $u_{k-1}$ and $Q_k$, $V_{k-1}$:

where $\alpha \in [0, 1]$; initially $\alpha^0 = 0$ and $t_0 = 1$,

During the iterations, the step size $l_k$ is determined as follows:

$l_k = 2^{j_k}\, l_{k-1}$, where $j_k$ is the smallest non-negative integer for which the following inequality holds:

Applying this procedure to each of the L clusters yields L model libraries that together contain the models of all m varieties.
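The update formulas for u and V are not legible in this text, so the following is only a sketch under a stated assumption: it minimizes a hypothetical objective $\tfrac12\sum_i\|X_i(\mathbf{u}+\mathbf{v}^i)-\mathbf{y}_i\|^2 + \tfrac{\lambda_1}{2}\|\mathbf{u}\|^2 + \tfrac{\lambda_2}{2}\|V\|_F^2 + \lambda_3\sum_i\|\mathbf{v}^i\|_1$ with an accelerated proximal gradient (FISTA-style) loop. The initial values $\alpha^0 = 0$, $t_0 = 1$ and the doubling rule $l_k = 2^{j_k} l_{k-1}$ from the text fit this scheme, where $l_k$ acts as the inverse step size and $j_k$ is found by backtracking on the usual sufficient-decrease test. The roles given to λ₁, λ₂, λ₃, the default values and all function names are assumptions, and zeros replace the random initialization for reproducibility.

```python
import math
from typing import List, Sequence, Tuple

Task = Tuple[Sequence[Sequence[float]], Sequence[float]]   # (X_i, y_i)

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def _soft(x: float, thr: float) -> float:
    """Soft-thresholding: the proximal operator of thr * |x|."""
    return math.copysign(max(abs(x) - thr, 0.0), x)

def _smooth(tasks: Sequence[Task], u, V, lam1, lam2):
    """Value and gradient of the smooth part (squared loss + ridge terms on u and V)."""
    f = 0.5 * lam1 * _dot(u, u)
    gu = [lam1 * x for x in u]
    gV = []
    for (X, y), v in zip(tasks, V):
        w = [a + b for a, b in zip(u, v)]              # w_i = u + v_i
        gv = [lam2 * x for x in v]
        f += 0.5 * lam2 * _dot(v, v)
        for row, target in zip(X, y):
            res = _dot(row, w) - target
            f += 0.5 * res * res
            for c, xc in enumerate(row):
                gu[c] += res * xc
                gv[c] += res * xc
        gV.append(gv)
    return f, gu, gV

def fit_shared_private(tasks: Sequence[Task], lam1=0.1, lam2=1.0, lam3=0.01,
                       l0=1.0, iters=500) -> Tuple[List[float], List[List[float]]]:
    d = len(tasks[0][0][0])
    u = [0.0] * d                                      # zeros instead of random init
    V = [[0.0] * d for _ in tasks]
    su, sV = u[:], [v[:] for v in V]                   # extrapolated search point
    t, l = 1.0, l0
    for _ in range(iters):
        f_s, gu, gV = _smooth(tasks, su, sV, lam1, lam2)
        grad = gu + [g for gv in gV for g in gv]
        while True:
            nu = [a - g / l for a, g in zip(su, gu)]
            nV = [[_soft(a - g / l, lam3 / l) for a, g in zip(v, gv)] for v, gv in zip(sV, gV)]
            f_n = _smooth(tasks, nu, nV, lam1, lam2)[0]
            step = [a - b for a, b in zip(nu, su)]
            step += [a - b for nv, sv in zip(nV, sV) for a, b in zip(nv, sv)]
            if f_n <= f_s + _dot(grad, step) + 0.5 * l * _dot(step, step):
                break
            l *= 2.0                                   # l_k = 2^{j_k} l_{k-1}
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        alpha = (t - 1.0) / t_next                     # 0 on the first pass since t_0 = 1
        su = [a + alpha * (a - b) for a, b in zip(nu, u)]
        sV = [[a + alpha * (a - b) for a, b in zip(nv, v)] for nv, v in zip(nV, V)]
        u, V, t = nu, nV, t_next
    return u, V
```

Each call handles the varieties of one cluster; the model of variety i is then $\mathbf{w}^i = \mathbf{u} + \mathbf{v}^i$, and running it once per cluster mirrors the L model libraries described above.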

Description of drawings

Figure 1: Flow chart of the CMP time setting method based on clustering and multi-task learning

Figure 2: Hardware and software composition of the CMP time setting method based on clustering and multi-task learning.

Detailed description of the embodiments

The invention proposes a CMP time setting method based on clustering and multi-task learning. Its main advantage is that it can be used in mixed multi-variety production and improves production efficiency compared with manual setting. In practice, when a new production batch arrives, its polishing time can be calculated from its variety, processing layer and other batch information. The clustering and multi-task learning algorithm of the invention relies on supporting hardware, namely a data acquisition system, an algorithm server and user clients, and is implemented in intelligent optimization software. The hardware and software composition is shown in Figure 2.

Step (1): Data acquisition

The collected production information for each lot includes the product variety, processing layer, material removal rate, incoming wafer thickness, pilot-wafer post-polish thickness, sampled post-polish thickness, incoming thickness range and post-polish thickness range; this information is stored in the production process database.

Step (2): Data preparation

On the model training server, abnormal historical records are first removed, including obviously erroneous manually entered records and records with missing data items. The data are then organized as required for model learning: each combination of product variety and processing layer is treated as one generalized variety, so different layers of the same product are regarded as different varieties. The resulting data set is $\{X_i, \mathbf{y}_i\}$, $i = 1, \dots, m$, where the j-th row of $X_i$ consists of 4 values (material removal rate, incoming wafer thickness, pilot-wafer post-polish thickness and sampled lot post-polish thickness) and the j-th element of $\mathbf{y}_i$ is the CMP time, together with the variety label vector I.

Step (3): Model training

On the model training server, the model is learned from the prepared data set $\{X_i, \mathbf{y}_i\}$, $i = 1, \dots, m$, and the variety label vector I in two steps: product feature clustering, and multi-task learning with common parameter extraction. The whole training procedure follows the computation described in the summary of the invention; its flow chart is shown in Figure 1.

Step (4): Parameter optimization

In the multi-task learning based on product feature clustering and common parameter extraction, the main parameters are the sample-count threshold L, the cluster-number parameter a and the multi-task learning parameters λ₁, λ₂, λ₃. The threshold L can be set from the sample counts of the varieties in the whole data set. The remaining parameters a, λ₁, λ₂, λ₃ are optimized by cross-validation, which splits the data set into training and test sets. For 3-fold cross-validation, for example, the data of each variety are divided into 3 parts; 1 part is used as test data and the other 2 as training data. The model is trained on the training data, and its setting performance is evaluated on the test data. Trying different values of a, λ₁, λ₂, λ₃ selects the best-performing parameters, and the best model is finally transferred to the on-site server.
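A sketch of the per-variety 3-fold split, assuming samples are identified by their index in the stacked data and labeled by I. Shuffling with a fixed seed and round-robin fold assignment are illustrative choices, and the function names are hypothetical; a variety with fewer samples than folds simply has no test samples in some folds.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

def per_variety_folds(I: Sequence[int], n_folds: int = 3, seed: int = 0) -> List[int]:
    """Assign each sample a fold id in 0..n_folds-1, splitting every variety separately."""
    rng = random.Random(seed)
    by_variety: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(I):
        by_variety[label].append(idx)
    fold = [0] * len(I)
    for idxs in by_variety.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            fold[idx] = pos % n_folds          # round-robin keeps per-variety folds balanced
    return fold

def cv_splits(I: Sequence[int], n_folds: int = 3,
              seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train indices, test indices); each fold is the test set exactly once."""
    fold = per_variety_folds(I, n_folds, seed)
    for f in range(n_folds):
        train = [j for j, g in enumerate(fold) if g != f]
        test = [j for j, g in enumerate(fold) if g == f]
        yield train, test
```

A grid over a, λ₁, λ₂, λ₃ would train on each `train` set, score on the matching `test` set, and keep the setting with the best average score.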

Step (5): Online application

On the on-site server, the model parameters for the current batch are selected according to its variety and layer, using the batch data transmitted from production. The data x of the current batch contain the material removal rate and the incoming wafer thickness, but the pilot-wafer post-polish thickness and the sampled lot post-polish thickness are not yet available; both are replaced by the standard post-polish thickness for this variety and layer. The dot product of the model parameters and x gives the optimized CMP time setting.
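A minimal sketch of the online computation, assuming the model library is a dictionary keyed by hypothetical (variety, layer) identifiers and that w is expressed in the original units. If w was learned on normalized data, as the learning step prescribes, x must be normalized the same way and the result denormalized; that step is omitted here.

```python
from typing import Dict, Sequence, Tuple

def cmp_time_setpoint(models: Dict[Tuple[str, str], Sequence[float]],
                      variety: str, layer: str,
                      removal_rate: float, incoming_thickness: float,
                      standard_output_thickness: float) -> float:
    """Return y = x . w for the batch's (variety, layer) model."""
    w = models[(variety, layer)]
    # x = [removal rate, incoming thickness, pilot-wafer output, lot sampled output];
    # the last two are unknown before polishing, so the standard thickness stands in.
    x = [removal_rate, incoming_thickness,
         standard_output_thickness, standard_output_thickness]
    return sum(wi * xi for wi, xi in zip(w, x))
```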

Based on the proposed multi-task learning method with product feature clustering and common parameter extraction, many simulation experiments were carried out; for brevity, only the results of applying the invention to the optimal setting of CMP time are given here. The input data are 4-dimensional: material removal rate, incoming wafer thickness, pilot-wafer post-polish thickness and sampled lot post-polish thickness. The data are industrial field data from 2011-01-01 to 2013-07-25, covering 441 varieties and 11,250 records, and 3-fold cross-validation was used to determine the model parameters.

The invention is compared with the typical multi-task learning algorithm CMTL-convex (Clustered Multi-Task Learning: A Convex Formulation) and with the nonlinear modeling methods kernel extreme learning machine (KELM) and support vector machine (SVM). To demonstrate the effectiveness of the variety clustering algorithm, KELM and SVM without variety clustering are also compared with Cluster-KELM and Cluster-SVM, which are applied after variety clustering. KELM and SVM use the Gaussian kernel:

$$K(\mathbf{u}, \mathbf{v}) = \exp\big(-\gamma\,\|\mathbf{u}-\mathbf{v}\|^2\big)$$

The other parameter is the weight of the regularization term in the model. The parameter values are listed in Table 1.

Table 1 Algorithm parameter settings

Test sets containing different fractions (0.2, 0.4, 0.6) of the data were randomly drawn for the performance comparison. The performance indices are the normalized mean squared error (nMSE), the averaged mean squared error (aMSE) and the proportion of samples with error above 10% (error > 10%); the comparison results are shown in Table 2.

Table 2 Performance comparison results of LS-IELM, OS-ELM and Fixed-LSSVM

As the table shows, the TwCMTL method proposed in the invention achieves higher test accuracy, and thus better generalization, than CMTL-convex, SVM, KELM, Cluster-SVM and Cluster-KELM.
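The text names the three performance indices but does not define them. The sketch below uses one common reading, which is an assumption: nMSE is the mean squared error divided by the variance of the true CMP times, aMSE is the mean of the per-variety mean squared errors, and error > 10% is the fraction of samples whose relative error exceeds 10%. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def evaluate(y_true: Sequence[float], y_pred: Sequence[float],
             labels: Sequence[int]) -> Dict[str, float]:
    """Return nMSE, aMSE and the error>10% ratio under the assumed definitions."""
    n = len(y_true)
    sq = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    per_variety: Dict[int, List[float]] = defaultdict(list)
    for e, lab in zip(sq, labels):
        per_variety[lab].append(e)
    amse = sum(sum(v) / len(v) for v in per_variety.values()) / len(per_variety)
    over_10 = sum(abs(p - t) > 0.1 * abs(t) for t, p in zip(y_true, y_pred)) / n
    return {"nMSE": (sum(sq) / n) / var_y, "aMSE": amse, "error>10%": over_10}
```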

Claims

1. A chemical mechanical polishing time setting method based on clustering and multi-task learning, characterized in that the method is implemented on a computer by the following steps in sequence:

Step (1): Data preparation

The optimal setting model in this method uses 4 process-index and product-state variables as the model input row vector x: the material removal rate, the incoming wafer thickness, the sampled pilot-wafer post-polish thickness and the sampled lot post-polish thickness; the CMP time is the model output y. The relation between model input and output is assumed to satisfy:

$$y = \mathbf{x}\mathbf{w} + \delta$$

where the column vector w contains the model parameters to be determined and δ is noise;

Suppose there are m product varieties. The inputs of the $N_i$ samples of the i-th product variety form the matrix $X_i \in \mathbb{R}^{N_i \times d}$, whose j-th row $\mathbf{x}_{i(j)}$ is the input vector of the j-th sample of the i-th variety, d being the number of model input variables, d = 4; the outputs of the $N_i$ samples of the i-th variety form the column vector $\mathbf{y}_i \in \mathbb{R}^{N_i}$, whose j-th element $y_{i(j)}$ is the output of the j-th sample of the i-th variety, i.e. the CMP time; for ease of later notation, X denotes the matrix formed by stacking the input matrices $X_1, X_2, \dots, X_{m-1}, X_m$ vertically, and Y the column vector formed by stacking the output vectors $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_{m-1}, \mathbf{y}_m$ vertically; X and Y both have $N = \sum_{i=1}^{m} N_i$ rows, N being the total number of samples; the column vector I holds the label of the variety to which each sample belongs, with values in {1, 2, ..., m-1, m};

Step (2): Compute the similarity matrix of the varieties

The probability density of each variety is estimated by maximum likelihood; under the multivariate Gaussian assumption, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ of each product variety are:

$$\hat{\mu}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} \mathbf{z}_{i(j)}, \quad i = \{1, 2, \dots, m-1, m\}$$

$$\hat{\Sigma}_i = \frac{1}{N_i-1}\sum_{j=1}^{N_i}\big(\mathbf{z}_{i(j)}-\hat{\mu}_i\big)^{\mathsf T}\big(\mathbf{z}_{i(j)}-\hat{\mu}_i\big), \quad i = \{1, 2, \dots, m-1, m\}$$

where the matrix $Z_i = [X_i, \mathbf{y}_i]$, and its row vector $\mathbf{z}_{i(j)}$, the j-th row of $Z_i$, contains 4 input variables and 1 output variable;

The dissimilarity between varieties is measured by the Bhattacharyya distance, i.e. the dissimilarity between their multivariate Gaussian distributions; the Bhattacharyya distance is defined as:

$$d_B\big(p(x), q(x)\big) = -\ln\left(\int \sqrt{p(x)\,q(x)}\,dx\right)$$

Under the multivariate Gaussian assumption, the Bhattacharyya distance has a closed form; for two multivariate Gaussian distributions $G_1 \sim N(\mu_1, \Sigma_1)$ and $G_2 \sim N(\mu_2, \Sigma_2)$ it is:

$$d_B(G_1, G_2) = \frac{1}{8}(\mu_1-\mu_2)^{\mathsf T}\Gamma^{-1}(\mu_1-\mu_2) + \frac{1}{2}\ln\Big(|\Sigma_1|^{-1/2}\,|\Sigma_2|^{-1/2}\,|\Gamma|\Big)$$

where |A| denotes the determinant of matrix A and $\Gamma = (\Sigma_1 + \Sigma_2)/2$;

Because the affinity propagation clustering method used next takes similarities as input, the dissimilarity is negated to obtain the similarity; the similarity matrix is computed from the estimates of the multivariate Gaussian mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ of each variety;

Step (3): Product feature clustering based on affinity propagation

Affinity propagation clustering is a clustering method based on information accumulation that determines the cluster centers from the information accumulated at each point, computing two kinds of messages from the similarity matrix; for points i and k, the two messages r(i,k) and a(i,k) are:

$$r(i,k) \leftarrow s(i,k) - \max_{k^* \text{ s.t. } k^* \neq k}\big\{a(i,k^*) + s(i,k^*)\big\}$$

$$a(i,k)\leftarrow\sum_{i^{*}\neq k}\max\left\{0,\ r(i^{*},k)\right\},\qquad (i=k)$$

The iteration starts with a(i, k) = 0; r(i, k) and a(i, k) are then updated repeatedly according to the formulas above until convergence.
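The message-passing loop can be sketched in Python with NumPy. The sketch follows the standard affinity propagation updates of Frey and Dueck, so it also includes the off-diagonal availability update a(i, k) ← min{0, r(k, k) + Σ_{i* ∉ {i, k}} max{0, r(i*, k)}} for i ≠ k, which is not among the formulas above. The damping factor, the stopping rule (exemplar set unchanged for a fixed number of iterations) and the name `affinity_propagation` are assumptions for illustration; at least two items are assumed.

```python
import numpy as np


def affinity_propagation(S, preference, damping=0.5, max_iter=200, convergence_iter=15):
    """Cluster m >= 2 items from an m x m similarity matrix S.

    Returns (exemplar indices, cluster label of every item).
    """
    S = np.array(S, dtype=float)
    m = S.shape[0]
    np.fill_diagonal(S, preference)  # preferences replace the diagonal of S
    R = np.zeros((m, m))
    A = np.zeros((m, m))  # the iteration starts from a(i, k) = 0
    rows = np.arange(m)
    exemplars, last, stable = np.array([], dtype=int), None, 0
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k* != k} {a(i,k*) + s(i,k*)}
        AS = A + S
        best = np.argmax(AS, axis=1)
        first_max = AS[rows, best]
        AS[rows, best] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[rows, best] = S[rows, best] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(k,k) <- sum_{i* != k} max{0, r(i*,k)}
        # a(i,k) <- min{0, r(k,k) + sum_{i* not in {i,k}} max{0, r(i*,k)}} for i != k
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new

        # items with r(k,k) + a(k,k) > 0 are the current cluster centres
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        if last is not None and np.array_equal(exemplars, last):
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            stable = 0
        last = exemplars
    if exemplars.size == 0:
        return exemplars, np.full(m, -1)
    labels = np.argmax(S[:, exemplars], axis=1)  # nearest centre by similarity
    labels[exemplars] = np.arange(exemplars.size)  # each centre labels itself
    return exemplars, labels
```

Damping averages each new message with the previous one, which is a common way to suppress oscillations in affinity propagation.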

In affinity propagation, a preference vector expresses the prior likelihood of each task (variety) becoming a cluster centre. During the iteration, the diagonal entries of the similarity matrix are replaced by the preference vector, which in turn influences the choice of cluster centres. Because product varieties with more samples yield more accurate models and are therefore better suited as cluster centres, the preference vector is set as follows so that this prior knowledge about sample counts enters the clustering:

Let the preference vector be p = [p₁, p₂, ..., p_{m-1}, p_m], with:

$$p_i=\begin{cases}\left(N_i-\max_{j} N_j\right)\times a, & N_i>L_1\\[2pt] -L_1\times a, & N_i\le L_1\end{cases}\qquad i=1,2,\ldots,m-1,m$$

where N_i is the number of samples of variety i, and L₁ is a threshold: varieties with more than L₁ samples are intended to be more likely to become cluster centres. The parameters are set to a = 0.005, b = 2000 and L₁ = 50.
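A direct transcription of the preference formula, with the parameter values above as defaults, is sketched below in Python with NumPy. `preference_vector` is a hypothetical name and `sample_counts` holds N₁, ..., N_m; the parameter b does not appear in the formula and is therefore not used.

```python
import numpy as np


def preference_vector(sample_counts, a=0.005, L1=50):
    """p_i = (N_i - max_j N_j) * a if N_i > L1, otherwise -L1 * a."""
    N = np.asarray(sample_counts, dtype=float)
    return np.where(N > L1, (N - N.max()) * a, -L1 * a)
```

For example, `preference_vector([10, 60, 200])` gives approximately [-0.25, -0.7, 0.0]: the variety with the most samples always receives preference 0, and every variety with at most L₁ samples receives the constant -L₁·a. The result is what replaces the diagonal of the similarity matrix before affinity propagation runs.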

Step (4): Multi-task learning based on shared-parameter extraction

For each of the L classes obtained by clustering, the varieties in the class are modelled with a multi-task learning method based on shared-parameter extraction. Its main idea is to divide the model parameters of each variety into two parts: shared parameters and private parameters. Within each class, the shared parameters are the part that is identical across the data models of all varieties and are represented by a column vector u; the private parameters are the part that differs between the data models of the individual varieties and are represented, for variety i, by a column vector v⁽ⁱ⁾. If a class contains r varieties, the r column vectors v⁽ⁱ⁾ together form the matrix V. The multi-task learning method learns both parts from the data in each class, which yields the final model parameters; for the i-th task, the final model parameters are:

$$w^{(i)}=u+v^{(i)}$$

Before model learning, the data must first be normalized. Then the model parameters λ₁, λ₂ and λ₃ are set, and the shared parameter vector u and the private parameter matrix V are randomly initialized.

The iterative procedure is as follows. Here X denotes the matrix formed by stacking the input matrices [X₁, X₂, ..., X_{r-1}, X_r] of the r varieties in a class vertically, and Y likewise denotes the column vector formed by stacking their output vectors [y₁, y₂, ..., y_{r-1}, y_r] vertically.
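The preparation steps (normalization, stacking into X and Y, random initialization) can be sketched in Python with NumPy. The text does not say which normalization is used, so the sketch assumes per-feature z-score normalization of the inputs over the whole class; it also assumes that V stores v⁽ⁱ⁾ as its i-th column and that "random initialization" means small Gaussian values. `prepare_class_data` and `init_scale` are hypothetical names.

```python
import numpy as np


def prepare_class_data(Xs, ys, seed=0, init_scale=0.01):
    """Normalize the inputs of one class, stack them, and initialize u and V.

    Xs: list of r input matrices X_i (n_i x d); ys: list of r output vectors y_i.
    """
    X_raw = np.vstack(Xs)
    mean = X_raw.mean(axis=0)
    std = X_raw.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    Xs_norm = [(np.asarray(Xi, dtype=float) - mean) / std for Xi in Xs]
    X = np.vstack(Xs_norm)  # [X_1; X_2; ...; X_r] stacked vertically
    Y = np.concatenate([np.asarray(yi, dtype=float) for yi in ys])  # [y_1; ...; y_r]
    rng = np.random.default_rng(seed)
    d, r = X.shape[1], len(Xs)
    u0 = init_scale * rng.standard_normal(d)  # shared parameter vector u
    V0 = init_scale * rng.standard_normal((d, r))  # column i holds v^(i)
    return Xs_norm, X, Y, u0, V0, (mean, std)
```

The returned `(mean, std)` pair would be reused to normalize new inputs before prediction.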

At the k-th iteration:

$$q_k^{(i)}=\max\left(0,\ 1-\frac{\lambda_3}{l_{k-1}\,\lVert s_k^{(i)}\rVert_2}\right)s_k^{(i)}$$
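The update for q_k⁽ⁱ⁾ is the block (group) soft-thresholding operator, which is the proximal step for a penalty of the form λ₃ Σᵢ ||v⁽ⁱ⁾||₂: it shrinks each private vector towards zero and sets it exactly to zero when ||s_k⁽ⁱ⁾||₂ ≤ λ₃ / l_{k-1}. A minimal NumPy sketch, assuming s_k⁽ⁱ⁾ is already available and that the r vectors are stored as columns; the function names are hypothetical:

```python
import numpy as np


def group_soft_threshold(s, lam3, l_prev):
    """q = max(0, 1 - lam3 / (l_prev * ||s||_2)) * s for one private vector s."""
    s = np.asarray(s, dtype=float)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        return np.zeros_like(s)  # the whole block is already zero
    return max(0.0, 1.0 - lam3 / (l_prev * norm)) * s


def prox_private(S, lam3, l_prev):
    """Apply the update column by column, assuming column i of S holds s_k^(i)."""
    return np.column_stack(
        [group_soft_threshold(S[:, i], lam3, l_prev) for i in range(S.shape[1])]
    )
```

Because a whole vector is zeroed at once, a variety whose data are explained well enough by the shared part u ends up with no private parameters at all.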

In the above formula:

$$\nabla_u f(u_k,V_k)=-X^{T}\frac{Y-Xu_k}{\lVert Y-Xu_k\rVert_2}-2\lambda_1\sum_{i=1}^{r}X_i^{T}\left(y_i-X_i\left(u_k+v_k^{(i)}\right)\right)+2\lambda_2\,u_k$$

$$\nabla_{v^{(i)}} f(u_k,V_k)=-2\lambda_1\,X_i^{T}\left(y_i-X_i\left(u_k+v_k^{(i)}\right)\right),\qquad i=1,2,\ldots,r$$
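To check the gradient expressions numerically, they can be sketched in NumPy. The sketch differentiates the objective f(u, V) stated later in this step, whose data term ||Y - Xu||₂ is not squared, so that term contributes -X^T(Y - Xu)/||Y - Xu||₂ to ∇_u f. The gradient for each private vector v⁽ⁱ⁾ is stored as column i of `grad_V`; the function name and this column layout are assumptions.

```python
import numpy as np


def gradients(Xs, ys, u, V, lam1, lam2):
    """Gradients of f(u, V) with respect to u and to each private vector v^(i).

    Column i of V (and of the returned grad_V) corresponds to variety i.
    """
    X, Y = np.vstack(Xs), np.concatenate(ys)
    res = Y - X @ u
    # d/du ||Y - Xu||_2 = -X^T (Y - Xu) / ||Y - Xu||_2 (undefined when the residual is 0)
    grad_u = -X.T @ res / np.linalg.norm(res) + 2.0 * lam2 * u
    grad_V = np.zeros_like(V)
    for i, (Xi, yi) in enumerate(zip(Xs, ys)):
        # -2 lam1 X_i^T (y_i - X_i (u + v^(i))) enters both gradients
        g = -2.0 * lam1 * Xi.T @ (yi - Xi @ (u + V[:, i]))
        grad_u += g
        grad_V[:, i] = g
    return grad_u, grad_V
```

A central finite-difference comparison against f is a quick way to confirm the signs.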

u_k and V_k are then updated from p_k, Q_k and their previous values p_{k-1}, Q_{k-1}:

$$u_k=p_k+\alpha_k\left(p_k-p_{k-1}\right)$$

$$V_k=Q_k+\alpha_k\left(Q_k-Q_{k-1}\right)$$

where α_k ∈ [0, 1]; one may set α₀ = 0 and t₀ = 1.
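The rule that defines α_k from t_k is not reproduced above. A common choice consistent with α₀ = 0, t₀ = 1 and α_k ∈ [0, 1] is the FISTA schedule of Beck and Teboulle, which the following sketch assumes; the function names are hypothetical.

```python
import math


def fista_alphas(n):
    """First n >= 1 extrapolation weights alpha_0, ..., alpha_{n-1}, starting from t_0 = 1.

    Assumes the FISTA rule t_k = (1 + sqrt(1 + 4 t_{k-1}^2)) / 2,
    alpha_k = (t_{k-1} - 1) / t_k.
    """
    alphas, t_prev = [0.0], 1.0  # alpha_0 = 0, t_0 = 1
    for _ in range(1, n):
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        alphas.append((t_prev - 1.0) / t)
        t_prev = t
    return alphas


def extrapolate(P_k, P_prev, alpha_k):
    """u_k = p_k + alpha_k (p_k - p_{k-1}); the same rule gives V_k from Q_k and Q_{k-1}."""
    return P_k + alpha_k * (P_k - P_prev)
```

Under this schedule the weights start at 0 and increase towards 1, so the extrapolation adds more momentum as the iteration proceeds; `extrapolate` works equally for the vector u and the matrix V when given NumPy arrays.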

During the iteration, the step-size parameter l_k is determined as follows:

where j_k is the smallest non-negative integer for which the following condition holds:

$$f(u,V)=\lVert Y-Xu\rVert_2+\lambda_1\sum_{i=1}^{r}\left\lVert y_i-X_i\left(u+v^{(i)}\right)\right\rVert_2^2+\lambda_2\lVert u\rVert_2^2$$
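Putting the pieces together, one class could be fitted with an accelerated proximal-gradient loop like the NumPy sketch below. Several parts are assumptions rather than details from the text: s_k⁽ⁱ⁾ is taken as the gradient step v⁽ⁱ⁾ - ∇_{v⁽ⁱ⁾}f / l, p_k as the plain gradient step for u, the step parameter as l_k = γ^{j_k} l_{k-1} with the usual quadratic upper-bound test as the backtracking condition, α_k as the FISTA schedule, and the names, defaults (γ = 2, tolerance, iteration cap) and column layout of V as illustrative choices.

```python
import numpy as np


def fit_shared_private(Xs, ys, lam1=1.0, lam2=1e-3, lam3=1e-3,
                       l0=1.0, gamma=2.0, max_iter=500, tol=1e-8, seed=0):
    """Return (u, V) for one class; variety i uses w^(i) = u + V[:, i]."""
    X, Y = np.vstack(Xs), np.concatenate(ys)
    d, r = X.shape[1], len(Xs)

    def f(u, V):  # objective f(u, V) from the text
        fit = sum(np.sum((yi - Xi @ (u + V[:, i])) ** 2) for i, (Xi, yi) in enumerate(zip(Xs, ys)))
        return np.linalg.norm(Y - X @ u) + lam1 * fit + lam2 * (u @ u)

    def grad(u, V):
        res = Y - X @ u
        gu = -X.T @ res / max(np.linalg.norm(res), 1e-12) + 2.0 * lam2 * u
        gV = np.empty_like(V)
        for i, (Xi, yi) in enumerate(zip(Xs, ys)):
            g = -2.0 * lam1 * Xi.T @ (yi - Xi @ (u + V[:, i]))
            gu += g
            gV[:, i] = g
        return gu, gV

    def prox(S, l):  # column-wise group soft-thresholding, giving Q_k from the s_k^(i)
        norms = np.maximum(np.linalg.norm(S, axis=0), 1e-12)
        return S * np.maximum(0.0, 1.0 - lam3 / (l * norms))

    rng = np.random.default_rng(seed)
    u, V = 0.01 * rng.standard_normal(d), 0.01 * rng.standard_normal((d, r))
    p_prev, Q_prev = u.copy(), V.copy()
    l, t_prev = l0, 1.0
    for _ in range(max_iter):
        fu, (gu, gV) = f(u, V), grad(u, V)
        while True:  # backtracking: multiply l by gamma until the quadratic bound holds
            p, Q = u - gu / l, prox(V - gV / l, l)
            du, dV = p - u, Q - V
            bound = fu + gu @ du + np.sum(gV * dV) + 0.5 * l * (du @ du + np.sum(dV * dV))
            if f(p, Q) <= bound + 1e-12:
                break
            l *= gamma
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        alpha = (t_prev - 1.0) / t
        u, V = p + alpha * (p - p_prev), Q + alpha * (Q - Q_prev)  # extrapolation step
        change = np.sqrt(np.sum((p - p_prev) ** 2) + np.sum((Q - Q_prev) ** 2))
        p_prev, Q_prev, t_prev = p, Q, t
        if change < tol:
            break
    return p_prev, Q_prev
```

The returned pair corresponds to the last proximal iterate (p_k, Q_k), not to the extrapolated point.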

Applying the above method to each of the L classes obtained by clustering yields a library of L class models, which together contain the models of all m varieties.
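Assembling the model library can be sketched as follows (NumPy). `fit` stands for any per-class solver that returns the shared vector u and the private vectors as the columns of V, for example an accelerated proximal-gradient implementation of the iteration above; `build_model_library`, `predict` and the data layout are hypothetical. Since the model is a reverse model, the predicted output is the grinding time for a given process index and product state, assuming the input row is normalized the same way as the training data.

```python
import numpy as np


def build_model_library(labels, data, fit):
    """Fit one shared/private model per cluster and return w^(i) for every variety.

    labels: cluster label of each variety (length m)
    data:   dict variety_index -> (X_i, y_i)
    fit:    callable (Xs, ys) -> (u, V) with v^(i) in column i of V
    """
    labels = np.asarray(labels)
    library = {}
    for c in np.unique(labels):
        members = [int(i) for i in np.flatnonzero(labels == c)]
        Xs = [data[i][0] for i in members]
        ys = [data[i][1] for i in members]
        u, V = fit(Xs, ys)
        for col, i in enumerate(members):
            library[i] = u + V[:, col]  # w^(i) = u + v^(i)
    return library


def predict(library, variety, x):
    """Predicted output (grinding time) for one normalized input row x of a variety."""
    return float(np.asarray(x) @ library[variety])
```

Keeping `fit` as a parameter separates the clustering bookkeeping from the optimization, so the per-class solver can be swapped without touching the library code.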