
CN107545276B - Multi-view learning method combining low-rank representation and sparse regression - Google Patents


Info

Publication number
CN107545276B
Authority
CN
China
Prior art keywords
low
features
image
rank
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710648597.4A
Other languages
Chinese (zh)
Other versions
CN107545276A (en)
Inventor
刘安安
史英迪
苏育挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201710648597.4A priority Critical patent/CN107545276B/en
Publication of CN107545276A publication Critical patent/CN107545276A/en
Application granted granted Critical
Publication of CN107545276B publication Critical patent/CN107545276B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a multi-view learning method combining low-rank representation and sparse regression. The method includes the following steps: extracting low-level features and high-level attribute features from the SUN dataset, whose images are labeled with memorability scores; placing the low-rank representation, the sparse regression model, and the multi-view consistency loss within a single framework to form a whole, and constructing a joint low-rank and sparse regression multi-view model; using a multi-view adaptive regression algorithm to solve the problem of automatically predicting image memorability, and obtaining the relationship among low-level image features, image attribute features, and image memorability under optimal parameters; and combining the low-level and high-level attribute features of an image and using the relationship obtained under optimal parameters to predict the memorability of the database test-set images, with the predictions verified against the relevant evaluation criteria. The multi-view learning framework combining low-rank representation and sparse regression accurately predicts the memorability of image regions.

Description

Multi-view learning method combining low-rank representation and sparse regression

Technical Field

The invention relates to the field of low-rank representation and sparse regression, and in particular to a multi-view learning method combining low-rank representation and sparse regression.

Background

Humans can remember thousands of images, yet not all images are stored in the brain in the same way. Some representative images are remembered at a glance, while others fade from memory easily. Image memorability measures the degree to which an image is remembered or forgotten after a specific period of time. Previous work has shown that memorability is an intrinsic property of an image: it is consistent across time intervals and across observers. In this context, as with many other high-level image attributes (such as popularity, interestingness, emotion, and aesthetics), several studies have begun to explore the potential correlation between image content representation and image memorability.

Analyzing image memorability can be applied in several fields such as user interface design, video summarization, scene understanding, and advertising design. For example, image collections or videos can be summarized by using memorability as a guiding criterion to select meaningful images. By improving consumers' memory of a target brand, memorable advertisements can be designed to help merchants expand their influence.

Recently, low-rank representation (LRR) has been successfully applied in the multimedia and computer vision domains. To better handle the feature representation problem, LRR reveals the underlying low-dimensional subspace structure embedded in the data by decomposing the original data matrix into a low-rank representation matrix while eliminating irrelevant details. Traditional methods are often insufficient for handling outliers. To address this issue, some recent studies have also focused on sparse regression learning.

However, one of the main shortcomings of these works is that feature representation and memorability prediction are performed in two separate stages. That is, once the feature combination used for image memorability prediction has been fixed, the final performance of the regression step is largely determined by the processed features. Reference [1] proposes a joint low-rank and sparse regression feature-coding algorithm to handle outliers, and reference [2] develops a joint graph embedding and sparse regression framework for dimensionality reduction, but both are designed for visual classification problems rather than the image memorability prediction task.

Summary of the Invention

The present invention provides a multi-view learning method combining low-rank representation and sparse regression. The multi-view learning framework of the present invention, which combines low-rank representation and sparse regression, accurately predicts the memorability of image regions, as described below:

extracting low-level features and high-level attribute features from the SUN dataset labeled with image memorability scores;

placing the low-rank representation, the sparse regression model, and the multi-view consistency loss within a single framework to form a whole, and constructing a joint low-rank and sparse regression multi-view model;

using a multi-view adaptive regression algorithm to solve the problem of automatically predicting image memorability, and obtaining the relationship among low-level image features, image attribute features, and image memorability under optimal parameters;

combining the low-level features and high-level attribute features of the image, using the relationship obtained under optimal parameters to predict the memorability of the database test-set images, and verifying the predictions against the relevant evaluation criteria;

wherein using the multi-view adaptive regression algorithm to solve the problem of automatically predicting image memorability comprises: converting the problem into an equivalent one through a slack variable Q:

Figure GDA0002772751910000024

s.t. X = XA + E, Q = Aw

where A is the mapping matrix of the low-rank representation; w is the linear dependence between the low-rank feature representation and the output memorability score; E is the sparse error constraint term; α is the balance parameter between the prediction-error term and the regularization term; β is the parameter controlling sparsity; λ > 0 is a balance parameter; X is the input feature matrix; y is the memorability score vector; ||·||* denotes the nuclear norm; φ is the graph-regularization constraint term; and L is the Laplacian operator;

Two slack variables Y1 and Y2 are introduced to obtain the augmented Lagrangian function:

Figure GDA0002772751910000021

where <·,·> denotes the matrix inner product, Y1 and Y2 are the Lagrange multiplier matrices, and μ > 0 is a positive penalty parameter; the above terms are combined as:

Figure GDA0002772751910000022

where

Figure GDA0002772751910000023

A variable t is introduced, and At, Et, Qt, wt, Y1,t, Y2,t and μ are defined as the results of the t-th iteration of the variables; the (t+1)-th iteration is then obtained as follows:

The iteration result for A:

Figure GDA0002772751910000031

where

Figure GDA0002772751910000032
Figure GDA0002772751910000033

Fixing w, A, and Q, the optimization result for E is as follows:

Figure GDA0002772751910000034

By fixing E, A, and Q, the result of optimizing w is as follows:

Figure GDA0002772751910000035

The above problem is a ridge regression problem, and the optimal solution is

Figure GDA0002772751910000036

Finally, fixing E, w, and A and optimizing Q gives:

Figure GDA0002772751910000037

Y1 and Y2 are updated according to the following scheme:

Y1,t+1 = Y1,t + μt(X - XAt+1 - Et+1)

Y2,t+1 = Y2,t + μt(Qt+1 - At+1wt+1)

where

Figure GDA0002772751910000038
denotes the partial derivative symbol;

The joint low-rank and sparse regression multi-view model is specifically:

Figure GDA0002772751910000039

where:

Figure GDA00027727519100000310

Figure GDA00027727519100000311

G(φH,φl) = tr[(XlAlwl)T XHAHwH]

Figure GDA0002772751910000041
is the loss function serving as the prediction error of the high-level features;
Figure GDA0002772751910000042
is the loss function serving as the prediction error of the low-level features;
Figure GDA0002772751910000043
is the graph regularization term used to alleviate overfitting; XH is the high-level attribute feature matrix; AH is the mapping matrix of the low-rank representation of the high-level attribute features; EH is the sparse error constraint term of the high-level attribute features; wH is the linear dependence between the low-rank representation of the high-level attribute features and the output memorability score; Al is the mapping matrix of the low-rank representation of the low-level features; El is the sparse error constraint term of the low-level features; Xl is the low-level feature matrix; and wl is the linear dependence between the low-rank representation of the low-level features and the output memorability score.

The method further includes: acquiring an image memorability dataset.

The low-level features include: scale-invariant feature transform features, generalized search tree features, histogram of oriented gradients features, and structural similarity features.

The high-level attribute features include: 327-dimensional scene category attribute features and 106-dimensional object attribute features.

The beneficial effects of the technical solution provided by the present invention are:

1. Low-rank representation and sparse regression are combined for image memorability prediction, where the low-rank constraint reveals the intrinsic structure embedded in the original data and the sparse constraint removes outliers and redundant information. When low-rank representation and sparse regression are performed jointly, the low-rank representation shared by all features captures the internal structure of the features, thereby improving prediction accuracy;

2. The present invention is based on the multi-view adaptive regression (MAR) algorithm, which solves the optimization of the objective function with fast convergence.

Brief Description of the Drawings

Fig. 1 is a flowchart of the multi-view learning method combining low-rank representation and sparse regression;

Fig. 2 shows sample database images labeled with image memorability scores;

Fig. 3 shows the convergence of the algorithm;

Fig. 4 compares the results of the proposed method with those of other methods.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.

Embodiment 1

Studies have shown that image attribute features are much higher-level semantic features than the original low-level features. To study the visual features of images and predict image memorability, an embodiment of the present invention proposes a multi-view learning method combining low-rank representation and sparse regression for image memorability prediction. Referring to Fig. 1, the method includes the following steps:

101: Acquire the image memorability dataset;

The image memorability dataset [1] contains 2,222 images from the SUN dataset [11]. The memorability score of each image was obtained through Amazon Mechanical Turk's Visual Memory Game, and image memorability is a continuous value from 0 to 1; the higher the value, the harder the image is to remember. Sample images with various memorability scores are shown in Fig. 2.

102: Extract low-level features and high-level attribute features from the SUN dataset labeled with image memorability scores;

The extracted low-level features include SIFT (scale-invariant feature transform), Gist (generalized search trees), HOG (histogram of oriented gradients), and SSIM (structural similarity index) features; together, these four types of low-level features constitute the low-level feature library. The embodiment also uses two types of high-level attribute features: 327-dimensional scene category attribute features and 106-dimensional object attribute features.

The scene category attributes cover 327 scene categories, and the object attribute features are labeled with 106 object categories; the specific dimensionality is set according to the needs of the actual application and is not limited by this embodiment.
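For illustration only, the two views described above could be assembled as feature matrices in the following way. This is a hedged sketch, not part of the patent: the individual descriptor dimensionalities are assumptions, and random arrays stand in for the real SIFT/Gist/HOG/SSIM extractors and attribute predictors.

```python
import numpy as np

# Sketch: assemble the low-level and high-level views used by Mv-JLRSR.
# Descriptor sizes are assumptions; random arrays replace real extractors.
N = 2222                                       # images in the dataset
rng = np.random.default_rng(0)

sift = rng.standard_normal((N, 512))           # assumed SIFT descriptor size
gist = rng.standard_normal((N, 512))           # assumed Gist descriptor size
hog = rng.standard_normal((N, 1024))           # assumed HOG descriptor size
ssim = rng.standard_normal((N, 128))           # assumed SSIM descriptor size
X_low = np.hstack([sift, gist, hog, ssim])     # low-level view, shape (N, D_l)

scene_attr = rng.random((N, 327))              # 327-dim scene category attributes
object_attr = rng.random((N, 106))             # 106-dim object attributes
X_high = np.hstack([scene_attr, object_attr])  # high-level view, shape (N, 433)

y = rng.random(N)                              # memorability scores in [0, 1]
print(X_low.shape, X_high.shape, y.shape)
```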

103: Place the low-rank representation, the sparse regression model, and the multi-view consistency loss within a single framework to form a whole, and construct the Mv-JLRSR (multi-view joint low-rank and sparse regression) model;

104: Use the multi-view adaptive regression (MAR) algorithm to solve the problem of automatically predicting image memorability, and obtain the relationship among low-level image features, image attribute features, and image memorability under optimal parameters;

105: Combine the low-level features and high-level attribute features of the image, use the relationship obtained under optimal parameters to predict the memorability of the database test-set images, and verify the predictions against the relevant evaluation criteria.

In summary, through steps 101-105, the embodiment uses the low-rank constraint to reveal the intrinsic structure of the original data and the sparse constraint to remove feature outliers and redundant information. When the low-rank representation and sparse regression are performed jointly, the lowest-rank representation shared by all features not only captures the global structure of all modalities but also satisfies the requirements of the regression. Because the formulated objective function is non-smooth and difficult to solve directly, the multi-view adaptive regression (MAR) algorithm is used to solve the automatic image memorability prediction problem, so that the optimization converges quickly.

Embodiment 2

The scheme of Embodiment 1 is further described below in conjunction with the specific calculation formulas:

201: The image memorability dataset contains 2,222 images from the SUN dataset;

This dataset is well known to those skilled in the art and is not described in detail here.

202: Perform feature extraction on the images of the SUN dataset labeled with image memorability scores. The extracted SIFT, Gist, HOG, and SSIM features constitute the low-level feature library, and two types of high-level attribute features are used, namely 327-dimensional scene category attributes and 106-dimensional object attributes.

The dataset includes 2,222 images of various environments, each labeled with an image memorability score; Fig. 2 shows examples of database images labeled with memorability scores. The features are expressed as

Figure GDA0002772751910000061
where Di denotes the dimensionality of the i-th type of feature and N denotes the number of images in the database (2,222). These features constitute the feature library B = {B1, ..., BM}.

203: Establish the Mv-JLRSR model, which combines low-rank representation and sparse regression on the basis of the extracted low-level and high-level attribute features to build a more robust feature representation and an accurate regression model.

The general framework defined by the Mv-JLRSR model is as follows:

Figure GDA0002772751910000062

where F(A, w) is the loss function serving as the prediction error; L(A, E) denotes the feature encoder based on the low-rank representation; G(A) is the graph regularization term used to alleviate overfitting; A is the mapping matrix of the low-rank representation; w is the linear dependence between the low-rank feature representation and the output memorability score; and E is the sparse error constraint term.

The image memorability dataset [1] contains 2,222 images from the SUN dataset [11], and the memorability scores of the images were obtained through Amazon Mechanical Turk's Visual Memory Game. Combined with regression training based on adaptive transfer learning, a linear regression method is used to train on the extracted feature library. The prediction of image memorability scores has two aspects: on the one hand, the feature representations are used directly to predict image memorability, yielding a mapping matrix wi from each type of low-level image feature to image memorability; on the other hand, high-level image attribute features also play a very important role in predicting the memorability score, and combined with low-rank learning, the relationship between each type of image attribute and image memorability is obtained. Given the initial image feature vector set X ∈ RN×D, the goal of the Mv-JLRSR model is to combine low-rank representation and sparse regression on the basis of the extracted visual cues to obtain an enhanced robust feature representation and an accurate regression model.

Each part is now introduced in detail:

Because the low-rank constraint can remove noise or redundant information and thereby help reveal the essential structure of the data, the extracted low-level and high-level features can be integrated into feature learning to deal with these problems. LRR assumes that the original feature matrix contains a latent lowest-rank structural component shared by all samples plus a unique error matrix:

Figure GDA0002772751910000063

where A ∈ RD×D is the low-rank projection matrix over the N samples, E ∈ RN×D is the unique sparse error term constrained by the l1 norm in order to handle random errors, λ > 0 is the balance parameter, X is the input feature matrix, D is the feature dimensionality after the low-rank constraint, and rank(·) denotes the rank used in the low-rank representation.

Since the above equation is difficult to optimize, the nuclear norm ||A||* (the sum of the singular values of the matrix) is used to approximate the rank of A, so L(A, E) can be defined as follows:

Figure GDA0002772751910000071
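For readability, a likely written-out form of the two expressions rendered as images above is the standard LRR formulation; the following is a reconstruction from the surrounding definitions (A, E, λ, the constraint X = XA + E, and the nuclear-norm relaxation), not a verbatim copy of the patent's figures.

```latex
% Reconstructed sketch (assumption): rank-minimization form and its
% nuclear-norm relaxation L(A,E), under the self-expression constraint.
\min_{A,E}\ \operatorname{rank}(A) + \lambda \lVert E \rVert_1
\quad \text{s.t.}\ X = XA + E
\qquad\Longrightarrow\qquad
L(A,E) = \lVert A \rVert_* + \lambda \lVert E \rVert_1,
\quad \text{s.t.}\ X = XA + E
```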

In the framework proposed in this embodiment, the image memorability prediction problem is treated as a standard regression problem. The lasso [5] regression method establishes a linear relationship v between the input feature matrix X and the memorability score vector y and minimizes the least-squares error

Figure GDA0002772751910000072
to solve the prediction problem. After adding ridge regularization [6] to the least-squares error term, a typical least-squares problem with ridge regression is obtained:

Figure GDA0002772751910000073

where α is the balance parameter between the prediction-error term and the regularization term.
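A plausible written-out form of the ridge-regularized least-squares problem shown as an image above, reconstructed from the definitions of X, y, v, and α (an assumption, since the patent gives the formula only as a figure), is:

```latex
% Reconstructed sketch (assumption): least-squares error with ridge regularization.
\min_{v}\ \lVert X v - y \rVert_2^2 \;+\; \alpha \lVert v \rVert_2^2
```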

From the perspective of matrix factorization, the transformation vector v can be decomposed into the product of two components: a low-rank projection matrix A is applied to capture the low-rank structure shared among samples, and a coefficient vector w is applied to relate the transformed samples to their memorability scores. Introducing Q = Aw, the loss function F(A, w) is defined as:

Figure GDA0002772751910000074

Based on the idea of manifold learning, graph regularization is adopted to maintain the consistency of the geometric structure. The core idea of graph regularization is that if samples are close in their feature representations, then their memorability scores should also be close, and vice versa. Geometric consistency between the features and the memorability scores is achieved by minimizing the graph regularizer G(A):

Figure GDA0002772751910000075

where L = B - S is the Laplacian operator, B is the diagonal degree matrix with Bii = Σj Sij, and S is the weight matrix computed by the Gaussian similarity function:

Figure GDA0002772751910000076

where yi and yj are the memorability scores of the i-th and j-th samples, NK denotes the set of K nearest neighbors of yi, and σ is a radius parameter, simply set to the median of the Euclidean distances over all sample pairs.
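A minimal sketch of how such a graph Laplacian could be constructed is given below. It assumes the usual Gaussian kernel exp(-d²/2σ²) restricted to the K nearest neighbors, with σ set to the median pairwise distance; the exact kernel is shown only as an image in the patent, so this form is an assumption.

```python
import numpy as np

def build_graph_laplacian(y, k=5):
    """Sketch of the regularization graph (assumed Gaussian kernel on the scores y).

    Returns L = B - S, where S is the similarity matrix restricted to k nearest
    neighbours and B is the diagonal degree matrix with B_ii = sum_j S_ij.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    dist = np.abs(y[:, None] - y[None, :])            # pairwise Euclidean distances
    sigma = np.median(dist[np.triu_indices(n, 1)])    # radius = median pairwise distance
    S = np.exp(-dist**2 / (2.0 * sigma**2))           # Gaussian similarity
    mask = np.zeros_like(S, dtype=bool)
    for i in range(n):
        mask[i, np.argsort(dist[i])[1:k + 1]] = True  # k nearest neighbours of sample i
    S = np.where(mask | mask.T, S, 0.0)               # keep neighbours only, symmetrized
    B = np.diag(S.sum(axis=1))                        # degree matrix
    return B - S                                      # graph Laplacian

L = build_graph_laplacian(np.random.rand(20))
print(L.shape)
```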

Multiple features are usually extracted to represent images from different views, because these multiple representations provide compatible and complementary information. For the image memorability prediction task, the natural choice is to integrate these multiple representations to represent an image for better performance, rather than relying on a single feature. This embodiment extracts high-level attribute features and low-level visual features.

The Mv-JLRSR model is therefore defined as:

Figure GDA0002772751910000081

where:

Figure GDA0002772751910000082

Figure GDA0002772751910000083

Figure GDA0002772751910000084

A ∈ RD×D is the low-rank projection matrix over the N samples, capturing the underlying low-rank structure shared among them, and E ∈ RN×D is the error term constrained by the l1 norm to account for random errors.

where
Figure GDA0002772751910000085
is the loss function serving as the prediction error of the high-level features;
Figure GDA0002772751910000086
is the loss function serving as the prediction error of the low-level features;
Figure GDA0002772751910000087
is the graph regularization term used to alleviate overfitting; XH is the high-level attribute feature matrix; AH is the mapping matrix of the low-rank representation of the high-level attribute features; EH is the sparse error constraint term of the high-level attribute features; β is the parameter controlling sparsity; wH is the linear dependence between the low-rank representation of the high-level attribute features and the output memorability score; Al is the mapping matrix of the low-rank representation of the low-level features; El is the sparse error constraint term of the low-level features; Xl is the low-level feature matrix; wl is the linear dependence between the low-rank representation of the low-level features and the output memorability score; and y is the label vector of the training samples.

Figure GDA0002772751910000088
β||Alwl|| + λ||XlAlwl - y||F is the defined error function, which solves the prediction problem by establishing a linear vector v between the input feature matrices (XH and Xl) and the memorability score vector y.

Figure GDA0002772751910000089
ensures that samples with similar features also have similar memorability scores.

Initialize α, β, λ, and φ in FH(φH) and FL(φL) of the Mv-JLRSR model; then alternately fix A, E, w, and Q and take derivatives, repeating the derivation process until the error reaches the set minimum.

The solution process is now described in detail. The multi-view adaptive regression (MAR) algorithm [7] is used to solve the problem of automatically predicting image memorability and thereby solve the optimization problem.

First, a slack variable Q is introduced to transform the above problem into an equivalent one:

Figure GDA0002772751910000091

Then, two slack variables Y1 and Y2 are introduced to obtain the augmented Lagrangian function:

Figure GDA0002772751910000092

where <·,·> denotes the matrix inner product, Y1 and Y2 are the Lagrange multiplier matrices, μ > 0 is a positive penalty parameter, ||·||* denotes the nuclear norm, and φ is the graph-regularization constraint term; the above terms are combined as:

Figure GDA0002772751910000093

where

Figure GDA0002772751910000094

The method is solved by alternating iteration. Each subproblem is handled separately by approximating the quadratic term h(A, Q, E, w, Y1, Y2, μ) with its second-order Taylor expansion. To describe this process, a variable t is introduced, and At, Et, Qt, wt, Y1,t, Y2,t and μ are defined as the results of the t-th iteration of the variables; the (t+1)-th iteration is then obtained as follows:

The iteration result for A:

Figure GDA0002772751910000095

where

Figure GDA0002772751910000096
Figure GDA0002772751910000097

Then, fixing w, A, and Q, the optimization result for E is as follows:

Figure GDA0002772751910000101
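The E-update shown as an image above solves an l1-regularized subproblem; in augmented-Lagrangian schemes of this kind it is usually the elementwise soft-thresholding of the constraint residual. The sketch below assumes that standard form (the patent gives the exact formula only as a figure).

```python
import numpy as np

def soft_threshold(M, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def update_E(X, A, Y1, mu, lam):
    """Assumed E-update: shrink the residual of the constraint X = XA + E."""
    residual = X - X @ A + Y1 / mu
    return soft_threshold(residual, lam / mu)
```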

By fixing E, A, and Q, the result of optimizing w is as follows:

Figure GDA0002772751910000102

The above problem is actually the well-known ridge regression problem, and its optimal solution is
Figure GDA0002772751910000103
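As a sketch only: for a ridge problem of the form min_w ||Zw - y||² + α||w||², with Z standing for the transformed features (for example Z = XA; the exact matrices are shown only as an image in the patent), the closed-form solution follows the usual normal equations.

```python
import numpy as np

def ridge_solution(Z, y, alpha):
    """Closed-form ridge solution w* = (Z^T Z + alpha I)^(-1) Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(d), Z.T @ y)
```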

Finally, fixing E, w, and A and optimizing Q gives:

Figure GDA0002772751910000104

Furthermore, the Lagrange multipliers Y1 and Y2 are updated by the following scheme:

Y1,t+1 = Y1,t + μt(X - XAt+1 - Et+1)

Y2,t+1 = Y2,t + μt(Qt+1 - At+1wt+1)

where

Figure GDA0002772751910000105
denotes the partial derivative symbol.
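The two multiplier updates above translate directly into code; the following sketch names the matrices as in the text and covers one iteration of the dual update.

```python
import numpy as np

def update_multipliers(X, A, E, Q, w, Y1, Y2, mu):
    """Dual updates, exactly as given in the text, for one iteration."""
    Y1_next = Y1 + mu * (X - X @ A - E)   # Y1_{t+1} = Y1_t + mu_t (X - X A_{t+1} - E_{t+1})
    Y2_next = Y2 + mu * (Q - A @ w)       # Y2_{t+1} = Y2_t + mu_t (Q_{t+1} - A_{t+1} w_{t+1})
    return Y1_next, Y2_next
```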

The relationship between the predicted scores and the actual scores is examined under the selected evaluation criteria to obtain the performance of the algorithm.

In this embodiment, the database is randomly divided into 10 groups, the above steps are performed on each group to obtain 10 sets of correlation coefficients, and their average is taken to evaluate the performance of the algorithm. The evaluation criteria selected for this method are Ranking Correlation and R-value, which are described in detail in Embodiment 3.
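A minimal sketch of this evaluation protocol is given below, assuming a generic train/predict interface; train_mv_jlrsr and predict are hypothetical placeholders standing in for the model described above.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_10_splits(X_low, X_high, y, train_mv_jlrsr, predict, seed=0):
    """Split the data into 10 random groups; each group serves once as the test set.

    Returns the mean Spearman rank correlation over the 10 splits.
    train_mv_jlrsr / predict are placeholders for the model in this patent.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), 10)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        model = train_mv_jlrsr(X_low[train_idx], X_high[train_idx], y[train_idx])
        y_pred = predict(model, X_low[test_idx], X_high[test_idx])
        rho, _ = spearmanr(y[test_idx], y_pred)
        scores.append(rho)
    return float(np.mean(scores))
```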

Embodiment 3

The feasibility of the schemes in Embodiments 1 and 2 is verified below with specific experimental data and Figs. 3 and 4:

The image memorability dataset contains 2,222 images from the SUN dataset. The memorability score of each image was obtained through Amazon Mechanical Turk's Visual Memory Game, and image memorability is a continuous value from 0 to 1; the higher the value, the harder the image is to remember. Sample images with various memorability scores are shown in Fig. 2.

Two evaluation methods are adopted:

Ranking Correlation (RC): the ranking by true memorability and the ranking by predicted memorability scores are obtained, and the Spearman rank correlation coefficient is used to measure the correlation between the two rankings. Its value range is [-1, 1]; a higher value indicates that the two rankings are closer:

Figure GDA0002772751910000111

where N is the number of test-set images, the element r1i of r1 is the rank of the i-th image in the ground-truth ordering, and the element r2i of r2 is the rank of the i-th image in the predicted ordering.
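The RC formula itself is shown only as an image in the patent; the standard Spearman rank correlation computed from the two rank vectors r1 and r2 (assumed to be the intended formula) can be sketched as:

```python
import numpy as np
from scipy.stats import rankdata

def ranking_correlation(y_true, y_pred):
    """Spearman rank correlation 1 - 6*sum(d_i^2)/(N*(N^2-1)), no-ties form."""
    r1 = rankdata(y_true)          # rank of each image in the ground-truth ordering
    r2 = rankdata(y_pred)          # rank of each image in the predicted ordering
    n = len(y_true)
    d = r1 - r2
    return 1.0 - 6.0 * float(np.sum(d**2)) / (n * (n**2 - 1))
```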

R-value: the correlation coefficient between the predicted scores and the actual scores is evaluated to facilitate comparison between regression models. The R-value ranges over [-1, 1], where 1 indicates positive correlation and -1 indicates negative correlation:

Figure GDA0002772751910000112

where N is the number of test-set images, si is the vector of true image memorability scores,
Figure GDA0002772751910000113
is the mean of the true memorability scores over all images, vi is the vector of predicted image memorability scores, and
Figure GDA0002772751910000114
is the mean of the predicted memorability scores over all images.
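Since the R-value as described is the Pearson correlation between the predicted and true scores (the exact formula appears only as an image), an assumed sketch of the computation is:

```python
import numpy as np

def r_value(s, v):
    """Pearson correlation between true scores s and predicted scores v."""
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    s_c, v_c = s - s.mean(), v - v.mean()
    return float(np.sum(s_c * v_c) / np.sqrt(np.sum(s_c**2) * np.sum(v_c**2)))
```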

In the experiments, the proposed method is compared with the following four methods:

LR (Linear Regression): a linear prediction function is trained on the relationship between the low-level features and the memorability scores;

SVR (Support Vector Regression): the low-level features are concatenated, and a nonlinear function is learned with an RBF kernel to predict image memorability;

MRR [9] (Multiple Rank Regression): a regression model is built using multiple-rank left and right projection vectors;

MLHR [10] (Multi-Level via Hierarchical Regression): multimedia information analysis based on hierarchical regression.

Fig. 3 verifies the convergence of the algorithm. Fig. 4 shows the performance comparison between the proposed method and the other methods; the proposed method outperforms them. The comparison methods only explore the relationship between low-level features and memorability prediction, whereas the proposed method combines low-level features and image attribute features within a single framework to predict image memorability and obtains a more stable model. The experimental results verify the feasibility and superiority of the method.

References

[1] Zhang Z, Li F, Zhao M, et al. Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification. IEEE Transactions on Image Processing, 2016, 25(6): 2429-2443.

[2] Shi X, Guo Z, Lai Z, et al. A framework of joint graph embedding and sparse regression for dimensionality reduction. IEEE Transactions on Image Processing, 2015, 24(4): 1341-1355.

[3] P. Isola, J. Xiao, A. Torralba, and A. Oliva. What makes an image memorable? In Proc. Int. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 145-152.

[4] P. Isola, D. Parikh, A. Torralba, and A. Oliva. Understanding the intrinsic memorability of images. In Proc. Adv. Conf. Neural Inf. Process. Syst., 2011, pp. 2429-2437.

[5] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 1996: 267-288.

[6] Hoerl A E, Kennard R W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970, 12(1): 55-67.

[7] Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 2007, 81(3): 559-575.

[8] Q. You, H. Jin, and J. Luo. Visual sentiment analysis by attending on local image regions. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[9] Hou C, Nie F, Yi D, et al. Efficient image classification via multiple rank regression. IEEE Transactions on Image Processing, 2013, 22(1): 340-352.

[10] Sundt B. A multi-level hierarchical credibility regression model. Scandinavian Actuarial Journal, 1980, 1980(1): 25-32.

[11] J. Xiao, J. Hays, K. Ehinger, A. Oliva, A. Torralba, et al. Sun database: Large-scale scene recognition from abbey to zoo. In Proc. Int. Conf. Comput. Vis. Pattern Recognit., 2010, pp. 3485-3492.

Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred embodiment, and that the above serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A multi-view learning method combining low-rank representation and sparse regression, characterized in that the method comprises the following steps:
extracting low-level features and high-level attribute features from the SUN dataset labeled with image memorability scores;
placing the low-rank representation, the sparse regression model, and the multi-view consistency loss within a single framework to form a whole, and constructing a joint low-rank and sparse regression multi-view model;
using a multi-view adaptive regression algorithm to solve the problem of automatically predicting image memorability, and obtaining the relationship among low-level image features, image attribute features, and image memorability under optimal parameters;
combining the low-level features and high-level attribute features of the image, using the relationship obtained under optimal parameters to predict the memorability of the database test-set images, and verifying the predictions against the relevant evaluation criteria;
wherein using the multi-view adaptive regression algorithm to solve the problem of automatically predicting image memorability comprises: converting the problem into an equivalent one through a slack variable Q:
Figure FDA0002772751900000011
s.t. X = XA + E, Q = Aw
where A is the mapping matrix of the low-rank representation; w is the linear dependence between the low-rank feature representation and the output memorability score; E is the sparse error constraint term; α is the balance parameter between the prediction-error term and the regularization term; β is the parameter controlling sparsity; λ > 0 is a balance parameter; X is the input feature matrix; y is the memorability score vector; ||·||* denotes the nuclear norm; φ is the graph-regularization constraint term; and L is the Laplacian operator;
introducing two slack variables Y1 and Y2 to obtain the augmented Lagrangian function:
Figure FDA0002772751900000012
where <·,·> denotes the matrix inner product, Y1 and Y2 are the Lagrange multiplier matrices, and μ > 0 is a positive penalty parameter; the above terms are combined as:
Figure FDA0002772751900000013
where
Figure FDA0002772751900000014
introducing a variable t and defining At, Et, Qt, wt, Y1,t, Y2,t and μ as the results of the t-th iteration of the variables, the (t+1)-th iteration is obtained as follows:
the iteration result for A:
Figure FDA0002772751900000021
where
Figure FDA0002772751900000022
Figure FDA0002772751900000023
fixing w, A, and Q, the optimization result for E is as follows:
Figure FDA0002772751900000024
fixing E, A, and Q, the result of optimizing w is as follows:
Figure FDA0002772751900000025
the above problem is a ridge regression problem, and the optimal solution is
Figure FDA0002772751900000026
finally, fixing E, w, and A and optimizing Q gives:
Figure FDA0002772751900000027
Y1 and Y2 are updated according to the following scheme:
Y1,t+1 = Y1,t + μt(X - XAt+1 - Et+1)
Y2,t+1 = Y2,t + μt(Qt+1 - At+1wt+1)
where
Figure FDA0002772751900000028
denotes the partial derivative symbol;
the joint low-rank and sparse regression multi-view model is specifically:
Figure FDA0002772751900000029
where:
Figure FDA0002772751900000031
Figure FDA0002772751900000032
G(φH,φl) = tr[(XlAlwl)T XHAHwH]
Figure FDA0002772751900000033
is the loss function serving as the prediction error of the high-level features;
Figure FDA0002772751900000034
is the loss function serving as the prediction error of the low-level features;
Figure FDA0002772751900000035
is the graph regularization term used to alleviate overfitting; XH is the high-level attribute feature matrix; AH is the mapping matrix of the low-rank representation of the high-level attribute features; EH is the sparse error constraint term of the high-level attribute features; wH is the linear dependence between the low-rank representation of the high-level attribute features and the output memorability score; Al is the mapping matrix of the low-rank representation of the low-level features; El is the sparse error constraint term of the low-level features; Xl is the low-level feature matrix; and wl is the linear dependence between the low-rank representation of the low-level features and the output memorability score.
2. The multi-view learning method combining low-rank representation and sparse regression according to claim 1, characterized in that the method further comprises: acquiring an image memorability dataset.
3. The multi-view learning method combining low-rank representation and sparse regression according to claim 1, characterized in that the low-level features comprise: scale-invariant feature transform features, generalized search tree features, histogram of oriented gradients features, and structural similarity features.
4. The multi-view learning method combining low-rank representation and sparse regression according to claim 1, characterized in that the high-level attribute features comprise: 327-dimensional scene category attribute features and 106-dimensional object attribute features.
CN201710648597.4A 2017-08-01 2017-08-01 Multi-view learning method combining low-rank representation and sparse regression Active CN107545276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710648597.4A CN107545276B (en) 2017-08-01 2017-08-01 Multi-view learning method combining low-rank representation and sparse regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710648597.4A CN107545276B (en) 2017-08-01 2017-08-01 Multi-view learning method combining low-rank representation and sparse regression

Publications (2)

Publication Number Publication Date
CN107545276A CN107545276A (en) 2018-01-05
CN107545276B true CN107545276B (en) 2021-02-05

Family

ID=60971226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710648597.4A Active CN107545276B (en) 2017-08-01 2017-08-01 Multi-view learning method combining low-rank representation and sparse regression

Country Status (1)

Country Link
CN (1) CN107545276B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256486B (en) * 2018-01-18 2022-02-22 河南科技大学 Image identification method and device based on nonnegative low-rank and semi-supervised learning
CN108197320B (en) * 2018-02-02 2019-07-30 北方工业大学 An automatic labeling method for multi-view images
CN109522956B (en) * 2018-11-16 2022-09-30 哈尔滨理工大学 Low-rank discriminant feature subspace learning method
CN109583498B (en) * 2018-11-29 2023-04-07 天津大学 Fashion compatibility prediction method based on low-rank regularization feature enhancement characterization
CN109858543B (en) * 2019-01-25 2023-03-21 天津大学 Image memorability prediction method based on low-rank sparse representation and relationship inference
CN110619367B (en) * 2019-09-20 2022-05-13 哈尔滨理工大学 Joint low-rank constraint cross-view-angle discrimination subspace learning method and device
CN110727818B (en) * 2019-09-27 2023-11-14 天津大学 A binary image feature encoding method based on low-rank embedding representation
CN112990242A (en) * 2019-12-16 2021-06-18 京东数字科技控股有限公司 Training method and training device for image classification model
CN111242102B (en) * 2019-12-17 2022-11-18 大连理工大学 Fine-grained image recognition algorithm of Gaussian mixture model based on discriminant feature guide
CN111259917B (en) * 2020-02-20 2022-06-07 西北工业大学 An Image Feature Extraction Method Based on Local Neighbor Component Analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400143A (en) * 2013-07-12 2013-11-20 中国科学院自动化研究所 Data subspace clustering method based on multiple view angles
CN105809119A (en) * 2016-03-03 2016-07-27 厦门大学 Sparse low-rank structure based multi-task learning behavior identification method
CN106971200A (en) * 2017-03-13 2017-07-21 天津大学 A kind of iconic memory degree Forecasting Methodology learnt based on adaptive-migration

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400143A (en) * 2013-07-12 2013-11-20 中国科学院自动化研究所 Data subspace clustering method based on multiple view angles
CN105809119A (en) * 2016-03-03 2016-07-27 厦门大学 Sparse low-rank structure based multi-task learning behavior identification method
CN106971200A (en) * 2017-03-13 2017-07-21 天津大学 A kind of iconic memory degree Forecasting Methodology learnt based on adaptive-migration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Joint Low-Rank and Sparse Principal Feature Coding for Enhanced Robust Representation and Visual Classification; Zhao Zhang et al.; IEEE Transactions on Image Processing; 2016-03-25; pp. 2429-2443 *

Also Published As

Publication number Publication date
CN107545276A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
CN107545276B (en) Multi-view learning method combining low-rank representation and sparse regression
Zhang et al. Vector of locally and adaptively aggregated descriptors for image feature representation
Zhou et al. Atrank: An attention-based user behavior modeling framework for recommendation
Lu et al. Co-attending free-form regions and detections with multi-modal multiplicative feature embedding for visual question answering
CN107590505B (en) Learning method combining low-rank representation and sparse regression
Gao et al. Multi‐dimensional data modelling of video image action recognition and motion capture in deep learning framework
Huang et al. Object-location-aware hashing for multi-label image retrieval via automatic mask learning
Wang et al. Semantic supplementary network with prior information for multi-label image classification
Yang et al. STA-TSN: spatial-temporal attention temporal segment network for action recognition in video
Zhang et al. Learning implicit class knowledge for RGB-D co-salient object detection with transformers
Zheng et al. Diagnostic regions attention network (DRA-Net) for histopathology WSI recommendation and retrieval
Xie et al. Hierarchical coding of convolutional features for scene recognition
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
Li et al. Learning hierarchical video representation for action recognition
Gao et al. k-Partite graph reinforcement and its application in multimedia information retrieval
Xu et al. Instance-level coupled subspace learning for fine-grained sketch-based image retrieval
Wu et al. Variant semiboost for improving human detection in application scenes
Su et al. Deep low-rank matrix factorization with latent correlation estimation for micro-video multi-label classification
Wu et al. Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition
CN108108769A (en) Data classification method and device and storage medium
Huang et al. Deep multimodal embedding model for fine-grained sketch-based image retrieval
Soltanian et al. Spatio-temporal VLAD encoding of visual events using temporal ordering of the mid-level deep semantics
Indu et al. Survey on sketch based image retrieval methods
Dong et al. A supervised dictionary learning and discriminative weighting model for action recognition
Qi et al. Cross-media similarity metric learning with unified deep networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant