
CN114821180A - A Weakly Supervised Fine-Grained Image Classification Method Based on Soft Threshold Penalty Mechanism - Google Patents


Info

Publication number
CN114821180A
CN114821180A (application CN202210487333.6A)
Authority
CN
China
Prior art keywords
photo
image
image classification
soft threshold
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210487333.6A
Other languages
Chinese (zh)
Other versions
CN114821180B (en)
Inventor
董琴
范浩楠
刘柱
杨国宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aifenghuan Information Technology Co ltd
Original Assignee
Yancheng Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Institute of Technology
Priority to CN202210487333.6A
Publication of CN114821180A
Application granted
Publication of CN114821180B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747 Organisation of the process, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism, comprising: step 1: constructing a fine-grained image classification network with a two-level cascade structure based on the soft threshold penalty mechanism; step 2: acquiring an image to be classified; step 3: preprocessing the image to be classified; step 4: classifying the preprocessing result with the fine-grained image classification network and outputting the classification result. The method optimizes the MMAL-Net architecture: reducing the number of branch levels lowers the computational cost of the overall model and the hardware requirements for training. On top of the new branch structure, a soft threshold penalty module is added to suppress noise in the image, effectively shielding interference information and thereby improving overall accuracy.

Description

A Weakly Supervised Fine-Grained Image Classification Method Based on a Soft Threshold Penalty Mechanism

Technical Field

The invention relates to the technical fields of image classification and intelligent optimization, and in particular to a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism.

Background

MMAL-Net is a multi-branch, multi-scale learning network: a weakly supervised fine-grained classification method based on global features. It adopts an approach previously used for local-feature classification and takes a three-level cascade as its overall architecture. MMAL-Net classifies with high accuracy, reaching state-of-the-art results on many datasets, including 94.7% on the aircraft dataset, at the time the highest accuracy reported on that dataset. The algorithm flow of MMAL-Net is shown in Figure 2.

MMAL-Net uses the RA-CNN network as its basic structure and adopts a three-level cascade, with ResNet performing feature extraction and classification at each level. Between the levels, two modules are interspersed: AOLM (Attention Object Location Module) and APPM (Attention Part Proposal Module). These two modules divide the three-level network into three branches: the original-image branch, the object-image branch, and the part-image branch.

AOLM predicts the location of the object. At the final stage of feature extraction, the feature maps are aggregated along the channel dimension; a threshold is set to extract the largest connected region with high response, and the receptive field of that region is mapped back to the original image. The location of the object in the original image can thus be found; this part is cropped out, and feature extraction is performed again.
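The AOLM procedure described above can be sketched in a few lines. This is an illustrative reconstruction, not the patent's code: the function name, the mean-based threshold, and the use of 4-connectivity for the connected region are all assumptions.

```python
# Sketch of AOLM: aggregate a [C][H][W] feature map across channels,
# threshold at the mean activation, and return the bounding box of the
# largest connected high-response region (feature-map coordinates).
def aolm_locate(feature):
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    # 1. aggregate along the channel dimension
    agg = [[sum(feature[c][y][x] for c in range(C)) for x in range(W)]
           for y in range(H)]
    # 2. threshold at the mean response (an assumed threshold choice)
    mean = sum(sum(row) for row in agg) / (H * W)
    mask = [[agg[y][x] > mean for x in range(W)] for y in range(H)]
    # 3. largest connected component via flood fill, 4-connectivity
    seen = [[False] * W for _ in range(H)]
    best = []
    for y in range(H):
        for x in range(W):
            if mask[y][x] and not seen[y][x]:
                comp, stack = [], [(y, x)]
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(xs), min(ys), max(xs), max(ys)
```

In the full method, the returned box would be mapped back through the receptive field to crop the object region from the original image.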

APPM predicts the information of key object regions without requiring bounding boxes or annotations. Several fixed-size sliding windows are applied, and the data inside each window is pooled to compute a score for every region. The scores are sorted, the higher-scoring regions are selected, and after non-maximum suppression the images of these parts are fed into the network again.
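A minimal sketch of this window-scoring-and-suppression idea, assuming mean pooling inside each window and greedy IoU-based non-maximum suppression (the window size, `top_k`, and IoU threshold here are illustrative choices, not values from the patent):

```python
# Score every h*w window of a 2-D activation map by its mean activation.
def window_scores(agg, h, w):
    H, W = len(agg), len(agg[0])
    out = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = sum(agg[y + dy][x + dx] for dy in range(h) for dx in range(w))
            out.append((s / (h * w), (x, y, x + w - 1, y + h - 1)))
    return out

# Intersection-over-union of two inclusive boxes (x1, y1, x2, y2).
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

# Sort windows by score and keep the top ones via greedy NMS.
def appm_select(agg, h, w, top_k=2, iou_thr=0.25):
    kept = []
    for score, box in sorted(window_scores(agg, h, w), reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
        if len(kept) == top_k:
            break
    return kept
```

The kept boxes play the role of the part proposals that are then cropped and fed back into the network.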

The loss function used at all three levels of MMAL-Net is the basic cross-entropy loss; the loss values of the three levels are summed to obtain the final loss.

However, MMAL-Net has two drawbacks:

(1) In the three-level cascade, although sharing a single set of parameters reduces the parameter count of the overall network, the amount of computation rises significantly because of the complex cascade structure. In addition, the inputs of the first two levels are both 448*448 images, which raises the computational cost further, so training is much slower and consumes a large share of GPU memory.

(2) In the middle branch, the location of the object is obtained from the original image, which yields a discriminative region. In fine-grained classification tasks where no obvious object is present, however, this branch actually weakens the overall feature extraction ability.

Summary of the Invention

The invention provides a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism. It optimizes the MMAL-Net architecture: reducing the number of branch levels lowers the computational cost of the overall model as well as the hardware requirements for training. On top of the new branch structure, a soft threshold penalty module is added to suppress noise in the image, effectively shielding interference information and thereby improving overall accuracy.

The invention provides a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism, comprising:

Step 1: based on the soft threshold penalty mechanism, construct a fine-grained image classification network with a two-level cascade structure;

Step 2: acquire the image to be classified;

Step 3: preprocess the image to be classified;

Step 4: based on the fine-grained image classification network, classify the preprocessing result and output the image classification result.

Preferably, step 1 (constructing the fine-grained image classification network with a two-level cascade structure based on the soft threshold penalty mechanism) comprises:

Construct a first network branch comprising, connected in sequence: Input 448*448*3, a first ResNet50, Feature 14*14*2048, a first GAP, a first FC, and a first Softmax;

Construct a second network branch comprising, connected in sequence: Input 224*224*3*mult, a second ResNet50, Feature 7*7*2048*mult, a second GAP, a second FC, and a second Softmax;

Connect the Input 224*224*3*mult of the second branch to the Input 448*448*3 of the first branch through a crop operation;

Connect the Feature 14*14*2048 of the first branch to the crop operation through the APPM;

Set a first loss function RawLoss for the first network branch;

Set a second loss function PartLoss for the second network branch;

Set the soft threshold penalty mechanism in the APPM;

The first network branch, the second network branch, the APPM, and the crop operation together form the fine-grained image classification network with a two-level cascade structure.
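As a sanity check on the quoted tensor shapes, ResNet-50 reduces spatial resolution by a factor of 32 end to end, which matches the 448-to-14 and 224-to-7 feature sizes above (the helper below is ours, not part of the patent):

```python
# ResNet-50's convolutional trunk has an overall stride of 32, so an input
# of (h, w) yields a (h//32, w//32, 2048) feature map before pooling.
def resnet50_feature_shape(h, w, channels=2048):
    return (h // 32, w // 32, channels)

raw_feat = resnet50_feature_shape(448, 448)   # first branch: (14, 14, 2048)
part_feat = resnet50_feature_shape(224, 224)  # second branch: (7, 7, 2048) per crop
```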

Preferably, the APPM is formed on the basis of SCDA. The Feature 14*14*2048 extracted by the APPM is collapsed along the channel dimension of the pooling layer into a 14*14*1 two-dimensional map, and sliding-window computation is performed on this map with several preset windows of different sizes, as shown in formula (2-1):

$a_w = \frac{1}{H \times W} \sum_{(x,y) \in w} A(x,y)$    (2-1)

where H and W are the height and width of the sliding window, A(x, y) is the value at coordinate (x, y) of the collapsed two-dimensional map, and a_w is the sliding-window result.

Preferably, the first loss function RawLoss is given by formula (2-2):

$L_{raw} = -\sum_{i} m_i \log(n_i)$    (2-2)

where m_i corresponds to the i-th sample image, and n_i is the prediction probability of the first convolutional neural network (CNN) for the i-th sample image.

Preferably, the second loss function PartLoss is given by formula (2-3):

$L_{part} = -\sum_{i} \sum_{q} m_{iq} \log(n_{iq})$    (2-3)

where q is the number of local feature regions selected by the second ResNet50, m_{iq} corresponds to the q-th local feature region of the i-th sample image, and n_{iq} is the prediction probability of the second CNN for the q-th local feature region of the i-th sample image.
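Reading RawLoss and PartLoss as the cross-entropy sums the surrounding text describes (the patent's formula images are not reproduced, so this interpretation treats the m values as one-hot labels; all probabilities below are made up):

```python
import math

# RawLoss, formula (2-2): cross-entropy on the full-image prediction.
def raw_loss(labels, probs):
    return -sum(m * math.log(n) for m, n in zip(labels, probs) if m)

# PartLoss, formula (2-3): the same cross-entropy summed over the q
# selected part regions of a sample.
def part_loss(labels, part_probs):
    return sum(raw_loss(labels, p) for p in part_probs)
```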

Preferably, the soft threshold penalty mechanism comprises:

Let F(x, y) be the noise-free image, N(x, y) the noise, and G(x, y) the image after noise corruption. The L_{1/2} norm is chosen to establish the model, as shown in formula (2-5):

[Formula (2-5): equation image not reproduced in the source]

Here i denotes the index of the image. During iterative processing, a residual in G(x_i, y_i) - F(x_i, y_i) indicates that noise has appeared in the image and is exerting an influence;

The residual state is constrained through the soft threshold: a penalty factor ||G(x_i, y_i) - F(x_i, y_i)||_h is first constructed to keep G(x_i, y_i) - F(x_i, y_i) from exceeding 0, thereby reducing the degree to which the image is disturbed by noise, as shown in formula (2-6):

[Formula (2-6): equation image not reproduced in the source]

In the formula, λ is the penalty coefficient; adjusting it brings the result close to the true value. Because of the soft threshold constraint, the approximation may also land above the true value, so the soft threshold can effectively reduce the influence of noise interference.
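The soft-threshold operator underlying this behavior is standard: values whose magnitude is below the threshold λ are suppressed to zero, and larger values are shrunk toward zero. A minimal definition:

```python
def soft_threshold(x, lam):
    """Shrink x toward zero; values with |x| <= lam are suppressed entirely."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0
```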

Preferably, to further optimize the soft threshold method, the objective function can be modified further, as shown in formula (2-7):

[Formula (2-7): equation image not reproduced in the source]

where V is an auxiliary variable and λ_1 and λ_2 are penalty coefficients; during computation, V is iteratively updated using a soft threshold algorithm.
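One common way to realize "iteratively update V using a soft threshold algorithm" is a proximal-gradient (ISTA-style) step. The sketch below solves the generic scalar problem min_v 0.5*(g - v)**2 + lam*|v|; the update rule and step size are assumptions for illustration, not the patent's exact scheme:

```python
def soft_threshold(x, lam):
    return (x - lam) if x > lam else (x + lam) if x < -lam else 0.0

def ista(g, lam, t=0.5, steps=50):
    """ISTA-style iteration: gradient step on the quadratic data term,
    then a soft-threshold (proximal) step on the penalty term."""
    v = 0.0
    for _ in range(steps):
        v = soft_threshold(v - t * (v - g), t * lam)
    return v
```

For this scalar problem the iterates converge to the closed-form answer soft_threshold(g, lam), which is a convenient way to check the loop.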

Preferably, step 2 (acquiring the image to be classified) comprises:

When the user touch-circles multiple first photos on the album viewing interface to form a selection box, and the box expands in the same direction throughout a preset first time, acquire and output preset touch-free selection prompt information; at the same time, control the selection box to continue expanding in that direction at a preset first expansion speed;

Dynamically acquire the user's current eye gaze;

Determine a first gaze point in the viewing interface corresponding to the gaze;

If the first gaze point falls inside the selection box in the viewing interface, acquire a first vertical distance between the first gaze point and the edge of the box that lies in the expansion direction;

Based on the first vertical distance, adjust the first expansion speed according to formula (2-8):

[Formula (2-8): equation image not reproduced in the source]

where v1′ is the first expansion speed after adjustment, v1 is the first expansion speed before adjustment, l1 is the first vertical distance, and the preset first relationship coefficient is denoted by an equation-image symbol not reproduced in the source;

If the first gaze point falls outside the selection box in the viewing interface, within the to-be-selected range in the expansion direction, and the first gaze point changes within a preset second time, acquire a second vertical distance between the first gaze point and the edge of the box that lies in the expansion direction;

Based on the second vertical distance, adjust the first expansion speed according to formula (2-9):

[Formula (2-9): equation image not reproduced in the source]

where v2′ is the first expansion speed after adjustment, v2 is the first expansion speed before adjustment, l2 is the second vertical distance, and the preset second relationship coefficient is denoted by an equation-image symbol not reproduced in the source;

If the first gaze point falls outside the selection box in the viewing interface, within the to-be-selected range in the expansion direction, and does not change within the second time, acquire the second photo in the viewing interface corresponding to the first gaze point;

When the second photo has just entered the selection box, control the box to stop expanding;

Acquire the movement type of the box edge that lies in the expansion direction;

When the movement type is wrapping down to the next row, control the selection box to deselect all third photos to the right of the second photo within its row in the viewing interface;

When the movement type is wrapping up to the previous row, control the selection box to deselect all fourth photos to the left of the second photo within its row;

When the movement type is shifting to the column on the right, control the selection box to deselect all fifth photos below the second photo within its column;

When the movement type is shifting to the column on the left, control the selection box to deselect all sixth photos above the second photo within its column;

After deselection is complete, all seventh photos circled in the selection box are taken as the images to be classified, completing the acquisition.
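The four deselection rules above can be sketched as a single lookup over a row-major photo grid. The function and move-type names are hypothetical; `stop` is the (row, column) of the second photo where expansion halted:

```python
# Given the grid size, the photo the gaze stopped on, and the movement type
# of the box edge, return the grid positions of the photos to deselect.
def photos_to_deselect(rows, cols, stop, move_type):
    r, c = stop
    if move_type == "wrap_down":    # everything right of `stop` in its row
        return [(r, x) for x in range(c + 1, cols)]
    if move_type == "wrap_up":      # everything left of `stop` in its row
        return [(r, x) for x in range(0, c)]
    if move_type == "shift_right":  # everything below `stop` in its column
        return [(y, c) for y in range(r + 1, rows)]
    if move_type == "shift_left":   # everything above `stop` in its column
        return [(y, c) for y in range(0, r)]
    return []
```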

Preferably, acquiring the second photo in the viewing interface corresponding to the first gaze point comprises:

Acquire the current thumbnail scale of the display interface;

Acquire a preset thumbnail-scale threshold corresponding to the size of the display interface;

If the thumbnail scale is greater than or equal to the threshold, determine the eighth photo located at the first gaze point in the viewing interface and take it as the second photo, completing the acquisition;

Otherwise, determine the multiple ninth photos located at the first gaze point in the viewing interface;

Based on the ninth photos, generate a cut-off photo enlargement confirmation box;

Display the confirmation box as a floating overlay on the display interface;

Determine a second gaze point in the viewing interface corresponding to the user's current eye gaze;

When the second gaze point falls inside the confirmation box and does not change within a preset third time, take the ninth photo located at the second gaze point inside the confirmation box as the second photo, completing the acquisition.

The invention further provides a weakly supervised fine-grained image classification system based on a soft threshold penalty mechanism, comprising:

a construction module for constructing, based on the soft threshold penalty mechanism, a fine-grained image classification network with a two-level cascade structure;

an acquisition module for acquiring the image to be classified;

a preprocessing module for preprocessing the image to be classified;

a classification module for classifying the preprocessing result based on the fine-grained image classification network and outputting the image classification result.

Other features and advantages of the invention will be set forth in the description that follows, will in part become apparent from it, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and attained by the structures particularly pointed out in the written description, claims, and drawings.

The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.

Brief Description of the Drawings

The drawings provide a further understanding of the invention and form a part of the specification; together with the embodiments they serve to explain the invention and do not limit it. In the drawings:

Figure 1 is a flowchart of a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism in an embodiment of the invention;

Figure 2 is a schematic structural diagram of MMAL-Net in an embodiment of the invention;

Figure 3 is a schematic structural diagram of the fine-grained image classification network in an embodiment of the invention;

Figure 4 is a schematic diagram of a weakly supervised fine-grained image classification system based on a soft threshold penalty mechanism in an embodiment of the invention.

Detailed Description

Preferred embodiments of the invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and do not limit it.

The invention provides a weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism, as shown in Figure 1, comprising:

Step 1: based on the soft threshold penalty mechanism, construct a fine-grained image classification network with a two-level cascade structure;

Step 2: acquire the image to be classified;

Step 3: preprocess the image to be classified;

Step 4: based on the fine-grained image classification network, classify the preprocessing result and output the image classification result.

Step 1 (constructing the fine-grained image classification network with a two-level cascade structure based on the soft threshold penalty mechanism) comprises:

Construct a first network branch comprising, connected in sequence: Input 448*448*3, a first ResNet50, Feature 14*14*2048, a first GAP, a first FC, and a first Softmax;

Construct a second network branch comprising, connected in sequence: Input 224*224*3*mult, a second ResNet50, Feature 7*7*2048*mult, a second GAP, a second FC, and a second Softmax;

Connect the Input 224*224*3*mult of the second branch to the Input 448*448*3 of the first branch through a crop operation;

Connect the Feature 14*14*2048 of the first branch to the crop operation through the APPM;

Set a first loss function RawLoss for the first network branch;

Set a second loss function PartLoss for the second network branch;

Set the soft threshold penalty mechanism in the APPM;

The first network branch, the second network branch, the APPM, and the crop operation together form the fine-grained image classification network with a two-level cascade structure.

The APPM is formed on the basis of SCDA. The Feature 14*14*2048 extracted by the APPM is collapsed along the channel dimension of the pooling layer into a 14*14*1 two-dimensional map, and sliding-window computation is performed on this map with several preset windows of different sizes, as shown in formula (2-1):

$a_w = \frac{1}{H \times W} \sum_{(x,y) \in w} A(x,y)$    (2-1)

where H and W are the height and width of the sliding window, A(x, y) is the value at coordinate (x, y) of the collapsed two-dimensional map, and a_w is the sliding-window result.

The first loss function RawLoss is given by formula (2-2):

Loss_{raw} = -\sum_{i} m_i \log(n_i)    (2-2)

其中,m_i为第i个样本图像的真实标号,n_i为第一卷积神经网络CNN对应于第i个样本图像的预测概率。Among them, m_i is the true label of the i-th sample image, and n_i is the prediction probability of the first convolutional neural network CNN for the i-th sample image.

所述第二损失函数PartLoss的公式如公式(2-3)所示:The formula of the second loss function PartLoss is shown in formula (2-3):

Loss_{part} = -\frac{1}{q} \sum_{i} \sum_{q} m_{iq} \log(n_{iq})    (2-3)

其中,q为由所述第二ResNet50筛选出的局部特征区域个数,m_iq为第i个样本图像对应的第q个局部特征区域的真实标号,n_iq为第二卷积神经网络CNN对第i个样本图像的第q个局部特征区域的预测概率。Among them, q is the number of local feature regions selected by the second ResNet50, m_iq is the true label of the q-th local feature region of the i-th sample image, and n_iq is the prediction probability of the second convolutional neural network CNN for the q-th local feature region of the i-th sample image.
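For illustration only, the cross-entropy losses (2-2) and (2-3) and their weighted combination can be sketched in NumPy; the one-hot labels, probabilities, q = 4, and weights mu = omega = 0.5 are assumed values:

```python
import numpy as np

def raw_loss(labels, probs):
    # Formula (2-2): cross-entropy of the first branch; labels m_i are
    # one-hot (the true class entry is 1), probs n_i are softmax outputs.
    return -np.sum(labels * np.log(probs))

def part_loss(labels_q, probs_q):
    # Formula (2-3): cross-entropy per local region, averaged over q regions.
    q = labels_q.shape[0]
    return -np.sum(labels_q * np.log(probs_q)) / q

labels = np.array([[0.0, 1.0, 0.0]])        # one sample, true class 1
probs = np.array([[0.1, 0.8, 0.1]])
labels_q = np.tile(labels, (4, 1))          # q = 4 local regions
probs_q = np.array([[0.2, 0.7, 0.1]] * 4)

loss_raw = raw_loss(labels, probs)
loss_part = part_loss(labels_q, probs_q)
loss_total = 0.5 * loss_raw + 0.5 * loss_part  # weighted sum as in (2-4)
```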

所述软阈值惩罚机制包括:The soft threshold penalty mechanism includes:

设F(x,y)为不含噪声的图像,N(x,y)为噪声,G(x,y)为受噪声影响之后的图像,选用L_{1/2}范数进行模型的建立,如公式(2-5)所示:Let F(x,y) be the noise-free image, N(x,y) the noise, and G(x,y) the image after being affected by noise; the L_{1/2} norm is chosen to build the model, as shown in formula (2-5):

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{1/2}    (2-5)

i表示的是图像的序号,当进行图像迭代处理时,G(x_i,y_i)-F(x_i,y_i)出现残差,说明图像中出现噪声,并造成影响;i represents the serial number of the image. When the image is processed iteratively, a residual appears in G(x_i,y_i)-F(x_i,y_i), indicating that noise is present in the image and has an effect;

通过软阈值来限定残差状态,先构造惩罚因子||G(x_i,y_i)-F(x_i,y_i)||_h,来限制G(x_i,y_i)-F(x_i,y_i)不大于0,从而降低图像被噪声干扰的程度,如公式(2-6)所示:The residual state is bounded by the soft threshold: a penalty factor ||G(x_i,y_i)-F(x_i,y_i)||_h is first constructed to keep G(x_i,y_i)-F(x_i,y_i) from exceeding 0, thereby reducing the degree to which the image is disturbed by noise, as shown in formula (2-6):

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{1/2} + \lambda \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{h}, \quad \text{s.t. } N(x_i, y_i) > 0, \ F(x_i, y_i) > 0    (2-6)

公式中λ表示惩罚系数,调节该系数可以使结果接近真实值;由于有软阈值的限制,近似值也可能会出现在真实值之上,因此软阈值可以有效降低噪声干扰的影响。In the formula, λ is the penalty coefficient; adjusting it brings the result close to the true value. Owing to the soft-threshold constraint, the approximation may also land above the true value, so the soft threshold can effectively reduce the influence of noise interference.

为进一步优化软阈值方法,可进一步修改目标函数,如公式(2-7)所示:To further optimize the soft threshold method, the objective function can be further modified, as shown in formula (2-7):

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) - V \right\|_{1/2} + \lambda_1 \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{h} + \lambda_2 \left\| V \right\|_{h}    (2-7)

其中,V为辅助变量,λ_1和λ_2均为惩罚系数,在计算过程中,使用软阈值算法对V进行迭代更新处理。Among them, V is an auxiliary variable, and λ_1 and λ_2 are both penalty coefficients; during the computation, a soft-threshold algorithm is used to iteratively update V.
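As a hedged sketch of the shrinkage behaviour behind a soft-threshold penalty (the signal, noise values, and λ below are illustrative assumptions, not the patent's data):

```python
import numpy as np

def soft_threshold(v, lam):
    # Shrinkage operator: pull each value toward zero by lam and
    # clip to exactly zero once its magnitude drops below lam.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

f = np.zeros(10)
f[2], f[7] = 3.0, -2.0                     # sparse noise-free signal F
noise = np.array([0.1, -0.2, 0.15, 0.05, -0.1,
                  0.2, -0.05, 0.1, 0.12, -0.15])
g = f + noise                              # observed signal G = F + N
f_hat = soft_threshold(g, lam=0.3)         # small residuals -> exactly 0
```

In formula (2-7) the same operator would be applied repeatedly to the auxiliary variable V during the iterative update.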

上述技术方案的工作原理及有益效果为:The working principle and beneficial effects of the above technical solutions are as follows:

本发明把MMAL-Net模型的三级级联网络修改成二级级联方式,把最后一级的推理任务放到了第一分支上。也就是说,在整个网络模型的运行过程中,仅需要通过一个普通的网络分类模型,就能得到分类细粒度网络的效果。其结构如图3所示。The invention modifies the three-level cascade network of the MMAL-Net model into a two-level cascade mode, and puts the reasoning task of the last level on the first branch. That is to say, in the running process of the entire network model, the effect of classifying fine-grained networks can be obtained only through a common network classification model. Its structure is shown in Figure 3.

Input448*448*3为图像输入(图像大小为448像素*448像素,R(红)、G(绿)、B(蓝)三个颜色通道);Input448*448*3 is image input (image size is 448 pixels*448 pixels, R (red), G (green), B (blue) three color channels);

第一ResNet50为一种网络模型,深度残差网络,50是指网络层数;The first ResNet50 is a network model, a deep residual network, and 50 refers to the number of network layers;

Feature14*14*2048为2048张14像素*14像素特征二维图;Feature14*14*2048 is 2048 14-pixel*14-pixel feature two-dimensional maps;

第一GAP为全局平均池化(Global Average Pooling);The first GAP is Global Average Pooling;

第一FC为全连接层(Fully Connection);The first FC is a fully connected layer (Fully Connection);

第一Softmax为输出层的激励函数,在机器学习中常被看作是一种多分类器,通俗地说,就是输入一个物品,得出其可能属于各类别的概率。The first Softmax is the activation function of the output layer, often regarded in machine learning as a multi-class classifier; informally, given an input item, it outputs the probabilities of the categories the item may belong to.
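A minimal NumPy sketch of the GAP → FC → Softmax head described above; the 5-class FC layer and random values are assumptions for illustration:

```python
import numpy as np

def gap(feature):
    # Global Average Pooling: average each 14x14 channel map to one
    # scalar, turning (14, 14, 2048) into a 2048-dim vector.
    return feature.mean(axis=(0, 1))

def softmax(logits):
    # Numerically stable softmax over class logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
feature = rng.random((14, 14, 2048))   # backbone output
w_fc = rng.random((2048, 5)) * 0.01    # FC weights for 5 assumed classes
probs = softmax(gap(feature) @ w_fc)   # class probabilities
```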

Input224*224*3*mult为对Input448*448*3进行crop方法所得到的固定大小的图像输入(第一分支图像输入的部分区块,图像大小为224像素*224像素,R(红)、G(绿)、B(蓝)三个颜色通道),mult是指图像裁剪后所得到固定区块数目。Input224*224*3*mult is a fixed-size image input obtained by performing the crop method on Input448*448*3 (part of the image input block of the first branch, the image size is 224 pixels*224 pixels, R (red), G (green), B (blue) three color channels), mult refers to the number of fixed blocks obtained after the image is cropped.

第二ResNet50与第一ResNet50同理;The second ResNet50 is the same as the first ResNet50;

Feature*7*7*2048*mult为mult份2048张7像素*7像素特征二维图;Feature*7*7*2048*mult is 2048 7-pixel*7-pixel feature two-dimensional maps in mult copies;

第二GAP、第二FC和第二Softmax分别与第一GAP、第一FC和第一Softmax同理;The second GAP, the second FC and the second Softmax are respectively the same as the first GAP, the first FC and the first Softmax;

Crop为裁剪,是直接从图像中截出一部分,保留原图像的真实尺寸比。根据图像裁剪的方法不同,所得到的裁剪图像的数量也不同。Crop is cropping, which is to directly cut out a part of the image and retain the true size ratio of the original image. Depending on the method of image cropping, the number of resulting cropped images is also different.

经过图像预处理(滤波降噪、灰度化和缩至448像素*448像素大小),将缩至448像素*448像素大小的图像作为网络的第一级输入。在网络模型的设计过程中,以ResNet50作为特征提取的主干网络。残差块经过特殊的结构设计得到ResNet50网络,其中50是指深度网络中卷积层和全连接层的总层数,整体精度适中,总体的计算量也适中。ResNet50的残差块采用了BottleNeck结构,既不影响精度,也不会增加整体的计算量;特征通道的数量为2048,也更有助于选择出具有辨识性的区域。整体上,网络的两个分支特征提取采用了参数共享的方式,不仅减少了整个网络的参数量,还能使整个网络适用于不同尺寸、不同部位的图像。After image preprocessing (filtering for noise reduction, grayscale conversion, and resizing to 448*448 pixels), the resized image serves as the first-level input of the network. In the design of the network model, ResNet50 is used as the backbone for feature extraction. The ResNet50 network is obtained through specially structured residual blocks, where 50 refers to the total number of convolutional and fully connected layers in the deep network; both its overall accuracy and its computational cost are moderate. The ResNet50 residual block adopts the BottleNeck structure, which neither hurts accuracy nor increases overall computation; the number of feature channels is 2048, which also helps select discriminative regions. Overall, feature extraction in the two branches shares parameters, which not only reduces the parameter count of the whole network but also makes the whole network applicable to images of different sizes and different parts.

网络的局部特征提取选用了APPM结构,是基于SCDA(Selective Convolutional Descriptor Aggregation,选择性卷积描述符聚合)的研究形成的。首先,将模块对特征提取出来的14*14*2048沿着通道方向进行合拢,便得到了14*14*1的二维图。14*14*1的二维图为1张14像素*14像素大小的图像特征二维图。The network's local feature extraction adopts the APPM structure, developed from research on SCDA (Selective Convolutional Descriptor Aggregation). First, the 14*14*2048 features extracted by the module are aggregated along the channel direction, yielding a 14*14*1 two-dimensional map, i.e., a single 14-pixel*14-pixel image-feature map.

然后,用几种预先设定好的不同尺寸的滑窗来计算,滑窗内部的计算过程如公式(2-1)所示Then, use several pre-set sliding windows of different sizes to calculate, and the calculation process inside the sliding window is shown in formula (2-1).

a_w = \frac{1}{H \times W} \sum_{(x,y) \in w} A(x,y)    (2-1)

预设好滑窗的高度和宽度分别是公式中的H和W,A(x,y)指合拢好的二维图坐标位置的对应数值,a_w即表示通过当前的滑窗所计算出来的数值;A(x,y)对应的坐标位置具体为14*14*1的二维图在待分类图像(即原始图像)中对应的位置坐标。其次,对不同位置的a_w值排序,其中输出值越大的区域就是具有辨识性特征的区域。最后,在存在特征区域重叠、不能简单排序的区域,使用NMS(non-maximum suppression,非极大值抑制)的方式选取多个高辨识性、低冗余度的候选区作为局部特征区域,成为第二层的输入图像。The preset height and width of the sliding window are H and W in the formula, A(x,y) is the value at the corresponding coordinate position of the aggregated two-dimensional map, and a_w is the value computed by the current sliding window; the coordinate position of A(x,y) corresponds to the position of the 14*14*1 map within the image to be classified, i.e., the original image. Next, the a_w values at different positions are sorted; regions with larger outputs are the ones with discriminative features. Finally, for overlapping feature regions that cannot simply be ranked, NMS (non-maximum suppression) is used to select several highly discriminative, low-redundancy candidate regions as local feature regions, which become the input images of the second level.
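The NMS selection step can be sketched as follows; the boxes, scores, and 0.25 overlap threshold are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.25):
    # Keep windows in descending score order, dropping any window that
    # overlaps an already kept one by more than thresh (low redundancy).
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(int(i))
    return keep

boxes = np.array([[0, 0, 4, 4], [1, 1, 5, 5], [8, 8, 12, 12]])
scores = np.array([0.9, 0.8, 0.7])     # the a_w window scores
kept = nms(boxes, scores)              # high-score, low-overlap windows
```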

本发明所采用的损失函数就是多分类问题中最常用的Cross-Entropy(交叉熵)损失。第一层Raw Loss公式如(2-2)所示The loss function adopted in the present invention is the most commonly used Cross-Entropy loss in the multi-classification problem. The Raw Loss formula of the first layer is shown in (2-2)

Loss_{raw} = -\sum_{i} m_i \log(n_i)    (2-2)

其中,i表示的是样本图像,m_i表示的是样本图像的真实标号,通常代入1来计算,n_i指的是此类别神经网络的预测概率。第i个样本图像来源于上文滑窗截取的图像,共2048张,即i取值从1到2048。Among them, i indexes the sample images, m_i is the true label of the sample image, usually substituted as 1 in the computation, and n_i is the network's predicted probability for this category. The i-th sample image comes from the images cropped by the sliding window above, 2048 in total, so i ranges from 1 to 2048.

第二层Part Loss公式如(2-3)所示The second layer Part Loss formula is shown in (2-3)

Loss_{part} = -\frac{1}{q} \sum_{i} \sum_{q} m_{iq} \log(n_{iq})    (2-3)

此处的q表示筛选出的局部特征区域个数,对每一块区域计算并得出损失值,最后求出平均值。m_iq为第i个样本图像对应的第q个局部特征区域,具体为:该样本图像来源于第一分支预处理后的图像经crop(裁剪)之后的图像,m_iq即其对应的第q个局部特征区域。第二ResNet50筛选出的局部特征区域来自于待分类图像(即原始图像)预处理后的图像,即经过滤波降噪、灰度化后的图像,共2048个局部特征区域。Here q is the number of selected local feature regions; the loss is computed for each region and then averaged. m_iq is the q-th local feature region of the i-th sample image; specifically, the sample image is the cropped version of the preprocessed image from the first branch, and m_iq is its q-th local feature region. The local feature regions selected by the second ResNet50 come from the preprocessed version of the image to be classified (the original image), i.e., the image after noise-reduction filtering and grayscale conversion, 2048 local feature regions in total.

第一卷积神经网络CNN来自于第一ResNet50中,ResNet50就是卷积神经网络的一种网络模型;第二卷积神经网络与之同理。The first convolutional neural network CNN comes from the first ResNet50, which is a network model of the convolutional neural network; the second convolutional neural network is the same.

总的损失函数公式如(2-4)所示The total loss function formula is shown in (2-4)

Loss_{total} = μ·Loss_{raw} + ω·Loss_{part}    (2-4)

Loss_raw和Loss_part分别表示第一层和第二层的损失值,Loss_total即表示整体损失,μ和ω是取0-1之间的值,表示的是两层分支对整体网络的影响权重,用以调节。Loss_raw and Loss_part are the loss values of the first and second levels respectively, Loss_total is the overall loss, and μ and ω take values between 0 and 1, representing the adjustable influence weights of the two branches on the overall network.

为降低网络的复杂度,能够更好地被使用,本发明使用的轻量化方法采用SqueezeNet(轻量化卷积神经网络)结构:首先采用1*1的卷积核对局部特征进行卷积,起到降维作用,即减少输入的通道数量;然后在Expand层分别用1*1和3*3的卷积核进行卷积运算;最后将运算结果拼接得到最终的输出特征数据。To reduce the complexity of the network so that it can be used more readily, the lightweight method of the present invention adopts the SqueezeNet (lightweight convolutional neural network) structure: first, a 1*1 convolution kernel convolves the local features, which reduces dimensionality, i.e., the number of input channels; then the Expand layer performs convolutions with 1*1 and 3*3 kernels respectively; finally, the results are concatenated to obtain the final output feature data.
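A toy NumPy sketch of a Fire module's channel flow (squeeze 1*1, then parallel 1*1/3*3 expand, then concatenation); the channel counts and random weights are assumptions, and real SqueezeNet layers also carry biases omitted here:

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution is a per-pixel matmul over channels:
    # x: (H, W, C_in), w: (C_in, C_out).
    return np.einsum('hwc,cd->hwd', x, w)

def conv3x3(x, w):
    # 3x3 same-padding convolution: x: (H, W, C_in), w: (3, 3, C_in, C_out).
    h, wd, _ = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.einsum('abc,abcd->d', pad[i:i + 3, j:j + 3], w)
    return out

def fire_module(x, w_squeeze, w_e1, w_e3):
    s = np.maximum(conv1x1(x, w_squeeze), 0)   # squeeze: fewer channels
    e1 = np.maximum(conv1x1(s, w_e1), 0)       # expand 1x1
    e3 = np.maximum(conv3x3(s, w_e3), 0)       # expand 3x3
    return np.concatenate([e1, e3], axis=-1)   # splice the two results

rng = np.random.default_rng(0)
x = rng.random((7, 7, 32))
out = fire_module(x,
                  rng.random((32, 8)),         # squeeze 32 -> 8
                  rng.random((8, 16)),         # expand 1x1: 8 -> 16
                  rng.random((3, 3, 8, 16)))   # expand 3x3: 8 -> 16
```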

整体网络结构层次的减少,能够在识别准确率依旧较高的条件下有效减少总的计算量。但就原始数据集而言,存在很多环境噪声,又缺失主体监测的分支。本发明引入软阈值惩罚机制,来进一步提升局部特征提取能力。噪声对图像的影响可分为加法模型和乘法模型,设定F(x,y)为不含噪声的图像,N(x,y)为噪声,G(x,y)为受噪声影响之后的图像,即原始图像。为了能够更好地提取特征,本发明选用L_{1/2}范数进行模型的建立,如公式(2-5)所示:Reducing the number of levels in the overall network structure effectively reduces total computation while keeping recognition accuracy high. However, the original dataset contains a great deal of environmental noise, and the branch for subject monitoring is missing. The present invention introduces a soft threshold penalty mechanism to further improve local feature extraction. The influence of noise on an image can be modeled additively or multiplicatively. Let F(x,y) be the noise-free image, N(x,y) the noise, and G(x,y) the image after being affected by noise, i.e., the original image. To extract features better, the present invention chooses the L_{1/2} norm to build the model, as shown in formula (2-5):

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{1/2}    (2-5)

i表示的是样本图像,当进行图像迭代处理时,G(x_i,y_i)-F(x_i,y_i)出现残差,说明图像中出现噪声,并造成影响。本发明提出的软阈值方法,通过软阈值来限定残差状态。i indexes the sample images. When the image is processed iteratively, a residual appears in G(x_i,y_i)-F(x_i,y_i), indicating that noise is present in the image and has an effect. The soft threshold method proposed by the present invention bounds the residual state through a soft threshold.

(x_i,y_i)为第i个特征图像的坐标;式中的argmin在数学上表示使后面的式子取得最小值时(x_i,y_i)的取值,即后面函数取最小值时x_i和y_i的取值;F、N、G仅为函数名。(x_i,y_i) is the coordinate of the i-th feature image; in the formula, argmin mathematically denotes the value of (x_i,y_i) at which the following expression attains its minimum, i.e., the values of x_i and y_i when the subsequent function is minimized; F, N, and G are merely function names.

先构造惩罚因子||G(x_i,y_i)-F(x_i,y_i)||_h,来限制G(x_i,y_i)-F(x_i,y_i)不大于0,从而降低图像被噪声干扰的程度,如公式(2-6)所示:First, the penalty factor ||G(x_i,y_i)-F(x_i,y_i)||_h is constructed to keep G(x_i,y_i)-F(x_i,y_i) from exceeding 0, thereby reducing the degree to which the image is disturbed by noise, as shown in formula (2-6):

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{1/2} + \lambda \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{h}, \quad \text{s.t. } N(x_i, y_i) > 0, \ F(x_i, y_i) > 0    (2-6)

公式中λ表示惩罚系数,调节该系数可以使结果接近真实值;由于有软阈值的限制,近似值也可能会出现在真实值之上,因此软阈值可以有效降低噪声干扰的影响。s.t.全称subject to,意为"使得……满足……",在式中表示:在N(x_i,y_i)>0,F(x_i,y_i)>0的条件下使上式成立。In the formula, λ is the penalty coefficient; adjusting it brings the result close to the true value. Owing to the soft-threshold constraint, the approximation may also land above the true value, so the soft threshold can effectively reduce the influence of noise interference. "s.t." stands for "subject to", meaning the expression holds under the conditions N(x_i,y_i)>0 and F(x_i,y_i)>0.

h为预设的常数,优先取1/2。h is a preset constant, preferably 1/2.

为进一步优化软阈值方法,可进一步修改目标函数To further optimize the soft threshold method, the objective function can be further modified

\arg\min_{(x_i, y_i)} \left\| G(x_i, y_i) - F(x_i, y_i) - V \right\|_{1/2} + \lambda_1 \left\| G(x_i, y_i) - F(x_i, y_i) \right\|_{h} + \lambda_2 \left\| V \right\|_{h}    (2-7)

其中V为辅助变量,在计算过程中,使用软阈值算法对V进行迭代更新处理。利用软阈值惩罚机制有效抵抗数据中的噪声。Among them, V is an auxiliary variable. In the calculation process, a soft threshold algorithm is used to iteratively update V. The soft threshold penalty mechanism is used to effectively resist the noise in the data.

本申请优化了MMAL-Net网络结构,通过减少网络分支层数,来减少整体模型的计算量,也降低了训练模型对硬件的要求;在新分支结构的基础上,加入软阈值惩罚机制模块,来抵制图像中出现的噪声,有效屏蔽干扰信息,从而使得整体精度提升。This application optimizes the MMAL-Net network structure: reducing the number of branch levels reduces the overall model's computation and lowers the hardware requirements for training; on top of the new branch structure, a soft threshold penalty mechanism module is added to resist noise appearing in images and effectively shield interference information, thereby improving overall accuracy.

细粒度图像分类最初是采用强监督的深度学习来完成分类任务,强监督细粒度图像分类方法的监督信息依赖过多的标注,除了用于细粒度分类的网络外,整个算法框架中还需要一个部件定位的目标检测网络或语义分割网络。这导致数据标注的成本与网络结构的成本都十分高昂,使得强监督方法无法在实际生产过程中得到较好的应用。因此,本申请仅使用类别标签,不需要额外标注信息,属于弱监督方法。Fine-grained image classification originally relied on strongly supervised deep learning to complete the classification task. The supervision information of strongly supervised fine-grained methods depends on extensive annotations: besides the network used for fine-grained classification, the overall algorithm framework also needs a part-localizing object detection network or a semantic segmentation network. This makes both data annotation and the network structure costly, preventing strongly supervised methods from being applied well in actual production. Therefore, this application uses only category labels and requires no extra annotation, making it a weakly supervised method.

在一个实施例中,所述步骤2:获取待分类图像,包括:In one embodiment, the step 2: acquiring the image to be classified includes:

当用户在相册的查看界面触控圈选多个第一照片形成圈选框且所述圈选框在预设的第一时间内沿同一扩大方向扩大时,获取并输出预设的免触圈选提示信息,同时,控制所述圈选框沿所述扩大方向以预设的第一扩大速度继续进行扩大;When the user touches and selects multiple first photos on the viewing interface of the album to form a circled selection box, and the circled selection box expands in the same expansion direction within the preset first time, obtain and output the preset touch-free circle selecting prompt information, and at the same time, controlling the encircling box to continue to expand along the expanding direction at a preset first expanding speed;

动态获取所述用户当前的眼部视线;Dynamically obtain the current eye sight of the user;

确定所述查看界面内对应于所述眼部视线的第一注视点位;determining a first gaze point corresponding to the eye sight in the viewing interface;

若所述第一注视点位落在所述查看界面内的所述圈选框内,获取所述第一注视点位与所述圈选框的所述扩大方向上的目标框边之间的第一垂直距离;If the first gaze point falls within the circled box in the viewing interface, obtain the first vertical distance between the first gaze point and the target frame edge in the expanding direction of the circled box;

基于所述第一垂直距离,对所述第一扩大速度进行调整,调整公式如公式(2-8)所示:Based on the first vertical distance, the first expansion speed is adjusted, and the adjustment formula is shown in formula (2-8):

v_1' = v_1 - k_1 \cdot l_1    (2-8)

其中,v_1'为调整后的所述第一扩大速度,v_1为调整前的所述第一扩大速度,l_1为所述第一垂直距离,k_1为预设的第一关系系数;Wherein, v_1' is the adjusted first expansion speed, v_1 is the first expansion speed before adjustment, l_1 is the first vertical distance, and k_1 is the preset first relationship coefficient;

若所述第一注视点位落在所述查看界面内的所述圈选框外所述扩大方向上的待圈选范围内且所述第一注视点位在预设的第二时间内发生变化,获取所述第一注视点位与所述圈选框的所述扩大方向上的目标框边之间的第二垂直距离;If the first gaze point falls within the to-be-selected range in the expanding direction outside the circled box in the viewing interface and the first gaze point changes within a preset second time, obtain the second vertical distance between the first gaze point and the target frame edge in the expanding direction of the circled box;

基于所述第二垂直距离,对所述第一扩大速度进行调整,调整公式如公式(2-9)所示:Based on the second vertical distance, the first expansion speed is adjusted, and the adjustment formula is shown in formula (2-9):

v_2' = v_2 + k_2 \cdot l_2    (2-9)

其中,v_2'为调整后的所述第一扩大速度,v_2为调整前的所述第一扩大速度,l_2为所述第二垂直距离,k_2为预设的第二关系系数;Wherein, v_2' is the adjusted first expansion speed, v_2 is the first expansion speed before adjustment, l_2 is the second vertical distance, and k_2 is the preset second relationship coefficient;

若所述第一注视点位落在所述查看界面内的所述圈选框外所述扩大方向上的待圈选范围内且所述第一注视点位在所述第二时间内未发生变化,获取所述查看界面内对应于所述第一注视点位的第二照片;If the first gaze point falls within the to-be-selected range in the expanding direction outside the circled box in the viewing interface and the first gaze point does not change within the second time, obtain a second photo corresponding to the first gaze point in the viewing interface;

当所述第二照片刚好进入所述圈选框时,控制所述圈选框停止扩大;When the second photo just enters the circled selection box, controlling the circled selection box to stop expanding;

获取所述圈选框的所述扩大方向上的目标框边的移动类型;Obtain the movement type of the target frame edge in the expansion direction of the circled selection frame;

当所述移动类型为向下换行时,控制所述圈选框退选所述查看界面内所述第二照片所在行内所述第二照片右侧的全部第三照片;When the movement type is line feed downward, controlling the circle selection box to deselect all the third photos to the right of the second photo in the row where the second photo is located in the viewing interface;

当所述移动类型为向上换行时,控制所述圈选框退选所述查看界面内所述第二照片所在行内所述第二照片左侧的全部第四照片;When the movement type is line wrap up, controlling the circle selection box to deselect all the fourth photos on the left side of the second photo in the row where the second photo is located in the viewing interface;

当所述移动类型为向右换列时,控制所述圈选框退选所述查看界面内所述第二照片所在列内所述第二照片下侧的全部第五照片;When the movement type is to change column to the right, control the circle selection box to deselect all the fifth photos below the second photo in the column where the second photo is located in the viewing interface;

当所述移动类型为向左换列时,控制所述圈选框退选所述查看界面内所述第二照片所在列内所述第二照片上侧的全部第六照片;When the movement type is to change columns to the left, controlling the circle selection box to deselect all sixth photos on the upper side of the second photo in the column where the second photo is located in the viewing interface;

退选完成后,将所述圈选框内圈选的全部第七照片作为待分类图像,完成获取。After the selection is completed, all the seventh photos circled in the circle selection box are used as images to be classified, and the acquisition is completed.
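The four deselection rules above can be sketched as one function; the row-major photo numbering, grid size, and parameter names are hypothetical, introduced only for illustration:

```python
def deselect(grid_rows, grid_cols, cutoff_index, movement):
    # Photos are assumed numbered 0..n-1 left-to-right, top-to-bottom.
    # Return the indices the selection box should deselect once the
    # cut-off photo at cutoff_index is reached.
    row, col = divmod(cutoff_index, grid_cols)
    if movement == 'down':    # line feed down: drop photos right of cut-off
        return [row * grid_cols + c for c in range(col + 1, grid_cols)]
    if movement == 'up':      # line feed up: drop photos left of cut-off
        return [row * grid_cols + c for c in range(col)]
    if movement == 'right':   # column change right: drop photos below
        return [r * grid_cols + col for r in range(row + 1, grid_rows)]
    if movement == 'left':    # column change left: drop photos above
        return [r * grid_cols + col for r in range(row)]
    raise ValueError(movement)

# Cut-off photo 5 sits at row 1, column 1 of a 4x4 grid.
dropped = deselect(4, 4, 5, 'down')
```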

上述技术方案的工作原理及有益效果为:The working principle and beneficial effects of the above technical solutions are as follows:

一般的,当用户选择需要进行图像分类的图像时,由于选择的量一定较多,在智能终端(例如:手机和平板等)上触控操作时,需要用户持续触控操作,体验较差,若时间过长,可能会造成用户手指不适;特别是一些从事图像分类工作的用户,更是体验不佳。例如:用户打开手机相册,手指在手机屏幕左上角向右侧触控移动,使得全选界面内需要选择的照片,当需要继续向下选择照片时,手指需向下触控移动,使得界面向下滑动并同时全选新的界面内出现需要选择的照片,但是,手指需要保持不动,直至到截止照片时,手指才能松开。因此,亟需进行解决。Generally, when a user selects images for classification, the selection is usually large; touch operation on a smart terminal (e.g., a phone or tablet) then requires continuous touch input, which is a poor experience and, if prolonged, may cause finger discomfort; users who work on image classification professionally are affected even more. For example: the user opens the phone album and moves a finger rightward from the top-left corner of the screen to select all photos that need selecting in the interface; to continue selecting downward, the finger must keep moving down so that the interface scrolls while newly revealed photos are selected, and the finger cannot be lifted until the cut-off photo is reached. Hence there is an urgent need for a solution.

当用户触控圈选相册内的第一照片时,已选的第一照片组成圈选框,当圈选框在预设的第一时间(例如:2秒)内沿同一扩大方向(例如:竖直向下)扩大时,说明用户需要继续选择相册内更多的第一照片(例如:相册的显示界面在向下移动),此时,可以进行自动圈选介入,输出预设的免触圈选提示信息(例如:在显示界面上显示"手指可以移开的哦!开始自动圈选"),并控制圈选框沿扩大方向(例如:竖直向下)以预设的第一扩大速度(例如:1.2cm/s)继续进行扩大。When the user touch-circles first photos in the album, the selected first photos form a circled box. When the circled box expands in the same expansion direction (for example: vertically downward) within the preset first time (for example: 2 seconds), the user needs to keep selecting more first photos in the album (for example, the album's display interface is moving downward). At this point, automatic circling can intervene: the preset touch-free circling prompt is output (for example, displaying "You can lift your finger! Automatic circling started" on the interface), and the circled box is controlled to keep expanding along the expansion direction (for example: vertically downward) at the preset first expansion speed (for example: 1.2 cm/s).

此时,用户会不断查看圈选框已圈选的第一照片或查看圈选框即将圈选的第一照片,确定是否到截止照片。获取用户的眼部视线(视线获取属于现有技术范畴,不作赘述),确定查看界面内对应于眼部视线的第一注视点位,第一注视点位为用户正在注视的位置。若第一注视点位落在圈选框内,说明用户在查看圈选框已圈选的第一照片,不太跟得上圈选框的第一扩大速度,基于第一注视点位与目标框边的第一垂直距离,对第一扩大速度进行降速,第一垂直距离越大,说明第一扩大速度较快,应越对第一扩大速度进行降速。若第一注视点位落在查看界面内圈选框外扩大方向上的待圈选范围内且第一注视点位在预设的第二时间(例如:3秒)内发生变化,说明用户在查看圈选框即将圈选的第一照片且未到截止照片,第一注视点位与目标框边的第二垂直距离越大,说明第一扩大速度较慢,越需要对第一扩大速度提速。充分保证了圈选框的扩大速度能够贴近用户,提升了人性化,更提升了免触圈选的智能化。当第一注视点位落在待圈选范围内且第一注视点位在第二时间未发生变化,说明已到达截止照片;但是,截止照片不一定是一行中最后一张照片或一列中最后一张照片,需要对截止照片之后的第一照片进行剔除,进一步提升免触圈选的智能化和人性化。获取为截止照片的第二照片。当第二照片刚好进入圈选框时,基于目标框边的移动类型,剔除无用照片。一般的,移动类型为向下换行,用户查看照片是以从左到右循环查看,因此,控制圈选框退选第二照片所在行右侧的全部第三照片。当移动类型为向上换行时,与之同理。当移动类型为向右换列时,用户查看照片是以从上到下循环查看,因此,控制圈选框退选第二照片下侧的全部第五照片。当移动类型为向左换列时,与之同理。At this time, the user keeps checking the first photos already circled by the box, or those about to be circled, to determine whether the cut-off photo has been reached. The user's eye gaze is acquired (gaze acquisition is prior art and not elaborated), and the first gaze point in the viewing interface corresponding to the gaze is determined; the first gaze point is the position the user is looking at. If the first gaze point falls inside the circled box, the user is reviewing the already circled photos and cannot quite keep up with the box's first expansion speed; based on the first vertical distance between the first gaze point and the target frame edge, the first expansion speed is reduced. The larger the first vertical distance, the faster the expansion is relative to the user, and the more it should be slowed down. If the first gaze point falls within the to-be-selected range outside the circled box in the expanding direction and changes within the preset second time (for example: 3 seconds), the user is viewing the photos about to be circled and the cut-off photo has not been reached; the larger the second vertical distance between the first gaze point and the target frame edge, the slower the first expansion speed, and the more it needs speeding up. This ensures the expansion speed of the circled box stays close to the user, improving user-friendliness and the intelligence of touch-free selection. When the first gaze point falls within the to-be-selected range and does not change during the second time, the cut-off photo has been reached; however, the cut-off photo is not necessarily the last photo in a row or column, so the first photos beyond the cut-off photo need to be removed, further improving the intelligence and user-friendliness of touch-free selection. The second photo is acquired as the cut-off photo. When the second photo just enters the circled box, useless photos are removed based on the movement type of the target frame edge. Generally, when the movement type is line feed downward, the user views photos cyclically from left to right, so the circled box deselects all third photos to the right in the row containing the second photo; the same logic applies when the movement type is line feed upward. When the movement type is column change to the right, the user views photos cyclically from top to bottom, so the circled box deselects all fifth photos below the second photo; the same applies when the movement type is column change to the left.
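Because formulas (2-8)/(2-9) survive only as images in this copy, here is a hypothetical linear stand-in that reproduces the described behaviour (gaze lagging inside the box slows expansion, gaze running ahead speeds it up); the coefficient and values are assumptions:

```python
def adjust_speed(v, distance, coeff, gaze_inside_box):
    # Assumed linear form, NOT the patent's exact formula:
    # inside the box   -> slow down, as in the (2-8) direction,
    # ahead of the box -> speed up, as in the (2-9) direction.
    if gaze_inside_box:
        return max(v - coeff * distance, 0.0)
    return v + coeff * distance

v_slow = adjust_speed(1.2, 3.0, 0.1, gaze_inside_box=True)    # slow down
v_fast = adjust_speed(1.2, 3.0, 0.1, gaze_inside_box=False)   # speed up
```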

在一个实施例中,所述获取所述查看界面内对应于所述第一注视点位的第二照片,包括:In one embodiment, the acquiring the second photo corresponding to the first gaze point in the viewing interface includes:

获取所述显示界面当前的缩略比例;obtaining the current thumbnail scale of the display interface;

获取对应于所述显示界面的大小的预设的缩略比例阈值;obtaining a preset abbreviated ratio threshold corresponding to the size of the display interface;

若所述缩略比例大于等于所述缩略比例阈值,确定所述查看界面内位于所述第一注视点位的第八照片,并作为第二照片,完成获取;If the abbreviated ratio is greater than or equal to the abbreviated ratio threshold, determine the eighth photo located at the first gaze point in the viewing interface, and use it as the second photo to complete the acquisition;

否则,确定所述查看界面内位于所述第一注视点位的多个第九照片;Otherwise, determining a plurality of ninth photos located at the first gaze point in the viewing interface;

基于所述第九照片,生成截止照片放大确认框;Based on the ninth photo, generating a cut-off photo enlargement confirmation box;

在所述显示界面内悬浮显示所述截止照片放大确认框;suspending and displaying the cut-off photo enlargement confirmation box in the display interface;

确定所述查看界面内对应于所述用户当前的眼部视线的第二注视点位;determining a second gaze point in the viewing interface that corresponds to the user's current eye sight;

当所述第二注视点位落在所述截止照片放大确认框内且在预设的第三时间内所述第二注视点位未发生改变时,将所述截止照片放大确认框内位于所述第二注视点位的所述第九照片作为第二照片,完成获取。When the second gaze point falls within the cut-off photo enlargement confirmation box and does not change within the preset third time, the ninth photo located at the second gaze point within the confirmation box is taken as the second photo, completing the acquisition.

上述技术方案的工作原理及有益效果为:The working principle and beneficial effects of the above technical solutions are as follows:

获取为截止照片的第二照片时,一般只需确定查看界面内位于第一注视点位的照片即可。但是,由于一些智能终端的手机屏幕大小较小或者一些相册的缩略比例较小(例如:1:50,一个界面内显示50张照片),视线获取的精度有限,使得第一注视点位无法用于精准定位用户注视的截止照片。When acquiring the second photo as the cut-off photo, it is generally sufficient to determine the photo located at the first gaze point in the viewing interface. However, because some smart terminals have small screens or some albums use a small thumbnail ratio (for example 1:50, i.e., 50 photos displayed per interface), the precision of gaze acquisition is limited, so the first gaze point cannot precisely pinpoint the cut-off photo the user is looking at.

因此,获取显示界面当前的缩略比例,同时获取该显示界面大小对应的缩略比例阈值;缩略比例阈值为在该显示界面大小下,第一注视点位可用于精准定位用户注视的截止照片的最小缩略比例。若缩略比例大于等于缩略比例阈值,说明第一注视点位可用于精准定位用户注视的截止照片,确定位于第一注视点位的第八照片作为第二照片即可。否则(缩略比例小于缩略比例阈值),确定第一注视点位处的多个第九照片,基于第九照片生成截止照片放大确认框供用户进一步确认,当用户当前的第二注视点位落在截止照片放大确认框内且在预设的第三时间(例如:2秒)内未发生改变时,说明用户注视该截止照片,确定位于第二注视点位的第九照片作为第二照片即可。极大程度上提升了应用于不同智能终端的适用性,更提升了截止照片确定的精准性。Therefore, the current thumbnail ratio of the display interface is obtained, together with the thumbnail ratio threshold corresponding to the size of that interface; the threshold is the minimum thumbnail ratio at which, for that interface size, the first gaze point can precisely pinpoint the cut-off photo the user is looking at. If the thumbnail ratio is greater than or equal to the threshold, the first gaze point can precisely locate the cut-off photo, and the eighth photo at the first gaze point is determined as the second photo. Otherwise (the thumbnail ratio is below the threshold), the multiple ninth photos at the first gaze point are determined, and an enlargement confirmation box for the cut-off photo is generated from them for further user confirmation; when the user's current second gaze point falls within the confirmation box and does not change within the preset third time (for example: 2 seconds), the user is gazing at the cut-off photo, and the ninth photo at the second gaze point is determined as the second photo. This greatly improves applicability across different smart terminals and the precision of cut-off photo determination.
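The branch above reduces to a simple threshold test; this sketch uses hypothetical names and ratios purely for illustration:

```python
def resolve_cutoff(thumb_ratio, ratio_threshold, photo_at_gaze, photos_near_gaze):
    # Ratio at or above the threshold: gaze is precise enough to pick
    # one photo directly; otherwise return the candidate set shown in
    # the enlargement confirmation box.
    if thumb_ratio >= ratio_threshold:
        return ('direct', photo_at_gaze)
    return ('confirm', photos_near_gaze)

mode, picked = resolve_cutoff(1 / 20, 1 / 30, 'photo8', ['photo9a', 'photo9b'])
```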

The present invention further provides a weakly supervised fine-grained image classification system based on a soft threshold penalty mechanism, as shown in Figure 4, comprising:

Construction module 1, configured to construct a fine-grained image classification network with a two-level cascade network structure based on the soft threshold penalty mechanism;

Acquisition module 2, configured to acquire the image to be classified;

Preprocessing module 3, configured to preprocess the image to be classified;

Classification module 4, configured to classify the preprocessing result based on the fine-grained image classification network and output the image classification result.
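A minimal sketch of how the four modules could compose into a pipeline. The module internals here (a stub network, a toy normalisation step, a toy classifier) are placeholders for illustration, not the patent's implementation.

```python
class FineGrainedClassifierPipeline:
    """Toy composition of the four modules described above."""

    def __init__(self, build_network, preprocess, classify):
        self.network = build_network()      # construction module
        self.preprocess = preprocess        # preprocessing module
        self.classify = classify            # classification module

    def run(self, acquire):
        image = acquire()                   # acquisition module
        prepared = self.preprocess(image)
        return self.classify(self.network, prepared)

# Stub components standing in for the real network and steps.
pipeline = FineGrainedClassifierPipeline(
    build_network=lambda: {"name": "two-level cascade (stub)"},
    preprocess=lambda img: [p / 255.0 for p in img],   # assumed normalisation
    classify=lambda net, img: "class-A" if sum(img) > 1 else "class-B",
)
label = pipeline.run(acquire=lambda: [120, 200, 90])
```

The value of the decomposition is that each module can be swapped independently, which is exactly what the system claim separates them for.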

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from its spirit and scope. Thus, provided these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

1. A weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism, characterized in that it comprises:

Step 1: based on the soft threshold penalty mechanism, constructing a fine-grained image classification network with a two-level cascade network structure;

Step 2: acquiring an image to be classified;

Step 3: preprocessing the image to be classified;

Step 4: classifying the preprocessing result based on the fine-grained image classification network, and outputting the image classification result.

2. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 1, characterized in that step 1 (constructing, based on the soft threshold penalty mechanism, a fine-grained image classification network with a two-level cascade network structure) comprises:

constructing a first network branch comprising, connected in sequence: Input448*448*3, a first ResNet50, Feature14*14*2048, a first GAP, a first FC and a first Softmax;

constructing a second network branch comprising, connected in sequence: Input224*224*3*mult, a second ResNet50, Feature7*7*2048*mult, a second GAP, a second FC and a second Softmax;

connecting Input224*224*3*mult in the second network branch to Input448*448*3 in the first network branch through a crop operation;

connecting Feature14*14*2048 in the first network branch to the crop operation through an APPM;

setting a first loss function RawLoss for the first network branch;

setting a second loss function PartLoss for the second network branch;

setting the soft threshold penalty mechanism in the APPM;

the first network branch, the second network branch, the APPM and the crop operation forming the fine-grained image classification network with the two-level cascade network structure.

3. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 2, characterized in that the APPM is formed based on SCDA; the Feature14*14*2048 extracted by the APPM is aggregated along the channel direction of the pooling layer to obtain a 14*14*1 two-dimensional map, and sliding-window computation is performed on the two-dimensional map with a plurality of preset sliding windows of different sizes, the computation being as shown in formula (2-1):
[Formula (2-1) appears only as an image in the original; per the surrounding definitions it is the average of A(x,y) over an H*W sliding window, i.e. a_w = (1/(H*W)) * Σ A(x,y).]
where H and W are respectively the height and width of the sliding window, A(x,y) is the value at coordinate position (x,y) of the aggregated two-dimensional map, and a_w is the sliding-window computation result.
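The channel-wise aggregation and sliding-window averaging described for the APPM can be sketched with NumPy. The aggregation by sum (SCDA-style) and the window stride of 1 are assumptions, since the claim does not fix them.

```python
import numpy as np

def appm_window_scores(feature, window_hw):
    """Collapse an H x W x C feature map along the channel direction, then
    average over every sliding window, as in formula (2-1)."""
    two_d = feature.sum(axis=-1)          # channel-direction aggregation (assumed: sum)
    h_w, w_w = window_hw
    rows, cols = two_d.shape
    scores = {}
    for x in range(rows - h_w + 1):       # stride 1 assumed
        for y in range(cols - w_w + 1):
            window = two_d[x:x + h_w, y:y + w_w]
            scores[(x, y)] = window.sum() / (h_w * w_w)   # a_w
    return scores

feat = np.random.rand(14, 14, 2048)       # stands in for Feature14*14*2048
scores = appm_window_scores(feat, (7, 7))
best = max(scores, key=scores.get)        # most responsive region, a candidate crop
```

The highest-scoring window is a natural candidate for the crop fed into the second network branch.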
4. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 2, characterized in that the first loss function RawLoss is given by formula (2-2):
[Formula (2-2) appears only as an image in the original; it combines the sample images m_i with the predicted probabilities n_i.]
where m_i is the ith sample image and n_i is the prediction probability of the first convolutional neural network (CNN) for the ith sample image.
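Since formula (2-2) survives only as an image, its exact form cannot be read from the text. A common choice consistent with the description (a loss over per-sample predicted probabilities) is the mean negative log-likelihood, sketched here as an assumption rather than the patent's definition:

```python
import math

def raw_loss(pred_probs):
    """Mean negative log-likelihood over the predicted probabilities n_i of
    the true class for each sample image m_i (assumed form of RawLoss)."""
    return -sum(math.log(p) for p in pred_probs) / len(pred_probs)
```

Confident correct predictions (probabilities near 1) drive the loss toward 0; low probabilities are penalised sharply.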
5. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 2, characterized in that the second loss function PartLoss is given by formula (2-3):
[Formula (2-3) appears only as an image in the original.]
where q is the number of local feature regions selected by the second ResNet50, m_iq is the qth local feature region corresponding to the ith sample image, and n_iq is the prediction probability of the second convolutional neural network (CNN) for the qth local feature region of the ith sample image.
6. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 2, characterized in that the soft threshold penalty mechanism comprises: letting F(x,y) be the noise-free image, N(x,y) the noise and G(x,y) the image after the noise acts on it, and building the model with the L_1/2 norm, as shown in formula (2-5):
[Formula (2-5) appears only as an image in the original.]
where i denotes the index of the image; during iterative image processing, a residual appears in G(x_i,y_i) - F(x_i,y_i), indicating that noise is present in the image and affecting it. The residual state is bounded through the soft threshold: a penalty factor ||G(x_i,y_i) - F(x_i,y_i)||_h is first constructed to constrain G(x_i,y_i) - F(x_i,y_i) to be no greater than 0, thereby reducing the degree to which the image is corrupted by noise, as shown in formula (2-6):
[Formula (2-6) appears only as an image in the original.]
where λ denotes the penalty coefficient; adjusting this coefficient brings the result close to the true value. Because of the soft-threshold constraint, the approximation may also land above the true value, so the soft threshold effectively reduces the influence of noise interference.
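The soft-thresholding step itself is not written out in the surviving text; the standard shrinkage operator used with L1-type penalties is sketched below as a plausible reading, not the patent's exact formula.

```python
import numpy as np

def soft_threshold(x, lam):
    """Standard soft-thresholding (shrinkage) operator: pull each residual
    toward zero by lam, clamping small values exactly to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

residual = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
shrunk = soft_threshold(residual, 1.0)   # -> [-2., 0., 0., 0., 2.]
```

Residuals smaller than the threshold (plausibly noise) are zeroed, while larger residuals are only shrunk, which matches the stated goal of limiting the residual state without hard truncation.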
7. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 2, characterized in that, to further optimize the soft threshold method, the objective function can be further modified, as shown in formula (2-7):
[Formula (2-7) appears only as an image in the original.]
where V is an auxiliary variable and λ1 and λ2 are both penalty coefficients; during computation, the soft threshold algorithm is used to iteratively update V.
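Iteratively updating an auxiliary variable by soft thresholding is the core of ISTA-style solvers. The sketch below shows that pattern for a simple denoising objective; the objective, step size, and iteration count are all assumed, since formula (2-7) survives only as an image.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def iterate_auxiliary(g, lam=0.5, step=1.0, iters=50):
    """ISTA-style loop (assumed scheme): V is pulled toward the observed
    image g by a gradient step, then shrunk by the soft threshold."""
    v = np.zeros_like(g)
    for _ in range(iters):
        v = v - step * (v - g)            # gradient step on 0.5*||v - g||^2
        v = soft_threshold(v, lam * step) # penalty step via soft threshold
    return v

g = np.array([2.0, 0.2, -1.0])
v = iterate_auxiliary(g)                  # small entries shrink to 0
```

Each iteration alternates a data-fidelity step with a shrinkage step, which is the usual way a soft-threshold algorithm updates an auxiliary variable like V.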
8. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 1, characterized in that step 2 (acquiring the image to be classified) comprises:

when the user touch-circles a plurality of first photos on the viewing interface of an album to form a circling selection box, and the circling selection box expands in the same expansion direction within a preset first time, acquiring and outputting preset touch-free circling prompt information, and at the same time controlling the circling selection box to keep expanding along the expansion direction at a preset first expansion speed;

dynamically acquiring the user's current eye sight;

determining the first gaze point in the viewing interface corresponding to the eye sight;

if the first gaze point falls inside the circling selection box in the viewing interface, acquiring the first vertical distance between the first gaze point and the target box edge of the circling selection box in the expansion direction;

adjusting the first expansion speed based on the first vertical distance, the adjustment being as shown in formula (2-8):
[Formula (2-8) appears only as an image in the original.]
where v1' is the first expansion speed after adjustment, v1 is the first expansion speed before adjustment, l1 is the first vertical distance, and the preset first relation coefficient is denoted by a symbol that appears only as an image in the original;
if the first gaze point falls outside the circling selection box in the viewing interface, within the to-be-circled range in the expansion direction, and the first gaze point changes within a preset second time, acquiring the second vertical distance between the first gaze point and the target box edge of the circling selection box in the expansion direction;

adjusting the first expansion speed based on the second vertical distance, the adjustment being as shown in formula (2-9):
[Formula (2-9) appears only as an image in the original.]
where v2' is the first expansion speed after adjustment, v2 is the first expansion speed before adjustment, l2 is the second vertical distance, and the preset second relation coefficient is denoted by a symbol that appears only as an image in the original;
if the first gaze point falls outside the circling selection box in the viewing interface, within the to-be-circled range in the expansion direction, and the first gaze point does not change within the second time, acquiring the second photo in the viewing interface corresponding to the first gaze point;

when the second photo has just entered the circling selection box, controlling the circling selection box to stop expanding;

acquiring the movement type of the target box edge of the circling selection box in the expansion direction;

when the movement type is wrapping down to a new row, controlling the circling selection box to deselect all third photos to the right of the second photo in the row of the viewing interface containing the second photo;

when the movement type is wrapping up to a new row, controlling the circling selection box to deselect all fourth photos to the left of the second photo in the row of the viewing interface containing the second photo;

when the movement type is wrapping right to a new column, controlling the circling selection box to deselect all fifth photos below the second photo in the column of the viewing interface containing the second photo;

when the movement type is wrapping left to a new column, controlling the circling selection box to deselect all sixth photos above the second photo in the column of the viewing interface containing the second photo;

after deselection is completed, taking all seventh photos circled within the circling selection box as the images to be classified, completing the acquisition.
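The four deselection rules can be sketched on a simple photo grid. The `(row, col)` cell representation and the move-type names are assumptions for illustration.

```python
def deselect(selected, cutoff, move_type):
    """Drop photos past the cut-off photo according to the edge's move type.

    selected: set of (row, col) cells currently inside the circling box
    cutoff:   (row, col) of the second (cut-off) photo
    """
    r, c = cutoff
    rules = {
        "wrap_down":  lambda rc: rc[0] == r and rc[1] > c,  # same row, right of cut-off
        "wrap_up":    lambda rc: rc[0] == r and rc[1] < c,  # same row, left of cut-off
        "wrap_right": lambda rc: rc[1] == c and rc[0] > r,  # same column, below cut-off
        "wrap_left":  lambda rc: rc[1] == c and rc[0] < r,  # same column, above cut-off
    }
    drop = rules[move_type]
    return {rc for rc in selected if not drop(rc)}

box = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}
kept = deselect(box, cutoff=(1, 1), move_type="wrap_down")  # drops (1, 2)
```

Each rule keeps the cut-off photo itself and trims only the overshoot on one side of it, matching the four cases in the claim.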
9. The weakly supervised fine-grained image classification method based on a soft threshold penalty mechanism of claim 8, characterized in that acquiring the second photo in the viewing interface corresponding to the first gaze point comprises:

acquiring the current thumbnail ratio of the display interface;

acquiring a preset thumbnail ratio threshold corresponding to the size of the display interface;

if the thumbnail ratio is greater than or equal to the thumbnail ratio threshold, determining the eighth photo located at the first gaze point in the viewing interface and taking it as the second photo, completing the acquisition;

otherwise, determining a plurality of ninth photos located at the first gaze point in the viewing interface;

generating a cut-off-photo enlargement confirmation box based on the ninth photos;

displaying the cut-off-photo enlargement confirmation box floating within the display interface;

determining the second gaze point in the viewing interface corresponding to the user's current eye sight;

when the second gaze point falls inside the cut-off-photo enlargement confirmation box and does not change within a preset third time, taking the ninth photo located at the second gaze point in the cut-off-photo enlargement confirmation box as the second photo, completing the acquisition.

10. A weakly supervised fine-grained image classification system based on a soft threshold penalty mechanism, characterized in that it comprises:

a construction module, configured to construct a fine-grained image classification network with a two-level cascade network structure based on the soft threshold penalty mechanism;

an acquisition module, configured to acquire the image to be classified;

a preprocessing module, configured to preprocess the image to be classified;

a classification module, configured to classify the preprocessing result based on the fine-grained image classification network and output the image classification result.
CN202210487333.6A 2022-05-06 2022-05-06 A Weakly Supervised Fine-grained Image Classification Method Based on Soft Threshold Penalty Mechanism Active CN114821180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210487333.6A CN114821180B (en) 2022-05-06 2022-05-06 A Weakly Supervised Fine-grained Image Classification Method Based on Soft Threshold Penalty Mechanism


Publications (2)

Publication Number Publication Date
CN114821180A true CN114821180A (en) 2022-07-29
CN114821180B CN114821180B (en) 2022-12-06

Family

ID=82511752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210487333.6A Active CN114821180B (en) 2022-05-06 2022-05-06 A Weakly Supervised Fine-grained Image Classification Method Based on Soft Threshold Penalty Mechanism

Country Status (1)

Country Link
CN (1) CN114821180B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170115742A1 (en) * 2015-08-01 2017-04-27 Zhou Tian Xing Wearable augmented reality eyeglass communication device including mobile phone and mobile computing via virtual touch screen gesture control and neuron command
CN108139813A (en) * 2015-10-19 2018-06-08 鸥利研究所股份有限公司 Sight input unit, sight input method and sight input program
CN110969116A (en) * 2019-11-28 2020-04-07 Oppo广东移动通信有限公司 Method for determining gazing point position and related device
CN111010449A (en) * 2019-12-25 2020-04-14 南京医睿科技有限公司 Image information output method, system, device, medium, and electronic apparatus
CN112507799A (en) * 2020-11-13 2021-03-16 幻蝎科技(武汉)有限公司 Image identification method based on eye movement fixation point guidance, MR glasses and medium
EP3893090A1 (en) * 2020-04-09 2021-10-13 Irisbond Crowdbonding, S.L. Method for eye gaze tracking
WO2022001159A1 (en) * 2020-06-29 2022-01-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Latent low-rank projection learning based unsupervised feature extraction method for hyperspectral image


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Zhao Mengkai et al., "Region-of-interest detection method based on viewpoint tracking", Journal of Data Acquisition and Processing
Chen Nan, "Research on soft-threshold-based classification of hyperspectral remote sensing images", Electronic Test
Han Peiqi, "Research on a weakly supervised fine-grained image classification network based on soft-threshold attention", China Master's Theses Full-text Database (Information Science and Technology)
Yan Ye et al., "Fundamentals of University Computing (Science and Engineering)", 30 September 2013

Also Published As

Publication number Publication date
CN114821180B (en) 2022-12-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240825

Address after: A101, 1st Floor, Building 106, Lize Zhongyuan, Chaoyang District, Beijing, 100000

Patentee after: Beijing Aifenghuan Information Technology Co.,Ltd.

Country or region after: China

Address before: 224000 Room 401, building 1, No. 20, Xinyuan Road, Xinyi Community, Xinhe sub district office, Yannan high tech Zone, Yancheng City, Jiangsu Province

Patentee before: YANCHENG INSTITUTE OF TECHNOLOGY

Country or region before: China

Patentee before: Yancheng Institute of Technology Technology Transfer Center Co.,Ltd.
