CN113269067B - Periodic industrial video clip key frame two-stage extraction method based on deep learning - Google Patents
Periodic industrial video clip key frame two-stage extraction method based on deep learning
- Publication number
- CN113269067B (application CN202110532120.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- key frame
- sequence
- image sequence
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000605 extraction Methods 0.000 title claims abstract description 25
- 238000013135 deep learning Methods 0.000 title claims abstract description 24
- 230000000737 periodic effect Effects 0.000 title claims abstract description 19
- 238000000034 method Methods 0.000 claims abstract description 55
- 238000004519 manufacturing process Methods 0.000 claims abstract description 41
- 230000008569 process Effects 0.000 claims abstract description 26
- 239000011159 matrix material Substances 0.000 claims abstract description 25
- 230000011218 segmentation Effects 0.000 claims abstract description 25
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 12
- 238000007781 pre-processing Methods 0.000 claims abstract description 8
- 238000012216 screening Methods 0.000 claims abstract description 6
- 238000012360 testing method Methods 0.000 claims description 23
- 238000012549 training Methods 0.000 claims description 22
- 238000012545 processing Methods 0.000 claims description 6
- 238000012937 correction Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000013519 translation Methods 0.000 claims description 3
- 230000009191 jumping Effects 0.000 claims 4
- 238000005516 engineering process Methods 0.000 abstract description 3
- 230000004927 fusion Effects 0.000 abstract 1
- 238000009776 industrial production Methods 0.000 description 6
- 238000012544 monitoring process Methods 0.000 description 5
- 238000005245 sintering Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 239000000463 material Substances 0.000 description 3
- 238000010606 normalization Methods 0.000 description 3
- 238000012935 Averaging Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000002485 combustion reaction Methods 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- 238000003860 storage Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 230000002194 synthesizing effect Effects 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 229910000831 Steel Inorganic materials 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000001816 cooling Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 239000000428 dust Substances 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 239000000446 fuel Substances 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000001746 injection moulding Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 239000010959 steel Substances 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000013526 transfer learning Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06316—Sequencing of tasks or work
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Educational Administration (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Game Theory and Decision Science (AREA)
- Software Systems (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a two-stage method, based on deep learning, for extracting key frames from periodic industrial video segments. The method comprises: acquiring industrial video images, extracting a region of interest, and preprocessing it to obtain a preprocessed image sequence; constructing a deep-learning-based semantic segmentation network model and extracting the target region of each preprocessed image; in the first stage, constructing a convolutional neural network to classify the preprocessed images and segmenting their time series to obtain a set of candidate key frame sequences; and in the second stage, constructing a similarity matrix over the target regions and clustering, screening, and fusing the candidate key frame sequences to obtain the key frames. Addressing the complexity of industrial video features and the lack of both global and local perspectives in current methods, the invention introduces deep learning and a two-stage "global first, then local" strategy to extract industrial video key frames faster and more accurately, providing guidance for optimizing production and improving quality and yield.
Description
Technical Field
The present invention relates to the fields of machine vision, image processing, and pattern recognition, and in particular to a two-stage method, based on deep learning, for extracting key frames from periodic industrial video clips.
Background Art
The periodic production process is a common type of industrial production process in which a fixed series of procedures is executed repeatedly. For example, the steel sintering process cycles through "charging → ignition → pallet travel → discharging"; likewise, the injection molding process cycles through "mold closing → filling → pressure holding → cooling → mold opening → demolding".
Industrial video is a direct representation and indirect reflection of the operating-condition information of an industrial production process. For a given production procedure, the key frame is the image in its monitoring video segment that best reflects the current operating conditions of the production process, and it is one of the important characteristic parameters for evaluating those conditions. Owing to the complexity of industrial processes, however, current key frame extraction suffers from the following problems.
(1) Dynamics of the production cycle
In theory, for a periodic production process with a constant production rate, the time interval between key frames can be determined: once the first key frame is fixed manually, subsequent key frames follow from the production rate. In practice, however, fluctuations in materials, fuel, operations, environment, and other factors cause the production cycle to vary, so the interval between key frames cannot be fixed in advance.
(2) Similarity between procedures
In actual production, different procedures are often carried out at the same location, so their monitoring videos share many similar scenes. For example, in video of the sinter machine tail cross-section, the "pallet travel" and "discharging" procedures share the common "sinter bed" scene, while the "combustion zone" image unique to discharging occupies only a small portion of the frame. In terms of image features, this makes images from the two procedures highly similar. Traditional hand-crafted feature extraction cannot effectively overcome this similarity, which makes it difficult to segment the video into per-procedure clips.
(3) Similarity within a procedure
In actual production, equipment motions and the physical and chemical changes of materials and products are usually continuous, so differences between successive video frames are small and appear mainly in spatial position and texture, which traditional hand-crafted features cannot express effectively. For example, in the machine-tail cross-section video of the sintering process, the differences between cross-section images of the "discharging" procedure lie mainly in the spatial distribution and texture of the combustion zone; simple hand-crafted features such as brightness or histograms cannot describe these changes precisely. This makes key frame extraction within a procedure's video clip difficult.
Therefore, how to overcome the above problems, accurately extract industrial video image features, and rapidly extract key frames from periodic industrial video clips is an urgent issue in industrial process condition assessment.
Summary of the Invention
In view of the above technical problems, the present invention proposes a deep-learning-based key frame extraction method. Its purpose is to solve the problems of existing key frame extraction, namely that the intervals between key frames cannot be determined, inter-procedure similarity cannot be effectively overcome, and features cannot be described precisely, and to provide a method that accurately extracts industrial video image features and rapidly extracts key frames from periodic industrial video clips.
The present invention provides a two-stage, deep-learning-based method for extracting key frames from periodic production segments of industrial video, which specifically comprises:
S1: acquiring industrial video images, extracting region-of-interest images, and preprocessing them to obtain a preprocessed image sequence;
S2: constructing a deep-learning-based semantic segmentation network model and extracting the image target regions from the preprocessed image sequence;
S3: taking the output features of an intermediate layer of the semantic segmentation network model from step S2, constructing a convolutional neural network model, and performing binary classification on the preprocessed image sequence to obtain image category features;
S4: segmenting the preprocessed image sequence according to the image category features to obtain a set of candidate key frame sequences;
S5: computing the similarity between the target regions of the images in the candidate key frame sequence set, constructing a similarity matrix, and, with the similarity matrix as input, clustering the candidate key frame sequences to obtain a multi-category image set;
S6: constructing a key frame selection index and a weight matrix according to the actual needs of the industrial process, screening the multi-category image set with the selection index to obtain a key frame sequence, and computing the weighted average of the key frame sequence with the weight matrix to obtain the key frame.
Further, the preprocessing in step S1 includes denoising, color correction, and dehazing.
Further, step S2 specifically includes:
randomly selecting a number of first typical images from the preprocessed image sequence, screening out first mask images, and constructing a first training set and a first test set;
applying translation, scale, brightness, and rotation transformations to the first training set and first test set to obtain an augmented training set and test set;
constructing a deep semantic segmentation network model, training it on the augmented training set, and testing it on the augmented test set to obtain the trained deep semantic segmentation network model;
using the trained deep semantic segmentation network model to extract the target region of each preprocessed image.
Further, step S3 specifically includes:
randomly selecting a number of second typical images from the preprocessed image sequence, classifying them according to the actual needs of the industrial process, and constructing a second training set and a second test set;
feeding the second training set and second test set into the deep semantic segmentation model from step S2 and taking the output of an intermediate layer of the model as the image depth features;
constructing a convolutional neural network model with the image depth features as input and the classification as output, then training and testing the network to obtain the trained convolutional neural network model;
performing feature extraction on the preprocessed images with the trained convolutional neural network model to obtain the image category features.
Further, step S4 specifically includes:
constructing a temporary image sequence and setting a minimum image sequence length; traversing the preprocessed image sequence, extracting the category features of the current image, and determining whether the image is a target image;
if the current image is a target image, adding it to the temporary image sequence and incrementing the target image count by 1; when the length of the temporary image sequence exceeds the minimum image sequence length, adding all images in the temporary image sequence except the last image to the current target image sequence; the set of current target image sequences is the candidate key frame sequence set.
Further, constructing the similarity matrix specifically includes:
taking any two images I_n and I_m in a candidate key frame sequence, extracting the corresponding target regions Mask_n and Mask_m with the deep semantic segmentation network, and computing the similarity between Mask_n and Mask_m,
where M_nm denotes the number of matching feature descriptors between Mask_n and Mask_m; W and H are the width and height of the image, respectively; ΣΣMask_n and ΣΣMask_m denote the areas of the target regions Mask_n and Mask_m; and K_n and K_m denote the numbers of feature descriptors of Mask_n and Mask_m, respectively;
computing the similarity between all pairs of images in the candidate key frame sequence to obtain the similarity matrix.
Further, the clustering processing specifically includes:
setting the number of categories D according to actual industrial needs, and, with the similarity matrix as input, clustering the corresponding candidate key frame sequence to obtain the multi-category image set.
Further, step S6 specifically includes:
targeting the actual needs of the industrial process, constructing a key frame selection objective based on the image target regions and selecting images from the category image sets to obtain the key frame sequence;
targeting the actual needs of the industrial process, constructing a weight matrix based on the image target regions and computing the weighted average of the images in the key frame sequence to obtain the key frame.
Beneficial effects:
By introducing deep learning, the two-stage key frame extraction method described in the above embodiments compensates for the limited ability of traditional hand-crafted feature methods to extract industrial image features and extracts image features completely and accurately. Generating candidate key frame sequences provides a coarse pre-screening of the massive set of industrial images, which reduces the computational cost of the second-stage clustering and improves its accuracy. Clustering the candidate key frame sequences increases the similarity between images within each key frame sequence, reduces the computational cost of key frame synthesis, and avoids interference from noisy images. Synthesizing the key frame as a weighted average of multiple images minimizes the loss of features during image change and reflects the visual information of the industrial production process more completely.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the deep-learning-based two-stage method for extracting key frames from periodic industrial video clips according to the present invention;
FIG. 2 shows a typical original ROI image and its preprocessed counterpart, as provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the deep semantic segmentation network structure provided by an embodiment of the present invention;
FIG. 4 shows a typical preprocessed image and its image target region, as provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the clustering results provided by an embodiment of the present invention;
FIG. 6 shows a typical key frame provided by an embodiment of the present invention;
FIG. 7 compares the key frame extraction results of the various methods, as provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
As shown in FIG. 1, an embodiment of the present invention provides a deep-learning-based two-stage method for extracting key frames from periodic industrial video clips, which specifically comprises the following steps.
Step S1: acquire industrial video images, extract region-of-interest images, and preprocess them to obtain the preprocessed image sequence.
In this embodiment, the acquired industrial video images are cropped to a fixed width and height to remove useless background and extract the region of interest (ROI) image. The ROI image then undergoes preprocessing operations such as denoising, color correction, and dehazing to reduce defects such as noise, uneven illumination, and fogging caused by varying lighting, high temperature, and dust, yielding the preprocessed image sequence. FIG. 2 shows an ROI image and its preprocessed counterpart.
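As an illustration, a minimal preprocessing sketch in Python with OpenCV is given below. The ROI coordinates, the non-local-means denoiser, the gray-world color correction, and the CLAHE-based dehazing are all assumptions, since the embodiment does not name specific algorithms or parameters:

```python
import cv2
import numpy as np

def preprocess_frame(frame, roi=(0, 0, 1024, 128)):
    """Crop a fixed-size ROI from one video frame, then denoise,
    color-correct, and de-haze it (all parameter values are assumptions)."""
    x, y, w, h = roi                                   # hypothetical ROI coordinates
    img = frame[y:y + h, x:x + w]

    # Denoising: non-local means is one common choice; the patent names none.
    img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

    # Gray-world color correction: scale each channel toward the global mean.
    means = img.reshape(-1, 3).mean(axis=0)
    img = np.clip(img * (means.mean() / means), 0, 255).astype(np.uint8)

    # Approximate de-hazing: contrast-limited histogram equalization (CLAHE)
    # on the luminance channel.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    lab[..., 0] = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```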
Step S2: construct the deep-learning-based semantic segmentation network model and extract the image target regions from the preprocessed image sequence.
In this embodiment, a number of first typical images are first randomly selected from the preprocessed images and first mask images are screened out, forming a first training set and a first test set. Random translation, scaling, brightness, and rotation transformations are applied to these sets as data augmentation, yielding the augmented training and test sets. A deep semantic segmentation network is then constructed, as shown in FIG. 3: its input is a 1024×128×3 sintering cross-section, and the overall structure comprises four encoder layers and four corresponding decoder layers. Each encoder layer contains two 3×3 convolution layers, a batch normalization layer, and a max pooling layer; each decoder layer contains an upsampling layer, a 3×3 convolution layer, a concatenate layer, two 3×3 convolution layers, and a batch normalization layer. Finally, after two further 3×3 convolution layers and a sigmoid activation, the network outputs the combustion zone morphology at size 1024×128×1. The network is trained and tested on the augmented training and test sets, using cross-entropy as the loss function and the Adam optimizer with a learning rate of 3×10⁻⁴. The trained deep semantic segmentation network is then used to extract the target regions of the preprocessed images; FIG. 4 shows an example extraction result.
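A sketch of this encoder-decoder network in Keras follows, matching the layer counts and loss/optimizer settings stated above; the filter counts (base_filters) and the ReLU activations inside the blocks are assumptions not specified in the embodiment:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_segmentation_net(input_shape=(1024, 128, 3), base_filters=16):
    """Encoder-decoder segmentation net: 4 encoder and 4 decoder layers,
    ending in a 1024x128x1 sigmoid mask of the combustion zone."""
    inputs = keras.Input(shape=input_shape)
    x, skips = inputs, []
    for i in range(4):                         # encoder: conv, conv, BN, max-pool
        f = base_filters * 2 ** i
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        skips.append(x)
        x = layers.MaxPooling2D()(x)
    for i in reversed(range(4)):               # decoder: upsample, conv, concat, conv, conv, BN
        f = base_filters * 2 ** i
        x = layers.UpSampling2D()(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.concatenate([x, skips[i]])
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    x = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(3e-4),   # Adam at 3e-4, as stated
                  loss="binary_crossentropy")              # cross-entropy loss
    return model
```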
Step S3: take the output features of an intermediate layer of the semantic segmentation network model from step S2, construct a convolutional neural network model, and perform binary classification on the preprocessed image sequence to obtain the image category features.
In this embodiment, the idea of transfer learning is introduced. A number of second typical images are randomly selected from the preprocessed image sequence and classified according to the actual needs of the industrial process, forming a second training set and a second test set. These sets are passed through the deep semantic segmentation model from step S2, and the output of an intermediate layer of that model is taken as the image depth features. A convolutional neural network model is then constructed, consisting mainly of a flatten layer, a 128-dimensional fully connected layer, a batch normalization layer, a 2-dimensional fully connected layer, and a sigmoid activation layer. With the image depth features as input and the manual classification results as output, the network is trained using cross-entropy as the loss function and the Adam optimizer with a learning rate of 3×10⁻⁴. The trained convolutional neural network model then performs feature extraction on the preprocessed images to obtain the image category features.
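The classification head described above admits a direct Keras sketch; the shape of the tapped intermediate feature map (feature_shape) and the layer chosen as the tap are assumptions, since the embodiment does not state which layer is used:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(feature_shape=(64, 8, 256)):
    """Binary classification head over intermediate segmentation features:
    Flatten -> Dense(128) -> BatchNorm -> Dense(2) -> sigmoid."""
    inputs = keras.Input(shape=feature_shape)   # assumed shape of the tapped feature map
    x = layers.Flatten()(inputs)
    x = layers.Dense(128)(x)                    # activation unspecified in the embodiment
    x = layers.BatchNormalization()(x)
    outputs = layers.Dense(2, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(3e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# The depth features come from the trained segmentation model, for example:
# feature_extractor = keras.Model(seg_model.input,
#                                 seg_model.get_layer(index=-4).output)
```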
Step S4: segment the preprocessed image sequence according to the image category features to obtain the set of candidate key frame sequences.
In this embodiment, the segmentation processing specifically comprises:
Step S41: input the preprocessed image sequence S_input and the minimum image sequence length δ;
Step S42: define the current target image sequence and a temporary image sequence T, and initialize the target image count C_g = 0 and the non-target image count C_ng = 0;
Step S43: traverse the image sequence S_input, extracting the category features of the current image I;
Step S44: determine whether image I is a target image; if so, go to step S45; otherwise, go to step S47;
Step S45: add image I to the temporary image sequence T and increment the target image count C_g by 1;
Step S46: if the target image count C_g is greater than or equal to the minimum image sequence length δ, set the non-target image count C_ng = 0;
Step S47: increment the non-target image count C_ng by 1; if the target image count C_g is greater than or equal to the minimum image sequence length δ, add image I to the temporary image sequence T;
Step S48: if the non-target image count C_ng is greater than or equal to the minimum image sequence length δ, go to step S49; otherwise, go to step S412;
Step S49: if the target image count C_g is greater than or equal to the minimum image sequence length δ, go to step S410; otherwise, go to step S411;
Step S410: add all images in the temporary image sequence T except the last δ images to the current target image sequence;
Step S411: reset the target image count C_g and the non-target image count C_ng to zero and clear the temporary image sequence T;
Step S412: repeat steps S43 to S411 until the image sequence S_input is exhausted;
Step S413: obtain the set of candidate key frame sequences.
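A direct Python transcription of steps S41 to S413 is sketched below, under the reading that a run of at least δ consecutive non-target images closes the current candidate sequence; is_target stands in for the binary classification of step S3, and the flushing of a segment still open at the end of the sequence is an assumption:

```python
def extract_candidate_sequences(images, is_target, delta):
    """Steps S41-S413: split the preprocessed image sequence into candidate
    key frame sequences using each image's binary category feature."""
    sequences = []          # candidate key frame sequence set (S413)
    current, temp = [], []  # current target image sequence and temporary sequence T (S42)
    c_g = c_ng = 0          # target / non-target image counts (S42)
    for img in images:                      # S43: traverse S_input
        if is_target(img):                  # S44: target image?
            temp.append(img)                # S45: add to T, increment C_g
            c_g += 1
            if c_g >= delta:                # S46: an established segment resets C_ng
                c_ng = 0
        else:
            c_ng += 1                       # S47: increment C_ng ...
            if c_g >= delta:
                temp.append(img)            # ... and keep the image if inside a segment
            if c_ng >= delta:               # S48: a long non-target run ends the segment
                if c_g >= delta:            # S49 -> S410: drop the trailing delta images
                    current.extend(temp[:-delta])
                    sequences.append(current)
                    current = []
                c_g = c_ng = 0              # S411: reset counts, clear T
                temp = []
    if c_g >= delta:                        # assumed: flush a segment still open at the end
        current.extend(temp)
        sequences.append(current)
    return sequences
```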
Step S5: compute the similarity between the target regions of the images in the candidate key frame sequence set, construct the similarity matrix, and, with the similarity matrix as input, cluster the candidate key frame sequences to obtain the multi-category image set.
In this embodiment, for any two images I_n and I_m in a candidate key frame sequence, the deep semantic segmentation network extracts the corresponding target regions Mask_n and Mask_m. First, the SIFT algorithm extracts the feature descriptor sets F_n = {f_1^n, …, f_Kn^n} of Mask_n and F_m = {f_1^m, …, f_Km^m} of Mask_m, where each f_i^n and f_j^m is a 128-dimensional feature descriptor. Then the Euclidean distance between each descriptor f_i^n in F_n and each descriptor f_j^m in F_m is computed,
d(f_i^n, f_j^m) = ‖f_i^n - f_j^m‖₂,
and the descriptor with the smallest distance is selected as the matching feature descriptor of f_i^n in F_m:
f_j*^m = argmin_{1 ≤ j ≤ Km} d(f_i^n, f_j^m).
Similarly, the matching feature descriptor of f_j*^m in F_n can be obtained. If that descriptor is f_i^n itself, then f_i^n and f_j*^m are called a matching feature descriptor pair between Mask_n and Mask_m.
Considering the temporal regularity of the industrial process and the similarity between the images in a candidate key frame sequence, the similarity of Mask_n and Mask_m is defined in terms of M_nm, the number of matching feature descriptor pairs between Mask_n and Mask_m; W and H, the width and height of the image; ΣΣMask_n and ΣΣMask_m, the areas of Mask_n and Mask_m; and K_n and K_m, the numbers of feature descriptors of Mask_n and Mask_m.
The similarity between all pairs of images in the candidate key frame sequence is computed to obtain the similarity matrix A.
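The mutual-nearest-neighbor SIFT matching admits the OpenCV sketch below. Because the closed-form similarity expression is not reproduced in this text, the final score (descriptor match rate scaled by relative area agreement) is only an illustrative stand-in built from the quantities named above, not the patent's formula:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def mask_similarity(img_n, mask_n, img_m, mask_m):
    """Count mutual-nearest-neighbor SIFT matches between two target regions
    and combine them with the mask areas into a similarity score."""
    gray_n = cv2.cvtColor(img_n, cv2.COLOR_BGR2GRAY)
    gray_m = cv2.cvtColor(img_m, cv2.COLOR_BGR2GRAY)
    _, des_n = sift.detectAndCompute(gray_n, (mask_n > 0).astype(np.uint8))
    _, des_m = sift.detectAndCompute(gray_m, (mask_m > 0).astype(np.uint8))
    if des_n is None or des_m is None:
        return 0.0
    # Pairwise Euclidean distances d(f_i^n, f_j^m) between 128-d descriptors.
    d = np.linalg.norm(des_n[:, None, :] - des_m[None, :, :], axis=2)
    nn_nm = d.argmin(axis=1)        # nearest neighbor of each f_i^n in F_m
    nn_mn = d.argmin(axis=0)        # nearest neighbor of each f_j^m in F_n
    m_nm = int(sum(nn_mn[j] == i for i, j in enumerate(nn_nm)))  # mutual matches M_nm
    k_n, k_m = len(des_n), len(des_m)
    area_n, area_m = (mask_n > 0).sum(), (mask_m > 0).sum()
    # Illustrative stand-in score: descriptor match rate times area agreement.
    return (2.0 * m_nm / (k_n + k_m)) * (min(area_n, area_m) / max(area_n, area_m))
```

The full matrix A is then obtained by evaluating this function over all image pairs of the candidate sequence.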
In this embodiment, the clustering processing specifically comprises: in line with actual industrial production, the production process is divided into early, middle, and late stages, so the number of categories is set to D = 3. A spectral clustering algorithm takes the similarity matrix A_i as input and clusters the corresponding candidate key frame sequence, yielding the multi-category image set C = (c_1, c_2, …, c_d, …, c_D); FIG. 5 shows a schematic of the clustering result.
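With the similarity matrix precomputed, this step maps directly onto scikit-learn's spectral clustering, sketched below under the assumption that the matrix is symmetric and non-negative:

```python
from sklearn.cluster import SpectralClustering

def cluster_candidates(similarity_matrix, n_categories=3):
    """Spectral clustering over a precomputed similarity (affinity) matrix;
    returns one category label per image in the candidate sequence."""
    model = SpectralClustering(n_clusters=n_categories, affinity="precomputed")
    return model.fit_predict(similarity_matrix)
```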
Step S6: construct the key frame selection index and the weight matrix according to the actual needs of the industrial process, screen the multi-category image set with the selection index to obtain the key frame sequence, and compute the weighted average of the key frame sequence with the weight matrix to obtain the key frame.
In this embodiment, targeting the actual needs of the industrial process, the key frame sequence is required to maximize the total area of the target regions and to lie in the middle of the production cycle. On this basis, a key frame selection index is constructed over the target regions,
where N is the number of images in the candidate key frame sequence.
The best image set is selected from the multi-category image set C = (c_1, c_2, …, c_d, …, c_D), giving the key frame sequence.
Targeting the actual needs of the industrial process, a weight matrix W = [w_1, w_2, …, w_K] is constructed from the image target regions, with each target region's area as its weight. The weighted average of all images in the key frame sequence, computed with W as the weights, gives the key frame I_key; the result is shown in FIG. 6.
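A sketch of this selection and synthesis step follows. The concrete selection index used here (mean mask area of a cluster, discounted by the cluster's distance from the sequence midpoint) is an assumed form built from the two stated requirements, since the exact index expression is not reproduced in this text; the sketch also assumes every cluster is non-empty:

```python
import numpy as np

def synthesize_key_frame(images, masks, labels, n_categories=3):
    """Pick the cluster that best satisfies the selection index, then fuse its
    images into one key frame by area-weighted averaging."""
    labels = np.asarray(labels)
    n = len(images)
    areas = np.array([(m > 0).sum() for m in masks], dtype=float)
    center = (n - 1) / 2.0
    best_d, best_score = 0, -np.inf
    for d in range(n_categories):
        idx = np.flatnonzero(labels == d)
        # Assumed index: large mean mask area, positions near the cycle middle.
        score = areas[idx].mean() * (1.0 - np.abs(idx - center).mean() / n)
        if score > best_score:
            best_d, best_score = d, score
    idx = np.flatnonzero(labels == best_d)       # the selected key frame sequence
    w = areas[idx] / areas[idx].sum()            # weight matrix W from target areas
    stack = np.stack([images[i].astype(float) for i in idx])
    return np.tensordot(w, stack, axes=1).astype(np.uint8)  # weighted-average key frame I_key
```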
In this embodiment, FIG. 7 compares key frame extraction results on industrial video. Panels A, B, and C show, respectively, the key frame selected by a production expert, the key frame extracted by the image-feature-curve peak method (using the target area as the feature), and the key frame extracted by the proposed method, each with its corresponding image target region. To evaluate the two automatic methods, the similarity between each method's key frame and the expert's key frame is computed with five measures: average hash distance, difference hash distance, perceptual hash distance, cosine distance, and SIFT feature point match rate (the percentage of the expert key frame's SIFT feature points that are matched). Smaller average, difference, and perceptual hash distances indicate higher similarity between two images; a larger cosine distance and a higher SIFT match rate likewise indicate higher similarity. Table 1 presents the evaluation results for the two algorithms and shows that the proposed method extracts key frames more accurately.
Table 1. Similarity between each method's key frames and the expert-selected key frames
According to both the production expert and the proposed method, the images inside box 1 of FIG. 7 belong to the same production cycle, whereas the image-feature-curve peak method split them across three cycles; the proposed method therefore extracts key frames with higher accuracy.
By introducing deep learning, the two-stage key frame extraction method of the above embodiments compensates for the limited ability of traditional hand-crafted feature methods to extract industrial image features and extracts image features completely and accurately. Generating candidate key frame sequences provides a coarse pre-screening of the massive set of industrial images, reducing the computational cost of the second-stage clustering and improving its accuracy. Using the clustering operation to further segment the candidate key frame sequences increases the similarity between images within each key frame sequence, reduces the computational cost of key frame calculation, and avoids interference from noisy images. Synthesizing the key frame as a weighted average of multiple images minimizes feature loss during image change and reflects the visual information of the industrial production process more completely.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the claims.
It should be understood that, although the steps in the flow charts of the embodiments are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, their execution is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps may comprise multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Those of ordinary skill in the art will understand that all or part of the processes in the above embodiment methods can be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this specification.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110532120.6A CN113269067B (en) | 2021-05-17 | 2021-05-17 | Periodic industrial video clip key frame two-stage extraction method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110532120.6A CN113269067B (en) | 2021-05-17 | 2021-05-17 | Periodic industrial video clip key frame two-stage extraction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113269067A CN113269067A (en) | 2021-08-17 |
CN113269067B true CN113269067B (en) | 2023-04-07 |
Family
ID=77231053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110532120.6A Active CN113269067B (en) | 2021-05-17 | 2021-05-17 | Periodic industrial video clip key frame two-stage extraction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113269067B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115967823A (en) * | 2021-10-09 | 2023-04-14 | 北京字节跳动网络技术有限公司 | Video cover generation method and device, electronic equipment and readable medium |
CN118840662A (en) * | 2024-08-20 | 2024-10-25 | 四川省农业科学院植物保护研究所 | Citrus leaf detection method for inoculation and preservation of citrus yellow dragon disease pathogenic bacteria |
CN118840699B (en) * | 2024-09-20 | 2025-02-14 | 南京信息工程大学 | Key frame extraction method, device and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
CN107943837B (en) * | 2017-10-27 | 2022-09-30 | 江苏理工学院 | Key-framed video abstract generation method for foreground target |
CN107784118B (en) * | 2017-11-14 | 2020-08-28 | 北京林业大学 | A video key information extraction system for user interest semantics |
CN109377494B (en) * | 2018-09-14 | 2022-06-28 | 创新先进技术有限公司 | Semantic segmentation method and device for image |
CN110267041B (en) * | 2019-06-28 | 2021-11-09 | Oppo广东移动通信有限公司 | Image encoding method, image encoding device, electronic device, and computer-readable storage medium |
- 2021-05-17: Application CN202110532120.6A filed in China; granted as CN113269067B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN113269067A (en) | 2021-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113269067B (en) | Periodic industrial video clip key frame two-stage extraction method based on deep learning | |
CN107527337B (en) | A deep learning-based video object removal and tampering detection method | |
CN112150450B (en) | Image tampering detection method and device based on dual-channel U-Net model | |
JP7058941B2 (en) | Dictionary generator, dictionary generation method, and program | |
CN112132145B (en) | An image classification method and system based on a model-extended convolutional neural network | |
Marzan et al. | Automated tobacco grading using image processing techniques and a convolutional neural network | |
CN113449672B (en) | Remote sensing scene classification method and device based on bilinear twin framework | |
CN117173172B (en) | Machine vision-based silica gel molding effect detection method and system | |
Ben-Ahmed et al. | Deep multimodal features for movie genre and interestingness prediction | |
Zhang et al. | Automatic head overcoat thickness measure with NASNet-large-decoder net | |
CN112949634B (en) | A method for detecting bird nests in railway contact network | |
El-Gayar et al. | A novel approach for detecting deep fake videos using graph neural network | |
CN114724218A (en) | Video detection method, device, equipment and medium | |
Huang et al. | A method for identifying origin of digital images using a convolutional neural network | |
CN116030056A (en) | Detection method and system for steel surface cracks | |
Lu et al. | Source Camera Identification Algorithm Based on Multi-Scale Feature Fusion. | |
CN113496221B (en) | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering | |
KR101313285B1 (en) | Method and Device for Authoring Information File of Hyper Video and Computer-readable Recording Medium for the same | |
CN112070116A (en) | An automatic classification system and method for art paintings based on support vector machine | |
CN114708457B (en) | Hyperspectral deep learning identification method for anti-purple fringing identification | |
Chady et al. | The application of rough sets theory to design of weld defect classifiers | |
He et al. | A high-quality sample generation method for improving steel surface defect inspection | |
CN118501159B (en) | Automobile part defect detection method and system based on machine vision | |
Nair et al. | Image forgery and image tampering detection techniques: A review | |
Piccoli | Visual Anomaly Detection For Automatic Quality Control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |