CN116664845A - Smart construction site image segmentation method and system based on inter-block contrastive attention mechanism - Google Patents
- Publication number: CN116664845A (application CN202310935833.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/761: Proximity, similarity or dissimilarity measures
- G06V10/764: Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/7715: Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/045: Combinations of networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/084: Backpropagation, e.g. using gradient descent
- Y02T10/40: Engine management systems
Description
Technical Field
The invention belongs to the technical field of image segmentation, and in particular relates to a smart construction site image segmentation method and system based on an inter-block contrastive attention mechanism.
Background Art
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
Semantic segmentation is an important topic in computer vision. Its main task is to assign a category label to each pixel of an image, and it plays an important role in fields such as autonomous driving, computer vision, medical image analysis, and computer-aided diagnosis.
A smart construction site applies information technology to achieve scientific management and intelligent production on construction sites, involving computer technology, artificial intelligence, sensing technology, and virtual reality. Construction projects typically face tight schedules, heavy workloads, high risks, and difficult management. Current on-site management relies mainly on patrols and spot checks, which suffer from poor timeliness and high supervision costs; as a result, violations occur more frequently, and the safety, quality, and progress of the construction site cannot be effectively guaranteed.
With the development of artificial intelligence, AI techniques have gradually been applied to auxiliary supervision systems for construction sites, using deep-learning-based image recognition to analyze images captured by site surveillance cameras and tower cranes. Existing segmentation methods for smart construction site scene images consider either only global long-range dependency information or only short-range dependency information; both limitations reduce the accuracy of the segmentation results.
Summary of the Invention
To solve the technical problems in the background art above, the present invention provides a smart construction site image segmentation method and system based on an inter-block contrastive attention mechanism. It performs target segmentation on construction site scene images, effectively enabling intelligent monitoring of site safety, improving production management efficiency, and ensuring safe construction.
To achieve the above object, the present invention adopts the following technical solutions.
A first aspect of the present invention provides a smart construction site image segmentation method based on an inter-block contrastive attention mechanism.
The smart construction site image segmentation method based on an inter-block contrastive attention mechanism includes:
predicting the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;
The training process of the segmentation model includes: obtaining construction site scene image training samples annotated with segmentation labels; extracting feature maps of the training samples; one-hot encoding the segmentation labels to obtain several label vectors; partitioning the feature map into blocks and applying max pooling to obtain block-level classification labels according to the label vectors; dividing the feature map into several block-level feature maps; mapping the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level class activation maps (CAMs); building a block-level correlation matrix from the block-level feature maps and the block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; computing the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtaining the output of the segmentation model from the output feature map; and, based on the model output and the segmentation labels, optimizing the hyperparameters of the segmentation model with a loss function to obtain the trained segmentation model.
Further, building the block-level correlation matrix from the several block-level feature maps and block-level CAMs includes: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively, then performing matrix multiplication to obtain a block-level correlation matrix that captures long-range dependencies between channels and categories.
Further, computing the positive- and negative-sample similarities of the block-level correlation matrix includes: taking the channels of the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
Further, the block-level correlation matrix is mapped to the global correlation matrix through a fully connected layer.
Further, computing the positive- and negative-sample similarities of the global correlation matrix includes: taking the channels of the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
Further, after obtaining the output feature map, the method also includes: adjusting the dimensions of the output feature map to match those of the original feature map, and upsampling it to obtain a semantic segmentation mask of the same size as the feature map, which is the output of the segmentation model.
Further, the loss functions include:

a prediction loss for mapping the block-level feature maps to block-level CAMs under supervision of the block-level classification labels;

a semantic segmentation loss for the target segmentation area;

and a contrastive loss between the inter-block long-range dependencies and the intra-block long-range dependencies.
A second aspect of the present invention provides a smart construction site image segmentation system based on an inter-block contrastive attention mechanism.

The smart construction site image segmentation system based on an inter-block contrastive attention mechanism includes:

a prediction module configured to predict the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;

a segmentation model training module configured to: obtain construction site scene image training samples annotated with segmentation labels; extract feature maps of the training samples; one-hot encode the segmentation labels to obtain several label vectors; partition the feature map into blocks and apply max pooling to obtain block-level classification labels according to the label vectors; divide the feature map into several block-level feature maps; map the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level CAMs; build a block-level correlation matrix from the block-level feature maps and the block-level CAMs; map the block-level correlation matrix to a global correlation matrix; compute the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtain the segmentation model's output from the output feature map; and, based on the output and the segmentation labels, optimize the model's hyperparameters with a loss function to obtain the trained segmentation model.
A third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism described in the first aspect above.

A fourth aspect of the present invention provides a computer device.

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism described in the first aspect above.
Compared with the prior art, the beneficial effects of the present invention are as follows.

For construction site monitoring images transmitted by surveillance cameras and tower cranes, the present invention extracts features through a neural network and performs pixel-level classification on the input feature maps to obtain the segmentation region of each object in the input image, which facilitates further detection of violations and safety hazards at the construction site.

The present invention introduces contrastive learning into the supervised semantic segmentation task: contrastive learning pulls pixels with the same label closer together in the feature space and pushes pixels with different labels relatively far apart, further enhancing the representational power of the features. Since both attention mechanisms and contrastive learning perform well in semantic segmentation tasks, the present invention combines them: the contrastive loss forces the channel correlation matrix between the feature map and the class activation map (CAM) to have higher confidence, thereby obtaining robust and accurate target segmentation of construction site scene images and improving segmentation accuracy.
Brief Description of the Drawings

The accompanying drawings, which constitute a part of the present invention, are provided for further understanding of the invention; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it.

Fig. 1 is a flowchart of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism according to the present invention;

Fig. 2 is a framework diagram of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism according to the present invention.
Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly dictates otherwise, singular forms are intended to include plural forms; it should also be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of methods and systems according to various embodiments of the present disclosure. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which may include one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. Each block of the flowcharts and/or block diagrams, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Embodiment 1

As shown in Fig. 1 and Fig. 2, this embodiment provides a smart construction site image segmentation method based on an inter-block contrastive attention mechanism. The method is illustrated here as applied to a server; it can also be applied to a terminal, or to a system comprising a terminal and a server, implemented through their interaction. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, laptop, desktop computer, smart speaker, or smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. In this embodiment, the method includes the following steps:

predicting the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;

obtaining construction site scene image training samples annotated with segmentation labels; extracting feature maps of the training samples; one-hot encoding the segmentation labels to obtain several label vectors; partitioning the feature map into blocks and applying max pooling to obtain block-level classification labels according to the label vectors; dividing the feature map into several block-level feature maps; mapping the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level CAMs; building a block-level correlation matrix from the block-level feature maps and the block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; computing the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtaining the segmentation model's output from the output feature map; and, based on the output and the segmentation labels, optimizing the model's hyperparameters with a loss function to obtain the trained segmentation model.
The specific scheme of this embodiment is described in detail below.

1. Feature extraction

For an input image, a backbone network (e.g., VGG or ResNet-101) first maps the input image to a feature map F of size H × W × C, where H and W are the height and width of the feature map and C is the number of channels.
2. Constructing block-level feature maps

First, the segmentation labels are one-hot encoded to obtain K label vectors, where K is the number of categories in the dataset. The feature map F is partitioned into Np × Np blocks (for example, Np = 4, i.e., the feature map is divided into 16 block-level feature maps of height h and width w), and a block-level classification label is obtained for each block by max pooling the pixel-level label vectors over that block. Under the supervision of the block-level classification labels, a convolution maps each block-level feature map to a block-level CAM. Each block-level feature map has dimensions h × w × C, and each block-level CAM has dimensions K × h × w.
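The block construction step above can be sketched in numpy. The tensor sizes, the random inputs, and the realization of the CAM convolution as a per-pixel C-to-K linear map (`W_cam`) are illustrative assumptions, not the patent's actual configuration:

```python
import numpy as np

np.random.seed(0)

# Hypothetical sizes: feature map F is H x W x C, K classes, Np x Np blocks.
H, W, C, K, Np = 8, 8, 16, 3, 4
h, w = H // Np, W // Np

F = np.random.rand(H, W, C)              # backbone feature map
onehot = np.random.rand(K, H, W) > 0.5   # per-pixel one-hot segmentation labels

blocks, block_labels = [], []
for i in range(Np):
    for j in range(Np):
        fb = F[i*h:(i+1)*h, j*w:(j+1)*w, :]       # h x w x C block-level feature map
        lb = onehot[:, i*h:(i+1)*h, j*w:(j+1)*w]  # labels restricted to the block
        blocks.append(fb)
        # max pooling over the block: class k is "present" if any pixel carries it
        block_labels.append(lb.reshape(K, -1).max(axis=1))

# a pointwise convolution is a C -> K linear map applied per pixel (assumed form)
W_cam = np.random.rand(C, K)
cams = [np.einsum('hwc,ck->khw', fb, W_cam) for fb in blocks]  # K x h x w per block
```

Each entry of `block_labels` is the block-level classification label that supervises the corresponding block-level CAM in `cams`.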
3. Mining block-level long-range dependencies between classes

Each block-level feature map of size h × w × C is reshaped to hw × C, and each block-level CAM of size K × h × w is reshaped to K × hw. Multiplying the reshaped CAM by the reshaped feature map yields, for each block, a K × C matrix that relates the K categories to the C channels: a block-level correlation matrix T whose entry T(i, j) reflects the correlation between the i-th category and the j-th channel. In this way, long-range dependencies between channels and categories are established within each block. By constructing the correlation matrix T to mine the long-range dependencies between in-block channels and dataset categories, the method obtains long-range dependency information while still meeting the pixel-level segmentation task's need for fine-grained short-range dependency information.
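A minimal numpy sketch of this reshape-and-multiply step for a single block, with sizes chosen arbitrarily for illustration:

```python
import numpy as np

np.random.seed(0)
h, w, C, K = 2, 2, 16, 3

fb = np.random.rand(h, w, C)   # one block-level feature map
cam = np.random.rand(K, h, w)  # the corresponding block-level CAM

# reshape and multiply: (K x hw) @ (hw x C) -> K x C correlation matrix T
T = cam.reshape(K, h * w) @ fb.reshape(h * w, C)

# T[i, j] reflects how strongly channel j responds to category i
assert T.shape == (K, C)
```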
4. Automatic acquisition of optimal samples
A learnable weight matrix A of dimension K × C is maintained for selecting positive samples for contrastive learning; it contains K normalized weight vectors. Specifically, for a given class k, this embodiment treats the channels with high response values in the correlation matrix T as positive samples and the channels with low response values as negative samples, so that positive samples adaptively receive larger weight coefficients. Meanwhile, to take the information of all samples within a block into account, the coefficient matrix 1 − A serves as the adaptive weights for selecting negative samples.
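A minimal reading of this adaptive weighting, for one class k: the learnable row A_k weights the positive (high-response) channels and 1 − A_k the negative ones. The normalization and the similarity measure are not specified in the text, so this is only a sketch:

```python
import numpy as np

def weighted_similarities(T_row, A_row):
    """For one class: channels with high response in the correlation row
    T_row act as positives, low-response channels as negatives.
    A_row is the learnable weight vector for this class; positives are
    weighted by A_row, negatives by 1 - A_row (a hypothetical reading,
    not the patented formula)."""
    A_row = A_row / A_row.sum()                  # keep the weights normalized
    pos = float((A_row * T_row).sum())           # positives emphasized by A
    neg = float(((1.0 - A_row) * T_row).sum())   # negatives emphasized by 1-A
    return pos, neg
```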
5. Construction of the inter-block contrastive attention mechanism
Introducing block-level contrastive learning forces the correlation matrix to have stronger representational power: the more relevant a channel is to a class, the larger its response value in the correlation matrix becomes. A fully connected layer then maps the Np block-level correlation matrices to a global correlation matrix, where the layer is a linear layer with input dimension Np and output dimension 1, capturing long-range dependencies between blocks. Using the automatic optimal-sample strategy, for each class, channels with high response values in both the block-level and the global correlation matrices are taken as positive samples, and channels with low response values as negative samples. The weight matrix A assigns higher weights to the positive-sample similarities, whose weighted sum gives the positive similarity, while 1 − A assigns higher weights to the negative-sample similarities, whose weighted sum gives the negative similarity. A contrastive loss is then computed; under this loss the distance between positive samples shrinks and the distance between negative samples grows, i.e., for a given class, the channels related to that class become more similar while unrelated channels are pushed apart.
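The aggregation into the global correlation matrix is a linear layer with input dimension Np and output dimension 1, i.e., a learned weighted sum over the block axis. The exact contrastive loss form is not given in the text, so an InfoNCE-style stand-in is used below:

```python
import math
import numpy as np

def global_correlation(T, weights, bias=0.0):
    """Map Np block-level K x C correlation matrices to one global
    K x C matrix via a linear layer over the block dimension.
    T: (Np, K, C); weights: (Np,) learnable coefficients."""
    return np.tensordot(weights, T, axes=(0, 0)) + bias   # (K, C)

def contrastive_loss(pos_sim, neg_sim, tau=0.1):
    """InfoNCE-style loss on one weighted positive / negative similarity
    pair: minimizing it pulls positives together and pushes negatives
    apart (a stand-in, not the patented formula)."""
    return -math.log(math.exp(pos_sim / tau) /
                     (math.exp(pos_sim / tau) + math.exp(neg_sim / tau)))
```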
Since the positive and negative samples that enter the contrastive loss come from every block, the operations above extend the block-level semantic information to the global level. Furthermore, an attention mechanism is introduced on top of the block-level feature maps and block-level CAMs. Traditional self-attention typically follows the Query-Key-Value (QKV) model: the input feature map is linearly transformed by Wq, Wk, and Wv into three feature maps Q, K, and V; Q and K produce a correlation matrix, e.g., via scaled dot product, which is matrix-multiplied with V to obtain the output feature map. The present invention instead builds the attention mechanism from the input feature map and the CAM. The process of obtaining the output feature map is: the block-level CAM, after a linear transformation, serves as the V of the attention mechanism, and the attention computation over the block-level feature maps and block-level CAMs constructs the output feature map.
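One way to wire such an attention block for a single block, with the linearly transformed CAM as V, is sketched below. The patent only fixes the role of V, so the wiring of the query and key branches (here the class-channel correlation scores) is an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_cam_attention(feat, cam, Wv):
    """Single-block attention where V is the linearly transformed CAM.
    feat: (hw, C) flattened block feature map
    cam:  (K, hw) flattened block CAM
    Wv:   (hw, C) value projection applied to the CAM
    Each pixel attends over the K classes and mixes CAM-derived values."""
    S = cam @ feat                                   # (K, C) class-channel scores
    V = cam @ Wv                                     # (K, C) values from the CAM
    scores = (feat @ S.T) / np.sqrt(feat.shape[1])   # (hw, K) pixel-class scores
    attn = softmax(scores, axis=-1)                  # attention over classes
    return feat + attn @ V                           # residual output, (hw, C)
```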
That is, the channel-class correlation matrices are obtained first. Second, after the long-range inter-class dependencies within each block are captured, the block-level correlation matrices are aggregated into a global correlation matrix that captures global inter-class long-range dependencies. Finally, after positive and negative samples are drawn, the contrastive loss is computed and back-propagated through the network, forcing the model to learn a more structured channel-class correlation matrix through contrastive learning. Intra-block and inter-block long-range dependencies are thereby established simultaneously, satisfying the semantic segmentation task's need for both global semantic information and fine-grained information.
6. Obtaining the semantic segmentation mask
The inter-block contrastive attention operation yields a feature map, which is reshaped so that it has the same dimensions as the input feature map. An upsampling operation then produces a semantic segmentation mask of the same size as the original image, giving an accurate and robust segmentation result, and the segmentation loss L_seg is computed against the segmentation label.
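The final upsampling step is only named in the text; a nearest-neighbour sketch (the interpolation choice is an assumption) is:

```python
import numpy as np

def upsample_nearest(mask_logits, scale):
    """Nearest-neighbour upsampling of a (K, H, W) logit map back to the
    original image resolution; the per-pixel argmax of the result gives
    the semantic segmentation mask."""
    return mask_logits.repeat(scale, axis=1).repeat(scale, axis=2)
```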
7. Computing the loss and back-propagating gradients
The CAM classification prediction loss L_cls guided by the class-level labels, the semantic segmentation loss L_seg, and the contrastive loss L_con of the inter-block contrastive attention are combined into the final loss L, which is back-propagated. The present invention defines it as follows:

L = α·L_cls + β·L_seg + γ·L_con
where α, β, and γ are the weight coefficients of the respective losses; through extensive experiments, the present invention sets them to 1, 0.4, and 1, respectively.
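With the coefficients reported above, the total loss is a plain weighted sum. Mapping 1, 0.4, and 1 to the classification, segmentation, and contrastive terms follows the order in which the text lists the losses, which is an inference:

```python
def total_loss(l_cls, l_seg, l_con, alpha=1.0, beta=0.4, gamma=1.0):
    """Final training loss: weighted sum of the CAM classification loss,
    the segmentation loss, and the inter-block contrastive loss."""
    return alpha * l_cls + beta * l_seg + gamma * l_con
```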
8. Model testing and application
The test set is fed into the trained segmentation model, the predicted segmentation results are output, and model performance is evaluated by mean intersection over union (mIoU).
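The mIoU metric averages per-class intersection over union; a standard sketch (skipping classes absent from both prediction and ground truth is an assumed convention) is:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes; classes absent from
    both the prediction and the ground truth are skipped."""
    ious = []
    for k in range(num_classes):
        p, g = pred == k, gt == k
        union = (p | g).sum()
        if union == 0:
            continue                      # class absent everywhere
        ious.append((p & g).sum() / union)
    return float(np.mean(ious))
```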
The tested model can be used for construction-site image segmentation tasks: images from site surveillance cameras and tower-crane cameras are passed through the segmentation model to obtain the segmented region of each target in the image.
In terms of local information, this embodiment uses the block-wise attention mechanism to establish correlations between the channels and classes of the original feature map; compared with traditional attention, intra-block attention better focuses the network on mining finer-grained information. At the level of global information, this embodiment proposes inter-block contrastive learning: first, in each channel-class correlation matrix, channels with higher response values serve as positive samples of the corresponding class and channels with lower response values as negative samples. More specifically, the intra-block positive and negative samples of each class are extended to global positive and negative samples, prompting the model to acquire fine-grained local semantic information while mining more discriminative global semantic information. Notably, for selecting positive and negative samples this embodiment maintains a learnable weight matrix, which makes the selected samples fit better while ensuring that no positive- or negative-sample information is lost.
The present invention partitions the feature map and the CAM into blocks and computes their channel correlations along the block dimension, mining the dependence of image classes on channels. To give the embedding space of the channel correlation matrices of the generated feature maps and CAMs stronger representational power, the present invention proposes a block-level contrastive attention mechanism, which not only models the long-range dependencies of the image but also establishes short-range dependencies between channels and classes based on block-level features, simultaneously meeting the semantic segmentation task's needs for coarse-grained and fine-grained information. Contrastive learning emphasizes the association between channels and classes, and fusing each class's block-level positive and negative samples into global positive and negative samples establishes associations between blocks, giving the segmentation model better representational power and robust segmentation performance.
Embodiment 2
This embodiment provides a smart construction site image segmentation system based on an inter-block contrastive attention mechanism.
The smart construction site image segmentation system based on the inter-block contrastive attention mechanism includes:
a prediction module configured to: based on a construction-site scene image to be segmented, predict the target segmentation regions of the image using a trained segmentation model; and
a segmentation model training module configured to: obtain training samples of construction-site scene images annotated with segmentation labels; extract feature maps from the training samples; one-hot encode the segmentation labels to obtain several label vectors; apply block partitioning and max pooling to the feature maps and, based on the label vectors, obtain block-level classification labels; divide the feature map into several block-level feature maps; under the supervision of the block-level classification labels, map the block-level feature maps to obtain several block-level CAMs; establish block-level correlation matrices from the block-level feature maps and block-level CAMs; map the block-level correlation matrices to a global correlation matrix; compute the positive- and negative-sample similarities of the block-level correlation matrices and of the global correlation matrix to obtain an output feature map; obtain the segmentation model's output from the output feature map; and, using a loss function on the model output and the segmentation labels, optimize the hyperparameters of the segmentation model to obtain the trained segmentation model.
It should be noted here that the examples and application scenarios implemented by the above prediction module and segmentation model training module are the same as those of the steps in Embodiment 1, but are not limited to the content disclosed in Embodiment 1. It should also be noted that, as parts of the system, the above modules can be executed in a computer system such as a set of computer-executable instructions.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the smart construction site image segmentation method based on the inter-block contrastive attention mechanism described in Embodiment 1.
Embodiment 4
This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the smart construction site image segmentation method based on the inter-block contrastive attention mechanism described in Embodiment 1.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may be subject to various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
Priority Applications (1)
- CN202310935833.6A (granted as CN116664845B), priority date 2023-07-28, filing date 2023-07-28: Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
Publications (2)
- CN116664845A, published 2023-08-29
- CN116664845B, granted and published 2023-10-13
Cited By (1)
- CN118230261A (priority 2024-05-27, published 2024-06-21), 四川省建筑科学研究院有限公司: Smart construction site construction safety early warning method and system based on image data
Citations (9)
- CN112801104A (priority 2021-01-20, published 2021-05-14): Image pixel level pseudo label determination method and system based on semantic segmentation
- CN113657393A (2021-08-16, 2021-11-16): A semi-supervised image segmentation method and system with missing shape priors
- CN114283162A (2021-12-27, 2022-04-05): Real-world image segmentation method based on contrastive self-supervised learning
- CN114359873A (2022-01-06, 2022-04-15): Weak supervision vehicle feasible region segmentation method integrating road space prior and region level characteristics
- CN115019039A (2022-05-26, 2022-09-06): An instance segmentation method and system combining self-supervision and global information enhancement
- CN115953784A (2022-12-27, 2023-04-11): Laser coded character segmentation method based on residual and feature block attention
- WO2023056889A1 (2021-10-09, 2023-04-13): Model training and scene recognition method and apparatus, device, and medium
- CN116229465A (2023-02-27, 2023-06-06): A weakly supervised semantic segmentation method for ships
- WO2023102223A1 (2021-12-03, 2023-06-08): Cross-coupled multi-task learning for depth mapping and semantic segmentation
Non-Patent Citations (3)
- Zhang, Pingping et al.: "Deep gated attention networks for large-scale street-level scene segmentation", Pattern Recognition, pp. 702-714
- Peng Qiwei, Feng Jie, Lyu Jin, Yu Lei, Cheng Ding: "Research on semantic segmentation method based on global attention mechanism", Modern Information Technology, no. 4, pp. 110-112
- Li Bin'ai, Li Ying, Hao Mingyang, Gu Shuyu: "A survey of weakly supervised semantic segmentation methods", Digital Communication World, no. 7, pp. 263-265
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant