CN116664845A - Smart construction site image segmentation method and system based on inter-block contrastive attention mechanism - Google Patents
- Publication number: CN116664845A (application CN202310935833.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/761: Proximity, similarity or dissimilarity measures
- G06V10/764: Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/7715: Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/045: Combinations of networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/084: Backpropagation, e.g. using gradient descent
- Y02T10/40: Engine management systems
Description
Technical Field
The invention belongs to the technical field of image segmentation, and in particular relates to a smart construction site image segmentation method and system based on an inter-block contrastive attention mechanism.
Background Art
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
Semantic segmentation is an important topic in computer vision. Its main task is to assign a category label to each pixel of an image, and it plays an important role in fields such as autonomous driving, computer vision, medical image analysis, and computer-aided diagnosis.
A smart construction site applies information technology to achieve scientific management and intelligent production on construction sites, involving computer technology, artificial intelligence, sensing technology, and virtual reality. Construction projects typically face tight schedules, heavy workloads, high risks, and difficult management. Current on-site management relies mainly on patrols and spot checks, which suffer from poor timeliness and high supervision costs; as a result, violations occur more frequently, and the safety, quality, and progress of the construction site cannot be effectively guaranteed.
With the development of artificial intelligence, AI techniques have gradually been applied to auxiliary supervision systems for construction sites, using deep-learning-based image recognition to analyze images captured by site surveillance cameras and tower cranes. Existing segmentation methods for smart construction site scene images consider either only global long-range dependency information or only short-range dependency information; both limitations reduce the accuracy of the segmentation results.
Summary of the Invention
To solve the technical problems in the background art above, the present invention provides a smart construction site image segmentation method and system based on an inter-block contrastive attention mechanism. It performs target segmentation on construction site scene images, effectively enabling intelligent monitoring of site safety, improving production management efficiency, and ensuring safe construction.
To achieve the above object, the present invention adopts the following technical solutions.
A first aspect of the present invention provides a smart construction site image segmentation method based on an inter-block contrastive attention mechanism.
The smart construction site image segmentation method based on an inter-block contrastive attention mechanism includes:
predicting the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;
The training process of the segmentation model includes: obtaining construction site scene image training samples annotated with segmentation labels; extracting feature maps of the training samples; one-hot encoding the segmentation labels to obtain several label vectors; partitioning the feature map into blocks and applying max pooling to obtain block-level classification labels according to the label vectors; dividing the feature map into several block-level feature maps; mapping the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level class activation maps (CAMs); building a block-level correlation matrix from the block-level feature maps and the block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; computing the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtaining the output of the segmentation model from the output feature map; and, based on the model output and the segmentation labels, optimizing the hyperparameters of the segmentation model with a loss function to obtain the trained segmentation model.
Further, building the block-level correlation matrix from the several block-level feature maps and block-level CAMs includes: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively, then performing matrix multiplication to obtain a block-level correlation matrix that captures long-range dependencies between channels and categories.
Further, computing the positive- and negative-sample similarities of the block-level correlation matrix includes: taking the channels of the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
Further, the block-level correlation matrix is mapped to the global correlation matrix through a fully connected layer.
Further, computing the positive- and negative-sample similarities of the global correlation matrix includes: taking the channels of the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
Further, after obtaining the output feature map, the method also includes: adjusting the dimensions of the output feature map to match those of the original feature map, and upsampling it to obtain a semantic segmentation mask of the same size as the feature map, which is the output of the segmentation model.
Further, the loss functions include:

a prediction loss for mapping the block-level feature maps to block-level CAMs under supervision of the block-level classification labels;

a semantic segmentation loss for the target segmentation area;

and a contrastive loss between the inter-block long-range dependencies and the intra-block long-range dependencies.
A second aspect of the present invention provides a smart construction site image segmentation system based on an inter-block contrastive attention mechanism.

The smart construction site image segmentation system based on an inter-block contrastive attention mechanism includes:

a prediction module configured to predict the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;

a segmentation model training module configured to: obtain construction site scene image training samples annotated with segmentation labels; extract feature maps of the training samples; one-hot encode the segmentation labels to obtain several label vectors; partition the feature map into blocks and apply max pooling to obtain block-level classification labels according to the label vectors; divide the feature map into several block-level feature maps; map the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level CAMs; build a block-level correlation matrix from the block-level feature maps and the block-level CAMs; map the block-level correlation matrix to a global correlation matrix; compute the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtain the segmentation model's output from the output feature map; and, based on the output and the segmentation labels, optimize the model's hyperparameters with a loss function to obtain the trained segmentation model.
A third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism described in the first aspect above.

A fourth aspect of the present invention provides a computer device.

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism described in the first aspect above.
Compared with the prior art, the beneficial effects of the present invention are as follows.

For construction site monitoring images transmitted by surveillance cameras and tower cranes, the present invention extracts features through a neural network and performs pixel-level classification on the input feature maps to obtain the segmentation region of each object in the input image, which facilitates further detection of violations and safety hazards at the construction site.

The present invention introduces contrastive learning into the supervised semantic segmentation task: contrastive learning pulls pixels with the same label closer together in the feature space and pushes pixels with different labels relatively far apart, further enhancing the representational power of the features. Since both attention mechanisms and contrastive learning perform well in semantic segmentation tasks, the present invention combines them: the contrastive loss forces the channel correlation matrix between the feature map and the class activation map (CAM) to have higher confidence, thereby obtaining robust and accurate target segmentation of construction site scene images and improving segmentation accuracy.
Brief Description of the Drawings

The accompanying drawings, which constitute a part of the present invention, are provided for further understanding of the invention; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it.

Fig. 1 is a flowchart of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism according to the present invention;

Fig. 2 is a framework diagram of the smart construction site image segmentation method based on an inter-block contrastive attention mechanism according to the present invention.
Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly dictates otherwise, singular forms are intended to include plural forms; it should also be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of methods and systems according to various embodiments of the present disclosure. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which may include one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. Each block of the flowcharts and/or block diagrams, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Embodiment 1

As shown in Fig. 1 and Fig. 2, this embodiment provides a smart construction site image segmentation method based on an inter-block contrastive attention mechanism. The method is illustrated here as applied to a server; it can also be applied to a terminal, or to a system comprising a terminal and a server, implemented through their interaction. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, laptop, desktop computer, smart speaker, or smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application. In this embodiment, the method includes the following steps:

predicting the target segmentation area of a construction site scene image to be segmented by using a trained segmentation model;

obtaining construction site scene image training samples annotated with segmentation labels; extracting feature maps of the training samples; one-hot encoding the segmentation labels to obtain several label vectors; partitioning the feature map into blocks and applying max pooling to obtain block-level classification labels according to the label vectors; dividing the feature map into several block-level feature maps; mapping the block-level feature maps, under supervision of the block-level classification labels, to obtain several block-level CAMs; building a block-level correlation matrix from the block-level feature maps and the block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; computing the positive- and negative-sample similarities of the block-level correlation matrix and of the global correlation matrix to obtain an output feature map; obtaining the segmentation model's output from the output feature map; and, based on the output and the segmentation labels, optimizing the model's hyperparameters with a loss function to obtain the trained segmentation model.
The specific scheme of this embodiment is described in detail below.

1. Feature extraction

For an input image, a backbone network (e.g., VGG or ResNet-101) first maps the input image to a feature map F of size H × W × C, where H and W are the height and width of the feature map and C is the number of channels.
2. Constructing block-level feature maps

First, the segmentation labels are one-hot encoded to obtain K label vectors, where K is the number of categories in the dataset. The feature map F is partitioned into Np × Np blocks (for example, Np = 4, i.e., the feature map is divided into 16 block-level feature maps of height h and width w), and a block-level classification label is obtained for each block by max pooling the pixel-level label vectors over that block. Under the supervision of the block-level classification labels, a convolution maps each block-level feature map to a block-level CAM. Each block-level feature map has dimensions h × w × C, and each block-level CAM has dimensions K × h × w.
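The block construction step above can be sketched in numpy. The tensor sizes, the random inputs, and the realization of the CAM convolution as a per-pixel C-to-K linear map (`W_cam`) are illustrative assumptions, not the patent's actual configuration:

```python
import numpy as np

np.random.seed(0)

# Hypothetical sizes: feature map F is H x W x C, K classes, Np x Np blocks.
H, W, C, K, Np = 8, 8, 16, 3, 4
h, w = H // Np, W // Np

F = np.random.rand(H, W, C)              # backbone feature map
onehot = np.random.rand(K, H, W) > 0.5   # per-pixel one-hot segmentation labels

blocks, block_labels = [], []
for i in range(Np):
    for j in range(Np):
        fb = F[i*h:(i+1)*h, j*w:(j+1)*w, :]       # h x w x C block-level feature map
        lb = onehot[:, i*h:(i+1)*h, j*w:(j+1)*w]  # labels restricted to the block
        blocks.append(fb)
        # max pooling over the block: class k is "present" if any pixel carries it
        block_labels.append(lb.reshape(K, -1).max(axis=1))

# a pointwise convolution is a C -> K linear map applied per pixel (assumed form)
W_cam = np.random.rand(C, K)
cams = [np.einsum('hwc,ck->khw', fb, W_cam) for fb in blocks]  # K x h x w per block
```

Each entry of `block_labels` is the block-level classification label that supervises the corresponding block-level CAM in `cams`.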
3. Mining block-level long-range dependencies between classes

Each block-level feature map of size h × w × C is reshaped to hw × C, and each block-level CAM of size K × h × w is reshaped to K × hw. Multiplying the reshaped CAM by the reshaped feature map yields, for each block, a K × C matrix that relates the K categories to the C channels: a block-level correlation matrix T whose entry T(i, j) reflects the correlation between the i-th category and the j-th channel. In this way, long-range dependencies between channels and categories are established within each block. By constructing the correlation matrix T to mine the long-range dependencies between in-block channels and dataset categories, the method obtains long-range dependency information while still meeting the pixel-level segmentation task's need for fine-grained short-range dependency information.
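A minimal numpy sketch of this reshape-and-multiply step for a single block, with sizes chosen arbitrarily for illustration:

```python
import numpy as np

np.random.seed(0)
h, w, C, K = 2, 2, 16, 3

fb = np.random.rand(h, w, C)   # one block-level feature map
cam = np.random.rand(K, h, w)  # the corresponding block-level CAM

# reshape and multiply: (K x hw) @ (hw x C) -> K x C correlation matrix T
T = cam.reshape(K, h * w) @ fb.reshape(h * w, C)

# T[i, j] reflects how strongly channel j responds to category i
assert T.shape == (K, C)
```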
4. Automatic acquisition of optimal samples
A learnable weight matrix A of dimension K × C is maintained for selecting positive samples for contrastive learning; it contains K normalized weight vectors. Specifically, for a given class k, this embodiment treats the channels with high response values in the correlation matrix T as positive samples and the channels with low response values as negative samples, so that positive samples adaptively receive larger weight coefficients. Meanwhile, to take the information of all samples within a block into account, the coefficient matrix 1 − A serves as the adaptive weights for selecting negative samples.
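A minimal reading of this adaptive weighting, for one class k: the learnable row A_k weights the positive (high-response) channels and 1 − A_k the negative ones. The normalization and the similarity measure are not specified in the text, so this is only a sketch:

```python
import numpy as np

def weighted_similarities(T_row, A_row):
    """For one class: channels with high response in the correlation row
    T_row act as positives, low-response channels as negatives.
    A_row is the learnable weight vector for this class; positives are
    weighted by A_row, negatives by 1 - A_row (a hypothetical reading,
    not the patented formula)."""
    A_row = A_row / A_row.sum()                  # keep the weights normalized
    pos = float((A_row * T_row).sum())           # positives emphasized by A
    neg = float(((1.0 - A_row) * T_row).sum())   # negatives emphasized by 1-A
    return pos, neg
```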
5. Construction of the inter-block contrastive attention mechanism
Introducing block-level contrastive learning forces the correlation matrix to have stronger representational power: the more relevant a channel is to a class, the larger its response value in the correlation matrix becomes. A fully connected layer then maps the Np block-level correlation matrices to a global correlation matrix, where the layer is a linear layer with input dimension Np and output dimension 1, capturing long-range dependencies between blocks. Using the automatic optimal-sample strategy, for each class, channels with high response values in both the block-level and the global correlation matrices are taken as positive samples, and channels with low response values as negative samples. The weight matrix A assigns higher weights to the positive-sample similarities, whose weighted sum gives the positive similarity, while 1 − A assigns higher weights to the negative-sample similarities, whose weighted sum gives the negative similarity. A contrastive loss is then computed; under this loss the distance between positive samples shrinks and the distance between negative samples grows, i.e., for a given class, the channels related to that class become more similar while unrelated channels are pushed apart.
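The aggregation into the global correlation matrix is a linear layer with input dimension Np and output dimension 1, i.e., a learned weighted sum over the block axis. The exact contrastive loss form is not given in the text, so an InfoNCE-style stand-in is used below:

```python
import math
import numpy as np

def global_correlation(T, weights, bias=0.0):
    """Map Np block-level K x C correlation matrices to one global
    K x C matrix via a linear layer over the block dimension.
    T: (Np, K, C); weights: (Np,) learnable coefficients."""
    return np.tensordot(weights, T, axes=(0, 0)) + bias   # (K, C)

def contrastive_loss(pos_sim, neg_sim, tau=0.1):
    """InfoNCE-style loss on one weighted positive / negative similarity
    pair: minimizing it pulls positives together and pushes negatives
    apart (a stand-in, not the patented formula)."""
    return -math.log(math.exp(pos_sim / tau) /
                     (math.exp(pos_sim / tau) + math.exp(neg_sim / tau)))
```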
Since the positive and negative samples that enter the contrastive loss come from every block, the operations above extend the block-level semantic information to the global level. Furthermore, an attention mechanism is introduced on top of the block-level feature maps and block-level CAMs. Traditional self-attention typically follows the Query-Key-Value (QKV) model: the input feature map is linearly transformed by Wq, Wk, and Wv into three feature maps Q, K, and V; Q and K produce a correlation matrix, e.g., via scaled dot product, which is matrix-multiplied with V to obtain the output feature map. The present invention instead builds the attention mechanism from the input feature map and the CAM. The process of obtaining the output feature map is: the block-level CAM, after a linear transformation, serves as the V of the attention mechanism, and the attention computation over the block-level feature maps and block-level CAMs constructs the output feature map.
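One way to wire such an attention block for a single block, with the linearly transformed CAM as V, is sketched below. The patent only fixes the role of V, so the wiring of the query and key branches (here the class-channel correlation scores) is an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_cam_attention(feat, cam, Wv):
    """Single-block attention where V is the linearly transformed CAM.
    feat: (hw, C) flattened block feature map
    cam:  (K, hw) flattened block CAM
    Wv:   (hw, C) value projection applied to the CAM
    Each pixel attends over the K classes and mixes CAM-derived values."""
    S = cam @ feat                                   # (K, C) class-channel scores
    V = cam @ Wv                                     # (K, C) values from the CAM
    scores = (feat @ S.T) / np.sqrt(feat.shape[1])   # (hw, K) pixel-class scores
    attn = softmax(scores, axis=-1)                  # attention over classes
    return feat + attn @ V                           # residual output, (hw, C)
```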
That is, the channel-class correlation matrices are obtained first. Second, after the long-range inter-class dependencies within each block are captured, the block-level correlation matrices are aggregated into a global correlation matrix that captures global inter-class long-range dependencies. Finally, after positive and negative samples are drawn, the contrastive loss is computed and back-propagated through the network, forcing the model to learn a more structured channel-class correlation matrix through contrastive learning. Intra-block and inter-block long-range dependencies are thereby established simultaneously, satisfying the semantic segmentation task's need for both global semantic information and fine-grained information.
6. Obtaining the semantic segmentation mask
The inter-block contrastive attention operation yields a feature map, which is reshaped so that it has the same dimensions as the input feature map. An upsampling operation then produces a semantic segmentation mask of the same size as the original image, giving an accurate and robust segmentation result, and the segmentation loss L_seg is computed against the segmentation label.
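The final upsampling step is only named in the text; a nearest-neighbour sketch (the interpolation choice is an assumption) is:

```python
import numpy as np

def upsample_nearest(mask_logits, scale):
    """Nearest-neighbour upsampling of a (K, H, W) logit map back to the
    original image resolution; the per-pixel argmax of the result gives
    the semantic segmentation mask."""
    return mask_logits.repeat(scale, axis=1).repeat(scale, axis=2)
```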
7. Computing the loss and back-propagating gradients
The CAM classification prediction loss L_cls guided by the class-level labels, the semantic segmentation loss L_seg, and the contrastive loss L_con of the inter-block contrastive attention are combined into the final loss L, which is back-propagated. The present invention defines it as follows:

L = α·L_cls + β·L_seg + γ·L_con
where α, β, and γ are the weight coefficients of the respective losses; through extensive experiments, the present invention sets them to 1, 0.4, and 1, respectively.
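With the coefficients reported above, the total loss is a plain weighted sum. Mapping 1, 0.4, and 1 to the classification, segmentation, and contrastive terms follows the order in which the text lists the losses, which is an inference:

```python
def total_loss(l_cls, l_seg, l_con, alpha=1.0, beta=0.4, gamma=1.0):
    """Final training loss: weighted sum of the CAM classification loss,
    the segmentation loss, and the inter-block contrastive loss."""
    return alpha * l_cls + beta * l_seg + gamma * l_con
```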
8. Model testing and application
The test set is fed into the trained segmentation model, the predicted segmentation results are output, and model performance is evaluated by mean intersection over union (mIoU).
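The mIoU metric averages per-class intersection over union; a standard sketch (skipping classes absent from both prediction and ground truth is an assumed convention) is:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes; classes absent from
    both the prediction and the ground truth are skipped."""
    ious = []
    for k in range(num_classes):
        p, g = pred == k, gt == k
        union = (p | g).sum()
        if union == 0:
            continue                      # class absent everywhere
        ious.append((p & g).sum() / union)
    return float(np.mean(ious))
```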
The tested model can be used for construction-site image segmentation tasks: images from site surveillance cameras and tower-crane cameras are passed through the segmentation model to obtain the segmented region of each target in the image.
In terms of local information, this embodiment uses the block-wise attention mechanism to establish correlations between the channels and classes of the original feature map; compared with traditional attention, intra-block attention better focuses the network on mining finer-grained information. At the level of global information, this embodiment proposes inter-block contrastive learning: first, in each channel-class correlation matrix, channels with higher response values serve as positive samples of the corresponding class and channels with lower response values as negative samples. More specifically, the intra-block positive and negative samples of each class are extended to global positive and negative samples, prompting the model to acquire fine-grained local semantic information while mining more discriminative global semantic information. Notably, for selecting positive and negative samples this embodiment maintains a learnable weight matrix, which makes the selected samples fit better while ensuring that no positive- or negative-sample information is lost.
The present invention partitions the feature map and the CAM into blocks and computes their channel correlations along the block dimension, mining the dependence of image classes on channels. To give the embedding space of the channel correlation matrices of the generated feature maps and CAMs stronger representational power, the present invention proposes a block-level contrastive attention mechanism, which not only models the long-range dependencies of the image but also establishes short-range dependencies between channels and classes based on block-level features, simultaneously meeting the semantic segmentation task's needs for coarse-grained and fine-grained information. Contrastive learning emphasizes the association between channels and classes, and fusing each class's block-level positive and negative samples into global positive and negative samples establishes associations between blocks, giving the segmentation model better representational power and robust segmentation performance.
Embodiment 2
This embodiment provides a smart construction site image segmentation system based on an inter-block contrastive attention mechanism.
The smart construction site image segmentation system based on the inter-block contrastive attention mechanism includes:
a prediction module configured to: based on a construction-site scene image to be segmented, predict the target segmentation regions of the image using a trained segmentation model; and
a segmentation model training module configured to: obtain training samples of construction-site scene images annotated with segmentation labels; extract feature maps from the training samples; one-hot encode the segmentation labels to obtain several label vectors; apply block partitioning and max pooling to the feature maps and, based on the label vectors, obtain block-level classification labels; divide the feature map into several block-level feature maps; under the supervision of the block-level classification labels, map the block-level feature maps to obtain several block-level CAMs; establish block-level correlation matrices from the block-level feature maps and block-level CAMs; map the block-level correlation matrices to a global correlation matrix; compute the positive- and negative-sample similarities of the block-level correlation matrices and of the global correlation matrix to obtain an output feature map; obtain the segmentation model's output from the output feature map; and, using a loss function on the model output and the segmentation labels, optimize the hyperparameters of the segmentation model to obtain the trained segmentation model.
It should be noted here that the examples and application scenarios implemented by the above prediction module and segmentation model training module are the same as those of the steps in Embodiment 1, but are not limited to the content disclosed in Embodiment 1. It should also be noted that, as parts of the system, the above modules can be executed in a computer system such as a set of computer-executable instructions.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the smart construction site image segmentation method based on the inter-block contrastive attention mechanism described in Embodiment 1.
Embodiment 4
This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the steps of the smart construction site image segmentation method based on the inter-block contrastive attention mechanism described in Embodiment 1.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may be subject to various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
Priority Applications (1)
- CN202310935833.6A (granted as CN116664845B), priority date 2023-07-28, filing date 2023-07-28: Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
Publications (2)
- CN116664845A, published 2023-08-29
- CN116664845B, granted and published 2023-10-13
Cited By (1)
- CN118230261A (priority 2024-05-27, published 2024-06-21), 四川省建筑科学研究院有限公司: Smart construction site construction safety early warning method and system based on image data
Citations (9)
- CN112801104A (priority 2021-01-20, published 2021-05-14): Image pixel level pseudo label determination method and system based on semantic segmentation
- CN113657393A (2021-08-16, 2021-11-16): A semi-supervised image segmentation method and system with missing shape priors
- CN114283162A (2021-12-27, 2022-04-05): Real-world image segmentation method based on contrastive self-supervised learning
- CN114359873A (2022-01-06, 2022-04-15): Weak supervision vehicle feasible region segmentation method integrating road space prior and region level characteristics
- CN115019039A (2022-05-26, 2022-09-06): An instance segmentation method and system combining self-supervision and global information enhancement
- CN115953784A (2022-12-27, 2023-04-11): Laser coded character segmentation method based on residual and feature block attention
- WO2023056889A1 (2021-10-09, 2023-04-13): Model training and scene recognition method and apparatus, device, and medium
- CN116229465A (2023-02-27, 2023-06-06): A weakly supervised semantic segmentation method for ships
- WO2023102223A1 (2021-12-03, 2023-06-08): Cross-coupled multi-task learning for depth mapping and semantic segmentation
Non-Patent Citations (3)
- Zhang, Pingping et al.: "Deep gated attention networks for large-scale street-level scene segmentation", Pattern Recognition, pp. 702-714
- Peng Qiwei, Feng Jie, Lyu Jin, Yu Lei, Cheng Ding: "Research on semantic segmentation method based on global attention mechanism", Modern Information Technology, no. 4, pp. 110-112
- Li Bin'ai, Li Ying, Hao Mingyang, Gu Shuyu: "A survey of weakly supervised semantic segmentation methods", Digital Communication World, no. 7, pp. 263-265
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant