CN114897916A

CN114897916A - Image processing method and device, nonvolatile readable storage medium and electronic equipment

Info

Publication number: CN114897916A
Application number: CN202210493747.XA
Authority: CN
Inventors: 杨勇杰; 林崇仰; 王进
Original assignee: Rainbow Software Co ltd
Current assignee: Rainbow Software Co ltd
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2022-08-12
Also published as: KR20250007637A; WO2023217046A1

Abstract

The application discloses an image processing method and device, a nonvolatile readable storage medium and electronic equipment. Wherein, the method comprises the following steps: acquiring first image data of a target object and second image data of a target scene; segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data; and fusing the first target area and the second image data to generate a fused image. The method and the device solve the technical problem that the background environment cannot be considered due to the lack of the respective characteristics of the front camera and the rear camera.

Description

Image processing method and apparatus, non-volatile readable storage medium, electronic device

技术领域technical field

本申请涉及图像处理技术领域，具体而言，涉及一种图像处理方法及装置、非易失性可读存储介质、电子设备。The present application relates to the technical field of image processing, and in particular, to an image processing method and apparatus, a non-volatile readable storage medium, and an electronic device.

背景技术Background technique

相机是智能终端中最重要的传感器之一，使用前置摄像头自拍，以及后置摄像头拍摄他人和风景，无论是存储还是分享，都已经成为当代许多人日常生活中最重要的组成部分。目前绝大多数的智能终端中相机及应用，都只基于前置相机或后置相机之一，来获取图像或视频数据及处理，缺乏有效利用前置相机和后置相机各自的特性，做出有趣的人像图像或视频算法特效。并且在自拍的时候，拍摄者的注意力会关注在自身上，而无法兼顾背景环境。The camera is one of the most important sensors in a smart terminal. Using the front camera to take selfies and the rear camera to capture other people and landscapes, whether for storage or sharing, has become the most important part of the daily life of many people today. At present, the vast majority of cameras and applications in smart terminals are only based on one of the front camera or the rear camera to obtain image or video data and process them. Interesting portrait image or video algorithm special effects. And when taking a selfie, the photographer's attention will focus on himself, and cannot take into account the background environment.

针对上述的问题，目前尚未提出有效的解决方案。For the above problems, no effective solution has been proposed yet.

发明内容SUMMARY OF THE INVENTION

本申请实施例提供了一种图像处理方法及装置、非易失性可读存储介质、电子设备，以至少解决由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境的技术问题。Embodiments of the present application provide an image processing method and device, a non-volatile readable storage medium, and an electronic device, to at least solve the problem of inability to take into account the background environment due to the lack of effective use of the respective characteristics of the front camera and the rear camera. technical problem.

根据本申请实施例的一个方面，提供了一种图像处理方法，包括：获取目标对象的第一图像数据以及目标场景的第二图像数据；根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域；将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像。According to an aspect of the embodiments of the present application, an image processing method is provided, including: acquiring first image data of a target object and second image data of a target scene; The image data is segmented to obtain a first target area corresponding to the target object in the first image data; the first target area and the second image data are fused to generate a fused image.

可选地，根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域，包括：当所述第一图像数据是视频，利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理，以得到所述目标对象对应的第一目标区域；当所述第一图像数据是图片，利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理，得到所述目标对象对应的第一目标区域。Optionally, performing segmentation processing on the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data, including: when the first image data is An image data is a video, and the first image data is segmented by using a pre-trained first neural network model to obtain a first target area corresponding to the target object; when the first image data is a picture, use The predetermined second neural network model performs segmentation processing on the first image data to obtain a first target area corresponding to the target object.

可选地，对所述第一图像数据进行类别判断之前，所述方法还包括：对所述第一图像数据进行内容识别和判断；当输入的图像数据被判定为属于分割处理所支持的场景时，所述第一图像数据进行类别判断；当输入的图像数据被判定为不属于分割处理所支持的场景时，所述第一图像数据结束处理。Optionally, before the category judgment is performed on the first image data, the method further includes: performing content recognition and judgment on the first image data; when the input image data is judged to belong to a scene supported by the segmentation process When it is determined that the first image data does not belong to the scene supported by the segmentation process, the first image data is determined to be a category; when the input image data is determined not to belong to a scene supported by the segmentation process, the first image data ends the process.

可选地，所述第一神经网络模型为采用骨干网络和解码网络跨层连接的方式作为网络结构的轻量级模型。Optionally, the first neural network model is a lightweight model in which a backbone network and a decoding network are connected across layers as a network structure.

可选地，所述预先训练的第一神经网络模型的训练方法包括：获取训练数据集，其中，所述训练数据集包括：第一样本图像数据和第一样本目标区域，其中，所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图；基于所述训练数据集训练神经网络模型，生成所述第一神经网络模型，其中，在训练所述第一神经网络模型的过程中，基于帧间信息对所述第一神经网络模型进行一致性约束。Optionally, the training method of the pre-trained first neural network model includes: acquiring a training data set, wherein the training data set includes: first sample image data and a first sample target area, wherein the The first sample target area is to obtain a target area mask map based on the first sample image data; the neural network model is trained based on the training data set, and the first neural network model is generated, wherein, in the training of the During the process of the first neural network model, consistency constraints are imposed on the first neural network model based on the inter-frame information.

可选地，在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：将所述第一目标区域转化为灰度掩码图，并将所述灰度掩码图的边界进行平滑。Optionally, after using the pre-trained first neural network model to obtain the first target area corresponding to the target object, the method further includes: converting the first target area into a grayscale mask, and converting the first target area into a grayscale mask. The boundaries of the grayscale mask are smoothed.

可选地，在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：获取所述第一图像数据的前帧分割结果，并利用所述前帧分割结果对所述目标对应的所述第一目标区域进行时域平滑滤波。Optionally, after obtaining the first target area corresponding to the target object by using the pre-trained first neural network model, the method further includes: obtaining a previous frame segmentation result of the first image data, and using the The first target area corresponding to the target is subjected to temporal smoothing filtering on the result of the previous frame segmentation.

可选地，所述第二神经网络模型为加入空洞卷积采用和注意力机制的卷积神经网络模型。Optionally, the second neural network model is a convolutional neural network model with atrous convolution adoption and attention mechanism added.

可选地，在利用预先训练的第二神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：结合所述第一图像数据，并利用预先训练的第三神经网络对所述第一目标区域进行优化处理，得到处理后的第一目标区域。Optionally, after obtaining the first target area corresponding to the target object by using a pre-trained second neural network model, the method further includes: combining the first image data and using a pre-trained third neural network Optimizing the first target area is performed to obtain a processed first target area.

可选地，所述预先训练的第三神经网络模型的训练方法包括：获取纯色背景的图像；对所述纯色背景的图像进行抠图处理和预标注，得到标签蒙版图像；以所述纯色背景的图像和所述标签蒙版图像作为样本数据训练得到所述第三神经网络模型。Optionally, the training method of the pre-trained third neural network model includes: acquiring an image of a solid-color background; performing matting processing and pre-labeling on the image of the solid-color background to obtain a label mask image; The third neural network model is obtained by training the background image and the label mask image as sample data.

可选地，将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像之前，所述方法还包括：将所述第一目标区域输入后处理模块进行后处理。Optionally, before performing fusion processing on the first target area and the second image data to generate a fused image, the method further includes: inputting the first target area into a post-processing module for post-processing.

可选地，将所述第一目标区域与所述第二图像进行融合处理，生成融合后的图像，包括：评估所述第二图像数据的环境信息，根据所述环境信息校正所述第一目标区域，获得校正后的第一目标区域；确定所述第一目标区域在所述第二图像数据中对应的第二目标区域；将所述第二目标区域替换为所述校正后的第一目标区域。Optionally, performing fusion processing on the first target area and the second image to generate a fused image includes: evaluating the environmental information of the second image data, and correcting the first target area according to the environmental information. target area, obtain the corrected first target area; determine the second target area corresponding to the first target area in the second image data; replace the second target area with the corrected first target area target area.

根据本申请实施例的另一方面，还提供了一种图像处理装置，包括：获取模块，用于获取目标对象的第一图像数据以及目标场景的第二图像数据，其中，采集所述第一图像数据和所述第二图像数据的采集装置位于同一个终端设备上；分割模块，用于对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域；融合模块，用于将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像。According to another aspect of the embodiments of the present application, an image processing apparatus is further provided, including: an acquisition module configured to acquire first image data of a target object and second image data of a target scene, wherein the first image data is acquired The device for collecting image data and the second image data is located on the same terminal device; the segmentation module is used to perform segmentation processing on the first image data to obtain the first image data corresponding to the target object. a first target area; a fusion module, configured to perform fusion processing on the first target area and the second image data to generate a fused image.

根据本申请实施例的再一方面，还提供了一种非易失性存储介质，非易失性存储介质包括存储的程序，其中，在程序运行时控制非易失性存储介质所在设备执行上述图像的处理方法。According to yet another aspect of the embodiments of the present application, a non-volatile storage medium is also provided, the non-volatile storage medium includes a stored program, wherein when the program runs, the device where the non-volatile storage medium is located is controlled to execute the above image processing method.

根据本申请实施例的再一方面，还提供了一种电子设备，包括存储器和处理器；处理器用于运行程序，其中，程序运行时执行上述图像的处理方法。According to another aspect of the embodiments of the present application, an electronic device is also provided, including a memory and a processor; the processor is used for running a program, wherein the above-mentioned image processing method is executed when the program is running.

在本申请实施例中，采用获取目标对象的第一图像数据以及目标场景的第二图像数据；根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域；将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像的方式，通过同一终端设备上的采集装置分别采集第一图像和第二图像并将第一图像、第二图像进行融合，达到了充分利用前置与后置摄像头的目的，从而实现了兼顾背景环境的技术效果，进而解决了由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境技术问题。In the embodiment of the present application, the first image data of the target object and the second image data of the target scene are acquired; the first image data is segmented according to the type of the first image data, and the first image data is obtained. The first target area corresponding to the target object in the image data; the method of merging the first target area and the second image data to generate a fused image is performed by the acquisition device on the same terminal device. Collect the first image and the second image and fuse the first image and the second image to achieve the purpose of making full use of the front and rear cameras, thereby realizing the technical effect of taking into account the background environment, and solving the problem of lack of effective utilization. Due to the respective characteristics of the front camera and the rear camera, it is impossible to take into account the technical problems of the background environment.

附图说明Description of drawings

此处所说明的附图用来提供对本申请的进一步理解，构成本申请的一部分，本申请的示意性实施例及其说明用于解释本申请，并不构成对本申请的不当限定。在附图中：The drawings described herein are used to provide further understanding of the present application and constitute a part of the present application. The schematic embodiments and descriptions of the present application are used to explain the present application and do not constitute an improper limitation of the present application. In the attached image:

图1是根据本申请实施例的一种可选的图像处理方法的示意图；1 is a schematic diagram of an optional image processing method according to an embodiment of the present application;

图2是根据本申请实施例的另一种可选的图像处理方法的示意图；2 is a schematic diagram of another optional image processing method according to an embodiment of the present application;

图3是根据本申请实施例的一种可选的图像分割处理流程示意图；3 is a schematic diagram of an optional image segmentation processing flow according to an embodiment of the present application;

图4是根据本申请实施例的一种可选的图像处理装置示意图。FIG. 4 is a schematic diagram of an optional image processing apparatus according to an embodiment of the present application.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本申请方案，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分的实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都应当属于本申请保护的范围。In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only The embodiments are part of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present application.

需要说明的是，本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，例如，包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first", "second", etc. in the description and claims of the present application and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having" and any variations thereof, are intended to cover non-exclusive inclusion, for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those expressly listed Rather, those steps or units may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

根据本申请实施例，提供了一种图像的处理的方法实施例，需要说明的是，在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行，并且，虽然在流程图中示出了逻辑顺序，但是在某些情况下，可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present application, an embodiment of a method for processing an image is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and, Although a logical order is shown in the flowcharts, in some cases steps shown or described may be performed in an order different from that herein.

图1是根据本申请实施例的一种图像的处理方法，如图1所示，该方法包括如下步骤：FIG. 1 is an image processing method according to an embodiment of the present application. As shown in FIG. 1 , the method includes the following steps:

步骤S102，获取目标对象的第一图像数据以及目标场景的第二图像数据；Step S102, obtaining the first image data of the target object and the second image data of the target scene;

步骤S104，根据第一图像数据的类型对第一图像数据进行分割处理，得到第一图像数据中与目标对象对应的第一目标区域；Step S104, performing segmentation processing on the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data;

步骤S106，将第一目标区域与第二图像数据进行融合处理，生成融合后的图像。Step S106, performing fusion processing on the first target area and the second image data to generate a fused image.

通过上述步骤，可以实现位于同一终端设备上的采集装置分别采集第一图像数据和第二图像数据，并将第一图像数据包含的目标对象与第二图像数据进行融合，达到了充分利用前置与后置摄像头优势的目的，从而实现了兼顾背景环境的技术效果，进而解决了由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境技术问题。Through the above steps, the acquisition device located on the same terminal device can collect the first image data and the second image data respectively, and fuse the target object contained in the first image data with the second image data, so as to make full use of the front-end image data. In this way, the technical effect of taking into account the background environment is achieved, thereby solving the technical problem that the background environment cannot be taken into account due to the lack of effective use of the respective characteristics of the front camera and the rear camera.

需要进行说明的是，上述图像数据的采集装置是同一终端(例如，手机和笔记本)的前置摄像头和后置摄像头，采集的图像数据中包含目标对象的即为第一图像数据，故第一图像数据采集装置并非固定为前置摄像头，亦可以为后置摄像头，同理，第二图像数据的采集装置为前置摄像头或后置摄像头；第一图像数据和第二图像数据可以是图片也可以是视频；目标对象可以是人像也可以是其他对象，例如动物和物品；目标场景可以为目标对象所处的场景或任意虚拟场景。It should be noted that the above-mentioned image data collection devices are the front camera and the rear camera of the same terminal (for example, a mobile phone and a notebook), and the collected image data that includes the target object is the first image data, so the first image data The image data acquisition device is not fixed to a front camera, but can also be a rear camera. Similarly, the second image data acquisition device is a front camera or a rear camera; the first image data and the second image data can be pictures or It can be a video; the target object can be a portrait or other objects, such as animals and objects; the target scene can be the scene where the target object is located or any virtual scene.

具体的，以利用手机的前置摄像头拍摄人像图像、后置摄像头拍摄背景视频为例，图像处理的步骤可以如图2所示，采集的人像图像和背景视频分别经过不同的处理流程后进行融合得到融合后的视频，一方面，实现了在同一时刻同时采集了前景和背景信息，可以更多地保留和还原了拍摄地点的场景信息；另一方面，充分利用了手机前后置摄像头各自的优势，既能够方便观察自拍人像的成像情况，也方便观察整个背景场景的信息以便获取更好的取景。Specifically, taking the use of the front camera of the mobile phone to shoot the portrait image and the rear camera to shoot the background video as an example, the steps of image processing can be as shown in Figure 2. The collected portrait image and background video are fused after different processing procedures. The merged video is obtained. On the one hand, the foreground and background information can be collected at the same time, and more scene information of the shooting location can be preserved and restored; on the other hand, the advantages of the front and rear cameras of the mobile phone are fully utilized. , which not only facilitates observation of the imaging situation of the self-portrait portrait, but also facilitates observation of the information of the entire background scene for better framing.

在本申请的一些实施例中，根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域，包括：当所述第一图像数据是视频，利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理，以得到所述目标对象对应的第一目标区域；当所述第一图像数据是图片，利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理，得到所述目标对象对应的第一目标区域。In some embodiments of the present application, the first image data is segmented according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data, including : when the first image data is a video, use the pre-trained first neural network model to segment the first image data to obtain the first target area corresponding to the target object; when the first image The data is a picture, and the first image data is segmented by using a predetermined second neural network model to obtain a first target area corresponding to the target object.

需要进行说明的是，由于本申请不限制终端采集数据的类型，故采集的第一图像数据和第二图像数据可以是图片也可以是视频，在实际分割中针对不同类型的数据在处理上目标和方法不同，故在对图像数据进行分割处理之前，对上述第一图像数据进行类别判断。进一步的，针对视频数据的分割处理，不仅要保证精度还要同时维持实时处理，示例性的，第一神经网络模型可以为轻量级神经网络模型，用于对视频图像进行分割处理；针对图片数据的分割处理，由于图像对细节提出更高要求，示例性的，第二神经网络模型可以为卷积神经网络模型。It should be noted that, since this application does not limit the types of data collected by the terminal, the collected first image data and second image data may be pictures or videos. In actual segmentation, the processing targets for different types of data are Different from the method, before the segmentation processing of the image data, the category judgment is performed on the above-mentioned first image data. Further, for the segmentation processing of video data, it is necessary to not only ensure the accuracy but also maintain real-time processing. Exemplarily, the first neural network model may be a lightweight neural network model, which is used for segmentation processing of video images; For data segmentation processing, since images have higher requirements on details, exemplary, the second neural network model can be a convolutional neural network model.

在本申请的一些实施例中，对所述第一图像数据进行类别判断之前，所述方法还包括：对所述第一图像数据进行内容识别和判断；当输入的图像数据被判定为属于分割处理所支持的场景时，所述第一图像数据进行类别判断；当输入的图像数据被判定为不属于分割处理所支持的场景时，所述第一图像数据结束处理。In some embodiments of the present application, before the category judgment is performed on the first image data, the method further includes: performing content recognition and judgment on the first image data; when the input image data is judged to belong to segmentation When processing a supported scene, the first image data performs category judgment; when it is judged that the input image data does not belong to a scene supported by the segmentation process, the first image data ends the processing.

具体的，以移动设备的图像数据作为输入为例，通过计算机视觉的方法，实现对图片中目标区域的提取，处理流程如图3所示：输入图像数据然后对图像进行预处理，对预处理过后的图像进行辨别是否属于分割引擎所支持的场景，若支持则进行处理，若不支持则不进行处理。首先，分割引擎支持的场景要求图像包含目标对象，其中，目标对象的种类是用户预设，并通过检测识别算法获取图像包含的目标对象的种类；其次，目标对象距离采集装置的距离需要满足预设条件，若目标对象与采集装置之间的距离超过预设范围，过近会导致采集的目标对象不全，例如只有部分五官无法形成合照，过远会导致采集的目标对象细节损失严重，严重影响后续融合效果，故无法启动采集装置。Specifically, taking the image data of the mobile device as the input as an example, the extraction of the target area in the picture is realized by the method of computer vision. The processing flow is shown in Figure 3: input image data and then preprocess the image, Afterwards, it is judged whether it belongs to the scene supported by the segmentation engine, and if it is supported, it will be processed, and if it is not supported, it will not be processed. First, the scene supported by the segmentation engine requires the image to contain the target object, where the type of the target object is preset by the user, and the type of the target object contained in the image is obtained through the detection and recognition algorithm; secondly, the distance between the target object and the acquisition device needs to meet the predetermined Setting conditions, if the distance between the target object and the acquisition device exceeds the preset range, too close will result in incomplete acquisition of the target object. For example, only part of the facial features cannot form a group photo. If the distance is too far, the details of the acquired target object will be seriously lost and seriously affected. The subsequent fusion effect, so the acquisition device cannot be activated.

需要进行说明的是，图像数据的预处理为以移动设备图像数据为输入，根据后续分割网络要求将其转换为对应的数据格式，并对图像进行尺寸、色彩、角度等方面的调整，得到可用图像数据；对预处理过后的图像进行辨别是否属于分割引擎所支持的场景是通过基于海量带标签监督数据预先训练的场景判别卷积神经网络构成，能够快速准确的对输入图像的内容进行识别，仅当输入图像被判定为属于分割引擎所支持的场景时，才进入后续处理流程。识别网络是由卷积层、激活层、池化层堆叠组成的典型分类网络，考虑到移动设备端性能要求，网络层数和宽度上都做了严格的限制和优化，保证设备端毫秒级运行。It should be noted that the preprocessing of the image data is to take the image data of the mobile device as the input, convert it into the corresponding data format according to the subsequent segmentation network requirements, and adjust the size, color, angle, etc. of the image to obtain the available data. Image data; identifying whether the preprocessed image belongs to the scene supported by the segmentation engine is composed of a scene discrimination convolutional neural network pre-trained based on massive labeled supervision data, which can quickly and accurately identify the content of the input image, Only when the input image is determined to belong to the scene supported by the segmentation engine, the subsequent processing flow is entered. The recognition network is a typical classification network composed of a stack of convolutional layers, activation layers, and pooling layers. Considering the performance requirements of mobile devices, the number and width of the network are strictly limited and optimized to ensure millisecond-level operation on the device. .

在本申请的一些实施例中，所述第一神经网络模型为采用骨干网络和解码网络跨层连接的方式作为网络结构的轻量级模型。所述预先训练的第一神经网络模型的训练方法包括：获取训练数据集，其中，所述训练数据集包括：第一样本图像数据和第一样本目标区域，其中，所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图；基于所述训练数据集训练神经网络模型，生成所述第一神经网络模型，其中，在训练所述第一神经网络模型的过程中，基于帧间信息对所述第一神经网络模型进行一致性约束。In some embodiments of the present application, the first neural network model is a lightweight model using a cross-layer connection between a backbone network and a decoding network as a network structure. The training method of the pre-trained first neural network model includes: acquiring a training data set, wherein the training data set includes: first sample image data and a first sample target area, wherein the first sample This target area is to obtain a target area mask map based on the first sample image data; train a neural network model based on the training data set to generate the first neural network model, wherein, when training the first neural network In the process of modeling, the first neural network model is subject to consistency constraints based on the inter-frame information.

具体的，对于第一神经网络模型，其训练数据集的建立首先通过搜集大量的第一样本图像数据，其中，第一样本图像数据需同时包含对象和背景的合影，由于训练网络和预测网络是相互独立的，样本图像数据并不限制对象和场景为识别时的目标对象和目标背景，但是样本图像数据需要覆盖实际使用场景类别。此外，本申请并不限制第一样本图像数据的生成方式，可以实际场景采集亦可以为通过后期合成。进一步的，获取第一样本图像数据以后，通过人工或自动识别确定第一样本目标区域，即将通过对第一样本图像数据进行标注得到目标区域掩码图作为监督信息。考虑到视频应用的实时性要求，网络部分采用了轻量级的模型，有卷积层、激活层、池化层、反卷积组成。在网络层数，卷积类型，降采样位置等方面做了针对性优化。网络结构上采用骨干网络和解码网络跨层连接的方式，提高分割精度。同时，网络部署中借助运行硬件平台的指令优化、专用设备优化，达到实时分割的计算性能。训练过程中，为了提高视频时序上结果的稳定性，训练过程中加入了帧间信息一致性约束。Specifically, for the first neural network model, the training data set is first established by collecting a large amount of first sample image data, wherein the first sample image data needs to include a group photo of the object and the background at the same time. The networks are independent of each other, and the sample image data does not restrict the objects and scenes to be the target object and target background for recognition, but the sample image data needs to cover the actual use scene category. In addition, the present application does not limit the method of generating the first sample image data, which can be collected in actual scenes or synthesized in a later stage. Further, after acquiring the first sample image data, the target area of the first sample is determined by manual or automatic identification, that is, the target area mask map is obtained by annotating the first sample image data as the supervision information. Considering the real-time requirements of video applications, the network part adopts a lightweight model, which consists of a convolution layer, an activation layer, a pooling layer, and a deconvolution layer. Targeted optimization has been made in terms of the number of network layers, convolution types, and downsampling positions. The network structure adopts the cross-layer connection of the backbone network and the decoding network to improve the segmentation accuracy. At the same time, in the network deployment, with the help of the instruction optimization of the running hardware platform and the optimization of special equipment, the computing performance of real-time segmentation can be achieved. In the training process, in order to improve the stability of the video timing results, the inter-frame information consistency constraint is added to the training process.

在本申请的一些实施例中，在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：将所述第一目标区域转化为灰度掩码图；并将所述灰度掩码图的边界进行平滑。In some embodiments of the present application, after obtaining the first target area corresponding to the target object by using the pre-trained first neural network model, the method further includes: converting the first target area into a grayscale mask code image; and smooth the boundary of the grayscale mask image.

具体的，视频通过预先训练的第一神经网络模型得到对应的第一目标区域之后，将第一目标区域转换为背景为0值的灰度掩码图。其次，通过图像处理算法去除小的孤立区域，平滑掩码边界，掩码的引入不仅有利于屏蔽目标区域外的噪声，可以充分提取和利用感兴趣的区域，从而优化分割结果。Specifically, after the video obtains the corresponding first target area through the pre-trained first neural network model, the first target area is converted into a grayscale mask image with a background value of 0. Secondly, the image processing algorithm removes small isolated areas and smoothes the mask boundary. The introduction of the mask is not only conducive to shielding the noise outside the target area, but also can fully extract and utilize the area of interest, thereby optimizing the segmentation results.

在本申请的一些实施例中，在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：获取所述第一图像数据的前帧分割结果，并利用所述前帧分割结果对所述目标对应的第一目标区域进行时域平滑滤波。具体的，利用前帧的分割结果进行时域平滑滤波，增加了帧间稳定性，保证了输出视频结果的连续性。In some embodiments of the present application, after obtaining the first target area corresponding to the target object by using the pre-trained first neural network model, the method further includes: obtaining a previous frame segmentation result of the first image data , and use the previous frame segmentation result to perform temporal smoothing filtering on the first target area corresponding to the target. Specifically, the temporal smoothing filtering is performed by using the segmentation result of the previous frame, which increases the inter-frame stability and ensures the continuity of the output video result.

在本申请的一些实施例中，所述第二神经网络模型为加入了空洞卷积和注意力机制的卷积神经网络模型。In some embodiments of the present application, the second neural network model is a convolutional neural network model with atrous convolution and an attention mechanism added.

具体的，对于第二神经网络模型，该网络模型由参数量更多、结构更加复杂的深度卷积网络构成。考虑到拍照模式下对细节提出了更高的要求，在训练数据人工标注时就采用更加精细的标准，提高了监督数据的质量。同时拍照模式下对算力性能的要求降低，因此在网络部分适当的增加了空洞卷积、注意力机制等结构来提高网络解析能力。网络的深度、宽度、特征图大小上也做了适当放宽，从而达到更加精准的分割需求。部署阶段同样借助运行平台上的指令优化、专用设备优化，提升模型运行速度。Specifically, for the second neural network model, the network model is composed of a deep convolutional network with more parameters and a more complex structure. Taking into account the higher requirements for details in the photo mode, more refined standards are adopted when the training data is manually annotated, which improves the quality of the supervision data. At the same time, the requirements for computing power performance in the photo mode are reduced, so structures such as hole convolution and attention mechanism are appropriately added to the network part to improve the network analysis ability. The depth, width, and feature map size of the network have also been appropriately relaxed to achieve more accurate segmentation requirements. In the deployment stage, the optimization of instructions on the operating platform and the optimization of special equipment are also used to improve the running speed of the model.

需要进行说明的是，第一目标区域可以是目标对象在第一图像数据中所在的区域。It should be noted that the first target area may be the area where the target object is located in the first image data.

在本申请的一些实施例中，在利用预先训练的第二神经网络模型得到所述目标对象对应的第一目标区域之后，所述方法还包括：结合所述第一图像数据，并利用预先训练的第三神经网络对所述第一目标区域进行优化处理，得到处理后的第一目标区域。In some embodiments of the present application, after obtaining the first target area corresponding to the target object by using the pre-trained second neural network model, the method further includes: combining the first image data and using the pre-trained The third neural network of the device performs optimization processing on the first target area to obtain the processed first target area.

其中，所述预先训练的第三神经网络模型的训练方法包括：获取纯色背景的图像；对所述纯色背景的图像进行抠图处理和预标注，得到标签蒙版图像；以所述纯色背景的图像和所述标签蒙版图像为样本数据训练得到所述第三神经网络模型。Wherein, the training method of the pre-trained third neural network model includes: acquiring an image of a solid-color background; performing matting processing and pre-labeling on the image of the solid-color background to obtain a label mask image; The image and the label mask image are sample data to train to obtain the third neural network model.

需要进行说明的是，如图3所示，第三神经网络可以是Matting网络(一种用于图像精细化分割的神经网络)。It should be noted that, as shown in FIG. 3 , the third neural network may be a Matting network (a neural network used for refined image segmentation).

具体的，第三神经网络中输入为第二神经网络模型的输出，以及原始第一图像数据。由于分割网络分辨率和降采样等限制，通过第二神经网络获得的区域图在物体边缘、毛发等区域无法获得精细的分割结果。以Matting网络为例说明上述步骤，在第二神经网络输出的第一目标区域的基础上，Matting网络根据置信度获取trimap图(三分图)，同时在网络中加入注意力机制，能够使网络更加专注于边缘，提高边缘精度。网络的输出是一张蒙版图，即为处理过后的第一目标区域图，它是对每个像素位置不透明度的回归。Specifically, the input in the third neural network is the output of the second neural network model and the original first image data. Due to the limitations of segmentation network resolution and downsampling, the region map obtained by the second neural network cannot obtain fine segmentation results in areas such as object edges and hair. Taking the Matting network as an example to illustrate the above steps, on the basis of the first target area output by the second neural network, the Matting network obtains the trimap map (triple map) according to the confidence, and at the same time adds an attention mechanism to the network, which can make the network Focus more on edges and improve edge accuracy. The output of the network is a mask map, which is the processed first target area map, which is a regression of the opacity of each pixel position.

由于第三神经网络训练需要更加精细透明度作为监督数据，为此，本申请采用定点采集、自动预标注、人工修正三个步骤来获取训练数据。具体的，定点采集指搭建纯色背景采集环境，其环境光可调，获取较为自然的纯色背景数据。自动预标注采用图像抠图算法和训练的第三神经网络对纯色背景数据进行处理，得到初始蒙版图。最后，通过人工修正方式对初始蒙版图结果中错误区域进行微调，得到最终标注结果。在实施过程中，自动预标注部分的第三神经网络可以随着数据更新进行迭代，使得预标注效果不断提升，进一步逐渐降低人工标注成本。Since the training of the third neural network requires more fine-grained transparency as the supervision data, this application adopts three steps of fixed-point collection, automatic pre-labeling, and manual correction to obtain the training data. Specifically, fixed-point collection refers to building a solid-color background collection environment with adjustable ambient light to obtain relatively natural solid-color background data. The automatic pre-labeling uses the image matting algorithm and the trained third neural network to process the solid color background data to obtain the initial mask map. Finally, the error area in the initial mask map result is fine-tuned by manual correction, and the final labeling result is obtained. During the implementation process, the third neural network of the automatic pre-labeling part can iterate with the data update, so that the pre-labeling effect is continuously improved, and the manual labeling cost is further gradually reduced.

在本申请的一些实施例中，将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像之前，所述方法还包括：将所述第一目标区域输入后处理模块进行后处理。In some embodiments of the present application, before performing fusion processing on the first target area and the second image data to generate a fused image, the method further includes: inputting the first target area into post-processing module for post-processing.

需要进行说明的是，在终端设备是可移动的情况下，例如手机，往往是手持设备拍摄，所拍摄的视频往往包含一定的抖动，可以利用视频防抖模块通过硬件及软件技术，减轻或消除视频中的抖动，使得拍摄的视频更加稳定。本申请针对目标图像种类包含视频的，均会加入防抖模块进行防抖处理。It should be noted that when the terminal device is movable, such as a mobile phone, it is often shot by a handheld device, and the captured video often contains a certain amount of jitter. The video anti-shake module can be used to reduce or eliminate it through hardware and software technology. Shaking in the video, making the captured video more stable. In this application, if the target image type includes video, an anti-shake module will be added to perform anti-shake processing.

在一些可选的方式中，由于受手机算力及内存的限制，基于神经网络的语义分割算法只能处理较小分辨率的图像，一般不大于512x512像素，而这相对于初始图像高清(720P)、超清(1080P)甚至4K等大分辨率，往往要小得多。一般的处理方式，是先将图像下采样至神经网络分辨率，获得结果后再上采样至原始分辨率。而直接上采样后的结果和原始分辨率在边界上往往是对不齐的，且由于存在缩放的关系，会造成小图的细节丢失，再进行上采样后，对于小图上无法显现的细节获得的分割结果也是不准确的。基于此，本申请采用后处理模块，基于第一神经网络、第二神经网络或第三神经网络的分割结果，通过保持边缘滤波器算法计算出和原始分辨率图像边界对齐更准确，以及细节更丰富的人像前景区域。In some optional ways, due to the limitation of the computing power and memory of the mobile phone, the neural network-based semantic segmentation algorithm can only process images with smaller resolutions, generally no larger than 512x512 pixels, which is relatively high-definition (720P) images compared to the original images. ), Ultra HD (1080P), or even 4K, which tend to be much smaller. The general processing method is to first downsample the image to the neural network resolution, and then upsample to the original resolution after obtaining the result. The result after direct upsampling and the original resolution are often misaligned on the boundary, and due to the scaling relationship, the details of the small image will be lost. After upsampling, the details that cannot be displayed on the small image will be lost. The obtained segmentation results are also inaccurate. Based on this, the present application adopts a post-processing module, and based on the segmentation results of the first neural network, the second neural network or the third neural network, the boundary alignment with the original resolution image calculated by the edge filter algorithm is more accurate, and the details are more accurate. Rich portrait foreground area.

在本申请的一些实施例中，将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像，包括：估所述第二图像数据的环境信息，根据所述环境信息校正所述第一目标区域，获得校正后的第一目标区域；确定所述第一目标区域在所述第二图像数据中对应的第二目标区域；将所述第二目标区域替换为所述校正后的第一目标区域。In some embodiments of the present application, performing fusion processing on the first target area and the second image data to generate a fused image includes: estimating the environment information of the second image data, according to the environment information to correct the first target area to obtain the corrected first target area; determine the second target area corresponding to the first target area in the second image data; replace the second target area with the The corrected first target area.

融合是根据前背景各自的特点实现使得结果尽可能自然，包括保留前景肤色的自然性，且调整前景的色调能够更自然地适应背景的色调，使得融合后的图像视觉上形成一个完整的整体，而减少抠图痕迹。本申请不限制融合的对象种类，可以实现包括图像前景和图像背景、图像前景和视频背景、视频前景和视频背景以及视频前景和图像背景的融合。示例性的，以人像前景区域为第一目标区域和背景视频结果为第二图像，将输出的人像与背景视频融合的结果视频。Fusion is based on the characteristics of the front and the background to make the result as natural as possible, including retaining the naturalness of the foreground skin color, and adjusting the color tone of the foreground can more naturally adapt to the color tone of the background, so that the fused image visually forms a complete whole. And reduce the cutout marks. The present application does not limit the types of objects to be fused, and the fusion of image foreground and image background, image foreground and video background, video foreground and video background, and video foreground and image background can be implemented. Exemplarily, taking the foreground area of the portrait as the first target area and the background video result as the second image, the resultant video of the output portrait and the background video is fused.

具体的，由于前景和背景目标对象位置和图像质量不同，融合模块中直接替换将导致最终结果不协调。故本申请会首先评估背景的环境信息，包括光照的方向、强度、颜色等信息，再对前景进行相应的校正，保证前背景图像图像质量统一。其次，针对放置位置，前景区域放置于背景区域的位置一般放置于中心，此外，前景图像的上、下、左、右任意一条或多条边界接触到图像边界，融合结果也会保留该特性。在实际的应用中，采集的背景图像待融合区域并不是纯背景，会存在有遮挡的情况，例如区域内出现非目标对象的人和物等，针对上述区域图像存在遮挡情况，本申请还包括对背景图像进行目标识别，当识别结果包含遮挡目标时，则对用户进行提示。Specifically, due to the difference in the position and image quality of the foreground and background target objects, direct replacement in the fusion module will lead to inconsistent final results. Therefore, this application will first evaluate the environmental information of the background, including the direction, intensity, color and other information of the illumination, and then correct the foreground accordingly to ensure the uniform image quality of the front and background images. Secondly, for the placement position, the position where the foreground area is placed in the background area is generally placed in the center. In addition, any one or more boundaries of the upper, lower, left, and right of the foreground image touch the image boundary, and the fusion result will also retain this feature. In practical applications, the collected background image to be fused is not a pure background, and there may be occlusions, such as persons and objects that are not target objects in the area. For the occlusion of the images in the above-mentioned areas, this application also includes Perform target recognition on the background image, and prompt the user when the recognition result contains an occluded target.

此外，若待融合的对象包括视频，融合计算涉及的计算量巨大，为了不仅要保证精度还要同时维持实时处理，对对应的对象进行简化处理，包括优化整体流程，只进行轻量级网络分割，而不使用精细级的网络，以及对各模块都进行相应的简化，比如减少甚至完全去除后处理，对使用的轻量级网络进一步压缩模型。In addition, if the object to be fused includes video, the amount of calculation involved in the fusion calculation is huge. In order to not only ensure accuracy but also maintain real-time processing, simplify the processing of the corresponding object, including optimizing the overall process, and only perform lightweight network segmentation. , instead of using a fine-level network, and simplifying each module accordingly, such as reducing or even completely removing post-processing, and further compressing the model for the lightweight network used.

本申请实施例还提供了一种图像处理装置，如图4所示，包括：获取模块40，用于获取目标对象的第一图像数据以及目标场景的第二图像数据；分割模块42，用于根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到所述第一图像数据中与所述目标对象对应的第一目标区域；融合模块44，将所述第一目标区域与所述第二图像数据进行融合处理，生成融合后的图像。The embodiment of the present application also provides an image processing apparatus, as shown in FIG. 4 , including: an acquisition module 40 for acquiring first image data of a target object and second image data of a target scene; a segmentation module 42 for acquiring The first image data is segmented according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data; the fusion module 44 is to divide the first target The region is fused with the second image data to generate a fused image.

分割模块42包括：第一处理子模块；判断子模块用于对所述第一图像数据进行类别判断；当所述第一图像数据是视频，利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理，以得到所述目标对象对应的第一目标区域；当所述第一图像数据是图片，利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理，得到所述目标对象对应的第一目标区域。The segmentation module 42 includes: a first processing sub-module; the judging sub-module is used to perform category judgment on the first image data; when the first image data is a video, the pre-trained first neural network model is used to classify the first image data. Perform segmentation processing on image data to obtain a first target area corresponding to the target object; when the first image data is a picture, use a predetermined second neural network model to perform segmentation processing on the first image data, A first target area corresponding to the target object is obtained.

判断子模块包括：判断单元、第一训练单元、转化单元；The judgment sub-module includes: a judgment unit, a first training unit, and a transformation unit;

判断单元用于对所述第一图像数据进行内容识别和判断，当输入的图像数据被判定为属于分割处理所支持的场景时，所述第一图像数据进行类别判断；当输入的图像数据被判定为不属于分割处理所支持的场景时，所述第一图像数据结束处理。The judging unit is used to identify and judge the content of the first image data. When the inputted image data is judged to belong to the scene supported by the segmentation process, the first image data is judged by category; When it is determined that it does not belong to a scene supported by the segmentation process, the first image data ends the process.

第一训练单元用于获取训练数据集，其中，所述训练数据集包括：第一样本图像数据和第一样本目标区域，其中，所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图；基于所述训练数据集训练神经网络模型，生成所述第一神经网络模型，其中，在训练所述第一神经网络模型的过程中，基于帧间信息对所述第一神经网络模型进行一致性约束。The first training unit is used to obtain a training data set, wherein the training data set includes: first sample image data and a first sample target area, wherein the first sample target area is based on the first sample target area. Obtain a target area mask map from sample image data; train a neural network model based on the training data set, and generate the first neural network model, wherein, in the process of training the first neural network model, based on the inter-frame information Consistency constraints are imposed on the first neural network model.

转化单元用于将所述第一目标区域转化为灰度掩码图；并将所述灰度掩码图的边界进行平滑。The conversion unit is used for converting the first target area into a grayscale mask image; and smoothing the boundary of the grayscale mask image.

第一处理子模块用于结合所述第一图像数据，并利用预先训练的第三神经网络对所述第一目标区域进行优化处理，得到处理后的第一目标区域。The first processing sub-module is configured to combine the first image data, and use a pre-trained third neural network to perform optimization processing on the first target area to obtain a processed first target area.

第一处理子模块包括：第二训练单元；The first processing submodule includes: a second training unit;

第二训练单元用于获取纯色背景的图像；对所述纯色背景的图像进行抠图处理和预标注，得到标签蒙版图像；以所述纯色背景的图像为样本数据，所述标签蒙版图像作为所述监督数据训练得到所述第三神经网络模型。The second training unit is used to obtain an image of a solid-color background; performing matting processing and pre-labeling on the image of the solid-color background to obtain a label mask image; taking the image of the solid-color background as sample data, the label mask image The third neural network model is obtained by training as the supervision data.

融合模块44包括：第二处理子模块和生成子模块；第二处理子模块用于将所述第一目标区域输入后处理模块进行后处理。The fusion module 44 includes: a second processing sub-module and a generating sub-module; the second processing sub-module is configured to input the first target region into a post-processing module for post-processing.

生成子模块用于评估所述第二图像数据的环境信息，根据所述环境信息校正所述第一目标区域，获得校正后的第一目标区域；The generating submodule is used to evaluate the environmental information of the second image data, correct the first target area according to the environmental information, and obtain the corrected first target area;

确定所述第一目标区域在所述第二图像数据中对应的第二目标区域；determining a second target area corresponding to the first target area in the second image data;

将所述第二目标区域替换为所述校正后的第一目标区域。The second target area is replaced with the corrected first target area.

上述非易失性存储介质用于存储执行以下功能的程序：获取目标对象的第一图像以及目标场景的第二图像；根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到第一图像中与目标对象对应的第一目标区域；将第一目标区域与第二图像进行融合处理，生成融合后的图像。The above-mentioned non-volatile storage medium is used for storing a program that performs the following functions: acquiring a first image of a target object and a second image of a target scene; performing segmentation processing on the first image data according to the type of the first image data , obtain the first target area corresponding to the target object in the first image; perform fusion processing on the first target area and the second image to generate a fused image.

上述处理器用于运行执行以下功能的程序：获取目标对象的第一图像以及目标场景的第二图像；根据所述第一图像数据的类型对所述第一图像数据进行分割处理，得到第一图像中与目标对象对应的第一目标区域；将第一目标区域与第二图像进行融合处理，生成融合后的图像。The above-mentioned processor is used to run a program that performs the following functions: acquiring a first image of a target object and a second image of a target scene; performing segmentation processing on the first image data according to the type of the first image data to obtain a first image the first target area corresponding to the target object in the middle; the first target area and the second image are fused to generate a fused image.

上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.

在本申请的上述实施例中，对各个实施例的描述都各有侧重，某个实施例中没有详述的部分，可以参见其他实施例的相关描述。In the above-mentioned embodiments of the present application, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

在本申请所提供的几个实施例中，应该理解到，所揭露的技术内容，可通过其它的方式实现。其中，以上所描述的装置实施例仅仅是示意性的，例如单元的划分，可以为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，单元或模块的间接耦合或通信连接，可以是电性或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are only illustrative, for example, the division of units may be a logical function division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into Another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of units or modules, and may be in electrical or other forms.

作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括：U盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application can be embodied in the form of software products in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, and the computer software products are stored in a storage medium , which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes .

以上仅是本申请的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本申请原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本申请的保护范围。The above are only the preferred embodiments of the present application. It should be pointed out that for those skilled in the art, some improvements and modifications can be made without departing from the principles of the present application. These improvements and modifications should also be regarded as The protection scope of this application.

Claims

1. A method of processing an image, comprising:

acquiring first image data of a target object and second image data of a target scene;

segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data;

and fusing the first target area and the second image data to generate a fused image.

2. The method according to claim 1, wherein performing segmentation processing on the first image data according to the type of the first image data to obtain a first target region corresponding to the target object in the first image data comprises:

when the first image data is a video, segmenting the first image data by using a pre-trained first neural network model to obtain a first target area corresponding to the target object;

when the first image data is a picture, the first image data is segmented by using a predetermined second neural network model to obtain a first target area corresponding to the target object.

3. The method of claim 1, wherein prior to performing the classification determination on the first image data, the method further comprises:

performing content identification and judgment on the first image data;

performing a category determination on the first image data when the input first image data is determined to belong to a scene supported by a segmentation process;

when the input first image data is determined not to belong to a scene supported by segmentation processing, the first image data ends processing.

4. The method of claim 2, wherein the first neural network model is a lightweight model that employs a cross-layer connection of a backbone network and a decoding network as a network structure.

5. The method of claim 2, wherein the method of training the pre-trained first neural network model comprises:

obtaining a training data set, wherein the training data set comprises: the image processing method comprises the steps of obtaining first sample image data and a first sample target area, wherein the first sample target area is a target area mask map obtained based on the first sample image data;

training a neural network model based on the training data set to generate the first neural network model, wherein in the process of training the first neural network model, consistency constraint is carried out on the first neural network model based on interframe information.

6. The method of claim 2, wherein after obtaining the first target region corresponding to the target object by using the pre-trained first neural network model, the method further comprises:

and converting the first target area into a gray mask image, and smoothing the boundary of the gray mask image.

7. The method of claim 2, wherein after obtaining the first target region corresponding to the target object by using the pre-trained first neural network model, the method further comprises:

and acquiring a previous frame segmentation result of the first image data, and performing time domain smoothing filtering on the first target region corresponding to the target by using the previous frame segmentation result.

8. The method of claim 2, wherein the second neural network model is a convolutional neural network model that incorporates a hole convolution and attention mechanism.

9. The method of claim 2, wherein after obtaining the first target region corresponding to the target object using a pre-trained second neural network model, the method further comprises:

and combining the first image data and utilizing a pre-trained third neural network to perform optimization processing on the first target area to obtain a processed first target area.

10. The method of claim 9, wherein the method of training the pre-trained third neural network model comprises:

acquiring an image of a solid background;

performing matting and pre-labeling on the image of the solid background to obtain a label mask image;

and training by taking the image of the solid background and the label mask image as sample data to obtain the third neural network model.

11. The method according to claim 1 or 9, wherein before performing the fusion process on the first target region and the second image data to generate a fused image, the method further comprises:

and inputting the first target area into a post-processing module for post-processing.

12. The method of claim 1, wherein fusing the first target region with the second image data to generate a fused image comprises:

evaluating environmental information of the second image data, correcting the first target area according to the environmental information, and obtaining a corrected first target area;

determining a second target area corresponding to the first target area in the second image data;

replacing the second target area with the corrected first target area.

13. An image processing apparatus characterized by comprising:

the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring first image data of a target object and second image data of a target scene;

the segmentation module is used for segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data;

and the fusion module is used for fusing the first target area and the second image data to generate a fused image.

14. A non-volatile storage medium, comprising a stored program, wherein when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the image processing method according to any one of claims 1 to 12.

15. An electronic device comprising a memory and a processor; the processor is configured to execute a program, wherein the program executes a method for processing an image according to any one of claims 1 to 12.