CN115424264A - Panorama segmentation method, related device, electronic equipment and storage medium - Google Patents
Panorama segmentation method, related device, electronic equipment and storage medium
- Publication number
- CN115424264A (Application CN202210945482.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- sample
- segmentation
- segmented
- position information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a panorama segmentation method, a related apparatus, an electronic device and a storage medium. The panorama segmentation method includes: extracting a first feature map of an image to be segmented; based on the first feature map, respectively predicting category information and position information of first pixel points in the first feature map, and performing feature generation based on the first feature map to obtain a second feature map; extracting position information of each image object based on the category information and the position information of the first pixel points in the first feature map; and performing panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented. With the above scheme, panorama segmentation can be deployed on edge devices in real time.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a panorama segmentation method, a related apparatus, an electronic device, and a storage medium.
Background
Scene perception is a fundamental task in computer vision research, and many related techniques have been put into practical use. Among them, panorama segmentation, as an underlying task of scene perception, has long been a research focus. Unlike other scene perception tasks such as object detection, semantic segmentation and instance segmentation, panorama segmentation can be regarded as a combination of semantic segmentation and instance segmentation: it detects all target objects and also distinguishes different instances within the same category (for example, distinguishing target objects "a", "b" and "c" that all belong to the category "person"). Panorama segmentation is therefore the most complex, yet most comprehensive, task in computer vision scene perception.
Currently, mainstream panorama segmentation schemes mainly follow two technical routes: panorama segmentation based on two stages plus post-processing fusion, and end-to-end panorama segmentation based on a single query-based stage. The former involves many processing steps, each with high complexity, so it is difficult to deploy in real time on the edge devices used in practice; the latter simplifies the panorama segmentation pipeline, but because of the high computational complexity of the attention mechanism in the Transformer and the lack of accelerated optimization on the underlying hardware, it is still difficult to deploy in real time on edge devices. In view of this, how to deploy panorama segmentation in real time on edge devices has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a panorama segmentation method, a related apparatus, an electronic device and a storage medium, so that panorama segmentation can be deployed on edge devices in real time.
In order to solve the above technical problem, a first aspect of the present application provides a panorama segmentation method, including: extracting a first feature map of an image to be segmented, wherein the image to be segmented contains image objects of several categories, and the image objects include at least one of an instance and a background; based on the first feature map, respectively predicting category information and position information of first pixel points in the first feature map, and performing feature generation based on the first feature map to obtain a second feature map; extracting position information of each image object based on the category information and the position information of the first pixel points in the first feature map; and performing panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented.
In order to solve the above technical problem, a second aspect of the present application provides a panorama segmentation apparatus, including a feature extraction module, an information prediction module, a feature generation module, an information extraction module and an image segmentation module. The feature extraction module is configured to extract a first feature map of an image to be segmented, wherein the image to be segmented contains image objects of several categories, and the image objects include at least one of an instance and a background; the information prediction module is configured to respectively predict category information and position information of first pixel points in the first feature map based on the first feature map; the feature generation module is configured to perform feature generation based on the first feature map to obtain a second feature map; the information extraction module is configured to extract position information of each image object based on the category information and the position information of the first pixel points in the first feature map; and the image segmentation module is configured to perform panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions, and the processor is configured to execute the program instructions to implement the panorama segmentation method according to the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being configured to implement the panorama segmentation method of the first aspect described above.
According to the above scheme, a first feature map of an image to be segmented is extracted, the image to be segmented contains image objects of several categories, and the image objects include at least one of an instance and a background. On this basis, category information and position information of first pixel points in the first feature map are respectively predicted based on the first feature map, and feature generation is performed based on the first feature map to obtain a second feature map, so that the position information of each image object is extracted based on the category information and the position information of the first pixel points in the first feature map, and panorama segmentation is then performed based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented. On the one hand, this single-pipeline end-to-end design avoids hand-designed heuristic post-processing; on the other hand, no operators of high computational complexity, such as Transformers, are required, which greatly reduces the computing and storage resources consumed. Therefore, panorama segmentation can be deployed on edge devices in real time.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a panorama segmentation method according to the present application;
FIG. 2a is a diagram illustrating an embodiment of an image to be segmented;
FIG. 2b is a schematic view of an embodiment of a panoramic segmentation map;
FIG. 3 is a block diagram of an embodiment of a panorama segmentation model;
FIG. 4 is a diagram of one embodiment of convolution parameters;
FIG. 5 is a schematic flow chart diagram of an embodiment of training a panorama segmentation model;
FIG. 6 is a schematic diagram of a frame of an embodiment of a panorama segmentation apparatus of the present application;
FIG. 7 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 8 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the section "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
In the present application, on the one hand, the single-pipeline end-to-end technical system can effectively avoid problems caused by a hand-designed heuristic post-processing flow, which helps improve the panorama segmentation effect; on the other hand, operators of high computational complexity such as Transformers are not needed in the panorama segmentation process, which greatly reduces the computing, storage and other resources consumed by deploying panorama segmentation. Therefore, panorama segmentation can be deployed on edge devices in real time. It should be noted that edge devices are characterized by low power consumption, low computing power, and the like; for example, edge devices may include, but are not limited to: sweeping robots, companion robots, teaching robots, in-vehicle devices (e.g., in-vehicle autonomous driving devices), etc., which are not limited herein.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a panorama segmentation method according to an embodiment of the present application.
Specifically, the method may include the steps of:
step S11: and extracting a first feature map of the image to be segmented.
In the embodiment of the present disclosure, the image to be segmented contains image objects of several categories, and the image objects include at least one of an instance and a background. That is to say, the image to be segmented may contain only instances, only backgrounds, or both instances and backgrounds, which is not limited herein.
In one implementation scenario, the image to be segmented may contain image objects of four categories, including two categories of instances (namely pedestrian and vehicle) and two categories of backgrounds (namely road and sky), where the two categories of instances involve five instances in total (namely pedestrian "a", pedestrian "b", pedestrian "c", vehicle "A" and vehicle "B"); the purpose of panorama segmentation is to segment each of these image objects. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, for ease of understanding, please refer to fig. 2a and fig. 2b, where fig. 2a is a schematic diagram of an embodiment of an image to be segmented and fig. 2b is a schematic diagram of an embodiment of a panorama segmentation map; more specifically, fig. 2b is the panorama segmentation map obtained after panorama segmentation of fig. 2a. As shown in fig. 2a and fig. 2b, fig. 2b not only marks different image objects in different colors, but also marks the category to which each image object belongs. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, the image to be segmented may be determined by the application scenario. For example, in a smart home scenario, the image to be segmented may be an image captured by a sweeping robot through its built-in camera; alternatively, in an autonomous driving scenario, the image to be segmented may be an image captured by an in-vehicle device through an in-vehicle camera. Other cases can be deduced by analogy and are not enumerated here. In addition, it should be noted that the camera used to capture the image to be segmented may include, but is not limited to: an ordinary camera, a wide-angle camera, a fisheye camera, etc., which are not limited herein.
In one implementation scenario, in order to improve the efficiency of panorama segmentation, a panorama segmentation model may be trained in advance, and the panorama segmentation model may include a feature extraction network, which may include, but is not limited to, convolutional layers, and is not limited herein. Illustratively, the feature extraction network may be designed as a residual network; the network structure of the feature extraction network is not limited herein. On this basis, feature extraction may be performed on the image to be segmented based on the feature extraction network to obtain the first feature map.
In one implementation scenario, different from the foregoing extraction manner, in order to further improve the accuracy of panorama segmentation, the first feature map may contain feature information of multiple scales of the image to be segmented. Specifically, third feature maps of multiple scales may be extracted from the image to be segmented, and the first feature map may be obtained by fusing the third feature maps of multiple scales. It should be noted that third feature maps of different scales have different resolutions: the higher the resolution of a third feature map, the shallower the feature information it contains (e.g., shallow texture features); conversely, the lower its resolution, the deeper the feature information it contains (e.g., deep semantic features). In this manner, the third feature maps of multiple scales are extracted from the image to be segmented and fused to obtain the first feature map, so that the first feature map contains feature information of the image to be segmented at multiple scales, and shallow and deep features can both be referred to in the subsequent category prediction, parameter prediction and feature generation processes, which helps improve the accuracy of panorama segmentation.
In a specific implementation scenario, the specific number of the multiple scales and the resolution corresponding to each of the multiple scales are not specifically limited herein. Illustratively, taking three scales as an example, the resolutions corresponding to the three scales may include: one fourth of the resolution of the image to be segmented, one eighth of the resolution of the image to be segmented, and one sixteenth of the resolution of the image to be segmented. Other cases may be analogized and are not illustrated here.
In a specific implementation scenario, as described above, in order to improve the efficiency of panorama segmentation, a panorama segmentation model may be trained in advance. Different from the panorama segmentation model in the foregoing implementation, the panorama segmentation model here may include a backbone network and a fusion network, where the backbone network may include several feature extraction sub-networks. Illustratively, the backbone network may include, but is not limited to, convolutional neural networks such as ResNet and DenseNet; taking the backbone network being ResNet as an example, a feature extraction sub-network may be a residual block in ResNet, and other cases can be deduced by analogy and are not enumerated here. Specifically, a model structure with an appropriate number of parameters may be selected according to the actual computing power of the edge device and its hardware resources such as memory. Referring to fig. 3, fig. 3 is a schematic diagram of a framework of an embodiment of the panorama segmentation model. As shown in fig. 3, each feature extraction sub-network of the backbone network may extract a third feature map of a different scale, that is, third feature maps obtained at different downsampling ratios. Further, the fusion network may include, but is not limited to: FPN (Feature Pyramid Network), PANet, etc., which are not limited herein. The third feature maps of multiple scales may be fused through the fusion network to obtain a feature map at a unified downsampling ratio, namely the first feature map. For the specific feature extraction process of the backbone network, reference may be made to the technical details of convolutional neural networks such as ResNet and DenseNet; for the specific process of fusing the third feature maps of multiple scales by the fusion network, reference may be made to the technical details of networks such as FPN and PANet, which are not repeated here.
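For illustration, the following is a minimal PyTorch-style sketch of a backbone plus fusion network of the kind described above, assuming a ResNet-18 backbone, FPN-like top-down fusion, and third feature maps at 1/4, 1/8 and 1/16 resolution; the channel counts and layer choices are assumptions for illustration and are not prescribed by this embodiment.

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class FeatureExtractor(nn.Module):
    """Backbone + fusion network: extracts third feature maps at 1/4, 1/8 and 1/16
    resolution and fuses them (FPN-style) into a single first feature map at 1/4 scale."""
    def __init__(self, out_channels=128):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)   # assumed backbone
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2, self.layer3 = resnet.layer1, resnet.layer2, resnet.layer3
        # 1x1 lateral convolutions project every scale to a common channel count
        self.lat1 = nn.Conv2d(64, out_channels, 1)
        self.lat2 = nn.Conv2d(128, out_channels, 1)
        self.lat3 = nn.Conv2d(256, out_channels, 1)

    def forward(self, image):                       # image: (B, 3, H, W)
        c1 = self.layer1(self.stem(image))          # third feature map, 1/4 resolution
        c2 = self.layer2(c1)                        # third feature map, 1/8 resolution
        c3 = self.layer3(c2)                        # third feature map, 1/16 resolution
        # top-down fusion: deep semantic maps are upsampled and added to shallow texture maps
        p3 = self.lat3(c3)
        p2 = self.lat2(c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        p1 = self.lat1(c1) + F.interpolate(p2, scale_factor=2, mode="nearest")
        return p1                                   # first feature map, (B, out_channels, H/4, W/4)
```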
Step S12: and respectively predicting to obtain the category information and the position information of the first pixel point in the first characteristic diagram based on the first characteristic diagram, and performing characteristic generation based on the first characteristic diagram to obtain a second characteristic diagram.
In the embodiment of the present disclosure, the category information of a first pixel point in the first feature map represents the possibility that the image object to which the first pixel point belongs is each of the several categories. Illustratively, the category information may be expressed by confidence values, that is, the category information may include the confidence that the image object to which the first pixel point belongs is predicted as each of the several categories; it should be noted that the confidence represents a degree of belief and its value range may be any positive number, and the higher the confidence that the image object to which a first pixel point belongs is predicted as a certain category, the more credible it is that the image object is of that category, and conversely, the lower the confidence, the less credible it is. Alternatively, the category information may be expressed by probability values, that is, the category information may include the probability that the image object to which the first pixel point belongs is predicted as each of the several categories; it should be noted that a probability value represents the probability of occurrence of an event and its value range may be 0 to 1, and the larger the probability value that the image object to which a first pixel point belongs is predicted as a certain category, the higher the probability of the event "the image object to which the first pixel point belongs is of that category", and conversely, the smaller the probability value, the lower that probability. The above two manners are merely two possible ways of expressing the category information in practical applications, and the specific manner of expressing the category information is not limited thereby. For convenience of description, taking probability values as the expression of the category information and the resolution of the first feature map being S × S as an example, the category information of the first pixel points in the first feature map may be expressed as an S × S × C tensor, where C denotes the total number of the several categories. Illustratively, taking the image to be segmented shown in fig. 2a as an example, the several categories include three categories of instances (namely wall, pot and door) and two categories of backgrounds, five categories in total; in this case, the category information of each first pixel point in the first feature map may include the probability values that the image object to which the first pixel point belongs is each of the five categories, that is, the category information of the first pixel points in the first feature map may be expressed as an S × S × 5 tensor. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, as mentioned above, to improve the efficiency of panorama segmentation, a panorama segmentation model may be trained in advance. With continued reference to fig. 3, the panorama segmentation model may include a category prediction network, and the category prediction network may be used to predict category information, that is, the first feature map may be input into the category prediction network to obtain category information of the first pixel point in the first feature map. The class prediction network may include, but is not limited to, a convolutional layer, and the network structure of the class prediction network is not limited herein. Further, in order to facilitate deployment of the panorama segmentation in real time at the edge device, the class prediction network may be formed by a plurality of convolutional layers, where the number of convolutional layers is not limited.
In the embodiment of the present disclosure, the position information of a first pixel point in the first feature map is used to perform image segmentation at the first pixel point, so as to obtain a segmented image of the image object to which the first pixel point belongs. In order to further facilitate real-time deployment of panorama segmentation on edge devices, the position information of the first pixel points in the first feature map may specifically include convolution parameters of the first pixel points. For convenience of description, taking the resolution of the first feature map being S × S as an example, the position information of the first pixel points in the first feature map may be represented as an S × S × D1 tensor, where D1 denotes the dimension of the convolution parameters, that is, the position information of each first pixel point can be represented as a D1-dimensional convolution parameter. Illustratively, D1 may be set to 5, 10, 15, etc., and the specific value of D1 is not limited herein. Of course, in practical applications, the position information is not limited to being represented by convolution parameters. For example, the position information of a first pixel point may instead include matrix parameters of the first pixel point, and the position information of each first pixel point may then be represented as M × M × D2, where M × M denotes the size of the matrix parameters and D2 denotes their dimension; for example, M may be set to 3 and D2 to 5, or M to 5 and D2 to 10, etc., which are not limited herein. The above two manners are merely two possible ways of expressing the position information in practical applications, and the specific manner of expressing the position information is not limited thereby.
In an implementation scenario, as described above, in order to improve the efficiency of panorama segmentation, a panorama segmentation model may be trained in advance. With continued reference to fig. 3, the panorama segmentation model may include a location prediction network, and the location prediction network may be configured to predict the location information, that is, the first feature map may be input to the location prediction network to predict the location information. The location prediction network may include, but is not limited to, a convolutional layer, and the network structure of the location prediction network is not limited herein. Further, in order to facilitate deployment of the panorama segmentation in real time at the edge device, the position prediction network may be formed by a plurality of convolutional layers, where the number of convolutional layers is not limited.
In the embodiment of the present disclosure, the resolution of the second feature map may be the same as the resolution of the image to be segmented. For convenience of description, the resolution of the image to be segmented may be denoted as H × W, where H denotes a height, and W denotes a width, and the second feature map may be denoted as H × W × E, where E denotes a feature dimension, and for example, E may be set to 5, 10, 15, and the like, where a specific value of E is not limited. In addition, the second feature map is used to perform panorama segmentation in combination with the position information to obtain a panorama segmentation map of the image to be segmented, which may specifically refer to the following related description and will not be described herein again.
In an implementation scenario, as described above, in order to improve the efficiency of panorama segmentation, a panorama segmentation model may be trained in advance. With continued reference to fig. 3, the panorama segmentation model may include a feature generation network, and the feature generation network may be configured to perform feature generation, that is, the first feature map may be input into the feature generation network to obtain a second feature map. It should be noted that the feature generation network may include, but is not limited to, a convolutional layer, and the network structure of the feature generation network is not limited herein. Further, in order to facilitate deployment of the panorama segmentation in real time at the edge device, the feature generation network may be formed by a plurality of convolutional layers, where the number of convolutional layers is not limited.
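For illustration, the following sketch shows how the category prediction network, the position prediction network and the feature generation network described above could each be built from a few convolutional layers; the channel counts, the number of layers, and the choice D1 = E (so that the predicted convolution parameters can later act as a 1 × 1 dynamic kernel on the second feature map) are assumptions for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F

class PredictionHeads(nn.Module):
    """Convolution-only heads on the first feature map: category prediction (S x S x C),
    position prediction (S x S x D1) and feature generation (H x W x E after upsampling)."""
    def __init__(self, in_channels=128, num_classes=5, d1=8, e=8):
        super().__init__()
        assert d1 == e, "D1 must equal E so the parameters form a 1x1 dynamic kernel"
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, out_ch, 1))
        self.category_head = head(num_classes)   # category prediction network
        self.position_head = head(d1)            # position prediction network
        self.feature_head = head(e)              # feature generation network

    def forward(self, first_feature_map, image_size):
        category_info = self.category_head(first_feature_map)      # (B, C, S, S)
        position_info = self.position_head(first_feature_map)      # (B, D1, S, S)
        second_feature_map = F.interpolate(self.feature_head(first_feature_map),
                                           size=image_size, mode="bilinear",
                                           align_corners=False)    # (B, E, H, W)
        return category_info, position_info, second_feature_map
```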
Step S13: and extracting the position information of each image object based on the category information and the position information of the first pixel point in the first characteristic diagram.
In one implementation scenario, the image region of each image object in the first feature map may be determined based on the category information and the position information of the first pixel points in the first feature map; then, based on the center of gravity of the image region of an image object, the first pixel point located at that center of gravity in the first feature map may be determined as a target pixel point, and the position information of the target pixel point may be extracted as the position information of the image object. It should be noted that, for the calculation of the image center of gravity (i.e., the image centroid), reference may be made to the relevant technical details, which are not repeated here. In the above manner, by extracting the position information at the center of gravity as the position information of the image object, the position of the image object can be taken into account in the subsequent image segmentation process, which helps improve the segmentation accuracy of each image object.
In a specific implementation scenario, non-maximum suppression may be performed based on the category information of the first pixel points in the first feature map, so as to obtain the image region of each image object in the first feature map and the category to which each image object belongs. In this embodiment, the category to which each image object belongs may be marked in the finally obtained panorama segmentation map. Referring to fig. 2a and fig. 2b, the image to be segmented shown in fig. 2a finally yields the panorama segmentation map shown in fig. 2b, in which each image object is marked with a different color block and the color block of each image object is further marked with the category to which it belongs. In addition, in order to improve the accuracy of the image regions, candidate regions of the image objects in the first feature map may be further predicted. It should be noted that each image object may correspond to at least one candidate region, and each candidate region corresponds to a prediction confidence indicating the possibility that an image object exists in the candidate region; on this basis, the candidate regions may be screened by Non-Maximum Suppression (NMS) to finally obtain a target region of each image object in the first feature map. For the process of region screening by non-maximum suppression, reference may be made to the technical details of non-maximum suppression, which are not repeated here. Meanwhile, for each first pixel point in the first feature map, the category with the highest probability may be selected as the category to which the image object to which the first pixel point belongs. On this basis, the connected component formed by first pixel points belonging to the same category within each target region may be taken as the image region, within that target region, of the corresponding image object in the first feature map. In the above manner, non-maximum suppression is performed based on the category information of the first pixel points in the first feature map to obtain the image region of each image object in the first feature map and the category to which each image object belongs, and the category to which each image object belongs is marked in the panorama segmentation map, so that each image object can be effectively screened out through non-maximum suppression, which helps improve the accuracy of the subsequent convolution parameter extraction.
In a specific implementation scenario, as described above, after the image region of each image object in the first feature map is obtained, the position information of each image object may be determined according to the center of gravity of its image region. Referring to fig. 4, fig. 4 is a schematic diagram of an embodiment of position information. As shown in fig. 4, take an image to be segmented containing two image objects as an example, where the center of gravity of the image region of one image object is located at (i1, j1) and the center of gravity of the image region of the other image object is located at (i2, j2). As described above, the position information of the first pixel points in the first feature map may be represented as an S × S × D1 tensor; then the position information located at (i1, j1) in this tensor may be taken as the position information of the first image object, and the position information located at (i2, j2) may be taken as the position information of the second image object. Other cases can be deduced by analogy and are not enumerated here.
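For illustration, a sketch of extracting the position information of each image object at the center of gravity of its image region is given below; representing the image regions as boolean masks (e.g., as produced by the non-maximum suppression and connected-component step above) is an assumption for illustration.

```python
import torch

def extract_object_position_info(position_info, object_regions):
    """position_info: (D1, S, S) per-pixel convolution parameters.
    object_regions: list of (S, S) boolean masks, one image region per image object.
    Returns one D1-dimensional parameter vector per image object, read at the
    center of gravity of its image region."""
    per_object = []
    for region in object_regions:
        ys, xs = torch.nonzero(region, as_tuple=True)
        cy = int(ys.float().mean().round())          # center of gravity, row index
        cx = int(xs.float().mean().round())          # center of gravity, column index
        per_object.append(position_info[:, cy, cx])  # position information of this image object
    return per_object
```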
In another implementation scenario, different from the foregoing implementation of extracting the convolution parameters, still taking probability values as the expression of the category information as an example, when the computing power, memory and other hardware resources of the edge device are relatively abundant, the category information of a first pixel point in the first feature map may further include the probability values that the first pixel point belongs to each image object. That is to say, the category information may include not only the probability values that the image object to which the first pixel point belongs is each of the several categories, but also the probability values that the first pixel point belongs to each image object. To distinguish the two, the probability values that the image object to which a first pixel point belongs is each of the several categories may be referred to as first probability values, and the probability values that the first pixel point belongs to each image object may be referred to as second probability values. On this basis, for each first pixel point in the first feature map, the image object corresponding to the maximum of its second probability values may be determined as the image object to which the first pixel point belongs, so that the connected component formed by first pixel points belonging to the same image object can be taken as the image region of that image object in the first feature map; meanwhile, for each first pixel point in the first feature map, the category corresponding to the maximum of its first probability values may be determined as the category to which the image object to which the first pixel point belongs. As described above, in this embodiment, the category to which each image object belongs may be marked in the finally obtained panorama segmentation map. Further, for each image object, after its image region in the first feature map is obtained, the position information corresponding to each first pixel point in the image region may be extracted from the position information of the first pixel points in the first feature map (i.e., the S × S × D1 tensor), and the extracted position information may be weighted (for example, the convolution parameters may be weighted, or the matrix parameters may be weighted) to obtain the final position information of the image object, where a first pixel point closer to the center of gravity of the image region has a larger weight and a first pixel point farther from the center of gravity has a smaller weight. In the above manner, the position information corresponding to each first pixel point in the image region of an image object is weighted to obtain the position information of the image object, and the weight of a first pixel point is negatively correlated with its distance to the center of gravity, so that, on the one hand, every first pixel point in the image region is taken into account in the position information of the image object, and on the other hand, first pixel points at different positions are taken into account to different degrees, which helps improve the accuracy of the position information of the image object.
In a specific implementation scenario, the pixel distance from each first pixel point in the image region to the center of gravity may be obtained, and the maximum pixel distance may be determined; then, for each first pixel point in the image region, the ratio of its pixel distance to the maximum pixel distance is obtained, the value obtained by subtracting this ratio from 1 is taken as the initial weight of the first pixel point, and finally the initial weights of the first pixel points in the image region are normalized to obtain the final weight of each first pixel point in the image region.
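For illustration, the following sketch computes the weighted position information of one image object as described above, with the initial weight of each first pixel point taken as 1 minus the ratio of its pixel distance to the maximum pixel distance, followed by normalization; the tensor layout is an assumption for illustration.

```python
import torch

def weighted_object_position_info(position_info, region):
    """position_info: (D1, S, S); region: (S, S) boolean mask of one image object.
    Weights each first pixel point in the region by 1 - d/d_max (d = pixel distance to
    the center of gravity), normalizes the weights, and returns the weighted sum."""
    ys, xs = torch.nonzero(region, as_tuple=True)
    cy, cx = ys.float().mean(), xs.float().mean()                    # center of gravity
    dist = torch.sqrt((ys.float() - cy) ** 2 + (xs.float() - cx) ** 2)
    init_w = 1.0 - dist / dist.max().clamp(min=1e-6)                 # farther -> smaller initial weight
    weights = init_w / init_w.sum()                                  # normalized final weights
    params = position_info[:, ys, xs]                                # (D1, N) parameters in the region
    return (params * weights.unsqueeze(0)).sum(dim=1)                # (D1,) position information
```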
Step S14: and carrying out panoramic segmentation on the basis of the second characteristic diagram and the position information of each image object to obtain a panoramic segmentation diagram of the image to be segmented.
In one implementation scenario, the second feature map may be processed based on the position information of each image object to obtain a segmented image of each image object, and the segmented images of the image objects may then be fused to obtain the panorama segmentation map of the image to be segmented. In this manner, image segmentation and image fusion are performed at the granularity of individual image objects, which helps refine the granularity of panorama segmentation and further improve its accuracy.
In a specific implementation scenario, taking the case where convolution parameters are used to express the position information as an example, the segmented image of an image object may be obtained by performing pixel-by-pixel convolution on each second pixel point in the second feature map with the convolution parameters of that image object. For the specific process of pixel-by-pixel convolution, reference may be made to the technical details of the convolution operation, which are not repeated here.
In a specific implementation scenario, still taking the example that the convolution parameter is used to express the location information and the second feature map is represented as H × W × E, after performing pixel-by-pixel convolution on the second feature map by the convolution parameter of a certain image object, an initial image of H × W may be obtained, and the pixel value of each pixel point in the initial image represents the probability value that the pixel point belongs to the image object. Based on this, the pixel value of the pixel point whose pixel value is lower than the preset threshold (e.g., 0.5, 0.6, etc.) in the initial image may be directly set to 0, and the pixel value of the pixel point whose pixel value is not lower than the preset threshold in the initial image may be directly set to 1, so as to obtain a mask image whose pixel value is represented by 0-1 as the segmentation image of the image object.
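For illustration, the following sketch obtains the segmented image of one image object by pixel-by-pixel (1 × 1 dynamic) convolution of the second feature map with that object's convolution parameters, followed by thresholding into a 0-1 mask map; applying a sigmoid to obtain per-pixel probabilities and the default threshold of 0.5 are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def segment_object(second_feature_map, object_params, threshold=0.5):
    """second_feature_map: (E, H, W); object_params: (D1,) with D1 == E.
    Performs pixel-by-pixel convolution with the object's convolution parameters and
    thresholds the result into a 0-1 mask map (the segmented image of the image object)."""
    kernel = object_params.view(1, -1, 1, 1)                    # 1x1 dynamic convolution kernel
    logits = F.conv2d(second_feature_map.unsqueeze(0), kernel)  # (1, 1, H, W) initial image
    prob = torch.sigmoid(logits)[0, 0]                          # probability of belonging to the object (sigmoid assumed)
    return (prob >= threshold).float()                          # 0-1 mask map
```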
In a specific implementation scenario, taking N image objects contained in the image to be segmented as an example, the N segmented images can be finally obtained through the above processing. In addition, for the convenience of subsequent processing, each segmented image may be a mask map represented by 0-1 as described above. For a specific processing procedure, reference may be made to the following related description, which is not repeated herein.
In a specific implementation scenario, as described above, take the case where the segmented image of each image object is a 0-1 mask map and, as shown in fig. 2b, different image objects are marked with color blocks of different colors in the panorama segmentation map. The S × S × C tensor (i.e., the category information of the first pixel points in the first feature map) may be upsampled in advance (e.g., by interpolation) to obtain a tensor of the same resolution as the image to be segmented, namely an H × W × C tensor, in which the value at (i, j, k) represents the probability that the image object to which pixel point (i, j) of the image to be segmented belongs is of the k-th category. On this basis, if a pixel point of the image to be segmented has a pixel value of 1 in the segmented image of exactly one image object and 0 in all the others, the pixel point can be marked in the panorama segmentation map with the color corresponding to that image object. Conversely, if a pixel point has a pixel value of 1 in the segmented images of several image objects, the probability values of the categories to which the pixel point belongs can be looked up in the H × W × C tensor, the category corresponding to the maximum probability value can be selected as the category of the image object to which the pixel point belongs, the image object to which the pixel point belongs can then be determined according to this category, and the pixel point is marked in the panorama segmentation map with the color corresponding to that image object. Through the above process, the panorama segmentation map shown in fig. 2b can be obtained from the image to be segmented shown in fig. 2a. Other cases can be deduced by analogy and are not enumerated here.
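For illustration, the following sketch fuses the 0-1 segmented images of the image objects into a panorama segmentation map as described above, assigning pixels covered by exactly one segmented image directly and resolving pixels covered by several segmented images with the upsampled per-pixel category probabilities; representing the result as a per-pixel object index (rather than a color) is an assumption for illustration.

```python
import torch

def fuse_segmented_images(masks, object_categories, class_probs):
    """masks: list of (H, W) 0-1 segmented images; object_categories: category index of each
    image object; class_probs: (C, H, W) per-pixel category probabilities upsampled to the
    resolution of the image to be segmented. Returns an (H, W) map whose value at each pixel
    is the index of the image object the pixel is assigned to (-1 if unassigned)."""
    coverage = torch.stack(masks)                         # (N, H, W)
    count = coverage.sum(dim=0)
    panoptic = torch.full(count.shape, -1, dtype=torch.long)
    single = count == 1                                   # covered by exactly one segmented image
    panoptic[single] = coverage[:, single].argmax(dim=0)
    for y, x in torch.nonzero(count > 1):                 # covered by several segmented images
        candidates = [n for n, m in enumerate(masks) if m[y, x] > 0]
        best = max(candidates, key=lambda n: float(class_probs[object_categories[n], y, x]))
        panoptic[y, x] = best
    return panoptic
```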
In another implementation scenario, different from the foregoing operations of image segmentation and image fusion performed per image object, when the accuracy requirement on panorama segmentation is relatively relaxed, after the second feature map and the position information of each image object are obtained, the position information of one image object may be selected to process the second feature map (for example, when the position information is represented by convolution parameters, the second feature map may be convolved pixel by pixel with the convolution parameters) to obtain the segmented image of that image object; based on this segmented image, the position region of the image object is marked with the image label corresponding to that image object (such as a specific color) on an initial image of the same resolution as the image to be segmented; then the feature information corresponding to this position region is removed from the second feature map, and the position information of the next image object is selected to process the second feature map from which the above feature information has been removed, so as to obtain the segmented image of the next image object and continue marking on the initial image; this is repeated until all image objects have been processed, and the fully marked initial image is taken as the panorama segmentation map of the image to be segmented.
According to the scheme, the first feature map of the image to be segmented is extracted, and the image to be segmented contains image objects of a plurality of classes, wherein the image objects comprise at least one of instances and backgrounds. On the basis, based on the first feature map, the category information and the position information of the first pixel points in the first feature map are respectively obtained through prediction, and feature generation is carried out based on the first feature map to obtain the second feature map, so that the position information of each image object is extracted based on the category information and the position information of the first pixel points in the first feature map, and then panoramic segmentation is carried out based on the second feature map and the position information of each image object to obtain a panoramic segmentation map of the image to be segmented. Therefore, the panoramic segmentation can be deployed in real time on the edge device.
Referring to fig. 5, fig. 5 is a schematic flowchart of an embodiment of training the panorama segmentation model. As described in the foregoing disclosed embodiments, the panorama segmentation map may be obtained by performing panorama segmentation on the image to be segmented based on a panorama segmentation model. Further, the panorama segmentation model is trained on sample images; similar to the image to be segmented, a sample image may contain sample objects of several sample categories, the sample image is marked with annotation information, the annotation information includes the sample category to which the sample object to which each sample pixel point in the sample image belongs actually belongs, the sample objects include at least one of an instance and a background, and the panorama segmentation model is obtained through joint training based on a class prediction loss and an image segmentation loss. Specifically, the method may include the following steps:
step S51: and extracting a first sample characteristic diagram of the sample image, and respectively obtaining the sample class to which the sample object actually belongs and the sample segmentation image of each sample object, wherein each first sample pixel point in the first sample characteristic diagram belongs, based on the labeling information.
In an implementation scenario, the process of extracting the first sample feature map of the sample image may refer to the related description of "extracting the first feature map of the image to be segmented" in the foregoing disclosure embodiment, and is not described herein again.
In one implementation scenario, the sample image may be scaled to the same resolution as the first sample feature map, so that, for each first sample pixel point in the first sample feature map, the sample pixel point corresponding to it in the sample image can be determined, and the sample category annotated for that corresponding sample pixel point can be taken as the sample category to which the sample object to which the first sample pixel point belongs actually belongs. Illustratively, as described above, the sample image is marked with annotation information, and the annotation information includes the sample category to which the sample object to which each sample pixel point in the sample image belongs. Taking the height and width of the sample image as H1 and W1 respectively, and the total number of the several sample categories as C, the resolution of the first sample feature map may be denoted as H2 × W2 for convenience of description, and the annotation information may be denoted as an H1 × W1 × C tensor. It should be noted that, in the H1 × W1 × C tensor, the element value at (i, j, k) represents the probability that the sample object to which sample pixel point (i, j) of the sample image belongs actually belongs to the k-th sample category; based on the annotation information, if the sample object to which sample pixel point (i, j) belongs actually belongs to the k-th sample category, the element value at (i, j, k) in the H1 × W1 × C tensor is 1, and otherwise it is 0. On this basis, the tensor can be scaled to the same resolution as the first sample feature map to obtain a new H2 × W2 × C tensor. It should be noted that, in the H2 × W2 × C tensor, the element value at (i, j, k) represents the probability that first sample pixel point (i, j) of the first sample feature map belongs to the k-th sample category; further, for first sample pixel point (i, j), the sample category corresponding to the maximum probability value may be taken as the sample category to which the sample object to which the first sample pixel point belongs actually belongs.
In an implementation scenario, the annotation information may further include sample objects to which each sample pixel point in the sample image belongs. On this basis, for the sample segmentation image of each sample object, if the sample pixel point belongs to the sample object, the pixel value of the sample pixel point on the sample segmentation image of the sample object is set to 1, otherwise, the pixel value is set to 0. By analogy, a sample segmentation image of each sample object can be obtained.
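For illustration, the following sketch prepares the training targets described in the last two paragraphs, assuming the annotation information is available as a one-hot category tensor plus an integer map of sample-object identifiers; these input formats are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prepare_training_targets(class_onehot, object_ids, feat_size):
    """class_onehot: (C, H1, W1) one-hot annotation of sample categories;
    object_ids: (H1, W1) integer map of the sample object each sample pixel point belongs to;
    feat_size: (H2, W2) resolution of the first sample feature map.
    Returns the per-pixel ground-truth sample category at the first-sample-feature-map
    resolution and a 0-1 sample segmentation image for every sample object."""
    scaled = F.interpolate(class_onehot.unsqueeze(0).float(), size=feat_size,
                           mode="nearest")[0]                       # (C, H2, W2), stays one-hot
    gt_category = scaled.argmax(dim=0)                              # (H2, W2) sample category per first sample pixel
    gt_masks = {int(obj): (object_ids == obj).float()               # one 0-1 mask per sample object
                for obj in object_ids.unique().tolist()}
    return gt_category, gt_masks
```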
Step S52: and respectively predicting to obtain the sample category to which the sample object prediction belongs and the prediction position information of each first sample pixel point in the first sample characteristic diagram based on the first sample characteristic diagram, and performing characteristic generation based on the first sample characteristic diagram to obtain a second sample characteristic diagram.
In an implementation scenario, the specific processes of category prediction, parameter prediction, and feature generation may refer to the relevant descriptions in the foregoing disclosed embodiments, and are not described herein again.
Step S53: and extracting the predicted position information of each sample object based on the labeling information and the predicted position information of each first sample pixel point.
Specifically, as described above, the annotation information may further include the sample object to which each sample pixel point in the sample image belongs; then, referring to the implementation of the foregoing step of obtaining, based on the annotation information, the sample category to which each first sample pixel point's sample object actually belongs, the sample object to which each first sample pixel point in the first sample feature map belongs can be obtained. Specifically, the sample image may be scaled to the same resolution as the first sample feature map, so that, for each first sample pixel point in the first sample feature map, the sample pixel point corresponding to it in the sample image can be determined, and the sample object to which that corresponding sample pixel point belongs can be taken as the sample object to which the first sample pixel point belongs. On this basis, the connected component formed by first sample pixel points belonging to the same sample object in the first sample feature map may be taken as the sample image region of that sample object; based on the sample center of gravity of the sample image region of the sample object, the first sample pixel point located at that sample center of gravity in the first sample feature map is determined as a target sample pixel point, and the predicted position information of the target sample pixel point is taken as the predicted position information of the sample object. For details, reference may be made to the foregoing description of "extracting the position information of an image object", which is not repeated here.
Step S54: and respectively processing the second sample characteristic map based on the predicted position information of each sample object to obtain a predicted segmentation image of each sample object.
Specifically, reference may be made to the implementation process of the step "processing the second feature maps respectively based on the position information of each image object to obtain the segmented images of each image object" in the foregoing disclosed embodiment. Furthermore, as described in the foregoing disclosure, in the case of expressing the prediction position information by using the prediction convolution parameter, for each sample object, the pixel-by-pixel convolution may be performed on each second sample pixel point in the second sample feature map based on the prediction convolution parameter of the sample object, so as to obtain the prediction segmented image of the sample object.
Step S55: and obtaining a class prediction loss based on the difference between the sample class to which the sample object to which the first sample pixel point belongs actually belongs and the sample class to which the prediction belongs, and obtaining an image segmentation loss based on the difference between the sample segmentation image of the sample object and the prediction segmentation image.
Specifically, a loss function such as a cross-entropy loss or a Focal loss may be used to measure the difference between the sample category to which the sample object to which a first sample pixel point belongs actually belongs and the sample category to which it is predicted to belong, so as to obtain the class prediction loss. In addition, a loss function such as a cross-entropy loss or a Dice loss may be used to measure the difference between the sample segmentation image and the predicted segmentation image of a sample object, so as to obtain the image segmentation loss. For convenience of description, the class prediction loss may be denoted as L_cls and the image segmentation loss may be denoted as L_mask.
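For illustration, the following sketch computes the two losses, using cross-entropy for the class prediction loss L_cls and a soft Dice loss for the image segmentation loss L_mask (Focal loss could equally be substituted for the former, as noted above); the tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred_prob, target, eps=1.0):
    """Soft Dice loss between a predicted probability map and a 0-1 target mask."""
    inter = (pred_prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred_prob.sum() + target.sum() + eps)

def panoptic_losses(category_logits, gt_category, pred_mask_logits, gt_masks):
    """category_logits: (C, H2, W2); gt_category: (H2, W2) long tensor;
    pred_mask_logits / gt_masks: lists of (H, W) maps, one per sample object."""
    # class prediction loss L_cls (cross-entropy here; Focal loss is an alternative)
    l_cls = F.cross_entropy(category_logits.unsqueeze(0), gt_category.unsqueeze(0))
    # image segmentation loss L_mask, averaged over the sample objects
    l_mask = torch.stack([dice_loss(torch.sigmoid(p), g)
                          for p, g in zip(pred_mask_logits, gt_masks)]).mean()
    return l_cls, l_mask
```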
Step S56: and adjusting network parameters of the panoramic segmentation model based on the category prediction loss and the image segmentation loss.
Specifically, the category prediction loss and the image segmentation loss may be weighted and summed to obtain a total loss of the panorama segmentation model, and the network parameter of the panorama segmentation model may be adjusted based on the total loss. For example, based on the total loss, an optimization manner such as gradient descent may be adopted to adjust network parameters of the panorama segmentation model, and for a specific adjustment process, reference may be made to technical details of the optimization manner such as gradient descent, which is not described herein again.
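For illustration, the following sketch performs one training step: the class prediction loss and the image segmentation loss are combined by a weighted sum and the network parameters are updated by gradient descent; the loss weights and the use of SGD are assumptions, and panoptic_losses refers to the sketch above.

```python
import torch

def training_step(optimizer, outputs, targets, w_cls=1.0, w_mask=1.0):
    """One joint-training update: total loss = w_cls * L_cls + w_mask * L_mask,
    followed by a gradient-descent update of the panorama segmentation model."""
    category_logits, pred_mask_logits = outputs
    gt_category, gt_masks = targets
    l_cls, l_mask = panoptic_losses(category_logits, gt_category, pred_mask_logits, gt_masks)
    total = w_cls * l_cls + w_mask * l_mask          # weighted sum of the two losses
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return float(total)

# Example usage (assumed names): optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```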
According to the above scheme, the first sample feature map of the sample image is extracted, and the sample category to which the sample object to which each first sample pixel point belongs actually belongs and the sample segmentation image of each sample object are obtained based on the annotation information. On this basis, the sample category to which each first sample pixel point's sample object is predicted to belong and the predicted position information of each first sample pixel point are respectively predicted based on the first sample feature map, and feature generation is performed based on the first sample feature map to obtain the second sample feature map; the predicted position information of each sample object is then extracted based on the annotation information and the predicted position information of the first sample pixel points, and the second sample feature map is processed based on the predicted position information of each sample object to obtain the predicted segmentation image of each sample object. The class prediction loss is obtained based on the difference between the sample category to which the sample object to which a first sample pixel point belongs actually belongs and the sample category to which it is predicted to belong, the image segmentation loss is obtained based on the difference between the sample segmentation image and the predicted segmentation image of a sample object, and the network parameters of the panorama segmentation model are then adjusted based on the class prediction loss and the image segmentation loss. In this way, the class prediction loss and the image segmentation loss jointly constrain the panorama segmentation model to perform feature extraction as accurately as possible, the class prediction loss constrains the model to perform category prediction as accurately as possible, and the image segmentation loss constrains the model to perform parameter prediction and feature generation as accurately as possible, so that the model accuracy of the panorama segmentation model can be improved through joint training.
Referring to fig. 6, fig. 6 is a schematic frame diagram of a panorama segmentation apparatus 60 according to an embodiment of the present disclosure. The panorama segmentation apparatus 60 includes a feature extraction module 61, an information prediction module 62, a feature generation module 63, an information extraction module 64 and an image segmentation module 65. The feature extraction module 61 is configured to extract a first feature map of an image to be segmented, where the image to be segmented contains image objects of several categories and the image objects include at least one of instances and backgrounds; the information prediction module 62 is configured to respectively predict category information and position information of a first pixel point in the first feature map based on the first feature map; the feature generation module 63 is configured to perform feature generation based on the first feature map to obtain a second feature map; the information extraction module 64 is configured to extract position information of each image object based on the category information and the position information of the first pixel point in the first feature map; and the image segmentation module 65 is configured to perform panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented.
According to the above scheme, on one hand, the single-flow end-to-end technical system effectively avoids the problems caused by manually designed heuristic post-processing, which is beneficial to improving the panorama segmentation effect; on the other hand, operators with high computational complexity such as transformers are not required in the panorama segmentation process, which greatly reduces the computing power, storage and other resources consumed by deploying panorama segmentation. Therefore, panorama segmentation can be deployed on edge devices in real time.
In some disclosed embodiments, the information extraction module 64 includes a region determination sub-module, configured to determine an image region of each image object in the first feature map based on the category information and the position information of the first pixel point in the first feature map; the information extraction module 64 includes a center-of-gravity determination submodule configured to determine, based on a center-of-gravity position of an image region of the image object, a first pixel point at the center-of-gravity position in the first feature map as a target pixel point; the information extraction module 64 includes an information acquisition sub-module, which is used to extract the position information of the target pixel point as the position information of the image object.
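A minimal sketch of such centre-of-gravity extraction is given below (the tensor layouts and the rounding of the centroid to the nearest first pixel point are assumptions for illustration):

```python
import torch

def extract_object_position(region_mask, position_map):
    # region_mask: (H, W) boolean mask of one image object's region in the first feature map
    # position_map: (C, H, W) predicted position information of every first pixel point
    ys, xs = torch.nonzero(region_mask, as_tuple=True)
    cy = int(ys.float().mean().round())   # centre-of-gravity row
    cx = int(xs.float().mean().round())   # centre-of-gravity column
    # the first pixel point at the centre-of-gravity position is taken as the target pixel point
    return position_map[:, cy, cx]        # its position information represents the image object
```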
In some disclosed embodiments, the region determining submodule is specifically configured to perform non-maximum suppression based on category information of a first pixel point in the first feature map, so as to obtain an image region of each image object in the first feature map and a category to which each image object belongs; the panoramic division map is marked with the category to which each image object belongs.
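One possible reading of this non-maximum suppression step is sketched below: each candidate region carries a class score derived from the category information, and heavily overlapping candidates are suppressed (the mask-IoU criterion and the 0.5 threshold are assumptions, not values from the text):

```python
import torch

def region_nms(cand_masks, cand_scores, cand_classes, iou_thr=0.5):
    # cand_masks: (N, H, W) binary candidate regions; cand_scores / cand_classes: (N,)
    keep = []
    for i in cand_scores.argsort(descending=True).tolist():
        m_i = cand_masks[i].bool()
        overlaps = False
        for j in keep:
            m_j = cand_masks[j].bool()
            inter = (m_i & m_j).sum().float()
            union = (m_i | m_j).sum().float().clamp(min=1.0)
            if inter / union > iou_thr:
                overlaps = True
                break
        if not overlaps:
            keep.append(i)
    # remaining image regions and the categories to which the corresponding image objects belong
    return [(cand_masks[i], int(cand_classes[i])) for i in keep]
```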
In some disclosed embodiments, the image segmentation module 65 includes a feature processing sub-module configured to process the second feature map based on the position information of each image object, respectively, to obtain a segmented image of each image object, and the image segmentation module 65 includes an image fusion sub-module configured to perform fusion based on the segmented images of each image object, to obtain a panoramic segmentation map of the image to be segmented.
In some disclosed embodiments, the position information includes convolution parameters, and the segmented image of an image object is obtained by performing a pixel-by-pixel convolution on each second pixel point in the second feature map with the convolution parameters of that image object.
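A minimal sketch of this dynamic, per-object convolution and of the subsequent fusion into a panorama segmentation map is given below (the 1x1 kernel size, the channel layout and the pixel-assignment rule used for fusion are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def segment_objects(second_feat, conv_params):
    # second_feat: (C, H, W) second feature map
    # conv_params: (num_objects, C) convolution parameters read out as position information
    weight = conv_params.view(conv_params.size(0), -1, 1, 1)   # (num_objects, C, 1, 1)
    logits = F.conv2d(second_feat.unsqueeze(0), weight)        # pixel-by-pixel convolution
    return logits.squeeze(0).sigmoid()                         # (num_objects, H, W) segmented images

def fuse_panoptic(masks, score_thr=0.5):
    # Assign every pixel to the image object whose segmented image responds most strongly,
    # leaving pixels below the threshold unassigned (-1).
    scores, obj_id = masks.max(dim=0)
    return torch.where(scores > score_thr, obj_id, torch.full_like(obj_id, -1))
```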
In some disclosed embodiments, the panorama segmentation graph is obtained by performing panorama segmentation on an image to be segmented based on a panorama segmentation model, the panorama segmentation model is obtained by training based on a sample image, the sample image contains sample objects of a plurality of sample categories, the sample image is marked with labeling information, the labeling information includes sample categories to which the sample objects belong respectively, the sample objects include at least one of instances and backgrounds, and the panorama segmentation model is obtained by performing joint training based on category prediction loss and image segmentation loss.
In some disclosed embodiments, the panorama segmentation apparatus 60 includes a sample feature extraction module, configured to extract a first sample feature map of the sample image, and the panorama segmentation apparatus 60 includes a sample information acquisition module, configured to obtain, based on the annotation information, a sample class to which each first sample pixel point in the first sample feature map belongs and a sample segmentation image of each sample object; the panorama segmentation device 60 includes a sample information prediction module, configured to respectively predict, based on the first sample feature map, sample categories to which sample objects to which first sample pixels belong in the first sample feature map are respectively predicted and sample position information of the first sample pixels; the panorama segmentation apparatus 60 includes a sample feature generation module, configured to perform feature generation based on the first sample feature map to obtain a second sample feature map; the panorama segmentation apparatus 60 includes a prediction information extraction module for extracting prediction position information of each sample object based on the labeling information and the prediction position information of each first sample pixel point; the panorama segmentation apparatus 60 includes a prediction segmentation acquisition module configured to process the second sample feature map based on the prediction position information of each sample object, respectively, to obtain a prediction segmentation image of each sample object; the panorama segmentation apparatus 60 includes a category loss measurement module, configured to obtain a category prediction loss based on a difference between a sample category to which a sample object to which a first sample pixel belongs actually belongs and a sample category to which prediction belongs; the panorama segmenting apparatus 60 includes a segmentation loss metric module for deriving an image segmentation loss based on a difference between a sample segmented image of a sample object and a predicted segmented image; the panorama segmentation apparatus 60 includes a network parameter adjustment module for adjusting a network parameter of the panorama segmentation model based on the class prediction loss and the image segmentation loss.
In some disclosed embodiments, the panorama segmentation map is obtained by performing panorama segmentation on an image to be segmented based on a panorama segmentation model, where the panorama segmentation model includes a category prediction network, a location prediction network, and a feature generation network, the category prediction network is used for predicting category information, the location prediction network is used for predicting location information, and the feature generation network is used for performing feature generation.
In some disclosed embodiments, the first feature map contains feature information of the image to be segmented at multiple scales; the feature extraction module 61 comprises a multi-scale feature extraction submodule and is used for extracting a third feature map with multiple scales based on an image to be segmented; the feature extraction module 61 includes a multi-scale feature fusion submodule, and is configured to fuse the third feature maps based on multiple scales to obtain the first feature map.
In some disclosed embodiments, the panorama segmentation graph is obtained by performing panorama segmentation on an image to be segmented based on a panorama segmentation model, wherein the panorama segmentation model comprises a backbone network and a fusion network, and the backbone network comprises a plurality of feature extraction sub-networks which are sequentially connected; the feature extraction sub-networks are respectively used for extracting third feature maps with different scales, and the fusion network is used for fusing the third feature maps with multiple scales to obtain the first feature map.
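A minimal sketch of such a backbone with sequentially connected feature extraction sub-networks and a fusion network is given below (the number of stages, the channel widths and the bilinear-sum fusion are illustrative assumptions rather than the patented structure):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackboneWithFusion(nn.Module):
    def __init__(self, out_ch=64):
        super().__init__()
        # three sequentially connected feature extraction sub-networks (different scales)
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        # lateral 1x1 convolutions used by the fusion network
        self.lat = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in (32, 64, 128)])

    def forward(self, img):
        c1 = self.stage1(img)                 # third feature maps of multiple scales
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        feats = [l(c) for l, c in zip(self.lat, (c1, c2, c3))]
        size = feats[0].shape[-2:]
        fused = sum(F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                    for f in feats)
        return fused                          # first feature map with multi-scale information
```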
Referring to fig. 7, fig. 7 is a schematic frame diagram of an embodiment of an electronic device 70 of the present application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, wherein the memory 71 stores program instructions, and the processor 72 is configured to execute the program instructions to implement the steps in any of the embodiments of the panorama segmentation method described above. Specifically, the electronic device 70 may be an edge device, such as may include but is not limited to: a sweeping robot, a reading robot, etc., without limitation.
In particular, the processor 72 is configured to control itself and the memory 71 to implement the steps in any of the above embodiments of the panorama segmentation method. The processor 72 may also be referred to as a CPU (Central Processing Unit). The processor 72 may be an integrated circuit chip having signal processing capabilities. The processor 72 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 72 may be implemented jointly by a plurality of integrated circuit chips.
According to the above scheme, on one hand, the single-flow end-to-end technical system effectively avoids the problems caused by manually designed heuristic post-processing, which is beneficial to improving the panorama segmentation effect; on the other hand, operators with high computational complexity such as transformers are not required in the panorama segmentation process, which greatly reduces the computing power, storage and other resources consumed by deploying panorama segmentation. Therefore, panorama segmentation can be deployed on edge devices in real time.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer readable storage medium 80 according to the present application. The computer readable storage medium 80 stores program instructions 81 executable by the processor, the program instructions 81 being for implementing the steps in any of the panorama segmentation method embodiments described above.
According to the above scheme, on one hand, the single-flow end-to-end technical system effectively avoids the problems caused by manually designed heuristic post-processing, which is beneficial to improving the panorama segmentation effect; on the other hand, operators with high computational complexity such as transformers are not required in the panorama segmentation process, which greatly reduces the computing power, storage and other resources consumed by deploying panorama segmentation. Therefore, panorama segmentation can be deployed on edge devices in real time.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
If the technical solution of the present application involves personal information, a product applying the technical solution of the present application clearly informs the individual of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solution of the present application involves sensitive personal information, a product applying the technical solution of the present application obtains the individual's separate consent before processing the sensitive personal information and, at the same time, satisfies the requirement of "express consent". For example, on a personal information collection device such as a camera, a clear and conspicuous sign is set to inform that the device has entered the personal information collection range and that personal information will be collected; if an individual voluntarily enters the collection range, the individual is deemed to have consented to the collection of his or her personal information. Alternatively, on a device that processes personal information, personal authorization is obtained, while the personal information processing rules are notified with conspicuous signs or information, by means such as a pop-up window or by asking the individual to upload his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of personal information processing, the processing method, and the types of personal information to be processed.
Claims (13)
1. A panorama segmentation method, comprising:
extracting a first feature map of an image to be segmented; wherein the image to be segmented contains image objects of several classes, and the image objects comprise at least one of instances and backgrounds;
respectively predicting to obtain category information and position information of a first pixel point in the first feature map based on the first feature map, and performing feature generation based on the first feature map to obtain a second feature map;
extracting the position information of each image object based on the category information and the position information of the first pixel point;
and performing panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented.
2. The method according to claim 1, wherein the extracting the position information of each image object based on the category information and the position information of the first pixel point comprises:
determining an image area of each image object in the first feature map based on the category information of the first pixel point;
determining a first pixel point at the gravity center position in the first feature map as a target pixel point based on the gravity center position of the image area of the image object;
and extracting the position information of the target pixel point as the position information of the image object.
3. The method according to claim 2, wherein the determining an image area of each image object in the first feature map based on the category information of the first pixel point comprises:
performing non-maximum suppression based on the category information of the first pixel point to obtain an image area of each image object and a category to which each image object belongs in the first feature map;
and marking the category to which each image object belongs in the panorama segmentation map.
4. The method according to claim 1, wherein the performing panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented comprises:
respectively processing the second feature map based on the position information of each image object to obtain a segmented image of each image object;
and performing fusion based on the segmented images of the image objects to obtain the panorama segmentation map of the image to be segmented.
5. The method of claim 4, wherein the position information includes convolution parameters, and the segmented image of the image object is obtained by performing a pixel-by-pixel convolution on each second pixel point in the second feature map according to the convolution parameters of the image object.
6. The method according to claim 1, wherein the panorama segmentation map is obtained by performing panorama segmentation on the image to be segmented based on a panorama segmentation model, the panorama segmentation model is obtained by training based on a sample image, the sample image contains sample objects of a plurality of sample classes, the sample image is labeled with labeling information, the labeling information includes sample classes to which the sample objects respectively belong, the sample objects include at least one of instances and backgrounds, and the panorama segmentation model is obtained by joint training based on class prediction loss and image segmentation loss.
7. The method of claim 6, wherein the step of training the panorama segmentation model comprises:
extracting a first sample feature map of the sample image, and obtaining, based on the labeling information, the sample class actually attributed to the sample object to which each first sample pixel point in the first sample feature map belongs and the sample segmentation image of each sample object;
respectively predicting, based on the first sample feature map, the sample class to which the sample object to which each first sample pixel point belongs is predicted to belong and the predicted position information of each first sample pixel point, and performing feature generation based on the first sample feature map to obtain a second sample feature map;
extracting the predicted position information of each sample object based on the labeling information and the predicted position information of each first sample pixel point;
respectively processing the second sample feature map based on the predicted position information of each sample object to obtain a predicted segmentation image of each sample object;
obtaining a class prediction loss based on the difference between the sample class to which the sample object to which the first sample pixel point belongs actually belongs and the sample class to which prediction belongs, and obtaining an image segmentation loss based on the difference between the sample segmentation image of the sample object and the prediction segmentation image;
adjusting network parameters of the panorama segmentation model based on the class prediction loss and the image segmentation loss.
8. The method of claim 1, wherein the panorama segmentation map is obtained by performing panorama segmentation on the image to be segmented based on a panorama segmentation model, and wherein the panorama segmentation model comprises a category prediction network, a location prediction network, and a feature generation network, the category prediction network is used for predicting the category information, the location prediction network is used for predicting the location information, and the feature generation network is used for performing the feature generation.
9. The method according to claim 1, wherein the first feature map contains feature information of the image to be segmented at multiple scales; the extracting of the first feature map of the image to be segmented comprises the following steps:
extracting to obtain third feature maps of multiple scales based on the image to be segmented;
and fusing the third feature maps based on the multiple scales to obtain the first feature map.
10. The method according to claim 9, wherein the panorama segmentation map is obtained by performing panorama segmentation on the image to be segmented based on a panorama segmentation model, the panorama segmentation model comprises a backbone network and a fusion network, and the backbone network comprises a plurality of feature extraction sub-networks which are sequentially connected;
the feature extraction sub-networks are respectively used for extracting third feature maps with different scales, and the fusion network is used for fusing the third feature maps with multiple scales to obtain the first feature map.
11. A panorama segmentation apparatus, characterized by comprising:
the feature extraction module is used for extracting a first feature map of the image to be segmented; wherein the image to be segmented contains image objects of several classes, and the image objects comprise at least one of instances and backgrounds;
the information prediction module is used for respectively predicting to obtain category information and position information of a first pixel point in the first feature map based on the first feature map;
the feature generation module is used for performing feature generation based on the first feature map to obtain a second feature map;
the information extraction module is used for extracting the position information of each image object based on the category information and the position information of the first pixel point;
and the image segmentation module is used for performing panorama segmentation based on the second feature map and the position information of each image object to obtain a panorama segmentation map of the image to be segmented.
12. An electronic device comprising a memory and a processor coupled to each other, the memory having stored therein program instructions, the processor being configured to execute the program instructions to implement the panorama segmentation method of any one of claims 1-10.
13. A computer-readable storage medium, characterized in that program instructions executable by a processor for implementing the panorama segmentation method of any one of claims 1 to 10 are stored.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210945482.2A CN115424264A (en) | 2022-08-08 | 2022-08-08 | Panorama segmentation method, related device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115424264A true CN115424264A (en) | 2022-12-02 |
Family
ID=84196837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210945482.2A Pending CN115424264A (en) | 2022-08-08 | 2022-08-08 | Panorama segmentation method, related device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115424264A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115661821A (en) * | 2022-12-22 | 2023-01-31 | 摩尔线程智能科技(北京)有限责任公司 | Loop detection method, loop detection device, electronic apparatus, storage medium, and program product |
CN115661821B (en) * | 2022-12-22 | 2023-04-11 | 摩尔线程智能科技(北京)有限责任公司 | Loop detection method, loop detection device, electronic apparatus, storage medium, and program product |
CN115908442A (en) * | 2023-01-06 | 2023-04-04 | 山东巍然智能科技有限公司 | Image panorama segmentation method for unmanned aerial vehicle ocean monitoring and model building method |
CN115908442B (en) * | 2023-01-06 | 2023-05-12 | 山东巍然智能科技有限公司 | Image panorama segmentation method and model building method for unmanned aerial vehicle ocean monitoring |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |