
CN118351589B - Image processing method, apparatus, device, storage medium, and program product - Google Patents

Image processing method, apparatus, device, storage medium, and program product

Info

Publication number
CN118351589B
CN118351589B
Authority
CN
China
Prior art keywords
feature map
data
radius
processing
eyeball image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410751020.6A
Other languages
Chinese (zh)
Other versions
CN118351589A
Inventor
杨扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Huoyan Medical Technology Co ltd
Original Assignee
Hunan Huoyan Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Huoyan Medical Technology Co ltd
Priority to CN202410751020.6A
Publication of CN118351589A
Application granted
Publication of CN118351589B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/197Matching; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses an image processing method, an image processing device, a storage medium, and a program product. The method comprises the following steps: acquiring eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data. By averaging the first feature map and the second feature map with the same or similar spatial resolution, the invention makes full use of the information of both feature maps and reduces the influence of noise or outliers that may arise in the fusion process.

Description

Image processing method, apparatus, device, storage medium, and program product
Technical Field
The present invention relates to the field of image processing technology, and in particular, to an image processing method, apparatus, device, storage medium, and program product.
Background
In recent years, prolonged use of electronic devices has kept the eyes in a highly strained state, leading to a rising prevalence of myopia; it is therefore important to detect and accurately measure myopia symptoms at an early stage.
Research has shown that the eye condition of a patient can be better understood through accurate measurement of key features such as the eyeball, cornea, and pupil. In early techniques, the eye condition of a patient was determined by measuring such key features with methods such as gray-level projection, ellipse fitting, and radial symmetry transform algorithms.
However, early techniques handled noise and interference in images relatively poorly, and therefore could not accurately locate the pupil under complex background or lighting conditions. Moreover, owing to the design principles of these algorithms, their parameters often need to be set manually for specific scenes, which may introduce subjective factors and makes it difficult to maintain stable performance in different environments.
To solve the problems of the early techniques, various features in an eye image, such as the pupil, iris, and fundus, can now be analyzed automatically through the application of artificial intelligence to image processing. Specifically, an ENet semantic segmentation network model is adopted to accurately detect eye features, precisely segment the eyeball structure, and support early diagnosis of eye diseases.
However, in the image processing process, the fusion strategy of the ENet semantic segmentation network model (Efficient Neural Network for Real-Time Semantic Segmentation) is to add feature maps together; if two feature maps differ greatly in semantics, directly adding and fusing them can cause feature conflicts.
Disclosure of Invention
In view of the above, the present invention provides an image processing method, apparatus, device, storage medium and program product, so as to solve the problem that feature conflicts may be caused by directly performing additive fusion when two feature graphs have a large difference in semantics.
In a first aspect, the present invention provides an image processing method, the method comprising: acquiring eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
According to the image processing method provided by this embodiment, averaging the first feature map and the second feature map, which have the same or similar spatial resolution, makes full use of the information of both feature maps and reduces the influence of noise or outliers that may arise in the fusion process; because multiple feature maps are averaged, even if one feature map contains an anomaly or noise, its influence is averaged out by the other feature map, reducing the impact on the overall fusion result.
In addition, by collecting various types of eyeball image data, namely iris data, pupil data and target ring data, more abundant information can be obtained, so that the physiological state and characteristics of the eyeball can be more comprehensively known.
In an optional embodiment, the processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data includes: classifying the third feature map, determining the category of eyeball image data, positioning the third feature map, and determining the circle center position and radius size information of the eyeball image data; and using the category, the circle center position and the radius size information as semantic segmentation results.
According to the image processing method provided by the embodiment, different areas and features in the eyeball image can be accurately distinguished through classification processing of the third feature image, and the circle center position and radius size information of the key area in the eyeball image can be determined through positioning processing of the third feature image, so that the effect of finely identifying the eyeball image information is achieved.
In an optional embodiment, the positioning processing is performed on the third feature map to determine the center position and radius size information of the eyeball image data, including: respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data; determining a target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius; determining a target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius; radius size information is determined based on the value of the radius and the radius offset.
According to the image processing method provided by the embodiment, the circle center positions of the pupil, the iris and the target ring can be very accurately determined by calculating the values of the circle center abscissa, the circle center ordinate and the radius, so that the eyeball structure can be highly accurately determined or the eyeball state can be estimated.
And the specific position and size of the eyeball characteristics can be intuitively known by calculating the target abscissa, the target ordinate and the radius size information through the values of the circle center abscissa, the circle center ordinate and the radius, the abscissa offset, the ordinate offset and the radius offset. And because of considering various factors (circle center coordinates, radius values and offset), the method can still keep better robustness when processing complex or blurred eyeball images. Relatively accurate positioning results can be obtained even in cases where the image quality is poor or the features are not apparent.
In an alternative embodiment, the center position and radius size information is determined by the following formula:
x_t = x_a + Δx · r_a; y_t = y_a + Δy · r_a; r_t = r_a · e^(Δr); wherein r_t is the radius size information, r_a is the value of the radius, Δr is the radius offset, y_a is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_a is the circle center abscissa, and Δx is the abscissa offset.
In an optional embodiment, the processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image includes: carrying out sliding window processing on the third feature map to obtain a candidate region; the candidate areas are composed of vectors corresponding to the anchor points; normalizing the candidate region to obtain a target feature map; and processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
According to the image processing method provided by the embodiment, the whole feature map can be traversed through sliding window processing, and then the candidate region is extracted through the preset anchor points, so that the key region in the eyeball image can be effectively captured.
In addition, the normalization processing enables the pixel value distribution of the candidate region to be more uniform, and is beneficial to reducing the difference between different images, so that the accuracy of subsequent semantic segmentation is improved. Through normalization, the model can learn the characteristics in the image better, and then judge the category of each pixel more accurately.
In an alternative embodiment, the method further comprises: performing pixel segmentation processing on the third feature map to obtain pixel points corresponding to iris data, pupil data and target ring data; and respectively drawing binary mask graphs corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
The image processing method provided in this embodiment can accurately distinguish the iris, pupil and target ring areas in the third feature map from the background or other irrelevant areas through the pixel segmentation process, and simplify the complex image data into an image containing only two values (usually 0 and 1) by generating a binary mask map, so as to display the positions and shapes of the iris, pupil and target ring in the image in an intuitive manner.
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model comprises the following steps: backbone network and regional recommendation network, the backbone network includes: an encoder structure and a decoder structure;
The eyeball image data is the input of the encoder structure; the first feature map is the output of the encoder structure and a partial input of the t-th decoder structure; the second feature map is the output of the (t-1)-th decoder structure and the remaining input of the t-th decoder structure; the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, where t is an integer greater than 1; and the semantic segmentation result is the output of the region recommendation network.
According to the image processing method provided by the embodiment, the complex eyeball image data can be automatically converted into the structural feature map and the final semantic segmentation result through the pre-trained network model, and feature extraction and classification rules are not required to be manually designed, so that the processing efficiency and accuracy are greatly improved. And, the encoder structure can extract deep features from eyeball images through convolution, pooling and other operations, and the features are critical to subsequent segmentation tasks. Meanwhile, the decoder structure is responsible for decoding the deep features into feature maps with the same size as the original image, and the spatial resolution of the features is maintained.
Furthermore, the network is able to fuse feature information of different scales by means of a plurality of decoder structures (t > 1). The multi-scale feature fusion is helpful for improving the segmentation performance of the model on targets with different scales, and is particularly important when processing irises, pupils and target rings with different sizes in eyeball images.
In addition, the region recommendation network can accurately recommend a region where a target may exist according to the feature map extracted by the encoder and decoder structures. This helps reduce the computational effort of subsequent processing, increases overall processing speed, and enables models to be more focused on processing critical areas.
In a second aspect, the present invention provides an image processing apparatus comprising: an acquisition module, configured to acquire eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; a first processing module, configured to process the eyeball image to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and a second processing module, configured to process the third feature map to obtain a semantic segmentation result corresponding to the eyeball image.
In a third aspect, the present invention provides a computer device comprising: the image processing device comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so that the image processing method of the first aspect or any corresponding implementation mode of the first aspect is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the image processing method of the first aspect or any of its corresponding embodiments.
In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the image processing method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another image processing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a specific network architecture of a backbone network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a semantic segmentation network model implementing eye image multitasking learning according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a specific network architecture of a regional recommendation network according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of eye image detection based on a regional recommendation network, according to an embodiment of the present invention;
fig. 7 is a block diagram of the structure of an image processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Based on the related art, at present, various features in an eye image, such as a pupil, an iris, a fundus, and the like, can be automatically analyzed through the application of artificial intelligence in image processing. Specifically, the adopted ene semantic segmentation network model is used for accurately detecting the characteristics of eyes, accurately segmenting the eyeball structure and diagnosing the eye diseases at early stage.
However, in the image processing process of the ene semantic segmentation network model, the fusion strategy is a mode of adding feature graphs, and if two feature graphs have large difference in semantics, feature conflicts can be caused by direct addition and fusion.
Based on this, the present invention provides an image processing method, by averaging a first feature map and a second feature map having the same or similar spatial resolution, information of both feature maps can be fully utilized, and influence of noise or outlier which may occur in a fusion process can be reduced, and since a plurality of feature maps are averaged, even if there is abnormality or noise in a certain feature map, influence thereof is averaged by other feature maps, thereby reducing influence on the whole fusion result.
According to an embodiment of the present invention, there is provided an image processing method embodiment, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
In this embodiment, an image processing method is provided, which may be used in a computer device, such as a computer, a server, etc., and fig. 1 is a schematic flow chart of the image processing method according to an embodiment of the present invention, as shown in fig. 1, the flow chart includes the following steps:
step S101, eyeball image data is obtained; wherein the eyeball image data includes: iris data, pupil data, and target ring data.
Eyeball image data is used to characterize various characteristics and states of the eyeball. The eyeball image data includes: iris data, pupil data, and target ring data. Wherein the iris is a circular ring part between the black pupil and the white sclera, and the iris data comprises a plurality of mutually staggered spots, filaments, crowns, stripes, recesses and other detailed features; pupil data is used to characterize the size of pupil, etc., which can reflect various physiological and psychological states of the human body. For example, changes in the diameter of the pupil may reflect vital signs of the human body, including changes in blood pressure, heart rate, and respiration, among others. The target ring data is used to characterize a specific region or marker in the eye image, which is used to mark the iris.
Specifically, the eyeball image data may be acquired by a camera capturing infrared light or light with a specific wavelength, or may be acquired by other acquisition devices, which is not limited herein.
Step S102, processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging.
Specifically, an image processing algorithm or a neural network model may be used to generate the first feature map and the second feature map to ensure that the first image and the second image have the same or similar spatial resolution for subsequent fusion. For example: the techniques of image segmentation, edge detection, feature extraction, etc., are not particularly limited herein and may be implemented as such by those skilled in the art. The first feature map and the second feature map describe certain specific features of the eyeball image, such as edges, textures or shapes, and the like, and are not particularly limited herein.
The first feature map and the second feature map may be averaged by using a pixel average value, a neural network model, or the like, to obtain a third feature map. The third feature map includes information of the first feature map and the second feature map.
And step S103, processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
The semantic segmentation results may be used to characterize the coordinates, radius, etc. of the iris data, pupil data, and target ring data, without specific limitation herein. Specifically, the third feature map may be processed by a neural network model, edge detection, threshold segmentation, and other manners to obtain a semantic segmentation result corresponding to the eyeball image data, which is not limited herein, and may be implemented by those skilled in the art.
According to the image processing method provided by the embodiment, by averaging the first feature map and the second feature map with the same or similar spatial resolution, the information of the two feature maps can be fully utilized, the influence of noise or abnormal values possibly generated in the fusion process can be reduced, and the influence of the noise or abnormal values is averaged by a plurality of feature maps, so that even if one feature map has abnormality or noise, the influence of the noise or abnormal values is averaged by other feature maps, and the influence on the whole fusion result is reduced.
In addition, by collecting various types of eyeball image data, namely iris data, pupil data and target ring data, more abundant information can be obtained, so that the physiological state and characteristics of the eyeball can be more comprehensively known.
In this embodiment, an image processing method is provided, which may be used in the above-mentioned computer device, such as a computer, a server, etc., and fig. 2 is a schematic flow chart of the image processing method according to an embodiment of the present invention, as shown in fig. 2, where the flow chart includes the following steps:
Step S201, eyeball image data is obtained; wherein the eyeball image data includes: iris data, pupil data, and target ring data. Please refer to step S101 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S202, processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging. Please refer to step S102 in the embodiment shown in fig. 1 for details, which are not repeated here.
And step S203, processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
Specifically, the step S203 includes:
Step S2031, classifying the third feature map to determine the category of the eyeball image data.
The class of eyeball image data may be used to characterize the class of iris, pupil, and target ring. Specifically, the third feature map is processed by a classification algorithm (such as a support vector machine, a random forest, a neural network, etc.) to determine the category of the eyeball image data, which is not particularly limited herein, and may be implemented by those skilled in the art.
Step S2032, performing positioning processing on the third feature map, and determining the center position and radius size information of the eyeball image data.
After determining the category of the eyeball image data, the location and size of a specific structure (such as pupil, iris, etc.) in the image is located. The center position and radius size information may be determined by a neural network, circle or ellipse detection in an image, and the like, which is not particularly limited herein and may be implemented by those skilled in the art.
The step S2032 includes:
And a1, respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data.
The circle center abscissa, the circle center ordinate, and the radius value respectively represent those of the pupil, the iris, and the target ring. The abscissa offset, ordinate offset, and radius offset may be obtained directly by measurement and are used to characterize the deviation of the pupil, iris, and target ring from their nominal center point positions. For example, the offsets describe the deviation between the detected center of the target bounding box and the theoretical or predicted center point. In algorithms such as the object detection algorithm YOLO (You Only Look Once), these offsets are learned so as to better adapt to the actual position of the object in the image.
And a2, determining the target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius.
And a step a3, determining the ordinate of the target of the circle center position based on the ordinate of the circle center, the ordinate offset and the value of the radius.
And a step a4 of determining radius size information based on the value of the radius and the radius offset.
The center position and radius size information is determined by the following formula:
x_t = x_a + Δx · r_a; y_t = y_a + Δy · r_a; r_t = r_a · e^(Δr); wherein r_t is the radius size information, r_a is the value of the radius, Δr is the radius offset, y_a is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_a is the circle center abscissa, and Δx is the abscissa offset.
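For illustration only, the following Python sketch decodes a candidate circle from the quantities defined above; the function and variable names, and the exponential form of the radius update, are assumptions rather than part of the disclosure.

```python
import math

def decode_circle(cx_a, cy_a, r_a, dx, dy, dr):
    """Decode a candidate circle from an anchor (cx_a, cy_a, r_a) and
    predicted offsets (dx, dy, dr), following the formulas above."""
    cx_t = cx_a + dx * r_a      # target abscissa of the circle center
    cy_t = cy_a + dy * r_a      # target ordinate of the circle center
    r_t = r_a * math.exp(dr)    # radius size information
    return cx_t, cy_t, r_t

# Example: a pupil anchor at (128, 96) with radius 20 and small offsets
print(decode_circle(128.0, 96.0, 20.0, 0.05, -0.02, 0.1))
```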
Step S2033, the category, the center position, and the radius size information are used as semantic segmentation results.
The semantic segmentation result consists of pupil data, iris data and target ring data, and corresponding category, circle center position and radius size information.
According to the image processing method provided by the embodiment, different areas and features in the eyeball image can be accurately distinguished through classification processing of the third feature image, and the circle center position and radius size information of the key area in the eyeball image can be determined through positioning processing of the third feature image, so that the effect of finely identifying the eyeball image information is achieved.
In addition, by calculating the values of the circle center abscissa, the circle center ordinate, and the radius, the circle center positions of the pupil, the iris, and the target ring can be determined very accurately, thereby determining the eyeball structure or evaluating the eyeball state with high accuracy.
And the specific position and size of the eyeball characteristics can be intuitively known by calculating the target abscissa, the target ordinate and the radius size information through the values of the circle center abscissa, the circle center ordinate and the radius, the abscissa offset, the ordinate offset and the radius offset. And because of considering various factors (circle center coordinates, radius values and offset), the method can still keep better robustness when processing complex or blurred eyeball images. Relatively accurate positioning results can be obtained even in cases where the image quality is poor or the features are not apparent.
In an optional embodiment, the processing the third feature map in step S104 to obtain the semantic segmentation result corresponding to the eyeball image includes:
Step b1, carrying out sliding window processing on the third feature map to obtain a candidate region; the candidate region is composed of vectors corresponding to the anchor points.
Sliding window processing may be used to perform object detection. Specifically, on a given feature map (third feature map), the sliding window process would place a fixed size window and traverse the entire third feature map (slide the window). At each window position, some action is performed (e.g., applying a pre-trained classifier) to detect whether the region contains an object of interest (which may be some portion of an eyeball).
The result of the sliding window process is a series of candidate regions that contain the detected target. In the object detection task, these candidate regions are typically defined by anchors (anchors), which are preset rectangular boxes of different sizes and proportions.
And b2, carrying out normalization processing on the candidate region to obtain a target feature map.
Normalization processes are used to adjust the data range, and in object detection or semantic segmentation, normalization can ensure that candidate regions of different sizes and locations have similar weights or effects in subsequent processing. Specifically, the candidate region may be scaled to a fixed size so that it may be entered into a fully connected layer or other layer requiring a fixed input size.
The normalized feature map is generally referred to as a target feature map, which contains normalized candidate region information and is used for subsequent classification or segmentation tasks.
And b3, processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
Each pixel in the target feature map is assigned a class label. In the context of an eye image, each pixel of the image is classified as a different portion of the eye (e.g., iris, pupil, etc.).
To obtain semantic segmentation results, algorithms typically use one or more convolutional layers, fully-connected layers, or other types of neural network layers to process the target feature map. These layers may learn how to map normalized candidate regions to corresponding class labels. Finally, the class labels are combined into a segmentation map (segmentation map) of the same size as the input image, with each pixel labeled as its corresponding class.
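As an illustrative sketch of this per-pixel classification step (the layer sizes, channel counts, and the absence of a background class are assumptions, not part of the disclosure):

```python
import torch
import torch.nn as nn

num_classes = 3                                   # iris, pupil, target ring (background handling omitted)
classifier = nn.Conv2d(in_channels=64, out_channels=num_classes, kernel_size=1)

target_feature_map = torch.randn(1, 64, 56, 56)   # normalized candidate region (assumed shape)
logits = classifier(target_feature_map)           # (1, num_classes, 56, 56) per-pixel class scores
segmentation_map = logits.argmax(dim=1)           # (1, 56, 56) class label per pixel
```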
According to the image processing method provided by the embodiment, the whole feature map can be traversed through sliding window processing, and then the candidate region is extracted through the preset anchor points, so that the key region in the eyeball image can be effectively captured.
In addition, the normalization processing enables the pixel value distribution of the candidate region to be more uniform, and is beneficial to reducing the difference between different images, so that the accuracy of subsequent semantic segmentation is improved. Through normalization, the model can learn the characteristics in the image better, and then judge the category of each pixel more accurately.
In an alternative embodiment, the method further comprises:
and c1, performing pixel segmentation processing on the third feature map to obtain the pixel points corresponding to the iris data, the pupil data, and the target ring data.
The pixel segmentation process is used to divide the third feature map into a plurality of regions. In this step, the third feature map is classified at the pixel level to distinguish the different areas of the iris, pupil, and target ring (the latter possibly referring to a specific marker in the image used for vision testing or training); the iris data, pupil data, and target ring data thus correspond to the pixel sets of these different areas. The iris is the portion of the eye that lies around the pupil and has a unique texture and color. The pupil is a circular hole in the center of the iris that becomes larger or smaller as the light changes. The target ring is used to guide the subject's fixation or delimit the area to be judged.
And c2, respectively drawing binary mask patterns corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
A binary mask map is an image in which each pixel has only two possible values (typically 0 and 1) for representing the presence or absence of different regions in the image. In this step, three binary mask patterns will be created using the pixel points obtained in step c 1: one for the iris, one for the pupil and one for the target ring. In particular, the mapping process involves comparing each pixel in the third feature map to the pixel points of the iris, pupil and target ring. If a pixel is classified as part of the iris, then at that pixel location the corresponding location of the binary mask map for the iris will be set to 1 (indicating present) and otherwise set to 0 (indicating not present). Similarly, a corresponding binary mask map is created for the pupil and target ring.
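A minimal sketch of drawing the three binary mask maps from a pixel-level label map is given below; the label values follow the labeling convention described later in the training procedure (iris = 0, pupil = 1, target ring = 2), and the array sizes are illustrative.

```python
import numpy as np

def binary_masks(label_map: np.ndarray) -> dict:
    # One binary mask map per structure: 1 where the pixel belongs to it, 0 elsewhere.
    return {
        "iris": (label_map == 0).astype(np.uint8),
        "pupil": (label_map == 1).astype(np.uint8),
        "target_ring": (label_map == 2).astype(np.uint8),
    }

label_map = np.random.randint(0, 3, size=(240, 320))   # stand-in pixel-level classification result
masks = binary_masks(label_map)
print(masks["iris"].shape, masks["pupil"].dtype)
```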
The image processing method provided in this embodiment can accurately distinguish the iris, pupil and target ring areas in the third feature map from the background or other irrelevant areas through the pixel segmentation process, and simplify the complex image data into an image containing only two values (usually 0 and 1) by generating a binary mask map, so as to display the positions and shapes of the iris, pupil and target ring in the image in an intuitive manner.
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model. The pre-trained semantic segmentation network model comprises a backbone network and a region recommendation network, and the backbone network comprises an encoder structure and a decoder structure. The eyeball image data is the input of the encoder structure; the first feature map is the output of the encoder structure and a partial input of the t-th decoder structure; the second feature map is the output of the (t-1)-th decoder structure and the remaining input of the t-th decoder structure; the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, where t is an integer greater than 1; and the semantic segmentation result is the output of the region recommendation network.
Specifically, the steps S101 to S103, S2031 to S2033, a1 to a4, b1 to b3, and c1 to c2 may be implemented by the pre-trained semantic segmentation network model described above.
Preferably, a convolutional neural network with an Encoder-Decoder backbone architecture is used to process the eyeball image data and extract image features through the backbone network. This process gradually converts the original eyeball image into high-level abstract feature maps through operations such as convolution and pooling, so as to capture key information in the image more effectively.
Specifically, the encoder structure (Encoder) and the decoder structure (Decoder) each include a plurality of bottleneck modules. The bottleneck module is a basic building block of the semantic segmentation network model, consisting of one 1x1 convolution layer, one 3x3 convolution layer, and another 1x1 convolution layer. This structure helps to reduce the number of parameters of the model.
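A minimal PyTorch sketch of such a 1x1-3x3-1x1 bottleneck block is shown below; the channel counts, normalization, and activation choices are assumptions, and the real ENet bottleneck additionally includes downsampling variants and regularization.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # 1x1 -> 3x3 -> 1x1 convolutions with a residual connection; channel counts,
    # BatchNorm and PReLU are assumptions about details the text does not give.
    def __init__(self, channels: int, internal: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, internal, kernel_size=1, bias=False),
            nn.BatchNorm2d(internal), nn.PReLU(),
            nn.Conv2d(internal, internal, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(internal), nn.PReLU(),
            nn.Conv2d(internal, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(x + self.block(x))

print(Bottleneck(64, 16)(torch.randn(1, 64, 56, 56)).shape)
```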
Referring to fig. 3, the Encoder: the first 3 stages belong to the encoding stage. Stage 1 comprises 5 bottleneck modules: the first bottleneck module performs downsampling, and the next 4 repeated bottleneck modules perform downsampling and upsampling operations. Stages 2-3: the bottleneck 2.0 module of stage 2 performs downsampling, and dilated (hole) convolution or factorized convolution can be added; stage 3 does not downsample, but its other operations are the same as those of stage 2. Here, a stage is a concept for organizing the network structure, used to divide different parts of the model into logically related groups or hierarchies.
Referring to fig. 3, the Decoder: comprising an upsampling layer and two bottleneck modules, gradually restoring the feature map size to the original input image size and producing a final segmentation result.
More specifically, in the Encoder part, each layer halves the width and height of the feature map and doubles its depth; in the Decoder part, each layer doubles the width and height of the feature map and halves its depth. The dimension change per layer may be [3, 32, 64, 128, 256, 512, 256, 128, 64, 32].
The convolutional network in this embodiment changes the residual structure of the upsampling stage into an average fusion strategy. This strategy alleviates, to some extent, the tension between feature change and feature preservation in the fusion process: F_3 = (F_1 + F_2) / 2; wherein F_3 is the third feature map, F_1 is the first feature map, and F_2 is the second feature map.
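A hedged sketch of this average fusion (assuming the two feature maps have already been brought to the same spatial resolution):

```python
import torch

def average_fusion(first_feature_map: torch.Tensor,
                   second_feature_map: torch.Tensor) -> torch.Tensor:
    # Third feature map = (first feature map + second feature map) / 2
    assert first_feature_map.shape == second_feature_map.shape
    return (first_feature_map + second_feature_map) / 2.0

f1 = torch.randn(1, 128, 64, 64)   # first feature map (encoder output / skip connection)
f2 = torch.randn(1, 128, 64, 64)   # second feature map (previous decoder output)
f3 = average_fusion(f1, f2)        # third feature map fed to the next stage
print(f3.shape)
```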
As shown in connection with fig. 4 and 5, the region recommendation network (Region Proposal Network, RPN) structure includes one 3x3 convolution layer followed by two connected 1x1 convolution layers (i.e., 1x1 Conv in fig. 4 and 5). The RPN performs sliding window processing on each position of the third feature map through the 3x3 convolution layer, generating K vectors through K anchor points (circular anchors) of different scales and aspect ratios, where K is a positive integer greater than 2. The two detection branches are a classification network and a regression network, respectively. The first branch mainly classifies the candidate regions; this means that the RPN can identify potential eye regions and assign a confidence level to each region, thereby providing targeted candidate regions. The second branch regresses the boundary information of the pupil, iris, and target ring, providing highly accurate positional information.
The classification network classifies each anchor point to compute the target probability values of the three targets: pupil, iris, and target ring; the regression network obtains their specific locations by regression. The RPN generates K anchor points of different sizes at each position of the feature map, and each anchor point is described by 3 variables: the circle center abscissa, the circle center ordinate, and the radius of each target.
Further, the specific positions of the pupil, the iris, and the target ring are obtained by regression. Specifically, the 3 known variables (x_a, y_a, r_a) respectively represent the circle center abscissa, the circle center ordinate, and the radius of each anchor point, and the 3 coordinate offsets (Δx, Δy, Δr) are respectively the translation and scaling values of the 3 variables corresponding to the target point. The 3 coordinate offsets can be obtained by regression from the output of the RPN, so that the final candidate target circle center and radius generated by the RPN can be obtained by the following formulas:
x_t = x_a + Δx · r_a; y_t = y_a + Δy · r_a; r_t = r_a · e^(Δr); the specific content refers to step a4 and is not repeated here.
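By way of illustration, a possible PyTorch sketch of such an RPN head with circular anchors is given below; the channel sizes, K, and the class count are assumptions.

```python
import torch
import torch.nn as nn

class CircleRPNHead(nn.Module):
    # One shared 3x3 convolution followed by two 1x1 convolutions: a classification
    # branch scoring K circular anchors per position, and a regression branch
    # predicting (dx, dy, dr) per anchor.
    def __init__(self, in_channels: int = 128, k_anchors: int = 3, num_classes: int = 3):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.cls_branch = nn.Conv2d(in_channels, k_anchors * num_classes, kernel_size=1)
        self.reg_branch = nn.Conv2d(in_channels, k_anchors * 3, kernel_size=1)

    def forward(self, feature_map):
        h = torch.relu(self.shared(feature_map))
        return self.cls_branch(h), self.reg_branch(h)

scores, offsets = CircleRPNHead()(torch.randn(1, 128, 32, 32))
print(scores.shape, offsets.shape)
```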
In an alternative embodiment, the classification network loss and the regression network loss are determined as follows:
Referring to fig. 6, for each candidate box, a cross entropy loss function is used to measure the accuracy of the model's classification of the target class; p̂ denotes the predicted class confidence of the eyeball target, (x̂, ŷ) denotes the predicted center coordinates of the corresponding category, and r̂ denotes the predicted circle radius at those coordinates. The ground-truth coordinates and the predicted coordinates are compared through the loss function over the region of interest (Region of Interest, ROI), so as to obtain the center coordinates of the iris, the pupil, and the target ring.
L_RPN = L_cls + L_coord + L_radius; wherein L_cls is the classification loss, L_coord is the regression loss of the center coordinates, L_radius is the radius length regression loss, and L_RPN is the loss function of the RPN.
Preferably, after the Encoder-Decoder convolutional neural network extracts the third feature map and the RPN has been applied, the third branch, the mask network, is a fully convolutional network comprising four 3x3 convolutional layers, followed by a 2x2 transposed convolutional layer and a 1x1 convolutional layer that outputs a binary mask. The mask network adopts a pixel-level binary cross-entropy loss (Binary Cross-Entropy Loss) to measure the agreement between the predicted binary segmentation mask and the actual segmentation mask for each pixel in the normalized candidate region.
In an alternative embodiment, the mask network loss is determined as follows:
L_mask = -(1/N) · Σ_i [ m_i · log(m̂_i) + (1 − m_i) · log(1 − m̂_i) ]; wherein L_mask is the mask network loss, m_i is the true value and m̂_i is the predicted value of the i-th pixel in the normalized candidate region, and N is the number of pixels.
The overall loss function of the neural network specifically comprises the loss functions of the classification network and regression network (CRN) and the loss function of the mask network. The total loss function of the semantic segmentation network model is:
L = L_RPN + L_mask; wherein L is the total loss function of the semantic segmentation network model.
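For illustration, a hedged PyTorch sketch of this combined loss is given below; smooth L1 is assumed for the coordinate and radius regression terms, since the text only calls them regression losses.

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_targets,
               center_pred, center_gt,
               radius_pred, radius_gt,
               mask_pred, mask_gt):
    # L_RPN = L_cls + L_coord + L_radius, plus the pixel-level mask BCE.
    l_cls = F.cross_entropy(cls_logits, cls_targets)     # classification loss
    l_coord = F.smooth_l1_loss(center_pred, center_gt)   # center-coordinate regression (assumed smooth L1)
    l_radius = F.smooth_l1_loss(radius_pred, radius_gt)  # radius regression (assumed smooth L1)
    l_mask = F.binary_cross_entropy(mask_pred, mask_gt)  # mask network loss
    return l_cls + l_coord + l_radius + l_mask           # L = L_RPN + L_mask

# Tiny example with random tensors
cls_logits, cls_targets = torch.randn(8, 3), torch.randint(0, 3, (8,))
center_pred, center_gt = torch.randn(8, 2), torch.randn(8, 2)
radius_pred, radius_gt = torch.rand(8), torch.rand(8)
mask_pred = torch.sigmoid(torch.randn(8, 28, 28))
mask_gt = torch.randint(0, 2, (8, 28, 28)).float()
print(total_loss(cls_logits, cls_targets, center_pred, center_gt,
                 radius_pred, radius_gt, mask_pred, mask_gt))
```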
According to the image processing method, the optimized ENet semantic segmentation network is applied to eyeball image segmentation and positioning, realizing a fully automatic learning process, accurately acquiring segmented and normalized iris images, and generating the corresponding binary mask maps. This process ensures a high degree of accuracy and consistency.
In one application scenario, the training process of the semantic segmentation network model is as follows. The dnn module of OpenCV uses the ENet structure for semantic segmentation. The optimizer and learning strategy are identical for all experiments: the optimizer is Adam, the initial learning rate is 0.001, the weight decay is set to 5e-4, an adaptive learning rate adjustment strategy is adopted, and the maximum number of epochs is 150. All models are trained on 1 Nvidia GPU and deployed and run for inference on an i5-series central processor.
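A minimal sketch of this training configuration (the model is a placeholder, and ReduceLROnPlateau is only one possible adaptive learning-rate strategy):

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the ENet-style segmentation model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")  # assumed adaptive strategy

max_epochs = 150
for epoch in range(max_epochs):
    # ... run one epoch of training and compute the validation loss here ...
    val_loss = 0.0                             # placeholder value
    scheduler.step(val_loss)                   # adaptive learning-rate adjustment
```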
Data are acquired from the subject with a biological measuring instrument, and each data image contains three target objects on the eyeball surface: the iris, the pupil, and the target ring.
The iris pixel of the sample eyeball image is marked as 0, the pupil pixel is marked as 1, and the target ring pixel is marked as 2, namely the multi-classification marking is realized, and a mask map of a real target area is generated.
Before training the segmentation network, the sample image set used for training is labeled; the labeling comprises 1) generating binary mask maps of the iris, pupil, and target ring areas in the eyeball images, and 2) marking the center positions and radii of the iris, pupil, and target ring. The data are divided into a training set, a validation set, and a test set. Image feature extraction is performed using the Encoder-Decoder structure as the backbone, and the multi-branch tasks of the decoding structure are fine-tuned to realize multi-task learning, so that model training can be completed.
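An illustrative sketch of rasterizing such annotations into a multi-class label map, using made-up center positions and radii:

```python
import numpy as np
import cv2

label_map = np.full((480, 640), 255, dtype=np.uint8)            # 255 marks unlabeled background
cv2.circle(label_map, (320, 240), 120, color=0, thickness=-1)   # iris pixels labeled 0
cv2.circle(label_map, (320, 240), 45, color=1, thickness=-1)    # pupil pixels labeled 1
cv2.circle(label_map, (320, 240), 150, color=2, thickness=3)    # target ring pixels labeled 2

iris_mask = (label_map == 0).astype(np.uint8)                   # binary mask map of the iris region
print(iris_mask.sum(), (label_map == 1).sum(), (label_map == 2).sum())
```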
The labeled sample eyeball images are respectively input into the semantic segmentation network model, and the semantic segmentation network model is trained until it converges.
The network parameters and activation values are quantized to a lower-bit representation using quantization techniques, thereby reducing the memory footprint and computational cost of the model. During training, the weights and activation values of the model are computed using single-precision floating-point numbers; in the inference phase, the weights and activation values of the trained model are mapped to a lower-precision 8-bit integer representation.
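A hedged sketch of such a post-training 8-bit quantization (symmetric linear quantization is an assumption; the actual scheme is not specified):

```python
import numpy as np

def quantize_to_int8(weights: np.ndarray):
    # Map FP32 weights to int8 with a single per-tensor scale factor.
    scale = np.abs(weights).max() / 127.0 if weights.size else 1.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 32).astype(np.float32)
q, scale = quantize_to_int8(w)
w_dequant = q.astype(np.float32) * scale   # approximation used at inference time
print(q.dtype, float(np.abs(w - w_dequant).max()))
```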
The eyeball images whose iris areas need to be segmented are input into the trained ENet segmentation network to obtain the boundary sizes, center positions, and binary mask maps of the iris, the pupil, and the target ring.
The present embodiment also provides an image processing apparatus, which is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides an image processing apparatus, as shown in fig. 7, including: an acquisition module 701, configured to acquire eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; a first processing module 702, configured to process the eyeball image to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and a second processing module 703, configured to process the third feature map to obtain a semantic segmentation result corresponding to the eyeball image.
In an alternative embodiment, the second processing module 703 includes: the first processing unit is used for classifying the third feature map and determining the category of eyeball image data; the second processing unit is used for carrying out positioning processing on the third feature map and determining the circle center position and the radius size information of the eyeball image data; and the first determining unit is used for taking the category, the circle center position and the radius size information as semantic segmentation results.
In an alternative embodiment, the second processing unit comprises: the acquisition subunit is used for respectively acquiring the circle center abscissa, the circle center ordinate, the radius value, the abscissa offset, the ordinate offset and the radius offset which are respectively corresponding to the pupil data, the iris data and the target ring data; a first determining subunit, configured to determine a target abscissa of the center position based on the center abscissa, the abscissa offset, and the value of the radius; the second determining subunit is used for determining the target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius; and a third determination subunit configured to determine radius size information based on the value of the radius and the radius offset.
In an alternative embodiment, the apparatus further comprises: the determining module is used for determining the circle center position and the radius size information through the following formula:
x_t = x_c + Δx·r, y_t = y_c + Δy·r, R = r·exp(Δr); wherein R is the radius size information, r is the value of the radius, Δr is the radius offset, y_c is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_c is the circle center abscissa, and Δx is the abscissa offset.
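For illustration, a decoding step consistent with the description above might look like the following sketch. The scaling of the coordinate offsets by the anchor radius and the exponential radius term follow common region-proposal conventions and are assumptions here, not details confirmed by the patent:

```python
import math

def decode_circle(x_c, y_c, r, dx, dy, dr):
    """Decode a circle from an anchor (x_c, y_c, r) and predicted offsets (dx, dy, dr).
    The exact functional form (notably the exponential radius term) is an assumption
    borrowed from common region-proposal parameterizations."""
    x_t = x_c + dx * r         # target abscissa of the circle center
    y_t = y_c + dy * r         # target ordinate of the circle center
    radius = r * math.exp(dr)  # radius size information
    return x_t, y_t, radius

# e.g. an anchor centered at (320, 240) with radius 80 and small predicted offsets
print(decode_circle(320.0, 240.0, 80.0, 0.05, -0.02, 0.1))
```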
In an alternative embodiment, the second processing module 703 includes: the sliding window processing unit is used for carrying out sliding window processing on the third feature map to obtain a candidate region; the candidate areas are composed of vectors corresponding to the anchor points; the normalization processing unit is used for carrying out normalization processing on the candidate region to obtain a target feature map; and the third processing unit is used for processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
In an alternative embodiment, the apparatus further comprises: a pixel segmentation processing module, used for carrying out pixel segmentation processing on the third feature map to obtain the pixel points corresponding to the iris data, the pupil data and the target ring data; and a drawing module, used for respectively drawing the binary mask maps corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
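A minimal sketch of the mask-drawing step, assuming the pixel segmentation yields an (H, W) map of class indices; the class numbering used here is hypothetical:

```python
import numpy as np

# Hypothetical class indices; the patent does not fix a particular numbering.
CLASSES = {"iris": 1, "pupil": 2, "target_ring": 3}

def draw_binary_masks(class_map: np.ndarray) -> dict:
    """Turn an (H, W) per-pixel class map into one binary mask per structure."""
    return {name: ((class_map == idx).astype(np.uint8) * 255)
            for name, idx in CLASSES.items()}

segmentation = np.random.randint(0, 4, size=(480, 640))  # stand-in pixel segmentation result
masks = draw_binary_masks(segmentation)
print(masks["pupil"].shape, masks["pupil"].dtype)  # (480, 640) uint8
```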
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model includes: a backbone network and a region recommendation network, the backbone network including: an encoder structure and a decoder structure; the eyeball image data is the input of the encoder structure, the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure, the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure, the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, t is an integer greater than 1, and the semantic segmentation result is the output of the region recommendation network.
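The following PyTorch-style sketch illustrates one such decoder stage, assuming the fusion is a simple element-wise average after resolution alignment; the channel count, the bilinear upsampling, and the refinement convolution are illustrative assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecoderStage(nn.Module):
    """One decoder stage: align the previous decoder output with the encoder feature
    map, average the two, then refine with a convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, encoder_feat, prev_decoder_feat):
        # Bring the previous decoder output to the encoder feature map's resolution.
        prev = F.interpolate(prev_decoder_feat, size=encoder_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        fused = (encoder_feat + prev) / 2.0  # element-wise average of the two feature maps
        return self.refine(fused)

stage = FusionDecoderStage(channels=64)
first_map = torch.randn(1, 64, 64, 64)    # encoder output (first feature map)
second_map = torch.randn(1, 64, 32, 32)   # previous decoder output (second feature map)
third_map = stage(first_map, second_map)  # fused feature map passed on (third feature map)
print(third_map.shape)
```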
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The image processing apparatus in this embodiment is presented in the form of functional units, where a functional unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above-described functions.
An embodiment of the invention further provides a computer device equipped with the image processing apparatus shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 8, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 8.
The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so as to cause the at least one processor 10 to perform the method of the embodiments described above.
The memory 20 may include a storage program area and a storage data area; the storage program area may store an operating system and at least one application program required for the functions, and the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network and stored in a local storage medium, so that the method described herein may be executed from such code stored on a storage medium by a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of the above kinds of memories. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. An image processing method, the method comprising:
acquiring eyeball image data; wherein the eyeball image data includes: iris data, pupil data, and target ring data;
Processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; the fusion is carried out on the first feature map and the second feature map, which have the same or similar spatial resolution;
processing the third feature map to obtain a semantic segmentation result corresponding to eyeball image data;
The processing of the third feature map to obtain a semantic segmentation result corresponding to eyeball image data includes:
Classifying the third feature map, determining the category of the eyeball image data, positioning the third feature map, and determining the circle center position and the radius size information of the eyeball image data;
And taking the category, the circle center position and the radius size information as the semantic segmentation result.
2. The image processing method according to claim 1, wherein performing positioning processing on the third feature map to determine the center position and radius size information of the eyeball image data includes:
Respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data;
Determining a target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius;
determining a target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius;
radius size information is determined based on the value of the radius and the radius offset.
3. The image processing method according to claim 2, wherein the center position and radius size information is determined by the following formula:
x_t = x_c + Δx·r, y_t = y_c + Δy·r, R = r·exp(Δr); wherein R is the radius size information, r is the value of the radius, Δr is the radius offset, y_c is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_c is the circle center abscissa, and Δx is the abscissa offset.
4. The image processing method according to claim 1, wherein the processing the third feature map to obtain a semantic segmentation result corresponding to an eyeball image includes:
performing sliding window processing on the third feature map to obtain a candidate region; the candidate areas are composed of vectors corresponding to the anchor points;
normalizing the candidate region to obtain a target feature map;
and processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
5. The image processing method according to claim 1, characterized in that the method further comprises:
performing pixel segmentation processing on the third feature map to obtain pixel points corresponding to iris data, pupil data and target ring data;
And respectively drawing binary mask graphs corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
6. The image processing method according to any one of claims 1 to 5, wherein the first feature map, the second feature map, and the third feature map are obtained by a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model comprises: a backbone network and a region recommendation network, the backbone network comprising: an encoder structure and a decoder structure;
The eyeball image data is the input of the encoder structure, the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure, the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure, the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, t is an integer greater than 1, and the semantic segmentation result is the output of the region recommendation network.
7. An image processing apparatus, characterized in that the apparatus comprises:
The acquisition module is used for acquiring eyeball image data; wherein the eyeball image data includes: iris data, pupil data, and target ring data;
the first processing module is used for processing the eyeball image to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the fusion averages the first feature map and the second feature map, which have the same or similar spatial resolution, and the third feature map is the feature map obtained after the first feature map and the second feature map are averaged;
The second processing module is used for processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image;
The second processing module includes: the first processing unit is used for classifying the third feature map and determining the category of eyeball image data; the second processing unit is used for carrying out positioning processing on the third feature map and determining the circle center position and the radius size information of the eyeball image data; and the first determining unit is used for taking the category, the circle center position and the radius size information as semantic segmentation results.
8. A computer device, comprising:
A memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the image processing method of any of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer instructions for causing a computer to execute the image processing method according to any one of claims 1 to 6.
10. A computer program product comprising computer instructions for causing a computer to perform the image processing method of any one of claims 1 to 6.
CN202410751020.6A 2024-06-12 2024-06-12 Image processing method, apparatus, device, storage medium, and program product Active CN118351589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410751020.6A CN118351589B (en) 2024-06-12 2024-06-12 Image processing method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410751020.6A CN118351589B (en) 2024-06-12 2024-06-12 Image processing method, apparatus, device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN118351589A CN118351589A (en) 2024-07-16
CN118351589B true CN118351589B (en) 2024-08-27

Family

ID=91818318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410751020.6A Active CN118351589B (en) 2024-06-12 2024-06-12 Image processing method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN118351589B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111588469A (en) * 2020-05-18 2020-08-28 四川大学华西医院 Ophthalmic robot end effector guidance and positioning system
CN113780234A (en) * 2021-09-24 2021-12-10 北京航空航天大学 Edge-guided human eye image analysis method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2016138608A (en) * 2016-09-29 2018-03-30 Мэджик Лип, Инк. NEURAL NETWORK FOR SEGMENTING THE EYE IMAGE AND ASSESSING THE QUALITY OF THE IMAGE
CN110059589B (en) * 2019-03-21 2020-12-29 昆山杜克大学 Iris region segmentation method in iris image based on Mask R-CNN neural network
CN111783514A (en) * 2019-11-18 2020-10-16 北京京东尚科信息技术有限公司 Face analysis method, face analysis device and computer-readable storage medium
CN117523650B (en) * 2024-01-04 2024-04-02 山东大学 Eyeball motion tracking method and system based on rotation target detection
CN118135181A (en) * 2024-03-12 2024-06-04 湖北星纪魅族集团有限公司 Pupil positioning method, pupil positioning device, electronic device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111588469A (en) * 2020-05-18 2020-08-28 四川大学华西医院 Ophthalmic robot end effector guidance and positioning system
CN113780234A (en) * 2021-09-24 2021-12-10 北京航空航天大学 Edge-guided human eye image analysis method

Also Published As

Publication number Publication date
CN118351589A (en) 2024-07-16

Similar Documents

Publication Publication Date Title
Veena et al. A novel optic disc and optic cup segmentation technique to diagnose glaucoma using deep learning convolutional neural network over retinal fundus images
Wang et al. Hierarchical retinal blood vessel segmentation based on feature and ensemble learning
Biswal et al. Robust retinal blood vessel segmentation using line detectors with multiple masks
Khowaja et al. A framework for retinal vessel segmentation from fundus images using hybrid feature set and hierarchical classification
Zhou et al. Optic disc and cup segmentation in retinal images for glaucoma diagnosis by locally statistical active contour model with structure prior
Naqvi et al. Automatic optic disk detection and segmentation by variational active contour estimation in retinal fundus images
CN109583364A (en) Image-recognizing method and equipment
Veiga et al. Quality evaluation of digital fundus images through combined measures
Wang et al. Segmenting retinal vessels with revised top-bottom-hat transformation and flattening of minimum circumscribed ellipse
de Moura et al. Automatic identification of intraretinal cystoid regions in optical coherence tomography
CN111127400A (en) Method and device for detecting breast lesions
CN113012093B (en) Training method and training system for glaucoma image feature extraction
Verma et al. Machine learning classifiers for detection of glaucoma
Parikh et al. Effective approach for iris localization in nonideal imaging conditions
KR20200129440A (en) Device for predicting optic neuropathy and method for providing prediction result to optic neuropathy using fundus image
CN118351589B (en) Image processing method, apparatus, device, storage medium, and program product
CN117079339B (en) Animal iris recognition method, prediction model training method, electronic equipment and medium
CN117274278B (en) Retina image focus part segmentation method and system based on simulated receptive field
Jana et al. A semi-supervised approach for automatic detection and segmentation of optic disc from retinal fundus image
Nsaef et al. Enhancement segmentation technique for iris recognition system based on Daugman's Integro-differential operator
Nigam et al. Iris classification based on its quality
Fusek et al. Pupil localization using self-organizing migrating algorithm
KR102282334B1 (en) Method for optic disc classification
Al-Shakarchy et al. Open and closed eyes classification in different lighting conditions using new convolution neural networks architecture
Mashudi et al. Dynamic U-Net Using Residual Network for Iris Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant