CN118351589B - Image processing method, apparatus, device, storage medium, and program product
- Publication number: CN118351589B
- Application number: CN202410751020.6A
- Authority: CN (China)
- Prior art keywords: feature map, data, radius, processing, eyeball image
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/193 - Eye characteristics, e.g. of the iris: preprocessing; feature extraction
- G06N3/0455 - Auto-encoder networks; encoder-decoder networks
- G06N3/0464 - Convolutional networks [CNN, ConvNet]
- G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/454 - Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/761 - Proximity, similarity or dissimilarity measures
- G06V10/764 - Recognition or understanding using classification, e.g. of video objects
- G06V10/7715 - Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
- G06V10/806 - Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 - Recognition or understanding using neural networks
- G06V40/197 - Eye characteristics: matching; classification
Abstract
The invention relates to the technical field of image processing, and discloses an image processing method, apparatus, device, storage medium, and program product. The method comprises the following steps: acquiring eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data. By averaging the first feature map and the second feature map of the same or similar spatial resolution, the invention makes full use of the information in both feature maps and reduces the influence of noise or outliers that may arise during fusion.
Description
Technical Field
The present invention relates to the field of image processing technology, and in particular, to an image processing method, apparatus, device, storage medium, and program product.
Background
In recent years, prolonged use of electronic devices keeps the eyes in a highly stressed state, and the prevalence of myopia is rising as a result; it is therefore important to discover and accurately measure myopia symptoms early.
Research shows that accurate measurement of key features such as the eyeball, cornea, and pupil gives a better understanding of a patient's eye condition. In early techniques, the patient's eye condition is determined by measuring key features of the eyeball, cornea, pupil, and the like through methods such as gray-level projection, ellipse fitting, and radial symmetry transform algorithms.
However, early techniques are not robust to noise and interference in images, and therefore cannot accurately locate the pupil under complex background or lighting conditions. Moreover, owing to the design principles of these algorithms, their parameters often need to be set manually for specific scenes, which may introduce subjective factors and makes it difficult to maintain stable performance in different environments.
In order to solve the problems of the early techniques, various features in an eye image, such as the pupil, iris, and fundus, can now be analyzed automatically through the application of artificial intelligence to image processing. Specifically, an ENet semantic segmentation network model is adopted to accurately detect eye features, precisely segment the eyeball structure, and diagnose eye diseases at an early stage.
However, during image processing, the fusion strategy of the ENet semantic segmentation network model (Efficient Neural Network for Real-Time Semantic Segmentation) adds feature maps element-wise; if two feature maps differ greatly in semantics, directly adding them for fusion can cause feature conflicts.
Disclosure of Invention
In view of the above, the present invention provides an image processing method, apparatus, device, storage medium, and program product, so as to solve the problem that feature conflicts may be caused by direct additive fusion when two feature maps differ greatly in semantics.
In a first aspect, the present invention provides an image processing method, the method comprising: acquiring eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
According to the image processing method provided by this embodiment, averaging the first feature map and the second feature map of the same or similar spatial resolution makes full use of the information in both feature maps and reduces the influence of noise or outliers that may arise during fusion: because the feature maps are averaged, even if one feature map contains an abnormality or noise, its influence is diluted by the other feature map, so the effect on the overall fusion result is reduced.
In addition, by collecting various types of eyeball image data, namely iris data, pupil data and target ring data, more abundant information can be obtained, so that the physiological state and characteristics of the eyeball can be more comprehensively known.
In an optional embodiment, the processing of the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data includes: classifying the third feature map to determine the category of the eyeball image data; locating the third feature map to determine the circle center position and radius size information of the eyeball image data; and using the category, the circle center position, and the radius size information as the semantic segmentation result.
According to the image processing method provided by this embodiment, different regions and features in the eyeball image can be accurately distinguished through classification of the third feature map, and the circle center position and radius size information of key regions in the eyeball image can be determined through positioning of the third feature map, thereby achieving fine-grained recognition of eyeball image information.
In an optional embodiment, the positioning processing is performed on the third feature map to determine the center position and radius size information of the eyeball image data, including: respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data; determining a target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius; determining a target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius; radius size information is determined based on the value of the radius and the radius offset.
According to the image processing method provided by the embodiment, the circle center positions of the pupil, the iris and the target ring can be very accurately determined by calculating the values of the circle center abscissa, the circle center ordinate and the radius, so that the eyeball structure can be highly accurately determined or the eyeball state can be estimated.
Moreover, by computing the target abscissa, the target ordinate, and the radius size information from the circle center abscissa, circle center ordinate, radius value, abscissa offset, ordinate offset, and radius offset, the specific position and size of the eyeball features can be known intuitively. And because multiple factors (circle center coordinates, radius value, and offsets) are taken into account, the method remains robust when processing complex or blurred eyeball images; relatively accurate positioning results can be obtained even when the image quality is poor or the features are not obvious.
In an alternative embodiment, the center position and radius size information are determined by the following formulas:

$x_t = x_c + r \cdot \Delta x$

$y_t = y_c + r \cdot \Delta y$

$r_t = r \cdot e^{\Delta r}$

wherein $r_t$ is the radius size information, $r$ is the value of the radius, $\Delta r$ is the radius offset, $y_c$ is the circle center ordinate, $y_t$ is the target ordinate, $\Delta y$ is the ordinate offset, $x_t$ is the target abscissa, $x_c$ is the circle center abscissa, and $\Delta x$ is the abscissa offset.
In an optional embodiment, the processing of the third feature map to obtain a semantic segmentation result corresponding to the eyeball image includes: performing sliding window processing on the third feature map to obtain candidate regions, wherein the candidate regions are composed of vectors corresponding to anchor points; normalizing the candidate regions to obtain a target feature map; and processing the target feature map to obtain the semantic segmentation result corresponding to the eyeball image.
According to the image processing method provided by the embodiment, the whole feature map can be traversed through sliding window processing, and then the candidate region is extracted through the preset anchor points, so that the key region in the eyeball image can be effectively captured.
In addition, the normalization processing enables the pixel value distribution of the candidate region to be more uniform, and is beneficial to reducing the difference between different images, so that the accuracy of subsequent semantic segmentation is improved. Through normalization, the model can learn the characteristics in the image better, and then judge the category of each pixel more accurately.
In an alternative embodiment, the method further comprises: performing pixel segmentation processing on the third feature map to obtain pixel points corresponding to iris data, pupil data and target ring data; and respectively drawing binary mask graphs corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
The image processing method provided in this embodiment can accurately distinguish the iris, pupil and target ring areas in the third feature map from the background or other irrelevant areas through the pixel segmentation process, and simplify the complex image data into an image containing only two values (usually 0 and 1) by generating a binary mask map, so as to display the positions and shapes of the iris, pupil and target ring in the image in an intuitive manner.
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model comprises the following steps: backbone network and regional recommendation network, the backbone network includes: an encoder structure and a decoder structure;
The eyeball image data is the input of the encoder structure; the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure; the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure; the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, where t is an integer greater than 1; and the semantic segmentation result is the output of the region recommendation network.
According to the image processing method provided by the embodiment, the complex eyeball image data can be automatically converted into the structural feature map and the final semantic segmentation result through the pre-trained network model, and feature extraction and classification rules are not required to be manually designed, so that the processing efficiency and accuracy are greatly improved. And, the encoder structure can extract deep features from eyeball images through convolution, pooling and other operations, and the features are critical to subsequent segmentation tasks. Meanwhile, the decoder structure is responsible for decoding the deep features into feature maps with the same size as the original image, and the spatial resolution of the features is maintained.
Furthermore, the network is able to fuse feature information of different scales by means of a plurality of decoder structures (t > 1). The multi-scale feature fusion is helpful for improving the segmentation performance of the model on targets with different scales, and is particularly important when processing irises, pupils and target rings with different sizes in eyeball images.
In addition, the region recommendation network can accurately recommend a region where a target may exist according to the feature map extracted by the encoder and decoder structures. This helps reduce the computational effort of subsequent processing, increases overall processing speed, and enables models to be more focused on processing critical areas.
In a second aspect, the present invention provides an image processing apparatus comprising: an acquisition module, configured to acquire eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; a first processing module, configured to process the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the first feature map and the second feature map, which have the same or similar spatial resolution, are fused by averaging; and a second processing module, configured to process the third feature map to obtain a semantic segmentation result corresponding to the eyeball image.
In a third aspect, the present invention provides a computer device comprising: the image processing device comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so that the image processing method of the first aspect or any corresponding implementation mode of the first aspect is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the image processing method of the first aspect or any of its corresponding embodiments.
In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the image processing method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another image processing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a specific network architecture of a backbone network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a semantic segmentation network model implementing eye image multitasking learning according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a specific network architecture of a regional recommendation network according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of eye image detection based on a regional recommendation network, according to an embodiment of the present invention;
fig. 7 is a block diagram of the structure of an image processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Based on the related art, various features in an eye image, such as the pupil, iris, and fundus, can now be analyzed automatically through the application of artificial intelligence to image processing. Specifically, an ENet semantic segmentation network model is adopted to accurately detect eye features, precisely segment the eyeball structure, and diagnose eye diseases at an early stage.
However, during image processing with the ENet semantic segmentation network model, the fusion strategy adds feature maps element-wise; if two feature maps differ greatly in semantics, direct additive fusion can cause feature conflicts.
Based on this, the present invention provides an image processing method in which a first feature map and a second feature map having the same or similar spatial resolution are fused by averaging. This makes full use of the information in both feature maps and reduces the influence of noise or outliers that may arise during fusion: because the feature maps are averaged, even if one feature map contains an abnormality or noise, its influence is diluted by the other feature map, so the effect on the overall fusion result is reduced.
According to an embodiment of the present invention, there is provided an image processing method embodiment, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
In this embodiment, an image processing method is provided, which may be used in a computer device, such as a computer, a server, etc., and fig. 1 is a schematic flow chart of the image processing method according to an embodiment of the present invention, as shown in fig. 1, the flow chart includes the following steps:
step S101, eyeball image data is obtained; wherein the eyeball image data includes: iris data, pupil data, and target ring data.
Eyeball image data is used to characterize various characteristics and states of the eyeball. The eyeball image data includes: iris data, pupil data, and target ring data. Wherein the iris is a circular ring part between the black pupil and the white sclera, and the iris data comprises a plurality of mutually staggered spots, filaments, crowns, stripes, recesses and other detailed features; pupil data is used to characterize the size of pupil, etc., which can reflect various physiological and psychological states of the human body. For example, changes in the diameter of the pupil may reflect vital signs of the human body, including changes in blood pressure, heart rate, and respiration, among others. The target ring data is used to characterize a specific region or marker in the eye image, which is used to mark the iris.
Specifically, the eyeball image data may be acquired by a camera capturing infrared light or light with a specific wavelength, or may be acquired by other acquisition devices, which is not limited herein.
Step S102, processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; wherein the first and second feature maps fused to have the same or similar spatial resolution are averaged.
Specifically, an image processing algorithm or a neural network model may be used to generate the first feature map and the second feature map, ensuring that they have the same or similar spatial resolution for subsequent fusion; for example, techniques such as image segmentation, edge detection, and feature extraction may be used, which are not specifically limited here and can be implemented by those skilled in the art. The first feature map and the second feature map describe certain specific features of the eyeball image, such as edges, textures, or shapes, which are likewise not specifically limited here.
The first feature map and the second feature map may be averaged by using a pixel average value, a neural network model, or the like, to obtain a third feature map. The third feature map includes information of the first feature map and the second feature map.
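As a minimal illustration (not the claimed implementation), the averaging-based fusion of two feature maps of the same or similar spatial resolution might look as follows; the tensor shapes and the resizing step used to align slightly different resolutions are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def average_fuse(first_map: torch.Tensor, second_map: torch.Tensor) -> torch.Tensor:
    """Fuse two feature maps of the same (or similar) spatial resolution by averaging.

    first_map, second_map: tensors of shape (N, C, H, W). If the spatial sizes
    differ slightly, the second map is resized to match the first (an assumption
    made for this sketch, not stated in the description).
    """
    if first_map.shape[-2:] != second_map.shape[-2:]:
        second_map = F.interpolate(second_map, size=first_map.shape[-2:],
                                   mode="bilinear", align_corners=False)
    # Element-wise average instead of element-wise addition, so that noise or
    # outliers in one map are damped by the other map.
    return 0.5 * (first_map + second_map)

# Example: a 64-channel encoder output fused with a 64-channel decoder output.
f1 = torch.randn(1, 64, 56, 56)
f2 = torch.randn(1, 64, 56, 56)
f3 = average_fuse(f1, f2)
print(f3.shape)  # torch.Size([1, 64, 56, 56])
```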
And step S103, processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
The semantic segmentation results may be used to characterize the coordinates, radius, etc. of the iris data, pupil data, and target ring data, without specific limitation herein. Specifically, the third feature map may be processed by a neural network model, edge detection, threshold segmentation, and other manners to obtain a semantic segmentation result corresponding to the eyeball image data, which is not limited herein, and may be implemented by those skilled in the art.
According to the image processing method provided by this embodiment, averaging the first feature map and the second feature map of the same or similar spatial resolution makes full use of the information in both feature maps and reduces the influence of noise or outliers that may arise during fusion: because the feature maps are averaged, even if one feature map contains an abnormality or noise, its influence is diluted by the other feature map, so the effect on the overall fusion result is reduced.
In addition, by collecting various types of eyeball image data, namely iris data, pupil data and target ring data, more abundant information can be obtained, so that the physiological state and characteristics of the eyeball can be more comprehensively known.
In this embodiment, an image processing method is provided, which may be used in the above-mentioned computer device, such as a computer, a server, etc., and fig. 2 is a schematic flow chart of the image processing method according to an embodiment of the present invention, as shown in fig. 2, where the flow chart includes the following steps:
Step S201, eyeball image data is obtained; wherein the eyeball image data includes: iris data, pupil data, and target ring data. Please refer to step S101 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S202, processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; wherein the first and second feature maps fused to have the same or similar spatial resolution are averaged. Please refer to step S102 in the embodiment shown in fig. 1 in detail, which is not described herein.
And step S203, processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image data.
Specifically, the step S203 includes:
Step S2031, classifying the third feature map to determine the category of the eyeball image data.
The class of eyeball image data may be used to characterize the class of iris, pupil, and target ring. Specifically, the third feature map is processed by a classification algorithm (such as a support vector machine, a random forest, a neural network, etc.) to determine the category of the eyeball image data, which is not particularly limited herein, and may be implemented by those skilled in the art.
Step S2032, performing positioning processing on the third feature map, and determining the center position and radius size information of the eyeball image data.
After determining the category of the eyeball image data, the location and size of a specific structure (such as pupil, iris, etc.) in the image is located. The center position and radius size information may be determined by a neural network, circle or ellipse detection in an image, and the like, which is not particularly limited herein and may be implemented by those skilled in the art.
The step S2032 includes:
And a1, respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data.
The circle center abscissa, circle center ordinate, and radius value respectively represent those quantities for the pupil, the iris, and the target ring. In particular, the abscissa offset, ordinate offset, and radius offset may be obtained directly by measurement and may be used to characterize the deviation of the center point positions of the pupil, iris, and target ring. For example, the offsets describe the deviation between the center of the detected target bounding box and the theoretical or predicted center point. In algorithms such as the object detection algorithm YOLO (You Only Look Once), these offsets are learned so as to better adapt to the actual position of the object in the image.
And a2, determining the target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius.
And a step a3, determining the ordinate of the target of the circle center position based on the ordinate of the circle center, the ordinate offset and the value of the radius.
And a step a4 of determining radius size information based on the value of the radius and the radius offset.
The center position and radius size information are determined by the following formulas:

$x_t = x_c + r \cdot \Delta x$

$y_t = y_c + r \cdot \Delta y$

$r_t = r \cdot e^{\Delta r}$

wherein $r_t$ is the radius size information, $r$ is the value of the radius, $\Delta r$ is the radius offset, $y_c$ is the circle center ordinate, $y_t$ is the target ordinate, $\Delta y$ is the ordinate offset, $x_t$ is the target abscissa, $x_c$ is the circle center abscissa, and $\Delta x$ is the abscissa offset.
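A hedged sketch of this decoding step is given below; the exponential rescaling of the radius mirrors the usual anchor-regression convention and is an assumption here, as are the function and variable names.

```python
import math

def decode_circle(cx: float, cy: float, r: float,
                  dx: float, dy: float, dr: float) -> tuple[float, float, float]:
    """Map an anchor circle (cx, cy, r) and regressed offsets (dx, dy, dr)
    to the target circle center and radius.

    The offsets translate the center proportionally to the anchor radius and
    rescale the radius (exponential scaling is assumed for this sketch).
    """
    target_x = cx + r * dx          # target abscissa of the circle center
    target_y = cy + r * dy          # target ordinate of the circle center
    target_r = r * math.exp(dr)     # radius size information
    return target_x, target_y, target_r

# Example: a pupil anchor at (120, 96) with radius 30 and small regressed offsets.
print(decode_circle(120.0, 96.0, 30.0, 0.05, -0.02, 0.1))
```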
Step S2033, the category, the center position, and the radius size information are used as semantic segmentation results.
The semantic segmentation result consists of pupil data, iris data and target ring data, and corresponding category, circle center position and radius size information.
According to the image processing method provided by the embodiment, different areas and features in the eyeball image can be accurately distinguished through classification processing of the third feature image, and the circle center position and radius size information of the key area in the eyeball image can be determined through positioning processing of the third feature image, so that the effect of finely identifying the eyeball image information is achieved.
In addition, by calculating the values of the circle center abscissa, the circle center ordinate, and the radius, the circle center positions of the pupil, the iris, and the target ring can be determined very accurately, thereby determining the eyeball structure or evaluating the eyeball state with high accuracy.
Moreover, by computing the target abscissa, the target ordinate, and the radius size information from the circle center abscissa, circle center ordinate, radius value, abscissa offset, ordinate offset, and radius offset, the specific position and size of the eyeball features can be known intuitively. And because multiple factors (circle center coordinates, radius value, and offsets) are taken into account, the method remains robust when processing complex or blurred eyeball images; relatively accurate positioning results can be obtained even when the image quality is poor or the features are not obvious.
In an optional embodiment, the processing of the third feature map in step S203 to obtain the semantic segmentation result corresponding to the eyeball image includes:
Step b1, carrying out sliding window processing on the third feature map to obtain a candidate region; the candidate region is composed of vectors corresponding to the anchor points.
Sliding window processing may be used to perform object detection. Specifically, on a given feature map (third feature map), the sliding window process would place a fixed size window and traverse the entire third feature map (slide the window). At each window position, some action is performed (e.g., applying a pre-trained classifier) to detect whether the region contains an object of interest (which may be some portion of an eyeball).
The result of the sliding window process is a series of candidate regions that contain the detected target. In the object detection task, these candidate regions are typically defined by anchors (anchors), which are preset rectangular boxes of different sizes and proportions.
And b2, carrying out normalization processing on the candidate region to obtain a target feature map.
Normalization processes are used to adjust the data range, and in object detection or semantic segmentation, normalization can ensure that candidate regions of different sizes and locations have similar weights or effects in subsequent processing. Specifically, the candidate region may be scaled to a fixed size so that it may be entered into a fully connected layer or other layer requiring a fixed input size.
The normalized feature map is generally referred to as a target feature map, which contains normalized candidate region information and is used for subsequent classification or segmentation tasks.
And b3, processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
Each pixel in the target feature map is assigned a class label. In the context of an eye image, each pixel of the image is classified as a different portion of the eye (e.g., iris, pupil, etc.).
To obtain semantic segmentation results, algorithms typically use one or more convolutional layers, fully-connected layers, or other types of neural network layers to process the target feature map. These layers may learn how to map normalized candidate regions to corresponding class labels. Finally, the class labels are combined into a segmentation map (segmentation map) of the same size as the input image, with each pixel labeled as its corresponding class.
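The following sketch illustrates, under assumed sizes and names (feature-map stride, anchor radii, output size), how circular anchors could be placed at every position of the third feature map and how each candidate region could be cropped and normalized to a fixed-size target feature map for the later classification and segmentation branches.

```python
import torch
import torch.nn.functional as F

def circular_anchors(feat_h: int, feat_w: int, stride: int,
                     radii=(16, 32, 64)) -> torch.Tensor:
    """Return anchors of shape (feat_h * feat_w * len(radii), 3) as (cx, cy, r)
    in input-image coordinates, one set of K = len(radii) anchors per position."""
    ys, xs = torch.meshgrid(torch.arange(feat_h), torch.arange(feat_w), indexing="ij")
    centers = torch.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], dim=-1).reshape(-1, 2)
    anchors = []
    for r in radii:
        anchors.append(torch.cat([centers, torch.full((centers.shape[0], 1), float(r))], dim=1))
    return torch.cat(anchors, dim=0)

def normalize_region(feature_map: torch.Tensor, cx: float, cy: float, r: float,
                     stride: int, out_size: int = 14) -> torch.Tensor:
    """Crop the square region enclosing a candidate circle from the feature map
    and resize it to a fixed out_size x out_size target feature map."""
    x0 = max(int((cx - r) / stride), 0)
    y0 = max(int((cy - r) / stride), 0)
    x1 = min(int((cx + r) / stride) + 1, feature_map.shape[-1])
    y1 = min(int((cy + r) / stride) + 1, feature_map.shape[-2])
    crop = feature_map[..., y0:y1, x0:x1]
    return F.interpolate(crop, size=(out_size, out_size), mode="bilinear", align_corners=False)

feat = torch.randn(1, 128, 32, 32)            # assumed third feature map
anchors = circular_anchors(32, 32, stride=8)  # assumed stride of 8 pixels
region = normalize_region(feat, *anchors[0].tolist(), stride=8)
print(anchors.shape, region.shape)
```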
According to the image processing method provided by the embodiment, the whole feature map can be traversed through sliding window processing, and then the candidate region is extracted through the preset anchor points, so that the key region in the eyeball image can be effectively captured.
In addition, the normalization processing enables the pixel value distribution of the candidate region to be more uniform, and is beneficial to reducing the difference between different images, so that the accuracy of subsequent semantic segmentation is improved. Through normalization, the model can learn the characteristics in the image better, and then judge the category of each pixel more accurately.
In an alternative embodiment, the method further comprises:
And step c1, performing pixel segmentation processing on the third feature map to obtain the pixel points corresponding to the iris data, pupil data, and target ring data.
The pixel segmentation process is used to divide the third feature map into a plurality of regions. In this step, the third feature map is classified at pixel level to distinguish the different regions of the iris, pupil, and target ring (which may refer to specific markers in the image used for vision testing or training). The iris data, pupil data, and target ring data correspond to the pixel sets of their respective regions. The iris is the portion of the eye that lies around the pupil and has a unique texture and color. The pupil is the circular hole in the center of the iris, which becomes larger or smaller with changes in light. The target ring is the region used to guide the subject's gaze or judgment.
And c2, respectively drawing binary mask patterns corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
A binary mask map is an image in which each pixel has only two possible values (typically 0 and 1) for representing the presence or absence of different regions in the image. In this step, three binary mask patterns will be created using the pixel points obtained in step c 1: one for the iris, one for the pupil and one for the target ring. In particular, the mapping process involves comparing each pixel in the third feature map to the pixel points of the iris, pupil and target ring. If a pixel is classified as part of the iris, then at that pixel location the corresponding location of the binary mask map for the iris will be set to 1 (indicating present) and otherwise set to 0 (indicating not present). Similarly, a corresponding binary mask map is created for the pupil and target ring.
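A small sketch of how per-class binary mask maps could be derived from a pixel-wise label map follows; the class indices follow the labeling convention given later in this description (iris 0, pupil 1, target ring 2), while the background value and the array names are illustrative assumptions.

```python
import numpy as np

def binary_masks(label_map: np.ndarray) -> dict[str, np.ndarray]:
    """Split a pixel-wise class label map into one binary mask per structure.

    label_map: 2-D array in which, as in the training data described below,
    iris pixels are labeled 0, pupil pixels 1 and target-ring pixels 2;
    background pixels are assumed here to carry any other value, e.g. 255.
    """
    return {
        "iris":        (label_map == 0).astype(np.uint8),
        "pupil":       (label_map == 1).astype(np.uint8),
        "target_ring": (label_map == 2).astype(np.uint8),
    }

# Example with a tiny 4x4 label map (255 = assumed background value).
labels = np.array([[255, 0, 0, 255],
                   [0,   1, 1, 0],
                   [0,   1, 1, 0],
                   [255, 2, 2, 255]], dtype=np.uint8)
masks = binary_masks(labels)
print(masks["pupil"])
```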
The image processing method provided in this embodiment can accurately distinguish the iris, pupil and target ring areas in the third feature map from the background or other irrelevant areas through the pixel segmentation process, and simplify the complex image data into an image containing only two values (usually 0 and 1) by generating a binary mask map, so as to display the positions and shapes of the iris, pupil and target ring in the image in an intuitive manner.
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model. The pre-trained semantic segmentation network model comprises a backbone network and a region recommendation network, and the backbone network includes an encoder structure and a decoder structure. The eyeball image data is the input of the encoder structure; the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure; the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure; the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, where t is an integer greater than 1; and the semantic segmentation result is the output of the region recommendation network.
Specifically, steps S101 to S103, S2031 to S2033, a1 to a4, b1 to b3, and c1 to c2 may be implemented by the pre-trained semantic segmentation network model described above.
Preferably, in the backbone architecture of the Encoder-Decoder structure of the convolutional neural network, the convolutional neural network processes the eyeball image data and extracts image features through the backbone network. This process gradually converts the original eyeball image into high-level abstract feature maps through operations such as convolution and pooling, so as to capture the key information in the image more effectively.
Specifically, the encoder structure (Encoder) and the decoder structure (Decoder) each include a plurality of bottleneck modules. A bottleneck module is a basic building block of the semantic segmentation network model, consisting of a 1x1 convolution layer, a 3x3 convolution layer, and another 1x1 convolution layer. This structure helps to reduce the number of parameters of the model.
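A minimal sketch of such a bottleneck block, assuming PyTorch and typical choices (channel reduction factor, batch normalization, PReLU activations, residual connection) that this description does not itself specify:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 convolution stack used as the basic building block.

    Reducing the channel count inside the block keeps the parameter count low;
    the normalization, activation, and residual choices are assumptions for this sketch.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.PReLU(),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.PReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.block(x))  # residual connection (assumed)

print(Bottleneck(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```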
Referring to fig. 3, Encoder: the first 3 stages belong to the encoding stage. Stage 1 comprises 5 bottleneck modules; the first bottleneck module performs downsampling, and the following 4 repeated bottleneck modules perform downsampling and upsampling operations. Stages 2-3: the bottleneck 2.0 module of stage 2 performs downsampling, and hole (dilated) convolution or decomposed (factorized) convolution can be added; stage 3 does not downsample, and its other operations are the same as in stage 2. Here, a stage is a concept for organizing the network structure, used to divide different parts of the model into logically related groups or hierarchies.
Referring to fig. 3, the Decoder: comprising an upsampling layer and two bottleneck modules, gradually restoring the feature map size to the original input image size and producing a final segmentation result.
More specifically, in the Encoder part, each layer halves the width and height of the feature map and doubles its depth; in the Decoder part, each layer doubles the width and height of the feature map and halves its depth. The per-layer channel dimensions may be [3, 32, 64, 128, 256, 512, 256, 128, 64, 32].
The convolutional network in this embodiment replaces the residual (additive) structure of the upsampling stage with an average fusion strategy. This strategy reduces, to some extent, the tension between feature change and feature preservation in the fusion process: $F_3 = \frac{1}{2}(F_1 + F_2)$, wherein $F_3$ is the third feature map, $F_1$ is the first feature map, and $F_2$ is the second feature map.
As shown in fig. 4 and fig. 5, the region recommendation network (Region Proposal Network, RPN) structure includes a 3x3 convolution layer connected to two 1x1 convolution layers (i.e., the 1x1 Conv in fig. 4 and fig. 5). The RPN performs sliding window processing on each position of the third feature map through the 3x3 convolution layer (RolNorm), generating K vectors through K circular anchor points of different scales and aspect ratios, where K is a positive integer greater than 2. The two detection branches are a classification network and a regression network, respectively. The first branch mainly classifies the candidate regions; this means the RPN can identify potential eye regions and assign a confidence to each region, thereby providing targeted candidate regions. The second branch regresses the boundary information of the pupil, iris, and target ring, providing highly accurate position information.
The classification network classifies each anchor point to calculate target probability values for the three targets: pupil, iris, and target ring. The regression network obtains their specific locations by regression. The RPN generates K anchor points of different sizes at each position of the feature map, and each anchor point is composed of 3 variables, namely the circle center abscissa, the circle center ordinate, and the radius of each target.
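An illustrative sketch of such an RPN head follows: a shared 3x3 convolution followed by two 1x1 convolutions, one producing class scores per anchor and one producing the three regression offsets per anchor. The values of K, the channel widths, and the inclusion of a background class are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class CircularRPNHead(nn.Module):
    """Sliding-window RPN head applied over the third feature map.

    For each of the K circular anchors per position it predicts class scores
    (pupil / iris / target ring / background, an assumed label set) and the
    three offsets (dx, dy, dr) used to regress the circle center and radius.
    """
    def __init__(self, in_channels: int = 128, k: int = 3, num_classes: int = 4):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.cls_branch = nn.Conv2d(in_channels, k * num_classes, kernel_size=1)
        self.reg_branch = nn.Conv2d(in_channels, k * 3, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        x = torch.relu(self.shared(feat))
        return self.cls_branch(x), self.reg_branch(x)

head = CircularRPNHead()
scores, offsets = head(torch.randn(1, 128, 32, 32))
print(scores.shape, offsets.shape)  # (1, 12, 32, 32) and (1, 9, 32, 32)
```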
Further, the specific positions of the pupil, iris, and target ring are obtained by regression. Specifically, the 3 known variables $(x_c, y_c, r)$ respectively represent the circle center abscissa, circle center ordinate, and radius of each anchor point, and the 3 coordinate offsets $(\Delta x, \Delta y, \Delta r)$ are the translation and scaling values of the 3 variables for the corresponding target point; these 3 offsets are obtained by regression from the output of the RPN network, so the final candidate target circle center and radius generated by the RPN can be obtained by the following formulas:

$x_t = x_c + r \cdot \Delta x$

$y_t = y_c + r \cdot \Delta y$

$r_t = r \cdot e^{\Delta r}$

The specific content refers to step a4 above and is not repeated here.
In an alternative embodiment, the steps for determining the classification network loss and the regression network loss are as follows:
Referring to fig. 6, for each candidate box, a cross-entropy loss function is used to measure the accuracy of the model's classification of the target class, based on the predicted class confidence of the eyeball, the predicted coordinates of the category, and the predicted circle radius at the corresponding coordinates. The loss function over the region of interest (Region of Interest, ROI) is computed between the GroundTruth (real) coordinates and the Prediction (predicted) coordinates, so as to obtain the center coordinates of the iris, the pupil, and the target ring.
$L_{RPN} = L_{cls} + L_{coord} + L_{radius}$; wherein $L_{cls}$ is the classification loss, $L_{coord}$ is the regression loss of the center coordinates, $L_{radius}$ is the radius length regression loss, and $L_{RPN}$ is the loss function of the RPN.
Preferably, after the convolutional neural network of the Encoder-Decoder structure extracts the third feature map and the RPN network is applied, the third branch, the mask network, is a fully convolutional network comprising four 3x3 convolution layers, followed by a 2x2 transposed convolution layer and a 1x1 convolution layer that outputs a binary mask. The mask network adopts a pixel-level binary cross-entropy loss (Binary Cross-Entropy Loss) to measure the discrepancy between the predicted binary segmentation mask and the actual segmentation mask for each pixel in the normalized candidate region.
In an alternative embodiment, the mask network loss is determined as follows:

$L_{mask} = -\frac{1}{N}\sum_{i=1}^{N}\left[m_i \log \hat{m}_i + (1 - m_i)\log(1 - \hat{m}_i)\right]$

wherein $L_{mask}$ is the mask network loss, $m_i$ is the true value of the i-th pixel, $\hat{m}_i$ is its predicted value, and $N$ is the number of pixels in the normalized candidate region.
The overall loss function of the neural network specifically comprises the loss functions of the classification network and regression network (CRN) and the loss function of the mask network. The total loss function of the semantic segmentation network model is:

$L = L_{RPN} + L_{mask}$

wherein $L$ is the total loss function of the semantic segmentation network model.
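A hedged sketch of how the total loss could be assembled from the branch losses is given below. The cross-entropy classification term and the pixel-wise binary cross-entropy mask term follow the description above, while the smooth L1 choice for the coordinate and radius regression and the unweighted sum are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_targets,
               coord_pred, coord_gt,
               radius_pred, radius_gt,
               mask_pred, mask_gt) -> torch.Tensor:
    """Combine the RPN branch losses and the mask loss into one training loss."""
    l_cls    = F.cross_entropy(cls_logits, cls_targets)      # classification loss
    l_coord  = F.smooth_l1_loss(coord_pred, coord_gt)        # center-coordinate regression
    l_radius = F.smooth_l1_loss(radius_pred, radius_gt)      # radius regression
    l_rpn    = l_cls + l_coord + l_radius
    l_mask   = F.binary_cross_entropy(mask_pred, mask_gt)    # pixel-wise BCE on masks
    return l_rpn + l_mask

# Example with random tensors standing in for one batch of candidate regions.
loss = total_loss(torch.randn(8, 4), torch.randint(0, 4, (8,)),
                  torch.randn(8, 2), torch.randn(8, 2),
                  torch.randn(8, 1), torch.randn(8, 1),
                  torch.rand(8, 1, 28, 28), torch.randint(0, 2, (8, 1, 28, 28)).float())
print(loss.item())
```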
According to the image processing method described above, the optimized ENet semantic segmentation network is applied to eyeball image segmentation and positioning, realizing a fully automatic learning process, accurately acquiring segmented and normalized iris images, and generating the corresponding binary mask maps. This process ensures a high degree of accuracy and consistency.
In one application scenario, the training process of the semantic segmentation network model is as follows. The dnn module of OpenCV uses the ENet structure for semantic segmentation. The optimizer and learning strategy are identical for all experiments: the optimizer is Adam, the initial learning rate is 0.001, weight decay is set to 5e-4, an adaptive learning-rate adjustment strategy is adopted, and the maximum number of epochs is 150. All models are trained on 1 Nvidia GPU and deployed and run for inference on an i5-series central processor.
Data are acquired from the subject by a biometric measuring instrument; each data image contains three target objects on the eyeball surface: the iris, the pupil, and the target ring.
The iris pixel of the sample eyeball image is marked as 0, the pupil pixel is marked as 1, and the target ring pixel is marked as 2, namely the multi-classification marking is realized, and a mask map of a real target area is generated.
Before training the segmentation network, the sample image set used for training is labeled, the labeling comprising 1) generating binary mask maps of the iris, pupil, and target ring regions in the eyeball image; and 2) marking the center positions and radii of the iris, pupil, and target ring. The data are divided into a training set, a validation set, and a test set. Image feature extraction is performed using the Encoder-Decoder structure as the backbone, and the multi-branch tasks of the decoding structure are fine-tuned to realize multi-task learning, so that model training can be completed.
The annotated sample eyeball images are respectively input into the semantic segmentation network model, and the semantic segmentation network model is trained until it converges.
Quantization techniques are used to quantize the network parameters and activation values to a lower-bit representation, thereby reducing the memory footprint and computational cost of the model. During training, the weights and activation values of the model are computed using single-precision floating-point numbers; in the inference phase, the weights and activation values of the trained model are mapped to a lower-precision 8-bit integer representation.
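A sketch of such a mapping, assuming a symmetric per-tensor scale (the text only states that the trained weights and activation values are mapped to 8-bit integers):

```python
import numpy as np

def quantize_int8(weights_fp32):
    """Map float32 weights to int8 with a symmetric per-tensor scale.
    The exact quantization scheme is an assumption; the text only says '8-bit integer'."""
    max_abs = np.abs(weights_fp32).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights_fp32 / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 tensor for computation or inspection
    return q.astype(np.float32) * scale
```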
Iris images whose iris areas need to be segmented are input into the trained ENet segmentation network to obtain the boundary sizes, center positions and binary mask maps of the iris, the pupil and the target ring.
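A minimal inference sketch with the OpenCV dnn module mentioned above; the model file name, input size, and output layout are assumptions:

```python
import cv2
import numpy as np

# Hypothetical exported model file; the text does not give a file name or format.
net = cv2.dnn.readNet("enet_eye_segmentation.onnx")

image = cv2.imread("eye.png")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0 / 255.0,
                             size=(512, 512), swapRB=True, crop=False)
net.setInput(blob)
output = net.forward()                                  # e.g. 1 x C x H x W class scores

class_map = np.argmax(output[0], axis=0).astype(np.uint8)
iris_mask = (class_map == 0).astype(np.uint8) * 255     # labels as defined above
```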
The present embodiment also provides an image processing apparatus, which is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides an image processing apparatus, as shown in fig. 7, including: an acquisition module 701, configured to acquire eyeball image data, wherein the eyeball image data includes iris data, pupil data, and target ring data; a first processing module 702, configured to process the eyeball image to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the fusion averages the first feature map and the second feature map, which have the same or similar spatial resolution; and a second processing module 703, configured to process the third feature map to obtain a semantic segmentation result corresponding to the eyeball image.
In an alternative embodiment, the second processing module 703 includes: the first processing unit is used for classifying the third feature map and determining the category of eyeball image data; the second processing unit is used for carrying out positioning processing on the third feature map and determining the circle center position and the radius size information of the eyeball image data; and the first determining unit is used for taking the category, the circle center position and the radius size information as semantic segmentation results.
In an alternative embodiment, the second processing unit comprises: the acquisition subunit is used for respectively acquiring the circle center abscissa, the circle center ordinate, the radius value, the abscissa offset, the ordinate offset and the radius offset which are respectively corresponding to the pupil data, the iris data and the target ring data; a first determining subunit, configured to determine a target abscissa of the center position based on the center abscissa, the abscissa offset, and the value of the radius; the second determining subunit is used for determining the target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius; and a third determination subunit configured to determine radius size information based on the value of the radius and the radius offset.
In an alternative embodiment, the apparatus further comprises: the determining module is used for determining the circle center position and the radius size information through the following formula:
x_t = x_c + Δx·r;
y_t = y_c + Δy·r;
R = r + Δr·r; wherein R is the radius size information, r is the value of the radius, Δr is the radius offset, y_c is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_c is the circle center abscissa, and Δx is the abscissa offset.
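A small sketch of this decoding step; the radius-scaled offsets follow the relation written above, which is an interpretation of the listed inputs rather than a formula quoted from the original:

```python
def decode_circle(cx, cy, r, dx, dy, dr):
    """Decode a circle from a reference (cx, cy, r) and predicted offsets (dx, dy, dr).
    Scaling the offsets by the radius is an assumption; the text only lists the inputs."""
    x_t = cx + dx * r
    y_t = cy + dy * r
    radius = r + dr * r
    return x_t, y_t, radius
```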
In an alternative embodiment, the second processing module 703 includes: the sliding window processing unit is used for carrying out sliding window processing on the third feature map to obtain a candidate region; the candidate areas are composed of vectors corresponding to the anchor points; the normalization processing unit is used for carrying out normalization processing on the candidate region to obtain a target feature map; and the third processing unit is used for processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
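The normalization of each candidate region to a fixed-size target feature map can be realised with an ROI Align style operator; a sketch using torchvision (the operator choice and the output size are assumptions not stated in the text):

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 64, 64)        # third feature map (N, C, H, W)
# Candidate boxes in (batch_index, x1, y1, x2, y2) format, in feature-map coordinates
candidates = torch.tensor([[0, 10.0, 12.0, 30.0, 34.0],
                           [0, 20.0, 18.0, 44.0, 40.0]])
# Normalise every candidate region to a fixed 14 x 14 target feature map
target_features = roi_align(feature_map, candidates, output_size=(14, 14))
print(target_features.shape)                     # torch.Size([2, 256, 14, 14])
```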
In an alternative embodiment, the apparatus further comprises: the pixel segmentation processing module is used for carrying out pixel segmentation processing on the third feature image to obtain iris data, pupil data and pixel points corresponding to the target ring data; and the drawing module is used for respectively drawing the binary mask map corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
In an alternative embodiment, the first feature map, the second feature map, and the third feature map are obtained through a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model comprises: a backbone network and a region recommendation network, and the backbone network comprises: an encoder structure and a decoder structure; the eyeball image data is the input of the encoder structure, the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure, the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure, the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, t is greater than 1, and the semantic segmentation result is the output of the region recommendation network.
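A sketch of the fusion at decoder stage t, averaging the encoder skip feature (first feature map) and the previous decoder output (second feature map); the resizing step for "similar" resolutions and the interpolation choice are assumptions:

```python
import torch.nn.functional as F

def fuse_features(encoder_feat, decoder_feat):
    """Average the encoder skip feature and the previous decoder output.
    If their spatial resolutions only roughly match, the decoder feature is
    resized first (bilinear interpolation is an assumption)."""
    if encoder_feat.shape[-2:] != decoder_feat.shape[-2:]:
        decoder_feat = F.interpolate(decoder_feat, size=encoder_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
    return (encoder_feat + decoder_feat) / 2.0   # third feature map, input to stage t
```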
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The image processing apparatus in this embodiment is presented in the form of functional units, where a functional unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functions.
The embodiment of the invention also provides a computer device provided with the image processing apparatus shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 8, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 8.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code that is recorded on a storage medium, or as computer code that is originally stored on a remote or non-transitory machine-readable storage medium, downloaded through a network and stored on a local storage medium, so that the method described herein can be processed by software stored on such a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.
Claims (10)
1. An image processing method, the method comprising:
acquiring eyeball image data; wherein the eyeball image data includes: iris data, pupil data, and target ring data;
Processing the eyeball image data to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map; wherein the fusion is carried out on a first feature map and a second feature map having the same or similar spatial resolution;
processing the third feature map to obtain a semantic segmentation result corresponding to eyeball image data;
The processing of the third feature map to obtain a semantic segmentation result corresponding to eyeball image data includes:
Classifying the third feature map, determining the category of the eyeball image data, positioning the third feature map, and determining the circle center position and the radius size information of the eyeball image data;
And taking the category, the circle center position and the radius size information as the semantic segmentation result.
2. The image processing method according to claim 1, wherein performing positioning processing on the third feature map to determine the center position and radius size information of the eyeball image data includes:
Respectively acquiring a circle center abscissa, a circle center ordinate, a radius value, an abscissa offset, an ordinate offset and a radius offset which correspond to pupil data, iris data and target ring data;
Determining a target abscissa of the circle center position based on the circle center abscissa, the abscissa offset and the value of the radius;
determining a target ordinate of the circle center position based on the circle center ordinate, the ordinate offset and the value of the radius;
radius size information is determined based on the value of the radius and the radius offset.
3. The image processing method according to claim 2, wherein the center position and radius size information is determined by the following formula:
x_t = x_c + Δx·r;
y_t = y_c + Δy·r;
R = r + Δr·r; wherein R is the radius size information, r is the value of the radius, Δr is the radius offset, y_c is the circle center ordinate, y_t is the target ordinate, Δy is the ordinate offset, x_t is the target abscissa, x_c is the circle center abscissa, and Δx is the abscissa offset.
4. The image processing method according to claim 1, wherein the processing the third feature map to obtain a semantic segmentation result corresponding to an eyeball image includes:
performing sliding window processing on the third feature map to obtain a candidate region; the candidate areas are composed of vectors corresponding to the anchor points;
normalizing the candidate region to obtain a target feature map;
and processing the target feature map to obtain a semantic segmentation result corresponding to the eyeball image.
5. The image processing method according to claim 1, characterized in that the method further comprises:
performing pixel segmentation processing on the third feature image to obtain pixel points corresponding to iris data, pupil data and target ring data;
And respectively drawing binary mask graphs corresponding to the iris data, the pupil data and the target ring data based on the pixel points.
6. The image processing method according to any one of claims 1 to 5, wherein the first feature map, the second feature map, and the third feature map are obtained by a pre-trained semantic segmentation network model; the pre-trained semantic segmentation network model comprises: a backbone network and a region recommendation network, the backbone network comprising: an encoder structure and a decoder structure;
The eyeball image data is the input of the encoder structure, the first feature map is the output of the encoder structure and part of the input of the t-th decoder structure, the second feature map is the output of the (t-1)-th decoder structure and the other part of the input of the t-th decoder structure, the third feature map is the output of the t-th decoder structure and the input of the region recommendation network, t is an integer greater than 1, and the semantic segmentation result is the output of the region recommendation network.
7. An image processing apparatus, characterized in that the apparatus comprises:
The acquisition module is used for acquiring eyeball image data; wherein the eyeball image data includes: iris data, pupil data, and target ring data;
the first processing module is used for processing the eyeball image to generate a first feature map, a second feature map, and a third feature map obtained by fusing the first feature map and the second feature map, wherein the fusion averages the first feature map and the second feature map, which have the same or similar spatial resolution;
The second processing module is used for processing the third feature map to obtain a semantic segmentation result corresponding to the eyeball image;
The second processing module includes: the first processing unit is used for classifying the third feature map and determining the category of eyeball image data; the second processing unit is used for carrying out positioning processing on the third feature map and determining the circle center position and the radius size information of the eyeball image data; and the first determining unit is used for taking the category, the circle center position and the radius size information as semantic segmentation results.
8. A computer device, comprising:
A memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the image processing method of any of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer instructions for causing a computer to execute the image processing method according to any one of claims 1 to 6.
10. A computer program product comprising computer instructions for causing a computer to perform the image processing method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410751020.6A CN118351589B (en) | 2024-06-12 | 2024-06-12 | Image processing method, apparatus, device, storage medium, and program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410751020.6A CN118351589B (en) | 2024-06-12 | 2024-06-12 | Image processing method, apparatus, device, storage medium, and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118351589A CN118351589A (en) | 2024-07-16 |
CN118351589B true CN118351589B (en) | 2024-08-27 |
Family
ID=91818318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410751020.6A Active CN118351589B (en) | 2024-06-12 | 2024-06-12 | Image processing method, apparatus, device, storage medium, and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118351589B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111588469A (en) * | 2020-05-18 | 2020-08-28 | 四川大学华西医院 | Ophthalmic robot end effector guidance and positioning system |
CN113780234A (en) * | 2021-09-24 | 2021-12-10 | 北京航空航天大学 | Edge-guided human eye image analysis method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2016138608A (en) * | 2016-09-29 | 2018-03-30 | Мэджик Лип, Инк. | NEURAL NETWORK FOR SEGMENTING THE EYE IMAGE AND ASSESSING THE QUALITY OF THE IMAGE |
CN110059589B (en) * | 2019-03-21 | 2020-12-29 | 昆山杜克大学 | Iris region segmentation method in iris image based on Mask R-CNN neural network |
CN111783514A (en) * | 2019-11-18 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Face analysis method, face analysis device and computer-readable storage medium |
CN117523650B (en) * | 2024-01-04 | 2024-04-02 | 山东大学 | Eyeball motion tracking method and system based on rotation target detection |
CN118135181A (en) * | 2024-03-12 | 2024-06-04 | 湖北星纪魅族集团有限公司 | Pupil positioning method, pupil positioning device, electronic device and storage medium |
- 2024-06-12 CN CN202410751020.6A patent/CN118351589B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111588469A (en) * | 2020-05-18 | 2020-08-28 | 四川大学华西医院 | Ophthalmic robot end effector guidance and positioning system |
CN113780234A (en) * | 2021-09-24 | 2021-12-10 | 北京航空航天大学 | Edge-guided human eye image analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN118351589A (en) | 2024-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Veena et al. | A novel optic disc and optic cup segmentation technique to diagnose glaucoma using deep learning convolutional neural network over retinal fundus images | |
Wang et al. | Hierarchical retinal blood vessel segmentation based on feature and ensemble learning | |
Biswal et al. | Robust retinal blood vessel segmentation using line detectors with multiple masks | |
Khowaja et al. | A framework for retinal vessel segmentation from fundus images using hybrid feature set and hierarchical classification | |
Zhou et al. | Optic disc and cup segmentation in retinal images for glaucoma diagnosis by locally statistical active contour model with structure prior | |
Naqvi et al. | Automatic optic disk detection and segmentation by variational active contour estimation in retinal fundus images | |
CN109583364A (en) | Image-recognizing method and equipment | |
Veiga et al. | Quality evaluation of digital fundus images through combined measures | |
Wang et al. | Segmenting retinal vessels with revised top-bottom-hat transformation and flattening of minimum circumscribed ellipse | |
de Moura et al. | Automatic identification of intraretinal cystoid regions in optical coherence tomography | |
CN111127400A (en) | Method and device for detecting breast lesions | |
CN113012093B (en) | Training method and training system for glaucoma image feature extraction | |
Verma et al. | Machine learning classifiers for detection of glaucoma | |
Parikh et al. | Effective approach for iris localization in nonideal imaging conditions | |
KR20200129440A (en) | Device for predicting optic neuropathy and method for providing prediction result to optic neuropathy using fundus image | |
CN118351589B (en) | Image processing method, apparatus, device, storage medium, and program product | |
CN117079339B (en) | Animal iris recognition method, prediction model training method, electronic equipment and medium | |
CN117274278B (en) | Retina image focus part segmentation method and system based on simulated receptive field | |
Jana et al. | A semi-supervised approach for automatic detection and segmentation of optic disc from retinal fundus image | |
Nsaef et al. | Enhancement segmentation technique for iris recognition system based on Daugman's Integro-differential operator | |
Nigam et al. | Iris classification based on its quality | |
Fusek et al. | Pupil localization using self-organizing migrating algorithm | |
KR102282334B1 (en) | Method for optic disc classification | |
Al-Shakarchy et al. | Open and closed eyes classification in different lighting conditions using new convolution neural networks architecture | |
Mashudi et al. | Dynamic U-Net Using Residual Network for Iris Segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||