
CN107833219B - Image recognition method and device - Google Patents

Image recognition method and device Download PDF

Info

Publication number
CN107833219B
CN107833219B
Authority
CN
China
Prior art keywords
image
dimensional imaging
map
dimensional
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711212620.1A
Other languages
Chinese (zh)
Other versions
CN107833219A (en)
Inventor
孙星
刘诗昆
郭晓威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Tencent Cloud Computing Beijing Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201711212620.1A priority Critical patent/CN107833219B/en
Publication of CN107833219A publication Critical patent/CN107833219A/en
Application granted granted Critical
Publication of CN107833219B publication Critical patent/CN107833219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • G06T2207/30064Lung nodule
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an image recognition method. The method includes: acquiring a three-dimensional imaging map; processing the three-dimensional imaging map through a feature extraction model branch to obtain image features and a heat map of the three-dimensional imaging map, where the image features indicate an object region in the three-dimensional imaging map; and processing the image features and the heat map through a recognition model branch to obtain the image attribute of the three-dimensional imaging map. Because the constructed heat map can indicate the global characteristics of the three-dimensional imaging map and carry more information, performing recognition based on the heat map can improve the accuracy of image recognition.

Description

Image recognition method and device
Technical Field
The embodiments of the present application relate to the technical field of machine learning, and in particular to an image recognition method and device.
Background
With the continuous development of machine learning algorithms, their application in image recognition scenarios has become increasingly widespread.
In one image recognition scenario, attributes of a three-dimensional imaging map, such as a CT (Computed Tomography) image, may be quickly recognized by means of a machine learning algorithm so that medical personnel may further make an accurate diagnosis.
A three-dimensional imaging map usually contains one or more target objects related to its image attributes. Taking a CT map as an example, in the related art two independent machine learning models are trained in advance: a first machine learning model obtained by learning CT image samples in which the positions of target objects are manually labeled in advance, and a second machine learning model obtained by learning CT image samples in which the attributes of target objects are manually labeled in advance. When assisting medical staff in diagnosis, the CT image of the object region containing the target objects is input into the first machine learning model to determine the position of each target object in the CT image; the image of each target object is cropped from the CT image according to that position; each cropped image is then input into the second machine learning model to determine the attribute of each target object; and finally the image attribute of the CT image is judged according to the attributes of the target objects.
Because the cropped image of a single target object does not contain information about the whole object region, and manually labeling target-object attributes is error-prone, the accuracy of models trained in the related art is not high, and the accuracy of image-attribute judgment is therefore low.
Disclosure of Invention
The embodiments of the present application provide an image recognition method and device, which can solve the problems of low model accuracy and low image-attribute judgment accuracy in the related art. The technical solution is as follows:
in a first aspect, an image recognition method is provided, the method including:
acquiring a three-dimensional imaging map to be recognized;
processing the three-dimensional imaging map through a feature extraction model branch to obtain image features and a heat map of the three-dimensional imaging map, where the image features indicate an object region in the three-dimensional imaging map, the heat map is generated by the feature extraction model branch according to the image features, and the object region is the region where a target object is located;
and processing the image features and the heat map through a recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging map, where the recognition result includes the image attribute of the three-dimensional imaging map.
In a second aspect, an image recognition apparatus is provided, the apparatus comprising:
a first acquisition module, configured to acquire a three-dimensional imaging map to be recognized;
the first processing module is used for processing the three-dimensional imaging graph through a feature extraction model branch to obtain an image feature and a heat map of the three-dimensional imaging graph, wherein the image feature is a feature used for indicating an object region in the three-dimensional imaging graph, the heat map is generated by the feature extraction model branch according to the image feature, and the object region is a region where a target object is located;
and the second processing module is used for processing the image characteristics and the heat map through a recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging map, wherein the recognition result comprises the image attribute of the three-dimensional imaging map.
In a third aspect, there is provided a computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the image recognition method according to the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the image recognition method according to the first aspect.
The technical scheme provided by the application can comprise the following beneficial effects:
the method comprises the steps of constructing a heat map of the three-dimensional imaging map by detecting an object region in the three-dimensional imaging map, and obtaining an identification result containing image attributes of the three-dimensional imaging map based on the constructed heat map and attribute features indicating the object region. In addition, because the image attribute of the three-dimensional imaging graph is not required to be identified through the attribute of each target object, the attribute of the target object is not required to be manually marked, the problem of low accuracy of manual marking is avoided, and the identification accuracy is further improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a diagram illustrating a model training and image recognition framework in accordance with an exemplary embodiment;
FIG. 2 is a model architecture diagram of a machine learning model, shown in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of machine learning model training in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram of a model training process according to the embodiment shown in FIG. 3;
FIG. 5 is a flow diagram illustrating an image recognition method according to an exemplary embodiment;
FIG. 6 is a schematic diagram of an image recognition process according to the embodiment shown in FIG. 5;
fig. 7 is a block diagram illustrating a configuration of an image recognition apparatus according to an exemplary embodiment;
FIG. 8 is a schematic diagram illustrating a configuration of a computer device in accordance with one illustrative embodiment;
FIG. 9 is a schematic diagram illustrating a configuration of a computer device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The embodiment of the application provides a scheme for identifying image attributes, and the scheme can be used for rapidly identifying the image attributes by identifying and processing a three-dimensional imaging image of an object area where a target object is located.
In a possible implementation manner, the scheme shown in the present application may be implemented as an efficient and high-accuracy scheme for identifying disease attributes, which can help users and doctors thereof to quickly identify disease attributes by performing identification processing on a three-dimensional imaging map of a disease focus region. For convenience of understanding, several terms referred to in the embodiments of the present application are explained below.
(1) Focal region: generally refers to a partial region of a body where a lesion occurs, and a lesion region of a lung disease is a region where a lesion occurs in a lung. For example, if a lobe of the lung is destroyed by tubercle bacillus, the lobe is the focal region of pulmonary tuberculosis; for another example, if the bronchial mucosa epithelium of the lung is damaged by cancer cells, the region corresponding to the bronchial mucosa epithelium is the focal region of lung cancer.
(2) Disease source region: the disease source refers to the nodule that causes the disease, and the disease source region refers to the region where that nodule is located. For lung cancer, the disease source is a lung nodule, and the disease source region is the region within the focal region where the lung nodule causing the lung cancer is located, which may also be referred to as the lung nodule region. The disease source attribute can be classified as benign or malignant; continuing with the case of the disease source being a lung nodule, lung nodule attributes can be classified as benign or malignant.
(3) Three-dimensional imaging map: generally refers to an image of an object in three-dimensional space. In the embodiments of the present application, the three-dimensional imaging map refers to a medical three-dimensional pathological image of a diseased organ, and may be composed of two-dimensional images of a plurality of different slice layers. For example, the three-dimensional imaging map may be a CT map or an MRI map of the patient's lung. The three-dimensional imaging map may cover the entire diseased organ or only the lesion region of the diseased organ. In a possible implementation, when the three-dimensional imaging map is a CT map or an MRI map of the patient's lung, the target object in the three-dimensional imaging map may be a disease source, and the object region is the disease source region. In the medical field, the image attribute of the three-dimensional imaging map may be whether the disease is benign or malignant.
The scheme of the embodiments of the present application includes a model training phase and a recognition phase. FIG. 1 is a diagram illustrating a model training and image recognition framework, according to an exemplary embodiment. As shown in FIG. 1, in the model training phase, the model training device 110 trains an end-to-end machine learning model on three-dimensional imaging maps of a region (e.g., a lesion region) labeled with the positions of object regions (e.g., disease source regions) and an image attribute (e.g., benign or malignant). In the recognition phase, the recognition device 120 directly recognizes the image attribute (e.g., whether the disease is benign or malignant) of an input three-dimensional imaging map of a region containing the target object, according to the trained machine learning model. In the embodiments of the present application, only one end-to-end machine learning model is needed; compared with a scheme in which two or more machine learning models are cascaded, this effectively reduces the accumulated cascade error between models and improves the accuracy of the machine learning model.
The model training device 110 and the recognition device 120 may be computer devices with machine learning capability, for example, the computer devices may be stationary computer devices such as a personal computer, a server, and a stationary medical device, or the computer devices may also be mobile computer devices such as a tablet computer, an e-book reader, or a portable medical device.
Optionally, the model training device 110 and the recognition device 120 may be the same device, or they may be different devices. When they are different devices, they may be the same type of device, for example, both personal computers; or they may be different types of devices, for example, the model training device 110 may be a server and the recognition device 120 may be a stationary or portable medical device. The embodiments of the present application do not limit the types of the model training device 110 and the recognition device 120.
FIG. 2 is a model architecture diagram illustrating a machine learning model in accordance with an exemplary embodiment. As shown in FIG. 2, the end-to-end machine learning model 20 in the embodiments of the present application may include two model branches. The feature extraction model branch 210 is configured to extract, from an input three-dimensional imaging map, image features indicating the object region in the map, and to construct and output, according to those image features, a heat map capable of indicating the global state of the three-dimensional imaging map. Besides outputting the constructed heat map, the feature extraction model branch 210 also feeds the heat map and the extracted image features into the recognition model branch 220, which learns and recognizes according to them and outputs a recognition result containing the image attribute of the three-dimensional imaging map. In one possible implementation, when the three-dimensional imaging map is a CT map of a lung lesion region, the recognition result is whether the lung disease is benign or malignant.
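As a rough illustration of this two-branch data flow, the following sketch uses trivial numeric stand-ins for the trained networks. All function names, the normalization used as a "heat map", and the 0.5 threshold are invented for illustration only and are not the patented method.

```python
import numpy as np

def feature_extraction_branch(volume):
    # Stand-in for branch 210: the "features" are just the raw voxel values,
    # and the "heat map" is the volume normalized so its peak is 1.
    features = volume.astype(np.float32)
    peak = features.max()
    heatmap = features / peak if peak > 0 else features
    return features, heatmap

def recognition_branch(features, heatmap):
    # Stand-in for branch 220: threshold the heat-weighted feature mean
    # to produce a benign/malignant label (threshold is arbitrary).
    score = float((features * heatmap).mean())
    return "malignant" if score > 0.5 else "benign"

vol = np.random.rand(8, 8, 8)  # toy "CT volume"
features, heatmap = feature_extraction_branch(vol)
label = recognition_branch(features, heatmap)
```

The point of the structure is that the recognition branch consumes both the heat map and the raw image features, rather than per-object crops.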
In the end-to-end machine learning model shown in FIG. 2, the object region is not used as the main factor for determining the image attribute of the three-dimensional imaging map; instead, the object region in the three-dimensional imaging map is detected to construct a heat map of the three-dimensional imaging map, and the image attribute is determined based on the constructed heat map and the image features indicating the object region. In addition, because the image attribute is not identified through the attribute of each individual object region, the attributes of object regions (such as the benign or malignant nature of lung nodules) do not need to be manually labeled during model training, which avoids the problem of low manual-labeling accuracy and further improves recognition accuracy.
Taking the application of the scheme in the medical field as an example, the application mode of the scheme on the product side is mainly background recognition, the three-dimensional imaging image of the patient is transmitted to the feature extraction model branch 210 and the recognition model branch 220 for disease recognition, and a corresponding highlight area (namely, a heat map) is made according to the recognition result. The scheme can be used for auxiliary medical systems facing hospitals or individuals, and helps patients to quickly and efficiently detect benign and malignant diseases related to disease source areas, for example, by taking the disease source area as a pulmonary nodule area, lung diseases related to pulmonary nodules can include but are not limited to lung cancer, lung congenital dysplasia, pneumonia, tuberculosis, bronchiectasis, pulmonary aspergillosis, pulmonary hemangioma, pulmonary granulomatosis, pulmonary tuberculosis tumor, pulmonary inflammatory pseudotumor, pulmonary hamartoma, pulmonary metastatic cancer, pulmonary paralysis scar, rheumatic pulmonary nodule, intrapulmonary lymph node and the like.
The scheme shown in the embodiments of the present application features quick response, wide coverage, low labor cost, and the ability to help field experts determine coping strategies. It can serve hospital-facing or personal medical assistants, helping experts and patients quickly locate the sensitive regions for disease identification and increasing the accuracy of disease judgment.
FIG. 3 is a flow diagram illustrating a machine learning model training method that may be used in a computer device, such as the model training device 110 shown in FIG. 1 described above, to train to obtain the machine learning model shown in FIG. 2 described above, according to an example embodiment. As shown in fig. 3, the machine learning model training method may include the following steps:
step 301, a model training device obtains a three-dimensional imaged graph sample labeled with image attributes and object region attributes, where the object region attributes include coordinates and dimensions of an object region in the three-dimensional imaged graph sample.
Taking the application in the medical field as an example, the model training device obtains a three-dimensional imaging map sample labeled with disease attributes and disease source area attributes, where the disease source area attributes include coordinates and dimensions of a disease source area in the three-dimensional imaging map sample.
In the embodiments of the present application, the three-dimensional imaging map sample may be input into the model training device by a labeling auditor; that is, the labeling auditor marks the coordinates and size of the disease source region (i.e., the object region) in the three-dimensional imaging map sample on another computer device according to the guidance of a professional, and inputs the labeled sample, together with its corresponding image attribute (i.e., the disease attribute, such as benign or malignant), into the model training device.
Or, the three-dimensional imaging graph sample can also be obtained by directly labeling the model training equipment by a labeling auditor.
The three-dimensional imaging map samples may include a plurality of positive samples and a plurality of negative samples. Taking the medical field as an example, a positive sample may be a three-dimensional imaging map of the lesion region of a patient whose disease has been diagnosed as benign, or a three-dimensional imaging map of the body region of a normal person who has not contracted the disease, while a negative sample may be a three-dimensional imaging map of the lesion region of a patient whose disease has been diagnosed as malignant.
Alternatively, the positive samples may be three-dimensional imaging maps of lesion regions of patients diagnosed as malignant, and the negative samples may be three-dimensional imaging maps of lesion regions of patients diagnosed as benign or of body regions of normal persons who have not contracted the disease.
In the embodiment of the present application, only the attribute of the disease source region including the coordinates and the size of the disease source region is taken as an example for description, in practical applications, other types of attributes of the disease source region, such as shape, specificity, and the like, may also be used, and the type of the attribute of the disease source region is not limited in the embodiment of the present application.
Step 302, the model training device constructs a heat map sample according to the object region attribute.
For example, taking the application in the medical field as an example, the model training device constructs a heat map sample according to the attribute of the disease source region.
In the embodiments of the present application, the model training device may obtain the corresponding heat map sample by constructing a Gaussian distribution according to the labeled disease source region attributes.
Optionally, in order to enrich the heat map sample and improve the accuracy of subsequent model training, in this embodiment of the application, for one three-dimensional imaging map sample, the model training device may construct the heat map sample at two or more resolutions, that is, the model training device may construct a plurality of heat map samples with different resolutions (i.e., multiple scales) according to the attribute of the disease source region (i.e., the attribute of the object region) of the one three-dimensional imaging map sample.
Optionally, after constructing the heat map sample, the model training device may further regularize (normalize) each disease source region location (i.e., object region location) in the heat map, so that the maximum value of the heat map at each resolution is 1. Since a disease source region with a larger diameter corresponds to a Gaussian distribution with a smaller peak, regularization allows each heat map to focus on the location of the disease source region with the largest diameter.
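The Gaussian heat map construction and regularization described above might be sketched as follows. The proportionality between the Gaussian standard deviation and the region diameter is an assumption; the text does not specify the exact relation.

```python
import numpy as np

def build_heatmap(volume_shape, centers, diameters):
    """Build a 3D Gaussian heat map sample from labeled object regions.

    Each labeled region (center + diameter) contributes one Gaussian blob;
    the map is then normalized so its maximum value is 1, as in the text.
    """
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in volume_shape],
                             indexing="ij")
    heatmap = np.zeros(volume_shape, dtype=np.float32)
    for (cz, cy, cx), d in zip(centers, diameters):
        sigma = d / 2.0  # assumed: std proportional to region diameter
        dist2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        heatmap += np.exp(-dist2 / (2.0 * sigma ** 2))
    if heatmap.max() > 0:
        heatmap /= heatmap.max()  # regularize so the peak value is 1
    return heatmap

hm = build_heatmap((32, 32, 32), [(16, 16, 16)], [8.0])
```

After normalization the peak sits at the labeled center with value 1, which is the "maximum value of the heat map at each resolution is 1" property described above.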
Step 303: the model training device performs model training according to the three-dimensional imaging map sample and the heat map sample to obtain the feature extraction model branch of the machine learning model and the image feature sample.
The image feature sample is an image feature obtained by processing the three-dimensional imaging image sample through a feature extraction model branch.
The model training device may perform machine training on the three-dimensional imaging map samples and the heat map samples according to a preset algorithm to obtain the feature extraction model branch of the machine learning model, so that after processing a three-dimensional imaging map sample the feature extraction model branch yields the corresponding heat map sample and image feature sample.
In the embodiments of the present application, the feature extraction model branch may include two parts: a three-dimensional deep convolutional residual network and a feature pyramid network. The three-dimensional deep convolutional residual network extracts specific image features from the input three-dimensional imaging map; these image features can indicate features of the disease source regions (i.e., object regions) in the input three-dimensional imaging map sample, such as the position, shape, size, and specificity of each disease source region, and the embodiments of the present application do not limit the types of image features. The feature pyramid network constructs a heat map according to the image features extracted by the three-dimensional deep convolutional residual network. After the feature extraction model branch is trained, the image features extracted from the three-dimensional imaging map sample through the three-dimensional deep convolutional residual network constitute the image feature sample.
It should be noted that, when each three-dimensional imaging map or three-dimensional imaging map sample corresponds to heat maps at multiple resolutions, the feature pyramid network may be a multi-scale feature pyramid network in which each scale corresponds to a resolution; that is, the feature pyramid network may construct multiple heat maps with different resolutions according to the input image features.
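The multi-resolution heat maps mentioned above could be illustrated by simple 2x average pooling. A real feature pyramid network learns its multi-scale outputs, so this is only a shape-level sketch of what "one heat map per resolution" looks like.

```python
import numpy as np

def multi_scale_heatmaps(heatmap, n_scales=3):
    """Produce heat maps at several resolutions by repeated 2x average
    pooling (illustrative stand-in for a multi-scale pyramid output)."""
    maps = [heatmap]
    for _ in range(n_scales - 1):
        h = maps[-1]
        d, hh, w = (s // 2 for s in h.shape)
        # Group voxels into 2x2x2 cells and average each cell.
        pooled = (h[:2 * d, :2 * hh, :2 * w]
                  .reshape(d, 2, hh, 2, w, 2)
                  .mean(axis=(1, 3, 5)))
        maps.append(pooled)
    return maps

maps = multi_scale_heatmaps(np.ones((8, 8, 8), dtype=np.float32))
```

Each entry of `maps` corresponds to one resolution (scale) of the pyramid, halving the spatial size each time.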
The heat maps and heat map samples referred to in the embodiments of the present application are three-dimensional heat maps.
Optionally, if the three-dimensional imaging map sample is trained as a whole, the large amount of data it contains may require more processing resources and lengthen training time, resulting in low training efficiency.
For example, the model training device may segment a three-dimensional imaging map sample to obtain a plurality of three-dimensional image block samples, for example segmenting it into 48 sub-samples with [48 x 48 x 48] resolution, and correspondingly segment the heat map sample generated in step 302 into 48 corresponding sub-heat-map samples. For each three-dimensional image block sample, the three-dimensional deep convolutional residual network learns the sub-image feature samples in that block, and the feature pyramid network learns the heat map corresponding to that block according to those sub-image feature samples, until the learned heat map is the same as or close to (for example, with an error within a certain range of) the sub-heat-map sample corresponding to the block.
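The block segmentation step might look like the following sketch (non-overlapping blocks, with the volume dimensions assumed to be multiples of the block size; the text does not specify padding or overlap handling):

```python
import numpy as np

def split_into_blocks(volume, block=48):
    """Split a 3D volume into non-overlapping [block x block x block]
    sub-samples, scanning in z, y, x order."""
    blocks = []
    for z in range(0, volume.shape[0], block):
        for y in range(0, volume.shape[1], block):
            for x in range(0, volume.shape[2], block):
                blocks.append(volume[z:z + block, y:y + block, x:x + block])
    return blocks

vol = np.zeros((96, 96, 96), dtype=np.float32)
blocks = split_into_blocks(vol, 48)  # a 96^3 volume yields 2*2*2 = 8 blocks
```

The same routine applied to the heat map sample yields the sub-heat-map samples, so each image block sample is paired with its matching heat map block during training.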
Step 304: the model training device performs model training according to the heat map sample, the image feature sample, and the image attribute corresponding to the three-dimensional imaging map sample to obtain the recognition model branch of the machine learning model.
Taking the application in the medical field as an example, the model training device performs model training according to the heat map sample, the image feature sample and the disease attribute corresponding to the three-dimensional imaging map sample to obtain an identification model branch of the machine learning model.
In this embodiment, the model training device may train the recognition model branch through a preset model training algorithm, where the model training algorithm may be a three-dimensional deep convolutional network algorithm or a long-short term memory network algorithm, and for example, the model training device may input a heat map sample, the image feature sample, and a disease attribute corresponding to the three-dimensional imaging map sample into the model training algorithm together for training, so as to obtain the recognition model branch.
Alternatively, when the above-mentioned heat map samples include samples at a plurality of resolutions, after performing step 303 the model training device may introduce an attention mechanism into the recognition model branch, that is, obtain the recognition model branch through end-to-end training based on the attention mechanism.
For example, for a machine learning model based on the attention mechanism, the heat maps at each resolution obtained in step 303 are point-multiplied with the learned image features; the point multiplication results are then weighted and summed according to the weight of each resolution, and the result, together with the disease attribute (i.e. the image attribute) corresponding to the three-dimensional imaging graph sample, is input into the convolutional layer and the connection layer for training. The benign/malignant determination of the disease is finally obtained through a sigmoid activation function, along with a three-dimensional heat map representing the position and diameter of the disease source region. The weight of each resolution, the convolutional layer, and the connection layer are the parts of the recognition model branch that need to be trained.
In this embodiment of the present application, the number of the resolutions may be set to 2 to 4, for example, taking setting 3 different resolutions as an example, assuming that the weight corresponding to the resolution 1 is a1, the weight corresponding to the resolution 2 is a2, the weight corresponding to the resolution 3 is a3, the image attribute obtained by processing the three-dimensional imaging graph by the feature extraction model branch is b, the heat map corresponding to the resolution 1 is c1, the heat map corresponding to the resolution 2 is c2, and the heat map corresponding to the resolution 3 is c3, then the weighted summation result d may be represented as follows:
d=a1*b*c1+a2*b*c2+a3*b*c3。
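The weighted summation above can be sketched with NumPy. The array shapes and the weight values here are illustrative assumptions (in the trained model the weights a1–a3 are learned parameters), and the heat maps are assumed already resampled to a common shape:

```python
import numpy as np

shape = (4, 4, 4)                     # illustrative volume shape
rng = np.random.default_rng(0)
b = rng.random(shape)                 # image features from the extraction branch
c1, c2, c3 = (rng.random(shape) for _ in range(3))  # heat maps at 3 resolutions
a1, a2, a3 = 0.5, 0.3, 0.2           # per-resolution weights (learned in training)

# d = a1*b*c1 + a2*b*c2 + a3*b*c3 ("point multiplication" is element-wise here)
d = a1 * b * c1 + a2 * b * c2 + a3 * b * c3
```

Because the same feature volume b multiplies every term, d can equivalently be computed as `b * (a1*c1 + a2*c2 + a3*c3)`.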
Optionally, when the three-dimensional imaging graph sample and the corresponding heat map sample are subjected to segmentation training in step 303, before step 304 is executed, the model training device may combine the sub-image feature samples extracted from each three-dimensional image block sample by the trained feature extraction model branch into the image feature sample.
In practical applications, the segmentation-training method for the three-dimensional imaging graph sample and its corresponding heat map sample, and the method of training with heat map samples at a plurality of different resolutions, may be used separately or in combination according to actual conditions. For example, in one possible implementation, the model training device may perform segmentation training on the three-dimensional imaging graph sample and the corresponding single-resolution heat map sample to obtain a single-resolution machine learning model; in another possible implementation, the model training device may perform model training on the three-dimensional imaging graph sample and a plurality of heat map samples at different resolutions as a whole to obtain a multi-resolution machine learning model; or, in yet another possible implementation, the model training device may perform segmentation training on the three-dimensional imaging graph sample and its corresponding multi-resolution heat map samples to obtain a multi-resolution machine learning model.
Please refer to fig. 4, which illustrates a schematic diagram of a model training process according to an embodiment of the present application. Taking the disease source region being a lung nodule, the disease being a lung disease, and the training being guided by three-dimensional heat maps at 3 different resolutions as an example, as shown in fig. 4, the annotation reviewer provides to the model training device a set of three-dimensional imaging graph samples (including several positive samples and several negative samples), the lung disease attributes corresponding to the samples, and the three-dimensional coordinates and dimensions (such as radius or diameter) of the lung nodules in the samples.
The model training device first obtains the heat map samples at three different resolutions corresponding to each three-dimensional imaging graph sample (namely heat map 1 to heat map 3 in fig. 4) by constructing a Gaussian distribution from the three-dimensional coordinates and sizes of the lung nodules in the sample; the obtained heat map samples may be heat map samples subjected to regularization processing.
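One way to turn an annotated nodule (center coordinates plus a diameter) into a heat map sample is to place a 3D Gaussian bump at the nodule center with a spread tied to the nodule size. The sketch below assumes sigma = diameter/2 and a peak value of 1; these conventions are illustrative, not the patent's exact construction:

```python
import numpy as np

def gaussian_heat_map(volume_shape, center, diameter):
    """Build a 3D heat map with a Gaussian bump at the nodule center.

    volume_shape: (D, H, W) of the imaging volume, in voxels
    center: (z, y, x) nodule coordinates, in voxels
    diameter: nodule diameter in voxels; sigma = diameter / 2 is an assumption
    """
    zz, yy, xx = np.meshgrid(*(np.arange(s) for s in volume_shape), indexing="ij")
    sq_dist = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    sigma = diameter / 2.0
    # Peak of 1.0 at the center, decaying with distance from the nodule.
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

heat = gaussian_heat_map((32, 32, 32), center=(16, 16, 16), diameter=8)
```

Lower-resolution heat map samples (heat maps 2 and 3) could then be produced by downsampling this volume or by regenerating it on a coarser grid.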
After obtaining the heat map samples, the model training device may perform model training according to the three-dimensional imaging graph samples and the corresponding heat map samples at three different resolutions, to obtain the feature extraction model branch in the machine learning model. The aim of this training is to enable the three-dimensional deep convolutional residual network in the feature extraction model branch to extract image features from the three-dimensional imaging graph samples, and to enable the feature pyramid network in the branch to construct the heat map samples at the three different resolutions from the image features extracted by the residual network. After the feature extraction model branch is trained, the image feature samples extracted from the three-dimensional imaging graph samples by the three-dimensional deep convolutional residual network are output. Optionally, in the training process, the three-dimensional imaging graph sample and the corresponding heat maps at the three different resolutions may be subjected to segmentation training.
The model training device then performs attention-based model training using the image feature samples output by the feature extraction model branch, the heat map samples at three different resolutions corresponding to the three-dimensional imaging graph samples, and the lung disease attributes corresponding to the samples, to obtain the recognition model branch. The recognition model branch may comprise an attention module, a convolutional layer, a connection layer, and an activation function, where the activation function is a preset fixed function. In the training process, the attention module performs point multiplication on the input image feature sample and each of the heat map samples at the three different resolutions, weights and sums the three point multiplication results according to the weights corresponding to the three resolutions, and inputs the result into the convolutional layer and the connection layer. The aim of training is to learn the weights corresponding to the three resolutions, as well as the convolutional layer and the connection layer, so that the lung disease attribute of the corresponding three-dimensional imaging graph sample can be accurately obtained after the image feature sample and the heat map samples sequentially pass through the attention module, the convolutional layer, the connection layer, and the activation function.
To sum up, in the method shown in the embodiment of the present application, when the three-dimensional imaging graph is processed through the feature extraction model branch, the three-dimensional imaging graph is divided into at least two three-dimensional image blocks, and each three-dimensional image block is processed respectively, so that the data volume of single processing is reduced, and the efficiency of model training and recognition is improved.
In addition, according to the method disclosed by the embodiment of the application, in the model training and recognition process, a plurality of heat maps under different resolutions can be constructed for one three-dimensional imaging map, and training or recognition is performed according to the heat maps under different resolutions, so that the accuracy of the model and the accuracy of the recognition result are improved.
After the feature extraction model branch and the recognition model branch in the machine learning model are trained offline, the machine learning model can be deployed in a recognition device for complete offline lung disease detection, helping a user (such as a patient) rapidly obtain image attribute detection (such as benign or malignant lung disease) and helping a doctor efficiently view the focus area; please see the following embodiments.
Fig. 5 is a flow diagram illustrating an image recognition method that may be used in a computer device, such as the recognition device 120 shown in fig. 1 described above, according to an example embodiment. Taking the application of the present scheme in the medical field to identify the benign or malignant diseases as an example, as shown in fig. 5, the image identification method may include the following steps:
step 501, the identification equipment acquires a three-dimensional imaging graph to be identified.
In the actual recognition process, the user or the doctor may input a three-dimensional imaging map of the lesion area of the patient into the recognition apparatus.
Step 502, the recognition device processes the three-dimensional imaging graph through a feature extraction model branch to obtain the image features of the three-dimensional imaging graph and a heat map constructed according to the image features.
The image feature is a feature for indicating an object region in the three-dimensional imaging graph, and the object region is a region where a target object is located.
Taking the medical field as an example, the image feature is a feature for indicating a disease source region in the three-dimensional imaging map, and is not described herein again.
Optionally, because there are many pixels in the three-dimensional imaging graph, directly performing learning processing may result in too long processing time and affect recognition efficiency, in this embodiment of the present application, when the three-dimensional imaging graph is processed through the feature extraction model branch, the three-dimensional imaging graph may be divided into at least two three-dimensional image blocks; for each three-dimensional image block of the at least two three-dimensional image blocks, inputting the three-dimensional image block into the feature extraction model branch to obtain a sub-image feature corresponding to the three-dimensional image block and a sub-heat map constructed according to the sub-image feature; combining the sub-image features corresponding to the at least two three-dimensional image blocks into the image feature; and combining the sub-heat maps corresponding to the at least two three-dimensional image blocks into the heat map. The above segmentation processing steps are similar to the segmentation processing steps in the training process, and are not described herein again.
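The split-process-recombine step above can be sketched as follows. The 48x48x48 block size follows the example from the training description, and the per-block processing is a stand-in for the feature extraction model branch; the sketch assumes volume sides are exact multiples of the block size (real data would need padding):

```python
import numpy as np

BLOCK = 48  # block edge length, matching the [48x48x48] example

def split_into_blocks(volume):
    """Split a volume whose sides are multiples of BLOCK into 3D blocks."""
    d, h, w = volume.shape
    blocks, index = [], []
    for z in range(0, d, BLOCK):
        for y in range(0, h, BLOCK):
            for x in range(0, w, BLOCK):
                blocks.append(volume[z:z + BLOCK, y:y + BLOCK, x:x + BLOCK])
                index.append((z, y, x))
    return blocks, index

def recombine(blocks, index, shape):
    """Reassemble per-block outputs (e.g. sub-heat maps) into one volume."""
    out = np.zeros(shape)
    for block, (z, y, x) in zip(blocks, index):
        out[z:z + BLOCK, y:y + BLOCK, x:x + BLOCK] = block
    return out

volume = np.random.default_rng(0).random((96, 96, 96))
blocks, index = split_into_blocks(volume)            # 2 x 2 x 2 = 8 blocks
restored = recombine(blocks, index, volume.shape)
```

Each block would be passed through the feature extraction model branch individually, and the resulting sub-image features and sub-heat maps recombined with the same index bookkeeping.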
Optionally, when the three-dimensional imaging graph is processed through the feature extraction model branch, heat maps corresponding to the three-dimensional imaging graph at each of at least two resolutions may be obtained.
In the identification process, after the recognition device inputs the three-dimensional imaging graph into the feature extraction model branch, the three-dimensional deep convolutional residual network in the branch extracts image features from the three-dimensional imaging graph, where the image features indicate each disease source region in the graph; the residual network inputs the extracted image features into the feature pyramid network, and the feature pyramid network constructs the heat map corresponding to the three-dimensional imaging graph from the image features.
Corresponding to the training process, in the identification process, for one three-dimensional imaging graph, the recognition device can construct heat maps respectively corresponding to at least two resolutions from the image features extracted by the feature extraction model branch.
Step 503, the recognition device processes the image feature and the thermal map through the recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging map, where the recognition result includes an image attribute of the three-dimensional imaging map.
Taking the application in the medical field as an example, the identification result includes disease attributes of the disease, including benign or malignant.
In the embodiment of the application, the recognition model branch may be obtained by training a preset model training algorithm, where the model training algorithm may be a three-dimensional deep convolutional network algorithm or a long-short term memory network algorithm, and in the recognition process, the recognition device may input the heat map and the image features into the recognition model branch together to obtain a recognition result including the disease attribute. The training manner for identifying the model branch may refer to step 304 in the embodiment corresponding to fig. 3, and is not described herein again.
Optionally, when the heat map includes heat maps at a plurality of resolutions, the at least two resolutions correspond to respective weights. When processing the image feature and the heat maps through the recognition model branch to obtain the disease attribute corresponding to the three-dimensional imaging graph, the recognition device performs, for each of the heat maps corresponding to the at least two resolutions, point multiplication of the image feature and that heat map through the recognition model branch; performs weighted summation of the point multiplication results according to the weights corresponding to the at least two resolutions; and obtains the recognition result corresponding to the three-dimensional imaging graph according to the result of the weighted summation.
In this embodiment, the machine learning model may be an end-to-end model based on an attention mechanism, and the recognition model branch may include an attention module. The attention module is configured to perform point multiplication on the heat maps of the input three-dimensional imaging graph at the different resolutions and the image features, perform weighted summation using the weights corresponding to the different resolutions, and input the result of the weighted summation to the other parts of the recognition model branch, such as the convolutional layer, the connection layer, and the activation function (e.g. a sigmoid function), which process it to obtain the recognition result (including whether the disease is benign or malignant). The weights corresponding to the different resolutions are obtained when the recognition model branch is trained in step 304.
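The recognition branch's forward pass (attention point multiplication, weighted summation, then a classifier head ending in a sigmoid) can be sketched schematically. Here the convolutional layer and the connection layer are collapsed into a single assumed linear layer, and the heat maps are assumed already resampled to the feature shape, so this shows the data flow rather than the real architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_branch(features, heat_maps, res_weights, w, bias):
    """Schematic forward pass of the attention-based recognition branch.

    features: image feature volume, shape (D, H, W)
    heat_maps: heat maps at different resolutions, same shape as `features`
    res_weights: per-resolution weights (learned during training)
    w, bias: stand-in for the trained convolutional + connection layers
    """
    # Attention module: element-wise product per resolution, then weighted sum.
    attended = sum(a * features * c for a, c in zip(res_weights, heat_maps))
    # Collapsed classifier head: flatten -> linear -> sigmoid score in (0, 1).
    score = sigmoid(float(attended.ravel() @ w) + bias)
    return 1 if score >= 0.5 else 0  # 1: malignant, 0: benign (as in fig. 6)

rng = np.random.default_rng(0)
shape = (8, 8, 8)
features = rng.random(shape)
heat_maps = [rng.random(shape) for _ in range(3)]
label = recognition_branch(features, heat_maps, [0.5, 0.3, 0.2],
                           w=rng.standard_normal(8 * 8 * 8) * 0.01, bias=0.0)
```

In the actual branch the linear stand-in would be replaced by the trained convolutional and connection layers, but the attention computation and the final sigmoid thresholding follow the same shape.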
In the embodiment of the application, the machine learning model can output the disease attribute obtained by the branch processing of the recognition model, and also can output the heat map obtained by the branch processing of the feature extraction model, so that doctors can perform manual operation such as disease analysis according to the heat map.
Please refer to fig. 6, which illustrates a schematic diagram of an image recognition process according to an embodiment of the present application. Taking the disease source region being a lung nodule, the disease being a lung disease, and identification being based on three-dimensional heat maps at 3 different resolutions as an example, as shown in fig. 6, when a patient visits, a doctor may input a three-dimensional imaging graph of the patient's lung disease region into the recognition device. The recognition device extracts image features of the three-dimensional imaging graph through the three-dimensional deep convolutional residual network in the feature extraction model branch, and constructs three-dimensional heat maps at 3 different resolutions from the extracted image features through the feature pyramid network in the branch, namely heat maps 1 to 3 in fig. 6.
After the three-dimensional heat maps with 3 different resolutions are constructed, on one hand, the identification equipment outputs the constructed three-dimensional heat maps to a visual interface, and on the other hand, the identification equipment inputs the extracted image features and the constructed three-dimensional heat maps with 3 different resolutions into the identification model branch.
The attention module in the recognition model branch performs point multiplication on the image features and each of the three constructed three-dimensional heat maps, then performs weighted summation and inputs the result into the convolutional layer and the connection layer. The convolutional layer and the connection layer process the result and output an array to the activation function, which produces a recognition result according to the array. The recognition result includes the lung disease attribute corresponding to the three-dimensional imaging graph; for example, when the output is 0, the lung disease corresponding to the three-dimensional imaging graph may be considered benign, and when the output is 1, malignant.
The scheme provided by the embodiment of the application is a high-accuracy, high-coverage scheme for identifying disease-sensitive regions and detecting benign and malignant diseases. The scheme constructs a three-dimensional convolutional network using deep learning techniques, trains a basic model on data with annotated disease source regions (namely, the feature extraction model branch that produces the heat map and image features), and then performs end-to-end training based on the attention mechanism using the obtained heat map and image features. By retraining on the heat map, focus information beyond the disease source region can be captured, which improves identification accuracy, helps doctors detect diseases quickly, and avoids the poor robustness and over-dependence on annotation accuracy that arise when a traditional method identifies a disease only through the benign/malignant state of the disease source region. The scheme also supports automatic highlighting of sensitive regions (corresponding to the step of outputting the heat map); compared with other deep learning methods, it can automatically capture disease-sensitive regions through end-to-end training, and the preprocessing step (corresponding to segmenting the three-dimensional imaging graph) can accelerate model convergence and improve training accuracy.
The machine learning model has high response speed, can realize real-time detection, and is particularly suitable for the condition that remote areas lack high-quality medical resources.
In summary, in the method shown in the embodiment of the present application, the object region is not used as a main factor for determining the image attribute, but the object region in the three-dimensional imaging map is detected to construct the thermal map of the three-dimensional imaging map, and the image attribute is determined based on the constructed thermal map and the attribute feature of the indicated object region. In addition, because the image attributes are not identified through the attributes of each object region, the attributes of the object regions do not need to be labeled manually in the model training process, so that the problem of low accuracy of manual labeling of the object regions is solved, and the identification accuracy is further improved.
In addition, the model for identifying the image attributes in the embodiment of the application is an end-to-end machine learning model, and compared with a mode that two or more machine learning models are cascaded, the method can effectively reduce the cascaded accumulated error between the two or more machine learning models and improve the accuracy of the machine learning models.
In addition, according to the method disclosed by the embodiment of the application, when the three-dimensional imaging graph is processed through the feature extraction model branch, the three-dimensional imaging graph is divided into at least two three-dimensional image blocks, and each three-dimensional image block is processed respectively, so that the data volume of single processing is reduced, and the efficiency of model training and recognition is improved.
In addition, according to the method disclosed by the embodiment of the application, in the process of model training and recognition, a plurality of heat maps under different resolutions can be constructed for one three-dimensional imaging map, and training or recognition is performed according to the heat maps under different resolutions, so that the accuracy of the model and the accuracy of the recognition result are improved.
Fig. 7 is a block diagram illustrating a configuration of an image recognition apparatus according to an exemplary embodiment. The image recognition apparatus may be used in a computer device to perform all or part of the steps in the embodiment shown in fig. 3. The image recognition apparatus may include:
a first obtaining module 701, configured to obtain a three-dimensional imaging map to be identified;
a first processing module 702, configured to process the three-dimensional imaging graph through a feature extraction model branch to obtain an image feature and a heat map of the three-dimensional imaging graph, where the image feature is a feature used to indicate an object region in the three-dimensional imaging graph, the heat map is generated by the feature extraction model branch according to the image feature, and the object region is a region where a target object is located;
the second processing module 703 is configured to process the image features and the thermal map through a recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging map, where the recognition result includes an image attribute of the three-dimensional imaging map.
Optionally, the first processing module 702 is configured to divide the three-dimensional imaging graph into at least two three-dimensional image blocks; for each three-dimensional image block of the at least two three-dimensional image blocks, inputting the three-dimensional image block into the feature extraction model branch to obtain a sub-image feature corresponding to the three-dimensional image block and a sub-heat map constructed according to the sub-image feature; combining the sub-image features corresponding to the at least two three-dimensional image blocks into the image feature; and combining the sub-heat maps corresponding to the at least two three-dimensional image blocks into the heat map.
Optionally, the first processing module 702 is configured to process the three-dimensional imaging graph through the feature extraction model branch to obtain the image features of the three-dimensional imaging graph and heat maps corresponding to the three-dimensional imaging graph at each of at least two resolutions.
Optionally, the at least two resolutions correspond to respective weights, and the second processing module 703 is configured to: for each of the heat maps corresponding to the three-dimensional imaging graph at the at least two resolutions, perform point multiplication of the image feature and that heat map through the recognition model branch; perform weighted summation of the point multiplication results according to the respective weights corresponding to the at least two resolutions; and obtain the recognition result corresponding to the three-dimensional imaging graph according to the result of the weighted summation.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring a three-dimensional imaging image sample marked with image attributes and object area attributes before acquiring the three-dimensional imaging image, wherein the object area attributes comprise coordinates and sizes of object areas in the three-dimensional imaging image sample;
the heat map construction module is used for constructing a heat map sample according to the object region attribute;
the first training module is used for carrying out model training according to the three-dimensional imaging chart sample and the heat pattern book to obtain the feature extraction model branch and an image feature sample, wherein the image feature sample is an image feature obtained by processing the three-dimensional imaging chart sample through the feature extraction model branch;
and the second training module is used for carrying out model training according to the heat map sample, the image feature sample and the image attributes corresponding to the three-dimensional imaging map sample to obtain the identification model branch.
Optionally, the apparatus further comprises:
the third processing module is used for conducting regularization processing on the heat map sample to obtain a regularized heat map sample before the first training module conducts model training according to the three-dimensional imaging map sample and the heat map sample to obtain the feature extraction model branch and the image feature sample;
the first training module is used for carrying out model training according to the three-dimensional imaging graph sample and the regularized heat pattern book to obtain the feature extraction model branch and the image feature sample.
Optionally, the feature extraction model branch includes a three-dimensional deep convolutional residual network and a feature pyramid network, and the first processing module 702 is configured to extract the image feature from the three-dimensional imaging graph through the three-dimensional deep convolutional residual network, and construct the heat map through the feature pyramid network and the image feature.
Optionally, the recognition model is branched into an end-to-end model based on an attention mechanism.
Optionally, the three-dimensional imaging graph is a computed tomography (CT) image or a magnetic resonance imaging (MRI) image.
In summary, in the apparatus shown in the embodiment of the present application, the object region is not used as a main factor for determining the image attribute, but the object region in the three-dimensional imaging map is detected to construct the thermal map of the three-dimensional imaging map, and the image attribute is determined based on the constructed thermal map and the attribute feature of the indicated object region. In addition, because the image attributes are not identified through the attributes of all the object areas, the attributes of the object areas do not need to be manually marked in the model training process, so that the problem of low accuracy of manual marking of the object areas is solved, and the identification accuracy is further improved.
In addition, the model for identifying the image attributes in the embodiment of the application is an end-to-end machine learning model, and compared with a mode that two or more machine learning models are cascaded, the method can effectively reduce the cascaded accumulated error between the two or more machine learning models and improve the accuracy of the machine learning models.
In addition, when the three-dimensional imaging graph is processed through the feature extraction model branch, the device shown in the embodiment of the application divides the three-dimensional imaging graph into at least two three-dimensional image blocks, and processes each three-dimensional image block respectively, so that the data volume of single processing is reduced, and the efficiency of model training and recognition is improved.
In addition, according to the device disclosed by the embodiment of the application, in the model training and recognition process, a plurality of heat maps under different resolutions can be constructed for one three-dimensional imaging map, and training or recognition is performed according to the heat maps under different resolutions, so that the accuracy of the model and the accuracy of the recognition result are improved.
FIG. 8 is a schematic diagram illustrating a configuration of a computer device, according to an example embodiment. The computer device 800 includes a Central Processing Unit (CPU) 801, a system memory 804 including a Random Access Memory (RAM) 802 and a Read Only Memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The computer device 800 also includes a basic input/output system (I/O system) 806, which facilitates transfer of information between various components within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse, keyboard, etc. for user input of information. Wherein the display 808 and the input device 809 are connected to the central processing unit 801 through an input output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 810 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage 807 described above may be collectively referred to as memory.
The computer device 800 may be connected to the internet or other network devices through a network interface unit 811 coupled to the system bus 805.
The memory further stores one or more programs, and the central processing unit 801 executes the one or more programs to implement all or part of the steps of the method shown in fig. 3 or fig. 5.
Fig. 9 shows a block diagram of a terminal 900 according to an exemplary embodiment of the present application. The terminal 900 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 900 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, terminal 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 901 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a Central Processing Unit (CPU); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement a machine learning model training method or an image recognition method provided by method embodiments herein.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 903 by a bus, signal line, or circuit board. For example, the peripheral devices include: at least one of a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, it can also capture touch signals on or over its surface. A touch signal may be input to the processor 901 as a control signal for processing. The display 905 may then also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display 905, forming the front panel of the terminal 900; in other embodiments, there are at least two displays 905, disposed on different surfaces of the terminal 900 or in a foldable design; in still other embodiments, the display 905 is a flexible display disposed on a curved or folded surface of the terminal 900. The display 905 may even be arranged in a non-rectangular, irregular pattern, i.e., a shaped screen. The display 905 may be an LCD (Liquid Crystal Display) panel, an OLED (Organic Light-Emitting Diode) panel, or the like.
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera can be fused with the depth-of-field camera to realize a background blurring function, or with the wide-angle camera to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
Audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals, which are input to the processor 901 for processing or to the radio frequency circuit 904 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, disposed at different locations on the terminal 900. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can not only convert an electrical signal into sound waves audible to humans, but can also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the terminal 900 for navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 909 is used to supply power to the various components in terminal 900. The power supply 909 may use alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 909 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the touch display 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user on the terminal 900. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side bezel of the terminal 900 and/or underneath the touch display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the touch display 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the terminal 900. When a physical key or vendor Logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical key or vendor Logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the touch display 905 based on the ambient light intensity collected by the optical sensor 915. For example, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is turned down. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
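By way of illustration only, the ambient-light-driven brightness adjustment described above can be sketched as follows. The mapping, thresholds, and step size are illustrative assumptions, not values from the patent.

```python
# Illustrative sketch (not the patented implementation) of adjusting display
# brightness from ambient light intensity: bright surroundings raise the
# brightness, dim surroundings lower it. Thresholds and step are assumptions.
def adjust_brightness(current, ambient_lux, low=50, high=500, step=0.1):
    """Return a new brightness level in [0.0, 1.0] given ambient light in lux."""
    if ambient_lux >= high:
        current = min(1.0, current + step)  # turn display brightness up
    elif ambient_lux <= low:
        current = max(0.0, current - step)  # turn display brightness down
    return round(current, 2)

print(adjust_brightness(0.5, 800))  # bright environment: brightness increases
print(adjust_brightness(0.5, 10))   # dim environment: brightness decreases
```

A real implementation would read the lux value from the optical sensor 915 and write the result to the display controller; here both ends are simulated.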
A proximity sensor 916, also known as a distance sensor, is typically provided on the front panel of the terminal 900. The proximity sensor 916 is used to measure the distance between the user and the front face of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 is gradually decreasing, the processor 901 controls the touch display 905 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 916 detects that the distance is gradually increasing, the processor 901 controls the touch display 905 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, for example a memory storing a computer program (instructions), executable by a processor of a computer device to perform all or part of the steps of the methods shown in the various embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An image recognition method, characterized in that the method comprises:
acquiring a three-dimensional imaging graph to be identified;
processing the three-dimensional imaging graph through a feature extraction model branch to obtain an image feature and a heat map of the three-dimensional imaging graph, wherein the image feature is used for indicating an area where a target object in the three-dimensional imaging graph is located, and the heat map is generated by the feature extraction model branch according to the image feature;
and processing the image feature and the heat map through a recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging graph, wherein the recognition result comprises an image attribute of the three-dimensional imaging graph.
2. The method according to claim 1, wherein the processing the three-dimensional imaging map through a feature extraction model branch to obtain image features of the three-dimensional imaging map and a heat map constructed according to the image features comprises:
and processing the three-dimensional imaging graph through the feature extraction model branch to obtain the image features of the three-dimensional imaging graph and heat maps respectively corresponding to at least two resolutions of the three-dimensional imaging graph.
3. The method according to claim 1 or 2, wherein the processing the three-dimensional imaging graph through a feature extraction model branch to obtain image features of the three-dimensional imaging graph and a heat map constructed according to the image features comprises:
dividing the three-dimensional imaging graph into at least two three-dimensional image blocks;
for each three-dimensional image block of the at least two three-dimensional image blocks, inputting the three-dimensional image block into the feature extraction model branch to obtain a sub-image feature corresponding to the three-dimensional image block and a sub-heat map constructed according to the sub-image feature;
combining the sub-image features corresponding to the at least two three-dimensional image blocks into the image feature;
and combining the sub-heat maps corresponding to the at least two three-dimensional image blocks into the heat map.
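The block-wise scheme of claim 3 can be sketched as follows. This is an illustrative toy sketch, not the patented implementation: the volume layout, block size, and the per-block feature (mean slice intensity, standing in for the feature extraction model branch) are all assumptions introduced for clarity.

```python
# Hypothetical sketch of claim 3: split a 3D volume into blocks, extract a
# sub-feature per block, then recombine the sub-features into one feature.
def split_into_blocks(volume, block_depth):
    """Divide a 3D volume (a list of 2D slices) into blocks along depth."""
    return [volume[i:i + block_depth] for i in range(0, len(volume), block_depth)]

def extract_features(block):
    """Hypothetical per-block feature: mean intensity of each slice."""
    return [sum(sum(row) for row in s) / (len(s) * len(s[0])) for s in block]

def recognize(volume, block_depth=2):
    blocks = split_into_blocks(volume, block_depth)
    sub_features = [extract_features(b) for b in blocks]
    # Combine the sub-image features back into one feature for the whole volume.
    return [f for feats in sub_features for f in feats]

volume = [[[1, 1], [1, 1]], [[2, 2], [2, 2]],
          [[3, 3], [3, 3]], [[4, 4], [4, 4]]]
print(recognize(volume))  # one mean value per slice, in slice order
```

The same split/recombine pattern would apply to the sub-heat maps; only the combination step differs.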
4. The method according to claim 2, wherein the at least two resolutions correspond to respective weights, and the processing the image feature and the heat map through the recognition model branch to obtain the recognition result corresponding to the three-dimensional imaging map comprises:
and performing, through the recognition model branch, point multiplication of the image feature with each of the heat maps respectively corresponding to the at least two resolutions of the three-dimensional imaging map, performing weighted summation of the point multiplication results according to the respective weights of the at least two resolutions, and obtaining the recognition result corresponding to the three-dimensional imaging map according to the weighted summation result.
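The fusion step of claim 4 can be illustrated with a minimal sketch. The feature values, heat maps, and weights below are illustrative assumptions; a real system would use learned heat maps upsampled to a common size before the point multiplication.

```python
# Hedged sketch of claim 4: element-wise (point) multiplication of the image
# feature with a heat map at each resolution, then weighted summation over
# resolutions. All numeric values here are invented for illustration.
def pointwise_mul(feature, heat):
    return [f * h for f, h in zip(feature, heat)]

def fuse(feature, heat_maps, weights):
    """heat_maps: one flattened heat map per resolution; weights: one per resolution."""
    total = [0.0] * len(feature)
    for heat, w in zip(heat_maps, weights):
        prod = pointwise_mul(feature, heat)
        total = [t + w * p for t, p in zip(total, prod)]
    return total

feature = [0.2, 0.8, 0.5]
heat_maps = [[1.0, 0.0, 0.5], [0.5, 1.0, 0.0]]  # two resolutions, same grid
weights = [0.6, 0.4]
print(fuse(feature, heat_maps, weights))  # approximately [0.16, 0.32, 0.15]
```

The weighted sum would then be fed to a classifier to produce the image attribute; that final stage is omitted here.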
5. The method of any of claims 1 to 4, further comprising:
before obtaining the three-dimensional imaging graph, obtaining a three-dimensional imaging graph sample marked with image attributes and object region attributes, wherein the object region attributes comprise coordinates and sizes of object regions in the three-dimensional imaging graph sample;
constructing a heat map sample according to the object region attribute;
performing model training according to the three-dimensional imaging graph sample and the heat map sample to obtain the feature extraction model branch and an image feature sample, wherein the image feature sample is an image feature obtained by processing the three-dimensional imaging graph sample through the feature extraction model branch;
and performing model training according to the heat map sample, the image feature sample, and the image attribute corresponding to the three-dimensional imaging graph sample to obtain the recognition model branch.
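One way to construct a heat map sample from an object region annotation, as in claim 5, is to place a peak at the region centre that decays with distance. The Gaussian form, the 2D (rather than 3D) grid, and the sigma value below are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch: build a heat-map sample from an annotated object-region
# centre (cx, cy). A Gaussian peak is one common choice; the patent does not
# specify the exact construction, so this is an assumption for illustration.
import math

def heat_map_sample(width, height, cx, cy, sigma=1.0):
    """Return a height x width grid with a Gaussian peak at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

hm = heat_map_sample(5, 5, 2, 2)
print(round(hm[2][2], 3), round(hm[0][0], 3))  # peak at centre, small at corner
```

Such a sample would serve as the supervision target when training the feature extraction model branch to emit heat maps.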
6. The method of claim 5, wherein before the performing model training according to the three-dimensional imaging graph sample and the heat map sample to obtain the feature extraction model branch and the image feature sample, the method further comprises:
regularizing the heat map sample to obtain a regularized heat map sample;
the performing model training according to the three-dimensional imaging graph sample and the heat map sample to obtain the feature extraction model branch and the image feature sample comprises:
and performing model training according to the three-dimensional imaging graph sample and the regularized heat map sample to obtain the feature extraction model branch and the image feature sample.
7. The method according to any one of claims 1 to 4, wherein the feature extraction model branch comprises a three-dimensional deep convolutional residual network and a feature pyramid network, and the processing the three-dimensional imaging graph through the feature extraction model branch to obtain the image features of the three-dimensional imaging graph and the heat map constructed according to the image features comprises:
and extracting the image features from the three-dimensional imaging graph through the three-dimensional deep convolutional residual network, and constructing the heat map from the image features through the feature pyramid network.
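The residual connection that a deep convolutional residual network relies on can be shown in miniature. The transform below is a toy stand-in for a 3D convolution plus nonlinearity, introduced purely to illustrate the skip connection; it is not the patented network.

```python
# Minimal illustration of a residual connection: the block's output is its
# input plus a learned transform, which eases training of very deep feature
# extractors. toy_transform is a hypothetical stand-in for conv + activation.
def residual_block(x, transform):
    return [xi + ti for xi, ti in zip(x, transform(x))]

def toy_transform(x):
    return [0.1 * xi for xi in x]  # stand-in for a 3D convolution layer

x = [1.0, 2.0]
print(residual_block(x, toy_transform))  # input plus a small learned residual
```

Stacking many such blocks, with the feature pyramid network reading features at several depths, yields the multi-resolution heat maps used in claim 2.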
8. The method of any one of claims 1 to 4, wherein the recognition model branch is an end-to-end model based on an attention mechanism.
9. The method according to any one of claims 1 to 4, wherein the three-dimensional imaging map is a computed tomography (CT) map or a magnetic resonance imaging (MRI) map.
10. An image recognition apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a three-dimensional imaging image to be identified;
the first processing module is used for processing the three-dimensional imaging graph through a feature extraction model branch to obtain an image feature and a heat map of the three-dimensional imaging graph, wherein the image feature is used for indicating an area where a target object in the three-dimensional imaging graph is located, and the heat map is generated by the feature extraction model branch according to the image feature;
and the second processing module is used for processing the image characteristics and the heat map through a recognition model branch to obtain a recognition result corresponding to the three-dimensional imaging map, wherein the recognition result comprises the image attribute of the three-dimensional imaging map.
11. The apparatus of claim 10,
the first processing module is configured to process the three-dimensional imaging graph through the feature extraction model branch to obtain the image features of the three-dimensional imaging graph and heat maps respectively corresponding to at least two resolutions of the three-dimensional imaging graph.
12. The apparatus according to claim 10 or 11, wherein the first processing module is configured to,
dividing the three-dimensional imaging graph into at least two three-dimensional image blocks;
for each three-dimensional image block of the at least two three-dimensional image blocks, inputting the three-dimensional image block into the feature extraction model branch to obtain a sub-image feature corresponding to the three-dimensional image block and a sub-heat map constructed according to the sub-image feature;
combining the sub-image features corresponding to the at least two three-dimensional image blocks into the image feature;
and combining the sub-heat maps corresponding to the at least two three-dimensional image blocks into the heat map.
13. The apparatus of claim 11, wherein the at least two resolutions correspond to respective weights, and the second processing module is configured to:
perform, through the recognition model branch, point multiplication of the image feature with each of the heat maps respectively corresponding to the at least two resolutions of the three-dimensional imaging map, perform weighted summation of the point multiplication results according to the respective weights of the at least two resolutions, and obtain the recognition result corresponding to the three-dimensional imaging map according to the weighted summation result.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the image recognition method according to any one of claims 1 to 9.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the image recognition method according to any one of claims 1 to 9.
CN201711212620.1A 2017-11-28 2017-11-28 Image recognition method and device Active CN107833219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711212620.1A CN107833219B (en) 2017-11-28 2017-11-28 Image recognition method and device


Publications (2)

Publication Number Publication Date
CN107833219A CN107833219A (en) 2018-03-23
CN107833219B true CN107833219B (en) 2022-12-13

Family

ID=61646107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711212620.1A Active CN107833219B (en) 2017-11-28 2017-11-28 Image recognition method and device

Country Status (1)

Country Link
CN (1) CN107833219B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035197B (en) * 2018-05-31 2021-09-28 东南大学 CT radiography image kidney tumor segmentation method and system based on three-dimensional convolution neural network
CN108830192A (en) * 2018-05-31 2018-11-16 珠海亿智电子科技有限公司 Vehicle and detection method of license plate under vehicle environment based on deep learning
CN108921195A (en) * 2018-05-31 2018-11-30 沈阳东软医疗系统有限公司 A kind of Lung neoplasm image-recognizing method neural network based and device
CN109102502B (en) * 2018-08-03 2021-07-23 西北工业大学 Pulmonary nodule detection method based on three-dimensional convolutional neural network
CN109529358B (en) * 2018-11-14 2021-12-07 腾讯科技(深圳)有限公司 Feature integration method and device and electronic device
CN109841272A (en) * 2019-02-18 2019-06-04 广州明医医疗科技有限公司 Realtime graphic identification display equipment
CN110070113B (en) * 2019-03-29 2021-03-30 广州思德医疗科技有限公司 Training method and device for training set
CN110399907B (en) * 2019-07-03 2021-12-03 杭州深睿博联科技有限公司 Chest cavity disease detection method and device based on attention induction and storage medium
CN110674681A (en) * 2019-08-13 2020-01-10 平安科技(深圳)有限公司 Identity verification method and device based on attention mechanism
CN111274305B (en) * 2020-01-15 2023-03-31 深圳平安医疗健康科技服务有限公司 Three-dimensional picture generation method and device, computer equipment and storage medium
CN114529795A (en) * 2020-11-04 2022-05-24 复旦大学 Clothing key point detection method based on optimized heat map supervision mechanism
CN115328381B (en) * 2022-08-05 2023-09-05 深圳乐播科技有限公司 Page pushing method, device and server

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060018524A1 (en) * 2004-07-15 2006-01-26 Uc Tech Computerized scheme for distinction between benign and malignant nodules in thoracic low-dose CT
CN102722735A (en) * 2012-05-24 2012-10-10 西南交通大学 Endoscopic image lesion detection method based on fusion of global and local features
US20150063667A1 (en) * 2013-08-29 2015-03-05 General Electric Company Methods and systems for evaluating bone lesions
CN105160361A (en) * 2015-09-30 2015-12-16 东软集团股份有限公司 Image identification method and apparatus
CN105760874A (en) * 2016-03-08 2016-07-13 中国科学院苏州生物医学工程技术研究所 CT image processing system and method for pneumoconiosis
CN106548145A (en) * 2016-10-31 2017-03-29 北京小米移动软件有限公司 Image-recognizing method and device
CN106909778A (en) * 2017-02-09 2017-06-30 北京市计算中心 A kind of Multimodal medical image recognition methods and device based on deep learning


Also Published As

Publication number Publication date
CN107833219A (en) 2018-03-23

Similar Documents

Publication Publication Date Title
CN107833219B (en) Image recognition method and device
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
CN110458127B (en) Image processing method, device, equipment and system
CN109829456B (en) Image identification method and device and terminal
CN109410220B (en) Image segmentation method and device, computer equipment and storage medium
JP7154678B2 (en) Target position acquisition method, device, computer equipment and computer program
WO2020010979A1 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN109086709A (en) Feature Selection Model training method, device and storage medium
CN109815150B (en) Application testing method and device, electronic equipment and storage medium
CN108682036A (en) Pose determines method, apparatus and storage medium
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN110570460B (en) Target tracking method, device, computer equipment and computer readable storage medium
CN109360222B (en) Image segmentation method, device and storage medium
CN112749613A (en) Video data processing method and device, computer equipment and storage medium
CN108776822B (en) Target area detection method, device, terminal and storage medium
CN109886208B (en) Object detection method and device, computer equipment and storage medium
CN111127509A (en) Target tracking method, device and computer readable storage medium
CN108288032A (en) Motion characteristic acquisition methods, device and storage medium
CN111598896A (en) Image detection method, device, equipment and storage medium
CN113627413A (en) Data labeling method, image comparison method and device
CN112508959A (en) Video object segmentation method and device, electronic equipment and storage medium
CN113298040B (en) Key point detection method, device, electronic device and computer readable storage medium
CN112804481B (en) Method and device for determining position of monitoring point and computer storage medium
CN113822791A (en) Image registration method, registration network training method, device, equipment and medium
CN111080630A (en) Fundus image detection apparatus, method, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant