
CN113516140A - Image processing method, model training method, system and equipment - Google Patents


Info

Publication number: CN113516140A
Application number: CN202010378398.8A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 许敏丰, 迟颖, 郭恒
Current Assignee: Alibaba Group Holding Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Alibaba Group Holding Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Prior art keywords: sample, image, images, network layer, features
Application filed by Alibaba Group Holding Ltd; priority to CN202010378398.8A; publication of CN113516140A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 ICT specially adapted for the handling or processing of medical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides an image processing method, a model training method, a system, and a device. The image processing method comprises the following steps: acquiring a first image of a target object and a plurality of second images acquired in a plurality of different states of the target object; extracting spatial features of the target object from the first image using at least one first network layer of a neural-network-based classification model; extracting change features of the target object between different states, using at least one second network layer of the classification model, from an image sequence formed by arranging the plurality of second images; and integrating the spatial features and the change features to classify the target object. Compared with a scheme that classifies based on spatial features alone, the technical solution provided by this embodiment of the application can improve classification accuracy.

Description

Image processing method, model training method, system and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, a system, and an apparatus for image processing and model training.
Background
Artificial neural network: a mathematical model that processes information using structures similar to the brain's synaptic connections. In engineering and academia it is often referred to simply as a "neural network" or "neural-like network".
The deep neural network is a type of artificial neural network and has become a research hotspot in the field of image recognition. With the introduction of deep neural networks, image classification technology has improved greatly. To classify an image, the image is input into a pre-trained neural-network-based classification model, the required features are extracted by the classification model, and classification is then performed. However, the classification accuracy of current image classification methods based on deep neural networks is low.
Disclosure of Invention
In view of the above, the present application is directed to providing image processing and model training methods, systems, and devices that address the above problems, or at least partially address them.
Thus, in one embodiment of the present application, an image processing method is provided. The method comprises the following steps:
acquiring a first image of a target object and a plurality of second images acquired under a plurality of different states of the target object;
extracting spatial features of the target object from the first image by using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the target object among different states from an image sequence formed by arranging the plurality of second images by utilizing at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target object.
In another embodiment of the present application, a model training method is provided. The method comprises the following steps:
acquiring a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object;
extracting spatial features of the sample object from the first sample image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample object among different states from a sample image sequence formed by arranging the plurality of second sample images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample objects to obtain actual classification results;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample object.
In another embodiment of the present application, a neural network system is provided. The neural network system includes: at least one first network layer, at least one second network layer, and at least one third network layer;
the at least one first network layer is used for extracting the spatial characteristics of a target object from a first image of the target object;
the at least one second network layer is used for extracting variation characteristics of the target object among different states from an image sequence formed by arranging a plurality of second images; the plurality of second images are acquired at a plurality of different states of the target object;
the at least one third network layer is used for integrating the spatial characteristics and the variation characteristics to classify the target object.
In another embodiment of the present application, an image processing apparatus is provided. The image processing apparatus includes:
a first acquisition module, configured to acquire a first medical image of a target tissue and a plurality of second medical images acquired in a plurality of different phases corresponding to the target tissue;
a first extraction module, configured to extract spatial features of the target tissue from the first medical image by using at least one first network layer in a neural network-based classification model;
the second extraction module is used for extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by utilizing at least one second network layer in the classification model;
and the first classification module is used for integrating the spatial characteristics and the variation characteristics to classify the target tissues.
In another embodiment of the present application, a model training method is provided. The method comprises the following steps:
acquiring a first sample medical image of a sample tissue and a plurality of second sample medical images acquired under a plurality of different phases corresponding to the sample tissue;
extracting spatial features of the sample tissue from the first sample medical image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample tissues at different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample tissues to obtain an actual classification result;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample tissue.
In another embodiment of the present application, a neural network system is provided. The neural network system includes: at least one first network layer, at least one second network layer, and at least one third network layer;
the at least one first network layer is used for extracting the spatial characteristics of the target tissue from the first medical image of the target tissue;
the at least one second network layer is used for extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging a plurality of second medical images; the plurality of second medical images are acquired at a plurality of different phases corresponding to the target tissue;
the at least one third network layer is used for integrating the spatial features and the variation features to classify the target tissues.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor is coupled with the memory and configured to execute the program stored in the memory, so as to:
acquiring a first image of a target object and a plurality of second images acquired under a plurality of different states of the target object;
extracting spatial features of the target object from the first image by using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the target object among different states from an image sequence formed by arranging the plurality of second images by utilizing at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target object.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor is coupled with the memory and configured to execute the program stored in the memory, so as to:
acquiring a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object;
extracting spatial features of the sample object from the first sample image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample object among different states from a sample image sequence formed by arranging the plurality of second sample images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample objects to obtain actual classification results;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample object.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor is coupled with the memory and configured to execute the program stored in the memory, so as to:
acquiring a first medical image of a target tissue and a plurality of second medical images acquired under a plurality of different phases corresponding to the target tissue;
extracting spatial features of the target tissue from the first medical image using at least one first network layer in a neural network based classification model;
extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by using at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target tissues.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor is coupled with the memory and configured to execute the program stored in the memory, so as to:
acquiring a first sample medical image of a sample tissue and a plurality of second sample medical images acquired under a plurality of different phases corresponding to the sample tissue;
extracting spatial features of the sample tissue from the first sample medical image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample tissues at different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample tissues to obtain an actual classification result;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample tissue.
In the technical solution provided by this embodiment of the application, the designed neural-network-based classification model can acquire not only the spatial features of the target object to be classified but also the change features of the target object between different states, and it classifies the target object by combining the two. Compared with a scheme that classifies based on spatial features alone, the technical solution provided by this embodiment of the application can improve classification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1a is an exemplary diagram of an image processing method according to an embodiment of the present application;
fig. 1b is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a model training method according to another embodiment of the present application;
FIG. 3 is a schematic flowchart of a model training method according to another embodiment of the present application;
FIG. 4 is a block diagram of an image processing/model training apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device according to another embodiment of the present application.
Detailed Description
Generally, a classification model extracts features of a target object in the spatial dimension (i.e., spatial features) from one or more images of the target object and classifies the target object based on those spatial features. Taking the biometric field as an example, images of a user, such as face images, are typically captured; facial spatial features are extracted from the captured face images, and classification is performed based on those features (two classes, namely target user or non-target user), thereby realizing user identity authentication.
While researching the technical solution provided by the embodiments of the present application, the inventors found that the same object takes different shapes in different states, and that the shapes of different objects change between states according to different rules. If information about these change rules is incorporated into the classification process, classification accuracy can certainly be improved. For example, in the field of biometric recognition, a user can be instructed to perform a specified action (such as blinking or smiling), and multiple frames of face images can be captured while the user performs the action, each frame corresponding to a time point, i.e., to a state. The change features of the face's shape during the action are extracted from the multi-frame face images, and the user is classified by combining these change features with the facial spatial features, which helps improve classification accuracy.
Therefore, the embodiment of the present application provides a scheme for classifying a target object by simultaneously combining spatial features of the target object and variation features of the target object between different states, so as to improve classification accuracy.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Further, some flows described in the specification, claims, and figures of the present application include a number of operations that appear in a particular order, but those operations may be performed out of the order in which they appear herein or in parallel. Sequence numbers such as 101 and 102 merely distinguish different operations and do not by themselves represent any execution order. The flows may also include more or fewer operations, and those operations may be performed sequentially or in parallel. Note that descriptions such as "first" and "second" herein distinguish different messages, devices, modules, and the like; they do not represent a sequential order, nor do they require that the "first" and "second" items be of different types.
Fig. 1b shows a schematic flowchart of an image processing method according to an embodiment of the present application. The method may be executed by a client or by a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the terminal's operating system; this is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a tablet computer, or a smart speaker. The server may be an ordinary server, a cloud, or a virtual server, which is not specifically limited in the embodiments of the application. As shown in fig. 1b, the method comprises:
101. Acquire a first image of a target object and a plurality of second images acquired in a plurality of different states of the target object.
102. Extract spatial features of the target object from the first image using at least one first network layer in a neural-network-based classification model.
103. Extract change features of the target object between different states from an image sequence formed by arranging the plurality of second images, using at least one second network layer in the classification model.
104. Integrate the spatial features and the change features to classify the target object.
In 101 above, the target object refers to the object to be classified. In a biometric scene, the target object may be a human face; in a medical image processing scenario, it may be human tissue. The first image and the plurality of second images all contain an image of the target object.
Taking the target object being a human face as an example, the plurality of second images may be acquired at a plurality of different time points while the user performs a designated action, the different time points corresponding to a plurality of different states. The first image may be an image acquired while the user remains still, or may be any one of the plurality of second images.
Taking the target object being human tissue as an example, the first image and the second images are medical images, such as CT (Computed Tomography) images or MRI (Magnetic Resonance Imaging) images. Under the action of a medical agent (such as a contrast agent), the image gray scale of human tissue changes over time, and different types of human tissue follow different gray-scale change rules. The plurality of second images may be acquired in a plurality of different phases corresponding to the human tissue, where the different phases may include at least two of a pre-contrast (plain-scan) phase, an early arterial phase, a late arterial phase, a portal venous phase, and a delayed phase. The plurality of different phases corresponds to a plurality of different states. The first image may be acquired with the tissue in a state without the medical agent, or may be any one of the plurality of second images. That is, in an example, the target object may be a target tissue, the first image may be a first medical image, the second images may be second medical images, the plurality of different states may be a plurality of different phases, and the image sequence may be a medical image sequence.
In 102, a classification model based on a neural network is constructed, where the classification model includes at least one first network layer and at least one second network layer. The classification model is trained in advance, based on a first sample image of a sample object, a plurality of second sample images acquired in a plurality of different states of the sample object, and the expected classification result corresponding to the sample object. The training process of the classification model is described in detail in the following embodiments. The neural network may be a deep neural network.
The spatial features of the target object are extracted from the first image using at least one first network layer in the classification model. A spatial feature is a feature of the target object in the spatial dimension.
In one example, the at least one first network layer may form a Convolutional Neural Network (CNN). The spatial characteristics of the target object can be effectively extracted by utilizing the convolutional neural network, and the classification accuracy is improved.
The convolutional neural network may comprise a plurality of feature extraction stages. In one example, the first feature output by the last of the feature extraction stages may be taken directly as the spatial feature. Typically, however, the first feature output by the last stage carries high-dimensional semantic information but has lost a large amount of detail information. Therefore, in another embodiment, the first feature output by the last stage and a second feature output by the penultimate stage can be fused through an attention mechanism to obtain a first fused feature; a third feature output by the antepenultimate stage can be fused with the second feature through the attention mechanism to obtain a second fused feature; and the first fused feature, the second fused feature, and the second feature can then be fused to obtain the spatial feature. In this way, the resulting spatial feature carries not only high-dimensional semantic information but also effective detail information, improving classification accuracy.
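The patent text does not give a concrete implementation of this attention-based multi-stage fusion. The following PyTorch sketch shows one way it could be realized; the module name AttentionFuse, the channel sizes, the sigmoid channel gate, and global average pooling for the final fusion are all illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFuse(nn.Module):
    """Hypothetical attention fusion of a deeper (semantic) stage output with a
    shallower (detail-rich) one: the deep feature produces channel-attention
    weights that gate the shallow feature, and a projected, upsampled copy of
    the deep feature is added back."""

    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(deep_ch, shallow_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(deep_ch, shallow_ch, kernel_size=1)

    def forward(self, deep, shallow):
        weights = self.gate(deep)                      # (B, shallow_ch, 1, 1)
        deep_up = F.interpolate(self.proj(deep), size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        return shallow * weights + deep_up

# Illustrative outputs of the last three feature extraction stages of a CNN.
f_last = torch.randn(2, 512, 7, 7)    # last stage: high-level semantics
f_pen = torch.randn(2, 256, 14, 14)   # penultimate stage
f_ante = torch.randn(2, 128, 28, 28)  # antepenultimate stage: more detail

first_fused = AttentionFuse(512, 256)(f_last, f_pen)    # first fused feature
second_fused = AttentionFuse(256, 128)(f_pen, f_ante)   # second fused feature

# Final fusion into the spatial feature: global-average-pool each map and
# concatenate (the patent leaves the exact final fusion operator open).
gap = lambda t: t.mean(dim=(2, 3))
spatial_feature = torch.cat(
    [gap(first_fused), gap(second_fused), gap(f_pen)], dim=1)
```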
In 103, the plurality of second images may be arranged according to a preset order of the plurality of different states to obtain an image sequence, and the change features of the target object between different states are extracted from the image sequence using at least one second network layer in the classification model. The change features represent the rule by which the target object changes across the plurality of different states arranged in the preset order.
It should be noted that the preset order of the plurality of different states can be set according to actual needs, and different preset orders yield different change features. Therefore, in practice, the preset order should be determined before the classification model is trained, and the same preset order is then used to form the sample image sequence during training and the image sequence during application.
In an implementation, the method may further include:
105. Acquire the order in which the plurality of different states occur.
106. Arrange the plurality of second images according to the order of occurrence of the different states to obtain the image sequence.
Generally, the occurrence times of the different states of the target object follow a fixed order. Sorting by occurrence time to obtain the image sequence makes the finally extracted change features better match the change rules of the real situation.
In one example, the at least one second network layer may form a Recurrent Neural Network (RNN). A recurrent neural network has memory and can extract the change features of the target object between different states from the image sequence well, improving classification accuracy. Specifically, the at least one second network layer may constitute a Long Short-Term Memory (LSTM) network, which is a type of recurrent neural network.
In 104, feature fusion can be performed on the spatial features and the change features to obtain a fusion feature, and the target object is classified according to the fusion feature.
In the technical scheme provided by the embodiment of the application, the designed classification model based on the neural network not only can acquire the spatial characteristics of the target object to be classified, but also can acquire the change characteristics of the target object in different states. And classifying the target object by combining the spatial characteristics of the target object and the change characteristics of the target object between different states. Compared with a scheme of classifying based on spatial features, the technical scheme provided by the embodiment of the application can improve the classification accuracy.
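To make the pipeline of steps 101-104 concrete, the sketch below wires the three kinds of network layers together in PyTorch. It is a minimal reading of the scheme under stated assumptions, not the patented implementation: the ResNet-18 backbone, the reuse of the same CNN as the per-frame encoder feeding the LSTM, the feature dimensions, and plain concatenation as the fusion step are all choices made here for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatioTemporalClassifier(nn.Module):
    """Minimal sketch of the classification model of steps 101-104
    (assumed architecture, not the patented one)."""

    def __init__(self, num_classes: int = 2, feat_dim: int = 512,
                 hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        # "At least one first network layer": a CNN mapping an image to a vector.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        # "At least one second network layer": an LSTM over per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        # "At least one third network layer": fuse both features and classify.
        self.head = nn.Linear(feat_dim + hidden, num_classes)

    def forward(self, first_image, image_sequence):
        # first_image: (B, 3, H, W); image_sequence: (B, T, 3, H, W)
        spatial = self.cnn(first_image).flatten(1)                # (B, 512)
        b, t = image_sequence.shape[:2]
        frames = image_sequence.flatten(0, 1)                     # (B*T, 3, H, W)
        frame_feats = self.cnn(frames).flatten(1).view(b, t, -1)  # (B, T, 512)
        _, (h_n, _) = self.lstm(frame_feats)
        change = h_n[-1]                                          # (B, 256)
        fused = torch.cat([spatial, change], dim=1)               # simple fusion
        return self.head(fused)

model = SpatioTemporalClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 5, 3, 224, 224))
```

Bilinear pooling or the attention fusion discussed above could replace the concatenation feeding the classification head without changing the rest of the pipeline.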
In an implementation, the at least one second network layer forms an LSTM network. In 103, "extracting, by using at least one second network layer in the classification model, change features of the target object between different states from an image sequence formed by arranging the plurality of second images" is specifically:
1031. Input the image sequence into the LSTM network and extract the temporal features of the image sequence as the change features of the target object between different states.
For the specific process by which the LSTM network extracts the temporal features of the image sequence, reference may be made to the prior art; it is not described in detail in this application.
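For orientation only, here is a minimal sketch of step 1031, under the assumption that each image in the sequence has already been encoded into a feature vector (for example by the CNN sketched above); the dimensions are illustrative.

```python
import torch
import torch.nn as nn

seq_feats = torch.randn(4, 6, 512)  # (batch, number of states T, per-image feature)
lstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
outputs, (h_n, c_n) = lstm(seq_feats)

# The final hidden state summarizes how the object evolved across the ordered
# states; it can serve as the change feature of step 1031.
change_feature = h_n[-1]            # shape (4, 256)
```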
In practical applications, the step 104 can be implemented by the classification model to improve the classification accuracy. Specifically, the "integrating the spatial features and the variation features to classify the target object" in the above 104 includes:
1041. Fuse the spatial features and the change features using at least one third network layer in the classification model to obtain a fusion feature, and classify the target object according to the fusion feature.
In an implementation, the spatial feature and the variation feature may be feature-spliced to obtain a fusion feature.
For example: the spatial features are m-dimensional vectors, the variation features are n-dimensional vectors, and the fusion features obtained by splicing are (m + n) -dimensional vectors.
In another implementation, bilinear pooling may be performed on the spatial features and the variation features to obtain the fusion features.
The bilinear pooling process may be a Multi-modal compact bilinear pooling (Multi-modal compact bilinear pooling) process.
By adopting bilinear pooling, the spatial characteristics and the variation characteristics can be better fused so as to improve the classification accuracy.
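For illustration, the two fusion options side by side. The patent names multi-modal compact bilinear pooling; the sketch below uses the plain (non-compact) bilinear pooling that it approximates, since the compact variant's count-sketch projection is beyond a short example. The feature dimensions m and n are assumed values.

```python
import torch

spatial = torch.randn(2, 128)  # m-dimensional spatial feature (m assumed = 128)
change = torch.randn(2, 64)    # n-dimensional change feature (n assumed = 64)

# Option 1: feature splicing (concatenation) -> (m + n)-dimensional feature.
fused_cat = torch.cat([spatial, change], dim=1)                          # (2, 192)

# Option 2: plain bilinear pooling -> flattened outer product, m * n dims.
# Multi-modal *compact* bilinear pooling would approximate this outer product
# with count-sketch projections to keep the dimensionality manageable.
fused_bilinear = torch.einsum("bm,bn->bmn", spatial, change).flatten(1)  # (2, 8192)
```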
In order to further improve classification accuracy, the number of first images may be plural. In this way, spatial features containing more information can be extracted from the plurality of first images, improving the representativeness of the extracted spatial features and thus the classification accuracy. Accordingly, the method may further include:
107. Acquire a plurality of first images using a plurality of different imaging parameters in the same state of the target object.
With the target object in the same state, using a plurality of different imaging parameters yields a plurality of first images that characterize the target object from multiple aspects.
The same state may be one of the above-mentioned multiple different states or another state different from the above-mentioned multiple different states, and this is not specifically limited in this embodiment of the present application.
Taking MRI images of human tissue as an example, by setting the imaging parameters of the MRI scanner, a plurality of first images can be obtained by scanning under a plurality of different imaging parameters, such as a T1 in-phase image, a T1 opposed-phase image, a T2 image, a DWI (diffusion-weighted imaging) image, and an ADC (apparent diffusion coefficient) image.
It should be considered that the target object is likely to move while the first image and the plurality of second images are being acquired, so that the positions of the target object images in those images are not aligned with one another. Spatial misalignment of the target object across the images interferes with the neural network model's ability to extract features correctly and therefore also affects the classification result. To avoid the negative effects of such misalignment, the method may further include:
108. Perform position alignment processing on the target object in the first image and the plurality of second images according to the position information of the target object in those images, so that the geometric positions of the target object in the first image and the plurality of second images are aligned with one another.
The above step 108 may be performed before the first image and the sequence of images are input to the classification model.
The position information of the target object in the first image and the plurality of second images specifically refers to position information of the target object image in the first image and the plurality of second images. The geometric positions of the target object in the first image and the plurality of second images are aligned with each other, that is, the coordinates of the geometric positions of the target object image in the first image and the plurality of second images are the same.
In one implementation, the location information may be determined using object location techniques. That is, a target positioning technology is adopted to respectively position the target object in the first image and the plurality of second images to obtain the position information of the target object in the first image and the plurality of second images.
Specifically, each of the first image and the plurality of second images may be sequentially input into a neural network-based target location model, so that the target location model locates the position information of the target object image in each image. The specific implementation of the object localization model can be found in the prior art and will not be described in detail here.
In practical applications, a multi-modality image registration technique may be adopted to perform a position alignment process on the target object in the first image and the second images, so as to align the geometric positions of the target object in the first image and the second images with each other. The specific working principle of the multi-modality image registration technology can be referred to in the prior art, and is not detailed herein.
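The patent points to existing target localization and multi-modal registration techniques rather than a specific algorithm. As a deliberately simplified stand-in, the sketch below performs translation-only alignment: each image is shifted so that the localized object center coincides with a reference center. Real multi-modal registration would additionally handle rotation and deformation; the localization step that produces the centers (e.g. a neural-network localization model) is assumed given.

```python
import numpy as np
from scipy.ndimage import shift

def align_to_reference(images, centers, ref_center):
    """Shift each 2-D image so the object center given in `centers` (row, col)
    lands on `ref_center`. Translation-only: a simplified stand-in for full
    multi-modal image registration."""
    aligned = []
    for img, (r, c) in zip(images, centers):
        offset = (ref_center[0] - r, ref_center[1] - c)
        aligned.append(shift(img, offset, order=1, mode="nearest"))
    return aligned

# Toy usage: three images whose object centers were localized beforehand.
imgs = [np.random.rand(64, 64) for _ in range(3)]
centers = [(30.0, 34.0), (32.5, 31.0), (29.0, 33.0)]
aligned = align_to_reference(imgs, centers, ref_center=(32.0, 32.0))
```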
In addition, before the first image and the image sequence are input into the classification model, normalization may be performed on the first image and the second images to eliminate the influence of the acquisition environment on image gray values.
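The patent does not specify the normalization scheme; z-score normalization, sketched below, is one common choice, and min-max scaling would be an equally plausible reading.

```python
import numpy as np

def normalize(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score normalization: removes global intensity offsets and scale
    differences introduced by the acquisition environment (assumed scheme)."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)
```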
Fig. 2 shows a schematic flowchart of a model training method provided in an embodiment of the present application. The method may be executed by a client or by a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the terminal's operating system; this is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a tablet computer, or a smart speaker. The server may be an ordinary server, a cloud, or a virtual server, which is not specifically limited in the embodiments of the application. As shown in fig. 2, the method includes:
201. Acquire a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object.
202. Extract spatial features of the sample object from the first sample image using at least one first network layer in a neural-network-based classification model.
203. Extract change features of the sample object between different states from a sample image sequence formed by arranging the plurality of second sample images, using at least one second network layer in the classification model.
204. Integrate the spatial features and the change features and classify the sample object to obtain an actual classification result.
205. Optimize the classification model according to the actual classification result and the expected classification result corresponding to the sample object.
Wherein the classification model is used for classifying the target object.
In the above 201, in the biometric scene, the sample object may be a human face. In a medical image processing scenario, the sample object may be human tissue. The first sample image and the plurality of second sample images each include a sample object image.
Taking the sample object being a human face as an example, the plurality of second sample images may be acquired at a plurality of different time points while the user performs a designated action, the different time points corresponding to a plurality of different states. The first sample image may be an image acquired while the user remains still, or may be any one of the plurality of second sample images.
Taking the sample object being human tissue as an example, the first sample image and the second sample images are medical images, such as CT (Computed Tomography) images or MRI (Magnetic Resonance Imaging) images. Under the action of a medical agent (such as a contrast agent), the images of human tissue change over time, for example in gray scale, and different types of human tissue follow different gray-scale change rules. The plurality of second sample images may be acquired in a plurality of different phases corresponding to the sample human tissue, where the different phases may include at least two of a pre-contrast (plain-scan) phase, an early arterial phase, a late arterial phase, a portal venous phase, and a delayed phase. The plurality of different phases corresponds to a plurality of different states. The first sample image may be acquired with the human tissue in a state without the medical agent, or may be any one of the plurality of second sample images. That is, in an example, the sample object may be a sample tissue, the first sample image a first sample medical image, the second sample images second sample medical images, the plurality of different states a plurality of different phases, and the sample image sequence a sample medical image sequence.
In 202, a classification model based on a neural network is constructed, where the classification model includes at least one first network layer and at least one second network layer. The spatial features of the sample object are extracted from the first sample image using the at least one first network layer. A spatial feature is a feature of the sample object in the spatial dimension.
In one example, the at least one first network layer may form a Convolutional Neural Network (CNN). The spatial features of the sample object can be effectively extracted using the convolutional neural network, improving classification accuracy.
In 203, the plurality of second sample images may be arranged according to a preset order of the plurality of different states to obtain a sample image sequence, and the change features of the sample object between different states are extracted from the sample image sequence using at least one second network layer in the classification model. The change features represent the rule by which the sample object changes across the plurality of different states arranged in the preset order.
It should be noted that the preset order of the plurality of different states can be set according to actual needs, and different preset orders yield different change features. Therefore, in practice, the preset order should be determined before the classification model is trained, and the same preset order is then used to form the sample image sequence during training and the image sequence during application.
In an implementation, the method may further include:
206. Acquire the order in which the plurality of different states occur.
207. Arrange the plurality of second sample images according to the order of occurrence of the different states to obtain the sample image sequence.
Therefore, the extracted change characteristics are more consistent with the change rule under the real condition.
In one example, the at least one second network layer may form a Recurrent Neural Network (RNN). A recurrent neural network has memory and can extract the change features of the sample object between different states from the sample image sequence well, improving classification accuracy. Specifically, the at least one second network layer may constitute a Long Short-Term Memory (LSTM) network, which is a type of recurrent neural network.
In 204, feature fusion can be performed on the spatial features and the change features to obtain a sample fusion feature, and the sample object is classified according to the sample fusion feature.
In 205, the classification model may be optimized according to the difference between the actual classification result and the expected classification result corresponding to the sample object. Optimizing the classification model means optimizing each network parameter in the classification network.
The initial value of each network parameter in the neural network model may be a random value.
Optimizing the parameters of the classification model according to the actual classification result and the corresponding expected classification result may be implemented using a loss function, which estimates the degree of inconsistency (i.e., the difference above) between the model's actual result and the expected result and is usually a non-negative real-valued function. Optionally, the loss function may be a cross-entropy loss function. Each time the classification model is optimized, an adjustment coefficient can be obtained for each model parameter, and each parameter is numerically adjusted using its adjustment coefficient to obtain the adjusted parameters of the classification model. The method of parameter optimization using a loss function is the same as in the prior art and is not described in detail here.
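Continuing the SpatioTemporalClassifier sketch from the image processing section above, one optimization step of steps 201-205 could look as follows. Only the (optional) cross-entropy loss is named in the text; the Adam optimizer, learning rate, and batch shapes are assumptions.

```python
import torch
import torch.nn as nn

# Assumes the SpatioTemporalClassifier sketched earlier in this document.
model = SpatioTemporalClassifier(num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer
criterion = nn.CrossEntropyLoss()

first_imgs = torch.randn(8, 3, 224, 224)    # first sample images
sequences = torch.randn(8, 5, 3, 224, 224)  # arranged second-sample images
labels = torch.randint(0, 2, (8,))          # expected classification results

logits = model(first_imgs, sequences)       # actual classification results
loss = criterion(logits, labels)            # degree of inconsistency
optimizer.zero_grad()
loss.backward()                             # gradients drive the per-parameter
optimizer.step()                            # adjustment described above
```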
The classification accuracy of the classification model obtained by training with the model training method provided by the embodiment of the application is high. The trained classification model can not only obtain the spatial characteristics of the target object to be classified, but also obtain the variation characteristics of the target object among different states. And classifying the target object by combining the spatial characteristics of the target object and the change characteristics of the target object between different states. Compared with a scheme of classifying based on spatial features, the technical scheme provided by the embodiment of the application can improve the classification accuracy.
In an implementation, the at least one second network layer forms an LSTM network. In 203, "extracting, by using at least one second network layer in the classification model, change features of the sample object between different states from a sample image sequence formed by arranging the plurality of second sample images" is specifically:
2031. Input the sample image sequence into the LSTM network and extract the temporal features of the sample image sequence as the change features of the sample object between different states.
For the specific process by which the LSTM network extracts the temporal features of the sample image sequence, reference may be made to the prior art; it is not described in detail in this application.
In practical applications, the step 204 can be implemented by the classification model. Specifically, in 204, the "integrating the spatial features and the variation features to classify the sample object to obtain an actual classification result" includes:
2041. Fuse the spatial features and the change features using at least one third network layer in the classification model to obtain a sample fusion feature, and classify the sample object according to the sample fusion feature.
In an implementation, the spatial feature and the variation feature may be feature-spliced to obtain a sample fusion feature.
For example: the spatial features are m-dimensional vectors, the variation features are n-dimensional vectors, and the sample fusion features obtained by splicing are (m + n) -dimensional vectors.
In another implementation, bilinear pooling may be performed on the spatial features and the variation features to obtain the sample fusion features.
The bilinear pooling process may be a multi-modal compact bilinear pooling process.
By adopting bilinear pooling, the spatial characteristics and the variation characteristics can be better fused so as to improve the classification accuracy.
In order to further improve the classification accuracy, the number of the first sample images may be plural. In this way, spatial features containing more information can be extracted based on the plurality of first sample images, and the representativeness of the extracted spatial features can be improved, so that the classification accuracy is improved. Therefore, the method may further include:
208. Acquire a plurality of first sample images using a plurality of different imaging parameters in the same state of the sample object.
With the sample object in the same state, using a plurality of different imaging parameters yields a plurality of first sample images that characterize the sample object from multiple aspects.
The same state may be one of the above-mentioned multiple different states or another state different from the above-mentioned multiple different states, and this is not specifically limited in this embodiment of the present application.
Taking MRI images of human tissue as an example, by setting the imaging parameters of the MRI scanner, a plurality of first sample images can be obtained by scanning under a plurality of different imaging parameters, such as a T1 in-phase image, a T1 opposed-phase image, a T2 image, a DWI (diffusion-weighted imaging) image, and an ADC (apparent diffusion coefficient) image.
It should be considered that the sample object is likely to move while the first sample image and the plurality of second sample images are being acquired, so that the positions of the sample object images in those images are not aligned with one another. Spatial misalignment of the sample object across the images interferes with the neural network model's ability to extract features correctly and therefore also affects the classification result. To avoid the negative effects of such misalignment, the method may further include:
209. Perform position alignment processing on the sample object in the first sample image and the plurality of second sample images according to the position information of the sample object in those images, so that the geometric positions of the sample object in the first sample image and the plurality of second sample images are aligned with one another.
The above step 209 may be performed before the first sample image and the sample image sequence are input into the classification model.
The position information of the sample object in the first sample image and the plurality of second sample images specifically refers to the position information of the sample object image in the first sample image and the plurality of second sample images. The geometric positions of the sample object in the first sample image and the second sample images are aligned with each other, that is, the coordinates of the geometric positions of the sample object image in the first sample image and the second sample images are the same.
In one implementation, the location information may be determined using object location techniques. That is, a target positioning technique is adopted to respectively position the sample objects in the first sample image and the plurality of second sample images, so as to obtain the position information of the sample objects in the first sample image and the plurality of second sample images.
Specifically, each sample image in the first sample image and the plurality of second sample images may be sequentially input into a neural network-based target positioning model, so that the target positioning model positions the position information of the sample object image in each sample image.
In practical applications, a multi-modality image registration technique may be adopted to perform a position alignment process on the sample object in the first sample image and the second sample images, so as to align the geometric positions of the sample object in the first sample image and the second sample images with each other.
In addition, before the first sample image and the sample image sequence are input into the classification model, normalization may be performed on the first sample image and the second sample images to eliminate the influence of the acquisition environment on image gray values.
In one example, the human tissue may be nodular tissue on an organ such as the liver or lung. In enhanced CT or enhanced MRI images, the gray-scale changes of different types of typical nodules differ across phase images and are strongly correlated with time. Existing schemes, however, do not consider the temporal correlation between features when extracting features from the different phase images of a nodule, which is one reason their classification accuracy is low. This solution takes the change features of nodules across different phases into account and can therefore effectively improve the accuracy of typical-nodule classification. In addition, for nodules smaller than 2 cm, the vascular blood supply is often not very clear, so such nodules lack obvious typical characteristics and the available time-domain information is limited; for this reason, the solution also takes the spatial features of the nodules into account to enhance the accuracy of classifying atypical nodules.
The embodiment of the application also provides a neural network system. The system comprises: at least one first network layer, at least one second network layer, and at least one third network layer. Wherein,
the at least one first network layer is configured to extract a spatial feature of a target object from a first image of the target object.
And the at least one second network layer is used for extracting the change characteristics of the target object between different states from an image sequence formed by arranging a plurality of second images.
Wherein the plurality of second images are acquired at a plurality of different states of the target object.
The at least one third network layer is used for integrating the spatial characteristics and the variation characteristics to classify the target object.
For specific structures and processing procedures of the at least one first network layer, the at least one second network layer, and the at least one third network layer, reference may be made to corresponding contents in the foregoing embodiments, and details are not described herein again.
The neural network system provided by the application can acquire not only the spatial characteristics of the target object to be classified, but also the variation characteristics of the target object among different states. And classifying the target object by combining the spatial characteristics of the target object and the change characteristics of the target object between different states. Compared with a scheme of classifying based on spatial features, the technical scheme provided by the embodiment of the application can improve the classification accuracy.
Optionally, the system may further include: and a pre-processing module.
The preprocessing module can be used for acquiring the occurrence time sequence of the plurality of different states; and arranging the plurality of second images according to the occurrence time sequence of the plurality of different states to obtain the image sequence.
Optionally, the preprocessing module may be configured to perform a position alignment process on a target object in the first image and the plurality of second images according to position information of the target object in the first image and the plurality of second images, so that geometric positions of the target object in the first image and the plurality of second images are aligned with each other.
Optionally, the preprocessing module may be configured to separately locate the target object in the first image and the plurality of second images by using a target location technology, so as to obtain location information of the target object in the first image and the plurality of second images.
Optionally, the number of the first images is plural; the plurality of first images are acquired using a plurality of different imaging parameters in the same state of the target object.
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device includes a memory 1101 and a processor 1102. The memory 1101 may be configured to store various data to support operation of the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory 1101 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The memory is used for storing programs;
the processor 1102, coupled to the memory 1101, is configured to execute the program stored in the memory 1101 to:
acquiring a first medical image of a target tissue and a plurality of second medical images acquired under a plurality of different phases corresponding to the target tissue;
extracting spatial features of the target tissue from the first medical image using at least one first network layer in a neural network based classification model;
extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by using at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target tissues.
Here, the target tissue is specifically a target human tissue. The first medical image and the second medical images may be CT (Computed Tomography) images or MRI (Magnetic Resonance Imaging) images. Under the action of a medical agent (such as a contrast agent), the gray scale of images of human tissue varies over time, and different types of human tissue follow different gray-scale variation patterns. The plurality of second medical images may be acquired in a plurality of different phases corresponding to the human tissue, where the plurality of different phases may include at least two of a plain-scan (pre-contrast) phase, an early arterial phase, a late arterial phase, a portal venous phase, and a delayed phase. The plurality of different phases correspond to the plurality of different states. The first medical image may be acquired with the human tissue in a state without the medical agent, or may be any one of the plurality of second medical images.
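As a small illustration of the gray-scale variation pattern that the at least one second network layer learns from, one can compute the mean gray level of the target region across the ordered phase images; the boolean region mask below is an assumed input, not something specified by the disclosure:

```python
import numpy as np

def gray_level_curve(image_sequence, region_mask):
    """Mean gray level of the target region in each ordered phase image.

    image_sequence: list of 2-D arrays ordered by phase occurrence.
    region_mask:    boolean 2-D array marking the target tissue region.
    """
    return np.array([img[region_mask].mean() for img in image_sequence])
```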
When the processor 1102 executes the program in the memory 1101, the processor 1102 may also implement other functions in addition to those above; for details, refer to the description of the foregoing embodiments.
Further, as shown in fig. 5, the electronic device further includes: a communication component 1103, a display 1104, a power component 1105, an audio component 1106, and the like. Only some of the components are schematically shown in fig. 5; this does not mean that the electronic device includes only the components shown in fig. 5.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement, when executed by a computer, the following:
acquiring a first medical image of a target tissue and a plurality of second medical images acquired under a plurality of different phases corresponding to the target tissue;
extracting spatial features of the target tissue from the first medical image using at least one first network layer in a neural network based classification model;
extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by using at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target tissues.
When executed by a computer, the computer program can also implement other functions; for details, refer to the description of the foregoing embodiments.
In practical applications, the sample object in the model training method may be a sample tissue, for example, nodular tissue on organs such as the liver and lungs. The first sample image is specifically a first sample medical image, the second sample images are specifically second sample medical images, the sample image sequence is specifically a sample medical image sequence, and the plurality of different states are specifically a plurality of different phases. Specifically, as shown in fig. 3, the method includes:
301. Acquiring a first sample medical image of a sample tissue and a plurality of second sample medical images acquired at a plurality of different phases corresponding to the sample tissue.
302. Extracting spatial features of the sample tissue from the first sample medical image by using at least one first network layer in the neural-network-based classification model.
303. Extracting variation features of the sample tissue at different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images, by using at least one second network layer in the classification model.
304. Integrating the spatial features and the variation features, and classifying the sample tissue to obtain an actual classification result.
305. Optimizing the classification model according to the actual classification result and an expected classification result corresponding to the sample tissue.
For specific implementation of the steps 301 to 305, reference may be made to corresponding contents in the above embodiments, and details are not described herein.
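A minimal training-iteration sketch for steps 302 to 305 might look as follows, assuming a PyTorch model that bundles the first, second, and third network layers behind a single forward call; the function name, argument layout, and loss choice are illustrative assumptions:

```python
import torch.nn as nn

def train_step(model, first_image, image_sequence, expected_label,
               optimizer, criterion=nn.CrossEntropyLoss()):
    """One optimization iteration over a (mini-)batch of samples."""
    optimizer.zero_grad()
    # Steps 302-304: extract spatial and variation features and
    # classify, yielding the actual classification result (logits).
    logits = model(first_image, image_sequence)
    # Step 305: optimize against the expected classification result.
    loss = criterion(logits, expected_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```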
A classification model trained with the model training method provided in this embodiment of the application achieves high classification accuracy. The trained classification model obtains not only the spatial features of the target object to be classified but also the variation features of the target object between different states, and classifies the target object by combining the two. Compared with schemes that classify based on spatial features alone, the technical solution provided by this embodiment can improve classification accuracy.
Here, it should be noted that for the content of any step in the method provided by this embodiment of the application that is not described in detail above, reference may be made to the corresponding content in the foregoing embodiments, which is not repeated here. In addition, besides the above steps, the method provided in this embodiment of the application may further include some or all of the other steps in the above embodiments; for details, reference may be made to the corresponding content of the above embodiments, which is not repeated here.
The following description takes the target tissue being liver nodule tissue as an example. As shown in fig. 1a, a user can import, in a terminal interface, five first medical images of the liver nodule tissue, such as a T1 in-phase map, a T1 opposed-phase map, a T2 map, a DWI map, and an ADC map, and five second medical images, such as a plain-scan phase map, an early arterial phase map, a late arterial phase map, a portal venous phase map, and a delayed-phase map (only three images are shown by way of example in fig. 1a; in practice, ten images can be shown). After receiving the images input by the user, the terminal inputs the five first medical images into a CNN (Convolutional Neural Network) 10 in the classification model to extract spatial features of the liver nodule tissue, arranges the five second medical images in order into a medical image sequence, and inputs the medical image sequence into the LSTM (Long Short-Term Memory) network 20 in the classification model to extract variation features. The terminal then inputs both the spatial features and the variation features into a bilinear pooling layer 30 in the classification model, so that the bilinear pooling layer 30 performs bilinear pooling on the spatial features and the variation features to obtain a fusion feature. Finally, the terminal inputs the fusion feature into the fully connected layer 40 in the classification model for classification, obtains a classification result, and outputs and displays it.
It should be added that the at least one third network layer in the classification model may include a bilinear pooling layer and a fully connected layer.
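A minimal PyTorch sketch of the pipeline just described follows: a CNN branch over the stacked first medical images, an LSTM branch over the phase sequence, bilinear pooling realized as the outer product of the two feature vectors, and a fully connected classifier on the flattened result. All layer sizes, the 64x64 image resolution, and the two-class output are illustrative assumptions, not values taken from the disclosure:

```python
import torch
import torch.nn as nn

class PhaseClassifier(nn.Module):
    """Sketch of the CNN 10 + LSTM 20 + bilinear pooling 30 + fully
    connected 40 structure described above; sizes are illustrative."""

    def __init__(self, num_classes=2, feat_dim=128):
        super().__init__()
        # CNN branch: spatial features from the 5-channel stack of
        # first medical images.
        self.cnn = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))
        # LSTM branch: variation features from the 5-step phase
        # sequence; each step is a flattened 64x64 image here.
        self.lstm = nn.LSTM(input_size=64 * 64, hidden_size=feat_dim,
                            batch_first=True)
        # Fully connected classifier on the bilinear fusion feature.
        self.fc = nn.Linear(feat_dim * feat_dim, num_classes)

    def forward(self, first_images, image_sequence):
        spatial = self.cnn(first_images)          # (B, feat_dim)
        _, (h, _) = self.lstm(image_sequence)
        variation = h[-1]                         # (B, feat_dim)
        # Bilinear pooling: outer product of the two feature vectors,
        # flattened into one fusion feature per sample.
        fused = torch.bmm(spatial.unsqueeze(2),
                          variation.unsqueeze(1)).flatten(1)
        return self.fc(fused)
```

Under these assumptions, first_images would be a (B, 5, 64, 64) tensor of the five multi-parameter maps and image_sequence a (B, 5, 4096) tensor of the five flattened phase maps.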
An embodiment of the present application further provides a neural network system. The system comprises at least one first network layer, at least one second network layer, and at least one third network layer, wherein:
the at least one first network layer is used for extracting spatial features of a target tissue from a first medical image of the target tissue;
the at least one second network layer is used for extracting variation features of the target tissue at different phases from a medical image sequence formed by arranging a plurality of second medical images, the plurality of second medical images being acquired at a plurality of different phases corresponding to the target tissue; and
the at least one third network layer is used for integrating the spatial features and the variation features to classify the target tissue.
For specific structures and processing procedures of the at least one first network layer, the at least one second network layer, and the at least one third network layer, reference may be made to corresponding contents in the foregoing embodiments, and details are not described herein again.
The neural network system provided by the present application can obtain not only the spatial features of the target tissue to be classified but also the variation features of the target tissue between different phases, and it classifies the target tissue by combining the two. Compared with schemes that classify based on spatial features alone, the technical solution provided by this embodiment of the application can improve classification accuracy.
Optionally, the system may further include a preprocessing module.
The preprocessing module can be used for acquiring the occurrence time sequence of the plurality of different phases, and for arranging the plurality of second medical images according to the occurrence time sequence of the plurality of different phases to obtain the medical image sequence.
Optionally, the pre-processing module may be configured to perform a position alignment process on the target tissue in the first medical image and the plurality of second medical images according to the position information of the target tissue in the first medical image and the plurality of second medical images, so as to align the geometric positions of the target tissue in the first medical image and the plurality of second medical images with each other.
Optionally, the preprocessing module may be configured to separately locate the target tissue in the first medical image and the plurality of second medical images by using a target positioning technology, so as to obtain the position information of the target tissue in the first medical image and the plurality of second medical images.
Optionally, the number of the first medical images is plural; the plurality of first medical images are acquired under the same state of the target tissue by using a plurality of different imaging parameters.
For the specific implementation process of the preprocessing module, reference may be made to corresponding contents in the foregoing embodiments, and details are not described herein.
Fig. 4 shows a block diagram of an image processing apparatus according to still another embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first obtaining module 401, configured to obtain a first image of a target object and a plurality of second images acquired in a plurality of different states of the target object;
a first extraction module 402, configured to extract a spatial feature of the target object from the first image by using at least one first network layer in a neural network-based classification model;
a second extraction module 403, configured to extract, by using at least one second network layer in the classification model, a variation feature of the target object between different states from an image sequence formed by arranging the plurality of second images;
a first classification module 404, configured to synthesize the spatial features and the variation features to classify the target object.
Optionally, the apparatus may further include:
the second acquisition module is used for acquiring the occurrence time sequence of the different states;
and the first arrangement module is used for arranging the plurality of second images according to the occurrence time sequence of the plurality of different states to obtain the image sequence.
Optionally, the number of the first images is plural; the above apparatus may further include:
the first acquisition module is used for acquiring a plurality of first images by using a plurality of different imaging parameters in the same state of the target object.
Optionally, the apparatus may further include:
a first position processing module, configured to perform position alignment processing on a target object in the first image and the plurality of second images according to position information of the target object in the first image and the plurality of second images, so that geometric positions of the target object in the first image and the plurality of second images are aligned with each other.
Optionally, the apparatus may further include:
a first positioning module, configured to separately position a target object in the first image and the plurality of second images by using a target positioning technology, so as to obtain position information of the target object in the first image and the plurality of second images.
Here, it should be noted that: the image apparatus provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, and is not described herein again.
Fig. 4 shows a block diagram of a model training apparatus according to another embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first obtaining module 401, configured to obtain a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object;
a first extraction module 402, configured to extract spatial features of the sample object from the first sample image using at least one first network layer in a neural network-based classification model;
a second extraction module 403, configured to extract, by using at least one second network layer in the classification model, a variation feature of the sample object between different states from a sample image sequence formed by arranging the plurality of second sample images;
a first classification module 404, configured to synthesize the spatial features and the variation features, and classify the sample object to obtain an actual classification result;
a first optimizing module 405, configured to optimize the classification model according to the actual classification result and an expected classification result corresponding to the sample object.
Optionally, the apparatus may further include:
the second acquisition module is used for acquiring the occurrence time sequence of the different states;
and the first arrangement module is used for arranging the second sample images according to the occurrence time sequence of the different states to obtain the sample image sequence.
Optionally, the number of the first sample images is plural; the above apparatus may further include:
the first acquisition module is used for acquiring a plurality of first sample images by using a plurality of different imaging parameters in the same state of the sample object.
Optionally, the apparatus may further include:
a first position processing module, configured to perform position alignment processing on the sample objects in the first sample image and the plurality of second sample images according to position information of the sample objects in the first sample image and the plurality of second sample images, so that geometric positions of the sample objects in the first sample image and the plurality of second sample images are aligned with each other.
Optionally, the apparatus may further include:
a first positioning module, configured to position the sample object in the first sample image and the plurality of second sample images by using a target positioning technology, so as to obtain position information of the sample object in the first sample image and the plurality of second sample images.
Here, it should be noted that: the model training device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 4 shows a block diagram of an image processing apparatus according to still another embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first obtaining module 401, configured to obtain a first medical image of a target tissue and a plurality of second medical images acquired in a plurality of different phases corresponding to the target tissue;
a first extraction module 402, configured to extract spatial features of the target tissue from the first medical image by using at least one first network layer in a neural network-based classification model;
a second extraction module 403, configured to extract, by using at least one second network layer in the classification model, a variation feature of the target tissue between different phases from a medical image sequence formed by arranging the plurality of second medical images;
a first classification module 404, configured to integrate the spatial features and the variation features to classify the target tissue.
Optionally, the apparatus may further include:
the second acquisition module is used for acquiring the occurrence time sequence of the plurality of different phases;
and the first arrangement module is used for arranging the plurality of second medical images according to the occurrence time sequence of the plurality of different phases to obtain the medical image sequence.
Optionally, the number of the first medical images is plural; the above apparatus may further include:
the first acquisition module is used for acquiring a plurality of first medical images by using a plurality of different imaging parameters in the same state of the target tissue.
Optionally, the apparatus may further include:
a first position processing module, configured to perform position alignment processing on a target tissue in the first medical image and the plurality of second medical images according to position information of the target tissue in the first medical image and the plurality of second medical images, so as to align geometric positions of the target tissue in the first medical image and the plurality of second medical images with each other.
Optionally, the apparatus may further include:
a first positioning module, configured to separately position a target tissue in the first medical image and the plurality of second medical images by using a target positioning technology, so as to obtain position information of the target tissue in the first medical image and the plurality of second medical images.
Here, it should be noted that: the image processing apparatus provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module may refer to the corresponding content in the above method embodiments, and is not described herein again.
Fig. 4 shows a block diagram of a model training apparatus according to another embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first obtaining module 401, configured to obtain a first sample medical image of a sample tissue and a plurality of second sample medical images acquired in a plurality of different phases corresponding to the sample tissue;
a first extraction module 402, configured to extract spatial features of the sample tissue from the first sample medical image by using at least one first network layer in a neural network-based classification model;
a second extraction module 403, configured to extract, by using at least one second network layer in the classification model, a variation feature of the sample tissue between different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images;
a first classification module 404, configured to synthesize the spatial features and the variation features, and classify the sample tissue to obtain an actual classification result;
a first optimizing module 405, configured to optimize the classification model according to the actual classification result and an expected classification result corresponding to the sample tissue.
Here, it should be noted that: the model training device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle of each module may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device includes a memory 1101 and a processor 1102. The memory 1101 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory 1101 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The memory is used for storing programs;
the processor 1102 is coupled to the memory 1101, and configured to execute the program stored in the memory 1101, so as to implement the image processing method or the model training method in the foregoing embodiments.
Further, as shown in fig. 5, the electronic device further includes: a communication component 1103, a display 1104, a power component 1105, an audio component 1106, and the like. Only some of the components are schematically shown in fig. 5; this does not mean that the electronic device includes only the components shown in fig. 5.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, which, when executed by a computer, can implement the steps or functions of the image processing method and the model training method provided in the foregoing embodiments.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (24)

1. An image processing method, comprising:
acquiring a first image of a target object and a plurality of second images acquired under a plurality of different states of the target object;
extracting spatial features of the target object from the first image by using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the target object among different states from an image sequence formed by arranging the plurality of second images by utilizing at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target object.
2. The method of claim 1, further comprising:
acquiring the occurrence time sequence of the plurality of different states;
and arranging the plurality of second images according to the occurrence time sequence of the plurality of different states to obtain the image sequence.
3. The method of claim 1, wherein the at least one second network layer constitutes a long short-term memory network;
extracting the change characteristics of the target object between different states from an image sequence formed by arranging the plurality of second images by using at least one second network layer in the classification model, wherein the change characteristics comprise:
and inputting the image sequence into the long short-term memory network, and extracting the temporal characteristics of the image sequence to be used as the variation characteristics of the target object among different states.
4. The method of any one of claims 1 to 3, wherein integrating the spatial features and the varying features to classify the target object comprises:
and fusing the spatial features and the change features by using at least one third network layer in the classification model to obtain fusion features, and classifying the target object according to the fusion features.
5. The method of claim 4, wherein fusing the spatial features and the variance features to obtain fused features comprises:
and carrying out bilinear pooling on the spatial characteristics and the variation characteristics to obtain the fusion characteristics.
6. The method according to any one of claims 1 to 3, wherein the number of the first images is plural;
the method further comprises the following steps:
and acquiring a plurality of first images by using a plurality of different imaging parameters in the same state of the target object.
7. The method of any of claims 1 to 3, further comprising:
and according to the position information of the target object in the first image and the plurality of second images, carrying out position alignment processing on the target object in the first image and the plurality of second images so as to align the geometric positions of the target object in the first image and the plurality of second images with each other.
8. The method of claim 7, further comprising:
and respectively positioning the target object in the first image and the plurality of second images by adopting a target positioning technology to obtain the position information of the target object in the first image and the plurality of second images.
9. A method of model training, comprising:
acquiring a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object;
extracting spatial features of the sample object from the first sample image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample object among different states from a sample image sequence formed by arranging the plurality of second sample images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample objects to obtain actual classification results;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample object.
10. The method of claim 9, further comprising:
acquiring the occurrence time sequence of the plurality of different states;
and arranging the second sample images according to the occurrence time sequence of the different states to obtain the sample image sequence.
11. The method of claim 9, wherein the at least one second network layer constitutes a long short-term memory network;
extracting the change characteristics of the sample object among different states from a sample image sequence formed by arranging the plurality of second sample images by using at least one second network layer in the classification model, wherein the change characteristics comprise:
and inputting the sample image sequence into the long short-term memory network, and extracting the temporal characteristics of the sample image sequence to be used as the variation characteristics of the sample object among different states.
12. The method of any one of claims 9 to 11, wherein integrating the spatial features and the varying features to classify the sample object comprises:
and fusing the spatial features and the variation features by using at least one third network layer in the classification model to obtain sample fusion features, and classifying the sample objects according to the sample fusion features.
13. The method according to any one of claims 9 to 11, wherein the number of the first sample images is plural;
the method further comprises the following steps:
and acquiring a plurality of first sample images by using a plurality of different imaging parameters under the same state of the sample object.
14. The method of any of claims 9 to 11, further comprising:
according to the position information of the sample object in the first sample image and the plurality of second sample images, carrying out position alignment processing on the sample object in the first sample image and the plurality of second sample images so as to align the geometric positions of the sample object in the first sample image and the plurality of second sample images with each other.
15. The method of claim 14, further comprising:
and positioning the sample object in the first sample image and the plurality of second sample images by adopting a target positioning technology to obtain the position information of the sample object in the first sample image and the plurality of second sample images.
16. A neural network system, comprising: at least one first network layer, at least one second network layer, and at least one third network layer;
the at least one first network layer is used for extracting the spatial characteristics of a target object from a first image of the target object;
the at least one second network layer is used for extracting variation characteristics of the target object among different states from an image sequence formed by arranging a plurality of second images; the plurality of second images are acquired at a plurality of different states of the target object;
the at least one third network layer is used for integrating the spatial characteristics and the variation characteristics to classify the target object.
17. The system of claim 16, wherein the at least one second network layer constitutes a long short-term memory network.
18. An image processing apparatus characterized by comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first medical image of a target tissue and a plurality of second medical images acquired under a plurality of different phases corresponding to the target tissue;
a first extraction module, configured to extract spatial features of the target tissue from the first medical image by using at least one first network layer in a neural network-based classification model;
the second extraction module is used for extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by utilizing at least one second network layer in the classification model;
and the first classification module is used for integrating the spatial characteristics and the variation characteristics to classify the target tissues.
19. A method of model training, comprising:
acquiring a first sample medical image of a sample tissue and a plurality of second sample medical images acquired under a plurality of different phases corresponding to the sample tissue;
extracting spatial features of the sample tissue from the first sample medical image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample tissues at different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample tissues to obtain an actual classification result;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample tissue.
20. A neural network system, comprising: at least one first network layer, at least one second network layer, and at least one third network layer;
the at least one first network layer is used for extracting the spatial characteristics of the target tissue from the first medical image of the target tissue;
the at least one second network layer is used for extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging a plurality of second medical images; the plurality of second medical images are acquired at a plurality of different phases corresponding to the target tissue;
the at least one third network layer is used for integrating the spatial features and the variation features to classify the target tissues.
21. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring a first image of a target object and a plurality of second images acquired under a plurality of different states of the target object;
extracting spatial features of the target object from the first image by using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the target object among different states from an image sequence formed by arranging the plurality of second images by utilizing at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target object.
22. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring a first sample image of a sample object and a plurality of second sample images acquired in a plurality of different states of the sample object;
extracting spatial features of the sample object from the first sample image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample object among different states from a sample image sequence formed by arranging the plurality of second sample images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample objects to obtain actual classification results;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample object.
23. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring a first medical image of a target tissue and a plurality of second medical images acquired under a plurality of different phases corresponding to the target tissue;
extracting spatial features of the target tissue from the first medical image using at least one first network layer in a neural network based classification model;
extracting variation characteristics of the target tissue at different phases from a medical image sequence formed by arranging the plurality of second medical images by using at least one second network layer in the classification model;
and integrating the spatial features and the variation features to classify the target tissues.
24. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring a first sample medical image of a sample tissue and a plurality of second sample medical images acquired under a plurality of different phases corresponding to the sample tissue;
extracting spatial features of the sample tissue from the first sample medical image using at least one first network layer in a neural network-based classification model;
extracting variation characteristics of the sample tissues at different phases from a sample medical image sequence formed by arranging the plurality of second sample medical images by using at least one second network layer in the classification model;
integrating the spatial features and the variation features, and classifying the sample tissues to obtain an actual classification result;
and optimizing the classification model according to the actual classification result and the expected classification result corresponding to the sample tissue.