CN117496216A - Image classification method, device, equipment, medium and product - Google Patents
- Publication number: CN117496216A (application number CN202311234897.XA)
- Authority: CN (China)
- Prior art keywords: target image, neural network, image data, convolutional neural, layer
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/454 — Local feature extraction: integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The present application relates to an image classification method, apparatus, computer device, storage medium, and computer program product, applicable to the technical field of artificial intelligence. The method comprises the following steps: acquiring a historical image data set and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network with the historical image data set, wherein the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of max pooling layers, a global average pooling layer, and a fully connected layer; acquiring target image data and inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data; and determining the target image category of the target image based on the fully connected layer and the feature vector. The method greatly reduces the number of parameters to be learned in the model, thereby improving the model's working efficiency.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image classification method, apparatus, computer device, storage medium, and computer program product.
Background
The VGG model (Visual Geometry Group) is a deep convolutional neural network in which max pooling layers extract the strongest-responding parts of the features for input to the next layer. In the deeper layers of the network, however, the feature maps are smaller and carry more semantic information, so the last max pooling layer before the fully connected layers has weaker feature extraction and dimensionality reduction capability. More fully connected layers are then needed to maintain the model's performance, which increases the number of parameters to be learned and reduces the model's working efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image classification method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the working efficiency of a model.
In a first aspect, the present application provides an image classification method, the method comprising:
acquiring a historical image data set and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network with the historical image data set, wherein the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of max pooling layers, a global average pooling layer, and a fully connected layer;
acquiring target image data, and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data;
and determining a target image category of the target image based on the fully connected layer and the feature vector.
In one embodiment, the total number of convolutional layers is one greater than the total number of max pooling layers, one max pooling layer lies between every two convolutional layers, the global average pooling layer is located after the last convolutional layer, and the fully connected layer is located after the global average pooling layer.
In one embodiment, inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data includes:
acquiring a plurality of feature maps obtained by applying the convolution operations of the plurality of convolutional layers to the target image data, wherein each convolutional layer corresponds to one feature map;
inputting all the feature maps into the global average pooling layer to obtain the pixel average of all pixels in each feature map;
and arranging all the pixel averages in the order of the corresponding convolutional layers in the convolutional neural network to obtain the feature vector.
In one embodiment, determining the target image category of the target image based on the fully connected layer and the feature vector includes:
acquiring a weight matrix in the fully connected layer, wherein the weight parameters in the weight matrix correspond one-to-one to the elements of the feature vector;
calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products;
applying a nonlinear activation function to the linear transformation result to obtain a prediction category vector;
and determining the target image category based on the prediction category vector and a normalized exponential function.
In one embodiment, determining the target image category of the target image based on the prediction category vector and the normalized exponential function includes:
converting all elements of the prediction category vector with the normalized exponential function, wherein the elements of the prediction category vector correspond one-to-one to image categories;
determining, for each element, the ratio of its conversion result to the sum of the conversion results of all elements as the target probability corresponding to that element;
and determining the image category with the largest target probability as the target image category.
In one embodiment, after training the pre-constructed convolutional neural network with the historical image data set, the method further includes:
judging whether the current number of training rounds has reached a preset number of training rounds, and if so, stopping the training process of the pre-constructed convolutional neural network.
In a second aspect, the present application further provides an image classification apparatus, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a historical image data set and a pre-constructed convolutional neural network, the pre-constructed convolutional neural network is trained by utilizing the historical image data set, and the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of maximum pooling layers, a global average pooling layer and a full connection layer;
the second acquisition module is used for acquiring target image data, inputting the target image data into the trained convolutional neural network and obtaining corresponding feature vectors of the target image data;
and the determining module is used for determining the target image category of the target image based on the full connection layer and the feature vector.
In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method of any of the embodiments described above.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the embodiments described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the embodiments described above.
The image classification method, apparatus, computer device, storage medium, and computer program product acquire a historical image data set and a pre-constructed convolutional neural network, and train the pre-constructed convolutional neural network, which comprises a plurality of convolutional layers, a plurality of max pooling layers, a global average pooling layer, and a fully connected layer, with the historical image data set; acquire target image data and input the target image data into the trained convolutional neural network to obtain the corresponding feature vector; and determine the target image category of the target image based on the fully connected layer and the feature vector. Because a global average pooling layer is placed in the model, a single fully connected layer suffices to complete the image classification of the target image, which greatly reduces the parameters to be learned in the model and thereby improves the model's working efficiency.
Drawings
FIG. 1 is a flow chart of an image classification method according to an embodiment;
FIG. 2 is a comparison of VGG-19 network models before and after modification in one embodiment;
FIG. 3 is a flow chart of a feature vector determination method in one embodiment;
FIG. 4 is a flow chart of an image classification method according to another embodiment;
FIG. 5 is a block diagram of an image classification apparatus in one embodiment;
FIG. 6 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, an image classification method is provided. The method is described here as applied to a terminal, but it may equally be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between them. In this embodiment, the method includes the following steps:
s102, acquiring a historical image data set and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network by utilizing the historical image data set, wherein the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of maximum pooling layers, a global average pooling layer and a full connection layer.
The image classification method of this embodiment may be used, for example, to classify different currencies in the banking or finance field, in which case the historical image data set is a set of feature data corresponding to the different currencies; in other fields the method may serve other purposes, which this embodiment does not specifically limit. The convolutional neural network may be a VGG-19 network. Each convolutional layer uses 3x3 convolution kernels and a ReLU activation function, and stacked convolutional layers extract features of the input image. Between every two convolutional layers there is a 2x2 max pooling layer that reduces the spatial size of the feature map while retaining the main features. The global average pooling layer converts the feature maps of the convolutional neural network into a fixed-length feature vector, and the fully connected layer converts the extracted features into the final output.
Specifically, the historical image data set is input into the pre-constructed convolutional neural network and the network is trained; training ends once a preset condition is met, for example that the number of training rounds reaches a preset number, that the training loss has stabilized, or that the training loss has fallen to a preset value.
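These preset conditions can be collected into a single stopping check. A minimal sketch in Python; every numeric threshold below is an illustrative assumption, not a value from the application:

```python
def should_stop(epoch, loss_history, max_epochs=50, loss_floor=0.01,
                stall_eps=1e-4, patience=3):
    """Return True when any of the preset training-stop conditions is met.

    Mirrors the text: (1) the number of training rounds reaches a preset
    number, (2) the training loss has stabilized (changed by less than
    stall_eps over `patience` consecutive rounds), or (3) the loss has
    dropped to a preset value. All numeric defaults are illustrative.
    """
    if epoch >= max_epochs:                              # condition (1)
        return True
    if loss_history and loss_history[-1] <= loss_floor:  # condition (3)
        return True
    if len(loss_history) > patience:                     # condition (2)
        recent = loss_history[-(patience + 1):]
        if all(abs(recent[i + 1] - recent[i]) < stall_eps
               for i in range(patience)):
            return True
    return False
```

In a training loop, the check would run once per round, after that round's loss is recorded.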
S104: acquiring target image data, and inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data.
In the banking or finance field, the target image data may be image data of an unknown kind of currency.
Before the target image data are input into the trained convolutional neural network, the target image must be preprocessed. The preprocessing includes: adjusting the size of the target image to the input size of the trained convolutional neural network, normalizing the pixels of the target image, and converting the target image into the tensor format required by the model.
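A minimal numpy sketch of these three preprocessing steps. The 224x224 input size and [0, 1] pixel scaling are common VGG conventions assumed here (the application does not fix them), and the nearest-neighbour resize is a stand-in for a proper interpolation routine:

```python
import numpy as np

def preprocess(image, size=224):
    """Resize, normalize, and convert a target image to the channels-first
    tensor layout expected by the model."""
    image = np.asarray(image, dtype=np.float32)
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size          # nearest-neighbour resize
    cols = np.arange(size) * w // size
    image = image[rows][:, cols]
    image /= 255.0                              # normalize pixels to [0, 1]
    return image.transpose(2, 0, 1)             # HWC -> CHW tensor format

tensor = preprocess(np.zeros((100, 150, 3), dtype=np.uint8))
print(tensor.shape)   # (3, 224, 224)
```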
Specifically, after the target image data are input into the trained convolutional neural network, a feature extraction layer is determined among the plurality of convolutional layers, and the feature map output by the feature extraction layer is processed by the global average pooling layer to obtain the feature vector corresponding to the target image data. The feature extraction layer may be at least one of the plurality of convolutional layers.
S106: determining the target image category of the target image based on the fully connected layer and the feature vector.
Specifically, the feature vector is input into the fully connected layer and linearly transformed with the layer's weight matrix; a nonlinear activation function is then applied to the result of the linear transformation, the nonlinear result is processed with a normalized exponential function, and the target image category is finally determined.
In the image classification method, a historical image data set and a pre-constructed convolutional neural network are acquired, and the pre-constructed convolutional neural network, comprising a plurality of convolutional layers, a plurality of max pooling layers, a global average pooling layer, and a fully connected layer, is trained with the historical image data set; target image data are acquired and input into the trained convolutional neural network to obtain the corresponding feature vector; and the target image category of the target image is determined based on the fully connected layer and the feature vector. Because a global average pooling layer is placed in the model, a single fully connected layer suffices to complete the image classification of the target image, which greatly reduces the parameters to be learned in the model and improves its working efficiency.
In some embodiments, the total number of convolutional layers is one greater than the total number of max pooling layers, one max pooling layer lies between every two convolutional layers, the global average pooling layer is located after the last convolutional layer, and the fully connected layer is located after the global average pooling layer.
The convolutional neural network may be a VGG-19 network model, as shown in fig. 2, where the left side of fig. 2 shows the network structure of the existing VGG-19 and the right side shows the VGG-19 as modified in this embodiment. Here conv is a convolutional layer, maxpool is a max pooling layer, maxpool/2 is a max pooling layer with a stride of 2, fc-4096 is a fully connected layer with 4096 output neurons, fc-4 is a fully connected layer with 4 output neurons, fc-1000 is a fully connected layer with 1000 output neurons, GAP is the global average pooling layer, and Softmax is the normalized exponential function. As fig. 2 shows, the VGG-19 network model in this embodiment uses a global average pooling layer in place of a max pooling layer and fully connected layers, and the number of output neurons of the remaining fully connected layer is also greatly reduced compared with the conventional VGG-19 network model.
In this embodiment, replacing a max pooling layer and fully connected layers with a global average pooling layer simplifies the network model and greatly reduces the number of output neurons of the fully connected layer, thereby improving the operating efficiency of the network.
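The scale of the saving can be checked with a back-of-the-envelope parameter count. The 7x7x512 final feature map and the fc-4096/fc-4096/fc-1000 head are the standard ImageNet VGG-19 configuration assumed from the left of fig. 2; the modified head keeps only the fc-4 layer fed by the 512 pooled values. These figures are assumptions drawn from the standard architecture, not counts stated in the application:

```python
def fc_params(n_in, n_out):
    """Learnable parameters of one fully connected layer: weights + biases."""
    return n_in * n_out + n_out

# Existing VGG-19 head: flatten the 7x7x512 feature map, then
# fc-4096 -> fc-4096 -> fc-1000.
standard_head = (fc_params(7 * 7 * 512, 4096)
                 + fc_params(4096, 4096)
                 + fc_params(4096, 1000))

# Modified head: global average pooling yields 512 scalars, followed by
# a single fc-4 layer.
modified_head = fc_params(512, 4)

print(standard_head)   # 123642856
print(modified_head)   # 2052
```

Under these assumptions the fully connected head shrinks by about five orders of magnitude, which is the source of the claimed efficiency gain.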
In some embodiments, as shown in fig. 3, inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data includes:
S302: acquiring a plurality of feature maps obtained by applying the convolution operations of the plurality of convolutional layers to the target image data, where each convolutional layer corresponds to one feature map.
A feature map is a data structure in the convolutional neural network: the output of a convolutional layer, produced by the neurons of that layer performing convolution operations on the input data.
S304: inputting all the feature maps into the global average pooling layer to obtain the pixel average of all pixels in each feature map.
Each feature map can be converted to a scalar value by computing the average of all its pixels.
S306: arranging all the pixel averages in the order of the corresponding convolutional layers in the convolutional neural network to obtain the feature vector.
Arranging the scalar values corresponding to the feature maps in sequence yields a feature vector of fixed length.
In this embodiment, the global average pooling layer is used to determine the feature vector corresponding to the target image data, which effectively reduces the size of the feature maps and thus the parameter count and computation of the network.
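Steps S304 and S306 amount to a mean per feature map followed by ordered concatenation. A minimal numpy sketch with two illustrative 2x2 feature maps:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse each feature map to the pixel average of all its pixels
    (S304) and arrange the resulting scalars in the order of their source
    convolutional layers to form a fixed-length feature vector (S306)."""
    return np.array([fm.mean() for fm in feature_maps])

maps = [np.array([[1.0, 3.0], [5.0, 7.0]]),   # first layer's feature map
        np.array([[2.0, 2.0], [2.0, 2.0]])]   # second layer's feature map
vec = global_average_pool(maps)
print(vec)   # [4. 2.]
```

Whatever the spatial size of each map, the output length equals the number of feature maps, which is why the subsequent fully connected layer can stay small.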
In some embodiments, determining the target image category of the target image based on the fully connected layer and the feature vector includes: acquiring a weight matrix in the fully connected layer, wherein the weight parameters in the weight matrix correspond one-to-one to the elements of the feature vector; calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products; applying a nonlinear activation function to the linear transformation result to obtain a prediction category vector; and determining the target image category based on the prediction category vector and a normalized exponential function.
The linear transformation of the feature vector amounts to a combination and mapping of its elements. The nonlinear activation function introduces nonlinearity so that the fully connected layer can learn more complex features and patterns; it may be a ReLU (Rectified Linear Unit), Sigmoid, or Tanh function.
In this embodiment, the feature vector undergoes a linear transformation followed by nonlinear processing in the fully connected layer, which makes the final image category prediction more accurate.
In some embodiments, determining the target image category of the target image based on the prediction category vector and the normalized exponential function includes: converting all elements of the prediction category vector with the normalized exponential function, wherein the elements of the prediction category vector correspond one-to-one to image categories; determining, for each element, the ratio of its conversion result to the sum of the conversion results of all elements as the target probability corresponding to that element; and determining the image category with the largest target probability as the target image category.
The normalized exponential function converts the raw output vector into a vector representing a probability distribution; it may be a Softmax function, which applies the exponential function to each element of the prediction category vector.
In this embodiment, exponentiating the elements of the prediction category vector with the Softmax function guarantees that the resulting target probabilities are non-negative.
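The whole determination — weight-matrix products, nonlinear activation, exponential conversion, ratio to the sum, and the maximum-probability category — fits in a short numpy sketch. ReLU is used as the activation (one of the options named above), and the weights here are illustrative rather than trained values:

```python
import numpy as np

def classify(feature_vector, weights, bias):
    """Return (category index, target probabilities) for one feature vector."""
    z = weights @ feature_vector + bias   # products with weight parameters, summed
    h = np.maximum(z, 0.0)                # nonlinear activation (ReLU)
    e = np.exp(h - h.max())               # exponential conversion (shifted for stability)
    probs = e / e.sum()                   # ratio of each result to the sum of all
    return int(np.argmax(probs)), probs   # category with the maximum probability

features = np.array([4.0, 2.0])           # e.g. a pooled feature vector
W = np.array([[1.0, 0.0],                 # 3 categories x 2 features
              [0.0, 1.0],
              [-1.0, 1.0]])
label, probs = classify(features, W, np.zeros(3))
print(label)   # 0
```

Subtracting the maximum before exponentiating does not change the ratios but avoids overflow for large activations.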
In some embodiments, after training the pre-constructed convolutional neural network with the historical image data set, the method further includes: judging whether the current number of training rounds has reached a preset number of training rounds, and if so, stopping the training process of the pre-constructed convolutional neural network.
The preset number of training rounds is determined from historical training experience and actual requirements.
In this embodiment, deciding whether to stop training the convolutional neural network based on a preset number of training rounds makes the training result of the model more accurate.
In one embodiment, another image classification method is provided, as shown in fig. 4, comprising the following: first, an improved VGG-19 convolutional neural network is constructed; the network is then trained with a historical image data set and the trained network model is saved; finally, the target image data are fed into the model to extract the feature vector and determine the category of the target image.
The improved VGG-19 convolutional neural network comprises a plurality of convolutional layers, a plurality of max pooling layers, a global average pooling layer, and a fully connected layer, wherein the total number of convolutional layers is one greater than the total number of max pooling layers, one max pooling layer lies between every two convolutional layers, the global average pooling layer follows the last convolutional layer, and the fully connected layer follows the global average pooling layer.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in those flowcharts may comprise several sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and which need not be executed sequentially but may be performed in turn, or in alternation, with at least some of the other steps or sub-steps or stages.
Based on the same inventive concept, an embodiment of the present application also provides an image classification apparatus implementing the image classification method described above. The implementation of the solution provided by the apparatus is similar to that described for the method, so for the specific limitations of the image classification apparatus embodiments below, reference may be made to the limitations of the image classification method above, which are not repeated here.
In one embodiment, as shown in fig. 5, there is provided an image classification apparatus 500 comprising: a first acquisition module 501, a second acquisition module 502, and a determination module 503, wherein:
a first obtaining module 501, configured to obtain a historical image dataset and a pre-constructed convolutional neural network, and train the pre-constructed convolutional neural network by using the historical image dataset, where the pre-constructed convolutional neural network includes a plurality of convolutional layers, a plurality of maximum pooling layers, a global average pooling layer, and a full connection layer.
The second obtaining module 502 is configured to obtain target image data, input the target image data into the trained convolutional neural network, and obtain a feature vector corresponding to the target image data.
A determining module 503, configured to determine a target image category of the target image based on the full connection layer and the feature vector.
In some embodiments of the image classification apparatus 500, the total number of convolutional layers is one greater than the total number of max pooling layers, one max pooling layer lies between every two convolutional layers, the global average pooling layer is located after the last convolutional layer, and the fully connected layer is located after the global average pooling layer.
In some embodiments, the second acquisition module 502 is further configured to acquire a plurality of feature maps obtained by performing convolution operations on the target image data through the plurality of convolution layers, where each convolution layer corresponds to one feature map; input all the feature maps into the global average pooling layer to obtain the pixel average value of all pixels in each feature map; and arrange all the pixel average values in the order of the corresponding convolution layers in the convolutional neural network to obtain the feature vector.
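As a minimal, non-limiting sketch of this global average pooling step (pure Python; the feature-map shapes and values are hypothetical): each feature map is reduced to the average of its pixels, and the averages are arranged in the order of their convolution layers to form the feature vector.

```python
def global_average_pooling(feature_maps):
    """feature_maps: list of 2-D maps (lists of rows), one per convolution
    layer, already in convolution-layer order. Returns the feature vector."""
    feature_vector = []
    for fmap in feature_maps:
        pixels = [p for row in fmap for p in row]
        feature_vector.append(sum(pixels) / len(pixels))  # pixel average of this map
    return feature_vector

maps = [[[1.0, 3.0], [5.0, 7.0]],   # map from conv layer 1: mean 4.0
        [[2.0, 2.0], [2.0, 2.0]]]   # map from conv layer 2: mean 2.0
vec = global_average_pooling(maps)  # -> [4.0, 2.0]
```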
In some embodiments, the determination module 503 includes:
an acquisition unit, configured to acquire a weight matrix in the full connection layer, where the weight parameters in the weight matrix correspond one-to-one with the elements in the feature vector;
a calculation unit, configured to calculate the product of each element and its corresponding weight parameter, and determine a linear transformation result based on the sum of all the products;
a nonlinear processing unit, configured to perform nonlinear processing on the linear transformation result using a nonlinear activation function to obtain a prediction category vector; and
a determining unit, configured to determine the target image category based on the prediction category vector and a normalized exponential function.
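For illustration only, the work of the acquisition, calculation, and nonlinear processing units can be sketched as follows. Assumptions beyond the text are flagged: the weight matrix is taken to have one row of weight parameters per output category (so that a prediction category *vector* results), and ReLU stands in for the unspecified nonlinear activation function; all names and values are hypothetical.

```python
def fully_connected(feature_vector, weight_matrix):
    """Sketch: each row of the (hypothetical) weight matrix pairs one-to-one
    with the feature vector; the sum of products is the linear transformation
    result, and a nonlinear activation (ReLU here, illustrative) is applied."""
    prediction_category_vector = []
    for weights in weight_matrix:
        linear_result = sum(w * x for w, x in zip(weights, feature_vector))
        prediction_category_vector.append(max(0.0, linear_result))  # ReLU, illustrative
    return prediction_category_vector

features = [4.0, 2.0]                  # e.g. a feature vector from global average pooling
weights = [[0.5, -1.0], [0.25, 0.5]]   # hypothetical weight matrix, one row per category
pred = fully_connected(features, weights)  # -> [0.0, 2.0]
```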
In some embodiments, the determining unit is further configured to convert all elements in the prediction category vector based on the normalized exponential function, where the elements in the prediction category vector are in one-to-one correspondence with image categories; determining the ratio of the conversion result of each element to the sum of the conversion results of all elements as a target probability corresponding to the element; and determining the image category corresponding to the maximum target probability as the target image category.
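The normalized exponential function (softmax) step just described can be sketched as follows; the category labels and input values are illustrative only. Each element is converted by exponentiation, the ratio of each conversion result to the sum of all conversion results gives that element's target probability, and the category with the largest target probability is selected.

```python
import math

def classify(prediction_category_vector, image_categories):
    """image_categories: labels in one-to-one correspondence with the
    elements of the prediction category vector (hypothetical labels)."""
    converted = [math.exp(v) for v in prediction_category_vector]
    total = sum(converted)
    target_probabilities = [c / total for c in converted]
    best = max(range(len(target_probabilities)),
               key=target_probabilities.__getitem__)  # largest target probability
    return image_categories[best], target_probabilities

category, probs = classify([2.0, 1.0, 0.5], ["cat", "dog", "bird"])
# category == "cat"; the target probabilities sum to 1
```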
In some embodiments, the image classification device 500 is further configured to determine whether the current training round number reaches the preset training round number, and if so, stop the training process of the pre-constructed convolutional neural network.
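This stopping condition amounts to a bounded training loop; a non-limiting sketch follows, in which `train_one_epoch` is a hypothetical placeholder for one pass of training on the historical image dataset.

```python
def train_one_epoch(network, dataset):
    # Hypothetical placeholder: one training pass over the historical image dataset.
    pass

def train(network, dataset, preset_training_rounds):
    """Stop the training process once the current training round number
    reaches the preset training round number."""
    current_round = 0
    while current_round < preset_training_rounds:
        train_one_epoch(network, dataset)
        current_round += 1
    return current_round  # equals the preset round number at stop
```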
The respective modules in the above image classification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor may call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and an external device. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an image classification method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implements the following steps: acquiring a historical image dataset and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network using the historical image dataset, where the pre-constructed convolutional neural network includes a plurality of convolution layers, a plurality of maximum pooling layers, a global average pooling layer, and a full connection layer; acquiring target image data and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data; and determining a target image category of the target image based on the full connection layer and the feature vector.
In one embodiment, in the method implemented when the processor executes the computer program, the total number of convolution layers is one more than the total number of maximum pooling layers, there is one maximum pooling layer between every two convolution layers, the global average pooling layer is located after the last convolution layer, and the full connection layer is located after the global average pooling layer.
In one embodiment, inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data, as implemented when the processor executes the computer program, includes: acquiring a plurality of feature maps obtained by performing convolution operations on the target image data through the plurality of convolution layers, where each convolution layer corresponds to one feature map; inputting all the feature maps into the global average pooling layer to obtain the pixel average value of all pixels in each feature map; and arranging all the pixel average values in the order of the corresponding convolution layers in the convolutional neural network to obtain the feature vector.
In one embodiment, determining the target image category of the target image based on the full connection layer and the feature vector, as implemented when the processor executes the computer program, includes: acquiring a weight matrix in the full connection layer, where the weight parameters in the weight matrix correspond one-to-one with the elements in the feature vector; calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products; performing nonlinear processing on the linear transformation result using a nonlinear activation function to obtain a prediction category vector; and determining the target image category based on the prediction category vector and a normalized exponential function.
In one embodiment, determining the target image category of the target image based on the prediction category vector and the normalized exponential function, as implemented when the processor executes the computer program, includes: converting all elements in the prediction category vector based on the normalized exponential function, where the elements in the prediction category vector correspond one-to-one with image categories; determining the ratio of each element's conversion result to the sum of all elements' conversion results as the target probability corresponding to that element; and determining the image category corresponding to the largest target probability as the target image category.
In one embodiment, after training the pre-constructed convolutional neural network using the historical image dataset implemented when the processor executes the computer program, further comprises: judging whether the current training round number reaches the preset training round number, and if so, stopping the training process of the pre-constructed convolutional neural network.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps: acquiring a historical image dataset and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network using the historical image dataset, where the pre-constructed convolutional neural network includes a plurality of convolution layers, a plurality of maximum pooling layers, a global average pooling layer, and a full connection layer; acquiring target image data and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data; and determining a target image category of the target image based on the full connection layer and the feature vector.
In one embodiment, in the method implemented when the computer program is executed by the processor, the total number of convolution layers is one more than the total number of maximum pooling layers, there is one maximum pooling layer between every two convolution layers, the global average pooling layer is located after the last convolution layer, and the full connection layer is located after the global average pooling layer.
In one embodiment, inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data, as implemented when the computer program is executed by the processor, includes: acquiring a plurality of feature maps obtained by performing convolution operations on the target image data through the plurality of convolution layers, where each convolution layer corresponds to one feature map; inputting all the feature maps into the global average pooling layer to obtain the pixel average value of all pixels in each feature map; and arranging all the pixel average values in the order of the corresponding convolution layers in the convolutional neural network to obtain the feature vector.
In one embodiment, determining the target image category of the target image based on the full connection layer and the feature vector, as implemented when the computer program is executed by the processor, includes: acquiring a weight matrix in the full connection layer, where the weight parameters in the weight matrix correspond one-to-one with the elements in the feature vector; calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products; performing nonlinear processing on the linear transformation result using a nonlinear activation function to obtain a prediction category vector; and determining the target image category based on the prediction category vector and a normalized exponential function.
In one embodiment, determining the target image category of the target image based on the prediction category vector and the normalized exponential function, as implemented when the computer program is executed by the processor, includes: converting all elements in the prediction category vector based on the normalized exponential function, where the elements in the prediction category vector correspond one-to-one with image categories; determining the ratio of each element's conversion result to the sum of all elements' conversion results as the target probability corresponding to that element; and determining the image category corresponding to the largest target probability as the target image category.
In one embodiment, after training the pre-constructed convolutional neural network using the historical image dataset, as implemented when the computer program is executed by the processor, further comprises: judging whether the current training round number reaches the preset training round number, and if so, stopping the training process of the pre-constructed convolutional neural network.
In one embodiment, a computer program product is provided, including a computer program which, when executed by a processor, implements the following steps: acquiring a historical image dataset and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network using the historical image dataset, where the pre-constructed convolutional neural network includes a plurality of convolution layers, a plurality of maximum pooling layers, a global average pooling layer, and a full connection layer; acquiring target image data and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data; and determining a target image category of the target image based on the full connection layer and the feature vector.
In one embodiment, in the method implemented when the computer program is executed by the processor, the total number of convolution layers is one more than the total number of maximum pooling layers, there is one maximum pooling layer between every two convolution layers, the global average pooling layer is located after the last convolution layer, and the full connection layer is located after the global average pooling layer.
In one embodiment, inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data, as implemented when the computer program is executed by the processor, includes: acquiring a plurality of feature maps obtained by performing convolution operations on the target image data through the plurality of convolution layers, where each convolution layer corresponds to one feature map; inputting all the feature maps into the global average pooling layer to obtain the pixel average value of all pixels in each feature map; and arranging all the pixel average values in the order of the corresponding convolution layers in the convolutional neural network to obtain the feature vector.
In one embodiment, determining the target image category of the target image based on the full connection layer and the feature vector, as implemented when the computer program is executed by the processor, includes: acquiring a weight matrix in the full connection layer, where the weight parameters in the weight matrix correspond one-to-one with the elements in the feature vector; calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products; performing nonlinear processing on the linear transformation result using a nonlinear activation function to obtain a prediction category vector; and determining the target image category based on the prediction category vector and a normalized exponential function.
In one embodiment, determining the target image category of the target image based on the prediction category vector and the normalized exponential function, as implemented when the computer program is executed by the processor, includes: converting all elements in the prediction category vector based on the normalized exponential function, where the elements in the prediction category vector correspond one-to-one with image categories; determining the ratio of each element's conversion result to the sum of all elements' conversion results as the target probability corresponding to that element; and determining the image category corresponding to the largest target probability as the target image category.
In one embodiment, after training the pre-constructed convolutional neural network using the historical image dataset, as implemented when the computer program is executed by the processor, further comprises: judging whether the current training round number reaches the preset training round number, and if so, stopping the training process of the pre-constructed convolutional neural network.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this description.
The above embodiments represent only a few implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.
Claims (10)
1. A method of classifying images, the method comprising:
acquiring a historical image data set and a pre-constructed convolutional neural network, and training the pre-constructed convolutional neural network by utilizing the historical image data set, wherein the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of maximum pooling layers, a global average pooling layer and a full connection layer;
acquiring target image data, and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data;
and determining a target image category of the target image based on the full connection layer and the feature vector.
2. The method of claim 1, wherein the total number of layers of the convolution layers is one more than the total number of layers of the maximum pooling layers, there is one maximum pooling layer between every two convolution layers, the global average pooling layer is located after the last convolution layer, and the full connection layer is located after the global average pooling layer.
3. The method according to claim 1, wherein the inputting the target image data into the trained convolutional neural network to obtain the feature vector corresponding to the target image data comprises:
acquiring a plurality of feature maps obtained by performing convolution operations on the target image data through the plurality of convolution layers, wherein each convolution layer corresponds to one feature map;
inputting all the feature maps into the global average pooling layer to obtain the pixel average value of all pixels in each feature map; and
arranging all the pixel average values in the order of the corresponding convolution layers in the convolutional neural network to obtain the feature vector.
4. The method of claim 1, wherein the determining a target image category of the target image based on the full connection layer and the feature vector comprises:
acquiring a weight matrix in the full connection layer, wherein weight parameters in the weight matrix correspond one-to-one with elements in the feature vector;
calculating the product of each element and its corresponding weight parameter, and determining a linear transformation result based on the sum of all the products;
performing nonlinear processing on the linear transformation result using a nonlinear activation function to obtain a prediction category vector; and
determining the target image category based on the prediction category vector and a normalized exponential function.
5. The method of claim 4, wherein determining the target image category based on the prediction category vector and the normalized exponential function comprises:
converting all elements in the prediction category vector based on the normalized exponential function, wherein the elements in the prediction category vector correspond one-to-one with image categories;
determining the ratio of each element's conversion result to the sum of all elements' conversion results as the target probability corresponding to that element; and
determining the image category corresponding to the largest target probability as the target image category.
6. The method of claim 1, wherein after training the pre-constructed convolutional neural network using the historical image dataset, further comprising:
judging whether the current training round number reaches the preset training round number, and if so, stopping the training process of the pre-constructed convolutional neural network.
7. An image classification apparatus, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a historical image data set and a pre-constructed convolutional neural network, the pre-constructed convolutional neural network is trained by utilizing the historical image data set, and the pre-constructed convolutional neural network comprises a plurality of convolutional layers, a plurality of maximum pooling layers, a global average pooling layer and a full connection layer;
a second acquisition module, used for acquiring target image data and inputting the target image data into the trained convolutional neural network to obtain a feature vector corresponding to the target image data; and
and the determining module is used for determining the target image category of the target image based on the full connection layer and the feature vector.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311234897.XA CN117496216A (en) | 2023-09-22 | 2023-09-22 | Image classification method, device, equipment, medium and product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311234897.XA CN117496216A (en) | 2023-09-22 | 2023-09-22 | Image classification method, device, equipment, medium and product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117496216A true CN117496216A (en) | 2024-02-02 |
Family
ID=89667914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311234897.XA Pending CN117496216A (en) | 2023-09-22 | 2023-09-22 | Image classification method, device, equipment, medium and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117496216A (en) |
- 2023-09-22: CN application CN202311234897.XA filed; published as CN117496216A (en), status active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522942B (en) | Image classification method and device, terminal equipment and storage medium | |
CN110378383B (en) | Picture classification method based on Keras framework and deep neural network | |
CN112418292B (en) | Image quality evaluation method, device, computer equipment and storage medium | |
CN113159147A (en) | Image identification method and device based on neural network and electronic equipment | |
CN113822209A (en) | Hyperspectral image recognition method and device, electronic equipment and readable storage medium | |
CN109034206A (en) | Image classification recognition methods, device, electronic equipment and computer-readable medium | |
KR102662997B1 (en) | Hyperspectral image classification method and appratus using neural network | |
US20230401756A1 (en) | Data Encoding Method and Related Device | |
CN112287965A (en) | Image quality detection model training method and device and computer equipment | |
CN111179270A (en) | Image co-segmentation method and device based on attention mechanism | |
CN117291895A (en) | Image detection method, device, equipment and storage medium | |
CN114387289B (en) | Semantic segmentation method and device for three-dimensional point cloud of power transmission and distribution overhead line | |
CN115223662A (en) | Data processing method, device, equipment and storage medium | |
CN115223181A (en) | Text detection-based method and device for recognizing characters of seal of report material | |
CN111639523B (en) | Target detection method, device, computer equipment and storage medium | |
CN112183303A (en) | Transformer equipment image classification method and device, computer equipment and medium | |
CN117496216A (en) | Image classification method, device, equipment, medium and product | |
Zhang et al. | Feature interpolation convolution for point cloud analysis | |
Jin et al. | Blind image quality assessment for multiple distortion image | |
Khanzadi et al. | Robust fuzzy rough set based dimensionality reduction for big multimedia data hashing and unsupervised generative learning | |
CN117235533B (en) | Object variable analysis method, device, computer equipment and storage medium | |
CN115761239B (en) | Semantic segmentation method and related device | |
WO2024174583A1 (en) | Model training method and apparatus, and device, storage medium and product | |
Kalla | Visualization for Solving Non-image Problems and Saliency Mapping | |
CN118781355A (en) | Object identification method based on YOLOv8 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||