
CN110991533B - Image recognition method, recognition device, terminal device and readable storage medium - Google Patents

Image recognition method, recognition device, terminal device and readable storage medium

Info

Publication number
CN110991533B
CN110991533B (application number CN201911219591.0A)
Authority
CN
China
Prior art keywords
image
identified
feature
determining
indication information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911219591.0A
Other languages
Chinese (zh)
Other versions
CN110991533A (en)
Inventor
贾玉虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911219591.0A priority Critical patent/CN110991533B/en
Publication of CN110991533A publication Critical patent/CN110991533A/en
Application granted granted Critical
Publication of CN110991533B publication Critical patent/CN110991533B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image recognition method, a recognition device, a terminal device and a readable storage medium. The method comprises the following steps: acquiring an image to be identified, and determining a global depth feature of the image to be identified; determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified; determining a depth feature of the image area indicated by the position indication information in the image to be identified, to obtain a local depth feature of the image to be identified; and determining, based on the global depth feature and the local depth feature, whether the category of the image to be identified is a target category. The method and the device avoid training a deep learning model with a large amount of training data and a long training time, and thereby speed up the development cycle of the terminal device to a certain extent.

Description

Image recognition method, recognition device, terminal device and readable storage medium
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to an image recognition method, a recognition device, terminal equipment and a readable storage medium.
Background
Currently, when identifying the category of an image, a deep learning model (for example, AlexNet, VGGNet or ResNet) is often deployed in a terminal device; the deep learning model is used to extract a global depth feature of the image to be identified, and the category of the image is then determined based on that global depth feature.
When the images to be identified are visually similar, the deep learning model must extract depth features that reflect image details in order to distinguish their categories. Ensuring that the model can extract such detail-rich depth features requires a large amount of training data and a long training time, which inevitably prolongs the development cycle of the terminal device.
Disclosure of Invention
In view of this, the embodiments of the present application provide an image recognition method, a recognition device, a terminal device, and a readable storage medium, which can distinguish the categories of similar images without training a deep learning model with a large amount of training data and a long training time, and which can speed up the development cycle of the terminal device to a certain extent.
A first aspect of an embodiment of the present application provides an image recognition method, including:
acquiring an image to be identified, and determining global depth characteristics of the image to be identified based on a first deep learning model;
determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
determining depth characteristics of an image area indicated by the position indication information in the image to be identified based on a second deep learning model so as to obtain local depth characteristics of the image to be identified;
and determining whether the category of the image to be identified is a target category based on the global depth feature and the local depth feature, wherein the target category is the category of images that contain the target object and whose scene is a preset scene.
A second aspect of an embodiment of the present application provides an image recognition apparatus, including:
the global feature module is used for acquiring an image to be identified and determining global depth features of the image to be identified based on the first deep learning model;
the position determining module is used for determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
the local feature module is used for determining the depth features of the image area indicated by the position indication information in the image to be identified based on a second deep learning model so as to obtain the local depth features of the image to be identified;
the identification module is used for determining whether the category of the image to be identified is a target category based on the global depth feature and the local depth feature, wherein the target category is the category of images that contain the target object and whose scene is a preset scene.
A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when the processor executes the computer program.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as described above in the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product comprising a computer program which, when executed by one or more processors, implements the steps of the method as described above in the first aspect.
From the above, the present application provides an image recognition method. First, the global depth feature of an image to be identified is determined based on a first deep learning model. Next, position indication information is determined, which indicates the area where the target object may be located if the image to be identified contains the target object. Then, the depth feature of the image area indicated by the position indication information is determined based on a second deep learning model (which may be the same as the first deep learning model) and taken as the local depth feature of the image to be identified. Finally, whether the category of the image to be identified is a target category is determined based on the global depth feature and the local depth feature, wherein the target category is the category of images that contain the target object and whose scene is a preset scene.
Therefore, the image recognition method provided by the application determines whether the category of the image to be recognized is the target category based on both the global depth feature and the depth feature of the area where the target object is likely to be located, rather than relying on the global depth feature alone. Moreover, even when images are visually similar, the image areas indicated by the position indication information usually differ markedly between images of the target category and images not of the target category. Consequently, the global depth feature does not need to capture the detail information of the image to be identified, and the depth feature of the indicated image area does not need to capture much detail either, so a large amount of training data and a long training time are not required to train the first and second deep learning models. The image recognition method provided by the application can therefore distinguish the categories of similar images without such heavy training, and can speed up the development cycle of the terminal device to a certain extent.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present application.
Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process for performing the neural network model of step S102;
FIG. 3 is a schematic diagram of a process for obtaining candidate windows for indicating location indication information according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a P-Net network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an R-Net network according to an embodiment of the present disclosure;
fig. 6 is a flowchart of another image recognition method according to the second embodiment of the present application;
fig. 7 is a schematic structural diagram of an image recognition device according to a third embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The method provided by the embodiment of the application can be applied to a terminal device, and the terminal device includes, but is not limited to: smart phones, tablet computers, notebooks, desktop computers, cloud servers, etc.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting the [described condition or event]" or "in response to detecting the [described condition or event]".
In addition, in the description of the present application, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
In order to illustrate the technical solutions described above, the following description is made by specific embodiments.
Example 1
Referring to fig. 1, the method for identifying an image according to the first embodiment of the present application is described below, and the method includes:
in step S101, an image to be identified is acquired, and global depth features of the image to be identified are determined based on a first deep learning model;
currently, a convolutional neural network (Convolutional Neural Networks, CNN) model is generally used to learn the features of an image, that is, the entire image is input into the CNN model, so as to obtain the global depth features of the image output by the CNN model. Common CNN models are AlexNet model, VGGNet model, google Inception Net model and ResNet model. The specific model architecture is prior art and will not be described in detail herein.
In this step S101, the global depth feature of the image to be identified may be obtained using an AlexNet model, VGGNet model, google Inception Net model, or ResNet model, which are commonly used in the prior art.
In addition, experiments show that the global depth feature obtained after downsampling the image to be identified is close to the global depth feature obtained by inputting the image to be identified into the first deep learning model directly, without downsampling. Therefore, to reduce the amount of computation, the image to be identified may first be downsampled and then input into the first deep learning model. That is, step S101 may include: downsampling the image to be identified, and inputting the downsampled image into the first deep learning model to obtain the global depth feature of the image to be identified output by the first deep learning model.
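As an illustration only (not part of the patent), step S101 might be sketched as follows in PyTorch, assuming a torchvision ResNet-18 backbone and a 224×224 downsampled input; any of the CNN models mentioned above could be substituted.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Assumed backbone for the "first deep learning model"; the patent only requires
# some pretrained CNN (AlexNet, VGGNet, Google Inception Net, ResNet, ...).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # keep the 512-d pooled feature, drop the classifier
backbone.eval()

def global_depth_feature(image: torch.Tensor, size: int = 224) -> torch.Tensor:
    """image: (3, H, W) tensor in [0, 1]; returns a 512-d global depth feature."""
    # Downsample first to reduce the amount of computation, as suggested in step S101.
    x = F.interpolate(image.unsqueeze(0), size=(size, size),
                      mode="bilinear", align_corners=False)
    with torch.no_grad():
        return backbone(x).squeeze(0)
```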
In step S102, position indication information is determined based on the image to be identified, the position indication information being used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
in step S102, it is necessary to estimate the possible position of the target object if the image to be identified includes the target object. It should be understood by those skilled in the art that, whether or not the image to be recognized actually includes the target object, the step S102 needs to give the position indication information.
According to the habit of a user to acquire an image to be identified by using a terminal device, a target object of interest is usually located in the middle area of the image to be identified, and therefore, the position information of the middle area of the image to be identified can be used as the position indication information.
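A minimal sketch of this centre-area heuristic (the half-width, half-height box is an assumed choice; the patent only says "middle area"):

```python
def center_region(width: int, height: int, ratio: float = 0.5):
    """Return (x1, y1, x2, y2) of a centred box covering `ratio` of each image side."""
    w, h = int(width * ratio), int(height * ratio)
    x1, y1 = (width - w) // 2, (height - h) // 2
    return x1, y1, x1 + w, y1 + h
```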
In addition, in the embodiment of the present application, the above-mentioned location indication information may be obtained by training a neural network model in advance (that is, the neural network model is used to estimate a location where a target object may exist in an image input to the neural network model), and a general process of training the neural network model is discussed below with reference to fig. 2.
Fig. 2 shows a schematic diagram of a training process of the neural network model X, which can be used to determine the possible positions of flowers in an image of a plant scene through the training process shown in fig. 2.
As shown in fig. 2, N sample images including flowers and having a plant scene can be obtained in advance, where each sample image corresponds to a label, each sample image is input into a neural network model X, and parameters of the neural network model X are continuously adjusted according to an output result of the neural network model X and the labels corresponding to each sample image respectively until the neural network model X can accurately identify the positions of the flowers in each sample image.
Through the training process shown in fig. 2, the trained neural network model X can identify the possible positions of flowers in an image of a plant scene. However, it should be understood by those skilled in the art that the neural network model X will still give position indication information when the image input into the trained model is a plant scene image without flowers, or even when the input image is not a plant scene image at all.
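For illustration, a training loop of the kind shown in fig. 2 might look like the sketch below; the smooth-L1 loss, the Adam optimiser and the assumption that model X directly regresses box coordinates are all choices made for the example, not details from the patent.

```python
import torch

def train_model_x(model_x, loader, epochs: int = 10, lr: float = 1e-3):
    """loader yields (images, boxes): sample images and their labelled flower boxes."""
    opt = torch.optim.Adam(model_x.parameters(), lr=lr)
    loss_fn = torch.nn.SmoothL1Loss()
    for _ in range(epochs):
        for images, boxes in loader:
            opt.zero_grad()
            pred = model_x(images)        # model X predicts where flowers may be
            loss = loss_fn(pred, boxes)   # compare with the labels of the sample images
            loss.backward()
            opt.step()                    # continuously adjust the parameters
    return model_x
```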
In addition, the possible position of the target object in the image to be identified may be determined by cascading a suggested network (P-Net) with an improved network (R-Net); for example, after training, the cascaded P-Net and R-Net can determine the possible position of a flower in the input image. Specifically, the position indication information may be determined by the method shown in fig. 3. That is, step S102 may include the following steps:
Step S1021: inputting the image to be identified into a trained suggested network P-Net, the P-Net outputting a candidate window for indicating the position indication information;
Step S1022: correcting the candidate window output by the P-Net based on a bounding-box regression algorithm (Bounding box regression) and a non-maximum suppression algorithm (NMS);
Step S1023: inputting the image to be identified and the corrected candidate window into a trained improved network R-Net to obtain a candidate window, output by the R-Net, that is to be corrected again;
Step S1024: correcting the candidate window output by the R-Net again based on the Bounding box regression and NMS algorithms to obtain a final candidate window for indicating the position indication information.
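The cascade of steps S1021–S1024 might be sketched as follows, assuming the networks return candidate boxes, regression offsets and scores, and using torchvision's `nms`; the helper names and interfaces are illustrative only.

```python
import torch
from torchvision.ops import nms

def refine(boxes, offsets, scores, iou_thr: float = 0.5):
    """Bounding box regression followed by non-maximum suppression (NMS)."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    corrected = boxes + offsets * torch.stack([w, h, w, h], dim=1)  # shift each window
    keep = nms(corrected, scores, iou_thr)                          # drop overlapping windows
    return corrected[keep], scores[keep]

def locate(image, p_net, r_net):
    boxes, offsets, scores = p_net(image)             # S1021: P-Net proposes candidate windows
    boxes, scores = refine(boxes, offsets, scores)    # S1022: first correction
    boxes, offsets, scores = r_net(image, boxes)      # S1023: R-Net re-evaluates the windows
    final_boxes, _ = refine(boxes, offsets, scores)   # S1024: final candidate window(s)
    return final_boxes
```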
Fig. 4 and 5 of the present embodiments discuss a specific P-Net and R-Net network architecture.
As shown in fig. 4, a specific P-Net network architecture is used. The input is a 3-channel 12×12 image. First, 10 convolution kernels of 3×3 followed by 2×2 max pooling (stride = 2) generate 10 feature maps of 5×5. Next, 16 convolution kernels of 3×3×10 generate 16 feature maps of 3×3. Then, 32 convolution kernels of 3×3×16 generate 32 feature maps of 1×1. From these 32 1×1 feature maps, 2 feature maps of 1×1 are generated for classification by 2 convolution kernels of 1×1×32; through 7 convolution kernels of 1×1×32, 9 feature maps of 1×1 are generated for regression-box judgement.
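A PyTorch module consistent with this description might look as follows; the 4-output regression head is an assumed, commonly used size and not a figure from the patent.

```python
import torch.nn as nn

class PNet(nn.Module):
    """Suggested network P-Net: fully convolutional, 3-channel 12x12 input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 10, 3), nn.PReLU(),        # 10 feature maps of 10x10
            nn.MaxPool2d(2, 2, ceil_mode=True),     # 10 feature maps of 5x5
            nn.Conv2d(10, 16, 3), nn.PReLU(),       # 16 feature maps of 3x3
            nn.Conv2d(16, 32, 3), nn.PReLU(),       # 32 feature maps of 1x1
        )
        self.cls = nn.Conv2d(32, 2, 1)   # 2 maps of 1x1 for classification
        self.reg = nn.Conv2d(32, 4, 1)   # regression-box head (assumed 4 outputs)

    def forward(self, x):
        x = self.features(x)
        return self.cls(x), self.reg(x)
```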
As shown in fig. 5, a specific R-Net network architecture is used. The input is a 3-channel 24×24 image. First, 28 convolution kernels of 3×3 followed by 3×3 max pooling (stride = 2) generate 28 feature maps of 11×11. Next, 48 convolution kernels of 3×3×28 followed by 3×3 max pooling (stride = 2) generate 48 feature maps of 4×4. Then, 64 convolution kernels of 2×2×48 generate 64 feature maps of 3×3. Finally, the 3×3×64 feature maps are flattened into a 128-dimensional fully connected layer, which is followed by a fully connected layer for the regression-box classification problem and a fully connected layer for the bounding-box position regression problem.
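A corresponding R-Net sketch is given below; as with P-Net, the 2-way classification and 4-output regression heads are assumptions made for the example.

```python
import torch.nn as nn

class RNet(nn.Module):
    """Improved network R-Net: 3-channel 24x24 input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 28, 3), nn.PReLU(),
            nn.MaxPool2d(3, 2, ceil_mode=True),     # 28 feature maps of 11x11
            nn.Conv2d(28, 48, 3), nn.PReLU(),
            nn.MaxPool2d(3, 2, ceil_mode=True),     # 48 feature maps of 4x4
            nn.Conv2d(48, 64, 2), nn.PReLU(),       # 64 feature maps of 3x3
        )
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(64 * 3 * 3, 128), nn.PReLU())
        self.cls = nn.Linear(128, 2)   # regression-box classification
        self.reg = nn.Linear(128, 4)   # bounding-box position regression

    def forward(self, x):
        x = self.fc(self.features(x))
        return self.cls(x), self.reg(x)
```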
In step S103, determining depth features of the image area indicated by the position indication information in the image to be identified based on a second deep learning model, so as to obtain local depth features of the image to be identified;
the specific implementation process of the step S103 is substantially the same as that of the step S101, except that the image according to the step S101 is the whole image to be identified, the image according to the step S103 is a partial image area in the image to be identified, that is, the image area indicated by the position indication information may be input into the second deep learning model, so as to obtain the depth feature output by the second deep learning model.
As in step S101, in order to reduce the amount of computation, the image area indicated by the position indication information may be downsampled first, and the depth feature of the downsampled image area may then be obtained as the local depth feature of the image to be identified.
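Continuing the earlier sketch, step S103 might crop the indicated region and reuse the same backbone; the reuse of `global_depth_feature` from the previous example is an assumption made for illustration.

```python
def local_depth_feature(image, box, size: int = 224):
    """image: (3, H, W) tensor; box: (x1, y1, x2, y2) from the position indication."""
    x1, y1, x2, y2 = (int(v) for v in box)
    region = image[:, y1:y2, x1:x2]          # image area indicated by the candidate window
    # Downsample the region and extract its depth feature, as in step S101.
    return global_depth_feature(region, size)
```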
In addition, in order to reduce the storage space occupied on the terminal device, the second deep learning model may be the same model as the first deep learning model; as those skilled in the art will readily understand, using the same model also further shortens the development cycle of the terminal device.
In step S104, based on the global depth feature and the local depth feature, it is determined whether the class of the image to be identified is a target class, where the target class is the class of images that contain the target object and whose scene is a preset scene;
In this embodiment of the present application, step S104 may be performed using a recognition model (for example, a support vector machine (SVM) classifier). That is, the global depth feature and the local depth feature are input into the classifier, and the class of the image to be recognized is determined based on the classifier (for example, the classifier may output which of the preset classes the image to be recognized belongs to), so as to determine whether the image to be recognized is of the target class.
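A minimal sketch of step S104 with a scikit-learn SVM as the recognition model; the RBF kernel and the training interface are assumptions, since the patent only requires some trained recognition model.

```python
import numpy as np
from sklearn.svm import SVC

def train_recognizer(global_feats, local_feats, labels):
    """Each argument lists one entry per training image; labels are category ids."""
    X = np.hstack([np.asarray(global_feats), np.asarray(local_feats)])
    return SVC(kernel="rbf").fit(X, labels)

def is_target_category(clf, global_feat, local_feat, target_label) -> bool:
    fused = np.concatenate([global_feat, local_feat]).reshape(1, -1)
    return clf.predict(fused)[0] == target_label   # which preset class the image belongs to
```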
With this method, the different categories of visually similar images can be accurately identified; for example, the images to be identified may all be images of potted-plant scenes, some of which contain flowers and some of which do not.
It should be noted that the first embodiment of the present application uses a potted-plant scene as an example, but those skilled in the art will understand that the application scenario of the image recognition method is not limited to potted-plant recognition; the method may be applied to any scenario in which the images to be recognized are relatively similar. Specifically, the image recognition method provided in the first embodiment determines whether the category of the image to be recognized is the target category based on both the global depth feature and the depth feature of the area where the target object may be located, instead of relying solely on the global depth feature. The global depth feature therefore does not need to represent the detail information of the image to be identified; and because the image areas indicated by the position indication information usually differ markedly between images of the target category and images not of the target category, the depth feature of the indicated image area does not need to represent much detail either. As a result, a large amount of training data and a long training time are not required to train the first and second deep learning models, and the image recognition method provided by the application can speed up the development cycle of the terminal device to a certain extent.
Example two
Referring to fig. 6, another image recognition method provided in the second embodiment of the present application is described below, and the method includes:
in step S201, an image to be identified is acquired, and global depth features of the image to be identified are determined based on a first deep learning model;
In step S202, position indication information is determined based on the image to be identified, the position indication information being used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
in step S203, determining depth features of the image area indicated by the position indication information in the image to be identified based on a second deep learning model, so as to obtain local depth features of the image to be identified;
the specific implementation manner of the steps S201 to S203 is identical to that of the steps S101 to S103 in the first embodiment, and may be specifically referred to the description of the first embodiment, and will not be repeated here.
In step S204, an artificial feature of the image to be identified is determined, and whether the category of the image to be identified is a target category is determined based on the artificial feature, the global depth feature and the local depth feature, wherein the target category is the category of images that contain the target object and whose scene is a preset scene;
unlike the first embodiment, this second embodiment further relies on the artificial features of the image to be identified to determine the category of the image to be identified. The artificial features may be color histogram features, texture descriptor features, spatial envelope features, scale invariant feature transforms, and/or directional gradient histogram features, etc.
Several artificial features of the present solution are described in detail below:
1) Color histogram features: color histogram features can be applied to image retrieval and scene classification; they are simple, efficient and easy to compute, and their main advantage is that they are invariant to translation and to rotation about the viewing axis. However, color histogram features are sensitive to small illumination variations and to quantization errors.
2) Texture descriptor features: common texture descriptor features include gray level co-occurrence matrix, gabor features, local binary pattern features, etc., which are very effective in identifying texture scene images, especially those with repetitive arrangement characteristics.
3) Spatial envelope features: the spatial envelope features provide a global description of the spatial structure, representing the dominant dimensions and directions of the scene. Specifically, in the standard spatial envelope feature, the image is first convolved with a set of steerable pyramid filters and then divided into a 4×4 grid, and an orientation histogram is extracted for each grid cell. Because of their simplicity and efficiency, spatial envelope features are widely used for scene representation.
4) Scale invariant feature transform: the scale-invariant feature transform describes sub-regions by the gradient information around detected keypoints. The standard scale-invariant feature transform, also known as the sparse scale-invariant feature transform, is a combination of keypoint detection and histogram-based gradient representation. It generally has four steps: scale-space extremum search, sub-pixel keypoint refinement, dominant orientation assignment and feature description. In addition to the sparse scale-invariant feature transform, dense variants exist, such as Speeded-Up Robust Features (SURF). The scale-invariant feature transform is highly distinctive and invariant to scale, rotation and illumination variations.
5) Directional gradient histogram features: the directional gradient histogram feature represents an object by calculating the distribution of gradient intensities and directions in a spatially distributed sub-region, which has been accepted as one of the best features to capture the edge or local shape information of the object.
Which artificial feature to select can be determined according to the application scenario of the image recognition; each of the above artificial features improves the recognition rate in its suitable scenario. Generally speaking, the depth features obtained with a deep learning model already reflect the texture of the image to some extent, so, for better recognition of the image category, the artificial feature in step S204 may be chosen to be a feature other than a texture descriptor feature, such as a color histogram feature.
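As one concrete possibility, a color histogram feature of the kind recommended above could be computed as in the sketch below (the 8-bins-per-channel layout is an assumed choice):

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """image: (H, W, 3) uint8 RGB array; returns a normalised 3*bins-dimensional feature."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    hist = np.concatenate(hists).astype(np.float64)
    return hist / max(hist.sum(), 1.0)   # invariant to translation and in-plane rotation
```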
It should be understood by those skilled in the art that, in the second embodiment of the present application, the step of acquiring the artificial feature is performed in step S204, but the present application is not limited to the specific execution sequence of "acquiring the artificial feature".
In a second embodiment of the present application, the determining whether the category of the image to be identified is the target category based on the artificial feature, the global depth feature, and the local depth feature may include:
splicing the artificial feature, the global depth feature and the local depth feature to obtain a feature vector;
and inputting the feature vector into a trained recognition model to obtain a recognition result which is output by the recognition model and is used for indicating the type of the image to be recognized.
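These two sub-steps might be sketched as follows, reusing the helper names introduced in the earlier examples (all of which are assumptions, not names from the patent):

```python
import numpy as np

def fused_feature_vector(artificial, global_feat, local_feat) -> np.ndarray:
    """Splice the artificial, global depth and local depth features into one vector."""
    return np.concatenate([np.ravel(artificial),
                           np.ravel(global_feat),
                           np.ravel(local_feat)])

# Usage sketch: feed the spliced vector to the trained recognition model.
# vec = fused_feature_vector(color_histogram(img_rgb), global_feat, local_feat)
# category = recognition_model.predict(vec.reshape(1, -1))[0]
```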
Compared with the first embodiment, this embodiment additionally relies on the artificial feature of the image to be identified, so the category of the image to be identified can, to a certain extent, be identified more accurately.
Example III
The third embodiment of the application provides an image recognition device. For convenience of explanation, only a portion relevant to the present application is shown, and as shown in fig. 7, the image recognition apparatus 300 includes:
the global feature module 301 is configured to obtain an image to be identified, and determine global depth features of the image to be identified based on a first deep learning model;
a position determining module 302, configured to determine, based on the image to be identified, position indication information, where the position indication information is used to indicate: if the image to be identified contains a target object, the position of the target object in the image to be identified;
a local feature module 303, configured to determine depth features of an image area indicated by the position indication information in the image to be identified based on a second deep learning model, so as to obtain local depth features of the image to be identified;
the identifying module 304 is configured to determine whether the class of the image to be identified is a target class based on the global depth feature and the local depth feature, where the target class is a class including the target object, and the scene is an image class under a preset scene.
Optionally, the location determining module 302 includes:
the P-Net unit is used for inputting the image to be identified into a trained suggested network P-Net, and the P-Net outputs a candidate window for indicating the position indication information;
a correction unit, configured to correct the candidate window output by the P-Net based on a boundary window regression algorithm Bounding box regression and a non-maximum suppression algorithm NMS;
the R-Net unit is used for inputting the image to be identified and the candidate window corrected by the Bounding box regression and NMS algorithms into a trained improved network R-Net to obtain a candidate window, output by the R-Net, that is to be corrected again;
and the re-correction unit is used for re-correcting the candidate window output by the R-Net based on the Bounding box regression and NMS algorithm to obtain a final candidate window for indicating the position indication information.
Optionally, the global feature module 301 is specifically configured to:
and downsampling the image to be identified, and inputting the downsampled image to the first deep learning model to obtain the global depth characteristics of the image to be identified, which are output by the first deep learning model.
Optionally, the image recognition apparatus 300 further includes:
the artificial feature module is used for determining the artificial feature of the image to be identified;
accordingly, the identification module 304 is specifically configured to:
and determining whether the category of the image to be identified is a target category based on the artificial feature, the global depth feature and the local depth feature.
Optionally, the identification module 304 includes:
the splicing unit is used for splicing the artificial feature, the global depth feature and the local depth feature to obtain a feature vector;
the recognition unit is used for inputting the feature vector into the trained recognition model to obtain a recognition result, output by the recognition model, that indicates the category of the image to be recognized.
Optionally, the above artificial feature module is specifically configured to:
and determining the color histogram characteristics of the image to be identified.
It should be noted that, because the content of the information interaction and the execution process between the devices/units is based on the same concept as the first embodiment and the second embodiment of the method, specific functions and technical effects thereof may be referred to in the corresponding method embodiment section, and will not be described herein.
Example IV
Fig. 8 is a schematic diagram of a terminal device provided in a fourth embodiment of the present application. As shown in fig. 8, the terminal device 400 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented when the processor 401 executes the computer program 403 described above. Alternatively, the processor 401 may implement the functions of the modules/units in the above-described embodiments of the apparatus when executing the computer program 403.
Illustratively, the computer program 403 may be divided into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to complete the present application. The one or more modules/units may be a series of instruction segments of a computer program capable of performing a specific function, which instruction segments are used to describe the execution of the computer program 403 in the terminal device 400. For example, the computer program 403 may be divided into a global feature module, a location determination module, a local feature module, and an identification module, where each module specifically functions as follows:
acquiring an image to be identified, and determining global depth characteristics of the image to be identified based on a first deep learning model;
determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
determining depth characteristics of an image area indicated by the position indication information in the image to be identified based on a second deep learning model so as to obtain local depth characteristics of the image to be identified;
and determining whether the category of the image to be identified is a target category based on the global depth feature and the local depth feature, wherein the target category is the category of images that contain the target object and whose scene is a preset scene.
The terminal device may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device 400 and is not intended to limit it; the terminal device 400 may include more or fewer components than shown, combine certain components, or have different components. For example, the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 401 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the terminal device 400, for example, a hard disk or a memory of the terminal device 400. The memory 402 may be an external storage device of the terminal device 400, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided in the terminal device 400. Further, the memory 402 may also include both an internal storage unit and an external storage device of the terminal device 400. The memory 402 is used for storing the computer program and other programs and data required for the terminal device. The memory 402 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units described above is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of each of the above-described method embodiments, or may be implemented by a computer program to instruct related hardware, where the above-described computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the above-described method embodiments. The computer program comprises computer program code, and the computer program code can be in a source code form, an object code form, an executable file or some intermediate form and the like. The computer readable medium may include: any entity or device capable of carrying the computer program code described above, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium described above can be appropriately increased or decreased according to the requirements of the jurisdiction's legislation and the patent practice, for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals according to the legislation and the patent practice.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. An image recognition method, comprising:
acquiring an image to be identified, and determining global depth characteristics of the image to be identified based on a first deep learning model;
determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
determining depth features of the image area indicated by the position indication information in the image to be identified based on a second deep learning model so as to obtain local depth features of the image to be identified;
determining whether the category of the image to be identified is a target category based on the global depth feature and the local depth feature, wherein the target category is the category of an image that contains the target object and whose scene is a preset scene;
the determining the position indication information based on the image to be identified comprises the following steps:
inputting the image to be identified into a trained suggested network P-Net, and outputting a candidate window for indicating the position indication information by the P-Net;
correcting the candidate window of the P-Net output based on a boundary window regression algorithm Bounding box regression and a non-maximum suppression algorithm NMS;
inputting the image to be identified and the candidate window corrected by the boundary window-based regression algorithm Bounding box regression and the non-maximum suppression algorithm NMS algorithm into a trained improved network R-Net to obtain a candidate window for re-correction of the R-Net output;
and correcting the candidate window output by the R-Net again based on the boundary window-based regression algorithm Bounding box regression and a non-maximum suppression algorithm NMS algorithm to obtain a final candidate window for indicating the position indication information.
2. The image recognition method of claim 1, wherein the determining global depth features of the image to be recognized based on a first deep learning model comprises:
and downsampling the image to be identified, and inputting the downsampled image to the first deep learning model to obtain the global depth characteristics of the image to be identified, which are output by the first deep learning model.
3. The image recognition method according to any one of claims 1 to 2, characterized in that the image recognition method further comprises:
determining the artificial characteristics of the image to be identified;
accordingly, the determining, based on the global depth feature and the local depth feature, whether the category of the image to be identified is a target category includes:
and determining whether the category of the image to be identified is a target category based on the artificial feature, the global depth feature and the local depth feature.
4. The image recognition method of claim 3, wherein the determining whether the class of the image to be recognized is a target class based on the artificial feature, the global depth feature, and the local depth feature comprises:
splicing the artificial feature, the global depth feature and the local depth feature to obtain a feature vector;
and inputting the feature vector into a trained recognition model to obtain a recognition result which is output by the recognition model and is used for indicating the type of the image to be recognized.
5. The image recognition method of claim 3, wherein the determining the artificial feature of the image to be recognized comprises:
and determining the color histogram characteristics of the image to be identified.
6. An image recognition apparatus, comprising:
the global feature module is used for acquiring an image to be identified and determining global depth features of the image to be identified based on a first deep learning model;
the position determining module is used for determining position indication information based on the image to be identified, wherein the position indication information is used for indicating: if the image to be identified contains a target object, the position of the target object in the image to be identified;
the local feature module is used for determining depth features of the image area indicated by the position indication information in the image to be identified based on a second deep learning model so as to obtain local depth features of the image to be identified;
the identification module is used for determining whether the category of the image to be identified is a target category based on the global depth feature and the local depth feature, wherein the target category is the category of an image that contains the target object and whose scene is a preset scene;
the location determination module includes:
the P-Net unit is used for inputting the image to be identified into a trained suggested network P-Net, and the P-Net outputs a candidate window for indicating the position indication information;
a correction unit, configured to correct the candidate window output by the P-Net based on a boundary window regression algorithm Bounding box regression and a non-maximum suppression algorithm NMS;
the R-Net unit is used for inputting the image to be identified and the candidate window corrected by the boundary window regression algorithm Bounding box regression and the non-maximum suppression algorithm NMS algorithm into the trained improved network R-Net to obtain a candidate window for re-correction of the R-Net output;
and the re-correction unit is used for re-correcting the candidate window output by the R-Net based on the boundary window regression algorithm Bounding box regression and the non-maximum suppression algorithm NMS algorithm to obtain a final candidate window for indicating the position indication information.
7. Terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image recognition method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the image recognition method according to any one of claims 1 to 5.
CN201911219591.0A 2019-12-03 2019-12-03 Image recognition method, recognition device, terminal device and readable storage medium Active CN110991533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219591.0A CN110991533B (en) 2019-12-03 2019-12-03 Image recognition method, recognition device, terminal device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911219591.0A CN110991533B (en) 2019-12-03 2019-12-03 Image recognition method, recognition device, terminal device and readable storage medium

Publications (2)

Publication Number Publication Date
CN110991533A CN110991533A (en) 2020-04-10
CN110991533B true CN110991533B (en) 2023-08-04

Family

ID=70089698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219591.0A Active CN110991533B (en) 2019-12-03 2019-12-03 Image recognition method, recognition device, terminal device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110991533B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814538B (en) * 2020-05-25 2024-03-05 北京达佳互联信息技术有限公司 Method and device for identifying category of target object, electronic equipment and storage medium
CN111783889B (en) * 2020-07-03 2022-03-01 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable medium
CN111666957B (en) * 2020-07-17 2023-04-25 湖南华威金安企业管理有限公司 Image authenticity identification method and device
CN112001152A (en) * 2020-08-25 2020-11-27 杭州大拿科技股份有限公司 Object recognition processing method, processing device, electronic device and storage medium
CN112241713B (en) * 2020-10-22 2023-12-29 江苏美克医学技术有限公司 Method and device for identifying vaginal microorganisms based on pattern recognition and deep learning
CN112541543B (en) * 2020-12-11 2023-11-24 深圳市优必选科技股份有限公司 Image recognition method, device, terminal equipment and storage medium
CN113362314B (en) * 2021-06-18 2022-10-18 北京百度网讯科技有限公司 Medical image recognition method, recognition model training method and device
CN113420696A (en) * 2021-07-01 2021-09-21 四川邮电职业技术学院 Odor generation control method and system and computer readable storage medium
CN114595352A (en) * 2022-02-25 2022-06-07 北京爱奇艺科技有限公司 Image identification method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229444A (en) * 2018-02-09 2018-06-29 天津师范大学 A kind of pedestrian's recognition methods again based on whole and local depth characteristic fusion
CN108288067A (en) * 2017-09-12 2018-07-17 腾讯科技(深圳)有限公司 Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN109492529A (en) * 2018-10-08 2019-03-19 中国矿业大学 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN110096933A (en) * 2018-01-30 2019-08-06 华为技术有限公司 The method, apparatus and system of target detection
CN110399822A (en) * 2019-07-17 2019-11-01 思百达物联网科技(北京)有限公司 Action identification method of raising one's hand, device and storage medium based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170000748A (en) * 2015-06-24 2017-01-03 삼성전자주식회사 Method and apparatus for face recognition

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288067A (en) * 2017-09-12 2018-07-17 腾讯科技(深圳)有限公司 Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN110096933A (en) * 2018-01-30 2019-08-06 华为技术有限公司 The method, apparatus and system of target detection
CN108229444A (en) * 2018-02-09 2018-06-29 天津师范大学 A kind of pedestrian's recognition methods again based on whole and local depth characteristic fusion
CN109492529A (en) * 2018-10-08 2019-03-19 中国矿业大学 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN110399822A (en) * 2019-07-17 2019-11-01 思百达物联网科技(北京)有限公司 Action identification method of raising one's hand, device and storage medium based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PNET: a pixel-level TV-logo recognition network (PNET:像素级台标识别网络); 徐佳宇; 张冬明; 靳国庆; 包秀国; 袁庆升; 张勇东; Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), No. 10, pp. 97-108 *
Automatic identification of butterfly specimen images at the family level based on deep learning (基于深度学习的蝴蝶科级标本图像自动识别); 周爱明; 马鹏鹏; 席天宇; 王江宁; 冯晋; 邵泽中; 陶玉磊; 姚青; Acta Entomologica Sinica (昆虫学报), No. 11, pp. 107-116 *

Also Published As

Publication number Publication date
CN110991533A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110991533B (en) Image recognition method, recognition device, terminal device and readable storage medium
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN111310775A (en) Data training method and device, terminal equipment and computer readable storage medium
CN111080660A (en) Image segmentation method and device, terminal equipment and storage medium
CN112183212B (en) Weed identification method, device, terminal equipment and readable storage medium
CN110046622B (en) Targeted attack sample generation method, device, equipment and storage medium
CN109271842B (en) General object detection method, system, terminal and storage medium based on key point regression
CN113393487B (en) Moving object detection method, moving object detection device, electronic equipment and medium
US11094049B2 (en) Computing device and non-transitory storage medium implementing target object identification method
CN113128536A (en) Unsupervised learning method, system, computer device and readable storage medium
CN110738204A (en) Method and device for positioning certificate areas
CN111373393B (en) Image retrieval method and device and image library generation method and device
CN108960246B (en) Binarization processing device and method for image recognition
CN115223042A (en) Target identification method and device based on YOLOv5 network model
CN114359048A (en) Image data enhancement method and device, terminal equipment and storage medium
CN111199228B (en) License plate positioning method and device
CN114240935B (en) Space-frequency domain feature fusion medical image feature identification method and device
CN113034449B (en) Target detection model training method and device and communication equipment
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
US10832076B2 (en) Method and image processing entity for applying a convolutional neural network to an image
CN107704819B (en) Action identification method and system and terminal equipment
CN114004976A (en) LBP (local binary pattern) feature-based target identification method and system
US20240144633A1 (en) Image recognition method, electronic device and storage medium
CN112288748A (en) Semantic segmentation network training and image semantic segmentation method and device
CN116664990B (en) Camouflage target detection method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant