Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In various fields, for an action to be performed by a user, such as a transaction, an investment, or a rating, the user uploads an image including a target object, where the target object is generally a certificate, and the identity, authority, capability, and the like of the user are identified from it.
At present, for an image uploaded by a user, a certificate in the image is recognized to obtain content such as the text in the certificate. However, the quality of the image uploaded by the user often fails to meet the recognition requirements; for example, the image has low definition, low brightness, overexposure, or an incomplete certificate. If an image that does not meet the recognition requirements is recognized directly, the accuracy of the recognition result cannot be ensured.
In view of the above problems, in the embodiments of the present application, an image feature of the image to be recognized is obtained, whether the image meets the recognition condition in terms of imaging quality or integrity of the target object is determined based on the image feature, and only after it is determined that the image meets the recognition condition is the text in the image recognized to obtain the text information.
As an example and not by way of limitation, it may additionally be confirmed whether the image to be recognized is a copy, contains an invalid object (e.g., an incorrect document or no document at all), or has an incorrect orientation; the text in the image is recognized only when the state of the image is confirmed to be normal and the image satisfies the recognition condition.
The technical solution of the embodiments of the application can be applied to various electronic devices to accurately recognize the image to be recognized. The electronic device may be a terminal device, such as a mobile phone, a tablet computer (Pad), a computer, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a terminal device in industrial control, a terminal device in self driving, a terminal device in remote medical treatment, a terminal device in a smart city, a terminal device in a smart home, and the like. The terminal device in the embodiments of the application can also be a wearable device. A wearable device, also called a wearable smart device, is a general term for devices designed and developed with wearable technology so that they can be worn in daily life, such as glasses, gloves, watches, clothing, and shoes; it is a portable device that is worn directly on the body or integrated into the clothing or accessories of the user. The terminal device may be fixed or mobile.
For example, the electronic device in the embodiments of the present application may also be a server. When the electronic device is a server, it may receive an image acquired by a terminal device and perform image processing on the image, so as to accurately recognize the target object.
Fig. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. As shown in fig. 1, the electronic device 100 includes an image acquisition unit 101, an image determination unit 102, and an image recognition unit 103, where the image determination unit 102 is connected with both the image acquisition unit 101 and the image recognition unit 103.
The image acquisition unit 101 is configured to acquire the image to be recognized, which includes a target object such as a certificate. For example, it may receive an image acquired by an image acquisition device, an image transmitted by another device, or an image input by the user, which is not limited in the embodiments of the present application.
The image determination unit 102 receives the image to be recognized sent by the image acquisition unit 101, confirms the imaging quality of the image and/or the integrity of the target object in the image, and sends the image to the image recognition unit 103 when the image meets the recognition condition, that is, when the imaging quality of the image meets the quality condition and/or the integrity of the target object meets the integrity condition. Conversely, if the imaging quality of the image does not satisfy the recognition condition, or the integrity of the target object does not satisfy the recognition condition, or neither does, the image is not recognized.
In some embodiments, the electronic device 100 further comprises an information sending unit 104 connected with the image determination unit 102. When the image determination unit 102 determines that the image does not satisfy the recognition condition, it sends an instruction to the information sending unit 104, and the information sending unit 104 generates indication information according to the instruction. The indication information indicates that the image does not meet the requirement, optionally further indicates the specific item that is not met, and prompts the user to provide a new image to be recognized.
After receiving the image to be recognized, the image recognition unit 103 recognizes the target object in the image to obtain information of the target object. Generally, the obtained information of the target object is text information; in some embodiments, image information of the target object may also be obtained, for example, when the target object is an identification card, the personal photograph in it is obtained.
It should be understood that the electronic device 100 further includes a storage unit (not shown in the figure) for storing the information of the recognized target objects. Illustratively, the information of each target object is stored in a structured [key, value] form; for example, the field "Name: Li" on an identification card is stored as [name, Li].
The present application is specifically illustrated by the following examples.
Fig. 2 is a flowchart illustrating an image processing method 200 according to an embodiment of the present application.
In order to accurately recognize a target object in an image, the embodiments of the present application first confirm the imaging quality and the integrity of the image before recognizing it, and recognize the target object in the image only when it is confirmed, based on the imaging quality and the integrity, that the image meets the recognition condition.
As shown in fig. 2, the image processing method provided in the embodiment of the present application includes:
S201: at least one image feature of an image to be recognized is acquired.
It should be understood that the image to be recognized contains a target object, and the target object may be a certificate object, such as an identification card, a license, or a certificate, or another document object.
The image features are used to characterize the imaging quality of the image or the integrity of the target object in the image.
Illustratively, the image features include at least one first feature for characterizing an imaging quality of the image, and/or a second feature for characterizing a completeness of the target object. Correspondingly, acquiring at least one image feature of the image to be recognized comprises: at least one first feature of the image is acquired, and/or a second feature of the image is acquired. It should be understood that only the at least one first feature of the image or only the second feature of the image may be acquired, or that at least one first feature and second feature of the image may be acquired separately.
Optionally, the first feature may be any one of a variance feature, a mean feature, a first pixel number feature, or a second pixel number feature. The first pixel number is the number of adjacent pixels with pixel values larger than a first preset pixel value, the second pixel number is the number of adjacent pixels with pixel values smaller than a second preset pixel value, and optionally, the first preset pixel value is larger than the second preset pixel value.
S202: based on the at least one image feature, it is determined whether the image satisfies the recognition condition.
In this step, whether the image satisfies the recognition condition is determined in combination with the acquired at least one image feature. For example, whether the definition of the image meets the requirement is determined based on the variance feature, and when the definition meets the requirement, it is determined that the image meets the recognition condition. For another example, whether the definition of the image meets the requirement is determined based on the variance feature and whether the brightness of the image meets the requirement is determined based on the mean feature; when both the definition and the brightness meet the requirements, it is determined that the image meets the recognition condition.
S203: when the image meets the recognition condition, the target object in the image is recognized to obtain the information of the target object.
An image that meets the recognition condition can be recognized accurately; that is, recognizing only images that meet the recognition condition yields more accurate recognition results. Further, the information of the target object is obtained by recognizing the target object in the image. Generally, the information of the target object includes text information, such as the name, address, and date of birth on an identification card; in some embodiments, it also includes image information, such as the personal photo on an identification card.
Illustratively, the resulting information of the target object is stored for subsequent querying or use.
In the embodiments of the present application, whether the image meets the recognition condition in terms of imaging quality and/or integrity of the target object is confirmed based on at least one image feature of the acquired image to be recognized, and the target object in the image is recognized only when the image meets the recognition condition, so as to obtain an accurate recognition result.
Fig. 3 is a flowchart illustrating an image processing method 300 according to an embodiment of the present application.
In order to accurately confirm whether an image meets the recognition condition, an embodiment of the present application proposes the implementation shown in fig. 3 for determining whether an image meets the recognition condition, including:
S301: for each image feature in the at least one image feature, an evaluation result of whether the image feature meets the corresponding preset condition is obtained based on the image feature and the corresponding threshold.
For example, if the image feature is the variance of the image, when the variance is greater than a definition threshold, it is determined that the image feature satisfies the corresponding preset condition. It should be understood that the variance of the image is calculated based on the pixel value of each pixel point in the image. The larger the variance, the wider the frequency response range of the image, which indicates that the image is focused accurately, i.e., the definition of the image is high; the smaller the variance, the narrower the frequency response range, which indicates that the image has few edges, i.e., the definition of the image is low. The definition threshold is a preset variance value that meets the definition requirement.
If the image feature is the mean of the image, when the mean is greater than a brightness threshold, it is determined that the image feature satisfies the corresponding preset condition. It should be understood that the mean of the image is calculated based on the pixel value of each pixel point in the image: the larger the mean, the higher the brightness of the image, and the smaller the mean, the lower the brightness. The brightness threshold is the mean value corresponding to the minimum acceptable brightness.
If the image feature is the first pixel number, when the first pixel number is smaller than a first number threshold, it is determined that the image feature satisfies the corresponding preset condition, where the first pixel number is the number of adjacent pixels whose pixel values are larger than the first preset pixel value. First, the number of adjacent pixels in the image larger than the first preset pixel value, that is, the first pixel number, is determined; when the first pixel number is smaller than the first number threshold, it indicates that there is no bright spot, also referred to as a light spot, in the image.
If the image feature is the second pixel number, when the second pixel number is smaller than a second number threshold, it is determined that the image feature satisfies the corresponding preset condition, where the second pixel number is the number of adjacent pixels whose pixel values are smaller than the second preset pixel value. First, the number of adjacent pixels in the image smaller than the second preset pixel value, that is, the second pixel number, is determined; when the second pixel number is smaller than the second number threshold, it indicates that there is no shadow, also referred to as a dark region, in the image.
It should be noted that the first preset pixel value is greater than the second preset pixel value; the first quantity threshold and the second quantity threshold may be the same or different, and the application does not limit this.
If the image feature is the intersection-over-union ratio, when the ratio is larger than an intersection-over-union threshold, it is determined that the image feature satisfies the corresponding preset condition. The intersection-over-union ratio is the ratio of the intersection to the union of the foreground image and a preset image, where the foreground image and the background image are obtained after image segmentation of the image, the foreground image contains the target object, and the background image does not; generally, the foreground image contains only the target object. It should be understood that when the ratio is 1, the target object in the image is complete; when the ratio is less than 1, the target object has a missing edge, a missing corner, or an occlusion, and the smaller the ratio, the more serious the defect. The intersection-over-union threshold distinguishes, according to the acceptable degree of incompleteness of the target object, whether the image feature meets the preset condition.
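By way of illustration only, the per-feature evaluation of S301 could be sketched in Python as follows. The function name, feature keys, and all threshold values are assumptions introduced for this sketch; the embodiments do not prescribe specific values.

```python
# Illustrative thresholds; the actual values are implementation choices
# and are not specified by the embodiments.
SHARPNESS_THRESHOLD = 100.0   # minimum variance for acceptable definition
BRIGHTNESS_THRESHOLD = 40.0   # minimum mean for acceptable brightness
FIRST_COUNT_THRESHOLD = 500   # maximum adjacent bright pixels (no light spot)
SECOND_COUNT_THRESHOLD = 500  # maximum adjacent dark pixels (no shadow)
IOU_THRESHOLD = 0.9           # minimum acceptable intersection-over-union

def evaluate_features(features: dict) -> dict:
    """Return a pass/fail evaluation result for each acquired image feature."""
    results = {}
    if "variance" in features:
        results["variance"] = features["variance"] > SHARPNESS_THRESHOLD
    if "mean" in features:
        results["mean"] = features["mean"] > BRIGHTNESS_THRESHOLD
    if "first_pixel_count" in features:
        results["first_pixel_count"] = features["first_pixel_count"] < FIRST_COUNT_THRESHOLD
    if "second_pixel_count" in features:
        results["second_pixel_count"] = features["second_pixel_count"] < SECOND_COUNT_THRESHOLD
    if "iou" in features:
        results["iou"] = features["iou"] > IOU_THRESHOLD
    return results
```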
S302: based on the evaluation result of each image feature, it is determined whether the image satisfies the recognition condition.
In an actual application scenario, whether the image meets the recognition condition may be determined by applying a weighted operation to the evaluation results of the image features; or by determining that the image meets the recognition condition when more than half of the evaluation results indicate that the corresponding image features meet their preset conditions, and that it does not otherwise; or by determining that the image meets the recognition condition only when every evaluation result indicates that the corresponding image feature satisfies its preset condition, and that it does not when any evaluation result indicates otherwise.
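The three combination strategies just described can be sketched as follows, continuing the previous sketch; the strategy names and equal weighting are illustrative assumptions.

```python
def image_satisfies_condition(results: dict, strategy: str = "all") -> bool:
    """Combine the per-feature evaluation results into a single decision.

    'all' requires every feature to pass, 'majority' requires more than
    half to pass, and 'weighted' compares a weighted score to a cutoff.
    """
    values = list(results.values())
    if strategy == "all":
        return all(values)
    if strategy == "majority":
        return sum(values) > len(values) / 2
    if strategy == "weighted":
        # Equal weights purely for illustration; real weights are an
        # implementation choice.
        weights = {k: 1.0 / len(results) for k in results}
        score = sum(weights[k] * float(v) for k, v in results.items())
        return score >= 0.5
    raise ValueError(f"unknown strategy: {strategy}")
```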
Fig. 4 is a flowchart illustrating an image processing method 400 according to an embodiment of the present application.
On the basis of any of the above embodiments, the present application will now describe how to acquire at least one first feature of an image with reference to fig. 4.
As shown in fig. 4, the method includes:
S401: the image is converted into a grayscale image.
Generally, the image to be recognized is a color image, such as an RGB image; in this step, the color image is converted into a grayscale image through color space conversion. Optionally, the pixel value of each pixel point in the grayscale image is between 0 and 255.
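A minimal sketch of this conversion in Python, assuming the image was loaded with OpenCV (which stores color images in BGR order):

```python
import cv2

def to_grayscale(image_bgr):
    """S401: convert the color image to a single-channel grayscale image
    whose pixel values lie in [0, 255]."""
    return cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
```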
S402: based on the grayscale image, at least one first feature of the image is determined.
At least one first feature of the image, such as the variance of the image, the mean of the image, the first pixel number, or the second pixel number, is obtained based on the pixel value of each pixel point in the grayscale image.
With reference to fig. 5, a possible implementation is provided for determining at least one first feature of an image based on the grayscale image.
S501: and converting the gray level image into a Laplace image through a Laplace algorithm.
It should be appreciated that the Laplacian is a differential operator; applying it enhances regions of abrupt gray-level change in the grayscale image and suppresses regions where the gray level changes slowly.
In this step, the grayscale image is converted into a Laplacian image by a Laplacian algorithm, and the operation can be performed with an arbitrary Laplacian operator.
Illustratively, the grayscale image is convolved with a preset Laplacian mask to obtain the Laplacian image.
The Laplacian mask is a preset convolution template; preferably, the Laplacian mask may be set to a 3-by-3 mask as shown in table 1.
TABLE 1
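The contents of Table 1 are not reproduced in this text; the standard 4-neighbor Laplacian kernel used below is therefore an assumption standing in for it. The sketch applies the mask by convolution as described in S501:

```python
import cv2
import numpy as np

# A common 3x3 Laplacian mask; the specific values of Table 1 are not
# reproduced here, so this 4-neighbor kernel is an assumption.
LAPLACIAN_MASK = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=np.float32)

def laplacian_image(gray):
    """S501: convolve the grayscale image with the Laplacian mask."""
    return cv2.filter2D(gray.astype(np.float32), ddepth=-1,
                        kernel=LAPLACIAN_MASK)
```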
S502: based on the Laplacian image, at least one first feature of the image is determined.
Illustratively, at least one of the variance, the mean, the first pixel number, and the second pixel number of the Laplacian image is calculated based on the pixel value of each pixel point in the Laplacian image.
The first pixel number is the number of adjacent pixels whose pixel values are larger than the first preset pixel value, the second pixel number is the number of adjacent pixels whose pixel values are smaller than the second preset pixel value, and the first preset pixel value is larger than the second preset pixel value.
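One plausible way to compute these first features, following S502 and counting "adjacent" pixels via connected components; the two preset pixel values here are illustrative assumptions:

```python
import cv2
import numpy as np

def first_features(lap, high_value=200, low_value=20):
    """S502: compute candidate first features from the Laplacian image.
    high_value / low_value stand in for the first and second preset
    pixel values; the real values are implementation choices."""
    variance = float(np.var(lap))
    mean = float(np.mean(lap))

    # Count pixels that form connected ("adjacent") regions whose values
    # exceed the first preset pixel value (candidate light spots).
    bright = (lap > high_value).astype(np.uint8)
    _, _, stats, _ = cv2.connectedComponentsWithStats(bright)
    first_pixel_count = int(stats[1:, cv2.CC_STAT_AREA].sum())

    # Count pixels in connected regions below the second preset pixel
    # value (candidate shadows).
    dark = (lap < low_value).astype(np.uint8)
    _, _, stats, _ = cv2.connectedComponentsWithStats(dark)
    second_pixel_count = int(stats[1:, cv2.CC_STAT_AREA].sum())

    return {"variance": variance, "mean": mean,
            "first_pixel_count": first_pixel_count,
            "second_pixel_count": second_pixel_count}
```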
Fig. 6 is a flowchart illustrating an image processing method 600 according to an embodiment of the present application.
On the basis of any of the above embodiments, the embodiments of the present application will describe how to acquire the second feature of the image with reference to fig. 6.
As shown in fig. 6, the method includes:
S601: the image is segmented by a segmentation algorithm to obtain a foreground image and a background image.
In this step, the image is segmented by a segmentation algorithm, for example, a GrabCut segmentation algorithm, to obtain a foreground image containing the target object and a background image not containing the target object.
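A minimal GrabCut sketch for this step; the rectangle initializer `rect` (e.g., taken from the preview-frame outline) is an assumed input, and the iteration count is illustrative:

```python
import cv2
import numpy as np

def segment_foreground(image_bgr, rect):
    """S601: separate foreground (target object) from background with
    GrabCut. `rect` is a rough (x, y, w, h) box around the object."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                iterCount=5, mode=cv2.GC_INIT_WITH_RECT)
    # Pixels marked certain or probable foreground form the foreground mask.
    fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return fg_mask.astype(np.uint8)
```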
S602: based on the foreground image and the preset image, an Intersection-over-Union (IoU) of the foreground image and the preset image is calculated.
Note that the preset image is an image having the same aspect ratio as the target object. As an example, when the user captures an image of the target object, the preset image or its outline is displayed in the viewfinder or preview frame, so that the user can capture an image containing the target object aligned with the preset image. As another example, after the image to be recognized is acquired, the target object in the image is calibrated against the preset image; for example, the center point of the target object is aligned with the center point of the preset image and the target object is scaled to the size of the preset image.
In this step, the intersection of the foreground image and the preset image, i.e., the area of the quadrilateral ABCD in fig. 7a, is divided by their union, i.e., the area of the irregular polygon EFBGHD in fig. 7a, to obtain the intersection-over-union ratio, which represents the ratio of the intersection to the union of the foreground image and the preset image. For example, referring to fig. 7b, if the quadrilateral A'B'C'D' is an occluded area, that area belongs to the background image, and the intersection-over-union ratio is the ratio of the area of the quadrilateral ABCD minus the quadrilateral A'B'C'D' to the area of the irregular polygon EFBGHD.
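Expressed over binary masks, the computation is straightforward; `preset_mask` (the preset image region rendered as a mask of the same shape) is an assumed input:

```python
import numpy as np

def intersection_over_union(fg_mask, preset_mask):
    """S602: IoU between the segmented foreground and the preset image
    region. Occluded areas are absent from fg_mask, which lowers the
    IoU exactly as described for fig. 7b."""
    intersection = np.logical_and(fg_mask, preset_mask).sum()
    union = np.logical_or(fg_mask, preset_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0
```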
It is to be understood that the second feature includes the intersection-over-union ratio.
Further, when the intersection-over-union ratio is larger than the intersection-over-union threshold, it is determined that the image feature meets the corresponding preset condition.
In this embodiment, whether the target object in the image is complete is determined by calculating the intersection-over-union ratio, so that images in which the target object has a missing edge, a missing corner, or an occlusion are screened out before it is determined that the image meets the recognition condition.
Fig. 8 is a flowchart illustrating an image processing method 800 according to an embodiment of the present application.
On the basis of any of the above embodiments, with reference to fig. 8, the image processing method further includes:
S801: the image to be recognized is input into an image classification model to obtain a classification result of the image.
The image classification model is trained based on a first network model, for example, an Inception-series network model; preferably, Inception v3 may be used as the backbone network.
The classification result represents whether the state of the target object is a normal state or an abnormal state. Optionally, the abnormal state includes at least one of a copy or an invalid object; taking an identity card as the target object as an example, the invalid-object case includes the target object being a temporary identity card, the image containing no identity card, or the target object not being an identity card. Optionally, the abnormal state further covers whether the correct side of the target object is shown; for example, when the front side of the identity card is required, an image showing the back side is in an abnormal state, while an image showing the front side is in a normal state.
For example, the image to be recognized may be input into the image classification model directly, or it may be preprocessed first and the processed image input into the model; for example, the image to be recognized may be converted into a grayscale image.
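A plausible sketch of such a classifier, using Inception v3 as the backbone as suggested above; the class set (normal / copy / invalid object / wrong side), input size, and all hyperparameters are assumptions, not specified by the embodiments:

```python
import tensorflow as tf

def build_classifier(num_classes=4):
    """Inception v3 backbone with a small classification head; an
    illustrative stand-in for the image classification model."""
    backbone = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(299, 299, 3))
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(
        backbone.output)
    return tf.keras.Model(backbone.input, outputs)
```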
S802: when the classification result indicates that the state of the target object is the normal state and the image meets the recognition condition, the target object in the image is recognized to obtain the information of the target object.
It should be noted that this embodiment does not limit the execution order of determining, based on the classification result, whether the state of the image is normal and determining whether the image satisfies the recognition condition; that is, the former may be executed before, after, or simultaneously with the latter.
In this embodiment, the target object in the image is additionally classified, and the target object is recognized only when it is determined to be in the normal state, which prevents an invalid target object from being recognized by mistake and the acquired information from being wrong.
Fig. 9 is a flowchart illustrating an image processing method 900 according to an embodiment of the present application.
On the basis of any of the above embodiments, with reference to fig. 9, the following implementation is provided for recognizing the target object in the image to obtain the information of the target object when the image satisfies the recognition condition:
S901: at least one text line image of the target object is acquired.
In this step, the text lines of the target object in the image are detected to obtain at least one text line image.
For example, edge detection may be performed on each text line of the target object by an arbitrary image segmentation algorithm, and the binary mask of each text line may be extracted by morphological operations (also referred to as opening and closing operations) combined with connected component analysis, so as to obtain the text line images.
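A minimal sketch of this text-line extraction under the morphology-plus-connected-components approach just described; the kernel size and area filter are illustrative assumptions:

```python
import cv2

def extract_text_line_boxes(object_gray):
    """S901: binarize the (rectified) object image, close gaps inside
    each text line with morphology, and read one bounding box per
    connected component."""
    _, binary = cv2.threshold(object_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # A wide closing kernel merges characters on the same line.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 3))
    mask = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > 100:  # discard small noise components
            boxes.append((x, y, w, h))
    return boxes
```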
S902: the at least one text line image is input into a text recognition model to obtain the text information of the target object.
The text recognition model is trained based on a second network model.
Optionally, the second network model is a network model based on a convolutional neural network (CNN) and Connectionist Temporal Classification (CTC). Compared with a traditional network model containing a Recurrent Neural Network (RNN), it can recognize the content of a text line accurately while improving recognition speed. CTC overcomes the problem that the length of the output sequence is inconsistent with that of the input sequence: a new sequence is formed by inserting blank symbols, and the blanks are removed by a fixed rule during decoding. Optionally, the backbone network of the second network model may employ a DenseNet network.
It should be understood that, for each text line image, the text recognition model in this step outputs the text information of the target object as structured information, that is, the text information is output in the form of [key, value].
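The blank-removal rule mentioned above can be sketched as greedy CTC decoding; the function name and blank index are assumptions for this sketch:

```python
def ctc_greedy_decode(frame_label_ids, blank_id=0):
    """Collapse consecutive repeated labels, then drop the blank symbol
    that CTC inserts to align output length with input length.
    `frame_label_ids` is the per-frame argmax of the CNN's output."""
    decoded = []
    previous = None
    for label in frame_label_ids:
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded

# e.g. ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7, 0]) -> [5, 5, 7]
```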
On the basis of the embodiment shown in fig. 9, as an example, the image to be recognized needs to be preprocessed before acquiring at least one text line image of the target object.
Illustratively, during image capture of the target object there is a certain relative angle between the image acquisition device, such as the lens of a camera, and the target object, so the target object appears distorted to some degree. As shown in fig. 10, the target object in fig. 10-(a) is a quadrilateral defined by the four vertices 1, 2, 3, and 4, and is a trapezoid; it therefore needs to be converted into the regular quadrilateral defined by the four vertices 1, 2, 3, and 4 in fig. 10-(b).
For example, the image of the target object may be extracted from the image to be recognized by an arbitrary edge detection algorithm: edge detection is performed on the target object through an image segmentation algorithm, the binary boundary (also referred to as a binary mask) of the target object is extracted by morphological operations combined with connected domain analysis, the maximum bounding rectangle is taken based on that boundary, and the area ratio between the region of the target object and the whole image to be recognized is used to eliminate false detections. Further, the four vertices of the quadrilateral are obtained through connected domain analysis, the target object is mapped into the regular quadrilateral by a perspective transformation based on the position coordinates of the four vertices, and recognition is performed on the resulting image containing the target object to obtain the information of the target object.
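The perspective correction just described can be sketched as follows; the corner ordering convention and the output size (an ID-card-like aspect ratio) are assumptions:

```python
import cv2
import numpy as np

def rectify_object(image, corners, out_w=856, out_h=540):
    """Map the four detected vertices of the (trapezoidal) object to a
    regular rectangle. `corners` must be ordered top-left, top-right,
    bottom-right, bottom-left."""
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]],
                   dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (out_w, out_h))
```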
Fig. 11 is a flowchart illustrating an image processing method 1100 according to an embodiment of the present application.
On the basis of any of the above embodiments, this embodiment provides a possible implementation manner, which specifically includes:
First, image acquisition is performed. In one application scenario, when a user captures the target object with a handheld image acquisition device, the electronic device provides an image preview frame through a display device. In some embodiments, a preset image outline having the same aspect ratio as the target object is displayed in the preview frame, and the user can align the target object with the outline, place it within the outline, and capture it.
Next, imaging quality evaluation is performed on the image to be recognized containing the target object; for example, whether the definition and the brightness of the image meet the preset conditions and whether bright spots or shadows exist are determined. If the image passes the imaging quality evaluation, integrity detection continues; if not, image acquisition is performed again.
Integrity detection is then performed on the image whose imaging quality is qualified, to determine whether the target object has a missing edge, a missing corner, or an occlusion. If the integrity detection result is qualified, the next step of risk type evaluation is performed; if not, image acquisition is performed again.
After that, risk type evaluation is performed on the image to be recognized through the pre-trained image classification model to obtain the classification result. When the classification result indicates that the state of the target object is the normal state, target object detection proceeds; when it indicates an abnormal state, image acquisition is performed again.
When the image to be recognized passes all of the above evaluations and detections, it can be recognized accurately. This embodiment then detects the target object in the image, for example, obtaining the image of the target object through an image segmentation algorithm, detects the text lines in it, for example, obtaining at least one text line image through an image segmentation algorithm, and finally inputs the at least one text line image into the pre-trained text recognition model, which outputs structured text information.
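Wiring the earlier sketches together gives an end-to-end view of this pipeline. `classifier` and `recognizer` stand in for the pre-trained models and are assumed callables, as are the other inputs; this is a sketch of the flow in fig. 11, not a definitive implementation:

```python
def process_image(image_bgr, rect, preset_mask, classifier, recognizer):
    """End-to-end sketch of the pipeline in fig. 11, built from the
    helper sketches above."""
    gray = to_grayscale(image_bgr)
    lap = laplacian_image(gray)

    # Imaging quality evaluation (definition, brightness, spots, shadows).
    features = first_features(lap)
    # Integrity detection via segmentation and intersection-over-union.
    fg_mask = segment_foreground(image_bgr, rect)
    features["iou"] = intersection_over_union(fg_mask, preset_mask)
    if not image_satisfies_condition(evaluate_features(features)):
        return None  # prompt the user to re-acquire the image

    # Risk type evaluation: only a normal-state object is recognized.
    # The string label is an assumption about the classifier's output.
    if classifier(image_bgr) != "normal":
        return None

    # Text line detection and recognition yield structured [key, value]
    # pairs. For brevity, lines are detected on the full grayscale image;
    # the embodiments first segment and rectify the object.
    boxes = extract_text_line_boxes(gray)
    return [recognizer(gray[y:y + h, x:x + w]) for (x, y, w, h) in boxes]
```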
Fig. 12 is a schematic structural diagram of an electronic device 1200 according to an embodiment of the present application, and as shown in fig. 12, the electronic device 1200 includes:
an acquisition unit 1210, configured to acquire at least one image feature of an image to be recognized, where the image includes a target object, and the image feature is used to represent the imaging quality of the image or the integrity of the target object;
a processing unit 1220, configured to determine whether the image satisfies the recognition condition based on the at least one image feature;
the processing unit 1220 is further configured to recognize the target object in the image when the image satisfies the recognition condition, to obtain the information of the target object.
The electronic device 1200 provided by this embodiment includes the acquisition unit 1210 and the processing unit 1220; it determines whether the image satisfies the recognition condition in terms of imaging quality and/or integrity of the target object based on at least one image feature of the acquired image to be recognized, and recognizes the target object in the image when the image satisfies the recognition condition, so as to obtain an accurate recognition result.
In one possible design, the acquisition unit 1210 is specifically configured to:
acquiring at least one first feature of the image, where the first feature is used to represent the imaging quality of the image;
and/or,
acquiring a second feature of the image, where the second feature is used to represent the integrity of the target object.
In one possible design, the processing unit 1220 is specifically configured to:
for each image feature in at least one image feature, obtaining an evaluation result of whether the image feature meets a corresponding preset condition based on the image feature and a corresponding threshold;
based on the evaluation result of each image feature, it is determined whether the image satisfies the recognition condition.
In one possible design, the processing unit 1220 is specifically configured to:
and when the evaluation result of each image feature indicates that the image feature meets the corresponding preset condition, determining that the image meets the recognition condition.
In one possible design, the obtaining unit 1210 is specifically configured to:
converting the image into a grayscale image;
based on the grayscale image, at least one first feature of the image is determined.
In one possible design, the obtaining unit 1210 is specifically configured to:
converting the grayscale image into a Laplacian image through a Laplacian algorithm;
based on the Laplacian image, at least one first feature of the image is determined.
In one possible design, the obtaining unit 1210 is specifically configured to:
and performing convolution operation on the gray image through a preset Laplace mask to obtain a Laplace image.
In one possible design, the obtaining unit 1210 is specifically configured to:
calculating at least one of the variance, the mean, the first pixel number, or the second pixel number of the Laplacian image based on the pixel value of each pixel point in the Laplacian image;
where the first pixel number is the number of adjacent pixels whose pixel values are larger than a first preset pixel value, the second pixel number is the number of adjacent pixels whose pixel values are smaller than a second preset pixel value, and the first preset pixel value is larger than the second preset pixel value.
In one possible design, the processing unit 1220 is specifically configured to:
if the image feature is the variance of the image, determining that the image feature meets the corresponding preset condition when the variance is larger than the definition threshold;
if the image feature is the mean of the image, determining that the image feature meets the corresponding preset condition when the mean is larger than the brightness threshold;
if the image feature is the first pixel number, determining that the image feature meets the corresponding preset condition when the first pixel number is smaller than the first number threshold, where the first pixel number is the number of adjacent pixels whose pixel values are larger than the first preset pixel value;
if the image feature is the second pixel number, determining that the image feature meets the corresponding preset condition when the second pixel number is smaller than the second number threshold, where the second pixel number is the number of adjacent pixels whose pixel values are smaller than the second preset pixel value.
In one possible design, the obtaining unit 1210 is specifically configured to:
carrying out image segmentation on the image through a segmentation algorithm to obtain a foreground image and a background image, where the foreground image contains the target object and the background image does not;
calculating the intersection-over-union ratio of the foreground image and the preset image based on the foreground image and the preset image, where the ratio represents the ratio of the intersection to the union of the foreground image and the preset image.
In one possible design, the processing unit 1220 is specifically configured to:
and when the intersection-over-union ratio is greater than the intersection-over-union threshold, determining that the image feature meets the corresponding preset condition, where the ratio represents the ratio of the intersection to the union of the foreground image and the preset image, and the foreground image is an image containing the target object obtained by image segmentation of the image.
In one possible design, the acquisition unit 1210 is further configured to: input the image to be recognized into an image classification model to obtain a classification result of the image, where the image classification model is trained based on a first network model, the classification result represents whether the state of the target object is a normal state or an abnormal state, and the abnormal state includes at least one of a copy or an invalid object;
the processing unit 1220 is further configured to, when the classification result indicates that the state of the target object is the normal state, perform the step of recognizing the target object in the image to obtain the information of the target object when the image satisfies the recognition condition.
In one possible design, the processing unit 1220 is specifically configured to:
acquiring at least one text line image of a target object;
and inputting the at least one text line image into a text recognition model to obtain the text information of the target object, where the text recognition model is trained based on a second network model.
The electronic device provided in this embodiment can be used to implement the method in any of the above embodiments, and the implementation effect is similar to that of the method embodiment, and is not described here again.
Fig. 13 is a schematic hardware structure diagram of an electronic device 1300 according to an embodiment of the present application. As shown in fig. 13, in general, the electronic device 1300 includes: a processor 1310 and a memory 1320.
The processor 1310 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1310 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1310 may also include a main processor and a coprocessor, where the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1310 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1310 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 1320 may include one or more computer-readable storage media, which may be non-transitory. Memory 1320 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1320 is used to store at least one instruction for execution by the processor 1310 to implement the methods provided by the method embodiments herein.
Optionally, as shown in fig. 13, the electronic device 1300 may further include a transceiver 1330, and the processor 1310 may control the transceiver 1330 to communicate with other devices; specifically, it may transmit information or data to other devices, or receive information or data transmitted by other devices.
The transceiver 1330 may include a transmitter and a receiver, among others. The transceiver 1330 can further include one or more antennas.
Optionally, the electronic device 1300 may implement corresponding processes in the methods of the embodiments of the present application, and for brevity, details are not described here again.
Those skilled in the art will appreciate that the configuration shown in fig. 13 does not constitute a limitation of the electronic device 1300, which may include more or fewer components than those shown, combine certain components, or employ a different arrangement of components.
Embodiments of the present application also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method provided by the above embodiments.
The computer-readable storage medium in this embodiment may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc. that is integrated with one or more available media, and the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., SSDs), etc.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The embodiment of the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method provided by the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.