WO2020000879A1

WO2020000879A1 - Image recognition method and apparatus

Info

Publication number: WO2020000879A1
Application number: PCT/CN2018/116335
Authority: WO
Inventors: 周恺卉; 王长虎
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2018-06-27
Filing date: 2018-11-20
Publication date: 2020-01-02
Also published as: CN109002842A

Abstract

An image recognition method and apparatus. The method comprises: obtaining an image to be recognized (201); inputting said image to a pre-trained screenshot image recognition model to obtain a recognition result for representing whether said image is a screenshot image (202); and deleting said image in response to the recognition result representing that said image is a screenshot image (203). Recognition of an image to be recognized and deletion of a screenshot image are implemented. Due to use of a screenshot image recognition model, the verification and recognition efficiency of an image is improved compared with manual verification.

Description

Image recognition method and device

This patent application claims priority to Chinese patent application No. 201810680031.4, filed on June 27, 2018 by Beijing ByteDance Network Technology Co., Ltd. and entitled "Image Recognition Method and Device", the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present application relate to the field of computer technology, and in particular, to an image recognition method and device.

Background

With the rapid development of the Internet, especially the popularity of the mobile Internet, videos or images of various contents are emerging endlessly. In order to monitor video content or image content, the pictures or videos uploaded by users need to be reviewed.

Summary of the invention

The embodiments of the present application provide an image recognition method and device.

In a first aspect, an embodiment of the present application provides an image recognition method. The method includes: acquiring an image to be identified; inputting the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result used to characterize whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be identified and the recognition result; and in response to the recognition result indicating that the image to be identified is a screenshot image, deleting the image to be identified.

In some embodiments, in response to the recognition result indicating that the image to be identified is not a screenshot image, information for indicating that the image to be identified is not a screenshot image is pushed.

In some embodiments, before acquiring the image to be identified, the method includes: acquiring a target image; and capturing a preset region of the target image as the image to be identified.

In some embodiments, before acquiring the image to be identified, the method includes: acquiring a frame sequence of the target video; and selecting a target frame in the frame sequence of the target video as the image to be identified.

In some embodiments, the screenshot image recognition model is obtained by training in the following steps: obtaining a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image; and training the screenshot image recognition model by using the sample image as input and the label information corresponding to the input sample image as the desired output.

In some embodiments, the method further includes: in response to the recognition result indicating that the image to be recognized is not a screenshot image, performing text recognition on the image to be recognized to obtain a text recognition result; determining whether the text recognition result includes a preset text; and in response to determining that the text recognition result includes the preset text, deleting the image to be recognized.

In a second aspect, an embodiment of the present application provides an image recognition device. The device includes: a to-be-identified image acquisition unit configured to acquire an image to be identified; a recognition unit configured to input the image to be identified into a pre-trained screenshot image recognition model to obtain a recognition result used to characterize whether the image to be identified is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be identified and the recognition result; and a first deletion unit configured to delete the image to be identified in response to the recognition result indicating that the image to be identified is a screenshot image.

In some embodiments, the apparatus further includes: a pushing unit configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.

In some embodiments, the apparatus further includes: a target image acquisition unit configured to acquire a target image; and a capture unit configured to capture a preset region of the target image as the image to be identified.

In some embodiments, the apparatus further includes: a frame sequence acquisition unit configured to acquire a frame sequence of the target video; and a selection unit configured to select a target frame in the frame sequence of the target video as an image to be identified.

In some embodiments, the apparatus further includes: a recognition unit configured to perform, in response to the recognition result indicating that the image to be recognized is not a screenshot image, text recognition on the image to be recognized to obtain a text recognition result; a determination unit configured to determine whether the text recognition result includes a preset text; and a second deletion unit configured to delete the image to be recognized in response to determining that the text recognition result includes the preset text.

In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored. When the program is executed by a processor, the method described in any implementation of the first aspect is implemented.

The image recognition method and device provided in the embodiments of the present application use a screenshot image recognition model to recognize an image to be recognized. If the image to be recognized is a screenshot image, it is deleted. Thus, both the recognition of the image and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than manual review.

Brief description of the drawings

Other features, objects, and advantages of the present application will become more apparent by reading the detailed description of the non-limiting embodiments with reference to the following drawings:

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;

FIG. 2 is a flowchart of an embodiment of an image recognition method according to the present application;

FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to the present application;

FIG. 4 is a flowchart of another embodiment of an image recognition method according to the present application;

FIG. 5 is a schematic structural diagram of an embodiment of an image recognition apparatus according to the present application;

FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a server according to an embodiment of the present application.

Detailed description

The following describes the present application in detail with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described herein are only used to explain the related invention, but not to limit the invention. It should also be noted that, for convenience of description, only the parts related to the related invention are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The application will be described in detail below with reference to the drawings and embodiments.

FIG. 1 illustrates an exemplary system architecture 100 to which an image recognition method or an image recognition apparatus of an embodiment of the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing a communication link between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as photographing applications, picture processing applications, instant messaging tools, email clients, social platform software, and the like.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that support storing and transmitting images, including, but not limited to, smart phones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server that provides various services, such as a background server that processes images stored in the terminal devices 101, 102, and 103. The background server may process the received image (for example, identify whether it is a screenshot image), and perform corresponding processing according to the processing result (for example, the recognition result).

It should be noted that the image recognition method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the image recognition device is generally provided in the server 105.

It should be noted that the server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server. When the server is software, it can be implemented as multiple software or software modules (for example, to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely exemplary. Depending on the implementation needs, there can be any number of terminal devices, networks, and servers.

With continued reference to FIG. 2, a flowchart 200 of an embodiment of an image recognition method according to the present application is shown. The image recognition method includes the following steps:

Step 201: Obtain an image to be identified.

In this embodiment, an execution subject of the image recognition method (for example, the server shown in FIG. 1) may obtain an image to be identified from a terminal through a wired or wireless connection. Alternatively, the image to be identified may be stored locally on the execution subject, in which case the execution subject may obtain it directly from local storage. The image to be identified may be any image that needs to be identified. In practice, the images to be identified can be specified by a technician or filtered according to certain conditions.

In some optional implementations of this embodiment, before acquiring an image to be identified, the method may include: acquiring a frame sequence of the target video; and selecting a target frame in the frame sequence of the target video as the image to be identified.

In these implementations, the target video can be any video. The target video can be specified by a technician, or it can be filtered according to certain conditions. The target frame may be at least one frame in the above-mentioned frame sequence. The target frame can likewise be specified by a technician or filtered according to certain conditions. As an example, the condition may be that one frame is extracted every 2 seconds.
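The example condition of taking one frame every 2 seconds can be sketched as an index computation over the frame sequence. This is only an illustrative sketch: the function name `select_target_frames`, the `fps` parameter, and the rounding rule are assumptions and do not come from the application.

```python
from typing import List


def select_target_frames(num_frames: int, fps: float, interval_s: float = 2.0) -> List[int]:
    """Return the indices of frames sampled every `interval_s` seconds.

    `fps` (frames per second of the target video) is an assumed input;
    the application only states the sampling interval as an example.
    """
    if num_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    # Number of frames between two consecutive target frames (at least 1).
    step = max(1, round(fps * interval_s))
    return list(range(0, num_frames, step))
```

For a 10-second video at 30 frames per second (300 frames), this selects five frames, each of which could then be used as an image to be identified.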

Step 202: Input the image to be identified into a pre-trained screenshot image recognition model, and obtain a recognition result used to characterize whether the image to be identified is a screenshot image.

In this embodiment, the above-mentioned execution subject may input the image to be recognized into a pre-trained screenshot image recognition model, thereby obtaining a recognition result used to characterize whether the image to be recognized is a screenshot image. A screenshot image may be an image that records content displayed on a screen of an electronic device. The recognition result can take many forms. For example, a number can be used to indicate whether the image to be recognized is a screenshot image: "1" may indicate that it is a screenshot image, and "0" that it is not. As another example, the recognition result may be a value between 0 and 1 that indicates the probability that the image to be recognized is a screenshot image. The recognition result can also be text, characters, and so on. The form of the recognition result is not limited here.
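Because the recognition result may be a 0/1 flag, a probability, or text, downstream logic may want to normalize it into a single yes/no decision. The sketch below shows one way to do that. The function name `is_screenshot`, the 0.5 cut-off, and the accepted text labels are all assumptions made for illustration.

```python
from typing import Union


def is_screenshot(result: Union[int, float, str], threshold: float = 0.5) -> bool:
    """Normalize a recognition result into a boolean decision.

    The threshold and the accepted text labels are hypothetical; the
    application does not fix the form of the recognition result.
    """
    if isinstance(result, str):
        # Text form: compare against a small set of assumed positive labels.
        return result.strip().lower() in {"1", "yes", "true", "screenshot"}
    # Numeric form: covers both the 0/1 flag and a probability in [0, 1].
    return float(result) >= threshold
```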

In this embodiment, the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result. As an example, the screenshot image recognition model may be a correspondence table that stores a large number of images (both screenshot images and non-screenshot images) together with the recognition result corresponding to each image. The correspondence table may be generated based on statistics of a large number of images and recognition results. The above-mentioned execution subject can then match the image to be recognized against the images in the correspondence table and determine an image in the table whose matching degree with the image to be recognized is greater than a preset threshold (for example, 95%). The recognition result corresponding to the determined image may then be used as the recognition result of the image to be recognized.
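The correspondence-table variant can be sketched as a lookup that returns the stored result of the best match above the threshold. The application does not define how the matching degree is computed, so the sketch assumes a very simple measure (the fraction of identical pixels between two equally sized, flattened grayscale images). The names `matching_degree` and `lookup` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[int]  # assumed representation: flattened grayscale pixels


def matching_degree(a: Image, b: Image) -> float:
    """Assumed similarity measure: fraction of positions with equal pixel values."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lookup(table: List[Tuple[Image, int]], query: Image,
           threshold: float = 0.95) -> Optional[int]:
    """Return the result stored for the best match above `threshold`, else None."""
    best_score, best_result = threshold, None
    for image, result in table:
        score = matching_degree(image, query)
        if score > best_score:  # strictly greater than the preset threshold
            best_score, best_result = score, result
    return best_result
```

Returning `None` when no entry exceeds the threshold is a design choice of this sketch. The application does not say what happens in that case.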

In this embodiment, the screenshot image recognition model may also be a neural network. A neural network abstracts the neuron network of the human brain from an information-processing perspective, builds a simple model of it, and forms different networks according to different connection patterns. It usually consists of a large number of interconnected nodes (or neurons), each of which represents a specific output function called the activation function. The connection between every two nodes carries a weighting value for the signal passing through it, called a weight (also called a parameter). The output of the network varies with its connection pattern, weight values, and activation functions. A neural network usually includes multiple layers, and each layer includes multiple nodes. Generally, the nodes of the same layer can have the same weight, and the nodes of different layers can have different weights, so the parameters of the different layers of the neural network can also differ.
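The node-and-weight description above can be made concrete with a small forward pass through fully connected layers. This is a generic sketch rather than the network used in the application: the sigmoid activation, the bias terms, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights per node, bias per node)


def sigmoid(x: float) -> float:
    """Assumed activation function; the application does not name one."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: Sequence[float], weights: List[List[float]],
                  biases: List[float]) -> List[float]:
    """Each node outputs activation(weighted sum of its inputs + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
            for node_weights, bias in zip(weights, biases)]


def network_forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed the signal through the layers in order."""
    signal = list(inputs)
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal
```

Changing the weights, the activation function, or the way layers are connected changes the output, which is the dependence the paragraph describes.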

Step 203: In response to the recognition result indicating that the image to be recognized is a screenshot image, the image to be recognized is deleted.

In this embodiment, in response to the recognition result indicating that the image to be recognized is a screenshot image, the execution subject may delete the image to be recognized.

In some optional implementation manners of this embodiment, in response to the recognition result indicating that the image to be identified is not a screenshot image, the above-mentioned execution subject may also push information for indicating that the image to be identified is not a screenshot image.

In some optional implementations of this embodiment, the method further includes: in response to the recognition result indicating that the image to be recognized is not a screenshot image, performing text recognition on the image to be recognized to obtain a text recognition result; determining whether the text recognition result includes a preset text; and in response to determining that the text recognition result includes the preset text, deleting the image to be recognized.

In these implementations, in response to the recognition result indicating that the image to be recognized is not a screenshot image, the above-mentioned execution subject may perform text recognition on the image to be recognized through various methods to obtain a text recognition result. The text recognition result may be information related to the text displayed in the image to be recognized. As an example, OCR (Optical Character Recognition) technology can be used to perform text recognition on the image to be recognized, thereby obtaining the text displayed in it. After that, the execution subject may determine whether the text recognition result (for example, the obtained text) contains a preset text (for example, the name of an operator). If so, the execution subject may delete the image to be recognized.
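The text-based check for images that the model did not classify as screenshots can be sketched as follows. The OCR engine is passed in as a callable so that the sketch does not depend on any particular OCR library. The function name `should_delete_by_text` and the example operator name are hypothetical.

```python
from typing import Callable, Iterable


def should_delete_by_text(image: object,
                          ocr: Callable[[object], str],
                          preset_texts: Iterable[str]) -> bool:
    """Return True if the recognized text contains any preset text.

    `ocr` stands in for whatever OCR technology is used; it is assumed to
    map an image to the text displayed in it.
    """
    recognized = ocr(image)
    return any(text in recognized for text in preset_texts)
```

A caller would delete the image when this returns `True`, for instance when the recognized text contains an operator name that normally appears in a phone status bar.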

With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the image recognition method according to this embodiment. In the application scenario of FIG. 3, the execution subject of the image recognition method is the server 300. The server 300 may first obtain an image 301 to be recognized from a terminal. Then, the image 301 is input to a pre-trained screenshot image recognition model to obtain a recognition result. If the recognition result indicates that the image 301 is a screenshot image, the image 301 is deleted.
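The scenario can be summarized as a short server-side routine covering steps 201 to 203. The sketch below is a simplification under stated assumptions: images live in an in-memory dictionary, the model is any callable returning a probability, and the returned string stands in for the optional pushed information. None of these names come from the application.

```python
from typing import Callable, Dict


def review_image(image_id: str, store: Dict[str, bytes],
                 model: Callable[[bytes], float], threshold: float = 0.5) -> str:
    """Run steps 201-203 for one image held in `store` (a hypothetical image store)."""
    image = store[image_id]            # step 201: obtain the image to be recognized
    probability = model(image)         # step 202: apply the recognition model
    if probability >= threshold:       # step 203: delete screenshot images
        del store[image_id]
        return "deleted"
    return "not a screenshot"          # optional: information that could be pushed
```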

The image recognition method provided by the above embodiments of the present application uses a screenshot image recognition model to recognize an image to be recognized. If the image to be recognized is a screenshot image, it is deleted. Thus, both the recognition of the image and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than manual review.

Further reference is made to FIG. 4, which illustrates a process 400 of still another embodiment of the image recognition method. The process 400 of the image recognition method includes the following steps:

Step 401: Obtain a target image.

In this embodiment, the execution subject of the image recognition method may obtain the target image from the terminal through a wired or wireless connection. The target image can be any image. In practice, the target image can be specified by a technician, or it can be filtered based on preset conditions. Alternatively, the target image may be stored locally on the execution subject, in which case the execution subject may obtain it directly from local storage.

Step 402: Capture a preset area of the target image as the image to be identified.

In this embodiment, the execution subject may capture (crop) a preset area of the target image as the image to be identified. The preset area may be a part or all of the target image. For example, it can be the top fifth of the image. In practice, the above-mentioned execution subject may capture the preset area of the target image in various ways, for example, through a screenshot application or an image processing application.
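Cropping the example preset area (the top fifth of the target image) can be sketched on an image represented as a list of pixel rows. The representation and the function name `crop_top_fraction` are assumptions. A real implementation would more likely use an image processing library.

```python
from typing import List


def crop_top_fraction(image: List[List[int]], fraction: float = 0.2) -> List[List[int]]:
    """Return a copy of the top `fraction` of the rows of `image`.

    `image` is assumed to be a list of pixel rows; at least one row is kept
    for a non-empty image.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows = max(1, round(len(image) * fraction))
    return [row[:] for row in image[:rows]]
```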

Step 403: Acquire an image to be identified.

In this embodiment, the execution subject may obtain the image to be identified that was produced in step 402. Because that image is produced by the execution subject itself in step 402, it can generally be obtained directly from local storage.

Step 404: Input the image to be identified into a pre-trained screenshot image recognition model, and obtain a recognition result used to characterize whether the image to be identified is a screenshot image.

In this embodiment, the above-mentioned screenshot image recognition model may be a model obtained by training an image classification network, such as a Convolutional Neural Network (CNN), on multiple training samples using a machine learning method. A convolutional neural network is a kind of feed-forward neural network whose artificial neurons respond to surrounding units within a part of the coverage area, and it performs very well in image processing. A convolutional neural network may include a convolution layer, a pooling layer, an unpooling layer, and a deconvolution layer. The convolution layer can be used to extract image features. The pooling layer can be used to downsample the input information. The unpooling layer can be used to upsample the input information. The deconvolution layer can be used to deconvolve the input information, processing it with the transpose of the convolution layer's convolution kernel as its own kernel.
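The roles of the convolution and pooling layers can be illustrated on a tiny single-channel image. This is a plain-Python sketch, not the network of the application: it assumes no padding, a stride of 1, and 2x2 max pooling. As in most CNN libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) to extract features."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2x2(feature: Matrix) -> Matrix:
    """Downsample by keeping the maximum of each non-overlapping 2x2 block."""
    return [[max(feature[i][j], feature[i][j + 1],
                 feature[i + 1][j], feature[i + 1][j + 1])
             for j in range(0, len(feature[0]) - 1, 2)]
            for i in range(0, len(feature) - 1, 2)]
```

With a kernel such as `[[-1, 1]]`, the convolution responds where pixel intensity rises from left to right, and pooling then halves the resolution of that feature map while keeping the strongest responses.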

As an example, the above screenshot image recognition model can be trained by the following steps:

The first step is to obtain a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image. In practice, each sample image can be labeled manually as a screenshot image or not, thereby obtaining the label information of each sample image. The label information may take various forms. As an example, it may be a numerical value: "0" indicates that the image is not a screenshot image, and "1" indicates that it is. As another example, the label information may also be text, characters, and so on.
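A training sample set with numeric labels can be represented by a small data structure. The sketch below assumes that sample images are referenced by file path and that manual labels arrive as a path-to-label mapping. The `TrainingSample` type, the field names, and `build_training_set` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainingSample:
    image_path: str     # assumed: where the sample image is stored
    is_screenshot: int  # label information: 1 = screenshot image, 0 = not


def build_training_set(labels: Dict[str, int]) -> List[TrainingSample]:
    """Turn manually assigned labels into a list of training samples."""
    samples = []
    for path, label in sorted(labels.items()):
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for {path}")
        samples.append(TrainingSample(path, label))
    return samples
```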

In the second step, the sample image of the training samples in the training sample set is used as input, and the label information corresponding to the input sample image is used as the desired output, and a screenshot image recognition model is trained.

Specifically, the sample images of the training samples can be input into an initial image classification network. The initial image classification network may be any of various image classification networks, for example a residual network (ResNet) or VGG. VGG is a classification model proposed by the Visual Geometry Group (VGG) of the University of Oxford. In practice, initial values can be set for the parameters of the initial image classification network, for example different small random numbers. Small values ensure that the network does not saturate because of excessively large weights, which would cause training to fail. Different values ensure that the network can learn normally. After that, the recognition result of the input sample image can be obtained. Using the label information corresponding to the input sample image as the expected output of the initial image classification network, a machine learning method is used to train the network. Specifically, a preset loss function can first be used to calculate the difference between the recognition result and the label information. Then, based on the calculated difference, the parameters of the initial image classification network can be adjusted. When a preset training end condition is met, training ends, and the trained network is used as the screenshot image recognition model. The training end condition includes, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations reaches a preset number; and the calculated difference is less than a preset difference threshold.

Various methods can be used here to adjust the parameters of the initial image classification network based on the difference between the obtained recognition result and the label information corresponding to the input training sample. For example, the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm can be used to adjust the parameters of the initial image classification network.
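The training procedure described above (small random initial values, a loss function, parameter updates by stochastic gradient descent, and three end conditions) can be sketched with a deliberately tiny model. A single sigmoid unit on hand-made feature vectors stands in for the image classification network. That substitution, the binary cross-entropy loss, and every name and hyperparameter below are assumptions for illustration. In a multi-layer network, back propagation would compute the corresponding gradients layer by layer.

```python
import math
import random
import time
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Probability that the sample is a screenshot (single sigmoid unit)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[List[float], int]], lr: float = 0.5,
          max_epochs: int = 1000, max_seconds: float = 5.0,
          loss_threshold: float = 0.05, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    # Different small random initial values for the weights.
    weights = [rng.uniform(-0.01, 0.01) for _ in samples[0][0]]
    bias = 0.0
    start = time.monotonic()
    order = list(samples)
    for _ in range(max_epochs):                      # end: preset number of iterations
        rng.shuffle(order)                           # "stochastic" sample order
        total_loss = 0.0
        for features, label in order:
            p = min(max(predict(weights, bias, features), 1e-12), 1 - 1e-12)
            # Binary cross-entropy: difference between result and label.
            total_loss -= label * math.log(p) + (1 - label) * math.log(1 - p)
            grad = p - label                         # gradient w.r.t. the pre-activation
            weights = [w - lr * grad * x for w, x in zip(weights, features)]
            bias -= lr * grad
        if total_loss / len(order) < loss_threshold:  # end: difference below threshold
            break
        if time.monotonic() - start > max_seconds:    # end: preset duration exceeded
            break
    return weights, bias
```

The three `break`/loop conditions map one-to-one onto the end conditions listed in the text. Whichever is met first stops training.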

It should be noted that the execution subject of the training steps and the execution subject of the image recognition method may be the same or different. If they are the same, the execution subject can store the network structure and parameter values of the trained screenshot image recognition model locally after training. If they are different, after the execution subject of the training steps obtains the screenshot image recognition model, it may send the network structure and parameter values of the model to the execution subject of the image recognition method.
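Storing or sending the network structure and parameter values amounts to serializing both. A minimal sketch using JSON is given below. The payload layout and the function names `export_model` and `import_model` are assumptions, and real frameworks ship their own model formats.

```python
import json
from typing import Any, Dict, List, Tuple


def export_model(structure: Dict[str, Any], parameters: Dict[str, List[float]]) -> str:
    """Serialize network structure and parameter values into one JSON payload."""
    return json.dumps({"structure": structure, "parameters": parameters}, sort_keys=True)


def import_model(payload: str) -> Tuple[Dict[str, Any], Dict[str, List[float]]]:
    """Restore structure and parameters on the side that runs recognition."""
    data = json.loads(payload)
    return data["structure"], data["parameters"]
```

The same payload can be written to local storage when training and recognition run on the same execution subject, or transmitted over the network when they do not.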

Step 405: In response to the recognition result indicating that the image to be recognized is a screenshot image, the image to be recognized is deleted.

For the specific processing of step 405 and the technical effects brought by it, reference may be made to step 203 of the embodiment corresponding to FIG. 2, and details are not described herein again.

As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the process 400 of the image recognition method in this embodiment adds an image capture step, which reduces unnecessary interference information in the image and improves the accuracy of image recognition.

Further referring to FIG. 5, as an implementation of the methods shown in the foregoing figures, this application provides an embodiment of an image recognition device. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device may be used in various electronic devices.

As shown in FIG. 5, the image recognition device 500 in this embodiment includes a to-be-recognized image acquisition unit 501, an image recognition unit 502, and a first deletion unit 503. The to-be-recognized image acquisition unit 501 is configured to acquire an image to be recognized. The image recognition unit 502 is configured to input the image to be recognized into a pre-trained screenshot image recognition model to obtain a recognition result used to characterize whether the image to be recognized is a screenshot image, where the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result. The first deletion unit 503 is configured to delete the image to be recognized in response to the recognition result indicating that the image to be recognized is a screenshot image.
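The division of the device into units 501 to 503 can be mirrored by a class with one method per unit. This is a structural sketch only: the class name, the in-memory image store, and the boolean-returning model callable are assumptions, and the application does not prescribe a software layout.

```python
from typing import Callable, Dict


class ImageRecognitionDevice:
    """Hypothetical software counterpart of device 500."""

    def __init__(self, store: Dict[str, bytes], model: Callable[[bytes], bool]) -> None:
        self._store = store
        self._model = model

    def acquire(self, image_id: str) -> bytes:
        """Counterpart of acquisition unit 501."""
        return self._store[image_id]

    def recognize(self, image: bytes) -> bool:
        """Counterpart of recognition unit 502: True means screenshot image."""
        return self._model(image)

    def delete(self, image_id: str) -> None:
        """Counterpart of first deletion unit 503."""
        self._store.pop(image_id, None)

    def process(self, image_id: str) -> bool:
        """Chain the three units; returns whether the image was deleted."""
        is_shot = self.recognize(self.acquire(image_id))
        if is_shot:
            self.delete(image_id)
        return is_shot
```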

In this embodiment, for the specific processing of the to-be-recognized image acquisition unit 501, the image recognition unit 502, and the first deletion unit 503 in the image recognition apparatus 500, and the technical effects they bring, refer to the descriptions of steps 201 to 203 in the embodiment corresponding to FIG. 2, respectively. Details are not repeated here.

In some optional implementations of this embodiment, the apparatus 500 may further include a pushing unit (not shown in the figure). The pushing unit is configured to push, in response to the recognition result indicating that the image to be identified is not a screenshot image, information indicating that the image to be identified is not a screenshot image.

In some optional implementation manners of this embodiment, the apparatus 500 may further include: a target image acquisition unit (not shown in the figure) and a capture unit (not shown in the figure). The target image acquisition unit is configured to acquire a target image. The capturing unit is configured to capture a preset area of the target image as an image to be identified.

In some optional implementation manners of this embodiment, the apparatus 500 further includes: a frame sequence acquisition unit and a selection unit. The frame sequence obtaining unit is configured to obtain a frame sequence of a target video. The selection unit is configured to select a target frame in a frame sequence of the target video as an image to be identified.

In some optional implementations of this embodiment, the screenshot image recognition model is obtained by training in the following steps: obtaining a training sample set, where each training sample includes a sample image and label information used to characterize whether the sample image is a screenshot image; and training the screenshot image recognition model by using the sample images of the training samples in the training sample set as input and the label information corresponding to the input sample images as the desired output.

In some optional implementations of this embodiment, the apparatus 500 may further include a recognition unit (not shown in the figure), a determination unit (not shown in the figure), and a second deletion unit (not shown in the figure). The recognition unit is configured to perform, in response to the recognition result indicating that the image to be recognized is not a screenshot image, text recognition on the image to be recognized to obtain a text recognition result. The determination unit is configured to determine whether the text recognition result includes a preset text. The second deletion unit is configured to delete the image to be recognized in response to determining that the text recognition result includes the preset text.

In this embodiment, the above-mentioned image recognition unit 502 inputs the image to be recognized, obtained by the to-be-recognized image acquisition unit 501, into a pre-trained screenshot image recognition model and recognizes it. If the image to be recognized is a screenshot image, it is deleted by the first deletion unit 503. Thus, both the recognition of the image and the deletion of screenshot images are realized. Because a screenshot image recognition model is used, image review and recognition are more efficient than manual review.

Reference is now made to FIG. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing a server according to an embodiment of the present application. The server shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are performed.

It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for performing the operations of this application may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in a different order from that noted in the drawings. For example, two blocks shown one after the other may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor including a to-be-recognized image acquisition unit, an image recognition unit, and a first deleting unit. In some cases, the names of these units do not constitute a limitation on the units themselves. For example, the to-be-recognized image acquisition unit may also be described as a “unit that acquires an image to be recognized”.

As another aspect, the present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: acquire an image to be recognized; input the image to be recognized into a pre-trained screenshot image recognition model to obtain a recognition result indicating whether the image to be recognized is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result; and, in response to the recognition result indicating that the image to be recognized is a screenshot image, delete the image to be recognized.

The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalents. One example is a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present application.

Claims

1. An image recognition method, comprising:

acquiring an image to be recognized;

inputting the image to be recognized into a pre-trained screenshot image recognition model to obtain a recognition result used to characterize whether the image to be recognized is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result; and

in response to the recognition result indicating that the image to be recognized is a screenshot image, deleting the image to be recognized.
2. The method according to claim 1, further comprising:

in response to the recognition result indicating that the image to be recognized is not a screenshot image, pushing information indicating that the image to be recognized is not a screenshot image.
3. The method according to claim 1, wherein, before the acquiring of an image to be recognized, the method comprises:

obtaining a target image; and

capturing a preset area of the target image as the image to be recognized.
4. The method according to claim 1, wherein, before the acquiring of an image to be recognized, the method comprises:

obtaining a frame sequence of a target video; and

selecting a target frame in the frame sequence of the target video as the image to be recognized.
5. The method according to any one of claims 1-4, wherein the screenshot image recognition model is obtained by training through the following steps:

obtaining a training sample set, where each training sample includes a sample image and annotation information used to characterize whether the sample image is a screenshot image; and

training the screenshot image recognition model by using the sample images of the training samples in the training sample set as input and the annotation information corresponding to each input sample image as the desired output.
6. The method according to any one of claims 1-4, wherein the method further comprises:

in response to the recognition result indicating that the image to be recognized is not a screenshot image, performing text recognition on the image to be recognized to obtain a text recognition result;

determining whether the text recognition result includes a preset text; and

in response to determining that the text recognition result includes the preset text, deleting the image to be recognized.
7. An image recognition apparatus, comprising:

a to-be-recognized image acquisition unit configured to acquire an image to be recognized;

an image recognition unit configured to input the image to be recognized into a pre-trained screenshot image recognition model to obtain a recognition result used to characterize whether the image to be recognized is a screenshot image, wherein the screenshot image recognition model is used to characterize the correspondence between the image to be recognized and the recognition result; and

a first deleting unit configured to delete the image to be recognized in response to the recognition result indicating that the image to be recognized is a screenshot image.
8. The apparatus according to claim 7, wherein the apparatus further comprises:

a pushing unit configured to, in response to the recognition result indicating that the image to be recognized is not a screenshot image, push information indicating that the image to be recognized is not a screenshot image.
9. The apparatus according to claim 7, wherein the apparatus further comprises:

a target image acquisition unit configured to acquire a target image; and

a capturing unit configured to capture a preset area of the target image as the image to be recognized.
10. The apparatus according to claim 7, wherein the apparatus further comprises:

a frame sequence obtaining unit configured to obtain a frame sequence of a target video; and

a selection unit configured to select a target frame in the frame sequence of the target video as the image to be recognized.
11. The apparatus according to any one of claims 7 to 10, wherein the screenshot image recognition model is obtained by training through the following steps:

obtaining a training sample set, where each training sample includes a sample image and annotation information used to characterize whether the sample image is a screenshot image; and

training the screenshot image recognition model by using the sample images of the training samples in the training sample set as input and the annotation information corresponding to each input sample image as the desired output.
12. The apparatus according to any one of claims 7 to 10, wherein the apparatus further comprises:

a recognition unit configured to, in response to the recognition result indicating that the image to be recognized is not a screenshot image, perform text recognition on the image to be recognized to obtain a text recognition result;

a determining unit configured to determine whether the text recognition result includes a preset text; and

a second deleting unit configured to delete the image to be recognized in response to determining that the text recognition result includes the preset text.
13. A server, comprising:

one or more processors; and

a storage device on which one or more programs are stored,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
14. A computer-readable medium having stored thereon a computer program, wherein, when the program is executed by a processor, the method according to any one of claims 1-6 is implemented.