CN114339049A - Video processing method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN114339049A (application CN202111672470.9A)
- Authority
- CN
- China
- Prior art keywords
- target
- processing
- video
- image
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television; H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; H04N21/23: Processing of content or additional data, elementary server operations, server middleware)
- H04N5/265: Mixing (H04N5/00: Details of television systems; H04N5/222: Studio circuitry, devices and equipment; H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects)
Abstract
The present disclosure provides a video processing method, apparatus, computer device and storage medium, wherein the method comprises: acquiring a video to be processed; identifying at least one type of target object in each frame of video image of the video to be processed, wherein each type of target object is related to personal information; performing blurring processing on the identified target object in the video image to obtain a target image; and generating a target video after the blurring processing based on the target images respectively corresponding to each frame of video image.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a video processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of network technology, people can acquire various kinds of information, such as video information and picture information, through a variety of channels and means. This diversification of acquisition channels and modes increases the convenience of obtaining information, but it also increases the risk of information leakage; for example, the leakage of sensitive information such as license plate information and face information in driving videos compromises data security.
How to prevent the leakage of sensitive information in videos and ensure data security, while preserving the convenience of information acquisition, has therefore become an urgent problem.
Disclosure of Invention
The embodiment of the disclosure at least provides a video processing method, a video processing device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a video processing method, including:
acquiring a video to be processed;
identifying at least one type of target object in each frame of video image of the video to be processed, wherein each type of target object is related to personal information;
performing blurring processing on the identified target object in the video image to obtain a target image;
and generating a target video after the blurring processing based on the target images respectively corresponding to the video images of each frame.
Target objects related to personal information are usually the sensitive information in the video images. By identifying the various types of target objects related to personal information in each frame of video image and then performing blurring processing on those target objects, the sensitive information can be removed from every frame, yielding a target video from which the various kinds of sensitive information have been removed, i.e. a desensitized target video, which effectively improves the security of the video data.
In a possible implementation manner, performing blurring processing on the identified target object in the video image to obtain a target image includes:
in response to identifying a plurality of the target objects in the video image, matting an initial sub-image corresponding to each of the target objects from the video image;
blurring each initial sub-image to obtain a target sub-image corresponding to each initial sub-image;
and replacing the corresponding initial sub-image in the video image by each target sub-image to obtain the target image.
In this embodiment, the initial sub-images corresponding to the target objects are extracted and blurred individually; compared with performing blurring processing directly on the whole video image, this improves the overall blurring efficiency.
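A minimal sketch of this per-object flow (mat the initial sub-image, blur it, replace it with the target sub-image), assuming the identified objects come as axis-aligned (x1, y1, x2, y2) pixel boxes and leaving the concrete blurring operation as a pluggable blur_fn; both the box format and the helper names are illustrative assumptions rather than part of the patent text.

```python
import numpy as np

def blur_objects(frame: np.ndarray, boxes, blur_fn) -> np.ndarray:
    """Crop each identified region from the frame, blur it, and paste it back.

    frame   : one decoded video frame as an H x W x C array
    boxes   : iterable of (x1, y1, x2, y2) pixel boxes for the identified target objects
    blur_fn : callable mapping an initial sub-image to its blurred target sub-image
    """
    target_image = frame.copy()
    for x1, y1, x2, y2 in boxes:
        initial_sub = target_image[y1:y2, x1:x2]           # matted "initial sub-image"
        target_image[y1:y2, x1:x2] = blur_fn(initial_sub)  # replaced by the "target sub-image"
    return target_image
```

With the region-wise blurring helper sketched a little further below, a call such as blur_objects(frame, boxes, blur_fn=pixelate) would produce the target image for one frame.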
In a possible implementation, for any of the initial sub-images, the blurring process is performed on the initial sub-image according to the following steps:
dividing the initial sub-image into a plurality of processing regions;
determining a target pixel value corresponding to each processing region based on the pixel value of each pixel point in each processing region;
and replacing the pixel values corresponding to the pixel points in each processing area with the determined target pixel values to obtain the target sub-image corresponding to the initial sub-image.
In this embodiment, because the pixel values of the pixel points in different processing regions differ, the target pixel values obtained for the regions also differ. Replacing each region's pixel values with its own target pixel value therefore preserves pixel value differences, i.e. color variation, between the processing regions in the resulting target image, so the target image after replacement looks more natural.
In a possible implementation manner, the determining a target pixel value corresponding to each processing region based on pixel values corresponding to respective pixel points in each processing region includes:
and determining a pixel value mean value corresponding to each processing area based on the pixel value of each pixel point in each processing area, and taking the pixel value mean value corresponding to each processing area as the target pixel value corresponding to the processing area.
In this embodiment, using the pixel value mean as the target pixel value lets the target value represent the central tendency of the pixel points in the corresponding processing region, so the pixel values of the processing regions after replacement are more balanced.
In a possible implementation manner, the determining a target pixel value corresponding to each processing region based on pixel values corresponding to respective pixel points in each processing region includes:
and determining a pixel value extreme value corresponding to each processing area based on the pixel value of each pixel point in each processing area, and taking the pixel value extreme value corresponding to each processing area as the target pixel value corresponding to the processing area.
In this embodiment, using the pixel value extremum as the target pixel value makes the pixel value differences between the processing regions more pronounced after replacement.
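A small sketch of this region-wise replacement, covering both the mean-value variant and the extremum variant in one helper; the fixed block size and the use of the maximum as the extremum are illustrative assumptions.

```python
import numpy as np

def pixelate(sub_image: np.ndarray, block: int = 8, mode: str = "mean") -> np.ndarray:
    """Divide an H x W x C sub-image into block x block processing regions and replace
    every pixel in a region with that region's target pixel value (mean or extremum)."""
    out = sub_image.copy()
    reduce_fn = np.mean if mode == "mean" else np.max  # mean variant vs. extremum variant
    h, w = sub_image.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            region = sub_image[y:y + block, x:x + block]
            # one target pixel value per channel for this processing region
            target_value = reduce_fn(region.reshape(-1, region.shape[-1]), axis=0)
            out[y:y + block, x:x + block] = target_value.astype(sub_image.dtype)
    return out
```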
In a possible implementation manner, in a case where the video to be processed is a video of a photographed road environment, the object type of the target object includes a face type and a license plate type.
In this embodiment, target objects of the face type and of the license plate type both carry personal information, so blurring these two types of target objects effectively improves data security.
In one possible implementation, the at least one type of target object in each frame of video image of the video to be processed is identified by using a pre-trained target neural network, where the target neural network is a neural network that has been trained with a plurality of sample images and can be used to identify a plurality of types of target objects.
In this embodiment, a target neural network trained with sample images can identify various types of target objects with high accuracy. Processing each frame of video image in the video to be processed with the trained target neural network therefore allows the various types of target objects included in each frame, i.e. the various target objects related to personal information, to be identified accurately.
In one possible embodiment, the target neural network includes a shared network and a plurality of branch networks, each branch network being used for identifying a type of target object;
identifying at least one type of target object in each frame of video image of the video to be processed by using a pre-trained target neural network, wherein the method comprises the following steps:
performing video decoding processing on the video to be processed through the shared network in the target neural network to obtain each frame of video image corresponding to the video to be processed; carrying out continuous multiple times of down-sampling processing and up-sampling processing on the video image aiming at each frame of the video image;
determining, by a plurality of branch networks in the target neural network, at least one type of target object included in the video image based on a result of the sampling process.
According to the embodiment, based on continuous up-sampling processing and down-sampling processing, the feature information in the video image can be fully extracted, the extracted information is processed through a plurality of target branch networks, and various types of target objects in the video image can be accurately obtained.
In a possible implementation, the performing, for each frame of the video image, successive multiple downsampling processing and upsampling processing on the video image includes:
for each frame of video image, carrying out continuous multiple downsampling processing on the video image to respectively obtain image characteristic information corresponding to each downsampling processing; the input information of the next downsampling processing in the downsampling processing of a plurality of times is the image characteristic information obtained by the previous downsampling processing, wherein the input information of the first downsampling processing is the video image;
and performing continuous up-sampling processing on the image characteristic information obtained by the last down-sampling processing for multiple times to respectively obtain initial category information corresponding to each up-sampling processing and initial detection frame information corresponding to the initial category information.
In the embodiment, by performing multiple downsampling and multiple upsampling, the feature information in the video image can be fully extracted, and accurate initial category information and initial detection frame information can be obtained.
In one possible embodiment, determining, by a plurality of branch networks in the target neural network, at least one type of target object included in the video image based on the result after the sampling process includes:
performing continuous feature extraction on initial category information corresponding to each upsampling process for multiple times by using a target branch network matched with the initial category information in the plurality of branch networks to obtain target category information, and performing continuous feature extraction on initial detection frame information corresponding to the upsampling process for multiple times to obtain target detection frame information;
and determining at least one type of target object included in the video image based on the target class information and the target detection frame information corresponding to each obtained upsampling process.
In this embodiment, the target branch network matched with the initial category information is the one suited to processing that initial category information. Performing continuous feature extraction on the initial category information multiple times with this matched target branch network therefore extracts sufficient information and yields accurate target category information. Each target object of each type included in the video image, together with its position information, can then be obtained from the target category information and the target detection frame information corresponding to each upsampling process.
In a possible implementation, the determining at least one type of target object included in the video image based on the target category information and the target detection frame information corresponding to each obtained upsampling process includes:
determining the position information of each target object corresponding to each upsampling process based on the target class information and the target detection frame information corresponding to each upsampling process;
determining whether positions of a plurality of target objects corresponding to a plurality of times of upsampling processing are overlapped or not based on the position information;
in response to the position overlapping, determining confidence degrees corresponding to a plurality of target objects with overlapped positions respectively, and taking the target object with the highest confidence degree as a final target object;
and taking each determined final target object and each target object without position overlapping as target objects in the video image.
Because the same target object can be detected by different upsampling processes, processing the target category information and target detection frame information obtained from each upsampling process inevitably yields several target objects whose positions overlap, and such overlapping detections are, with high probability, the same target object. Moreover, since the information obtained by each upsampling process differs, the confidences of these detections may differ even when they correspond to the same object. Taking the detection with the highest confidence among the overlapping ones as the final target object therefore improves the accuracy of the finally identified target objects.
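This confidence-based resolution of overlapping detections can be sketched as a simple greedy pass (essentially non-maximum suppression). The detection layout, the score field and the IoU-based notion of "position overlap" are illustrative assumptions.

```python
def resolve_overlaps(detections, iou_threshold: float = 0.5):
    """Keep the highest-confidence detection among position-overlapping ones and
    keep every detection that overlaps nothing.

    detections: list of dicts like {"box": (x1, y1, x2, y2), "score": float, "type": str},
    pooled over all upsampling processes.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        # a detection survives only if it does not overlap an already-kept,
        # higher-confidence detection
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```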
In a second aspect, an embodiment of the present disclosure further provides a video processing apparatus, including:
the acquisition module is used for acquiring a video to be processed;
the identification module is used for identifying at least one type of target object in each frame of video image of the video to be processed, and each type of target object is related to personal information;
the processing module is used for performing blurring processing on the identified target object in the video image to obtain a target image;
and the generating module is used for generating a target video after the blurring processing based on the target images respectively corresponding to the video images of each frame.
In a possible implementation, the processing module is configured to, in response to identifying a plurality of the target objects in the video image, mat an initial sub-image corresponding to each of the target objects from the video image;
blurring each initial sub-image to obtain a target sub-image corresponding to each initial sub-image;
and replacing the corresponding initial sub-image in the video image by each target sub-image to obtain the target image.
In a possible implementation, the processing module is configured to, for any of the initial sub-images, perform blurring on the initial sub-image according to the following steps:
dividing the initial sub-image into a plurality of processing regions;
determining a target pixel value corresponding to each processing region based on the pixel value of each pixel point in each processing region;
and replacing the pixel values corresponding to the pixel points in each processing area with the determined target pixel values to obtain the target sub-image corresponding to the initial sub-image.
In a possible implementation manner, the processing module is configured to determine a pixel value mean value corresponding to each processing region based on pixel values of respective pixels in each processing region, and use the pixel value mean value corresponding to each processing region as the target pixel value corresponding to the processing region.
In a possible implementation manner, the processing module is configured to determine a pixel value extremum corresponding to each processing region based on pixel values of respective pixel points in each processing region, and use the pixel value extremum corresponding to each processing region as the target pixel value corresponding to the processing region.
In a possible implementation manner, in a case that the video to be processed is a video in a photographed road environment, the object type of the target object includes a face type and a license plate type.
In a possible implementation manner, the at least one type of target object in each frame of video image of the video to be processed is identified by using a pre-trained target neural network, where the target neural network is a neural network that has been trained with a plurality of sample images and can be used to identify a plurality of types of target objects.
In one possible embodiment, the target neural network includes a shared network and a plurality of branch networks, each branch network being used for identifying a type of target object;
the identification module is configured to identify at least one type of target object in each frame of video image of the video to be processed by using a pre-trained target neural network, and includes:
performing video decoding processing on the video to be processed through the shared network in the target neural network to obtain each frame of video image corresponding to the video to be processed; carrying out continuous multiple times of down-sampling processing and up-sampling processing on the video image aiming at each frame of the video image;
determining, by a plurality of branch networks in the target neural network, at least one type of target object included in the video image based on a result of the sampling process.
In a possible implementation manner, the identifying module is configured to perform downsampling processing on each frame of the video image for a plurality of times in succession, and obtain image feature information corresponding to each downsampling processing; the input information of the next downsampling processing in the downsampling processing of a plurality of times is the image characteristic information obtained by the previous downsampling processing, wherein the input information of the first downsampling processing is the video image;
and is further configured to perform continuous up-sampling processing, multiple times, on the image feature information obtained by the last down-sampling processing, to respectively obtain initial category information corresponding to each up-sampling processing and initial detection frame information corresponding to the initial category information.
In a possible implementation manner, the identification module is configured to perform, by using a target branch network matched with the initial category information in the plurality of branch networks, continuous multiple feature extraction on the initial category information corresponding to each upsampling process to obtain target category information, and perform continuous multiple feature extraction on the initial detection frame information corresponding to the upsampling process to obtain target detection frame information;
and determining at least one type of target object included in the video image based on the target class information and the target detection frame information corresponding to each obtained upsampling process.
In a possible implementation manner, the identification module is configured to determine, based on the target class information and the target detection frame information corresponding to each upsampling process, position information of each target object corresponding to each upsampling process;
determining whether positions of a plurality of target objects corresponding to a plurality of times of upsampling processing are overlapped or not based on the position information;
in response to the position overlapping, determining confidence degrees corresponding to a plurality of target objects with overlapped positions respectively, and taking the target object with the highest confidence degree as a final target object;
and taking each determined final target object and each target object without position overlapping as target objects in the video image.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps in the first aspect, or in any one of the possible implementations of the first aspect, are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or in any one of the possible implementations of the first aspect.
For the description of the effects of the video processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the video processing method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings incorporated in and forming part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, as those skilled in the art will be able to derive additional related drawings from them without inventive effort.
Fig. 1 shows a flow chart of a video processing method provided by an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a shared network provided by an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating processing of a frame of video image in a video to be processed according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a comparison of an initial sub-image and a target sub-image provided by an embodiment of the present disclosure;
fig. 5 shows a schematic diagram of a video processing apparatus provided by an embodiment of the present disclosure;
fig. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Furthermore, the terms "first," "second," and the like in the description, the claims and the drawings of the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be appreciated that data so used may be interchanged under appropriate circumstances, so that the embodiments described herein can be practiced in orders other than those illustrated or described herein. Reference herein to "a plurality" or "a number" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Research shows that, to improve the security of video data, sensitive information in a video needs to be anonymized. In the prior art, this is done with a video tracking algorithm: after the sensitive information in the first frame image of the video is determined, the video tracking algorithm is used to track and locate that sensitive information in subsequent frames, and anonymization is then performed. However, limited by the accuracy of the video tracking algorithm, tracking of the sensitive information may fail or have large errors, so the anonymization effect is poor and the security of the video data cannot be guaranteed.
Based on the research, the present disclosure provides a video processing method, an apparatus, a computer device, and a storage medium, which can remove sensitive information in each frame of video image by identifying various types of target objects related to personal information in each frame of video image and then performing blur processing on the target objects in each frame of video image, thereby obtaining a target video from which various sensitive information is removed, that is, obtaining a desensitized target video, and effectively improving the security of video data.
The above-mentioned drawbacks are the result of the inventor's practical and careful study; therefore, both the discovery of the above problems and the solutions proposed by the present disclosure should be regarded as the inventor's contribution in the process of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, a video processing method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the video processing method provided in the embodiments of the present disclosure is generally a computer device with a certain computing capability; in some possible implementations, the video processing method may be implemented by a processor calling computer-readable instructions stored in a memory.
The following describes a video processing method provided by the embodiment of the present disclosure by taking an execution subject as a computer device as an example.
As shown in fig. 1, a flowchart of a video processing method provided in an embodiment of the present disclosure may include the following steps:
s101: and acquiring a video to be processed.
Here, the video to be processed may be a video, photographed by any photographing apparatus, that includes one or more types of target objects, and each type of target object may include one or more target objects. For example, the video to be processed may be a front road environment video shot by a vehicle data recorder; in this case, personal information such as license plate information and the face information of a driver or passenger may appear in the video to be processed.
Each type of target object is associated with personal information, i.e., sensitive information in the image, such as face information, license plate information, and the like. The target object may be an object related to personal information in the video, for example, the target object may be each face, each license plate number, each phone number, and the like appearing in the video.
S102: identifying at least one type of target object in each frame of video image of a video to be processed; each type of target object is associated with personal information.
In one embodiment, identifying at least one type of target object in each frame of video image of the video to be processed may be identified by using a pre-trained target neural network, and the target neural network may be a neural network trained by using a plurality of sample images and capable of being used for identifying a plurality of types of target objects.
The target neural network is a pre-trained neural network capable of identifying multiple types of target objects in the video. Each type is an object type corresponding to personal information, for example, the object type corresponding to face information is a face type, the object type corresponding to telephone number information is a telephone number type, the object type corresponding to text information is a text type, and the like.
In the process of training the target neural network in advance, large-scale sample images from different scenes can be used. Each sample image may include one or more types of sample objects, the number of sample objects may differ between sample images, and even sample images containing the same types and numbers of sample objects may correspond to different scenes. Training the target neural network with a large number of such sample images makes it possible to obtain a video recognition system that has reliable recognition accuracy and is suitable for various scenes containing various types of target objects.
The type of the target object that can be identified by the trained target neural network may be set according to an actual application scenario, and the disclosure is not particularly limited. In a specific application, only a sample image including any type of sample object to be identified needs to be selected, and the target neural network is trained, namely the target neural network capable of identifying any type of target object is obtained.
In the specific implementation of this step, after the video to be processed is obtained, the video to be processed may be input to a pre-trained target neural network, and video decoding and frame-cutting processing are performed on the video to be processed by using the target neural network, so as to obtain each frame of video image included in the video to be processed. Then, for each frame of video image, the frame of video image may be subjected to recognition processing to determine each target object of each type included in the frame of video image. For example, a video image may be subjected to convolution processing for a plurality of times in succession by using a convolution layer in a target neural network, image feature information of the video image is extracted, feature information belonging to an object type and feature information belonging to each target object are determined according to the image feature information, and then, the type of the target object included in the video image and each type of the target object are determined.
In this way, based on the recognition processing of each frame of video image, each target object of various types included in each frame of video image, that is, each target object of various types related to personal information included in the video to be processed can be accurately determined.
S103: and carrying out fuzzy processing on the identified target object in the video image to obtain a target image.
The target image is an image obtained by desensitizing each target object of various types included in the video image.
In specific implementation, after identifying each target object of each type included in each frame of video image based on the above S102, each target object in each frame of video image may be subjected to blurring processing.
The blurring processing mode may include at least one of the following:
the first way is to cover or replace the target object with a preset filling pattern.
In the second way, the pixel value of each pixel point corresponding to the target object is replaced with a preset pixel value.
In the third way, the pixel values of the pixel points corresponding to some key areas of the target object (such as the facial features in a face, or all or part of the number in a license plate) are replaced with target pixel values.
In the fourth way, a target processing mode corresponding to each identified target object is determined according to the object type of the target object and a preset association between types and blurring processing modes (which may include, for example, the three ways above), and the target object is then blurred using that processing mode (a small sketch of this type-based selection is given after this list).
In the fifth way, the image area corresponding to the target object is cut out of the video image, and an object schematic diagram (such as a contour diagram) with the same proportion and shape is pasted back into that image area.
Further, after the blurring processing is performed on each target object included in each frame of video image in any of the above manners, the target image corresponding to each frame of video image can be obtained.
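As a toy illustration of the fourth way above, where the blurring mode is chosen per object type from a preset association, the sketch below reuses the pixelate helper sketched earlier; the type names and the concrete type-to-mode mapping are assumptions, since the patent leaves them to the preset configuration.

```python
import numpy as np

# Hypothetical preset association between object type and blurring mode.
BLUR_FN_BY_TYPE = {
    "face": lambda sub: pixelate(sub, block=8, mode="mean"),  # region-wise replacement
    "license_plate": lambda sub: np.zeros_like(sub),          # way-2-style solid fill
}

def blur_by_type(frame, detections):
    """detections: iterable of (object_type, (x1, y1, x2, y2)) pairs (assumed format)."""
    target_image = frame.copy()
    for object_type, (x1, y1, x2, y2) in detections:
        blur_fn = BLUR_FN_BY_TYPE.get(object_type, lambda sub: pixelate(sub))
        target_image[y1:y2, x1:x2] = blur_fn(target_image[y1:y2, x1:x2])
    return target_image
```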
S104: and generating a target video after the blurring processing based on the target images respectively corresponding to each frame of video image.
After the target images corresponding to each frame of video image are obtained, these target images can be merged according to the order of the frames in the video to be processed, so as to obtain the blurred target video corresponding to the video to be processed. The target video contains each target object after blurring processing.
Therefore, by identifying the various types of target objects related to personal information in each frame of video image and then performing blurring processing on those target objects, the sensitive information can be removed from every frame, yielding a target video from which the various kinds of sensitive information have been removed, i.e. a desensitized target video, and effectively improving the security of the video data.
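End to end, S101 to S104 amount to a decode, per-frame process, re-encode loop. The sketch below uses OpenCV only as one possible decoding/encoding backend, and process_frame stands in for the detection and blurring of S102/S103; both are assumptions rather than part of the patent.

```python
import cv2  # one possible decode/encode backend (an assumption)

def process_video(src_path: str, dst_path: str, process_frame):
    """Decode the video to be processed, blur each frame, and merge the target
    images back into a target video in the original frame order."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(process_frame(frame))  # e.g. detect target objects, then blur them
    cap.release()
    writer.release()
```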
In an embodiment, the target neural network includes a shared network and a plurality of branch networks, each branch network is respectively configured to identify one type of target object, and the step of identifying at least one type of target object in each frame of video image of the video to be processed using a pre-trained target neural network may be implemented according to the following steps:
s102-1: performing video decoding processing on a video to be processed through a shared network in a target neural network to obtain each frame of video image corresponding to the video to be processed; and aiming at each frame of video image, carrying out continuous down-sampling processing and up-sampling processing on the video image for multiple times.
Here, the shared network is a network that processes the video to be processed and each frame of video image included in the video, and is used to preliminarily extract the feature information in each frame of video image. After any video to be processed is acquired, the video to be processed needs to be processed by using the shared network, and then the result of the processing of the shared network is further processed by using the branch network in the target neural network.
The shared network can comprise two parts: a down-sampling network and an up-sampling network, where the down-sampling network is used to perform down-sampling processing on the video image, and the up-sampling network is used to perform up-sampling processing on the sampling result of the down-sampling network. The up-sampling network and the down-sampling network can both comprise a plurality of sampling layers, the numbers of sampling layers in the two networks are the same, and the depth, accuracy and completeness of the feature information acquired by different sampling layers differ. In a specific implementation, the down-sampling network may be a residual network (ResNet), and the up-sampling network may be a feature pyramid network (FPN). Alternatively, the plurality of sampling layers in the down-sampling network and the up-sampling network may be a plurality of convolutional layers with different convolution kernels.
Different branch networks are respectively used for identifying different types of target objects, one branch network can only identify one type of target object, and the number of the branch networks included in the target neural network can be determined according to the number of the object types of the target objects required to be identified by the target neural network. For example, when the target neural network needs to recognize target objects of two types, namely a face type and a license plate type, the branch network included in the target neural network may be a face recognition branch network and a license plate recognition branch network.
In specific implementation, after the video to be processed is obtained, the video to be processed may be subjected to video decoding processing by using the shared network, so as to obtain each frame of video image corresponding to the video to be processed. Then, for each frame of video image, a plurality of sampling layers in the downsampling network can be used for carrying out continuous downsampling processing on each frame of video image for a plurality of times, and then a plurality of sampling layers in the upsampling network are used for carrying out continuous upsampling processing on the result of the last downsampling for a plurality of times, so that the result after the sampling processing corresponding to the video image is obtained.
S102-2: determining at least one type of target object included in the video image based on a result of the sampling process through a plurality of branch networks in the target neural network.
In this step, each branch network may be used to process the result after the sampling processing, so as to obtain the output of each branch network, and the output of each branch network is used as each target object included in the video image.
Here, different branch networks output target objects of different object types. If a video image contains no information related to the object type that a certain branch network can recognize, then when that branch network processes the result of the above sampling process for the video image, it may output nothing or may output erroneous information. Thus, when the target objects in the video image are determined directly from the outputs of the respective branch networks, the resulting target objects may include at least one type.
Or, for S102-2, after obtaining the result after the sampling processing, first, based on the result after the sampling processing, at least one target branch network for processing the result is selected from the plurality of branch networks, and then, the result after the sampling processing is processed by using the determined target branch network, so as to determine at least one type of target object included in the video image.
In specific implementation, the target branch network may be determined as follows:
in the first mode, based on the processed result, each object type corresponding to each target object included in the video image is determined, and a branch network corresponding to each object type is selected as each target branch network.
In the second way, in the process of executing S102-1 above, for each frame of video image, a plurality of sampling layers in the down-sampling network may be used to perform continuous down-sampling processing on the video image multiple times, obtaining a first result corresponding to the down-sampling processes, and a plurality of sampling layers in the up-sampling network may be used to perform continuous up-sampling processing multiple times, obtaining a second result corresponding to the up-sampling processes. At least one target branch network may then be selected from the plurality of branch networks based on the first result, and at least one based on the second result, and the target branch networks corresponding to the first result and to the second result are together taken as the finally determined target branch networks.
In the process of executing the step S102-1, after each frame of video image is processed by using a down-sampling network to obtain the first result, the first result and the video image are combined and input to the up-sampling network, and the combined first result and the video image are subjected to up-sampling processing continuously for multiple times by using the up-sampling network to obtain a final result after the sampling processing; and finally, selecting at least one target branch network from the plurality of branch networks by using the result after the sampling processing.
After each target branch network is determined, the result after sampling processing may be input to each target branch network, and the result after sampling processing is processed by each target branch network, so as to obtain each target object output by each target branch network and matched with the identification type corresponding to the branch network, thereby obtaining each target object of each type included in the video image.
Or after each target branch network is determined, for each target branch network, the result related to the identification type corresponding to the target branch network in the results after sampling processing may be input to the target branch network and processed, so as to obtain each target object output by the target branch network and matched with the identification type corresponding to the target branch network. Finally, each target object of each type included in the video image is determined based on each target object of each type output by each target branch network, respectively.
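As a toy illustration of the first selection way (choosing branch networks according to the object types present in the sampled result), the sketch below keeps the branch networks in a dict keyed by the object type each one recognizes; the names and data shapes are hypothetical.

```python
def select_target_branches(object_types_in_result, branch_networks):
    """branch_networks: dict mapping an object type (e.g. "face") to the branch
    network that recognizes it; only the branches needed for this image are kept."""
    return {t: net for t, net in branch_networks.items() if t in object_types_in_result}

def run_target_branches(sampled_result, target_branches):
    # Feed the result after sampling processing to each selected target branch and
    # collect, per object type, the target objects it outputs.
    return {t: net(sampled_result) for t, net in target_branches.items()}
```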
In one embodiment, the step of performing the downsampling process and the upsampling process for each frame of the video image in S102-1 may be implemented as follows:
s102-1-1: and aiming at each frame of video image, continuously and repeatedly carrying out downsampling processing on the video image to respectively obtain image characteristic information corresponding to each downsampling processing.
S102-1-2: and performing continuous up-sampling processing on the image characteristic information obtained by the last down-sampling processing for multiple times to respectively obtain initial category information corresponding to each up-sampling processing and initial detection frame information corresponding to the initial category information.
The input information of the first downsampling process is the video image, and the input of each subsequent downsampling process is the image feature information obtained by the previous downsampling process. The total number of consecutive upsampling processes is the same as the total number of consecutive downsampling processes. The input of each upsampling process after the first consists of the initial category information and the initial detection frame information obtained by the previous upsampling process, together with the image feature information obtained by the downsampling process matched with the current upsampling process, where an upsampling process and a downsampling process are matched if the sum of their sampling order positions equals the total number of samplings plus 1 (for example, with three samplings, the second upsampling process is matched with the second downsampling process, and the third upsampling process with the first downsampling process).
The result after the sampling processing in S102-1 is the initial category information corresponding to each upsampling processing and the initial detection frame information corresponding to the initial category information.
The initial category information is used to represent the object types of the target objects identified in the video image by the shared network, and the initial category information corresponding to each upsampling process may be the same or different; the initial detection frame information is used to reflect the initial predicted position information of each target object corresponding to the initial category information. For example, the initial category information may include two target objects of the face type and one target object of the license plate type, and the initial detection frame information may then include the initial detection frames corresponding to those two face-type target objects and the initial detection frame corresponding to the license-plate-type target object.
Fig. 2 is a schematic structural diagram of a shared network provided in an embodiment of the present disclosure. The down-sampling network in the shared network shown in fig. 2 includes three down-sampling layers: a first down-sampling layer at the first sampling order position, a second down-sampling layer at the second sampling order position, and a third down-sampling layer at the third sampling order position; the up-sampling network likewise includes three up-sampling layers: a first up-sampling layer at the first sampling order position, a second up-sampling layer at the second sampling order position, and a third up-sampling layer at the third sampling order position. However, fig. 2 is only an example, and the number of sampling layers included in the down-sampling and up-sampling networks may be set according to actual sampling requirements, which is not limited here.
The above S102-1-1 and S102-1-2 are explained in detail below with the shared network shown in fig. 2:
in specific implementation, for each frame of video image, the frame of video image may be input to a first downsampling layer in a downsampling network, and the downsampling layer is used to perform downsampling on the frame of video image, so as to extract image characteristic information, such as color characteristic information and texture characteristic information, of the video image corresponding to the first downsampling layer.
Then, the image feature information extracted by the first downsampling layer is input into a second downsampling layer, the image feature information is further downsampled by the second downsampling layer, and the image feature information of the video image corresponding to the second downsampling layer is extracted.
Then, inputting the image characteristic information extracted by the second down-sampling layer into a third down-sampling layer, and further performing down-sampling processing on the image characteristic information by using the third down-sampling layer to extract the image characteristic information of the video image corresponding to the third down-sampling layer. That is, the video image is subjected to downsampling processing three times in succession, and the total number of downsampling processing is 3.
Further, the image feature information obtained by the last downsampling process, that is, the image feature information of the extracted video image corresponding to the third downsampling layer at the third sampling order position, may be input to the first upsampling layer at the first sampling order position in the upsampling network, and the image feature information is upsampled by using the first upsampling layer, so as to obtain the initial category information corresponding to the first upsampling process, and the initial detection frame information corresponding to the initial category information.
Then, the initial category information corresponding to the first upsampling process, the initial detection frame information corresponding to that initial category information, and the image feature information obtained by the downsampling process matched with the second upsampling process at the second sampling order position (namely, the image feature information obtained by the second downsampling layer at the second sampling order position) are fused; the fused result is input to the second upsampling layer at the second sampling order position, and the second upsampling layer upsamples the fused result to obtain the initial category information corresponding to the second upsampling process and the initial detection frame information corresponding to that initial category information.
Then, the initial category information corresponding to the second upsampling process, the initial detection frame information corresponding to that initial category information, and the image feature information obtained by the downsampling process matched with the third upsampling process at the third sampling order position (namely, the image feature information obtained by the first downsampling layer at the first sampling order position) are fused; the fused result is input to the third upsampling layer at the third sampling order position, and the third upsampling layer upsamples the fused result to obtain the initial category information corresponding to the third upsampling process and the initial detection frame information corresponding to that initial category information. Here, the total number of upsampling processes is also 3.
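To make the three-layer walk-through above concrete, here is a minimal PyTorch-style sketch of such a shared network: three downsampling layers followed by three upsampling steps, each upsampling step after the first being fused with the output of the downsampling layer whose order position sums with its own to 4 (the total of 3 plus 1). The layer types, channel widths and the use of concatenation for fusion are illustrative assumptions; the patent itself only calls for a ResNet-style downsampling path and an FPN-style upsampling path.

```python
import torch
import torch.nn as nn

class SharedNetworkSketch(nn.Module):
    """Toy shared network with three downsampling and three upsampling layers."""

    def __init__(self, in_ch: int = 3, ch: int = 64):
        super().__init__()
        self.down1 = nn.Conv2d(in_ch, ch, 3, stride=2, padding=1)  # 1st order position
        self.down2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)     # 2nd order position
        self.down3 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)     # 3rd order position
        self.up1 = nn.ConvTranspose2d(ch, ch, 2, stride=2)         # 1st upsampling
        self.up2 = nn.ConvTranspose2d(2 * ch, ch, 2, stride=2)     # 2nd upsampling (after fusion)
        self.up3 = nn.ConvTranspose2d(2 * ch, ch, 2, stride=2)     # 3rd upsampling (after fusion)

    def forward(self, image):
        d1 = self.down1(image)  # matched with the 3rd upsampling (1 + 3 = 4)
        d2 = self.down2(d1)     # matched with the 2nd upsampling (2 + 2 = 4)
        d3 = self.down3(d2)     # its output feeds the 1st upsampling
        u1 = self.up1(d3)                          # result of the 1st upsampling process
        u2 = self.up2(torch.cat([u1, d2], dim=1))  # fuse, then 2nd upsampling process
        u3 = self.up3(torch.cat([u2, d1], dim=1))  # fuse, then 3rd upsampling process
        # In the patent, initial category information and initial detection frame
        # information would be derived from each of these per-level feature maps.
        return [u1, u2, u3]
```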
Further, in the step of selecting the target branch network, after the initial category information corresponding to each upsampling process is obtained, the object types contained in that initial category information can be determined from it, and the target branch networks capable of identifying those object types can be selected from the plurality of branch networks. For example, when the object types included in the initial category information corresponding to a given upsampling process are the face type and the license plate type, the determined target branch networks may be the face recognition branch network and the license plate recognition branch network; when the only object type included in the initial category information corresponding to a given upsampling process is the face type, the determined target branch network may be the face recognition branch network.
Further, based on the initial category information corresponding to each upsampling process, the target branch network corresponding to each upsampling process can be determined.
In an embodiment, for the above S102-2, the following steps may be performed:
s102-2-1: and performing continuous feature extraction on the initial class information corresponding to each upsampling process for multiple times by using a target branch network matched with the initial class information in the multiple branch networks to obtain target class information, and performing continuous feature extraction on the initial detection frame information corresponding to the upsampling process for multiple times to obtain target detection frame information.
Here, the target branch networks matched with the initial category information are the respective target branch networks determined by the method provided in the above embodiment.
The target category information is the detail information output by the target branch network that represents the target objects in the video image matching the object type the target branch network can identify; it can also indicate whether any such target object exists in the video image. For example, when the target branch network is a face recognition branch network, the target category information may be the detail information of each face output by that branch network, such as the specific pixel positions of the face contour.
The target detection frame information is the final detection frame of each target object corresponding to the target category information.
Each branch network may be divided into two parts: a category information extraction network for processing the initial category information, and a detection frame information extraction network for processing the initial detection frame information. Both parts are networks composed of a plurality of convolutional layers, and the two parts contain the same number of convolutional layers.
In specific implementation, for the initial category information and initial detection frame information obtained by each upsampling process, the information may be processed by each determined target branch network corresponding to that upsampling process. For example, when the target branch networks corresponding to the upsampling process are two networks, namely a face recognition branch network and a license plate recognition branch network, the initial category information and initial detection frame information corresponding to the upsampling process may be input into both branch networks; the face recognition branch network processes the initial category information and initial detection frame information to obtain the target category information and target detection frame information corresponding to the face recognition branch network, and the license plate recognition branch network processes them to obtain the target category information and target detection frame information corresponding to the license plate recognition branch network.
In this way, for each initial category information and initial detection frame information obtained by the upsampling process, the target category information and target detection frame information corresponding to the upsampling process can be determined by using each target branch network corresponding to the upsampling process.
The process of determining the target category information and the target detection frame information by using each target branch network may specifically be:
For the input initial category information, the first convolutional layer of the category information extraction network in the target branch network may be used to perform feature extraction on the input initial category information; the result of the first extraction is input into the second convolutional layer to obtain the result of the second extraction, that result is input into the next convolutional layer to obtain a new extraction result, and so on, until the result extracted by the last convolutional layer is obtained and taken as the target category information. Similarly, for the input initial detection frame information, the first convolutional layer of the detection frame information extraction network in the target branch network may be used to perform feature extraction on the input initial detection frame information; the result of the first extraction is input into the second convolutional layer to obtain the result of the second extraction, that result is input into the next convolutional layer, and so on, until the result extracted by the last convolutional layer is obtained and taken as the target detection frame information.
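A compact sketch of such a pair of extraction stacks is given below; the depth of four layers mirrors the "x 4" of Fig. 3 but, like the channel sizes and activation functions, is only an assumption:

```python
import torch.nn as nn

def conv_stack(channels, num_layers=4):
    """A chain of convolutional layers in which the output of each layer
    is fed into the next layer."""
    layers = []
    for _ in range(num_layers):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class BranchNetwork(nn.Module):
    """One branch network: a category information extraction part and a detection
    frame information extraction part with the same number of convolutional layers."""

    def __init__(self, cls_channels, box_channels):
        super().__init__()
        self.cls_extractor = conv_stack(cls_channels)
        self.box_extractor = conv_stack(box_channels)

    def forward(self, init_cls, init_box):
        target_cls = self.cls_extractor(init_cls)   # target category information
        target_box = self.box_extractor(init_box)   # target detection frame information
        return target_cls, target_box
```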
S102-2-2: and determining at least one type of target object included in the video image based on the obtained target class information and the target detection frame information corresponding to each upsampling process.
In specific implementation, the target category information and target detection frame information corresponding to the upsampling processes may be merged to obtain the final target category information and target detection frame information corresponding to the video image; the object types corresponding to the final target category information are taken as the object types of the target objects included in the video image, and each object within a detection frame corresponding to the final target detection frame information is taken as a target object included in the video image.
In one embodiment, for S102-2-2, the following steps may be performed:
s102-2-2-1: and determining the position information of each target object corresponding to each upsampling process based on the target class information and the target detection frame information corresponding to each upsampling process.
Here, the target detection frame information is a detection frame, which may specifically be a rectangular detection frame. The position information of the target object may include the pixel position, in the video image, of the top left vertex of the detection frame corresponding to the target object together with the length and width of the detection frame; alternatively, it may include the pixel position in the video image of every vertex of the detection frame corresponding to the target object.
In specific implementation, for the target class information and target detection frame information corresponding to each upsampling process, the number of object types included in the target class information may first be determined; the detection frame of each target object corresponding to that upsampling process may then be determined according to the detail information included in the target class information; finally, the pixel position of the vertex of each such detection frame and the corresponding length and width may be taken as the position information of each target object corresponding to that upsampling process.
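For illustration only, both position formats mentioned above can be derived from a rectangular detection frame; the corner-coordinate convention below is an assumption:

```python
def frame_to_position(frame):
    """frame: (x1, y1, x2, y2) pixel coordinates of a rectangular detection frame."""
    x1, y1, x2, y2 = frame
    top_left_with_size = (x1, y1, x2 - x1, y2 - y1)          # top-left vertex plus length and width
    all_vertices = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]  # pixel position of every vertex
    return top_left_with_size, all_vertices
```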
S102-2-2-2: based on the position information, it is determined whether there is an overlap in the positions of the plurality of target objects corresponding to the plurality of times of upsampling processing.
Here, the position information determined in S102-2-2-1 includes the position information of each target object corresponding to each upsampling process. The target class information and target detection frame information corresponding to different upsampling processes may partially overlap, that is, they may correspond to the same target object. Therefore, after the position information of each target object is obtained, whether the positions of the multiple target objects corresponding to the multiple upsampling processes overlap can be determined according to the position indicated by each piece of position information.
S102-2-2-3: and in response to the position overlapping, determining the confidence degrees corresponding to the multiple target objects with overlapped positions respectively, and taking the target object with the highest confidence degree as the final target object.
In specific implementation, the target objects with overlapping positions are found based on the position information, and a confidence corresponding to each of these target objects is determined. The confidence may be the confidence of the target category information corresponding to the target object, the confidence of the target detection frame information corresponding to the target object, or the average of these two confidence values.
Finally, according to the confidence degree corresponding to each target object in the multiple target objects with overlapping positions, the target object with the highest confidence degree is used as the final target object at the overlapping position, and other target objects are deleted.
S102-2-2-4: and taking each determined final target object and each target object without position overlapping as at least one type of target object included in the video image.
In a specific implementation, the determined final target objects and the target objects corresponding to each upsampling process that do not have overlapping positions may be directly used as at least one type of target object included in the video image.
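The patent does not prescribe how overlap is measured or how the non-best objects are removed; a common realization of this keep-the-most-confident rule is a non-maximum-suppression-style pass such as the sketch below, in which the IoU threshold is an assumption:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def resolve_overlaps(candidates, iou_threshold=0.5):
    """candidates: list of dicts with 'box' and 'confidence'.
    Among overlapping candidates, keep only the one with the highest confidence."""
    kept = []
    for obj in sorted(candidates, key=lambda o: o["confidence"], reverse=True):
        if all(iou(obj["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(obj)
    return kept
```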
As shown in Fig. 3, a schematic diagram for processing one frame of video image in a video to be processed according to an embodiment of the present disclosure is provided. Fig. 3 includes two branch networks, a face recognition branch network and a license plate recognition branch network, which have the same number of convolutional layers; the "x 4" in Fig. 3 indicates that four convolutional layers (given as an example) are passed through. After the initial category information and initial detection frame information corresponding to each upsampling process are output by the shared network in the target neural network, they may be input into different branch networks. As shown in Fig. 3, the initial category information 1 and initial detection frame information 1 corresponding to the first upsampling process may be input into the face recognition branch network; the initial category information 2 and initial detection frame information 2 corresponding to the second upsampling process may be input into the license plate recognition branch network; and the initial category information 3 and initial detection frame information 3 corresponding to the third upsampling process may be input into both the face recognition branch network and the license plate recognition branch network. Then, the input initial category information and initial detection frame information are processed by the category information extraction network and the detection frame information extraction network in each recognition branch network. Since the initial category information and initial detection frame information input into the face recognition branch network come from the first and the third upsampling processes, the target category information output by the face recognition branch network in Fig. 3 includes the target category information 1 corresponding to the initial category information 1 of the first upsampling process and the target category information 3 corresponding to the initial category information 3 of the third upsampling process; similarly, the target detection frame information output by the face recognition branch network includes the target detection frame information 1 corresponding to the first upsampling process and the target detection frame information 3 corresponding to the third upsampling process. The license plate recognition branch network in Fig. 3 outputs only the target category information 2 and the target detection frame information 2 corresponding to the second upsampling process. Finally, at least one type of target object included in the video image may be determined based on the output target category information 1, 2, 3 and the target detection frame information 1, 2, 3.
In one implementation, the video processing method provided by the embodiments of the present disclosure may also use a plurality of target neural networks, each of which is used for identifying one type of target object in the video to be processed. For example, a face neural network may be used for identifying the face in each frame of video image, and a license plate neural network may be used for identifying the license plate in each frame of video image. In specific implementation, the video to be processed can be input into the target neural network corresponding to the object type of the target objects that need blurring, and the target objects identified by that target neural network are then blurred to obtain the target video after the blurring processing. For the specific process by which each such target neural network identifies its type of target object, reference may be made to the identification process of the corresponding branch network in the above embodiments, which is not described in detail here.
In one embodiment, for S103, the following steps may be performed:
s103-1: in response to identifying a plurality of target objects in the video image, an initial sub-image corresponding to each target object is scratched from the video image.
In specific implementation, for each frame of video image, in a case that it is determined that a plurality of target objects are identified from the frame of video image, in response to identifying a plurality of target objects in the video image, an object Identity (ID) may be assigned to each target object; and then according to the position information corresponding to each target object, deducting the initial sub-image corresponding to each target object from the video image, and taking the object identifier corresponding to each target object as the image identifier of the corresponding initial sub-image.
In an embodiment, in the step of capturing the initial sub-images corresponding to the target objects from the video image according to the position information corresponding to each target object, the image area corresponding to the detection frame of the target object can be directly deducted to be used as the initial sub-image of the target object; or, the detection frame of the target object may be scaled according to a preset scaling ratio, and the image area corresponding to the scaled detection frame is deducted to be used as the initial sub-image of the target object.
In one embodiment, even if only one target object is identified from the video image, an initial sub-image corresponding to the target object can be deducted from the video image based on the position information corresponding to the target object.
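A minimal sketch of this cropping step, assuming the video image is a NumPy-style H x W x C array and each detection carries an object identifier and a rectangular frame (the field layout and the optional scaling are illustrative):

```python
def crop_initial_sub_images(video_image, detections, scale=1.0):
    """detections: list of (object_id, (x1, y1, x2, y2)).
    Returns a dict mapping each object identifier to its initial sub-image."""
    sub_images = {}
    h, w = video_image.shape[:2]
    for object_id, (x1, y1, x2, y2) in detections:
        # Optionally scale the detection frame around its centre by a preset ratio.
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
        nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
        nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
        sub_images[object_id] = video_image[ny1:ny2, nx1:nx2].copy()
    return sub_images
```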
S103-2: and carrying out fuzzy processing on each initial sub-image to obtain a target sub-image corresponding to each initial sub-image.
In specific implementation, each initial sub-image cropped from the video image may be blurred to obtain the target sub-image corresponding to that initial sub-image, and the image identifier of each initial sub-image is used as the image identifier of the corresponding target sub-image.
S103-3: and replacing the corresponding initial sub-image in the video image by each target sub-image to obtain a target image.
In specific implementation, the target object corresponding to each target sub-image may be determined by matching the image identifier of the target sub-image with the object identifier of the target object, and the target sub-image is then used to replace the initial sub-image of that target object in the video image. After every initial sub-image in the video image has been replaced in this way, the target image is obtained.
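For illustration, replacing the initial sub-images with their blurred counterparts by matching identifiers could look as follows (this sketch assumes the sub-images were cropped without scaling, so each blurred sub-image has exactly the size of its original region):

```python
def paste_back(video_image, blurred_sub_images, detections):
    """blurred_sub_images: dict object_id -> blurred sub-image;
    detections: list of (object_id, (x1, y1, x2, y2)) as used for cropping."""
    target_image = video_image.copy()
    boxes = dict(detections)
    for object_id, blurred in blurred_sub_images.items():
        x1, y1, x2, y2 = boxes[object_id]
        target_image[y1:y2, x1:x2] = blurred  # shapes must match the original crop
    return target_image
```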
In one embodiment, for any initial sub-image, the blurring is performed on the initial sub-image according to the following steps:
step one, dividing an initial sub-image into a plurality of processing areas.
In this step, the initial sub-image may be evenly divided into a plurality of processing regions of the same size according to the image size of the initial sub-image; for example, when the image size of the initial sub-image is 100 × 80, the initial sub-image may be divided into 100 processing regions, each of size 10 × 8. Alternatively, the initial sub-image may be divided into a preset number of processing regions. Alternatively, the dividing manner may be determined according to the object type of the target object; for example, when the object type of the target object is the face type, the initial sub-image may be divided into a plurality of processing regions according to the position information of the facial features, and when the object type of the target object is the license plate type, the initial sub-image may be divided into a plurality of processing regions of the same size.
And step two, determining a target pixel value corresponding to each processing area based on the pixel value of each pixel point in each processing area.
In an embodiment, for each divided processing region, a pixel mean value corresponding to the processing region may be determined based on pixel values corresponding to respective pixel points in the processing region, and the determined pixel mean value is used as a target pixel value of the processing region.
In another embodiment, for each divided processing region, a pixel extremum corresponding to the processing region may also be determined based on pixel values corresponding to respective pixel points in the processing region, and the determined pixel extremum is used as a target pixel value of the processing region.
Or, for each divided processing region, one pixel value may be randomly selected from the pixel values corresponding to the respective pixel points in the processing region as a target pixel value.
And step three, replacing the pixel values corresponding to the pixel points in each processing area with the determined target pixel values to obtain the target sub-images corresponding to the initial sub-images.
In specific implementation, for each processing region, the pixel value corresponding to each pixel point in the processing region may be replaced with the determined target pixel value corresponding to the processing region, and after the pixel value replacement for each processing region is completed, the target sub-image is obtained. Fig. 4 is a schematic diagram illustrating a comparison between an initial sub-image and a target sub-image provided in an embodiment of the present disclosure.
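A minimal sketch of this region-wise replacement (a mosaic-style blur), assuming an H x W x C image array and the pixel mean as the target value:

```python
import numpy as np

def pixelate(sub_image, region_size=(10, 8)):
    """Divide the sub-image into equal-size processing regions and replace every
    pixel of a region with that region's target value (here the mean)."""
    out = sub_image.copy()
    rh, rw = region_size
    h, w = out.shape[:2]
    for y in range(0, h, rh):
        for x in range(0, w, rw):
            region = out[y:y + rh, x:x + rw]
            target = region.mean(axis=(0, 1))        # pixel mean of the processing region
            region[...] = target.astype(out.dtype)   # broadcast the target value over the region
    return out

# A 100 x 80 sub-image with 10 x 8 regions yields the 100 processing regions of
# the example above; using max, min or a randomly chosen pixel instead of the
# mean gives the other variants described above.
```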
In an embodiment, when the video to be processed is a video shot in a road environment, most of the target objects appearing in it are the faces of pedestrians or the license plates of other vehicles; since faces and license plates belong to sensitive information that requires desensitization processing, for videos shot in a road environment the object types of the target objects include the face type and the license plate type.
For example, the to-be-processed video may be a driving video uploaded by a user and shot by a driving recorder, and the object type of the target object may include: face type, license plate type.
Of course, besides the above object types, various object types corresponding to target objects that may appear in the video may also be included, such as a target animal type, a target building type, a target object type, and the like, and the embodiments of the present disclosure are not particularly limited with respect to a specific object type.
In addition, the embodiments of the present disclosure further provide a method for training the target neural network with a plurality of sample images. Specifically, a plurality of sample images need to be obtained first; each sample image includes at least one type of sample object, and the number of sample objects may differ between sample images.
Then, the sample image may be input to a target neural network to be trained, and the sample image is identified by using a shared network in the target neural network to be trained, so as to determine initial prediction category information corresponding to each upsampling process of the sample image and initial prediction detection frame information corresponding to the initial prediction category information.
Then, for the obtained initial prediction category information corresponding to each upsampling process, at least one target branch network matched with the initial prediction category information can be screened out from the multiple branch networks, the screened target branch network is used for processing the initial prediction category information and the initial prediction detection frame information corresponding to the initial prediction category information, and the target prediction category information and the target prediction detection frame information corresponding to the upsampling process are determined. Furthermore, the predicted sample object included in the sample image may be determined based on the target prediction type information and the target prediction detection frame information corresponding to each upsampling process.
Finally, a first loss value corresponding to the shared network can be determined according to initial prediction category information and initial prediction detection frame information output by the shared network, and standard initial category information and standard initial detection frame information; respectively determining second loss values corresponding to the branch networks according to target prediction category information and target prediction detection frame information, and standard target prediction category information and standard prediction detection frame information, which are output by each branch network when the branch network serves as a target branch network; and then, performing iterative training on the shared network by using the determined first loss value, and performing iterative training on the branch network by using a second loss value corresponding to each branch network to adjust the network parameter values of the shared network and each branch network until a preset training cut-off condition is met, thereby obtaining the trained shared network and each branch network. The preset training cutoff condition may include that the number of rounds of iterative training reaches a preset number of rounds and/or the prediction accuracy of the network obtained through training reaches a preset accuracy.
A third loss value may also be determined according to the standard sample object corresponding to each sample image and the predicted sample object output by the target neural network, and the target neural network to be trained may be iteratively trained using the third loss value. In addition, besides the second loss value corresponding to each branch network, a fourth loss value corresponding to each branch network may be determined according to each predicted object output by the branch network and the corresponding standard object, and each branch network may be iteratively trained using the fourth loss value.
And finally, obtaining the trained target neural network under the condition that the training of the shared network and each branch network is determined to be completed.
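For illustration only, one training iteration combining the first loss of the shared network with the second losses of the selected branch networks might be organized as below; the concrete loss functions, label layout and tensor shapes are assumptions, since the patent only specifies which losses exist and what they supervise:

```python
import torch.nn.functional as F

def training_step(shared_net, branch_nets, optimizer, sample_image, labels):
    """One hypothetical training iteration for the shared network and the
    branch networks selected as target branch networks for this sample."""
    init_cls, init_box = shared_net(sample_image)

    # First loss: shared-network outputs vs. standard initial labels.
    loss = F.cross_entropy(init_cls, labels["std_init_cls"]) \
         + F.smooth_l1_loss(init_box, labels["std_init_box"])

    # Second losses: one per branch network acting as a target branch network.
    for name in labels["selected_branches"]:
        tgt_cls, tgt_box = branch_nets[name](init_cls, init_box)
        loss = loss + F.cross_entropy(tgt_cls, labels["std_tgt_cls"][name]) \
                    + F.smooth_l1_loss(tgt_box, labels["std_tgt_box"][name])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```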
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, a video processing apparatus corresponding to the video processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the video processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 5, a schematic diagram of a video processing apparatus provided in an embodiment of the present disclosure includes:
an obtaining module 501, configured to obtain a video to be processed;
an identifying module 502, configured to identify at least one type of target object in each frame of video image of the video to be processed, where each type of target object is related to personal information;
a processing module 503, configured to perform blur processing on the identified target object in the video image to obtain a target image;
a generating module 504, configured to generate a target video after the blurring processing based on the target images respectively corresponding to each frame of the video image.
In a possible implementation, the processing module 503 is configured to, in response to identifying a plurality of the target objects in the video image, crop an initial sub-image corresponding to each of the target objects from the video image;
blurring each initial sub-image to obtain a target sub-image corresponding to each initial sub-image;
and replacing the corresponding initial sub-image in the video image by each target sub-image to obtain the target image.
In a possible implementation manner, the processing module 503 is configured to perform blurring processing on any initial sub-image according to the following steps:
dividing the initial sub-image into a plurality of processing regions;
determining a target pixel value corresponding to each processing region based on the pixel value of each pixel point in each processing region;
and replacing the pixel values corresponding to the pixel points in each processing area with the determined target pixel values to obtain the target sub-image corresponding to the initial sub-image.
In a possible implementation manner, the processing module 503 is configured to determine a pixel value mean value corresponding to each processing region based on pixel values of respective pixels in each processing region, and use the pixel value mean value corresponding to each processing region as the target pixel value corresponding to the processing region.
In a possible implementation manner, the processing module 503 is configured to determine a pixel value extremum corresponding to each processing region based on pixel values of respective pixel points in each processing region, and use the pixel value extremum corresponding to each processing region as the target pixel value corresponding to the processing region.
In a possible implementation manner, in a case that the video to be processed is a video in a photographed road environment, the object type of the target object includes a face type and a license plate type.
In a possible implementation manner, the identifying at least one type of target object in each frame of video image of the video to be processed is identified by using a pre-trained target neural network, and the target neural network is a neural network which is trained by using a plurality of sample images and can be used for identifying a plurality of types of target objects.
In one possible embodiment, the target neural network includes a shared network and a plurality of branch networks, each branch network being used for identifying a type of target object;
the identifying module 502 is configured to identify at least one type of target object in each frame of video image of the video to be processed by using a pre-trained target neural network, and includes:
performing video decoding processing on the video to be processed through the shared network in the target neural network to obtain each frame of video image corresponding to the video to be processed; carrying out continuous multiple times of down-sampling processing and up-sampling processing on the video image aiming at each frame of the video image;
determining, by a plurality of branch networks in the target neural network, at least one type of target object included in the video image based on a result of the sampling process.
In a possible implementation manner, the identifying module 502 is configured to perform, for each frame of the video image, continuous downsampling processing on the video image multiple times, so as to obtain image feature information corresponding to each downsampling processing; the input information of the next downsampling processing in the downsampling processing of a plurality of times is the image characteristic information obtained by the previous downsampling processing, wherein the input information of the first downsampling processing is the video image;
and the device is used for performing continuous up-sampling processing on the image characteristic information obtained by the last down-sampling processing for multiple times, and respectively obtaining initial category information corresponding to each up-sampling processing and initial detection frame information corresponding to the initial category information.
In a possible implementation manner, the identifying module 502 is configured to perform, by using a target branch network matched with the initial category information in the plurality of branch networks, continuous multiple feature extraction on the initial category information corresponding to each upsampling process to obtain target category information, and perform continuous multiple feature extraction on the initial detection frame information corresponding to the upsampling process to obtain target detection frame information;
and determining at least one type of target object included in the video image based on the target class information and the target detection frame information corresponding to each obtained upsampling process.
In a possible implementation manner, the identifying module 502 is configured to determine, based on the target class information and the target detection frame information corresponding to each upsampling process, position information of each target object corresponding to each upsampling process;
determining whether positions of a plurality of target objects corresponding to a plurality of times of upsampling processing are overlapped or not based on the position information;
in response to the position overlapping, determining confidence degrees corresponding to a plurality of target objects with overlapped positions respectively, and taking the target object with the highest confidence degree as a final target object;
and taking each determined final target object and each target object without position overlapping as target objects in the video image.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 6, which is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure, and includes:
a processor 61 and a memory 62; the memory 62 stores machine-readable instructions executable by the processor 61, and the processor 61 is configured to execute the machine-readable instructions stored in the memory 62. When the machine-readable instructions are executed by the processor 61, the processor 61 performs the following steps: S101: acquiring a video to be processed; S102: identifying at least one type of target object in each frame of video image of the video to be processed, where each type of target object is related to personal information; S103: blurring the identified target object in the video image to obtain a target image; and S104: generating a target video after the blurring processing based on the target images respectively corresponding to each frame of video image.
The memory 62 includes a memory 621 and an external memory 622; the memory 621 is also referred to as an internal memory, and temporarily stores operation data in the processor 61 and data exchanged with the external memory 622 such as a hard disk, and the processor 61 exchanges data with the external memory 622 via the memory 621.
For the specific execution process of the instruction, reference may be made to the steps of the video processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the video processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the video processing method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the video processing method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implementing, and for example, a plurality of units or components may be combined, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (14)
1. A video processing method, comprising:
acquiring a video to be processed;
identifying at least one type of target object in each frame of video image of the video to be processed, wherein each type of target object is related to personal information;
performing fuzzy processing on the identified target object in the video image to obtain a target image;
and generating a target video after the blurring processing based on the target images respectively corresponding to the video images of each frame.
2. The method according to claim 1, wherein the blurring the identified target object in the video image to obtain a target image comprises:
in response to identifying a plurality of the target objects in the video image, matting an initial sub-image corresponding to each of the target objects from the video image;
blurring each initial sub-image to obtain a target sub-image corresponding to each initial sub-image;
and replacing the corresponding initial sub-image in the video image by each target sub-image to obtain the target image.
3. The method according to claim 2, wherein for any of the initial sub-images, the initial sub-image is blurred according to the following steps:
dividing the initial sub-image into a plurality of processing regions;
determining a target pixel value corresponding to each processing region based on the pixel value of each pixel point in each processing region;
and replacing the pixel values corresponding to the pixel points in each processing area with the determined target pixel values to obtain the target sub-image corresponding to the initial sub-image.
4. The method according to claim 3, wherein the determining a target pixel value corresponding to each of the processing regions based on the pixel values corresponding to the respective pixel points in each of the processing regions comprises:
and determining a pixel value mean value corresponding to each processing area based on the pixel value of each pixel point in each processing area, and taking the pixel value mean value corresponding to each processing area as the target pixel value corresponding to the processing area.
5. The method according to claim 4, wherein the determining a target pixel value corresponding to each of the processing regions based on the pixel values corresponding to the respective pixel points in each of the processing regions comprises:
and determining a pixel value extreme value corresponding to each processing area based on the pixel value of each pixel point in each processing area, and taking the pixel value extreme value corresponding to each processing area as the target pixel value corresponding to the processing area.
6. The method according to any one of claims 1 to 5, wherein in the case that the video to be processed is a video in a photographed road environment, the object type of the target object includes a face type and a license plate type.
7. The method according to any one of claims 1 to 6, wherein the identifying at least one type of target object in each frame of video image of the video to be processed is identified by using a pre-trained target neural network, and the target neural network is a neural network which is trained by using a plurality of sample images and can be used for identifying a plurality of types of target objects.
8. The method of claim 7, wherein the target neural network comprises a shared network and a plurality of branch networks, each branch network being configured to identify a type of target object;
identifying at least one type of target object in each frame of video image of the video to be processed by using a pre-trained target neural network, wherein the method comprises the following steps:
performing video decoding processing on the video to be processed through the shared network in the target neural network to obtain each frame of video image corresponding to the video to be processed; carrying out continuous multiple times of down-sampling processing and up-sampling processing on the video image aiming at each frame of the video image;
determining, by a plurality of branch networks in the target neural network, at least one type of target object included in the video image based on a result of the sampling process.
9. The method according to claim 8, wherein the performing down-sampling processing and up-sampling processing on the video image for each frame a plurality of times includes:
for each frame of video image, carrying out continuous multiple downsampling processing on the video image to respectively obtain image characteristic information corresponding to each downsampling processing; the input information of the next downsampling processing in the downsampling processing of a plurality of times is the image characteristic information obtained by the previous downsampling processing, wherein the input information of the first downsampling processing is the video image;
and performing continuous up-sampling processing on the image characteristic information obtained by the last down-sampling processing for multiple times to respectively obtain initial category information corresponding to each up-sampling processing and initial detection frame information corresponding to the initial category information.
10. The method of claim 9, wherein determining at least one type of target object included in the video image based on the result of the sampling process through a plurality of branch networks in the target neural network comprises:
performing continuous feature extraction on initial category information corresponding to each upsampling process for multiple times by using a target branch network matched with the initial category information in the plurality of branch networks to obtain target category information, and performing continuous feature extraction on initial detection frame information corresponding to the upsampling process for multiple times to obtain target detection frame information;
and determining at least one type of target object included in the video image based on the target class information and the target detection frame information corresponding to each obtained upsampling process.
11. The method according to claim 10, wherein the determining at least one type of target object included in the video image based on the target class information and the target detection frame information corresponding to each obtained upsampling process comprises:
determining the position information of each target object corresponding to each upsampling process based on the target class information and the target detection frame information corresponding to each upsampling process;
determining whether positions of a plurality of target objects corresponding to a plurality of times of upsampling processing are overlapped or not based on the position information;
in response to the position overlapping, determining confidence degrees corresponding to a plurality of target objects with overlapped positions respectively, and taking the target object with the highest confidence degree as a final target object;
and taking each determined final target object and each target object without position overlapping as target objects in the video image.
12. A video processing apparatus, comprising:
the acquisition module is used for acquiring a video to be processed;
the identification module is used for identifying at least one type of target object in each frame of video image of the video to be processed, and each type of target object is related to personal information;
the processing module is used for carrying out fuzzy processing on the identified target object in the video image to obtain a target image;
and the generating module is used for generating a target video after the blurring processing based on the target images respectively corresponding to the video images of each frame.
13. A computer device, comprising: a processor, a memory storing machine-readable instructions executable by the processor, the processor for executing the machine-readable instructions stored in the memory, the processor performing the steps of the video processing method according to any one of claims 1 to 11 when the machine-readable instructions are executed by the processor.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the video processing method according to any one of claims 1 to 11.