CN113573096A - Video processing method, video processing device, electronic equipment and medium - Google Patents
- Publication number: CN113573096A (application CN202110757078.8A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/233—Processing of audio elementary streams (under H04N21/23, processing of content by servers)
- H04N21/431—Generation of visual interfaces for content selection or interaction; content or additional data rendering (under H04N21/43, processing of content by client devices)
- H04N21/439—Processing of audio elementary streams (under H04N21/43, processing of content by client devices)
Abstract
The application discloses a video processing method, a video processing device, electronic equipment and a medium, belonging to the technical field of camera shooting. The method comprises the following steps: displaying a sound control identifier of at least one video object in a target video; receiving a first input of a user to a first sound control identifier, wherein the first sound control identifier is the sound control identifier of a first video object in the at least one video object; and in response to the first input, performing a first process on the sound data of the first video object.
Description
Technical Field
The application belongs to the technical field of camera shooting, and particularly relates to a video processing method and device, electronic equipment and a medium.
Background
With the continuous development of electronic technology, electronic devices such as mobile phones have more and more functions.
However, in the related art, when a user shoots or records a video through an electronic device such as a mobile phone, all sounds that the device can pick up are recorded into the video. As a result, the recorded video may have noisy sound, and a video containing only the sound the user wants cannot be obtained.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video processing method, an apparatus, an electronic device, and a medium, which can solve the problem that a video containing only the sound required by a user cannot be obtained when recording a video with an electronic device such as a mobile phone.
In a first aspect, an embodiment of the present application provides a video processing method, where the method includes: displaying a sound control identifier of at least one video object in a target video; receiving a first input of a user to a first sound control identifier, wherein the first sound control identifier is the sound control identifier of a first video object in the at least one video object; and in response to the first input, performing a first process on the sound data of the first video object.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including: the display module is used for displaying the sound control identification of at least one video object in the target video; the receiving module is used for receiving a first input of a first sound control identifier from a user, wherein the first sound control identifier is the sound control identifier of a first video object in the at least one video object; a response module to perform a first process on sound data of the first video object in response to the first input.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiments of the application, the electronic device receives a user input on the sound control identifier of a video object displayed in the target video and, in response, processes the sound data of that video object. In other words, by operating the sound control identifier of any video object in the target video, the user can process that object's sound data and obtain a video with the sound he or she requires, thereby meeting user needs.
Drawings
Fig. 1 is a schematic flowchart of a video processing method according to an embodiment of the present application.
Fig. 2a is a schematic view of an album playing interface provided in the embodiment of the present application.
Fig. 2b is a schematic view of a video playing interface provided in the embodiment of the present application.
Fig. 2c is a second schematic view of a video playing interface provided in the embodiment of the present application.
Fig. 2d is a third schematic view of a video playing interface provided in the embodiment of the present application.
Fig. 2e is a fourth schematic view of a video playing interface provided in the embodiment of the present application.
Fig. 2f is a fifth schematic view of a video playing interface provided in the embodiment of the present application.
Fig. 3 is a second flowchart of a video processing method according to an embodiment of the present application.
Fig. 4a is a schematic view of a video recording interface according to an embodiment of the present application.
Fig. 4b is a second schematic view of a video recording interface according to an embodiment of the present application.
Fig. 4c is a third schematic view of a video recording interface according to an embodiment of the present application.
Fig. 4d is a fourth schematic view of a video recording interface according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 7 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and are not necessarily used to describe a particular sequence or chronological order. It should be appreciated that the data so used may be interchanged under appropriate circumstances so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "first", "second", and the like do not limit the number of objects; for example, the first object may be one object or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding objects.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, a flow chart of a video processing method 100 provided in an exemplary embodiment of the present application is schematically illustrated, where the method 100 may be executed by, but is not limited to, an electronic device, and specifically may be executed by software and/or hardware installed in the electronic device. The method 100 may include at least the following steps.
Step 110, displaying a sound control identifier of at least one video object in a target video.
Depending on the video processing scene, such as a pre-recording scene, a video playing scene, or a post-editing scene, the target video may be a real-time video captured while the user shoots or records through an electronic device such as a mobile phone, a video that has been saved after shooting or recording (for example, in an album), or a video downloaded by the user from a website or the like.
For example, assuming that the target video is a video saved in an album, as shown in fig. 2a, the user may open the album, select the target video that needs processing, such as the first video shown in fig. 2a, and click "sound edit" shown in fig. 2b; the sound control identifier of at least one video object in the target video is then displayed on the video playing interface, as shown in fig. 2b.
It should be noted that the identifiers (including the sound control identifier, the interception identifier, the sounding identifier, etc.) mentioned in this embodiment may be words, symbols, images, etc. for indicating information (such as video object information), and a control or other container may be used as a carrier for displaying information, including but not limited to a word identifier, a symbol identifier, an image identifier (such as the trumpet shown in fig. 2 b), etc. In the case that the sound control identifier is an image identifier, the image identifier may be generated according to a video object included in the target video, for example, please refer to fig. 2a again, when the video object is a puppy, the sound control identifier may be a cartoon image of the puppy generated according to an image of the puppy, and when the video object is a kitten, the sound control identifier may be a cartoon image of the kitten generated according to an image of the kitten.
In addition, in this embodiment, each video object may be provided with its own sound control identifier, and each sound control identifier is used for individually controlling the corresponding video object. Optionally, for a target video including multiple video objects, the sound control identifiers corresponding to the video objects may be the same (as shown in fig. 2b) or different, which is not limited herein.
Of course, referring to fig. 2b again, the video objects are objects capable of producing sound in the target video, such as the kitten (cat) 201, the puppy (dog) 202, and the child (kid) 203 shown in fig. 2a. Considering that the sounds produced by different video objects (such as a puppy, a kitten, a child, a vehicle, a bird, etc.) differ, the electronic device may perform sound separation processing on the sound data of all video objects included in the target video before displaying the sound control identifier of at least one video object, and generate the sound control identifier of each video object based on the sound data obtained after the separation. This enables the user to process, according to his or her own needs, any separated sound data that does not meet those needs; meanwhile, by generating and displaying the sound control identifiers, the user can quickly, intuitively, and accurately select the sound data of the video object to be processed.
In one implementation, the electronic device may separate the sounds of the video objects included in the target video based on an Artificial Intelligence (AI) algorithm, in combination with at least one of the timbre (tone), the volume (loudness), and the pitch (frequency of the sound) of different video objects, so as to obtain the sound data corresponding to each video object.
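By way of illustration only, the following Python sketch approximates this idea by clustering audio frames according to the three cues named above (pitch, volume, timbre). The function name, the feature choices, and the use of k-means are assumptions made for the sketch; the application does not prescribe a concrete algorithm, and a production system would use a trained source-separation model instead.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def separate_by_voice_features(audio_path, n_objects):
    """Group audio frames by pitch, volume, and timbre features, as a rough
    stand-in for the AI separation described in the text."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)

    # Frame-level features matching the three cues named in the text:
    f0 = librosa.yin(y, fmin=65, fmax=2000)             # pitch per frame
    rms = librosa.feature.rms(y=y)[0]                   # volume per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # timbre per frame

    n = min(len(f0), len(rms), mfcc.shape[1])
    feats = np.column_stack([f0[:n], rms[:n], mfcc[:, :n].T])
    feats = StandardScaler().fit_transform(feats)  # equalize feature scales

    # One cluster per expected video object; a real system would replace
    # k-means with a trained source-separation network.
    labels = KMeans(n_clusters=n_objects, n_init=10).fit_predict(feats)
    return labels  # frame index -> video object assignment
```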
In another implementation, when the sound separation processing is performed, the sound control identifier of each video object may further be associated with that video object's sound production starting time point. Then, during playback of the target video, for a target video object in the target video, the sound control identifier of the target video object is displayed in a target area of the video playing interface when the video playing time point reaches the sound production starting time point of the target video object. The target area is an identifier display area associated with the target video object, and the target video object is any video object of the target video; it can be understood that the identifier display area may be set by the user or otherwise, which is not limited herein.
In this implementation, the sound control identifier and the sound production starting time point of each video object are associated; that is, the sound control identifiers of different video objects are anchored at the positions where their sounds appear in the target video. For example, if a bird song appears at the 10th minute of the target video (or from minute 10 to minute 20), the sound control identifier corresponding to the bird song can be anchored at the 10th minute. In this way, during playback, the sound control identifier of a video object is displayed only when that object starts to produce sound, while the identifiers of unvoiced video objects are not displayed; the user can thus process sound data based on the displayed identifiers, and the video playing interface remains uncluttered.
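A minimal sketch of this association, assuming a hypothetical registry that records each video object's sounding interval (the object names and times below are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class SoundControlIdentifier:
    object_name: str        # e.g. "bird"
    onset_seconds: float    # sound production starting time point
    offset_seconds: float   # end of the sounding interval

# Hypothetical registry built during sound separation.
identifiers = [
    SoundControlIdentifier("dog", 3.0, 45.0),
    SoundControlIdentifier("bird", 600.0, 1200.0),  # bird song, minute 10 to 20
]

def identifiers_to_display(playback_seconds):
    """Return only the identifiers whose video objects are sounding at the
    current playback point, keeping the playing interface uncluttered."""
    return [i for i in identifiers
            if i.onset_seconds <= playback_seconds <= i.offset_seconds]
```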
It should be noted that, besides displaying sound control identifiers based on the above association, the sound control identifiers of all video objects included in the target video may be displayed on the video playing interface in the form of a list or the like throughout playback. For example, the identifiers corresponding to the puppy, the kitten, and the child may be displayed at all times, so that the user can compare different sound data and process whichever sound data needs processing at any time, achieving efficient processing of the sound data of different video objects.
Step 120, receiving a first input of a user to a first sound control identifier.
The first sound control identifier is the sound control identifier of a first video object in the at least one video object. It is to be understood that the first video object may be any one of the at least one video object, such as the puppy or the kitten shown in fig. 2b.
Step 130, in response to the first input, performing a first process on the sound data of the first video object.
The first processing may include at least one of playing, deleting, segmenting, storing, filtering, special effects processing, and the like. In this embodiment, "playing" may be understood as playing the sound data of the first video object separately or repeatedly; "deleting" as deleting the sound data of the first video object; "segmenting" as dividing the sound data of the first video object (for example, chronologically); "storing" as saving the sound data of the first video object; "filtering" as filtering out the sound data of the first video object, or filtering out all sound data other than that of the first video object; and "special effects processing" as applying effects such as voice changing, sound sharpening, and volume adjustment to the sound data of the first video object.
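For illustration, a minimal sketch of such a dispatch over a mono sound-data array; the function name and the concrete effect used for "special_effect" are assumptions, and playing and storing are omitted because they are I/O operations:

```python
import numpy as np

def first_process(sound, sr, operation, start=0.0, end=None):
    """Apply one of the first-processing options named above to a mono
    sound-data array; `start`/`end` select the affected span in seconds."""
    a = int(start * sr)
    b = len(sound) if end is None else int(end * sr)
    if operation == "delete":
        return np.concatenate([sound[:a], sound[b:]])  # drop the span
    if operation == "segment":
        return sound[a:b]                              # keep only the span
    if operation == "filter":
        out = sound.copy()
        out[a:b] = 0.0                                 # silence the span
        return out
    if operation == "special_effect":
        out = sound.copy()                             # simple volume boost
        out[a:b] = np.clip(out[a:b] * 1.5, -1.0, 1.0)
        return out
    raise ValueError(f"unsupported first process: {operation}")
```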
In one implementation, assuming that the first process is playing, the user may click different sound control identifiers before or during video playing to lock one or more first video objects, that is, to control whether the sound of those video objects is played during playback. For example, the user may click the sound control identifiers corresponding to the child and the puppy shown in fig. 2b in the order of 1 and 2 to select or lock their sound data, and then click to play the target video, so that the sound played in the video includes only the sounds of the child and the puppy, or only the sound of the puppy.
Of course, for the foregoing steps 120 and 130, the processing of the sound data of the first video object may differ according to the first input. For example, the first input may include a plurality of sub-inputs, or may be a long press, a click, a slide, or the like on the first sound control identifier; accordingly, the sound data may be processed based on a sound track, or directly based on the sound control identifier. For ease of understanding, steps 120 and 130 are described below with reference to example 1 and example 2.
Example 1
Assuming that the first input includes a first sub-input and a second sub-input, the electronic device may receive the user's first sub-input on the first sound control identifier; display, in response to the first sub-input, a first sound track of the first video object; receive the user's second sub-input on the first sound track; and perform, in response to the second sub-input, the first process on the sound data of the first video object. The user can thus selectively process the sound data of the video object directly on its sound track, for example processing only part of the sound data, achieving accurate processing.
The first sub-input may be a click, long-press, or slide sub-input of the user on the first sound control identifier; correspondingly, the second sub-input may also be a click, long-press, or slide sub-input of the user on the first sound track, which is not limited herein.
As a possible implementation, the first sub-input is a slide sub-input, and the interface displaying the sound control identifiers includes a first area (e.g., "1" shown in fig. 2c) and a second area (e.g., "2" shown in fig. 2c). The first area is used for displaying the video image of the target video and at least one sound control identifier; the second area is used for editing the sound data of at least one video object in the target video. In this case, the electronic device may receive the user's slide sub-input on the first sound control identifier and display the first sound track of the first video object when the slide starts on the first sound control identifier and ends in the second area. By separating the display of the target video and its sound control identifiers from the editing of sound data into two areas, the user can edit sound data in the second area while the target video and the identifiers remain displayed, improving the efficiency of processing the sound in the video.
For example, referring to fig. 2d, when the user needs to process the child's sound data, the user may take the sound control identifier corresponding to the child as the sliding start position and any position in the second area as the sliding end position, and perform the slide sub-input on that identifier. Then, as shown in fig. 2e, the electronic device responds to the slide sub-input by displaying the sound track corresponding to the child's sound data in the second area, so that the user can finally process the child's sound data based on the sound track, for example by deleting, playing, segmenting, or applying special effects.
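A minimal hit-testing sketch of this slide rule, with a hypothetical screen layout (all rectangles and coordinates below are assumptions for the sketch):

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return (self.x <= px <= self.x + self.w
                and self.y <= py <= self.y + self.h)

# Hypothetical layout: first area (video + identifiers) above,
# second area (sound editing) below.
SECOND_AREA = Rect(0, 1400, 1080, 520)
IDENTIFIER_RECTS = {
    "kid": Rect(40, 1280, 96, 96),
    "dog": Rect(160, 1280, 96, 96),
}

def track_to_display(slide_start, slide_end):
    """Display an object's sound track only when the slide sub-input starts
    on that object's sound control identifier and ends in the second area."""
    if not SECOND_AREA.contains(*slide_end):
        return None
    for name, rect in IDENTIFIER_RECTS.items():
        if rect.contains(*slide_start):
            return name  # show this object's first sound track
    return None
```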
In this implementation, to further improve the accuracy and convenience of sound processing, the first sound track may include a first interception identifier and a second interception identifier, and the second area may further include at least one sound data processing control. In this case, the electronic device may receive the user's second sub-input on the first interception identifier, the second interception identifier, and a target sound data processing control; determine the sound fragment to be processed based on the display positions of the first and second interception identifiers; and perform, on the sound fragment to be processed, the first process associated with the target sound data processing control.
For example, referring to fig. 2e again, the first interception identifier may be "3" and the second interception identifier may be "4" as shown in fig. 2e. The two interception identifiers may take, but are not limited to, the forms shown in fig. 2c or fig. 2d; for example, they may be cartoon identifiers, letter identifiers, and the like.
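Sketch of mapping the two interception identifiers to the to-be-processed fragment, reusing the hypothetical `first_process` helper from the earlier sketch; the mark positions are assumed to have already been converted from screen pixels to seconds on the track's timeline:

```python
def fragment_between_marks(track, sr, first_mark_s, second_mark_s):
    """Turn the display positions of the two interception identifiers into
    the to-be-processed sound fragment; mark order does not matter."""
    t0, t1 = sorted((first_mark_s, second_mark_s))
    return track[int(t0 * sr):int(t1 * sr)]

# e.g. drag the marks to 12.0 s and 18.5 s, then apply the chosen control:
# fragment = fragment_between_marks(kid_track, sr, 12.0, 18.5)
# processed = first_process(fragment, sr, "special_effect")
```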
In example 1, based on the user's input on the sound track, efficient and accurate processing of the sound data of a video object in the target video is achieved, so that the sound data in the processed video meets the user's requirements.
Example 2
Assume that the first input is a long press, click, slide, or the like on the first sound control identifier. In this case, the electronic device may obtain the input feature of the first input and perform, on the sound data of the first video object, the first process associated with that input feature, where different input features are associated with different sound data processing.
It can be understood that the input feature of the first input may be a double click, single click, long press, slide, or the like on the sound control identifier. Accordingly, before video processing, different input features may be associated with different sound data processing manners so that sound data can be processed quickly through the sound control identifier. For example, if the processing manners include deletion, saving, playing, and special effects processing, "double click" may be associated with deletion, "single click" with saving, "long press" with playing, and "slide" with special effects processing. Then, when the user double-clicks a sound control identifier (that is, the acquired input feature of the first input is a double click), the corresponding sound data is deleted; a single click saves it; a long press plays it; and a slide applies special effects processing to it.
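A sketch of this association table and its dispatch, using the example pairing given above (the table contents and the callback shape are assumptions):

```python
# Hypothetical association between input features and processing manners;
# a real implementation would register these via the UI framework's
# gesture callbacks.
INPUT_FEATURE_ACTIONS = {
    "double_click": "delete",
    "single_click": "store",
    "long_press":   "play",
    "slide":        "special_effect",
}

def on_identifier_input(input_feature, video_object, dispatch):
    """Acquire the input feature of the first input and run the sound data
    processing associated with it; unknown features are ignored."""
    action = INPUT_FEATURE_ACTIONS.get(input_feature)
    if action is not None:
        dispatch(action, video_object)

# usage:
# on_identifier_input("long_press", "dog",
#                     lambda act, obj: print(act, obj))
```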
In example 2, based on the user's input on the sound control identifier, the sound data of a video object in the target video is processed according to the processing manner associated with the input feature of that input, achieving efficient and quick processing so that the sound data in the processed video meets the user's requirements.
It should be noted that, in this embodiment, video processing may also combine examples 1 and 2. For example, after the user clicks "sound edit" shown in fig. 2b, the mobile phone jumps from the current interface to a video playing interface including the first area and the second area shown in fig. 2c. The user can control the sound corresponding to any sound label in the played target video by clicking its sound control identifier. For example, as shown in fig. 2f, the user may click the sound control identifiers corresponding to the puppy and the child in the order of 1 and 2 to select or lock their sound data, and then click to play the target video, so that the played sound includes only the sounds of the child and the puppy, or only the sound of the kitten.
If the user needs to further edit the sound during or after playing the target video, the sound label corresponding to the sound to be edited may be moved to the second area for editing, as shown in fig. 2d. For a sound label moved to the second area: if the sound is not needed, its sound control identifier can be deleted directly; if the sound needs to be kept, click save; if the sound requires special effects processing, as shown in fig. 2e, the sound segment to be edited is determined by dragging the timeline (i.e., dragging the first interception identifier and/or the second interception identifier), and special effects such as voice changing and sharpening are then applied to the selected segment.
In the video processing method provided by this embodiment, the electronic device receives a user input on the sound control identifier of a video object displayed in the target video and, in response, processes the sound data of that video object. In other words, by operating the sound control identifier of any video object in the target video, the user can process that object's sound data and obtain a video with the sound he or she requires, thereby meeting user needs.
As shown in fig. 3, a flow chart of a video processing method 300 according to an exemplary embodiment of the present application is provided, where the method 300 may be executed by, but not limited to, an electronic device, and specifically may be executed by software and/or hardware installed in the electronic device. The method 300 may include at least the following steps.
Step 310, receiving a second input of the user on the video recording interface.
The second input may be a single click, a double click, a long press, etc., which is not limited herein.
In one implementation, prior to video recording, the user may click on a designated button on the video recording interface, such as an automatic speech recognition (ACV) button shown in fig. 4 a.
Step 320, in response to the second input, identifying the sounding video objects in the video recording interface and displaying at least one sounding identifier, each sounding identifier indicating one sounding video object.
The sounding identifier is used to mark a recognized video object so that the user can select that video object based on the identifier. For example, referring to fig. 4b, the sounding identifier may be a box displayed around the video object; it is understood that the box may be replaced by a dashed box, a circle, or the like, which is not limited herein.
In one implementation, referring to fig. 4b, after the mobile phone responds to the user's second input on the ACV button, the ACV function is turned on and the video objects in the shooting or recording scene are detected and identified through scene detection technology. As shown in fig. 4b, the identified sounding video objects (e.g., puppy, kitten, child) are displayed on the video recording interface, and a sounding identifier is displayed around each of them, as shown by box 5 in fig. 4b.
Optionally, to help the user distinguish whether a video object is a kitten or a puppy, in this embodiment a text description may also be added to the video object displayed on the video recording interface, for example an object name such as "puppy" or "kitten".
Step 330, receiving a third input of the user to a second video object.
The second video object is a video object indicated by one of the at least one sounding identifier. The third input may be a single click, a double click, a long press, or the like, which is not limited herein.
For example, assuming that the identified sounding video objects in the video recording scene include a puppy, a kitten, and a child as shown in fig. 4b, and the user only needs the sound of the puppy, a third input may be made on the puppy's sounding identifier as shown in fig. 4c, such as long-pressing it, so that the puppy is taken as (or locked to) the second video object.
In addition, the user may select or lock a combination of several different types of sounds; that is, the user may perform the third input on one or more second video objects simultaneously to lock several different types of sound data, and the subsequent second process, such as at least one of playing, deleting, segmenting, storing, filtering, and special effects processing, is then performed on the combined sound data. For example, the user may perform the third input on both the kitten and the puppy shown in fig. 4d to lock two different types of sound data.
Step 340, in response to the third input, performing a second process on the sound data of a target video object of the target video in the process of recording the target video.
The target video object is the second video object, or a video object other than the second video object in the at least one video object; the second process includes at least one of playing, deleting, segmenting, storing, filtering, and special effects processing.
For example, if the target video the user needs to record should contain only the puppy's sound data, the puppy is locked as the second video object through steps 310 to 330. On this basis, during video recording or shooting, the electronic device may separate the sound data in the captured target video in real time and delete all data other than the puppy's sound data. The sound data of the target video is thus processed in real time during recording, so that the resulting target video better meets the user's needs, i.e., a video with the sound the user requires.
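A per-frame sketch of this real-time second processing, assuming a hypothetical `separate_fn` that stands in for the AI separation model (its return shape is an assumption for the sketch):

```python
import numpy as np

LOCKED_OBJECTS = {"dog"}  # second video objects locked via the third input

def process_frame_while_recording(frame, separate_fn):
    """Separate one captured audio frame by video object and keep only the
    locked objects' sound; `separate_fn` is assumed to return a
    {object_name: component_array} mapping for the frame."""
    components = separate_fn(frame)
    kept = [c for name, c in components.items() if name in LOCKED_OBJECTS]
    if not kept:
        return np.zeros_like(frame)  # everything but locked sounds is deleted
    return np.sum(kept, axis=0)
```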
It is understood that the implementation of sound data separation can refer to the related description of the sound separation processing in method embodiment 100, and is not repeated here to avoid repetition.
It should be noted that, in this embodiment, through steps 310 to 340, the video object to be processed can be locked in advance, before recording or shooting, through video object detection and identification, so that the sound data of the locked video object is processed (e.g., filtered) in real time during recording or shooting, and the sound data contained in the recorded target video better meets the user's needs.
The implementation processes of steps 350 to 370 may refer to the related descriptions in method embodiment 100 and, to avoid repetition, are not described here again. In addition, in one implementation, the video processing procedure may include only some of the foregoing steps 310 to 370, for example only steps 310 to 340, which is not limited in this application.
In this embodiment, the sound data in the target video is processed in two stages (pre-processing while recording the target video and post-processing after recording is completed). This further solves the problem that, during shooting or recording, all sounds picked up by an electronic device such as a mobile phone are recorded into the target video, making the video sound noisy, and yields a video with the sound the user requires, thereby meeting user needs.
It should be noted that, in the video processing method provided in the embodiment of the present application, the execution subject may be a video processing apparatus, or a control module in the video processing apparatus for executing the video processing method. In the embodiment of the present application, a video processing apparatus executing a video processing method is taken as an example, and the video processing apparatus provided in the embodiment of the present application is described.
As shown in fig. 5, a schematic structural diagram of a video processing apparatus 500 according to an exemplary embodiment of the present application is provided, where the apparatus 500 includes: a display module 510, configured to display a sound control identifier of at least one video object in a target video; a receiving module 520, configured to receive a first input of a first voice control identifier from a user, where the first voice control identifier is a voice control identifier of a first video object in the at least one video object; a response module 530 for performing a first process on the sound data of the first video object in response to the first input.
In one possible implementation, the first processing includes at least one of: playing, deleting, segmenting, storing, filtering, and special effects processing.
In another possible implementation, the receiving module 520 is configured to receive a first sub-input of the user to the first sound control identifier; the display module 510 is configured to display, in response to the first sub-input, a first sound track of the first video object; the receiving module 520 is configured to receive a second sub-input of the user on the first sound track; and the response module 530 is configured to perform, in response to the second sub-input, a first process on the sound data of the first video object.
In another possible implementation manner, the interface for displaying the sound control identifier includes a first area and a second area, where the first area is used for displaying a video image of the target video and at least one sound control identifier; the second area is used for editing sound data of at least one video object in the target video; the receiving module 520 is configured to receive a sliding sub-input of a user on the first voice control identifier; the display module 510 is configured to display the first sound track of the first video object when the sliding start position of the sliding sub-input is located on the first sound control identifier and the sliding end position of the sliding sub-input is located in the second area.
In another possible implementation, the first sound track includes a first interception identifier and a second interception identifier; the second area further includes at least one sound data processing control; the receiving module 520 is configured to receive a second sub-input of the user to the first interception identifier, the second interception identifier, and a target sound data processing control; and the response module 530 is configured to determine the sound fragment to be processed based on the display positions of the first and second interception identifiers, and to perform, on the sound fragment to be processed, the first process associated with the target sound data processing control.
In another possible implementation manner, the response module 530 is configured to obtain an input characteristic of the first input; and performing a first process associated with an input feature of the first input on sound data of the first video object; wherein different input features are associated with different sound data processing.
In another possible implementation manner, the apparatus further includes: the sound separation module is used for carrying out sound separation processing on sound data of all video objects included in the target video; and generating a sound control identifier of each video object based on the sound data obtained after sound separation.
In another possible implementation, the sound control identifier of each video object is associated with the sound production starting time point of each video object; the display module 510 is further configured to, in a target video playing process, for a target video object in the target video, display a sound control identifier of the target video object in a target area of a video playing interface of the target video under a condition that a video playing time point of the target video is the same as a sound production starting time point of the target video object; the target area is an identification display area associated with the target video object, and the target video object is any video object of the target video.
In another possible implementation, before the target video is recorded, the receiving module 520 is further configured to receive a second input of the user on the video recording interface; the response module 530 is further configured to, in response to the second input, identify the sounding video objects in the video recording interface and display at least one sounding identifier, each sounding identifier indicating one sounding video object; the receiving module 520 is further configured to receive a third input of the user to a second video object, where the second video object is a video object indicated by one of the at least one sounding identifier; and the response module 530 is further configured to, in response to the third input, perform a second process on the sound data of a target video object of the target video during recording; wherein the target video object is the second video object, or a video object other than the second video object in the at least one video object; and the second process includes at least one of: playing, deleting, segmenting, storing, filtering, and special effects processing.
The video processing apparatus 500 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The video processing apparatus 500 in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The video processing apparatus 500 provided in this embodiment of the application can implement each process implemented by the method embodiments of fig. 1 to fig. 4d, and is not described herein again to avoid repetition.
Optionally, as shown in fig. 6, an electronic device 600 is further provided in this embodiment of the present application, and includes a processor 601, a memory 602, and a program or an instruction stored in the memory 602 and executable on the processor 601, where the program or the instruction is executed by the processor 601 to implement each process of the above-mentioned video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, and a processor 710.
Those skilled in the art will appreciate that the electronic device 700 may further include a power source (e.g., a battery) for supplying power to the various components; the power source may be logically connected to the processor 710 through a power management system, thereby implementing functions such as managing charging, discharging, and power consumption. The structure shown in fig. 7 does not constitute a limitation of the electronic device, which may include more or fewer components than shown, combine some components, or arrange components differently; details are omitted here.
The display unit 706 is configured to display a sound control identifier of at least one video object in the target video; an input unit 704, configured to receive a first input of a first sound control identifier from a user, where the first sound control identifier is a sound control identifier of a first video object in the at least one video object; processor 710 is operative to perform a first process on sound data of the first video object responsive to the first input.
In one possible implementation, the first processing includes at least one of: playing, deleting, segmenting, storing, filtering, and special effects processing.
In another possible implementation, the input unit 704 is configured to receive a first sub-input of the user to the first sound control identifier; the processor 710 is configured to display, in response to the first sub-input, a first sound track of the first video object through the display unit 706; the input unit 704 is configured to receive a second sub-input of the user on the first sound track; and the processor 710 is configured to perform, in response to the second sub-input, a first process on the sound data of the first video object.
In another possible implementation, the interface for displaying the sound control identifiers includes a first area and a second area, where the first area is used for displaying the video image of the target video and at least one sound control identifier, and the second area is used for editing the sound data of at least one video object in the target video. The input unit 704 is configured to receive a slide sub-input of the user on the first sound control identifier, and the first sound track of the first video object is displayed through the display unit 706 when the slide start position is located on the first sound control identifier and the slide end position is located in the second area.
In another possible implementation, the first sound track includes a first interception identifier and a second interception identifier; the second area further includes at least one sound data processing control; the input unit 704 is further configured to receive a second sub-input of the user to the first interception identifier, the second interception identifier, and a target sound data processing control; and the processor 710 is configured to determine the sound fragment to be processed based on the display positions of the first and second interception identifiers, and to perform, on the sound fragment to be processed, the first process associated with the target sound data processing control.
In another possible implementation, the processor 710 is further configured to obtain the input feature of the first input, and to perform, on the sound data of the first video object, the first process associated with that input feature, where different input features are associated with different sound data processing.
In another possible implementation, the processor 710 is further configured to perform sound separation processing on the sound data of all video objects included in the target video, and to generate the sound control identifier of each video object based on the sound data obtained after separation.
In another possible implementation, the sound control identifier of each video object is associated with that video object's sound production starting time point. During playback of the target video, for a target video object in the target video, the processor 710 displays, through the display unit 706, the sound control identifier of the target video object in a target area of the video playing interface when the video playing time point of the target video is the same as the sound production starting time point of the target video object; the target area is an identifier display area associated with the target video object, and the target video object is any video object of the target video.
In another possible implementation, before the target video is recorded, the input unit 704 is further configured to receive a second input of the user on the video recording interface; the processor 710 is further configured to, in response to the second input, identify the sounding video objects in the video recording interface and display at least one sounding identifier through the display unit 706, each sounding identifier indicating one sounding video object; the input unit 704 is further configured to receive a third input of the user to a second video object, where the second video object is a video object indicated by one of the at least one sounding identifier; and the processor 710 is further configured to, in response to the third input, perform a second process on the sound data of a target video object of the target video during recording; wherein the target video object is the second video object, or a video object other than the second video object in the at least one video object; and the second process includes at least one of: playing, deleting, segmenting, storing, filtering, and special effects processing.
It should be understood that, when the processor 710 in the embodiment implements any implementation manner, the implementation process thereof may refer to the relevant description in the method embodiment 100, and in order to avoid repetition, details are not described here.
In addition, in the embodiment of the present application, the input unit 704 may include a Graphics Processing Unit (GPU) 7041 and a microphone 7042; the graphics processor 7041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071, also referred to as a touch screen, may include a touch detection device and a touch controller. The other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 709 may be used to store software programs and various data, including but not limited to applications and an operating system. The processor 710 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may not be integrated into the processor 710.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above video processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (20)
1. A method of video processing, the method comprising:
displaying a sound control identifier of at least one video object in a target video;
receiving a first input of a first sound control identifier from a user, wherein the first sound control identifier is a sound control identifier of a first video object in the at least one video object;
in response to the first input, performing a first process on sound data of the first video object.
2. The method of claim 1, wherein the first processing comprises at least one of: playing, deleting, segmenting, storing, filtering, and special effects processing.
3. The method of claim 1, wherein the receiving a first input from a user on a first sound control identifier comprises:
receiving a first sub-input from the user on the first sound control identifier;
displaying a first sound track of the first video object in response to the first sub-input; and
receiving a second sub-input from the user on the first sound track;
and wherein the performing, in response to the first input, first processing on sound data of the first video object comprises:
performing first processing on the sound data of the first video object in response to the second sub-input.
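A sketch of claim 3's two-stage interaction, under the same hypothetical assumptions as above: the first sub-input reveals the object's sound track, and only then can a second sub-input trigger the processing. The controller class below is illustrative, not from the patent.

```python
# Hypothetical two-stage input handling for claim 3.
from typing import Callable, List

class SoundTrackController:
    def __init__(self, sound_data: List[float]):
        self.sound_data = sound_data
        self.track_visible = False   # whether the first sound track is displayed

    def on_first_sub_input(self) -> None:
        # "displaying a first sound track of the first video object"
        self.track_visible = True

    def on_second_sub_input(
        self, first_processing: Callable[[List[float]], List[float]]
    ) -> None:
        # the second sub-input only applies once the track is shown
        if self.track_visible:
            self.sound_data = first_processing(self.sound_data)
```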
4. The method of claim 3, wherein an interface for displaying the sound control identifier comprises a first area and a second area, the first area being used for displaying a video image of the target video and the at least one sound control identifier, and the second area being used for editing sound data of at least one video object in the target video;
wherein the receiving a first sub-input from the user on the first sound control identifier comprises:
receiving a sliding sub-input from the user on the first sound control identifier;
and the displaying a first sound track of the first video object comprises:
displaying the first sound track of the first video object in a case that a sliding start position of the sliding sub-input is located on the first sound control identifier and a sliding end position of the sliding sub-input is located in the second area.
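Claim 4's gesture condition is a simple hit test: the slide must begin on the identifier and end inside the editing area. A hypothetical sketch (Rect, Point, and the function name are illustrative):

```python
# Hypothetical hit test for claim 4's sliding sub-input.
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, p: Point) -> bool:
        return self.left <= p.x <= self.right and self.top <= p.y <= self.bottom

def should_show_sound_track(identifier_bounds: Rect, second_area: Rect,
                            slide_start: Point, slide_end: Point) -> bool:
    # slide starts on the first sound control identifier, ends in the second area
    return identifier_bounds.contains(slide_start) and second_area.contains(slide_end)
```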
5. The method of claim 4, wherein the first sound track comprises a first clipping identifier and a second clipping identifier, and the second area further comprises at least one sound data processing control;
wherein the receiving a second sub-input from the user on the first sound track comprises:
receiving a second sub-input from the user on the first clipping identifier, the second clipping identifier, and a target sound data processing control;
and the performing, in response to the second sub-input, first processing on sound data of the first video object comprises:
determining a to-be-processed sound fragment based on display positions of the first clipping identifier and the second clipping identifier; and
performing, on the to-be-processed sound fragment, first processing associated with the target sound data processing control.
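Claim 5 implies a mapping from the two clipping identifiers' on-screen positions to a range of audio samples, followed by the operation bound to the chosen control. A hypothetical sketch, with crude stand-ins for claim 2's "delete" and "filter" operations:

```python
# Hypothetical position-to-fragment mapping and control dispatch for claim 5.
from typing import Callable, Dict, List

def fragment_from_clip_positions(samples: List[float], track_width_px: float,
                                 first_clip_x: float, second_clip_x: float) -> slice:
    # convert the clipping identifiers' display positions into a sample range
    left_px, right_px = sorted((first_clip_x, second_clip_x))
    start = int(left_px / track_width_px * len(samples))
    end = int(right_px / track_width_px * len(samples))
    return slice(start, end)

# one callable per sound data processing control (cf. claim 2's operations)
PROCESSING_CONTROLS: Dict[str, Callable[[List[float]], List[float]]] = {
    "delete": lambda frag: [0.0] * len(frag),        # silence the fragment
    "filter": lambda frag: [s * 0.5 for s in frag],  # crude attenuation stand-in
}

def apply_control(samples: List[float], frag: slice, control: str) -> List[float]:
    processed = PROCESSING_CONTROLS[control](samples[frag])
    return samples[:frag.start] + processed + samples[frag.stop:]
```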
6. The method of claim 1, wherein the performing first processing on the sound data of the first video object comprises:
acquiring input features of the first input; and
performing, on the sound data of the first video object, first processing associated with the input features of the first input;
wherein different input features are associated with different sound data processing operations.
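Claim 6 is naturally a dispatch table keyed on input features. A hypothetical sketch; the feature names and operations are illustrative, not specified by the claims:

```python
# Hypothetical input-feature dispatch for claim 6.
from typing import Callable, Dict, List

FEATURE_TO_PROCESSING: Dict[str, Callable[[List[float]], List[float]]] = {
    "tap": lambda s: s,                           # play: leave the data unchanged
    "long_press": lambda s: [0.0] * len(s),       # delete: silence the object
    "double_tap": lambda s: [x * 2.0 for x in s], # special effect: crude gain boost
}

def process_by_feature(sound_data: List[float], input_feature: str) -> List[float]:
    return FEATURE_TO_PROCESSING[input_feature](sound_data)
```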
7. The method of claim 1, wherein before the displaying a sound control identifier of at least one video object in the target video, the method further comprises:
performing sound separation processing on sound data of all video objects included in the target video; and
generating a sound control identifier for each video object based on the sound data obtained after the sound separation.
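The claims do not name a separation algorithm. As one possible stand-in, an off-the-shelf source separator such as Spleeter can split a video's audio into stems, from which per-object identifiers could then be derived. The sketch below assumes the spleeter package is installed and uses a hypothetical identifier-labeling scheme.

```python
# One possible separation back end for claim 7, using Spleeter's 2-stem model.
from spleeter.separator import Separator

def separate_and_label(audio_path: str, out_dir: str) -> list:
    separator = Separator("spleeter:2stems")       # vocals / accompaniment
    separator.separate_to_file(audio_path, out_dir)
    # derive one sound control identifier per separated stem (hypothetical scheme)
    return ["sound-control:" + stem for stem in ("vocals", "accompaniment")]
```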
8. The method of claim 7, wherein the sound control identifier of each video object is associated with a sounding start time point of that video object;
wherein the displaying a sound control identifier of at least one video object in the target video comprises:
during playback of the target video, for a target video object in the target video, displaying the sound control identifier of the target video object in a target area of a video playing interface of the target video in a case that the video playing time point of the target video is the same as the sounding start time point of the target video object;
wherein the target area is an identifier display area associated with the target video object, and the target video object is any video object of the target video.
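Claim 8's display condition can be read as a playhead check against each object's sounding start time. A hypothetical sketch; the 0.04 s frame tolerance is an assumption, since the claim gives "the same time point" no tolerance:

```python
# Hypothetical playhead check for claim 8's time-aligned identifier display.
from dataclasses import dataclass
from typing import List

@dataclass
class TrackedObject:
    name: str
    sounding_start_s: float     # sounding start time point of this object
    shown: bool = False

def update_identifiers(objects: List[TrackedObject], playhead_s: float) -> List[str]:
    to_display = []
    for obj in objects:
        # show the identifier once the playhead reaches the sounding start time
        if not obj.shown and abs(playhead_s - obj.sounding_start_s) < 0.04:
            obj.shown = True
            to_display.append(obj.name)
    return to_display
```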
9. The method of claim 1, wherein before recording the target video, the method further comprises:
receiving a second input from the user on a video recording interface;
in response to the second input, identifying video objects that are emitting sound in the video recording interface, and displaying at least one sound-emitting identifier, each sound-emitting identifier indicating one sound-emitting video object;
receiving a third input from the user on a second video object, wherein the second video object is a video object indicated by one of the at least one sound-emitting identifier; and
in response to the third input, performing second processing on sound data of a target video object of the target video during recording of the target video;
wherein the target video object is the second video object or a video object other than the second video object among the at least one video object, and the second processing comprises at least one of: playing, deleting, segmenting, storing, filtering, and special-effect processing.
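Claim 9 processes sound during capture rather than after it. Assuming audio frames arrive already attributed to a source (for example, by the separation step of claim 7), a hypothetical per-frame hook might look like this:

```python
# Hypothetical per-frame hook for claim 9's recording-time processing.
from typing import Iterable, Iterator, Tuple

def record_with_muting(frames: Iterable[Tuple[str, bytes]],
                       muted_object: str) -> Iterator[Tuple[str, bytes]]:
    for source, pcm in frames:
        if source == muted_object:
            pcm = b"\x00" * len(pcm)   # second processing: delete (mute) this source
        yield source, pcm

# usage: frames tagged e.g. ("cat", b"...") / ("person", b"...") during capture
```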
10. A video processing apparatus, characterized in that the apparatus comprises:
a display module, configured to display a sound control identifier of at least one video object in a target video;
a receiving module, configured to receive a first input from a user on a first sound control identifier, wherein the first sound control identifier is the sound control identifier of a first video object in the at least one video object; and
a response module, configured to perform first processing on sound data of the first video object in response to the first input.
11. The apparatus of claim 10, wherein the first processing comprises at least one of: playing, deleting, segmenting, storing, filtering, and special-effect processing.
12. The apparatus of claim 10, wherein:
the receiving module is configured to receive a first sub-input from the user on the first sound control identifier;
the display module is configured to display a first sound track of the first video object in response to the first sub-input;
the receiving module is further configured to receive a second sub-input from the user on the first sound track; and
the response module is configured to perform first processing on the sound data of the first video object in response to the second sub-input.
13. The apparatus of claim 12, wherein an interface for displaying the sound control identifier comprises a first area and a second area, the first area being used for displaying a video image of the target video and the at least one sound control identifier, and the second area being used for editing sound data of at least one video object in the target video;
the receiving module is configured to receive a sliding sub-input from the user on the first sound control identifier; and
the display module is configured to display the first sound track of the first video object in a case that a sliding start position of the sliding sub-input is located on the first sound control identifier and a sliding end position of the sliding sub-input is located in the second area.
14. The apparatus of claim 13, wherein the first sound track comprises a first clipping identifier and a second clipping identifier, and the second area further comprises at least one sound data processing control;
the receiving module is configured to receive a second sub-input from the user on the first clipping identifier, the second clipping identifier, and a target sound data processing control; and
the response module is configured to determine a to-be-processed sound fragment based on display positions of the first clipping identifier and the second clipping identifier, and to perform, on the to-be-processed sound fragment, first processing associated with the target sound data processing control.
15. The apparatus of claim 10, wherein the response module is configured to acquire input features of the first input and to perform, on the sound data of the first video object, first processing associated with the input features of the first input, wherein different input features are associated with different sound data processing operations.
16. The apparatus of claim 10, further comprising:
a sound separation module, configured to perform sound separation processing on sound data of all video objects included in the target video, and to generate a sound control identifier for each video object based on the sound data obtained after the sound separation.
17. The apparatus of claim 16, wherein the sound control identifier of each video object is associated with a sounding start time point of that video object;
the display module is further configured to display, during playback of the target video, the sound control identifier of a target video object in the target video in a target area of a video playing interface of the target video in a case that the video playing time point of the target video is the same as the sounding start time point of the target video object;
wherein the target area is an identifier display area associated with the target video object, and the target video object is any video object of the target video.
18. The apparatus of claim 10, wherein the receiving module is further configured to receive, before the target video is recorded, a second input from the user on a video recording interface;
the response module is further configured to identify, in response to the second input, video objects that are emitting sound in the video recording interface, and to display at least one sound-emitting identifier, each sound-emitting identifier indicating one sound-emitting video object;
the receiving module is further configured to receive a third input from the user on a second video object, wherein the second video object is a video object indicated by one of the at least one sound-emitting identifier; and
the response module is further configured to perform, in response to the third input, second processing on sound data of a target video object of the target video during recording of the target video;
wherein the target video object is the second video object or a video object other than the second video object among the at least one video object, and the second processing comprises at least one of: playing, deleting, segmenting, storing, filtering, and special-effect processing.
19. An electronic device, comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the video processing method according to any one of claims 1 to 9.
20. A readable storage medium, having a program or instructions stored thereon, wherein the program or instructions, when executed by a processor, implement the steps of the video processing method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110757078.8A CN113573096A (en) | 2021-07-05 | 2021-07-05 | Video processing method, video processing device, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113573096A (en) | 2021-10-29 |
Family
ID=78163681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110757078.8A Pending CN113573096A (en) | 2021-07-05 | 2021-07-05 | Video processing method, video processing device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113573096A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100131089A1 (en) * | 2008-11-26 | 2010-05-27 | Mckesson Financial Holdings Limited | Apparatuses, methods and computer-readable storage mediums for browsing and selecting a multimedia object |
WO2015172630A1 (en) * | 2014-05-15 | 2015-11-19 | 努比亚技术有限公司 | Camera shooting device and focusing method therefor |
CN110377218A (en) * | 2019-06-26 | 2019-10-25 | 北京奇艺世纪科技有限公司 | Data processing method, device, computer equipment and storage medium |
CN111010608A (en) * | 2019-12-20 | 2020-04-14 | 维沃移动通信有限公司 | Video playing method and electronic equipment |
WO2021120190A1 (en) * | 2019-12-20 | 2021-06-24 | 深圳市欢太科技有限公司 | Data processing method and apparatus, electronic device, and storage medium |
CN111526242A (en) * | 2020-04-30 | 2020-08-11 | 维沃移动通信有限公司 | Audio processing method and device and electronic equipment |
CN112165591A (en) * | 2020-09-30 | 2021-01-01 | 联想(北京)有限公司 | Audio data processing method and device and electronic equipment |
CN112397065A (en) * | 2020-11-04 | 2021-02-23 | 深圳地平线机器人科技有限公司 | Voice interaction method and device, computer readable storage medium and electronic equipment |
CN112423081A (en) * | 2020-11-09 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Video data processing method, device and equipment and readable storage medium |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115209219A (en) * | 2022-07-19 | 2022-10-18 | 深圳市艾酷通信软件有限公司 | Video processing method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111526242B (en) | Audio processing method, apparatus and electronic device | |
US9594957B2 (en) | Apparatus and method for identifying a still image contained in moving image contents | |
CN108319723A (en) | A kind of picture sharing method and device, terminal, storage medium | |
CN107357787B (en) | Semantic interaction method and device and electronic equipment | |
CN109740530B (en) | Video segment extraction method, device, equipment and computer-readable storage medium | |
CN107885483B (en) | Verification method, device, storage medium and electronic device for audio information | |
CN107483445A (en) | A kind of silent Application on Voiceprint Recognition register method, device, server and storage medium | |
CN116017043B (en) | Video generation method, device, electronic equipment and storage medium | |
AU2023216770A1 (en) | Video segment selection and editing using transcript interactions | |
CN112269898A (en) | Background music obtaining method and device, electronic equipment and readable storage medium | |
CN113055529A (en) | Recording control method and recording control device | |
US20250201020A1 (en) | Image processing device, method for operating image processing device, and program for operating image processing device | |
US20240127855A1 (en) | Speaker thumbnail selection and speaker visualization in diarized transcripts for text-based video | |
US20240127857A1 (en) | Face-aware speaker diarization for transcripts and text-based video editing | |
CN113573096A (en) | Video processing method, video processing device, electronic equipment and medium | |
CN113407775B (en) | Video searching method and device and electronic equipment | |
WO2025039921A1 (en) | Video processing method and apparatus, and electronic device and storage medium | |
CN112261321B (en) | Subtitle processing method, device and electronic equipment | |
CN119003721A (en) | Video file generation method and device and electronic equipment | |
CN110858291A (en) | Character segmentation method and device | |
CN113076444A (en) | Song identification method and device, electronic equipment and storage medium | |
CN103853463A (en) | Voice controlling method and device | |
US12299401B2 (en) | Transcript paragraph segmentation and visualization of transcript paragraphs | |
US12223962B2 (en) | Music-aware speaker diarization for transcripts and text-based video editing | |
CN114630179B (en) | Audio extraction method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211029 |