WO2025002075A1 - Video generation method and apparatus, electronic device, and storage medium - Google Patents
- Publication number
- WO2025002075A1 (PCT/CN2024/101109)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- camera movement
- video
- node
- initial
- optimized
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/2621—Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/2628—Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
Definitions
- the embodiments of the present disclosure relate to the field of Internet technology, and in particular to a video generation method and apparatus, an electronic device, and a storage medium.
- the camera movement template is a special effect template that processes a video to add simulated camera movement effects to the video and improve the video's viewing experience.
- in the related art, the camera movement effect provided by the camera movement template may not match the content of the video to be processed actually shot by the user, which degrades the visual perception of the camera movement special effects.
- an embodiment of the present disclosure provides a video generation method, comprising: obtaining an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to the initial video, and the camera movement node is used to characterize the camera movement characteristics of the lens picture when the video is played; obtaining video content characteristics of the video to be processed, and obtaining an optimized camera movement sequence according to the video content characteristics and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content characteristics; and processing the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the lens picture of the output video has the camera movement characteristics corresponding to the optimized camera movement nodes.
- an embodiment of the present disclosure provides a video generating device, comprising: a loading module, used to obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to the initial video, and the camera movement node is used to characterize the camera movement characteristics of the camera lens when the video is played; a processing module, used to obtain the video content characteristics of the video to be processed, and obtain an optimized camera movement sequence according to the video content characteristics and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content characteristics; an output module, used to process the video to be processed based on the optimized camera movement sequence to generate an output video, and the camera lens of the output video has the camera movement characteristics corresponding to the optimized camera movement node.
- an embodiment of the present disclosure provides an electronic device, comprising: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the video generation method described in the first aspect and various possible designs of the first aspect.
- an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored.
- when a processor executes the computer-executable instructions, the video generation method described in the first aspect and various possible designs of the first aspect is implemented.
- an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video generation method described in the first aspect and various possible designs of the first aspect.
- FIG1 is a diagram of an application scenario of a video generation method provided by an embodiment of the present disclosure
- FIG2 is a flow chart of a video generation method according to an embodiment of the present disclosure.
- FIG3 is a flow chart of a specific implementation method of step S102 in the embodiment shown in FIG2 ;
- FIG4 is a schematic diagram of a process for determining video content features provided by an embodiment of the present disclosure.
- FIG5 is a flow chart of a possible implementation of step S103 in the embodiment shown in FIG2 ;
- FIG6 is a schematic diagram of an optimized camera movement node generated based on a first content feature provided by an embodiment of the present disclosure
- FIG. 7 is a flow chart of another possible implementation of step S103 in the embodiment shown in FIG. 2 ;
- FIG8 is a schematic diagram of an optimized camera movement node generated based on a second content feature provided by an embodiment of the present disclosure
- FIG9 is a flow chart of another possible implementation of step S103 in the embodiment shown in FIG2 ;
- FIG10 is a flowchart of a specific implementation method of step S1037 in the embodiment shown in FIG9 ;
- FIG11 is a second flow chart of a video generation method provided by an embodiment of the present disclosure.
- FIG12 is a flowchart of a specific implementation method of step S205 in the embodiment shown in FIG11 ;
- FIG13 is a flowchart of a specific implementation method of step S206 in the embodiment shown in FIG11 ;
- FIG14 is a structural block diagram of a video generating device provided by an embodiment of the present disclosure.
- FIG15 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
- FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
- the embodiments of the present disclosure provide a video generation method, device, electronic device and storage medium to overcome the problem that the camera movement effect provided by the camera movement template does not match the content of the video to be processed actually shot by the user.
- the video generation method, device, electronic device and storage medium provided in this embodiment obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to the initial video, and the camera movement node is used to represent the camera movement characteristics of the lens picture during video playback; obtain the video content characteristics of the video to be processed, and obtain an optimized camera movement sequence according to the video content characteristics and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content characteristics; and process the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the lens picture of the output video has the camera movement characteristics corresponding to the optimized camera movement node.
- in this way, the initial camera movement template is adapted to obtain an optimized camera movement sequence composed of optimized camera movement nodes, and an output video with camera movement effects is generated based on the optimized camera movement sequence. Since the camera movement characteristics corresponding to the optimized camera movement nodes match the video content characteristics of the video to be processed, generating the output video based on the optimized camera movement sequence and the video to be processed avoids a mismatch between the camera movement special effects generated from the initial camera movement template and the content of the video to be processed, thereby improving the visual perception of the camera movement special effects.
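As a rough illustration of the flow summarized above, a minimal Python sketch follows; the dict-based node representation and all field names are hypothetical, since the disclosure does not prescribe any concrete API:

```python
def optimize_camera_sequence(initial_sequence, video_features):
    """Match each camera movement node in the template against the
    content feature detected at the same timestamp (illustrative sketch)."""
    optimized = []
    for node, feature in zip(initial_sequence, video_features):
        node = dict(node)  # copy so the template itself is not mutated
        # Re-centre the movement on detected content, e.g. a portrait position.
        if "portrait_center" in feature:
            node["focus_center"] = feature["portrait_center"]
        optimized.append(node)
    return optimized
```

The output sequence keeps the template's timing while its movement characteristics now track the detected content.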
- user information including but not limited to user device information, user personal information, etc.
- data including but not limited to data used for analysis, stored data, displayed data, etc.
- FIG1 is a diagram of an application scenario of the video generation method provided by the embodiment of the present disclosure.
- the video generation method provided by the embodiment of the present disclosure can be applied to an application scenario of adding camera effects to a video.
- the method provided by the embodiment of the present disclosure can be applied to electronic devices such as terminal devices and servers.
- for example, the video generation method provided in this embodiment is implemented by running a video processing application (Application, APP) on a terminal device such as a smart phone.
- after loading a pre-generated video to be processed 11, the terminal device responds to a trigger on a camera movement template control 12 provided by the video processing application to add camera movement effects to the video to be processed, thereby generating an output video 13 whose lens picture has a camera movement effect.
- the camera movement effect includes, for example, movement and zooming of the lens picture.
- the camera movement effect provided by the camera movement template is achieved by cropping the image picture of the initial video, manually or automatically, to generate multiple moving lens pictures that simulate real camera movement; therefore, the camera movement effect achieved by the camera movement template matches the image picture of the initial video.
- however, when the template is applied to a video to be processed whose content differs from the initial video, the camera movement effect provided by the camera movement template does not match the content of the video to be processed actually shot by the user.
- in this case, the camera movement special effects of the output video generated by applying the camera movement template can exhibit problems such as failing to keep the portrait focused at the center of the lens picture, or zooming the lens picture too far in or out, which affects the visual perception.
- the embodiment of the present disclosure provides a video generation method to solve the above-mentioned problem.
- FIG. 2 is a flow chart of a video generation method provided by an embodiment of the present disclosure.
- the method of this embodiment can be applied in a terminal device or a server, and the video generation method includes:
- Step S101 obtaining an initial camera movement sequence according to an initial camera movement template.
- the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to represent the camera movement characteristics of a lens picture during video playback.
- Camera movement refers to shooting a video while the camera lens of the shooting device is in motion, by pushing, pulling, panning, tracking, following, swinging, raising, lowering, and so on, while keeping the shooting subject (such as a dancing person) in focus in the picture so that the movement of the shooting subject remains coherent. This highlights the shooting subject and improves the visual perception, and is a common video shooting technique.
- the camera movement special effect refers to processing a video that has already been shot: based on the original video, the picture of the original video is dynamically cropped, scaled, rotated, and so on, to form a continuous, moving lens picture that simulates a real camera movement effect.
- the camera movement template is the data used to realize the above-mentioned camera movement special effects.
- the initial camera movement template in the present embodiment is generated based on the initial video.
- the initial camera movement template includes an initial camera movement sequence, and one or some independent camera movement effects are expressed through the initial camera movement sequence.
- the camera movement sequence is composed of at least one ordered camera movement node, and the camera movement node is used to characterize the camera movement characteristics of the lens picture when the video is played, such as the lens picture shaking left and right, shaking up and down, zooming, etc., so as to achieve an independent camera movement effect.
- the camera movement characteristics also include specific parameters of the above-mentioned lens picture movement, such as shaking amplitude, frequency, etc.
- Each camera movement node in the camera movement sequence corresponds to a timestamp, and the timestamp may include a start timestamp and an end timestamp, which respectively correspond to the start time and end time of the camera movement effect represented by the camera movement node.
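As a purely illustrative data shape (the disclosure does not define any concrete representation), a camera movement node carrying its start/end timestamps and movement characteristics might be modeled as:

```python
from dataclasses import dataclass, field

@dataclass
class CameraMoveNode:
    start_ts: float  # start of the camera movement effect, in seconds of playback time
    end_ts: float    # end of the effect
    move_type: str   # e.g. "zoom", "shake_lr", "pan" (hypothetical labels)
    params: dict = field(default_factory=dict)  # amplitude, frequency, focus centre, ...

# The camera movement sequence is simply an ordered list of such nodes,
# here mirroring the timestamps used in the FIG. 4 example below.
initial_sequence = [
    CameraMoveNode(2.0, 3.0, "zoom",     {"scale": 1.2}),
    CameraMoveNode(4.0, 5.0, "shake_lr", {"amplitude": 0.1, "frequency": 2.0}),
    CameraMoveNode(6.0, 8.0, "pan",      {"dx": 0.05}),
]
```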
- the initial camera movement sequence corresponds to one or more camera movement effects in the initial video, and the camera movement nodes in the initial camera movement sequence characterize the camera movement characteristics of the lens picture when the initial video is played. Further, the initial camera movement sequence can be obtained by parsing the template data corresponding to the initial camera movement template.
- the initial camera movement sequence in the initial camera movement template can be set manually by the user through key frames (K frames) based on the content of the initial video, or generated automatically by the terminal device or server after content recognition on the initial video. The specific generation process of the initial camera movement template is not detailed here.
- the above steps can be performed in response to a trigger operation within the application. For example, after the user clicks the control of camera movement template A (the initial camera movement template), the terminal device loads the template data corresponding to camera movement template A, and then obtains the initial camera movement sequence.
- the specific interactive operation process is not specifically limited here and will not be repeated.
- Step S102 Obtain video content features of the video to be processed.
- the terminal device can load the video to be processed by responding to the user's instruction.
- the video to be processed can be a video shot by the terminal device, or a video shot or generated by another device. More specifically, the content of the video to be processed can vary, for example dance videos, landscape videos, cartoon videos, etc.; no specific restriction is made here.
- the corresponding video content features can be obtained by performing content recognition on the video to be processed.
- the video content features are information used to characterize the video content of the video to be processed.
- the video content features can be implemented in the form of identifiers. For example, the video content feature is "0", indicating that no face appears in the video to be processed.
- the video content feature is "1", indicating that a face appears in the video to be processed.
- the video content features can also be implemented in the form of more complex feature vectors and feature matrices to characterize more complex content, for example, the outline of a portrait in the video to be processed, the size of a specific object, etc.
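A minimal sketch of encoding such a per-frame feature, assuming a hypothetical face detector that returns normalized `(x, y, w, h)` boxes; the field names are illustrative, not taken from the disclosure:

```python
def frame_content_feature(faces):
    """Encode a frame's detection result as a content feature: a simple
    flag plus, when a face is present, the normalized face-box centre."""
    if not faces:
        return {"has_face": 0}
    x, y, w, h = faces[0]  # first detected face box, normalized to [0, 1]
    return {"has_face": 1, "face_center": (x + w / 2, y + h / 2)}
```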
- step S102 includes:
- Step S1021 According to the timestamp of at least one camera movement node in the initial camera movement sequence, obtain the corresponding target video frame in the video to be processed.
- Step S1023 Perform content detection on the target video frame to obtain video content features.
- the initial camera movement sequence includes at least one camera movement node.
- the camera movement node corresponds to at least one timestamp, and the timestamp is used to characterize the marking time of the camera movement node in the initial camera movement sequence in the initial video. More specifically, the marking time can be the camera movement start time, the camera movement end time, etc.
- according to the timestamp, the corresponding video frame, that is, the target video frame, is obtained from the video to be processed.
- the video frame content feature corresponding to one of the target video frames can be used as the video content feature. It is also possible to use a collection of video frame content features corresponding to multiple target video frames as the video content feature.
- FIG4 is a schematic diagram of a process for determining video content features provided by an embodiment of the present disclosure.
- the initial camera movement sequence includes camera movement node #1, camera movement node #2, and camera movement node #3.
- the timestamp of camera movement node #1 is [00:02, 00:03], which indicates that the start time of the camera movement effect corresponding to camera movement node #1 is 00:02 (0 minutes and 2 seconds, the same below), and the end time is 00:03.
- the timestamp of camera movement node #2 is [00:04, 00:05], which indicates that the start time of the camera movement effect corresponding to camera movement node #2 is 00:04 and the end time is 00:05.
- the timestamp of camera node #3 is [00:06, 00:08], which indicates that the start time of the camera effect corresponding to camera node #3 is 00:06 and the end time is 00:08.
- according to the start times corresponding to camera movement nodes #1, #2 and #3, i.e. 00:02, 00:04 and 00:06, the target video frames at the corresponding playback timestamps are obtained from the video to be processed, which are video frame P1, video frame P2 and video frame P3 respectively.
- content detection is performed on video frame P1, video frame P2 and video frame P3 respectively to obtain the corresponding video frame content features F_1, F_2 and F_3.
- the set of the above video frame content features F_1, video frame content features F_2 and video frame content features F_3 (shown as [F_1, F_2, F_3] in the figure) is used as the video content features.
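The FIG. 4 flow above can be sketched as follows; `frame_at` and `detect` are hypothetical stand-ins for frame extraction and content detection, which the disclosure leaves unspecified:

```python
def video_content_features(nodes, frame_at, detect):
    """For each camera movement node, take the frame at its start
    timestamp and run content detection on it; the resulting list
    [F_1, F_2, ...] serves as the video content feature."""
    return [detect(frame_at(node["start"])) for node in nodes]

# Demo with the FIG. 4 timestamps and string stand-ins for real routines.
nodes = [{"start": 2.0}, {"start": 4.0}, {"start": 6.0}]
features = video_content_features(
    nodes,
    frame_at=lambda t: f"frame@{t}",     # stand-in for frame extraction
    detect=lambda frame: f"F({frame})",  # stand-in for content detection
)
```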
- Step S103 obtaining an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features.
- At least one initial camera movement node in the initial camera movement sequence is optimized according to the video content feature to generate a corresponding optimized camera movement node, thereby obtaining an optimized camera movement sequence.
- the optimized camera movement node is also used to characterize the camera movement feature of the lens picture during video playback.
- the generated optimized camera movement node can match the video content feature.
- the optimized camera movement node can match the video content feature.
- the motion amplitude corresponding to the optimized camera movement node is matched with the scale of the scene in the video to be processed (an exemplary implementation of the video content feature).
- the center point of the lens picture corresponding to the optimized camera movement node is matched with the portrait position in the video to be processed (an exemplary implementation of the video content feature). Therefore, when the video to be processed is inconsistent with the initial video in content, the adaptive optimization of the camera movement effect is achieved by optimizing the camera movement node.
- the video content feature includes a first content feature
- the first content feature represents the position of the portrait in the video to be processed.
- a possible implementation of step S103 includes:
- Step S1031 determining, according to the first content feature, a lens focus area during the movement process of the lens picture corresponding to at least one camera movement node in the initial camera movement sequence.
- Step S1032 configuring corresponding camera movement nodes according to the lens focus area during the movement of the lens image, and generating optimized camera movement nodes.
- Step S1033 Generate an optimized camera movement sequence according to the optimized camera movement nodes.
- the camera movement node has a first parameter, and the first parameter is used to determine the center point of the lens picture corresponding to the camera movement node during the movement process, that is, the lens focus area.
- the terminal device can determine the position of the portrait in the video to be processed through the first content feature.
- the first parameter of the camera movement node is set based on the position of the portrait, and the optimized camera movement node is generated, so that in the camera movement special effect generated by the camera movement node, the center point of the lens picture during the movement process is the position of the portrait, thereby realizing the focus of the portrait in the video to be processed.
- the video content features may be a set composed of a plurality of video frame content features, and the video frame content features correspond to a video frame at a certain playback position in the video to be processed (for example, the starting position of the camera movement special effect). Therefore, the first content feature may include the coordinates of the position of the portrait in the video at a plurality of playback positions in the video to be processed. More specifically, for example, the coordinates of the position of the center point of the character, or the coordinates of the center of the character's face. Afterwards, based on the position represented by the first content feature, the first parameter of at least one camera movement node in the initial camera movement sequence is set.
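A minimal sketch of setting the first parameter from a detected portrait position; the field names and the clamping behavior are illustrative assumptions, not taken from the disclosure:

```python
def set_focus_on_portrait(node, portrait_center, frame_size):
    """First parameter: centre the moving crop window on the portrait,
    clamped so the crop window never leaves the full frame."""
    fw, fh = frame_size         # full-frame size in pixels
    cw, ch = node["crop_size"]  # crop-window (lens picture) size in pixels
    x, y = portrait_center
    # Clamp the centre so the crop window stays inside the frame.
    x = min(max(x, cw / 2), fw - cw / 2)
    y = min(max(y, ch / 2), fh - ch / 2)
    node["focus_center"] = (x, y)
    return node
```

With a portrait near the frame edge, the focus centre is pulled inward just enough to keep the crop valid while still tracking the portrait.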
- when each video frame content feature represents a portrait position, the camera movement node corresponding to each video frame content feature is set according to that feature (the video frame content feature is determined based on the timestamp of the camera movement node, so the two have a corresponding relationship).
- FIG6 is a schematic diagram of an optimized camera movement node generated based on a first content feature provided by an embodiment of the present disclosure.
- the camera movement effect achieved by the optimized camera movement node #1 generated based on the first content feature representing the position of the portrait in the video to be processed is "zooming camera movement".
- the center of the lens picture is aligned with the reference point P1 where the portrait is located in the video to be processed, thereby achieving the effect of focusing on the portrait.
- the video content feature includes a second content feature
- the second content feature represents the scale of the scene in the to-be-processed video.
- step S103 includes:
- Step S1034 determining the movement amplitude of the shot image corresponding to at least one camera movement node in the initial camera movement sequence during the movement process according to the second content feature.
- Step S1035 configuring corresponding camera movement nodes according to the movement amplitude of the lens image during the movement process, and generating optimized camera movement nodes.
- Step S1036 Generate an optimized camera movement sequence according to the optimized camera movement nodes.
- the camera movement node has a second parameter, and the second parameter is used to determine the movement amplitude of the lens picture corresponding to the camera movement node during the movement process.
- the video content feature is a second content feature that characterizes the scale of the scene in the video to be processed
- the terminal device can determine the scale of the scene in the video to be processed through the second content feature, more specifically, close view, medium view, long view, etc.
- the scale of the scene represented by the second content feature can be a continuously changing value, which is used to represent a more refined scene scale, and examples are not given one by one here.
- the second parameter of the camera movement node is set based on the scale of the scene, and the optimized camera movement node is generated, so that the shaking amplitude of the lens picture in the camera movement special effect generated by the camera movement node during the shaking process matches the scene.
- the farther the scene is, the larger the relative movement amplitude; and the closer the scene is, the smaller the relative movement amplitude, so as to avoid the dizziness caused by an inappropriate shaking amplitude in camera movement special effects such as "shaking camera movement", thereby improving the viewing effect of the camera movement special effects.
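One way to realize a farther-scene-larger-amplitude rule is a simple monotone mapping from scene scale to the node's second parameter; the scale encoding (0 for an extreme close-up, 1 for a long shot) and the coefficients are illustrative assumptions:

```python
def amplitude_for_scene(base_amplitude, scene_scale):
    """Second parameter: scale the template's shake amplitude by the
    scene scale so a closer scene shakes less and a farther scene more."""
    # Map scene_scale in [0, 1] to a multiplier in [0.2, 1.0] (illustrative).
    return base_amplitude * (0.2 + 0.8 * scene_scale)
```

Under this choice, a close-up ends up at one fifth of the template amplitude, echoing the b = 0.2a relationship in the FIG. 8 example.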
- FIG8 is a schematic diagram of an optimized camera movement node generated based on a second content feature provided by an embodiment of the present disclosure.
- the optimized camera movement node #2 generated based on the second content feature realizes the camera movement effect of "left and right shaking camera movement".
- the shaking amplitude is a during the movement from lens picture P1 to lens picture P2, and the shaking amplitude is b during the movement from lens picture P3 to lens picture P4, where b = 0.2a.
- the second parameter of the camera movement node is set through the second content feature to achieve adaptive amplitude adjustment of the left-right shaking camera movement.
- the video content feature includes a third content feature
- the third content feature represents a target object in the video to be processed.
- step S103 includes:
- Step S1037 determining the change frequency of the lens picture corresponding to at least one camera movement node in the initial camera movement sequence during the camera movement process according to the third content feature, and the change frequency is less than a preset frequency threshold.
- Step S1038 configuring corresponding camera movement nodes according to the frequency of changes of the lens image during the camera movement process, and generating optimized camera movement nodes.
- Step S1039 Generate an optimized camera movement sequence according to the optimized camera movement nodes.
- the camera movement node has a third parameter, and the third parameter is used to determine the frequency of change of the lens picture corresponding to the camera movement node during the camera movement process.
- the terminal device can determine, through the third content feature, whether a target object exists in the video to be processed, for example, whether important parts such as a face or hands are present, as well as the position and size of these parts.
- the third parameter of the camera movement node is set based on the target object to generate an optimized camera movement node, thereby controlling the shaking frequency of the lens picture in the camera movement special effect generated by the camera movement node.
- the high-frequency shaking is reduced.
- the shaking frequency (change frequency) is reduced below a preset frequency value by means of the third parameter, thereby improving the focusing effect on the target object and avoiding the problem that the user cannot clearly see the target object, such as the facial expression of a portrait or finger movements, because of too-frequent shaking.
- step S1037 includes:
- Step S1037A Obtain the outline size of the target object.
- Step S1037B Obtain the corresponding target motion frequency according to the outline size, wherein the target motion frequency is inversely proportional to the outline size.
- Step S1037C according to the target motion frequency, a third parameter is set, where the third parameter is used to determine the frequency of change of the lens image corresponding to the camera movement node during the camera movement process.
- the corresponding shaking frequency is set based on the outline size of the target object: the larger the outline size, the lower the shaking frequency; the smaller the outline size, the higher the shaking frequency (but not greater than the initial value of the third parameter).
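Steps S1037A to S1037C can be sketched as below. The proportionality constant and the cap at the node's initial third-parameter value are illustrative assumptions, not values from the disclosure.

```python
def target_motion_frequency(outline_size: float, k: float = 100.0,
                            initial_frequency: float = 8.0) -> float:
    """Third parameter: change frequency of the lens picture.

    Inversely proportional to the target object's outline size, and never
    greater than the node's initial third-parameter value.
    """
    if outline_size <= 0:
        raise ValueError("outline_size must be positive")
    return min(k / outline_size, initial_frequency)
```

A large face outline thus yields a low shaking frequency that keeps the target legible, while a small outline allows a higher frequency up to the initial value.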
- the specific implementation of the above video content feature can be one of the above first content feature, the second content feature, and the third content feature, or multiple of the above first content feature, the second content feature, and the third content feature. That is, the camera movement node can be configured by at least two of the first parameter, the second parameter, and the third parameter to generate an optimized camera movement node and a corresponding optimized camera movement sequence.
- the specific implementation method can be set as needed and will not be repeated here.
- Step S104 Process the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the shot images of the output video have camera movement features corresponding to the optimized camera movement nodes.
- camera movement special effects are generated at corresponding positions of the video to be processed based on each optimized camera movement node in the optimized camera movement sequence, thereby generating a special-effects video with one or more camera movement special effects, i.e., an output video. Since the camera movement special effects in the output video are generated based on the optimized camera movement sequence, the lens images of the output video have camera movement features corresponding to the optimized camera movement nodes, so that the camera movement special effects in the output video match the video content.
- the initial camera movement sequence is obtained according to the initial camera movement template.
- the initial camera movement sequence includes at least one camera movement node corresponding to the initial video, and the camera movement node is used to characterize the camera movement characteristics of the lens picture when the video is played.
- the video content characteristics of the video to be processed are obtained, and the optimized camera movement sequence is obtained according to the video content characteristics and the initial camera movement sequence.
- the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content characteristics.
- the video to be processed is processed based on the optimized camera movement sequence to generate an output video, and the lens picture of the output video has the camera movement characteristics corresponding to the optimized camera movement node.
- the node parameters of the initial camera movement template are set to obtain an optimized camera movement sequence composed of optimized camera movement nodes, and an output video with camera movement effects is generated based on the optimized camera movement sequence. Because the camera movement characteristics corresponding to the optimized camera movement nodes match the video content characteristics of the video to be processed, generating the output video from the optimized camera movement sequence and the video to be processed avoids the problem that the camera movement special effects generated by the initial camera movement template do not match the content of the video to be processed, and improves the visual perception of the camera movement special effects.
- FIG11 is a second flow chart of the video generation method provided by the embodiment of the present disclosure. Based on the embodiment shown in FIG2 , this embodiment further refines step S103 , and the video generation method includes:
- Step S201 obtaining an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to represent camera movement features of a camera shot during video playback.
- Step S202 Obtain a first video content feature of the video to be processed.
- Step S203 Acquire the second video content feature of the initial video.
- Step S204 obtaining feature deviation information according to the second video content feature and the first video content feature, wherein the feature deviation information represents the feature deviation amount in video content between the video to be processed and the initial video.
- the initial video is the video used to generate the initial camera movement sequence, that is, the video content matches the camera movement special effects generated by the initial camera movement sequence.
- the template data corresponding to the initial camera movement template includes the initial video corresponding to the initial camera movement template.
- the initial video is subjected to content detection to obtain the video content features corresponding to the initial video, that is, the second video content features.
- the specific implementation of obtaining the second video content features is similar to that of obtaining the video content features of the video to be processed (the first video content features). Please refer to the introduction of the specific implementation of obtaining the video content features of the video to be processed in the embodiment shown in FIG. 2, which will not be repeated here.
- feature comparison can be performed based on the first video content feature and the second video content feature to obtain feature deviation information.
- the feature deviation information represents the amount of deviation between the initial video and the video to be processed in terms of video content.
- the first video content feature represents the first position where the portrait is located in the video to be processed
- the second video content feature represents the second position where the portrait is located in the initial video.
- the feature deviation information representing the deviation amount between the first position and the second position is obtained through the first video content feature and the second video content feature.
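The position-deviation derivation in steps S202 to S204 might look like the following sketch; the pixel coordinates and the subtraction order are assumptions made for illustration only.

```python
def position_deviation(position_to_process, position_initial):
    """Feature deviation information as a deviation vector between the
    portrait position in the video to be processed (first video content
    feature) and the portrait position in the initial video (second
    video content feature). Positions are (x, y) pixel coordinates.
    """
    (x1, y1), (x2, y2) = position_to_process, position_initial
    return (x1 - x2, y1 - y2)
```

The resulting vector is the position deviation amount used below to correct the node parameters of the initial camera movement sequence.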
- Step S205 setting node parameters of at least one camera movement node in the initial camera movement sequence according to the feature deviation information to obtain an optimized camera movement sequence.
- the node parameters of the camera movement node in the initial camera movement sequence are corrected based on the characteristic deviation amount represented by the characteristic deviation information, so that the camera movement node can match the video content feature (first video content feature) of the video to be processed, so as to achieve the purpose of camera movement node optimization, and then obtain an optimized camera movement sequence.
- the feature deviation information includes a position deviation
- the position deviation represents a deviation between a first position where the portrait is located in the initial video and a second position where the portrait is located in the video to be processed.
- a specific implementation of step S205 includes:
- Step S2051 Obtain the target camera movement node corresponding to the position deviation in the initial camera movement sequence.
- Step S2052 according to the position deviation, the center point of the lens picture corresponding to the target camera movement node during the camera movement process is set to obtain an optimized camera movement sequence.
- the feature deviation information is obtained based on the feature comparison between the first video content feature and the second video content feature, and the first video content feature and the second video content feature are obtained based on the video frame corresponding to the same camera movement node. Therefore, the feature deviation information corresponds to a camera movement node in the initial camera movement sequence.
- the camera movement node corresponding to the feature deviation information is the target camera movement node.
- when the characteristic deviation information is a position deviation amount representing the deviation between a first position where a portrait is located in the initial video and a second position where a portrait is located in the video to be processed, it can be used to adjust the center point of the lens picture corresponding to the camera movement node during the camera movement process.
- the position deviation amount is a deviation vector, through which the center point of the picture of the lens picture corresponding to the target camera movement node during the camera movement process is set. Thereby, the center point of the picture generates an offset corresponding to the deviation vector, and the optimization of the target camera movement node is realized.
- the optimization of the initial camera movement sequence is realized at the same time, thereby generating an optimized camera movement sequence corresponding to the initial camera movement sequence.
- the camera movement effect generated by the optimized camera movement sequence can be focused on the position where the portrait is located in the video to be processed, thereby improving the visual perception experience.
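Applying the deviation vector to the target node, as described above, can be sketched as follows; the function and coordinate convention are assumptions, shown only to make the offset step concrete.

```python
def offset_center_point(center_point, deviation):
    """Set the first parameter of the target camera movement node: offset
    the lens-picture center point by the position deviation vector, so the
    camera movement re-focuses on the portrait position in the video to be
    processed.
    """
    cx, cy = center_point
    dx, dy = deviation
    return (cx + dx, cy + dy)
```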
- the step of setting the center point of the picture of the lens picture corresponding to the target camera movement node during the camera movement process in this embodiment can be realized by setting the first parameter of the target camera movement node.
- the specific implementation method can refer to the relevant introduction in the embodiment shown in Figure 2, which will not be repeated here.
- the feature deviation information includes a scale coefficient
- the scale coefficient represents the ratio of the scene scale in the initial video to the scene scale in the video to be processed.
- Step S2053 Obtain the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence.
- Step S2054 according to the scale coefficient, the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process is set to obtain an optimized camera movement sequence.
- the target camera movement node corresponding to the scale coefficient is first obtained.
- the implementation method is similar to the step of obtaining the target camera movement node corresponding to the position deviation in the above-mentioned embodiment, and will not be repeated.
- the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process is set to achieve the setting of the shaking amplitude of the lens picture in the camera movement special effects such as "shaking camera movement".
- the optimization of the target camera movement node is achieved, and the optimized camera movement sequence corresponding to the initial camera movement sequence is generated.
- the movement amplitude of the camera movement effect generated by the optimized camera movement sequence matches the scene scale.
- the step of setting the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process can be realized by setting the second parameter of the target camera movement node. For the specific implementation, refer to the relevant introduction of the embodiment shown in FIG. 2, which will not be repeated here.
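Steps S2053 to S2054 can be sketched as below. Whether the amplitude is divided or multiplied by the coefficient depends on how the ratio is defined, so the division used here is an assumption.

```python
def scale_movement_amplitude(node_amplitude: float,
                             scale_coefficient: float) -> float:
    """Set the second parameter of the target camera movement node.

    scale_coefficient is assumed to be the ratio of the scene scale in the
    initial video to the scene scale in the video to be processed; dividing
    by it rescales the node's movement amplitude to the new scene scale.
    """
    if scale_coefficient <= 0:
        raise ValueError("scale_coefficient must be positive")
    return node_amplitude / scale_coefficient
```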
- the initial video has a first play duration
- the video to be processed has a second play duration.
- the method further includes:
- Step S206 Generate an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence.
- the camera movement duration of the lens picture corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration.
- Step S207 Process the video to be processed based on the optimized camera movement sequence and the extended camera movement sequence in sequence to generate an output video.
- in this embodiment, when the playback duration of the video to be processed is longer than that of the initial video, the duration of the initial camera movement template generated based on the initial video will be shorter than that of the video to be processed, so that the camera movement special effects generated by the initial camera movement sequence cannot cover the entire video to be processed.
- this embodiment generates an extended camera movement sequence by obtaining at least one camera movement node from the generated optimized camera movement sequence. The video to be processed is then processed based on the optimized camera movement sequence and the extended camera movement sequence in turn, so that the camera movement special effects generated by the two sequences can cover the entire video to be processed, avoiding blank segments without camera movement special effects in the generated output video.
- the extended camera movement sequence is generated based on the camera movement nodes in the optimized camera movement sequence, it can be ensured that the extended camera movement sequence matches the remaining video clips in the video to be processed that are not covered by the optimized camera movement sequence, thereby improving the camera movement special effects of the output video.
- step S206 includes:
- Step S2061 Obtain identification features of the first video segment in the video to be processed, wherein the first video segment is a video segment after a first playback duration in the video to be processed, and the identification features include video content features and/or music melody features of the first video segment.
- Step S2062 According to the identification feature, a second video segment is obtained from the video to be processed, where the second video segment is a video segment before the first playback duration in the video to be processed.
- the identification feature of the second video segment is similar to the identification feature of the first video segment.
- Step S2063 Generate an extended camera movement sequence according to at least one continuous camera movement node corresponding to the second video segment.
- when the playback duration of the video to be processed is longer than that of the initial video, the video to be processed can be divided into two parts, namely, the part after the first playback duration and the part before the first playback duration.
- the part after the first playback duration can be divided into one or more sub-segments, namely, the first video segments, based on the duration of the camera movement nodes.
- feature extraction is performed on the first video segment to obtain the identification feature corresponding to the first video segment, wherein the identification feature, for example, includes the video content feature and/or the music melody feature of the first video segment.
- the part before the first playback duration is searched to obtain a second video segment similar to the identification feature of the first video segment; wherein the judgment method of similar identification features includes: the feature distance between the identification feature of the second video segment and the identification feature of the first video segment is less than the feature threshold.
- according to the timestamp of the second video segment, at least one corresponding continuous camera movement node is obtained as a multiplexed camera movement node.
- the above process is repeated until the multiplexed camera movement nodes corresponding to each first video segment are obtained, and the above multiplexed camera movement nodes are spliced to generate an extended camera movement sequence.
- the multiplexed camera movement nodes selected for a first video segment can also match the video content features of that segment to a certain extent. Therefore, the generated extended camera movement sequence can inherit the effect of the optimized camera movement sequence, adapt to the video segments in the video to be processed that are not covered by the optimized camera movement sequence, and improve the visual effect of the overall camera movement special effect of the generated output video.
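The segment-matching loop in steps S2061 to S2063 could be sketched as below. The segment dictionaries, the Euclidean feature distance, and the threshold value are all assumptions made for illustration.

```python
import math

def build_extended_sequence(first_segments, second_segments,
                            feature_threshold: float):
    """For each first (uncovered) segment, find the second (covered)
    segment whose identification feature is closest and within the feature
    threshold, then splice that segment's camera movement nodes into the
    extended camera movement sequence as multiplexed nodes."""
    extended = []
    for seg in first_segments:
        best_nodes, best_dist = None, feature_threshold
        for cand in second_segments:
            # feature distance between identification feature vectors
            dist = math.dist(seg["feature"], cand["feature"])
            if dist < best_dist:
                best_nodes, best_dist = cand["nodes"], dist
        if best_nodes is not None:
            extended.extend(best_nodes)
    return extended
```

Segments whose nearest candidate exceeds the threshold contribute no nodes, mirroring the "feature distance less than the feature threshold" judgment above.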
- the implementation of steps S201 to S202 is the same as that of steps S101 to S102 in the embodiment shown in FIG. 2 of the present disclosure, and will not be described in detail here.
- FIG14 is a structural block diagram of a video generation device provided by an embodiment of the present disclosure.
- the video generation device 3 includes:
- a loading module 31 is used to obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to represent the camera movement characteristics of a lens picture during video playback;
- the processing module 32 is used to obtain the video content features of the video to be processed, and obtain an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features;
- the output module 33 is used to process the video to be processed based on the optimized camera movement sequence to generate an output video, and the lens images of the output video have the camera movement features corresponding to the optimized camera movement nodes.
- the video content feature includes a first content feature
- the first content feature characterizes the position of the portrait in the video to be processed
- the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a first parameter of at least one camera movement node in the initial camera movement sequence according to the first content feature, wherein the first parameter is used to determine the center point of the lens picture corresponding to the camera movement node during the movement process; configure the corresponding camera movement node according to the first parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
- the video content feature includes a second content feature
- the second content feature characterizes the scale of the scene in the video to be processed
- the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a second parameter of at least one camera movement node in the initial camera movement sequence according to the second content feature, wherein the second parameter is used to determine the movement amplitude of the lens picture corresponding to the camera movement node during the movement process; configure the corresponding camera movement node according to the second parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
- the video content feature includes a third content feature
- the third content feature characterizes the facial contour in the video to be processed
- the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature, wherein the third parameter is used to determine the frequency of change of the lens picture corresponding to the camera movement node during the camera movement process; configure the corresponding camera movement node according to the third parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
- when the processing module 32 generates the third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature, it is specifically used to: obtain the contour size of the facial contour; obtain the corresponding target motion frequency according to the contour size, wherein the target motion frequency is inversely proportional to the contour size; and set the third parameter according to the target motion frequency.
- the processing module 32 when the processing module 32 obtains the optimized camera movement sequence based on the video content features and the initial camera movement sequence, it is specifically used to: obtain the video content features of the initial video; obtain feature deviation information based on the video content features of the initial video and the video content features of the video to be processed, the feature deviation information representing the feature deviation amount in video content between the video to be processed and the initial video; and set the node parameters of at least one camera movement node in the initial camera movement sequence based on the feature deviation information to obtain the optimized camera movement sequence.
- the characteristic deviation information includes a position deviation
- the position deviation represents a deviation between a first position of a portrait in an initial video and a second position of the portrait in a video to be processed
- the processing module 32 sets the node parameters of at least one camera movement node in the initial camera movement sequence according to the characteristic deviation information to obtain an optimized camera movement sequence, it is specifically used to: obtain a target camera movement node corresponding to the position deviation in the initial camera movement sequence; and set the center point of the lens picture corresponding to the target camera movement node during the camera movement process according to the position deviation to obtain an optimized camera movement sequence.
- the characteristic deviation information includes a scale coefficient, which represents the ratio of the scene scale in the initial video to the scene scale in the video to be processed; when the processing module 32 sets the node parameters of at least one camera movement node in the initial camera movement sequence according to the characteristic deviation information to obtain the optimized camera movement sequence, it is specifically used to: obtain the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence; according to the scale coefficient, set the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process to obtain the optimized camera movement sequence.
- the processing module 32 when the processing module 32 obtains the video content features of the video to be processed, it is specifically used to: obtain the corresponding target video frame in the video to be processed according to the timestamp of at least one camera movement node in the initial camera movement sequence; perform content detection on the target video frame to obtain the video content features.
- the initial video has a first playback duration
- the video to be processed has a second playback duration.
- the processing module 32 is further configured to: generate an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence, wherein the camera movement duration of the lens picture corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration; and when the processing module 32 processes the video to be processed based on the optimized camera movement sequence to generate an output video, it is specifically used to: process the video to be processed based on the optimized camera movement sequence and the extended camera movement sequence in sequence to generate an output video.
- the processing module 32 when the processing module 32 generates an extended camera movement sequence based on at least one camera movement node in the optimized camera movement sequence, it is specifically used to: obtain identification features of a first video clip in the video to be processed, wherein the first video clip is a video segment after a first playback duration in the video to be processed, and the identification features include video content features and/or music melody features of the first video clip; according to the identification features, obtain a second video clip from the video to be processed, the second video clip is a video segment before the first playback duration in the video to be processed, and the identification features of the second video clip are similar to the identification features of the first video clip; and generate an extended camera movement sequence based on at least one continuous camera movement node corresponding to the second video clip.
- the loading module 31, processing module 32 and output module 33 are connected in sequence.
- the video generating device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
- FIG. 15 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 15 , the electronic device 4 includes:
- the memory 42 stores computer executable instructions
- the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video generation method in the embodiments shown in FIG. 2 to FIG. 13 .
- processor 41 and the memory 42 are connected via a bus 43 .
- An embodiment of the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored.
- the computer-executable instructions are executed by a processor, they are used to implement the video generation method provided in any one of the embodiments corresponding to Figures 2 to 13 of the present disclosure.
- An embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the methods in the embodiments shown in FIG. 2 to FIG. 13 .
- the electronic device 900 may be a terminal device or a server.
- the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (Portable Media Players, PMPs), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
- the electronic device shown in FIG16 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
- Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
- the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
- An input/output (I/O) interface 905 is also connected to the bus 904.
- the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 909.
- the communication device 909 may allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data.
- although FIG. 16 shows an electronic device 900 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
- an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
- the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
- when the computer program is executed by the processing device 901, the above-described functions defined in the method of the embodiment of the present disclosure are performed.
- the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
- This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
- the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
- the program code contained on the computer readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
- the computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device is caused to perform the method shown in the above embodiments.
- Computer program code for performing operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
- each block in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
- the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
- the name of a unit does not limit the unit itself in some cases.
- the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
- exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that can contain or store information for use by or in connection with an instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- the machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or equipment, or any suitable combination of the foregoing.
- machine-readable storage media may include electrical connections based on one or more wires, portable computer disks, hard disks, random access memories (RAM), read-only memories (ROM), erasable programmable read-only memories (EPROM or flash memory), optical fibers, portable compact disk read-only memories (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- the present disclosure provides a video generation method, comprising:
- an initial camera movement sequence is obtained, wherein the initial camera movement sequence includes at least one camera movement node corresponding to the initial video, and the camera movement node is used to characterize the camera movement characteristics of the lens screen when the video is played;
- the video content features of the video to be processed are obtained;
- the video content features and the initial camera movement sequence are combined to obtain an optimized camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features; based on the optimized camera movement sequence, the video to be processed is processed to generate an output video, and the shot images of the output video have the camera movement features corresponding to the optimized camera movement node.
- the video content feature includes a first content feature
- the first content feature represents the position of the portrait in the video to be processed
- the optimized camera movement sequence is obtained according to the video content feature and the initial camera movement sequence, including: determining the lens focus area of the lens picture corresponding to at least one camera movement node in the initial camera movement sequence during the movement process according to the first content feature; configuring the corresponding camera movement node according to the lens focus area of the lens picture during the movement process, and generating an optimized camera movement node; generating the optimized camera movement sequence according to the optimized camera movement node.
- the video content feature includes a second content feature
- the second content feature characterizes the scale of the scene in the video to be processed
- obtaining the optimized camera movement sequence based on the video content feature and the initial camera movement sequence includes: determining, based on the second content feature, the motion amplitude of the lens picture corresponding to at least one camera movement node in the initial camera movement sequence during the motion process; configuring the corresponding camera movement node according to the motion amplitude of the lens picture during the motion process, and generating an optimized camera movement node; generating the optimized camera movement sequence based on the optimized camera movement node.
- the video content feature includes a third content feature
- the third content feature represents the target object in the video to be processed
- the optimized camera movement sequence is obtained based on the video content feature and the initial camera movement sequence, including: determining the change frequency of the lens picture corresponding to at least one camera movement node in the initial camera movement sequence during the camera movement process based on the third content feature, and the change frequency is less than a preset frequency threshold; configuring the corresponding camera movement node according to the change frequency of the lens picture during the camera movement process, and generating an optimized camera movement node; generating the optimized camera movement sequence based on the optimized camera movement node.
- generating a third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature includes: obtaining the outline size of the target object; obtaining a corresponding target motion frequency according to the outline size, wherein the target motion frequency is inversely proportional to the outline size; and setting the third parameter according to the target motion frequency.
- obtaining an optimized camera movement sequence based on the video content features and the initial camera movement sequence includes: acquiring video content features of the initial video; obtaining feature deviation information based on the video content features of the initial video and the video content features of the video to be processed, wherein the feature deviation information represents the amount of feature deviation in video content between the video to be processed and the initial video; and setting node parameters of at least one camera movement node in the initial camera movement sequence based on the feature deviation information to obtain the optimized camera movement sequence.
- the characteristic deviation information includes a position deviation
- the position deviation represents a deviation between a first position of a portrait in the initial video and a second position of the portrait in the video to be processed
- node parameters of at least one camera movement node in the initial camera movement sequence are set to obtain the optimized camera movement sequence, including: obtaining a target camera movement node corresponding to the position deviation in the initial camera movement sequence; according to the position deviation, setting the center point of the lens picture corresponding to the target camera movement node during the camera movement process to obtain the optimized camera movement sequence.
- the feature deviation information includes a scale coefficient
- the scale coefficient represents the ratio of the scene scale in the initial video to the scene scale in the video to be processed
- the node parameters of at least one camera movement node in the initial camera movement sequence are set to obtain the optimized camera movement sequence, including: obtaining the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence; according to the scale coefficient, setting the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process to obtain the optimized camera movement sequence.
- the obtaining of video content features of the video to be processed includes: obtaining a corresponding target video frame in the video to be processed based on a timestamp of at least one of the camera movement nodes in the initial camera movement sequence; and performing content detection on the target video frame to obtain the video content features.
- the initial video has a first playback duration.
- the video to be processed has a second playback duration.
- the method further comprises: generating an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence, and the camera movement duration of the lens picture corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration; processing the video to be processed based on the optimized camera movement sequence to generate an output video comprises: processing the video to be processed based on the optimized camera movement sequence and the extended camera movement sequence in sequence to generate an output video.
- generating an extended camera movement sequence based on at least one camera movement node in the optimized camera movement sequence includes: obtaining identification features of a first video clip in the video to be processed, wherein the first video clip is a video segment after a first playback duration in the video to be processed, and the identification features include video content features and/or music melody features of the first video clip; obtaining a second video clip from the video to be processed based on the identification features, wherein the second video clip is a video segment before a first playback duration in the video to be processed, and the identification features of the second video clip are similar to those of the first video clip; generating the extended camera movement sequence based on at least one continuous camera movement node corresponding to the second video clip.
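The node-optimization flow described in the claims above can be illustrated with a minimal sketch. The `CameraNode` structure, the `optimize_sequence` helper, and the way the detected portrait position and scale coefficient are applied are hypothetical placeholders chosen for illustration, not the actual claimed implementation:

```python
from dataclasses import dataclass, replace

@dataclass
class CameraNode:
    """A camera movement node: when it occurs, where the virtual shot is
    centered, and how far it zooms in on the frame (all illustrative fields)."""
    timestamp: float   # position of the node on the video timeline (seconds)
    center: tuple      # (x, y) focus point of the shot, normalized to [0, 1]
    zoom: float        # crop scale; values > 1 zoom in on the frame

def optimize_sequence(initial_nodes, portrait_center, scale_coeff):
    """Adapt template nodes to the video to be processed.

    portrait_center: detected (x, y) of the portrait in the video to be
                     processed (corresponding to the first content feature).
    scale_coeff:     ratio of the initial video's scene scale to the scene
                     scale of the video to be processed (second content feature).
    """
    optimized = []
    for node in initial_nodes:
        optimized.append(replace(
            node,
            center=portrait_center,        # re-center the shot focus area on the portrait
            zoom=node.zoom * scale_coeff,  # rescale the motion amplitude to the scene scale
        ))
    return optimized

# A two-node template: a static wide shot followed by a 2x push-in.
template = [CameraNode(0.0, (0.5, 0.5), 1.0), CameraNode(2.0, (0.5, 0.4), 2.0)]
optimized = optimize_sequence(template, portrait_center=(0.6, 0.45), scale_coeff=0.5)
print(optimized[1].zoom)  # 1.0
```

Applying the optimized sequence rather than the template's original nodes is what keeps the shot centered on the portrait and the zoom amplitude proportionate to the new video's scene scale.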
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Image Processing (AREA)
- Studio Devices (AREA)
Abstract
Description
This application claims priority to the Chinese invention patent application entitled "Video generation method, device, electronic device and storage medium", application number 2023107702269, filed on June 27, 2023, the entire contents of which are incorporated herein by reference.
The embodiments of the present disclosure relate to the field of Internet technology, and in particular to a video generation method and apparatus, an electronic device, and a storage medium.
Currently, video platforms and video processing applications provide users with a variety of special-effect templates to improve the efficiency and quality of user-made videos. Among them, a camera movement template is a special-effect template that processes a video to add simulated camera movement effects to it and improve its viewing experience.
However, in the actual application of camera movement templates, the camera movement effect provided by the template may not match the content of the video to be processed actually shot by the user, which degrades the visual perception of the camera movement special effects.
Summary of the invention
In a first aspect, an embodiment of the present disclosure provides a video generation method, comprising: obtaining an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to characterize the camera movement characteristics of the shot image when the video is played; obtaining video content features of a video to be processed, and obtaining an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content features; and processing the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the shot images of the output video have the camera movement characteristics corresponding to the optimized camera movement node.
In a second aspect, an embodiment of the present disclosure provides a video generation apparatus, comprising: a loading module, configured to obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to characterize the camera movement characteristics of the shot image when the video is played; a processing module, configured to obtain video content features of a video to be processed, and obtain an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content features; and an output module, configured to process the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the shot images of the output video have the camera movement characteristics corresponding to the optimized camera movement node.
In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the video generation method described in the first aspect and its various possible designs.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video generation method described in the first aspect and its various possible designs.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the video generation method described in the first aspect and its various possible designs.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the related art, the drawings required for the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a diagram of an application scenario of the video generation method provided by an embodiment of the present disclosure;
FIG. 2 is a first flow chart of the video generation method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a specific implementation of step S102 in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a process for determining video content features provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a possible implementation of step S103 in the embodiment shown in FIG. 2;
FIG. 6 is a schematic diagram of an optimized camera movement node generated based on a first content feature provided by an embodiment of the present disclosure;
FIG. 7 is a flowchart of another possible implementation of step S103 in the embodiment shown in FIG. 2;
FIG. 8 is a schematic diagram of an optimized camera movement node generated based on a second content feature provided by an embodiment of the present disclosure;
FIG. 9 is a flowchart of yet another possible implementation of step S103 in the embodiment shown in FIG. 2;
FIG. 10 is a flowchart of a specific implementation of step S1037 in the embodiment shown in FIG. 9;
FIG. 11 is a second flow chart of the video generation method provided by an embodiment of the present disclosure;
FIG. 12 is a flowchart of a specific implementation of step S205 in the embodiment shown in FIG. 11;
FIG. 13 is a flowchart of a specific implementation of step S206 in the embodiment shown in FIG. 11;
FIG. 14 is a structural block diagram of the video generation apparatus provided by an embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
The embodiments of the present disclosure provide a video generation method and apparatus, an electronic device, and a storage medium to overcome the problem that the camera movement effect provided by a camera movement template does not match the content of the video to be processed actually shot by the user.
The video generation method, apparatus, electronic device, and storage medium provided in this embodiment obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to characterize the camera movement characteristics of the shot image when the video is played; obtain video content features of a video to be processed, and obtain an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement characteristics corresponding to the optimized camera movement node match the video content features; and process the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the shot images of the output video have the camera movement characteristics corresponding to the optimized camera movement node. By configuring the initial camera movement template with the video content features of the video to be processed, an optimized camera movement sequence composed of optimized camera movement nodes is obtained, and an output video with camera movement effects is generated based on it. Since the camera movement characteristics corresponding to the optimized camera movement nodes match the video content features of the video to be processed, generating the output video from the optimized camera movement sequence and the video to be processed avoids the mismatch between the camera movement special effects generated by the initial camera movement template and the content of the video to be processed, improving the visual perception of the camera movement special effects.
In order to make the purpose, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present disclosure are all information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, with a corresponding operation entry provided for the user to choose to authorize or refuse.
The application scenarios of the embodiments of the present disclosure are explained below:
FIG. 1 is a diagram of an application scenario of the video generation method provided by an embodiment of the present disclosure. The video generation method provided by the embodiments of the present disclosure can be applied to scenarios in which camera movement special effects are added to a video. Specifically, as shown in FIG. 1, the method can be applied to electronic devices such as terminal devices and servers. Taking a terminal device as an example, the terminal device may be a smartphone that implements the video generation method provided in this embodiment by running a video processing application (APP). Specifically, as shown in FIG. 1, after loading a pre-generated video to be processed 11, the terminal device adds camera movement special effects to the video to be processed by triggering a camera movement template control 12 provided by the video processing application, thereby generating an output video 13 whose shot images have camera movement effects. The camera movement effects include, for example, movement and zooming of the shot image.
In the related art, the camera movement effect provided by a camera movement template is achieved by manually or automatically cropping the frames of the initial video to generate multiple moving shot images that simulate real camera movement. The camera movement effect realized by the template can therefore match the frames of the initial video. However, the video to be processed, to which the template is applied in order to obtain the same camera movement effect, differs in content from the initial video. As a result, the camera movement effect provided by the template does not match the content of the video to be processed actually shot by the user: the camera movement special effects of the output video generated by applying the template may exhibit problems such as the center of the shot failing to focus on the portrait, or the shot being zoomed too far in or out, degrading the visual perception.
The embodiments of the present disclosure provide a video generation method to solve the above-mentioned problems.
参考图2,图2为本公开实施例提供的视频生成方法的流程示意图一。本实施例的方法可以应用在终端设备或服务器中,该视频生成方法包括:Referring to FIG. 2 , FIG. 2 is a flow chart of a video generation method provided by an embodiment of the present disclosure. The method of this embodiment can be applied in a terminal device or a server, and the video generation method includes:
Step S101: obtaining an initial camera movement sequence according to an initial camera movement template. The initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and a camera movement node is used to represent the camera movement features of the shot picture during video playback.
First, the concept of camera movement is briefly introduced. Camera movement means that during video shooting, the lens of the shooting device is pushed, pulled, panned, moved, tracked, whipped, raised, lowered, and so on, so that the shooting device completes the video shooting while its lens is in motion. At the same time, the shooting subject in the picture (for example, a dancing person) is kept in focus so that the subject's motion remains coherent, thereby highlighting the shooting subject and improving the visual perception. It is a commonly used video shooting technique.
A camera movement effect, in contrast, is produced by processing a video that has already been shot: on the basis of the original video, the video pictures of the original video are dynamically cropped, and image processing steps such as scaling and rotation are performed to form continuous, moving shot pictures, thereby simulating a real camera movement effect. A camera movement template is the data used to realize such camera movement effects; by processing the video to be processed with the camera movement template, the processed video to be processed (i.e., the output video) is given the corresponding camera movement effects.
Further, the initial camera movement template in this embodiment is generated based on an initial video. The initial camera movement template includes an initial camera movement sequence, and one or more independent camera movement effects are expressed through the initial camera movement sequence. Specifically, a camera movement sequence is composed of at least one ordered camera movement node, and a camera movement node is used to represent the camera movement features of the shot picture during video playback, such as the shot picture shaking left and right, shaking up and down, zooming, etc., so as to realize an independent camera movement effect. The camera movement features also include specific parameters of the above shot picture movement, such as the shaking amplitude and frequency. Each camera movement node in a camera movement sequence corresponds to a timestamp, which may include a start timestamp and an end timestamp, corresponding respectively to the start time and end time of the camera movement effect represented by the node. The initial camera movement sequence corresponds to one or more camera movement effects in the initial video, and the camera movement nodes in the initial camera movement sequence represent the camera movement features of the shot picture when the initial video is played.
Further, by parsing the template data corresponding to the initial camera movement template, the initial camera movement sequence in the initial camera movement template can be obtained. The initial camera movement sequence in the initial camera movement template may be set manually by the user through key frames (K frames) based on the content of the initial video, or may be generated automatically by the terminal device or server after content recognition of the initial video. The specific generation process of the initial camera movement template is not described further here.
The above steps can be performed based on a trigger operation within the application. For example, after the user taps the control of camera movement template A (the initial camera movement template), the terminal device loads the template data corresponding to camera movement template A and then obtains the initial camera movement sequence. The specific interactive operation process is not limited here and is not described further.
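As an illustration only (the disclosure does not prescribe any data format), the initial camera movement sequence of step S101 can be sketched as an ordered list of nodes, each carrying its timestamps plus the parameters that the later optimization steps configure. All names, fields, and values below are hypothetical:

```python
from dataclasses import dataclass

# Illustrative sketch only: one camera movement node with its timestamps
# and the three parameters configured by the later optimization steps.
@dataclass
class CameraNode:
    start: float                 # start timestamp, in seconds
    end: float                   # end timestamp, in seconds
    effect: str                  # e.g. "zoom", "shake_lr"
    focus: tuple = (0.5, 0.5)    # first parameter: picture centre (normalized)
    amplitude: float = 1.0       # second parameter: movement amplitude
    frequency: float = 2.0       # third parameter: change frequency, in Hz

def parse_template(template: dict) -> list:
    """Parse template data into an ordered initial camera movement sequence."""
    nodes = [CameraNode(**n) for n in template["nodes"]]
    return sorted(nodes, key=lambda n: n.start)

# Template data for two effects, deliberately listed out of order:
template = {"nodes": [{"start": 4.0, "end": 5.0, "effect": "shake_lr"},
                      {"start": 2.0, "end": 3.0, "effect": "zoom"}]}
sequence = parse_template(template)
# sequence[0] is the zoom node starting at 00:02
```

The sorting step reflects that a camera movement sequence is defined above as an ordered set of nodes, regardless of the order in which the template data stores them.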
Step S102: acquiring video content features of the video to be processed.
Exemplarily, the terminal device can load the video to be processed in response to a user instruction. The video to be processed may be a video shot by the terminal device, or a video shot or generated by another device. More specifically, the content of the video to be processed may be of various kinds, such as a dance video, a landscape video, a cartoon video, etc., which is not limited here. After the terminal device loads the video to be processed, the corresponding video content features can be obtained by performing content recognition on the video to be processed. The video content features are information used to characterize the video content of the video to be processed. In one possible implementation, a video content feature can be implemented in the form of a flag: for example, a video content feature of "0" indicates that no face appears in the video to be processed, and a video content feature of "1" indicates that a face appears in the video to be processed. In another possible implementation, the video content features can be implemented in the form of more complex feature vectors or feature matrices to characterize more complex content, such as the outline of a portrait in the image to be processed or the size of a specific object.
In a possible implementation, as shown in FIG. 3, a specific implementation of step S102 includes:
Step S1021: acquiring, according to the timestamp of at least one camera movement node in the initial camera movement sequence, the corresponding target video frame in the video to be processed.
Step S1023: performing content detection on the target video frame to obtain the video content features.
Exemplarily, the initial camera movement sequence includes at least one camera movement node. A camera movement node corresponds to at least one timestamp, and the timestamp is used to indicate the marked time, in the initial video, of the camera movement node in the initial camera movement sequence. More specifically, the marked time may be the camera movement start time, the camera movement end time, etc. Then, based on the timestamp of at least one camera movement node, the corresponding video frame, i.e., the target video frame, is read from the video to be processed. Content detection is performed on the target video frame to obtain the video frame content feature corresponding to the target video frame. Afterwards, the video frame content feature corresponding to one of the target video frames can be used as the video content feature, or a set of the video frame content features corresponding to multiple target video frames can be used as the video content features.
FIG. 4 is a schematic diagram of a process for determining video content features provided by an embodiment of the present disclosure, and the above process is further described with reference to FIG. 4. As shown in FIG. 4, exemplarily, the initial camera movement sequence includes camera movement node #1, camera movement node #2, and camera movement node #3. The timestamp of camera movement node #1 is [00:02, 00:03], indicating that the camera movement effect corresponding to camera movement node #1 starts at 00:02 (0 minutes 2 seconds, likewise below) and ends at 00:03. The timestamp of camera movement node #2 is [00:04, 00:05], indicating that the camera movement effect corresponding to camera movement node #2 starts at 00:04 and ends at 00:05. The timestamp of camera movement node #3 is [00:06, 00:08], indicating that the camera movement effect corresponding to camera movement node #3 starts at 00:06 and ends at 00:08. Then, according to the start times corresponding to camera movement node #1, camera movement node #2, and camera movement node #3, i.e., 00:02, 00:04, and 00:06, the target video frames at the corresponding playback timestamps are obtained from the video to be processed, namely video frame P1, video frame P2, and video frame P3.
Afterwards, content detection is performed on video frame P1, video frame P2, and video frame P3 respectively to obtain the corresponding video frame content features F_1, F_2, and F_3. Finally, the set of the video frame content features F_1, F_2, and F_3 (shown as [F_1, F_2, F_3] in the figure) is used as the video content features.
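The mapping from node timestamps to target video frames and per-frame features can be sketched as follows. The 30 fps frame rate and the stub detector are assumptions for illustration only; a real implementation would decode the frame and run an actual detector:

```python
def target_frame_indices(start_times, fps=30):
    """Map each node's start timestamp (seconds) to a frame index in the
    video to be processed (step S1021). The 30 fps default is an assumption."""
    return [int(round(t * fps)) for t in start_times]

# Node start times from the FIG. 4 example: 00:02, 00:04, 00:06.
starts = [2.0, 4.0, 6.0]
indices = target_frame_indices(starts, fps=30)   # -> [60, 120, 180]

def detect_content(frame_index):
    """Placeholder content detector (step S1023): a real detector would run
    e.g. face detection on the decoded frame; here it returns a stub flag."""
    return {"frame": frame_index, "has_face": 1}

# The set [F_1, F_2, F_3] of per-frame features is the video content features.
video_features = [detect_content(i) for i in indices]
```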
Step S103: obtaining an optimized camera movement sequence according to the video content features and the initial camera movement sequence, wherein the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features.
Exemplarily, after the video content features and the initial camera movement sequence are obtained, at least one initial camera movement node in the initial camera movement sequence is optimized according to the video content features to generate a corresponding optimized camera movement node, thereby obtaining the optimized camera movement sequence. Similar to an initial camera movement node in the initial camera movement sequence, an optimized camera movement node is also used to represent the camera movement features of the shot picture during video playback. However, since the optimized camera movement node is generated by configuring the initial camera movement node based on the video content features, the generated optimized camera movement node can match the video content features. Matching the video content features includes, for example: the movement amplitude corresponding to the optimized camera movement node (an exemplary implementation of the camera movement features) matching the scale of the scene in the video to be processed (an exemplary implementation of the video content features); or the picture centre point of the shot picture corresponding to the optimized camera movement node (an exemplary implementation of the camera movement effect) matching the portrait position in the video to be processed (an exemplary implementation of the video content features).
Therefore, when the content of the video to be processed is inconsistent with that of the initial video, adaptive optimization of the camera movement effect is achieved by optimizing the camera movement nodes.
In a possible implementation, the video content features include a first content feature, and the first content feature characterizes the position of the portrait in the video to be processed. As shown in FIG. 5, a possible implementation of step S103 includes:
Step S1031: determining, according to the first content feature, the lens focus area during motion of the shot picture corresponding to at least one camera movement node in the initial camera movement sequence.
Step S1032: configuring the corresponding camera movement node according to the lens focus area of the shot picture during motion, to generate an optimized camera movement node.
Step S1033: generating the optimized camera movement sequence according to the optimized camera movement nodes.
Exemplarily, a camera movement node has a first parameter, and the first parameter is used to determine the picture centre point, i.e., the lens focus area, of the shot picture corresponding to the camera movement node during motion. When the video content feature is the first content feature characterizing the position of the portrait in the video to be processed, the terminal device can determine the position of the portrait in the video to be processed through the first content feature. Then, the first parameter of the camera movement node is set based on the position of the portrait to generate the optimized camera movement node, so that in the camera movement effect generated by the camera movement node, the picture centre point of the shot picture during motion is the position of the portrait, thereby keeping the portrait in the video to be processed in focus. Compared with a solution that directly uses the initial camera movement sequence (initial camera movement template) to process the video to be processed, this avoids the problem that, when adding camera movement effects such as "zoom camera movement" or "shaking camera movement", the centre of the shot picture is inconsistent with the position of the person in the video to be processed, and improves the perceived quality of the camera movement effect.
Referring to the description of the video content features in the previous embodiment, the video content features may be a set of multiple video frame content features, where a video frame content feature corresponds to a video frame at a certain playback position in the video to be processed (for example, the start position of a camera movement effect). Therefore, the first content feature may include the coordinates of the position of the portrait at multiple playback positions in the video to be processed, more specifically, for example, the coordinates of the person's centre point or the coordinates of the centre of the person's face. Then, based on the positions characterized by the first content feature, the first parameter of at least one camera movement node in the initial camera movement sequence is set. Exemplarily, when the first content feature includes multiple video frame content features, each characterizing a portrait position, the camera movement node corresponding to each video frame content feature is configured according to that feature (the video frame content feature is determined based on the timestamp of the camera movement node, so the two have a corresponding relationship).
Thus, when the person in the video to be processed is moving (the portrait position changes), the picture centre point of the shot picture remains focused on the person's position, further highlighting the portrait and improving the visual perception of the camera movement effect.
FIG. 6 is a schematic diagram of an optimized camera movement node generated based on the first content feature provided by an embodiment of the present disclosure. As shown in FIG. 6, the camera movement effect achieved by optimized camera movement node #1, generated based on the first content feature characterizing the position of the portrait in the video to be processed, is "zoom camera movement". While this effect is being realized, during zooming the centre of the shot picture is aligned with the reference point P1 where the portrait is located in the video to be processed, thereby keeping the portrait in focus.
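Configuring the first parameter from detected portrait positions, as in steps S1031-S1033, can be sketched as follows. The node dictionaries and normalized coordinates are hypothetical placeholders, not a prescribed format:

```python
def set_focus(nodes, portrait_positions):
    """First content feature -> first parameter: centre each camera movement
    node's shot picture on the portrait position detected for its timestamp."""
    optimized = []
    for node, pos in zip(nodes, portrait_positions):
        new_node = dict(node)          # leave the initial node unchanged
        new_node["focus"] = pos        # picture centre now tracks the portrait
        optimized.append(new_node)
    return optimized

# One zoom node; the portrait centre detected at its timestamp (normalized):
nodes = [{"effect": "zoom", "focus": (0.5, 0.5)}]
portraits = [(0.62, 0.40)]
optimized_sequence = set_focus(nodes, portraits)
```

Copying each node before modification mirrors the distinction the text draws between the initial camera movement sequence and the optimized one derived from it.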
In another possible implementation, the video content features include a second content feature, and the second content feature characterizes the scale of the scene in the video to be processed. As shown in FIG. 7, another possible implementation of step S103 includes:
Step S1034: determining, according to the second content feature, the movement amplitude during motion of the shot picture corresponding to at least one camera movement node in the initial camera movement sequence.
Step S1035: configuring the corresponding camera movement node according to the movement amplitude of the shot picture during motion, to generate an optimized camera movement node.
Step S1036: generating the optimized camera movement sequence according to the optimized camera movement nodes.
Exemplarily, a camera movement node has a second parameter, and the second parameter is used to determine the movement amplitude of the shot picture corresponding to the camera movement node during motion. When the video content feature is the second content feature characterizing the scale of the scene in the video to be processed, the terminal device can determine the scale of the scene in the video to be processed through the second content feature, more specifically, close shot, medium shot, long shot, etc. Of course, in other implementations, the scene scale characterized by the second content feature can be a continuously varying value, so as to represent a more refined scene scale, which is not enumerated here. Then, the second parameter of the camera movement node is set based on the scene scale to generate the optimized camera movement node, so that in the camera movement effect generated by the camera movement node, the shaking amplitude of the shot picture matches the scene scale. Specifically, the farther the scene, the larger the relative movement amplitude; and the closer the scene, the smaller the relative movement amplitude. This avoids the dizziness caused by an inappropriate shaking amplitude in camera movement effects such as "shaking camera movement", and improves the perceived quality of the camera movement effect.
FIG. 8 is a schematic diagram of an optimized camera movement node generated based on the second content feature provided by an embodiment of the present disclosure. As shown in FIG. 8, the camera movement effect achieved by optimized camera movement node #2, generated based on the second content feature characterizing the scale of the scene in the video to be processed, is "left-right shaking camera movement". While this effect is being realized, when the scene scale in the video to be processed is 03 (representing a long shot), the shaking amplitude during the motion from shot picture P1 to shot picture P2 is a. When the scene scale in the video to be processed is 01 (representing a close shot), the shaking amplitude during the motion from shot picture P3 to shot picture P4 is b, where, exemplarily, b = 0.2a. By setting the second parameter of the camera movement node through the second content feature, adaptive amplitude adjustment of the left-right shaking camera movement is achieved.
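The scale-to-amplitude rule of steps S1034-S1036 can be sketched with an assumed lookup table. The factor values are hypothetical; only the 0.2 ratio for close shots is taken from the b = 0.2a example above:

```python
# Assumed mapping from the scene-scale code in FIG. 8 (01 = close shot,
# 02 = medium shot, 03 = long shot) to an amplitude factor. The 0.2 factor
# for close shots mirrors the b = 0.2a example; the others are placeholders.
SCALE_FACTOR = {1: 0.2, 2: 0.6, 3: 1.0}

def set_amplitude(base_amplitude, scene_scale):
    """Second content feature -> second parameter: the farther the scene,
    the larger the movement amplitude of the shot picture."""
    return base_amplitude * SCALE_FACTOR[scene_scale]

a = 40.0                            # amplitude a used for a long shot
amp_long = set_amplitude(a, 3)      # scale 03: full amplitude a -> 40.0
amp_close = set_amplitude(a, 1)     # scale 01: b = 0.2a -> 8.0
```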
In yet another possible implementation, the video content features include a third content feature, and the third content feature characterizes a target object in the video to be processed. As shown in FIG. 9, still another possible implementation of step S103 includes:
Step S1037: determining, according to the third content feature, the change frequency during camera movement of the shot picture corresponding to at least one camera movement node in the initial camera movement sequence, the change frequency being less than a preset frequency threshold.
Step S1038: configuring the corresponding camera movement node according to the change frequency of the shot picture during camera movement, to generate an optimized camera movement node.
Step S1039: generating the optimized camera movement sequence according to the optimized camera movement nodes.
Exemplarily, a camera movement node has a third parameter, and the third parameter is used to determine the change frequency of the shot picture corresponding to the camera movement node during camera movement. When the video content feature is the third content feature characterizing a target object in the video to be processed, the terminal device can determine, through the third content feature, whether the target object exists in the video to be processed, for example, whether important parts such as a face or hands are present, as well as their positions and sizes. Then, the third parameter of the camera movement node is set based on the target object to generate the optimized camera movement node, thereby controlling the shaking frequency of the shot picture in the camera movement effect generated by the camera movement node. Specifically, for camera movement effects such as "shaking camera movement", in one possible implementation, when it is determined based on the third content feature that a portrait face or fingers (target objects) are present in the video to be processed, high-frequency shaking is reduced, for example by lowering the shaking frequency (change frequency) below a certain preset frequency value through the third parameter. This improves the focus on the target object and avoids the problem that, due to overly frequent shaking, the user cannot clearly see target objects such as facial expressions or finger movements.
In another possible implementation, as shown in FIG. 10, a specific implementation of step S1037 includes:
Step S1037A: acquiring the contour size of the target object.
Step S1037B: obtaining the corresponding target motion frequency according to the contour size, wherein the target motion frequency is inversely proportional to the contour size.
Step S1037C: setting the third parameter according to the target motion frequency, where the third parameter is used to determine the change frequency of the shot picture corresponding to the camera movement node during camera movement.
Exemplarily, after the contour size of the target object (for example, the area of the facial contour) is obtained based on the third content feature, the corresponding shaking frequency (target motion frequency) is set based on the contour size of the target object. Specifically, for example, the larger the contour size of the facial contour, the lower the shaking frequency; conversely, the smaller the contour size, the higher the shaking frequency (but not greater than the initial value of the third parameter). This achieves shaking frequency control in the camera movement effect, reduces the impact of high-frequency shaking on the perception of target objects such as faces and fingers in the video, and improves the visual effect of the camera movement.
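Steps S1037A-S1037C can be sketched as below. The proportionality constant k is an assumed illustration parameter; the inverse relation and the cap at the third parameter's initial value follow the text:

```python
def set_frequency(initial_freq, contour_area, k=0.5):
    """Third content feature -> third parameter: target motion frequency
    inversely proportional to the target object's contour size, and never
    greater than the third parameter's initial value. k is an assumed constant."""
    if contour_area <= 0:
        return initial_freq    # no target object detected: keep the initial value
    return min(initial_freq, k / contour_area)

# Large facial contour -> lower shaking frequency:
f_large = set_frequency(4.0, contour_area=0.25)   # 0.5 / 0.25 = 2.0 Hz
# Small contour -> higher frequency, capped at the initial 4.0 Hz:
f_small = set_frequency(4.0, contour_area=0.05)
```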
It should be noted that the video content features may be implemented as any one of the first content feature, the second content feature, and the third content feature, or as several of them. That is, the camera movement node can be configured through at least two of the first parameter, the second parameter, and the third parameter to generate the optimized camera movement node and the corresponding optimized camera movement sequence. The specific implementation can be set as needed and is not described further here.
Step S104: processing the video to be processed based on the optimized camera movement sequence to generate an output video, wherein the shot pictures of the output video have the camera movement features corresponding to the optimized camera movement nodes.
Exemplarily, after the optimized camera movement sequence is obtained, camera movement effects are generated at the corresponding positions of the video to be processed based on each optimized camera movement node in the optimized camera movement sequence, thereby generating a video with one or more camera movement effects, i.e., the output video. Since the camera movement effects in the output video are generated based on the optimized camera movement sequence, the shot pictures of the output video have the camera movement features corresponding to the optimized camera movement nodes, so that the camera movement effects in the output video match the video content.
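One rendering step of S104, dynamically cropping the source picture around the node's focus point to simulate camera movement, can be sketched as follows. The clamping behaviour is an assumption made so the crop window stays inside the frame:

```python
def crop_window(frame_w, frame_h, focus, zoom):
    """Compute the crop rectangle for one output frame: a window of size
    (w/zoom, h/zoom) centred on the node's focus point, clamped so the
    window never leaves the source frame."""
    w, h = frame_w / zoom, frame_h / zoom
    cx, cy = focus[0] * frame_w, focus[1] * frame_h
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)
    return (x, y, w, h)

# A 2x zoom-in centred on a portrait at the middle of a 1920x1080 frame:
rect = crop_window(1920, 1080, (0.5, 0.5), zoom=2.0)
# rect == (480.0, 270.0, 960.0, 540.0)
```

Animating zoom (and the focus point) across the node's start and end timestamps, then scaling each crop back to the output resolution, yields the continuous moving shot pictures described above.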
In this embodiment, an initial camera movement sequence is obtained according to an initial camera movement template. The initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and a camera movement node is used to represent the camera movement features of the shot picture during video playback. The video content features of the video to be processed are acquired, and an optimized camera movement sequence is obtained according to the video content features and the initial camera movement sequence. The optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features. The video to be processed is processed based on the optimized camera movement sequence to generate an output video, and the shot pictures of the output video have the camera movement features corresponding to the optimized camera movement nodes. By configuring the initial camera movement template using the video content features of the video to be processed, an optimized camera movement sequence composed of optimized camera movement nodes is obtained, and an output video with camera movement effects is generated based on the optimized camera movement sequence. Since the camera movement features corresponding to the optimized camera movement nodes match the video content features of the video to be processed,
generating the output video based on the optimized camera movement sequence and the video to be processed avoids the problem that the camera movement effects generated by the initial camera movement template do not match the content of the video to be processed, and improves the visual perception of the camera movement effects.
参考图11,图11为本公开实施例提供的视频生成方法的流程示意图二。本实施例在图2所示实施例的基础上,进一步地对步骤S103进行细化,该视频生成方法包括:Referring to FIG11 , FIG11 is a second flow chart of the video generation method provided by the embodiment of the present disclosure. Based on the embodiment shown in FIG2 , this embodiment further refines step S103 , and the video generation method includes:
步骤S201:根据初始运镜模板,获得初始运镜序列,初始运镜序列包括初始视频对应的至少一个运镜节点,运镜节点用于表征视频播放时镜头画面的运镜特征。Step S201: obtaining an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to represent camera movement features of a camera shot during video playback.
步骤S202:获取待处理视频的第一视频内容特征。Step S202: Obtain a first video content feature of the video to be processed.
步骤S203:获取初始视频的第二视频内容特征。Step S203: Acquire the second video content feature of the initial video.
步骤S204:根据第一视频内容特征和第二视频内容特征,得到特征偏差信息,特征偏差信息表征待处理视频和初始视频在视频内容上的特征偏差量。Step S204: obtaining feature deviation information according to the first video content feature and the second video content feature, wherein the feature deviation information represents the amount of feature deviation in video content between the video to be processed and the initial video.
示例性地,初始视频是生成初始运镜序列所使用的视频,也即视频内容与初始运镜序列所生成的运镜特效相匹配的视频。一种可能的实现方式中,初始运镜模板对应的模板数据中,包含有该初始运镜模板对应的初始视频。之后,对初始视频进行内容检测,即可得到初始视频对应的视频内容特征,即第二视频内容特征。其中,获取第二视频内容特征的具体实现方式,与获取待处理视频的视频内容特征(第一视频内容特征)的实现方式相同,可参见图2所示实施例中获取待处理视频的视频内容特征的具体实现方式的介绍,此处不再赘述。Exemplarily, the initial video is the video used to generate the initial camera movement sequence, that is, a video whose content matches the camera movement effects generated by the initial camera movement sequence. In one possible implementation, the template data corresponding to the initial camera movement template includes the initial video corresponding to the initial camera movement template. Content detection is then performed on the initial video to obtain the video content feature corresponding to the initial video, namely the second video content feature. The specific implementation of acquiring the second video content feature is the same as that of acquiring the video content feature of the video to be processed (the first video content feature); refer to the description of that implementation in the embodiment shown in FIG. 2, which will not be repeated here.
进一步地,在得到第一视频内容特征和第二视频内容特征后,可以基于第一视频内容特征和第二视频内容特征进行特征对比,得到特征偏差信息。该特征偏差信息表征初始视频和待处理视频在视频内容上的特征偏差量。具体地,例如,第一视频内容特征表征待处理视频中人像所在的第一位置、第二视频内容特征表征初始视频中人像所在的第二位置,通过第一视频内容特征和第二视频内容特征,得到表征第一位置和第二位置的偏差量的特征偏差信息。Furthermore, after obtaining the first video content feature and the second video content feature, feature comparison can be performed based on the first video content feature and the second video content feature to obtain feature deviation information. The feature deviation information represents the amount of feature deviation between the initial video and the video to be processed in terms of video content. Specifically, for example, the first video content feature represents the first position where the portrait is located in the video to be processed, and the second video content feature represents the second position where the portrait is located in the initial video. The feature deviation information representing the amount of deviation between the first position and the second position is obtained through the first video content feature and the second video content feature.
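The feature comparison in step S204 can be sketched as follows for the portrait-position example above; the (x, y) coordinate representation and the helper name are illustrative assumptions, not part of the disclosure.

```python
def position_deviation(first_pos, second_pos):
    """Return the (dx, dy) offset from the portrait position in the initial
    video (second position) to the portrait position in the video to be
    processed (first position)."""
    dx = first_pos[0] - second_pos[0]
    dy = first_pos[1] - second_pos[1]
    return (dx, dy)

# Portrait centered at (0.62, 0.40) in the video to be processed and at
# (0.50, 0.50) in the initial video, in normalized frame coordinates:
offset = position_deviation((0.62, 0.40), (0.50, 0.50))
```

The resulting offset is the "position deviation" form of the feature deviation information used in step S205.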
步骤S205:根据特征偏差信息,设置初始运镜序列中的至少一个运镜节点的节点参数,得到优化运镜序列。Step S205: setting node parameters of at least one camera movement node in the initial camera movement sequence according to the feature deviation information to obtain an optimized camera movement sequence.
示例性地,在得到特征偏差信息后,基于该特征偏差信息所表征的特征偏差量,对初始运镜序列中的运镜节点的节点参数进行修正。从而使该运镜节点能够与待处理视频的视频内容特征(第一视频内容特征)相匹配,达到运镜节点优化的目的,进而得到优化运镜序列。Exemplarily, after obtaining the characteristic deviation information, the node parameters of the camera movement node in the initial camera movement sequence are corrected based on the characteristic deviation amount represented by the characteristic deviation information, so that the camera movement node can match the video content feature (first video content feature) of the video to be processed, so as to achieve the purpose of camera movement node optimization, and then obtain an optimized camera movement sequence.
在一种可能的实现方式中,特征偏差信息包括位置偏差量,位置偏差量表征初始视频中人像所在的第一位置和待处理视频中人像所在的第二位置的偏差。如图12所示,步骤S205的具体实现方式包括:In a possible implementation, the feature deviation information includes a position deviation, and the position deviation represents a deviation between a first position where the portrait is located in the initial video and a second position where the portrait is located in the video to be processed. As shown in FIG. 12 , a specific implementation of step S205 includes:
步骤S2051:获取初始运镜序列中位置偏差量对应的目标运镜节点。Step S2051: Obtain the target camera movement node corresponding to the position deviation in the initial camera movement sequence.
步骤S2052:根据位置偏差量,设置目标运镜节点对应的镜头画面在运镜过程中的画面中心点,得到优化运镜序列。Step S2052: according to the position deviation, the center point of the lens picture corresponding to the target camera movement node during the camera movement process is set to obtain an optimized camera movement sequence.
示例性地,由于特征偏差信息是基于第一视频内容特征和第二视频内容特征的特征对比而得到的,而第一视频内容特征和第二视频内容特征是基于相同的运镜节点所对应的视频帧得到的。因此,特征偏差信息与初始运镜序列中的某个运镜节点相对应。本实施例中,与特征偏差信息(位置偏差量)对应的运镜节点为目标运镜节点。Exemplarily, since the feature deviation information is obtained based on the feature comparison between the first video content feature and the second video content feature, and the first video content feature and the second video content feature are obtained based on the video frame corresponding to the same camera movement node. Therefore, the feature deviation information corresponds to a camera movement node in the initial camera movement sequence. In this embodiment, the camera movement node corresponding to the feature deviation information (position deviation amount) is the target camera movement node.
进一步地,当特征偏差信息为表征初始视频中人像所在的第一位置和待处理视频中人像所在的第二位置的偏差的位置偏差量时,通过特征偏差信息,可以实现对运镜节点所对应的镜头画面在运镜过程中的画面中心点的调整。具体地,例如,位置偏差量为一个偏差向量,通过该偏差向量,设置目标运镜节点对应的镜头画面在运镜过程中的画面中心点,从而使该画面中心点产生偏差向量对应的偏移,实现对目标运镜节点的优化。进一步地,随着至少一个目标运镜节点的优化,同时实现了对初始运镜序列的优化,从而生成初始运镜序列对应的优化运镜序列。优化后的优化运镜序列所生成的运镜效果,可以聚焦在待处理视频中人像所在的位置,提高视觉观感体验。其中,本实施例中设置目标运镜节点对应的镜头画面在运镜过程中的画面中心点的步骤,可以通过对目标运镜节点的第一参数进行设置实现,具体实现方式可参见图2所示实施例中相关介绍,此处不再赘述。Furthermore, when the feature deviation information is a position deviation representing the deviation between the first position of the portrait in the initial video and the second position of the portrait in the video to be processed, the feature deviation information can be used to adjust the picture center point of the lens picture corresponding to the camera movement node during the camera movement process. Specifically, for example, the position deviation is a deviation vector, and the picture center point of the lens picture corresponding to the target camera movement node during the camera movement process is set through this deviation vector, so that the picture center point is offset by the deviation vector, thereby optimizing the target camera movement node. Further, as at least one target camera movement node is optimized, the initial camera movement sequence is optimized at the same time, generating an optimized camera movement sequence corresponding to the initial camera movement sequence. The camera movement effect generated by the optimized camera movement sequence can focus on the position of the portrait in the video to be processed, improving the visual perception experience. The step of setting the picture center point of the lens picture corresponding to the target camera movement node during the camera movement process in this embodiment can be implemented by setting the first parameter of the target camera movement node; for the specific implementation, refer to the relevant description of the embodiment shown in FIG. 2, which will not be repeated here.
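Steps S2051–S2052 can be sketched as follows: each position deviation is tied to a target camera movement node, and that node's picture center point is shifted by the deviation vector, yielding the optimized sequence. The dict-based node layout and function name are assumptions for illustration only.

```python
def optimize_center_points(sequence, deviations):
    """sequence: list of node dicts, each with an "id" and a "center" (x, y);
    deviations: mapping from target node id to its (dx, dy) position deviation.
    Nodes without a deviation entry are kept unchanged."""
    optimized = []
    for node in sequence:
        dev = deviations.get(node["id"])
        if dev is not None:
            cx, cy = node["center"]
            # shift the picture center point by the deviation vector
            node = {**node, "center": (cx + dev[0], cy + dev[1])}
        optimized.append(node)
    return optimized

nodes = [{"id": 0, "center": (0.5, 0.5)}, {"id": 1, "center": (0.5, 0.5)}]
optimized = optimize_center_points(nodes, {1: (0.1, -0.2)})
```

Only node 1 (the target node in this example) is shifted; the rest of the initial sequence is carried over untouched, matching the description that optimization is applied per target node.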
在另一种可能的实现方式中,特征偏差信息包括尺度系数,尺度系数表征初始视频中的景别尺度和待处理视频中的景别尺度的比例。步骤S205的具体实现方式包括:In another possible implementation, the feature deviation information includes a scale coefficient, and the scale coefficient represents the ratio of the scene scale in the initial video to the scene scale in the video to be processed. The specific implementation of step S205 includes:
步骤S2053:获取初始运镜序列中尺度系数对应的目标运镜节点。Step S2053: Obtain the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence.
步骤S2054:根据尺度系数,设置目标运镜节点对应的镜头画面在运动过程中的运动幅度,得到优化运镜序列。Step S2054: setting, according to the scale coefficient, the movement amplitude of the lens picture corresponding to the target camera movement node during movement, to obtain an optimized camera movement sequence.
示例性地,与上述基于位置偏差量得到优化运镜序列的过程类似,当特征偏差信息为表征初始视频中的景别尺度和待处理视频中的景别尺度的比例的尺度系数时,首先获取尺度系数对应的目标运镜节点,实现方式与上述实施例中获取位置偏差量对应的目标运镜节点的步骤类似,不再赘述。之后,根据尺度系数,设置目标运镜节点对应的镜头画面在运动过程中的运动幅度,实现在"晃动运镜"等运镜特效中,对镜头画面的晃动幅度的设置,从而实现对目标运镜节点的优化,以及生成初始运镜序列对应的优化运镜序列。优化后的优化运镜序列所生成的运镜效果的运动幅度,与景别尺度相匹配,从而避免了近景下大幅度运动造成的视觉疲劳和眩晕感,以及远景下过小幅度的运动造成运镜特效不明显的问题,提高视觉观感体验。其中,本实施例中设置目标运镜节点对应的镜头画面在运动过程中的运动幅度的步骤,可以通过对目标运镜节点的第二参数进行设置实现。具体实现方式可参见图2所示实施例中相关介绍,此处不再赘述。Exemplarily, this is similar to the above process of obtaining an optimized camera movement sequence based on the position deviation. When the feature deviation information is a scale coefficient representing the ratio of the scene scale in the initial video to the scene scale in the video to be processed, the target camera movement node corresponding to the scale coefficient is first obtained; the implementation is similar to the step of obtaining the target camera movement node corresponding to the position deviation in the above embodiment and will not be repeated. Then, according to the scale coefficient, the movement amplitude of the lens picture corresponding to the target camera movement node during movement is set, for example the shake amplitude of the lens picture in camera movement effects such as "shaking camera movement", thereby optimizing the target camera movement node and generating an optimized camera movement sequence corresponding to the initial camera movement sequence. The movement amplitude of the camera movement effect generated by the optimized camera movement sequence matches the scene scale, which avoids the visual fatigue and dizziness caused by large movements in close shots as well as the problem of camera movement effects being imperceptible due to overly small movements in long shots, improving the visual perception experience. The step of setting the movement amplitude of the lens picture corresponding to the target camera movement node during movement in this embodiment can be implemented by setting the second parameter of the target camera movement node; for the specific implementation, refer to the relevant description of the embodiment shown in FIG. 2, which will not be repeated here.
可选地,在一种可能的实现方式中,初始视频具有第一播放时长,待处理视频具有第二播放时长。当第二播放时长大于第一播放时长时,在步骤S205之后,还包括:Optionally, in a possible implementation, the initial video has a first play duration, and the video to be processed has a second play duration. When the second play duration is greater than the first play duration, after step S205, the method further includes:
步骤S206:根据优化运镜序列中的至少一个运镜节点,生成延长运镜序列。延长运镜序列所对应的镜头画面的运镜时长不小于第二播放时长与第一播放时长的差。Step S206: Generate an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence. The camera movement duration of the lens picture corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration.
步骤S207:依次基于优化运镜序列和延长运镜序列处理待处理视频,生成输出视频。Step S207: Process the video to be processed based on the optimized camera movement sequence and the extended camera movement sequence in sequence to generate an output video.
示例性地,当待处理视频的播放时长大于初始视频的播放时长时,基于初始视频生成的初始运镜模板的播放时长小于待处理视频,进而导致初始运镜序列生成的运镜特效无法覆盖全部待处理视频。针对该问题,本实施例通过从已生成的优化运镜序列中,获取至少一个运镜节点,来生成延长运镜序列。之后依次基于优化运镜序列和延长运镜序列处理待处理视频,从而使优化运镜序列和延长运镜序列所产生的运镜特效能够覆盖全部待处理视频,避免生成的输出视频中存在无运镜特效的空白特效段。同时,由于基于优化运镜序列中的运镜节点生成延长运镜序列,因此可以保证延长运镜序列与待处理视频中未被优化运镜序列覆盖的剩余视频片段的匹配,提高输出视频的运镜特效效果。Exemplarily, when the playback duration of the video to be processed is longer than that of the initial video, the playback duration of the initial camera movement template generated based on the initial video is shorter than the video to be processed, so the camera movement effects generated by the initial camera movement sequence cannot cover the entire video to be processed. To address this problem, this embodiment generates an extended camera movement sequence by obtaining at least one camera movement node from the generated optimized camera movement sequence. The video to be processed is then processed based on the optimized camera movement sequence and the extended camera movement sequence in turn, so that the camera movement effects generated by the two sequences can cover the entire video to be processed, avoiding blank segments without camera movement effects in the generated output video. Meanwhile, since the extended camera movement sequence is generated from camera movement nodes in the optimized camera movement sequence, it can be ensured that the extended camera movement sequence matches the remaining video clips in the video to be processed that are not covered by the optimized camera movement sequence, improving the camera movement effects of the output video.
示例性地,如图13所示,步骤S206的具体实现方式包括:Exemplarily, as shown in FIG13 , the specific implementation of step S206 includes:
步骤S2061:获取待处理视频中第一视频片段的识别特征。其中,第一视频片段为待处理视频中第一播放时长之后的视频段,识别特征包括第一视频片段的视频内容特征和/或音乐旋律特征。Step S2061: Obtain identification features of the first video segment in the video to be processed, wherein the first video segment is a video segment after a first playback duration in the video to be processed, and the identification features include video content features and/or music melody features of the first video segment.
步骤S2062:根据识别特征,从待处理视频中获取第二视频片段,第二视频片段为待处理视频中第一播放时长之前的视频段。第二视频片段的识别特征与第一视频片段的识别特征相似。Step S2062: According to the identification feature, a second video segment is obtained from the video to be processed, where the second video segment is a video segment before the first playback duration in the video to be processed. The identification feature of the second video segment is similar to the identification feature of the first video segment.
步骤S2063:根据第二视频片段对应的至少一个连续的运镜节点,生成延长运镜序列。Step S2063: generating an extended camera movement sequence according to at least one continuous camera movement node corresponding to the second video clip.
示例性地,当待处理视频的播放时长大于初始视频时,可以将待处理视频中分为两部分,即第一播放时长之后的部分和第一播放时长之前的部分。其中,针对第一播放时长之后的部分,可以基于运镜节点的时长,分为一个或多个子片段,即第一视频片段。之后,针对第一视频片段进行特征提取,得到第一视频片段对应的识别特征,其中,识别特征例如包括第一视频片段的视频内容特征和/或音乐旋律特征。再之后,基于该识别特征,对第一播放时长之前的部分进行搜索,得到与第一视频片段的识别特征相似的第二视频片段;其中,识别特征相似的判断方式包括:第二视频片段的识别特征与第一视频片段的识别特征的特征距离小于特征阈值。之后,根据第二视频片段的时间戳,获取所对应的至少一个连续的复用运镜节点。在之后,重复上述过程,直至得到各第一视频片段对应的复用运镜节点,并将上述复用运镜节点进行拼接,生成延长运镜序列。由于第二视频片段的识别特征与第一视频片段的视频特征相似,因此,为第一视频片段选择的复用运镜节点,也可以在一定程度上匹配第一视频片段的视频内容特征。从而使生成的延长运镜序列能够继承优化运镜序列的效果,实现对待处理视频中未被优化运镜序列所覆盖的视频段的适配,提高所生成的输出视频的整体运镜特效的视觉效果。Exemplarily, when the playback duration of the video to be processed is longer than that of the initial video, the video to be processed can be divided into two parts, namely, the part after the first playback duration and the part before the first playback duration. Among them, for the part after the first playback duration, it can be divided into one or more sub-segments, namely, the first video segment, based on the duration of the camera movement node. Afterwards, feature extraction is performed on the first video segment to obtain the identification feature corresponding to the first video segment, wherein the identification feature, for example, includes the video content feature and/or the music melody feature of the first video segment. Afterwards, based on the identification feature, the part before the first playback duration is searched to obtain a second video segment similar to the identification feature of the first video segment; wherein the judgment method of similar identification features includes: the feature distance between the identification feature of the second video segment and the identification feature of the first video segment is less than the feature threshold. Afterwards, according to the timestamp of the second video segment, at least one corresponding continuous multiplexed camera movement node is obtained. 
Afterwards, the above process is repeated until the multiplexed camera movement nodes corresponding to each first video segment are obtained, and these multiplexed camera movement nodes are spliced to generate the extended camera movement sequence. Since the identification features of the second video segment are similar to those of the first video segment, the multiplexed camera movement nodes selected for the first video segment can also match the video content features of the first video segment to a certain extent. The generated extended camera movement sequence can therefore inherit the effect of the optimized camera movement sequence, adapt to the video segments in the video to be processed that are not covered by the optimized camera movement sequence, and improve the visual effect of the overall camera movement effects of the generated output video.
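The matching-and-splicing loop of steps S2061–S2063 can be sketched as follows. Euclidean feature distance, the threshold comparison, and the data layout (feature tuples, timestamp-keyed node lists) are illustrative assumptions; the disclosure only requires that the feature distance between the two segments' identification features be below a feature threshold.

```python
def build_extended_sequence(first_feats, second_segments, node_index, threshold):
    """first_feats: identification features of the first video segments
    (segments after the first playback duration);
    second_segments: (timestamp, features) pairs for segments before it;
    node_index: timestamp -> list of consecutive camera movement nodes
    covering that earlier segment."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    extended = []
    for feat in first_feats:
        # pick the earlier segment whose identification features are closest
        ts, best = min(second_segments, key=lambda seg: dist(feat, seg[1]))
        if dist(feat, best) < threshold:     # identification features similar
            extended.extend(node_index[ts])  # reuse that segment's nodes
    return extended

segments = [(0.0, (0.1, 0.2)), (2.0, (0.9, 0.8))]
nodes = {0.0: ["node_a"], 2.0: ["node_b", "node_c"]}
extended = build_extended_sequence([(0.85, 0.75)], segments, nodes, threshold=0.5)
```

Here the uncovered segment's features are closest to the earlier segment at timestamp 2.0, so that segment's consecutive nodes are reused in the extended sequence.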
本实施例中,步骤S201-步骤S202的实现方式与本公开图2所示实施例中的步骤S101-步骤S102的实现方式相同,在此不再一一赘述。In this embodiment, the implementation of step S201-step S202 is the same as the implementation of step S101-step S102 in the embodiment shown in FIG. 2 of the present disclosure, and will not be described in detail here.
对应于上文实施例的视频生成方法,图14为本公开实施例提供的视频生成装置的结构框图。为了便于说明,仅示出了与本公开实施例相关的部分。参照图14,视频生成装置3,包括:Corresponding to the video generation method of the above embodiment, FIG14 is a structural block diagram of a video generation device provided by an embodiment of the present disclosure. For ease of explanation, only the parts related to the embodiment of the present disclosure are shown. Referring to FIG14 , the video generation device 3 includes:
加载模块31,用于根据初始运镜模板,获得初始运镜序列,初始运镜序列包括初始视频对应的至少一个运镜节点,运镜节点用于表征视频播放时镜头画面的运镜特征;A loading module 31 is used to obtain an initial camera movement sequence according to an initial camera movement template, wherein the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and the camera movement node is used to represent the camera movement characteristics of a lens picture during video playback;
处理模块32,用于获取待处理视频的视频内容特征,并根据视频内容特征和初始运镜序列,得到优化运镜序列,优化运镜序列包括至少一个优化运镜节点,优化运镜节点对应的运镜特征与视频内容特征相匹配;The processing module 32 is used to obtain the video content features of the video to be processed, and obtain an optimized camera movement sequence according to the video content features and the initial camera movement sequence, where the optimized camera movement sequence includes at least one optimized camera movement node, and the camera movement features corresponding to the optimized camera movement node match the video content features;
输出模块33,用于基于优化运镜序列处理待处理视频,生成输出视频,输出视频的镜头画面具有优化运镜节点对应运镜特征。The output module 33 is used to process the video to be processed based on the optimized camera movement sequence to generate an output video, and the lens images of the output video have the camera movement features corresponding to the optimized camera movement nodes.
在本公开的一个实施例中,视频内容特征包括第一内容特征,第一内容特征表征待处理视频中人像所在的位置;处理模块32在根据视频内容特征和初始运镜序列,得到优化运镜序列时,具体用于:根据第一内容特征,生成初始运镜序列中至少一个运镜节点的第一参数,其中,第一参数用于确定运镜节点对应的镜头画面在运动过程中的画面中心点;根据第一参数配置对应的运镜节点,生成优化运镜节点;根据优化运镜节点,生成优化运镜序列。In one embodiment of the present disclosure, the video content feature includes a first content feature, and the first content feature characterizes the position of the portrait in the video to be processed; when the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a first parameter of at least one camera movement node in the initial camera movement sequence according to the first content feature, wherein the first parameter is used to determine the center point of the lens picture corresponding to the camera movement node during the movement process; configure the corresponding camera movement node according to the first parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
在本公开的一个实施例中,视频内容特征包括第二内容特征,第二内容特征表征待处理视频中景别的尺度;处理模块32在根据视频内容特征和初始运镜序列,得到优化运镜序列时,具体用于:根据第二内容特征,生成初始运镜序列中至少一个运镜节点的第二参数,其中,第二参数用于确定运镜节点对应的镜头画面在运动过程中的运动幅度;根据第二参数配置对应的运镜节点,生成优化运镜节点;根据优化运镜节点,生成优化运镜序列。In one embodiment of the present disclosure, the video content feature includes a second content feature, and the second content feature characterizes the scale of the scene in the video to be processed; when the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a second parameter of at least one camera movement node in the initial camera movement sequence according to the second content feature, wherein the second parameter is used to determine the movement amplitude of the lens picture corresponding to the camera movement node during the movement process; configure the corresponding camera movement node according to the second parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
在本公开的一个实施例中,视频内容特征包括第三内容特征,第三内容特征表征待处理视频中的面部轮廓;处理模块32在根据视频内容特征和初始运镜序列,得到优化运镜序列时,具体用于:根据第三内容特征,生成初始运镜序列中至少一个运镜节点的第三参数,其中,第三参数用于确定运镜节点对应的镜头画面在运镜过程中的变化频率;根据第三参数配置对应的运镜节点,生成优化运镜节点;根据优化运镜节点,生成优化运镜序列。In one embodiment of the present disclosure, the video content feature includes a third content feature, and the third content feature characterizes the facial contour in the video to be processed; when the processing module 32 obtains the optimized camera movement sequence according to the video content feature and the initial camera movement sequence, it is specifically used to: generate a third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature, wherein the third parameter is used to determine the frequency of change of the lens picture corresponding to the camera movement node during the camera movement process; configure the corresponding camera movement node according to the third parameter to generate an optimized camera movement node; and generate an optimized camera movement sequence according to the optimized camera movement node.
在本公开的一个实施例中,处理模块32在根据第三内容特征,生成初始运镜序列中至少一个运镜节点的第三参数时,具体用于:获取面部轮廓的轮廓尺寸;根据轮廓尺寸,得到对应的目标运动频率,其中,目标运动频率与轮廓尺寸成反比;根据目标运动频率,设置第三参数。In one embodiment of the present disclosure, when the processing module 32 generates the third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature, it is specifically used to: obtain the contour size of the facial contour; obtain the corresponding target motion frequency according to the contour size, wherein the target motion frequency is inversely proportional to the contour size; and set the third parameter according to the target motion frequency.
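The inverse relation stated above can be sketched as follows: the larger the facial contour in frame (a close shot), the lower the target change frequency of the lens picture. The proportionality constant k is an illustrative assumption, as the disclosure only specifies the inverse proportionality.

```python
def target_motion_frequency(contour_size, k=2.0):
    """Return a target frequency inversely proportional to the facial
    contour size; k is an assumed proportionality constant."""
    if contour_size <= 0:
        raise ValueError("contour size must be positive")
    return k / contour_size

# A contour twice as large yields half the target frequency:
freq = target_motion_frequency(4.0)
```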
在本公开的一个实施例中,处理模块32在根据视频内容特征和初始运镜序列,得到优化运镜序列时,具体用于:获取初始视频的视频内容特征;根据初始视频的视频内容特征和待处理视频的视频内容特征,得到特征偏差信息,特征偏差信息表征待处理视频和初始视频在视频内容上的特征偏差量;根据特征偏差信息,设置初始运镜序列中的至少一个运镜节点的节点参数,得到优化运镜序列。In one embodiment of the present disclosure, when the processing module 32 obtains the optimized camera movement sequence based on the video content features and the initial camera movement sequence, it is specifically used to: obtain the video content features of the initial video; obtain feature deviation information based on the video content features of the initial video and the video content features of the video to be processed, the feature deviation information representing the feature deviation amount in video content between the video to be processed and the initial video; and set the node parameters of at least one camera movement node in the initial camera movement sequence based on the feature deviation information to obtain the optimized camera movement sequence.
在本公开的一个实施例中,特征偏差信息包括位置偏差量,位置偏差量表征初始视频中人像所在的第一位置和待处理视频中人像所在的第二位置的偏差;处理模块32在根据特征偏差信息,设置初始运镜序列中的至少一个运镜节点的节点参数,得到优化运镜序列时,具体用于:获取初始运镜序列中位置偏差量对应的目标运镜节点;根据位置偏差量,设置目标运镜节点对应的镜头画面在运镜过程中的画面中心点,得到优化运镜序列。In one embodiment of the present disclosure, the characteristic deviation information includes a position deviation, and the position deviation represents a deviation between a first position of a portrait in an initial video and a second position of the portrait in a video to be processed; when the processing module 32 sets the node parameters of at least one camera movement node in the initial camera movement sequence according to the characteristic deviation information to obtain an optimized camera movement sequence, it is specifically used to: obtain a target camera movement node corresponding to the position deviation in the initial camera movement sequence; and set the center point of the lens picture corresponding to the target camera movement node during the camera movement process according to the position deviation to obtain an optimized camera movement sequence.
在本公开的一个实施例中,特征偏差信息包括尺度系数,尺度系数表征初始视频中的景别尺度和待处理视频中的景别尺度的比例;处理模块32在根据特征偏差信息,设置初始运镜序列中的至少一个运镜节点的节点参数,得到优化运镜序列时,具体用于:获取初始运镜序列中尺度系数对应的目标运镜节点;根据尺度系数,设置目标运镜节点对应的镜头画面的在运动过程中的运动幅度,得到优化运镜序列。In one embodiment of the present disclosure, the characteristic deviation information includes a scale coefficient, which represents the ratio of the scene scale in the initial video to the scene scale in the video to be processed; when the processing module 32 sets the node parameters of at least one camera movement node in the initial camera movement sequence according to the characteristic deviation information to obtain the optimized camera movement sequence, it is specifically used to: obtain the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence; according to the scale coefficient, set the movement amplitude of the lens picture corresponding to the target camera movement node during the movement process to obtain the optimized camera movement sequence.
在本公开的一个实施例中,处理模块32在获取待处理视频的视频内容特征时,具体用于:根据初始运镜序列中至少一个运镜节点的时间戳,获取待处理视频中对应的目标视频帧;对目标视频帧进行内容检测,得到视频内容特征。In one embodiment of the present disclosure, when the processing module 32 obtains the video content features of the video to be processed, it is specifically used to: obtain the corresponding target video frame in the video to be processed according to the timestamp of at least one camera movement node in the initial camera movement sequence; perform content detection on the target video frame to obtain the video content features.
在本公开的一个实施例中,初始视频具有第一播放时长,待处理视频具有第二播放时长,当第二播放时长大于第一播放时长时,处理模块32,还用于:根据优化运镜序列中的至少一个运镜节点,生成延长运镜序列,延长运镜序列所对应的镜头画面的运镜时长不小于第二播放时长与第一播放时长的差;处理模块32在基于优化运镜序列处理待处理视频,生成输出视频时,具体用于:依次基于优化运镜序列和延长运镜序列处理待处理视频,生成输出视频。In one embodiment of the present disclosure, the initial video has a first playback duration, and the video to be processed has a second playback duration. When the second playback duration is greater than the first playback duration, the processing module 32 is further configured to: generate an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence, wherein the camera movement duration of the lens pictures corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration. When processing the video to be processed based on the optimized camera movement sequence to generate the output video, the processing module 32 is specifically configured to: process the video to be processed based on the optimized camera movement sequence and the extended camera movement sequence in sequence to generate the output video.
在本公开的一个实施例中,处理模块32在根据优化运镜序列中的至少一个运镜节点,生成延长运镜序列时,具体用于:获取待处理视频中第一视频片段的识别特征,其中,第一视频片段为待处理视频中第一播放时长之后的视频段,识别特征包括第一视频片段的视频内容特征和/或音乐旋律特征;根据识别特征,从待处理视频中获取第二视频片段,第二视频片段为待处理视频中第一播放时长之前的视频段,第二视频片段的识别特征与第一视频片段的识别特征相似;根据第二视频片段对应的至少一个连续的运镜节点,生成延长运镜序列。In one embodiment of the present disclosure, when the processing module 32 generates an extended camera movement sequence based on at least one camera movement node in the optimized camera movement sequence, it is specifically used to: obtain identification features of a first video clip in the video to be processed, wherein the first video clip is a video segment after a first playback duration in the video to be processed, and the identification features include video content features and/or music melody features of the first video clip; according to the identification features, obtain a second video clip from the video to be processed, the second video clip is a video segment before the first playback duration in the video to be processed, and the identification features of the second video clip are similar to the identification features of the first video clip; and generate an extended camera movement sequence based on at least one continuous camera movement node corresponding to the second video clip.
其中,加载模块31、处理模块32、输出模块33依次连接。本实施例提供的视频生成装置3可以执行上述方法实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。The loading module 31, processing module 32 and output module 33 are connected in sequence. The video generating device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
图15为本公开实施例提供的一种电子设备的结构示意图,如图15所示,该电子设备4包括:FIG. 15 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 15 , the electronic device 4 includes:
处理器41,以及与处理器41通信连接的存储器42;A processor 41, and a memory 42 communicatively connected to the processor 41;
存储器42存储计算机执行指令;The memory 42 stores computer executable instructions;
处理器41执行存储器42存储的计算机执行指令,以实现如图2-图13所示实施例中的视频生成方法。The processor 41 executes the computer-executable instructions stored in the memory 42 to implement the video generation method in the embodiments shown in FIG. 2 to FIG. 13 .
其中,可选地,处理器41和存储器42通过总线43连接。Optionally, the processor 41 and the memory 42 are connected via a bus 43 .
For related descriptions, reference may be made to the descriptions and effects corresponding to the steps in the embodiments of FIG. 2 to FIG. 13; details are not repeated here.
An embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video generation method provided in any one of the embodiments corresponding to FIG. 2 to FIG. 13 of the present disclosure.
An embodiment of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the methods in the embodiments shown in FIG. 2 to FIG. 13.
Referring to FIG. 16, which shows a schematic structural diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure, the electronic device 900 may be a terminal device or a server. Terminal devices may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 16 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 16, the electronic device 900 may include a processing apparatus 901 (e.g., a central processing unit or a graphics processing unit), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Typically, the following apparatuses may be connected to the I/O interface 905: an input apparatus 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output apparatus 907 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage apparatus 908 including, for example, a magnetic tape or hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 16 shows the electronic device 900 with various apparatuses, it should be understood that implementing or providing all of the illustrated apparatuses is not required; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 909, installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), or any suitable combination of the foregoing.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), and complex programmable logic devices (CPLDs).
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims. In a first aspect, an embodiment of the present disclosure provides a video generation method, comprising:
According to an initial camera movement template, an initial camera movement sequence is obtained, where the initial camera movement sequence includes at least one camera movement node corresponding to an initial video, and a camera movement node characterizes the camera movement of the shot during video playback; video content features of a video to be processed are acquired, and an optimized camera movement sequence is obtained according to the video content features and the initial camera movement sequence, where the optimized camera movement sequence includes at least one optimized camera movement node whose camera movement features match the video content features; and the video to be processed is processed based on the optimized camera movement sequence to generate an output video whose shots have the camera movement features corresponding to the optimized camera movement nodes.
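The three steps above can be sketched as a small pipeline. Everything here (the `CameraNode` fields, the portrait-re-centering rule, and the function names) is an illustrative assumption for exposition, not the disclosed implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CameraNode:
    # One camera movement node: when it applies and how the shot moves.
    timestamp: float   # seconds into the video
    center: tuple      # (x, y) focus point of the shot, normalized to 0..1
    amplitude: float   # movement amplitude (e.g., zoom/pan strength)

def load_template(template):
    # Step 1: the initial camera movement sequence comes from a template.
    return [CameraNode(**n) for n in template]

def optimize_sequence(nodes, content_features):
    # Step 2: adapt each node so its movement matches the video content,
    # e.g., re-center the shot on the detected portrait position.
    portrait = content_features.get("portrait_center")
    out = []
    for n in nodes:
        if portrait is not None:
            n = replace(n, center=portrait)
        out.append(n)
    return out

template = [{"timestamp": 0.0, "center": (0.5, 0.5), "amplitude": 1.0},
            {"timestamp": 2.0, "center": (0.5, 0.5), "amplitude": 0.5}]
optimized = optimize_sequence(load_template(template),
                              {"portrait_center": (0.3, 0.4)})
```

Step 3 (rendering the output video with these nodes) would be performed by a video-processing backend; in a real system, `content_features` would come from detection models rather than a literal dictionary.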
In one or more possible implementations, the video content features include a first content feature, the first content feature characterizing the position of a portrait in the video to be processed; and obtaining the optimized camera movement sequence according to the video content features and the initial camera movement sequence includes: determining, according to the first content feature, the lens focus area, during movement, of the shot corresponding to at least one camera movement node in the initial camera movement sequence; configuring the corresponding camera movement node according to that lens focus area to generate an optimized camera movement node; and generating the optimized camera movement sequence according to the optimized camera movement node.
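As one concrete (and purely hypothetical) reading of this implementation, the lens focus area can be a normalized box centered on the detected portrait and clamped to stay inside the frame; the box size and clamping rule below are assumptions, not values from the disclosure:

```python
def focus_region(portrait_center, region_size=0.4):
    """Derive a lens focus area (normalized box) centered on the portrait.

    portrait_center: (x, y) in 0..1; region_size: box side length in 0..1.
    Returns (left, top, width, height). Illustrative only -- the disclosure
    does not fix a concrete formula.
    """
    x, y = portrait_center
    half = region_size / 2.0
    # Clamp so the focus box never leaves the frame.
    left = min(max(x - half, 0.0), 1.0 - region_size)
    top = min(max(y - half, 0.0), 1.0 - region_size)
    return (left, top, region_size, region_size)
```

A portrait near a frame edge thus still yields a valid in-frame focus area, which the camera movement node can then track during the shot's motion.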
In one or more possible implementations, the video content features include a second content feature, the second content feature characterizing the shot scale of the scene in the video to be processed; and obtaining the optimized camera movement sequence according to the video content features and the initial camera movement sequence includes: determining, according to the second content feature, the motion amplitude, during movement, of the shot corresponding to at least one camera movement node in the initial camera movement sequence; configuring the corresponding camera movement node according to that motion amplitude to generate an optimized camera movement node; and generating the optimized camera movement sequence according to the optimized camera movement node.
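One minimal way to turn a shot-scale classification into a motion amplitude is a lookup table: the tighter the framing, the smaller the movement, so close-ups stay stable. The labels and numeric values below are assumptions for illustration, since the disclosure fixes no concrete numbers:

```python
# Assumed mapping from detected shot scale to movement amplitude.
AMPLITUDE_BY_SCALE = {
    "close_up": 0.2,  # tight framing: keep camera movement subtle
    "medium": 0.5,
    "wide": 1.0,      # wide framing: full-strength movement
}

def amplitude_for_scale(shot_scale, default=0.5):
    # Fall back to a moderate amplitude for unrecognized labels.
    return AMPLITUDE_BY_SCALE.get(shot_scale, default)
```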
In one or more possible implementations, the video content features include a third content feature, the third content feature characterizing a target object in the video to be processed; and obtaining the optimized camera movement sequence according to the video content features and the initial camera movement sequence includes: determining, according to the third content feature, the change frequency, during camera movement, of the shot corresponding to at least one camera movement node in the initial camera movement sequence, the change frequency being less than a preset frequency threshold; configuring the corresponding camera movement node according to that change frequency to generate an optimized camera movement node; and generating the optimized camera movement sequence according to the optimized camera movement node.
In one or more possible implementations, generating a third parameter of at least one camera movement node in the initial camera movement sequence according to the third content feature includes: obtaining the contour size of the target object; obtaining a corresponding target motion frequency according to the contour size, where the target motion frequency is inversely proportional to the contour size; and setting the third parameter according to the target motion frequency.
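The inverse relationship between contour size and target motion frequency can be sketched directly. The proportionality constant `k` and the cap `max_freq` are assumed parameters that the disclosure does not specify; the cap mirrors the preset frequency threshold mentioned above:

```python
def target_motion_frequency(contour_size, k=1.0, max_freq=4.0):
    """Target motion frequency inversely proportional to contour size.

    contour_size: e.g., the fraction of the frame the target object
    occupies (must be positive). Larger objects get slower movement;
    the cap keeps tiny objects below the preset frequency threshold.
    """
    if contour_size <= 0:
        raise ValueError("contour size must be positive")
    return min(k / contour_size, max_freq)
```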
In one or more possible implementations, obtaining the optimized camera movement sequence according to the video content features and the initial camera movement sequence includes: acquiring video content features of the initial video; obtaining feature deviation information according to the video content features of the initial video and the video content features of the video to be processed, the feature deviation information characterizing the amount of feature deviation in video content between the video to be processed and the initial video; and setting, according to the feature deviation information, node parameters of at least one camera movement node in the initial camera movement sequence, to obtain the optimized camera movement sequence.
In one or more possible implementations, the feature deviation information includes a position offset, the position offset characterizing the deviation between a first position of the portrait in the initial video and a second position of the portrait in the video to be processed; and setting, according to the feature deviation information, node parameters of at least one camera movement node in the initial camera movement sequence to obtain the optimized camera movement sequence includes: obtaining the target camera movement node corresponding to the position offset in the initial camera movement sequence; and setting, according to the position offset, the picture center point during camera movement of the shot corresponding to the target camera movement node, to obtain the optimized camera movement sequence.
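A sketch of this step, under the assumption that positions are normalized frame coordinates: the portrait's position offset (new position minus template position) is added to the target node's picture center, clamped so the center stays inside the frame. The clamping rule is an assumption for illustration:

```python
def recenter_node(center, position_offset):
    """Shift a target node's picture center by the portrait position offset.

    position_offset = (portrait position in the video to be processed)
    minus (portrait position in the initial video), both normalized 0..1.
    """
    cx, cy = center
    dx, dy = position_offset
    clamp = lambda v: min(max(v, 0.0), 1.0)  # keep the center in frame
    return (clamp(cx + dx), clamp(cy + dy))
```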
In one or more possible implementations, the feature deviation information includes a scale coefficient, the scale coefficient characterizing the ratio between the shot scale in the initial video and the shot scale in the video to be processed; and setting, according to the feature deviation information, node parameters of at least one camera movement node in the initial camera movement sequence to obtain the optimized camera movement sequence includes: obtaining the target camera movement node corresponding to the scale coefficient in the initial camera movement sequence; and setting, according to the scale coefficient, the motion amplitude during movement of the shot corresponding to the target camera movement node, to obtain the optimized camera movement sequence.
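Interpreting the scale coefficient as a multiplicative correction to the template node's motion amplitude gives a one-line sketch. The direction of the correction (wider framing in the new footage needs a larger movement for the same visual effect) is an assumption for illustration:

```python
def rescale_amplitude(template_amplitude, scale_coefficient):
    """Adjust a node's motion amplitude by the shot-scale ratio between
    the initial video and the video to be processed.

    scale_coefficient > 1 is assumed to mean the new footage is framed
    wider than the template footage.
    """
    if scale_coefficient <= 0:
        raise ValueError("scale coefficient must be positive")
    return template_amplitude * scale_coefficient
```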
In one or more possible implementations, acquiring the video content features of the video to be processed includes: obtaining, according to the timestamp of at least one camera movement node in the initial camera movement sequence, a corresponding target video frame in the video to be processed; and performing content detection on the target video frame to obtain the video content features.
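Sampling only the frames at node timestamps keeps content detection cheap compared with scanning the whole video (an assumed motivation). A minimal sketch of the timestamp-to-frame mapping, assuming a constant frame rate:

```python
def frame_index_for_node(timestamp, fps, total_frames):
    """Map a camera movement node's timestamp to the frame to analyse,
    clamped to the valid frame range."""
    idx = round(timestamp * fps)
    return min(max(idx, 0), total_frames - 1)

def target_frames(node_timestamps, fps, total_frames):
    # Only these frames are passed to content detection.
    return [frame_index_for_node(t, fps, total_frames)
            for t in node_timestamps]
```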
In one or more possible implementations, the initial video has a first playback duration and the video to be processed has a second playback duration; when the second playback duration is greater than the first playback duration, the method further includes: generating an extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence, where the camera movement duration of the shots corresponding to the extended camera movement sequence is not less than the difference between the second playback duration and the first playback duration; and processing the video to be processed based on the optimized camera movement sequence to generate the output video includes: processing the video to be processed based on the optimized camera movement sequence and then the extended camera movement sequence, in turn, to generate the output video.
In one or more possible implementations, generating the extended camera movement sequence according to at least one camera movement node in the optimized camera movement sequence includes: obtaining identification features of a first video clip in the video to be processed, where the first video clip is the segment of the video to be processed after the first playback duration, and the identification features include video content features and/or music melody features of the first video clip; obtaining, according to the identification features, a second video clip from the video to be processed, where the second video clip is a segment before the first playback duration whose identification features are similar to those of the first video clip; and generating the extended camera movement sequence according to at least one consecutive camera movement node corresponding to the second video clip.
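A sketch of selecting the second video clip by feature similarity and reusing its nodes. The feature vectors and the cosine metric are assumptions for illustration, since the disclosure only requires that the identification features be "similar":

```python
def extend_sequence(first_clip_feature, earlier_clips):
    """Pick the earlier clip whose identification feature is most similar
    to the tail clip's feature, and reuse its consecutive camera movement
    nodes as the extended camera movement sequence.

    earlier_clips: list of {"feature": [...], "nodes": [...]} dicts
    (hypothetical structure) for segments before the first playback
    duration.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    best = max(earlier_clips,
               key=lambda c: cosine(c["feature"], first_clip_feature))
    return list(best["nodes"])
```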
Claims (15)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310770226.9A (published as CN116782018A) | 2023-06-27 | 2023-06-27 | Video generation method, device, electronic equipment and storage medium |
| CN202310770226.9 | | | |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| WO2025002075A1 | 2025-01-02 |
Family ID: 87987627
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2024/101109 (published as WO2025002075A1) | Video generation method and apparatus, electronic device, and storage medium | 2023-06-27 | 2024-06-24 |
Country Status (2)
| Country | Link |
| --- | --- |
| CN (1) | CN116782018A |
| WO (1) | WO2025002075A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN116782018A * | 2023-06-27 | 2023-09-19 | 北京字跳网络技术有限公司 | Video generation method, device, electronic equipment and storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN113591644A * | 2021-07-21 | 2021-11-02 | 此刻启动(北京)智能科技有限公司 | Mirror-moving video processing method and system, storage medium and electronic equipment |
| WO2022110033A1 * | 2020-11-27 | 2022-06-02 | 深圳市大疆创新科技有限公司 | Video processing method and apparatus, and terminal device |
| CN114731458A * | 2020-12-31 | 2022-07-08 | 深圳市大疆创新科技有限公司 | Video processing method, video processing apparatus, terminal device, and storage medium |
| CN115842953A * | 2022-11-25 | 2023-03-24 | 维沃移动通信有限公司 | Shooting method and device thereof |
| CN116782018A * | 2023-06-27 | 2023-09-19 | 北京字跳网络技术有限公司 | Video generation method, device, electronic equipment and storage medium |
- 2023-06-27: application CN202310770226.9A filed in China; published as CN116782018A (status: active, pending)
- 2024-06-24: PCT application PCT/CN2024/101109 filed; published as WO2025002075A1 (status: unknown)
Also Published As
| Publication number | Publication date |
| --- | --- |
| CN116782018A | 2023-09-19 |