
CN114979764A - Video generation method and device, computer equipment and storage medium - Google Patents

Video generation method and device, computer equipment and storage medium

Info

Publication number
CN114979764A
Authority
CN
China
Prior art keywords
information
video
sub
audio
animation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210441414.2A
Other languages
Chinese (zh)
Other versions
CN114979764B (en)
Inventor
徐娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202210441414.2A priority Critical patent/CN114979764B/en
Publication of CN114979764A publication Critical patent/CN114979764A/en
Application granted granted Critical
Publication of CN114979764B publication Critical patent/CN114979764B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to the field of video generation and provides a video generation method, apparatus, device and computer storage medium, the method comprising: acquiring audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information; acquiring a first animation shot corresponding to each piece of text sub-information; segmenting the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information; adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot; and synthesizing each piece of audio/video sub-information with the corresponding second animation shot to generate a target video. A high-quality video with animation effects can thus be generated quickly from the audio/video information recorded by the user, shortening the video production cycle. The application also relates to artificial intelligence, and the video generation method can be applied to cloud servers providing big-data and AI-platform cloud computing services.

Description

Video generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of video generation, and in particular, to a video generation method and apparatus, a computer device, and a storage medium.
Background
With the popularity of short videos, more and more teams and individuals choose to promote themselves by shooting short videos. In the prior art, however, short videos shot by individuals tend to have a single scene, simple content and poor video quality, while short videos produced by teams require custom animation and other post-production effects, and the animation must be matched to the video content, duration and frame rate, so production costs are high and the production cycle is long. A method for generating videos with animation effects at high quality and low cost is therefore urgently needed.
Disclosure of Invention
The present application provides a video generation method, apparatus, device and computer storage medium, and aims to quickly generate high-quality videos with animation effects and to shorten the video production cycle.
In a first aspect, the present application provides a video generation method, including the following steps:
acquiring audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information;
acquiring a first animation shot corresponding to each piece of text sub-information;
segmenting the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information;
adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot;
and synthesizing each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
In a second aspect, the present application further provides a video generation apparatus, comprising:
a first acquisition module, configured to acquire audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information;
a second acquisition module, configured to acquire the first animation shot corresponding to each piece of text sub-information;
an audio/video segmentation module, configured to segment the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information;
an animation adjustment module, configured to adjust each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot;
and a video synthesis module, configured to synthesize each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
In a third aspect, the present application further provides a computer device, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein when the computer program is executed by the processor, the video generation method as above is implemented.
In a fourth aspect, the present application further provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the video generation method as described above.
The application provides a video generation method, apparatus, device and computer storage medium. The method acquires audio/video information recorded according to preset text information, the text information comprising at least one piece of text sub-information; acquires a first animation shot corresponding to each piece of text sub-information; segments the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information; adjusts each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot; and synthesizes each piece of audio/video sub-information with the corresponding second animation shot to generate a target video. Because the audio/video information recorded by the user is acquired and the pre-generated animation is adjusted to it before the target video is generated, the user only needs to record the audio/video information to obtain a high-quality video with animation automatically, which shortens the video production cycle.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flowchart of a video generation method according to an embodiment of the present application;
fig. 2 is a usage scenario diagram of a video generation method according to an embodiment of the present application;
fig. 3 is a schematic block diagram of a video generating apparatus according to an embodiment of the present application;
fig. 4 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The embodiment of the application provides a video generation method, a video generation device, computer equipment and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic flowchart of a video generation method according to an embodiment of the present application. The video generation method can run on a terminal or a server to generate the target video from audio/video information input by a user. The terminal can be an electronic device such as a mobile phone, tablet computer, notebook computer, desktop computer, personal digital assistant or wearable device; the server may be an independent server, a server cluster, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big-data and artificial intelligence platforms.
Referring to fig. 2, fig. 2 is a usage scenario diagram according to an embodiment of the present application. As shown in fig. 2, the server stores preset text information, the user records audio/video information according to the text information, and the server matches the audio/video information with the preset animation shots and synthesizes the target video.
As shown in fig. 1, the video generation method includes steps S101 to S105.
Step S101, acquiring audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information.
Illustratively, the text information is preset according to the actual needs of information dissemination, for example as the narration text of the target video, and is displayed to the user so that the user can record audio/video information against it.
Illustratively, the audio/video information is a recording of the user reading the text information aloud; it may be video with an audio track or audio only, which is not limited here.
Illustratively, the text information may be divided into at least one piece of text sub-information according to actual needs, for example at punctuation marks, line breaks and the like, which is not limited here.
Illustratively, when the audio/video information is recorded, the pieces of text sub-information can be displayed to the user one by one, for example sentence by sentence or line by line, so that the user leaves a pause between pieces in the recorded audio/video information, which allows the audio/video information to be segmented into audio/video sub-information more accurately later.
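The patent gives no code for this segmentation; as a minimal illustrative sketch (the function name and the delimiter set are assumptions, not specified by the patent), splitting the text information at punctuation marks and line breaks could look like this:

```python
import re

def split_text_info(text_info: str) -> list[str]:
    """Split preset text information into pieces of text sub-information.

    Pieces are delimited by sentence-ending punctuation (Chinese or
    Western) and line breaks, mirroring the segmentation rules the
    description mentions; the exact delimiter set is an assumption.
    """
    parts = re.split(r"[。！？；.!?;\n]+", text_info)
    return [p.strip() for p in parts if p.strip()]

sub_infos = split_text_info("第一句。第二句！\n第三句")
# -> ['第一句', '第二句', '第三句']
```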
Step S102, acquiring the first animation shot corresponding to each piece of text sub-information.
Illustratively, the first animation shots are animation information preset according to the text sub-information: for example, material elements are obtained from a preset material library according to each piece of text sub-information, a number of animation video frames are generated from these elements, each first animation shot is synthesized from the frames, and each first animation shot is stored under the template ID of its text information and the text code of the corresponding text sub-information.
In some embodiments, step S102, acquiring the first animation shot corresponding to each piece of text sub-information, includes: acquiring the first animation shot corresponding to each piece of text sub-information according to a preset correspondence between first animation shots and text sub-information.
Illustratively, each first animation shot is stored under the template ID of the text information and the text code of the corresponding text sub-information, where the text code may be, for example, the line number of the text sub-information; the first animation shot corresponding to each piece of text sub-information can therefore be retrieved through the correspondence between template ID plus text code and first animation shot, although this is not limited here.
Illustratively, associating each piece of text sub-information with its first animation shot through the template ID of the text information and the text code of the text sub-information makes it convenient to obtain the first animation shot matching the text sub-information, as sketched below.
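A minimal sketch of this correspondence, assuming the text code is simply the line number and that shots are referenced by file path (both assumptions; the patent does not fix a storage layout):

```python
# Pre-generated first animation shots, stored under (template_id, text_code);
# here text_code is simply the line number of the text sub-information.
first_shots: dict[tuple[str, int], str] = {
    ("template_01", 1): "shots/template_01/line_1.mp4",
    ("template_01", 2): "shots/template_01/line_2.mp4",
}

def get_first_shot(template_id: str, text_code: int) -> str:
    """Look up the first animation shot for one piece of text sub-information."""
    return first_shots[(template_id, text_code)]
```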
Step S103, segmenting the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information.
Illustratively, the audio/video information is divided into pieces of audio/video sub-information corresponding to the pieces of text sub-information, so that each piece of audio/video sub-information corresponds to one first animation shot.
In some embodiments, step S103, segmenting the audio/video information according to each piece of text sub-information to obtain at least one piece of audio/video sub-information, includes: extracting the audio/video information based on a preset voice endpoint detection algorithm to obtain effective audio/video information; matching the effective audio/video information with the text sub-information based on a preset speech-text matching algorithm; and obtaining the audio/video sub-information from the result of matching the effective audio/video information against the text sub-information.
For example, the voice endpoint detection algorithm may be a Voice Activity Detection (VAD) algorithm, which detects whether speech is present in the audio and removes the non-speech segments, simplifying subsequent speech processing. Voice endpoint detection may of course be performed by other algorithms, which is not limited here.
Illustratively, the pauses the user makes while reading the text information are recognized and removed by the voice endpoint detection algorithm, yielding the effective audio/video information within the recording.
Illustratively, based on the speech-text matching algorithm, the effective audio/video information is matched against the text sub-information, the start point and end point of each piece of audio/video sub-information within the audio/video information are determined, and the effective audio/video information is segmented at these start and end points to obtain the audio/video sub-information corresponding to each piece of text sub-information, i.e. the audio/video segment in which the user reads that piece of text sub-information aloud.
Illustratively, voice endpoint detection removes the pauses in the user's reading and reduces the interference of human factors in the generated video, while the speech-text matching algorithm yields at least one piece of audio/video sub-information corresponding to the text sub-information, which facilitates the subsequent adjustment of the animation shots and improves the watchability of the generated target video.
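As one concrete possibility (not prescribed by the patent), the voice endpoint detection step could use the WebRTC VAD through the `webrtcvad` Python package; the audio format, frame size and aggressiveness below are assumptions:

```python
import webrtcvad

def speech_flags(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30,
                 aggressiveness: int = 2) -> list[bool]:
    """Classify each frame of 16-bit mono PCM audio as speech / non-speech.

    Runs of False (silence) between runs of True mark the pauses the user
    leaves between pieces of text sub-information; cutting at those pauses
    yields candidate boundaries for the audio/video sub-information.
    """
    vad = webrtcvad.Vad(aggressiveness)
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    return [
        vad.is_speech(pcm[i:i + frame_bytes], sample_rate)
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
    ]
```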
Step S104, adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot.
Illustratively, when a user records audio/video information it is difficult to guarantee that the time spent reading each piece of text sub-information equals the duration of the corresponding first animation shot; the audio/video information therefore needs to be divided into audio/video sub-information, and each first animation shot is then adjusted according to its audio/video sub-information to obtain a second animation shot.
In some embodiments, step S104, adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot, includes: if the audio/video sub-information is longer than the corresponding first animation shot, interpolating frames into the first animation shot to obtain the second animation shot; and if the audio/video sub-information is shorter than the corresponding first animation shot, extracting frames from the first animation shot to obtain the second animation shot.
Illustratively, if the audio/video sub-information is longer than the corresponding first animation shot, the first animation shot is lengthened by frame interpolation, producing a second animation shot whose duration matches the audio/video sub-information.
Specifically, to match the length of the animation shot to the corresponding audio/video sub-information, frame interpolation may be performed with, for example, Real-time Intermediate Flow Estimation (RIFE); other interpolation algorithms may also be used, and this is not limited here.
Illustratively, if the audio/video sub-information is shorter than the corresponding first animation shot, the first animation shot is shortened by frame extraction, producing a second animation shot whose duration matches the audio/video sub-information.
Specifically, to match the length of the animation shot to the corresponding audio/video sub-information, frames may for example be dropped from the first animation shot at equal intervals, which is not limited here.
It can be understood that if the audio/video sub-information and the corresponding first animation shot have the same duration, no frame interpolation or extraction is needed.
Illustratively, frame interpolation or frame extraction makes the duration of the second animation shot equal to that of the corresponding audio/video sub-information, improving the fluency and watchability of the generated target video.
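The patent names RIFE for interpolation, but any method that matches shot length to audio length fits the step; this sketch uses plain nearest-neighbour index resampling as a simple stand-in (function name and the 25 fps default are assumptions):

```python
def resample_shot(frames: list, audio_seconds: float, fps: float = 25.0) -> list:
    """Match a first animation shot to its audio/video sub-information.

    Simple stand-in for the patent's frame interpolation/extraction:
    nearest-neighbour index resampling duplicates frames when the audio
    is longer than the shot (interpolation) and drops frames at equal
    intervals when it is shorter (extraction). A learned interpolator
    such as RIFE would produce smoother in-between frames.
    """
    target = max(1, round(audio_seconds * fps))
    n = len(frames)
    if target == n:
        return list(frames)  # durations already match, no adjustment needed
    step = n / target
    return [frames[min(int(i * step), n - 1)] for i in range(target)]
```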
Step S105, synthesizing each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
In some embodiments, step S105, synthesizing each piece of audio/video sub-information with each second animation shot to generate a target video, includes: synthesizing each piece of audio/video sub-information with its second animation shot to generate a target sub-video; and splicing the target sub-videos to obtain the target video.
Illustratively, each piece of audio/video sub-information is synthesized with its corresponding second animation shot to obtain at least one target sub-video, and the target sub-videos are spliced in order to obtain the target video.
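One common way to realize the splicing step (not specified by the patent) is ffmpeg's concat demuxer with stream copy, which joins the target sub-videos in order without re-encoding; this assumes all sub-videos share the same codec and parameters:

```python
import os
import subprocess
import tempfile

def splice(target_sub_videos: list[str], out_path: str) -> None:
    """Splice target sub-videos into the target video, in order,
    using ffmpeg's concat demuxer (stream copy, no re-encode)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for path in target_sub_videos:
            f.write(f"file '{os.path.abspath(path)}'\n")
        list_file = f.name
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
             "-i", list_file, "-c", "copy", out_path],
            check=True,
        )
    finally:
        os.unlink(list_file)
```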
In some embodiments, step S105, synthesizing each piece of audio/video sub-information with each second animation shot to generate a target video, further includes: generating the target video according to a preset on-camera mode, wherein the on-camera mode includes a full on-camera mode, a partial on-camera mode and an off-camera mode.
For example, the on-camera mode may be determined from the audio/video information recorded by the user: if the audio/video information is video with an audio track, the mode may be full on-camera or partial on-camera; if it is audio only, the mode is off-camera.
For example, the on-camera mode may also be determined by a user selection: if the audio/video information is video with an audio track, the mode is set to full on-camera, partial on-camera or off-camera according to the user's selection.
Specifically, if the mode is full on-camera, the picture from the audio/video information may, for example, be placed below each second animation shot, although this is not limited here.
Specifically, if the mode is partial on-camera, for example on-camera in the first shot only, the audio/video sub-information recorded by the user is used for the first target sub-video, while subsequent target sub-videos are synthesized from the audio of each piece of audio/video sub-information and the second animation shots.
Illustratively, offering multiple on-camera modes increases the flexibility of target video generation: the user can choose to record audio only or video with an audio track according to actual needs, improving the user experience.
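A small sketch of how the mode selection just described might be encoded; the enum values and the default rule are assumptions:

```python
from enum import Enum

class OnCameraMode(Enum):
    FULL = "full"        # user's picture shown alongside every shot
    PARTIAL = "partial"  # e.g. user appears only in the first shot
    OFF = "off"          # audio only; the animation shots carry the picture

def default_mode(has_video_track: bool) -> OnCameraMode:
    """Pick a default on-camera mode from the recording, as the
    description suggests; a user selection may still override it."""
    return OnCameraMode.FULL if has_video_track else OnCameraMode.OFF
```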
In some embodiments, the video generation method further includes: adjusting each piece of text sub-information according to the corresponding audio/video sub-information based on a preset speech recognition algorithm; and generating subtitles corresponding to the audio/video sub-information from the adjusted text sub-information. Step S105 then further includes: synthesizing each piece of audio/video sub-information, each second animation shot and the subtitles.
Illustratively, a user reading text information aloud will inevitably skip, misread or add words; based on the speech recognition algorithm, the text of each piece of text sub-information is corrected according to the corresponding audio/video sub-information so that it matches what was actually spoken, and when the target video is synthesized, the subtitles generated from the corrected text sub-information are synthesized with the audio/video sub-information and the second animation shots to obtain a target video with subtitles.
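Once the corrected text sub-information and the start/end points of each piece of audio/video sub-information are known, emitting a subtitle track is straightforward; a minimal sketch using the SRT format (the format choice is an assumption, the patent does not name one):

```python
def to_srt(entries: list[tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, corrected text) triples as an SRT track.

    The corrected text comes from the speech-recognition adjustment step;
    start/end come from the audio/video sub-information boundaries.
    """
    def ts(sec: float) -> str:
        ms = round(sec * 1000)
        h, rem = divmod(ms, 3600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return "\n".join(
        f"{i}\n{ts(a)} --> {ts(b)}\n{text}\n"
        for i, (a, b, text) in enumerate(entries, 1)
    )
```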
In some embodiments, step S105 further includes: synthesizing the audio/video sub-information, the second animation shots and preset sound effect information to generate the target video.
Illustratively, sound effect information is obtained from a preset material library according to the needs of the synthesized video and mixed into the target video, making the target video more lively and improving the viewing experience.
The video generation method provided by this embodiment acquires audio/video information recorded according to preset text information, the text information comprising at least one piece of text sub-information; acquires the first animation shot corresponding to each piece of text sub-information; segments the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information; adjusts each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot; and synthesizes each piece of audio/video sub-information with the corresponding second animation shot to generate a target video. A high-quality video with animation effects can thus be generated quickly from the audio/video information recorded by the user, shortening the video production cycle.
Referring to fig. 3, fig. 3 is a schematic block diagram of a video generation apparatus according to an embodiment of the present disclosure. The video generation apparatus may be configured in a server or a terminal and is used to perform the video generation method described above.
As shown in fig. 3, the video generation apparatus includes:
the first obtaining module 110 is configured to obtain audio and video information recorded according to preset text information, where the text information includes at least one text sub-information.
The second obtaining module 120 is configured to obtain a first animation split mirror corresponding to each text sub-information.
The audio/video segmentation module 130 is configured to segment the audio/video information according to each piece of text sub-information to obtain at least one piece of audio/video sub-information.
And the animation adjusting module 140 is configured to adjust each first animation segment according to each piece of audio/video sub information to obtain a second animation segment.
And the video synthesis module 150 is configured to synthesize each piece of audio and video sub information with each piece of second animation sub-mirror, so as to generate a target video.
Illustratively, the animation adjustment module 140 includes an animation frame interpolation module and an animation frame extraction module.
The animation frame interpolation module is configured to interpolate frames into the first animation shot to obtain a second animation shot if the audio/video sub-information is longer than the corresponding first animation shot;
and the animation frame extraction module is configured to extract frames from the first animation shot to obtain a second animation shot if the audio/video sub-information is shorter than the corresponding first animation shot.
Illustratively, the audio/video segmentation module 130 includes an effective information extraction module, a speech-text matching module and a matching result processing module.
The effective information extraction module is configured to extract the audio/video information based on a preset voice endpoint detection algorithm to obtain effective audio/video information;
the speech-text matching module is configured to match the effective audio/video information with the text sub-information based on a preset speech-text matching algorithm;
and the matching result processing module is configured to obtain the audio/video sub-information from the result of matching the effective audio/video information against the text sub-information.
Illustratively, the video generation apparatus further includes a text adjustment module and a subtitle generation module.
The text adjustment module is configured to adjust each piece of text sub-information according to the corresponding audio/video sub-information based on a preset speech recognition algorithm;
and the subtitle generation module is configured to generate subtitles corresponding to the audio/video sub-information from the adjusted text sub-information.
Illustratively, the video synthesis module 150 includes a subtitle synthesis module.
The subtitle synthesis module is configured to synthesize each piece of audio/video sub-information, each second animation shot and the subtitles.
Illustratively, the second acquisition module 120 includes a second acquisition sub-module.
The second acquisition sub-module is configured to acquire the first animation shot corresponding to each piece of text sub-information according to a preset correspondence between first animation shots and text sub-information.
Illustratively, the video synthesis module 150 further includes a sub-video synthesis module and a sub-video splicing module.
The sub-video synthesis module is configured to synthesize the audio/video sub-information with each second animation shot to generate target sub-videos;
and the sub-video splicing module is configured to splice the target sub-videos to obtain the target video.
Illustratively, the video synthesis module 150 further includes an on-camera mode determination module.
The on-camera mode determination module is configured to generate the target video according to a preset on-camera mode, wherein the on-camera mode includes a full on-camera mode, a partial on-camera mode and an off-camera mode.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus, the modules and the units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The methods, apparatus, and devices of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above-described methods and apparatuses may be implemented, for example, in the form of a computer program that can be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.
The storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any of the video generation methods.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a storage medium, which when executed by a processor causes the processor to perform any of the video generation methods.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information;
acquiring a first animation shot corresponding to each piece of text sub-information;
segmenting the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information;
adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot;
and synthesizing each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
In one embodiment, when adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot, the processor is configured to implement:
if the audio/video sub-information is longer than the corresponding first animation shot, interpolating frames into the first animation shot to obtain the second animation shot;
and if the audio/video sub-information is shorter than the corresponding first animation shot, extracting frames from the first animation shot to obtain the second animation shot.
In one embodiment, when segmenting the audio/video information according to each piece of text sub-information to obtain at least one piece of audio/video sub-information, the processor is configured to implement:
extracting the audio/video information based on a preset voice endpoint detection algorithm to obtain effective audio/video information;
matching the effective audio/video information with the text sub-information based on a preset speech-text matching algorithm;
and obtaining the audio/video sub-information from the result of matching the effective audio/video information against the text sub-information.
In one embodiment, after segmenting the audio/video information according to each piece of text sub-information to obtain at least one piece of audio/video sub-information, the processor is configured to implement:
adjusting each piece of text sub-information according to the corresponding audio/video sub-information based on a preset speech recognition algorithm;
and generating subtitles corresponding to the audio/video sub-information from the adjusted text sub-information.
In one embodiment, when synthesizing each piece of audio/video sub-information with each second animation shot to generate a target video, the processor is configured to implement:
synthesizing each piece of audio/video sub-information, each second animation shot and the subtitles.
In one embodiment, when acquiring the first animation shot corresponding to each piece of text sub-information, the processor is configured to implement:
acquiring the first animation shot corresponding to each piece of text sub-information according to a preset correspondence between first animation shots and text sub-information.
In one embodiment, when synthesizing each piece of audio/video sub-information with each second animation shot to generate a target video, the processor is configured to implement:
synthesizing the audio/video sub-information with each second animation shot to generate target sub-videos;
and splicing the target sub-videos to obtain the target video.
In one embodiment, when synthesizing each piece of audio/video sub-information with each second animation shot to generate a target video, the processor is configured to implement:
generating the target video according to a preset on-camera mode, wherein the on-camera mode includes a full on-camera mode, a partial on-camera mode and an off-camera mode.
It should be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the video generation described above may refer to the corresponding process in the foregoing embodiments of the video generation method, which is not repeated here.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, where the computer program includes program instructions, and a method implemented when the program instructions are executed may refer to the various embodiments of the video generation method in the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A video generation method, the method comprising:
acquiring audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information;
acquiring a first animation shot corresponding to each piece of text sub-information;
segmenting the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information;
adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot;
and synthesizing each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
2. The video generation method according to claim 1, wherein the adjusting each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot comprises:
if the audio/video sub-information is longer than the corresponding first animation shot, interpolating frames into the first animation shot to obtain the second animation shot;
and if the audio/video sub-information is shorter than the corresponding first animation shot, extracting frames from the first animation shot to obtain the second animation shot.
3. The video generation method according to claim 1, wherein the segmenting the audio/video information according to each piece of text sub-information to obtain at least one piece of audio/video sub-information comprises:
extracting the audio/video information based on a preset voice endpoint detection algorithm to obtain effective audio/video information;
matching the effective audio/video information with the text sub-information based on a preset speech-text matching algorithm;
and obtaining the audio/video sub-information from the result of matching the effective audio/video information against the text sub-information.
4. The video generation method according to claim 1, wherein after the audio/video information is segmented according to each piece of text sub-information to obtain at least one piece of audio/video sub-information, the method further comprises:
adjusting each piece of text sub-information according to the corresponding audio/video sub-information based on a preset speech recognition algorithm;
and generating subtitles corresponding to the audio/video sub-information from the adjusted text sub-information;
wherein the synthesizing of each piece of audio/video sub-information with each second animation shot to generate the target video comprises:
synthesizing each piece of audio/video sub-information, each second animation shot and the subtitles.
5. The video generation method according to any one of claims 1 to 4, wherein the acquiring of the first animation shot corresponding to each piece of text sub-information comprises:
acquiring the first animation shot corresponding to each piece of text sub-information according to a preset correspondence between first animation shots and text sub-information.
6. The video generation method according to any one of claims 1 to 4, wherein the synthesizing of each piece of audio/video sub-information with each second animation shot to generate a target video comprises:
synthesizing the audio/video sub-information with each second animation shot to generate target sub-videos;
and splicing the target sub-videos to obtain the target video.
7. The video generation method according to any one of claims 1 to 4, wherein the synthesizing of each piece of audio/video sub-information with each second animation shot to generate a target video comprises:
generating the target video according to a preset on-camera mode, wherein the on-camera mode comprises a full on-camera mode, a partial on-camera mode and an off-camera mode.
8. A video generation apparatus, comprising:
a first acquisition module, configured to acquire audio/video information recorded according to preset text information, wherein the text information comprises at least one piece of text sub-information;
a second acquisition module, configured to acquire the first animation shot corresponding to each piece of text sub-information;
an audio/video segmentation module, configured to segment the audio/video information according to the text sub-information to obtain at least one piece of audio/video sub-information;
an animation adjustment module, configured to adjust each first animation shot according to the corresponding audio/video sub-information to obtain a second animation shot;
and a video synthesis module, configured to synthesize each piece of audio/video sub-information with the corresponding second animation shot to generate a target video.
9. A computer device, characterized in that the computer device comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the video generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the video generation method of any of claims 1 to 7.
CN202210441414.2A 2022-04-25 2022-04-25 Video generation method, device, computer equipment and storage medium Active CN114979764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210441414.2A CN114979764B (en) 2022-04-25 2022-04-25 Video generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210441414.2A CN114979764B (en) 2022-04-25 2022-04-25 Video generation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114979764A true CN114979764A (en) 2022-08-30
CN114979764B CN114979764B (en) 2024-02-06

Family

ID=82979671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210441414.2A Active CN114979764B (en) 2022-04-25 2022-04-25 Video generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114979764B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090174717A1 (en) * 2008-01-07 2009-07-09 Sony Corporation Method and apparatus for generating a storyboard theme for background image and video presentation
JP2020202416A (en) * 2019-06-06 2020-12-17 株式会社電通グループ Advertisement animation creation system
CN112866776A (en) * 2020-12-29 2021-05-28 北京金堤科技有限公司 Video generation method and device
CN113408332A (en) * 2021-03-05 2021-09-17 腾讯科技(深圳)有限公司 Video mirror splitting method, device, equipment and computer readable storage medium
CN113691854A (en) * 2021-07-20 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Video creation method and device, electronic equipment and computer program product
CN113870395A (en) * 2021-09-29 2021-12-31 平安科技(深圳)有限公司 Animation video generation method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090174717A1 (en) * 2008-01-07 2009-07-09 Sony Corporation Method and apparatus for generating a storyboard theme for background image and video presentation
JP2020202416A (en) * 2019-06-06 2020-12-17 株式会社電通グループ Advertisement animation creation system
CN112866776A (en) * 2020-12-29 2021-05-28 北京金堤科技有限公司 Video generation method and device
CN113408332A (en) * 2021-03-05 2021-09-17 腾讯科技(深圳)有限公司 Video mirror splitting method, device, equipment and computer readable storage medium
CN113691854A (en) * 2021-07-20 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Video creation method and device, electronic equipment and computer program product
CN113870395A (en) * 2021-09-29 2021-12-31 平安科技(深圳)有限公司 Animation video generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114979764B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
US11943486B2 (en) Live video broadcast method, live broadcast device and storage medium
CN105635849B (en) Text display method and device when multimedia file plays
US20140188997A1 (en) Creating and Sharing Inline Media Commentary Within a Network
US20100085363A1 (en) Photo Realistic Talking Head Creation, Content Creation, and Distribution System and Method
CN111800671B (en) Method and apparatus for aligning paragraphs and video
CN112637670B (en) Video generation method and device
US20150195626A1 (en) Augmented media service providing method, apparatus thereof, and system thereof
CN111263186A (en) Video generation, playing, searching and processing method, device and storage medium
CN110534085B (en) Method and apparatus for generating information
US20190371023A1 (en) Method and apparatus for generating multimedia content, and device therefor
CN113132780A (en) Video synthesis method and device, electronic equipment and readable storage medium
US9361714B2 (en) Enhanced video description
US10897658B1 (en) Techniques for annotating media content
CN110781349A (en) Method, equipment, client device and electronic equipment for generating short video
CN107547922B (en) Information processing method, device, system and computer readable storage medium
CN112866776B (en) Video generation method and device
CN115278306B (en) Video editing method and device
CN112533058A (en) Video processing method, device, equipment and computer readable storage medium
CN114979764B (en) Video generation method, device, computer equipment and storage medium
CN107995538B (en) Video annotation method and system
EP4344230A1 (en) Video generation method, apparatus, and device, storage medium, and program product
CN113709521B (en) System for automatically matching background according to video content
US12051142B2 (en) Meme package generation method, electronic device, and medium
US20220070501A1 (en) Social video platform for generating and experiencing content
CN112188116B (en) Video synthesis method, client and system based on object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant