
CN116484048A - Video content automatic generation method and system - Google Patents

Video content automatic generation method and system

Info

Publication number
CN116484048A
CN116484048A
Authority
CN
China
Prior art keywords
module
story
video content
content
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310442493.3A
Other languages
Chinese (zh)
Inventor
张德祥
孟凡朋
程斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiwu Network Technology Co ltd
Original Assignee
Shenzhen Jiwu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiwu Network Technology Co ltd filed Critical Shenzhen Jiwu Network Technology Co ltd
Priority to CN202310442493.3A priority Critical patent/CN116484048A/en
Publication of CN116484048A publication Critical patent/CN116484048A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/7834 Retrieval using metadata automatically derived from the content, using audio features
    • G06F16/7844 Retrieval using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and system for automatically generating video content, relating to the technical field of video synthesis. An automatic generation system searches a content library for data related to a single keyword or a plurality of keywords which are manually input, and assembles the retrieved data into a plurality of complete stories.

Description

Video content automatic generation method and system
Technical Field
The invention relates to the technical field of video synthesis, in particular to an automatic generation method and system of video content.
Background
When people read a novel or a children's story, the scenes described can easily call to mind real scenes. However, a novel or story presented to the user as text cannot convey the feeling of being present at the scene, which is why audio novels were born. Audio novels have their own limitations, and to address them, voice-driven character facial animation technology has emerged. However, the story content is written in advance and cannot be customized according to people's needs.
Chinese patent publication No. CN 112992116 A discloses an automatic video content generation method and system, wherein the method comprises: generating story content from input data; synthesizing the story content in text form into read-aloud audio with character-specific sound features; and taking the read-aloud audio as the input of a facial animation synthesis model, driving the character's facial animation with the audio and generating a facial animation video. In the story content generation process, only one initial word is used as the prediction basis for each new word, which greatly improves the generation speed of the story content and in turn ensures the speed of converting the subsequent story text into audio and of driving the designated character's animation with that audio.
However, the above scheme still has the following problems: the generated video content is not checked multiple times, so the finished video contains many errors; only a single language is supported, which cannot meet the needs of users of different languages; and the autonomous learning ability is poor, so variously combined content requirements cannot be met and the overall intelligent management level is low.
Summary of the invention:
The invention aims to solve the problems identified in the background art: the generated video content is not checked multiple times, so the finished video contains many errors; only a single language is supported, which cannot meet the needs of users of different languages; and the autonomous learning ability is poor, so variously combined content requirements cannot be met and the overall intelligent management level is low.
In order to solve the problems, the invention provides a technical scheme that:
a video content automatic generation method comprises the following specific steps:
S1, the automatic generation system searches for a single keyword or a plurality of keywords which are manually input and retrieves related data from a content library to obtain a plurality of complete stories;
s2, performing image animation generation processing on a plurality of groups of complete story groups, adding AI audio language data to the generated image animation data, and synchronously adding subtitles to the generated audio in real time;
s3, checking whether the generated caption has an error condition, repairing the image and picture information in the video content, and checking and correcting the audio error in the video content;
and S4, performing evaluation processing on each aspect after each video content is watched, simultaneously performing backup storage, screening the proposed evaluation, and performing corresponding learning, improvement and synchronization processing aiming at the proposed problem.
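In software terms, steps S1 to S4 can be sketched as a simple pipeline. The function names and data shapes below are illustrative assumptions; the patent specifies behaviour, not an implementation:

```python
def retrieve(keywords, content_library):
    """S1: search the content library for entries related to the keywords."""
    return [entry for entry in content_library
            if any(k in entry for k in keywords)]

def assemble_stories(fragments):
    """S1: combine retrieved fragments into candidate complete stories."""
    return [" ".join(fragments[i:]) for i in range(len(fragments))]

def render(story):
    """S2: stand-in for image animation, AI audio, and subtitle generation."""
    return {"animation": story.split(), "audio": story, "subtitles": story}

def calibrate(video):
    """S3: stand-in for subtitle, image, and audio error checking."""
    video["subtitles"] = video["subtitles"].strip()
    return video

def pipeline(keywords, content_library):
    """S1-S4: retrieve and assemble stories, render, calibrate,
    and keep a backup copy for later evaluation and learning."""
    stories = assemble_stories(retrieve(keywords, content_library))
    videos = [calibrate(render(s)) for s in stories]
    backup = [dict(v) for v in videos]  # S4: backup storage
    return videos, backup
```

Each stage here is a placeholder for the corresponding module described later in the specification.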
Preferably, the step S1 includes the following specific steps:
s101, searching a single keyword or a plurality of keywords which are manually input, and searching related data in a content library;
s102, carrying out combined summarization on related data to generate a plurality of story groups;
and S103, integrating the story groups in the aspects of logic, completeness and fluency to obtain a plurality of complete stories, marking the story group closest to the description instruction as a main story group, and marking other story groups as auxiliary story groups in sequence.
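The marking of a main story group versus auxiliary story groups in S103 amounts to ranking candidates by closeness to the describing instruction. The patent names no similarity measure, so the word-overlap score below is purely an assumption:

```python
def rank_story_groups(instruction, story_groups):
    """Mark the story group closest to the instruction as 'main' and
    the rest, ordered by decreasing closeness, as 'auxiliary'.
    Word overlap is an illustrative stand-in for the unspecified metric."""
    def closeness(story):
        return len(set(instruction.split()) & set(story.split()))
    ranked = sorted(story_groups, key=closeness, reverse=True)
    return {"main": ranked[0], "auxiliary": ranked[1:]}
```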
Preferably, the language types in the step S2 include Chinese, English, Japanese, and Korean.
Preferably, the caption languages in the step S2 include Simplified Chinese, Traditional Chinese, English, Japanese, and Korean.
Preferably, the error conditions in the step S3 include missing words, word and sentence disorder, basic grammar errors, display errors, and garbled data (mojibake); the repair conditions in the step S3 include unclear images, image positions deviating from the expected position, and overlapping images; and the audio error problems in the step S3 include tone errors, misread words and sentences, garbled audio, and electrical noise.
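The subtitle error categories above lend themselves to rule-based screening against the generated story text. The checks below are simplified assumptions; real missing-word or grammar detection would need proper NLP:

```python
def check_subtitle(subtitle, reference_text):
    """Flag error categories from S3 for one subtitle line.
    Each rule is an illustrative stand-in, not the patented method."""
    errors = []
    # missing words: the subtitle dropped tokens present in the reference
    if len(subtitle.split()) < len(reference_text.split()):
        errors.append("missing words")
    # word/sentence disorder: same tokens, wrong order
    if (sorted(subtitle.split()) == sorted(reference_text.split())
            and subtitle != reference_text):
        errors.append("word disorder")
    # garbled data: Unicode replacement characters from bad decoding
    if "\ufffd" in subtitle:
        errors.append("garbled data")
    return errors
```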
The invention further provides a video content automatic generation system, which comprises a video content generation terminal, a post-stage auxiliary unit, a content calibration unit and an interactive learning unit, wherein the output end of the video content generation terminal is in communication connection with the input end of the post-stage auxiliary unit, the output end of the post-stage auxiliary unit is in communication connection with the input end of the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit.
Preferably, the video content generating terminal comprises a keyword searching module, a story data summarizing module and a story content generating module, wherein the output end of the keyword searching module is in communication connection with the input end of the story data summarizing module, the output end of the story data summarizing module is in communication connection with the input end of the story content generating module, and the output end of the story content generating module is in communication connection with the input end of the post-stage auxiliary unit;
the keyword retrieval module is used for searching a single keyword or a plurality of keywords which are manually input, retrieving related data in the content library, and sending the related data to the story data summarization module for summarization;
the story data summarizing module is used for receiving the related data sent by the keyword searching module, summarizing the related data in a combined mode to generate a plurality of groups of story groups, and sending the plurality of groups of story groups into the story content generating module to generate a complete story;
and the story content generation module is used for receiving the multiple groups of story groups sent by the story data summarization module, carrying out logic, complete and fluency integration to obtain multiple complete stories, marking the story group closest to the description instruction as a main story group, and marking other story groups as auxiliary story groups in sequence.
Preferably, the post-stage auxiliary unit comprises an audio processing module, a caption adding module and an image animation processing module, wherein the audio processing module and the caption adding module are both in two-way communication connection with the image animation processing module, and the output end of the image animation processing module is in communication connection with the input end of the content calibration unit;
the audio processing module is integrated inside the image animation processing module and is used for adding AI audio language data to the generated image animation data;
the subtitle adding module is integrated inside the image animation processing module and is used for synchronously adding subtitles to the generated audio in real time;
the image animation processing module is used for receiving a plurality of groups of complete story groups sent by the video content generating terminal, performing image animation generation processing, audio generation and restoration processing and subtitle real-time synchronous processing, and is also used for sending video data which is subjected to image animation processing to the inside of the content calibration unit for calibration.
Preferably, the content calibration unit comprises an error word calibration module, an image restoration module and a sounding correction module, wherein the error word calibration module, the image restoration module and the sounding correction module are all in bidirectional communication connection with the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit;
the error word calibration module is used for checking whether the generated caption has error conditions or not;
the image restoration module is used for restoring the image picture information in the video content;
the sounding correction module is used for checking and correcting audio errors in the video content.
Preferably, the interactive learning unit comprises an interactive evaluation module, a backup storage module and a learning synchronization module, wherein the output end of the interactive evaluation module is in communication connection with the input end of the backup storage module, and the backup storage module is in bidirectional communication connection with the learning synchronization module;
the interaction evaluation module is used for performing evaluation processing on each aspect after each video content is watched;
the backup storage module is used for receiving the video data content and the evaluation content sent by the interaction evaluation module at the moment and carrying out backup storage;
the learning synchronization module is integrated in the backup storage module, and is used for screening the proposed evaluation and carrying out corresponding learning, improvement and synchronization processing aiming at the proposed problem.
The beneficial effects of the invention are as follows: by providing a video content generation terminal, a post-stage auxiliary unit, a content calibration unit and an interactive learning unit, the invention forms an automatic generation system. The system searches a content library for data related to a single keyword or a plurality of manually input keywords and obtains a plurality of complete stories; performs image animation generation processing on the complete story groups, adds AI audio language data to the generated image animation data, and synchronously adds subtitles to the generated audio in real time; checks whether the generated subtitles contain errors, repairs the image and picture information in the video content, and checks and corrects audio errors in the video content; and performs evaluation processing on each aspect after each video content is watched, performs backup storage, screens the submitted evaluations, and carries out corresponding learning, improvement and synchronization for the problems raised. The generated video content data and the corresponding analysis results are managed, which facilitates managing the generated video content through internet cloud management and improves the intelligence level of generated video content management.
Description of the drawings:
for ease of illustration, the invention is described in detail by the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram of a method and system for automatically generating video content according to the present invention;
fig. 2 is an overall topology diagram of a method and system for automatically generating video content according to the present invention.
Detailed description of the embodiments:
as shown in fig. 1 and 2, the following technical solutions are adopted in this embodiment:
a video content automatic generation method comprises the following specific steps:
s1, generating the automatic generation system, searching a single keyword or a plurality of keywords which are manually input, and carrying out internal retrieval of related data in a content library to obtain a plurality of complete stories; step S1 comprises the following specific steps: s101, searching a single keyword or a plurality of keywords which are manually input, and searching related data in a content library; s102, carrying out combined summarization on related data to generate a plurality of story groups; s103, integrating the story sets in the aspects of logic, completeness and fluency to obtain a plurality of complete stories, marking the story set closest to the description instruction as a main story set, and marking other story sets as auxiliary story sets in sequence;
s2, performing image animation generation processing on a plurality of groups of complete story groups, adding AI audio language data to the generated image animation data, wherein the language types comprise Chinese, english, japanese and Korean, and simultaneously performing real-time synchronous subtitle addition to the generated audio, and the subtitle language comprises simplified Chinese, traditional Chinese, english, japanese and Korean;
s3, checking whether the generated caption has error conditions, wherein the error conditions comprise word missing and word missing, statement disorder, basic grammar errors, display errors and data mess code problems, repairing the image and picture information in the video content, the repairing conditions comprise unclear images, deviation of image positions from expected positions and overlapping of images, checking and correcting audio errors in the video content, and the audio error problems comprise tone errors, text reading errors, audio mess and noise current sounds;
and S4, performing evaluation processing on each aspect after each video content is watched, simultaneously performing backup storage, screening the proposed evaluation, and performing corresponding learning, improvement and synchronization processing aiming at the proposed problem.
The system comprises a video content generation terminal, a post-stage auxiliary unit, a content calibration unit and an interactive learning unit, wherein the output end of the video content generation terminal is in communication connection with the input end of the post-stage auxiliary unit, the output end of the post-stage auxiliary unit is in communication connection with the input end of the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit.
Further, the video content generating terminal comprises a keyword searching module, a story data summarizing module and a story content generating module, wherein the output end of the keyword searching module is in communication connection with the input end of the story data summarizing module, the output end of the story data summarizing module is in communication connection with the input end of the story content generating module, and the output end of the story content generating module is in communication connection with the input end of the post-stage auxiliary unit; the keyword retrieval module is used for searching a single keyword or a plurality of keywords which are manually input, retrieving related data in the content library, and sending the related data to the story data summarization module for summarization; the story data summarizing module is used for receiving the related data sent by the keyword searching module, summarizing the related data in a combined mode to generate a plurality of groups of story groups, and sending the plurality of groups of story groups into the story content generating module to generate a complete story; and the story content generation module is used for receiving the multiple groups of story groups sent by the story data summarization module, carrying out logic, complete and fluency integration to obtain multiple complete stories, marking the story group closest to the description instruction as a main story group, and marking other story groups as auxiliary story groups in sequence.
Further, the post-stage auxiliary unit comprises an audio processing module, a caption adding module and an image animation processing module, wherein the audio processing module and the caption adding module are both in two-way communication connection with the image animation processing module, and the output end of the image animation processing module is in communication connection with the input end of the content calibration unit; the audio processing module is integrated inside the image animation processing module and is used for adding AI audio language data to the generated image animation data; the subtitle adding module is integrated inside the image animation processing module and is used for synchronously adding subtitles to the generated audio in real time; the image animation processing module is used for receiving a plurality of groups of complete story groups sent by the video content generating terminal, performing image animation generation processing, audio generation and restoration processing and subtitle real-time synchronous processing, and is also used for sending video data which is subjected to image animation processing to the inside of the content calibration unit for calibration.
Further, the content calibration unit comprises a word error calibration module, an image restoration module and a sounding correction module, wherein the word error calibration module, the image restoration module and the sounding correction module are all in bidirectional communication connection with the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit; the error word calibration module is used for checking whether the generated caption has error conditions or not; the image restoration module is used for restoring the image picture information in the video content; the sounding correction module is used for checking and correcting audio errors in the video content.
Further, the interactive learning unit comprises an interactive evaluation module, a backup storage module and a learning synchronization module, wherein the output end of the interactive evaluation module is in communication connection with the input end of the backup storage module, and the backup storage module is in bidirectional communication connection with the learning synchronization module; the interaction evaluation module is used for performing evaluation processing on each aspect after each video content is watched; the backup storage module is used for receiving the video data content and the evaluation content sent by the interaction evaluation module at the moment and carrying out backup storage; the learning synchronization module is integrated in the backup storage module, and is used for screening the proposed evaluation and carrying out corresponding learning, improvement and synchronization processing aiming at the proposed problem.
Example 1
When the automatic generation system is connected to the video content management:
s1, generating an automatic generation system, wherein a keyword searching module searches a single keyword or a plurality of keywords which are manually input, relevant data is searched in a content library, a story data summarizing module carries out combined summarization on the relevant data to generate a plurality of groups of story groups, a story content generating module carries out logic, complete and fluency aspect integration on the story groups to obtain a plurality of complete stories, the story group closest to a description instruction is marked as a main story group, and other story groups are ordered one by one and marked as auxiliary story groups;
s2, performing image animation generation processing on a plurality of groups of complete story groups by an image animation processing module, adding AI audio language data to the generated image animation data by an audio processing module, wherein the language types comprise Chinese, english, japanese and Korean, and simultaneously performing real-time synchronous subtitle addition to the generated audio by a subtitle adding module, wherein the subtitle language comprises simplified Chinese, traditional Chinese, english, japanese and Korean;
s3, checking whether the generated caption has error conditions or not by using a staggered character calibration module, wherein the error conditions comprise word missing and word missing, statement disorder, basic grammar error, display error and data mess code, the image picture information in the video content is repaired by using an image repairing module, the repairing conditions comprise unclear images, deviation of image positions from expected positions and image overlapping problems, and the audio error in the video content is checked and corrected by using a sounding correction module, and the audio error problems comprise tone error, text error, audio mess and noise current sound;
and S4, performing evaluation processing on each aspect after each video content is watched by the interaction evaluation module, simultaneously performing backup storage by the backup storage module, screening the proposed evaluation by the learning synchronization module, and performing corresponding learning, improvement and synchronization processing on the proposed problem.
Example 2
When the automatic generation system is connected to the scenario content management:
s1, generating an automatic generation system, wherein a keyword search module searches single keywords or a plurality of keywords of a manually input script, performs internal search related data of a content library, a story data summarizing module performs combined summarization on related script data to generate a plurality of script groups, a story content generation module performs logic, complete and fluency aspect integration on the script groups to obtain a plurality of complete scripts, the story group closest to a description instruction is recorded as a main script group, and other script groups are sequentially recorded as auxiliary script groups one by one;
s2, performing image animation generation processing on a plurality of complete script sets by using an image animation processing module, adding AI audio language data including Chinese, english, japanese and Korean to the generated image animation data by using an audio processing module, and synchronously adding subtitles to the generated audio in real time by using a subtitle adding module, wherein the subtitles comprise simplified Chinese, traditional Chinese, english, japanese and Korean;
s3, checking whether the generated caption has error conditions or not by using a staggered character calibration module, wherein the error conditions comprise word missing and word missing, statement disorder, display errors and data mess code problems, repairing the image picture information in the video content by using an image repairing module, wherein the problems comprise that the image position deviates from an expected position and the image is overlapped, and checking and correcting the audio errors in the video content by using a sounding correction module, wherein the audio errors comprise tone errors, audio mess and noise current sounds;
and S4, performing evaluation processing on each aspect after each script content is watched by the interactive evaluation module, simultaneously performing backup storage by the backup storage module, screening the proposed evaluation by the learning synchronization module, and performing corresponding learning, improvement and synchronization processing on the problem presented by the script.
Specifically, in practical application, the invention may have a plurality of video content generation terminals, each used in cooperation with a post-stage auxiliary unit, a content calibration unit and an interactive learning unit. Together these form the automatic generation system: a single keyword or a plurality of manually input keywords are searched, related data is retrieved from the content library, and a plurality of complete stories are obtained; image animation generation processing is performed on the complete story groups, AI audio language data is added to the generated image animation data, and subtitles are synchronously added to the generated audio in real time; the generated subtitles are checked for errors, the image and picture information in the video content is repaired, and audio errors in the video content are checked and corrected; after each video content is watched, evaluation processing is performed on each aspect, backup storage is carried out, the submitted evaluations are screened, and corresponding learning, improvement and synchronization are performed for the problems raised. The generated video content data and the corresponding analysis results are managed, visualized and stored, so that generated video content management is realized through internet cloud management and its intelligence level is improved.
In operation, the keyword search module searches for the manually input keywords and retrieves related data from the content library; the story data summarizing module combines and summarizes the related data to generate a plurality of story groups; and the story content generation module integrates the story groups in terms of logic, completeness and fluency to obtain a plurality of complete stories, marking the story group closest to the description instruction as the main story group and the others, in order, as auxiliary story groups. The image animation processing module performs image animation generation processing on the complete story groups; the audio processing module adds AI audio language data to the generated image animation data, the language types including Chinese, English, Japanese, and Korean; and the subtitle adding module synchronously adds subtitles to the generated audio in real time, the subtitle languages including Simplified Chinese, Traditional Chinese, English, Japanese, and Korean. The error word calibration module checks the generated subtitles for errors, including missing words, sentence disorder, basic grammar errors, display errors and garbled data; the image repairing module repairs the image and picture information in the video content, the repair conditions including unclear images, image positions deviating from the expected position, and overlapping images; and the sounding correction module checks and corrects audio errors, including tone errors, misread words, garbled audio, and electrical noise. Finally, the interactive evaluation module performs evaluation processing on each aspect after each video content is watched, the backup storage module performs backup storage, and the learning synchronization module screens the submitted evaluations and carries out corresponding learning, improvement and synchronization for the problems raised.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
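The four-unit architecture summarized above (video content generation terminal, later-stage auxiliary unit, content calibration unit, interactive learning unit) amounts to a linear processing chain in which each unit's output feeds the next unit's input. A minimal sketch, with all class names and stage functions invented for illustration:

```python
# Hypothetical model of the unit chain: each unit's output is
# communicated to the next unit's input. Names are illustrative only.

class Unit:
    def __init__(self, name, process):
        self.name = name
        self.process = process  # callable applied to incoming data

def run_pipeline(units, data):
    """Pass data through the units in order, mirroring the
    output-to-input communication connections described above."""
    for unit in units:
        data = unit.process(data)
    return data

pipeline = [
    Unit("video content generation terminal", lambda kw: f"story({kw})"),
    Unit("later-stage auxiliary unit",
         lambda s: f"animated+audio+subtitled({s})"),
    Unit("content calibration unit", lambda v: f"calibrated({v})"),
    Unit("interactive learning unit", lambda v: f"evaluated({v})"),
]

result = run_pipeline(pipeline, "keywords")
```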

Claims (10)

1. An automatic video content generation method, characterized by comprising the following specific steps:
s1, by means of the automatic generation system, searching for a single keyword or a plurality of keywords which are manually input and retrieving related data in a content library to obtain a plurality of complete stories;
s2, performing image animation generation processing on a plurality of groups of complete story groups, adding AI audio language data to the generated image animation data, and synchronously adding subtitles to the generated audio in real time;
s3, checking whether the generated caption has an error condition, repairing the image and picture information in the video content, and checking and correcting the audio error in the video content;
and S4, performing multi-aspect evaluation processing after each video content is watched, simultaneously performing backup storage, screening the submitted evaluations, and performing corresponding learning, improvement and synchronization processing for the problems raised.
2. The method for automatically generating video content according to claim 1, wherein said step S1 comprises the specific steps of:
s101, searching a single keyword or a plurality of keywords which are manually input, and searching related data in a content library;
s102, carrying out combined summarization on related data to generate a plurality of story groups;
and S103, integrating the story groups in the aspects of logic, completeness and fluency to obtain a plurality of complete stories, marking the story group closest to the description instruction as a main story group, and marking other story groups as auxiliary story groups in sequence.
3. The method for automatically generating video content according to claim 1, wherein: the language types in the step S2 include Chinese, English, Japanese and Korean.
4. The method for automatically generating video content according to claim 1, wherein: the subtitle languages in the step S2 include Simplified Chinese, Traditional Chinese, English, Japanese and Korean.
5. The method for automatically generating video content according to claim 1, wherein: the error conditions in the step S3 include missing or extra characters, sentence disorder, basic grammar errors, display errors and garbled data; the repair conditions in the step S3 include unclear images, deviation of the image position from the expected position and image overlapping; and the audio error problems in the step S3 include tone errors, misread words, audio disorder and electrical noise.
6. An automatic video content generation system, characterized in that: the system comprises a video content generation terminal, a later-stage auxiliary unit, a content calibration unit and an interactive learning unit, wherein the output end of the video content generation terminal is in communication connection with the input end of the later-stage auxiliary unit, the output end of the later-stage auxiliary unit is in communication connection with the input end of the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit.
7. The automatic video content generation system according to claim 6, wherein: the video content generation terminal comprises a keyword retrieval module, a story data summarization module and a story content generation module, wherein the output end of the keyword retrieval module is in communication connection with the input end of the story data summarization module, the output end of the story data summarization module is in communication connection with the input end of the story content generation module, and the output end of the story content generation module is in communication connection with the input end of the post-stage auxiliary unit;
the keyword retrieval module is used for searching a single keyword or a plurality of keywords which are manually input, retrieving related data in the content library, and sending the related data to the story data summarization module for summarization;
the story data summarizing module is used for receiving the related data sent by the keyword searching module, summarizing the related data in a combined mode to generate a plurality of groups of story groups, and sending the plurality of groups of story groups into the story content generating module to generate a complete story;
and the story content generation module is used for receiving the multiple groups of story groups sent by the story data summarization module, integrating them in terms of logic, completeness and fluency to obtain multiple complete stories, marking the story group closest to the description instruction as the main story group, and marking the other story groups as auxiliary story groups in sequence.
8. The automatic video content generation system according to claim 6, wherein: the post-stage auxiliary unit comprises an audio processing module, a caption adding module and an image animation processing module, wherein the audio processing module and the caption adding module are both in two-way communication connection with the image animation processing module, and the output end of the image animation processing module is in communication connection with the input end of the content calibration unit;
the audio processing module is integrated inside the image animation processing module and is used for adding AI audio language data to the generated image animation data;
the subtitle adding module is integrated inside the image animation processing module and is used for synchronously adding subtitles to the generated audio in real time;
the image animation processing module is used for receiving a plurality of groups of complete story groups sent by the video content generating terminal, performing image animation generation processing, audio generation and restoration processing and subtitle real-time synchronous processing, and is also used for sending video data which is subjected to image animation processing to the inside of the content calibration unit for calibration.
9. The automatic video content generation system according to claim 6, wherein: the content calibration unit comprises an error word calibration module, an image restoration module and a sounding correction module, all of which are in two-way communication connection with the content calibration unit, and the output end of the content calibration unit is in communication connection with the input end of the interactive learning unit;
the error word calibration module is used for checking whether the generated caption has error conditions or not;
the image restoration module is used for restoring the image picture information in the video content;
the sounding correction module is used for checking and correcting audio errors in the video content.
10. The automatic video content generation system according to claim 6, wherein: the interactive learning unit comprises an interactive evaluation module, a backup storage module and a learning synchronization module, wherein the output end of the interactive evaluation module is in communication connection with the input end of the backup storage module, and the backup storage module is in bidirectional communication connection with the learning synchronization module;
the interaction evaluation module is used for performing evaluation processing on each aspect after each video content is watched;
the backup storage module is used for receiving the video data content and the evaluation content sent by the interaction evaluation module at the moment and carrying out backup storage;
the learning synchronization module is integrated in the backup storage module, and is used for screening the proposed evaluation and carrying out corresponding learning, improvement and synchronization processing aiming at the proposed problem.
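As a hedged illustration of the error word calibration recited in claims 5 and 9, the sketch below flags a few of the listed error conditions (missing text, garbled data, display errors). The specific checks are assumptions made for the example; the patent does not specify how the error conditions are detected.

```python
# Illustrative subtitle error-calibration checks. The patent lists
# the error categories but no detection algorithm; these heuristics
# are hypothetical stand-ins.
import unicodedata

def find_subtitle_errors(subtitle):
    errors = []
    if not subtitle.strip():
        errors.append("empty subtitle")   # missing text
    if "\ufffd" in subtitle:              # Unicode replacement char
        errors.append("garbled data")     # mojibake indicator
    if any(unicodedata.category(ch) == "Cc" and ch not in "\n\t"
           for ch in subtitle):
        errors.append("display error")    # stray control characters

    return errors

# A clean subtitle yields no errors.
assert find_subtitle_errors("hello world") == []
```

Real calibration would also need language-aware checks (grammar, sentence order), which these string-level heuristics cannot capture.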
CN202310442493.3A 2023-04-21 2023-04-21 Video content automatic generation method and system Pending CN116484048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310442493.3A CN116484048A (en) 2023-04-21 2023-04-21 Video content automatic generation method and system


Publications (1)

Publication Number Publication Date
CN116484048A true CN116484048A (en) 2023-07-25

Family

ID=87217298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310442493.3A Pending CN116484048A (en) 2023-04-21 2023-04-21 Video content automatic generation method and system

Country Status (1)

Country Link
CN (1) CN116484048A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110191368A1 (en) * 2010-01-29 2011-08-04 Wendy Muzatko Story Generation Methods, Story Generation Apparatuses, And Articles Of Manufacture
KR101728099B1 (en) * 2016-01-29 2017-04-19 정수진 Product Information Providing Apparatus for Electrical Transaction using Scenario-based Combined Story Contents
US20170337841A1 (en) * 2016-05-20 2017-11-23 Creative Styles LLC Interactive multimedia story creation application
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
US20200019370A1 (en) * 2018-07-12 2020-01-16 Disney Enterprises, Inc. Collaborative ai storytelling
CN112992116A (en) * 2021-02-24 2021-06-18 北京中科深智科技有限公司 Automatic generation method and system of video content
CN113329258A (en) * 2021-06-10 2021-08-31 王之华 Song video synthesis method and player
CN115878867A (en) * 2023-02-22 2023-03-31 湖南视觉伟业智能科技有限公司 AI automatic virtual scene construction experience system and method based on metauniverse
CN115988149A (en) * 2022-11-29 2023-04-18 中译文娱科技(青岛)有限公司 Method for generating video by AI intelligent graphics context


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAN Lejuan, "Application Practice of Artificial Intelligence Technology in Video Editing", China Media Technology, no. 08 *
CHEN Danwen; XU Jianjun; XIE Yuxiang; WU Lingda, "Design and Implementation of an Automatic Virtual News Generation System", Journal of System Simulation, no. 1 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230725