CN109614537A - Method, apparatus, device and storage medium for generating video - Google Patents

Method, apparatus, device and storage medium for generating video

Info

Publication number
CN109614537A
CN109614537A (application CN201811489221.4A)
Authority
CN
China
Prior art keywords
picture
text
video
article
subtitle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811489221.4A
Other languages
Chinese (zh)
Inventor
张峥
徐伟建
罗雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811489221.4A priority Critical patent/CN109614537A/en
Publication of CN109614537A publication Critical patent/CN109614537A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure relates to a method, apparatus, device and storage medium for generating video. According to an example implementation, a method for generating video is provided. In the method, a group of articles including text and pictures associated with a video topic is obtained. A score for the group of articles is determined based on a knowledge model. At least one article is selected from the group of articles based on the score. A video is generated based on the text and pictures in the selected article(s). According to example implementations of the present disclosure, an apparatus, a device and a computer storage medium for generating video are also provided. With these implementations, a video can be generated in a more convenient and efficient manner without manual intervention.

Description

Method, apparatus, device and storage medium for generating video
Technical field
Implementations of the present disclosure relate generally to video generation and, more particularly, to a method, apparatus, device and computer storage medium for generating video based on text and pictures.
Background technique
At present there are more and more multimedia platforms (websites or applications) through which users may browse static content such as text and pictures. However, the presentation of such static content is rather monotonous. For introductory or opinion-oriented content, users tend to prefer dynamic content in visual form. For a given video topic, how to obtain corresponding text and pictures based on the topic, and how to generate a dynamic video from the obtained text and pictures, thus becomes a problem to be solved. It is therefore desirable to provide a technical solution capable of generating video in a more convenient and effective manner.
Summary of the invention
According to example implementations of the present disclosure, a solution for generating video is provided.
In a first aspect of the present disclosure, a method for generating video is provided. In the method, a group of articles including text and pictures associated with a video topic is obtained. A score for the group of articles is determined based on a knowledge model. At least one article is selected from the group of articles based on the score. A video is generated based on the text and pictures in the selected article(s).
In a second aspect of the present disclosure, an apparatus for generating video is provided. The apparatus includes: an obtaining module configured to obtain a group of articles including text and pictures associated with a video topic; a determining module configured to determine a score for the group of articles based on a knowledge model; a selecting module configured to select at least one article from the group of articles based on the score; and a generating module configured to generate a video based on the text and pictures in the selected article(s).
In a third aspect of the present disclosure, a device is provided. The device includes one or more processors, and a storage unit for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium having a computer program stored thereon is provided; the program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
It should be appreciated that the content described in this Summary is not intended to limit key or essential features of implementations of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Detailed description of the invention
The above and other features, advantages and aspects of the implementations of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 schematically illustrates a diagram of static content including text and pictures;
Fig. 2 schematically illustrates a block diagram of a technical solution for generating video according to example implementations of the present disclosure;
Fig. 3 schematically illustrates a flowchart of a method for generating video according to example implementations of the present disclosure;
Fig. 4 schematically illustrates a block diagram of text and pictures having an association relationship according to example implementations of the present disclosure;
Figs. 5A and 5B schematically illustrate block diagrams of determining associated text and pictures based on the positional relationship between text and pictures according to example implementations of the present disclosure;
Fig. 6 schematically illustrates a block diagram for determining subtitles in a video according to example implementations of the present disclosure;
Fig. 7 schematically illustrates a schematic diagram of a frame sequence with subtitles according to example implementations of the present disclosure;
Fig. 8 schematically illustrates a block diagram of an apparatus for generating video according to example implementations of the present disclosure; and
Fig. 9 shows a block diagram of a computing device capable of implementing multiple implementations of the present disclosure.
Specific embodiment
Implementations of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain implementations of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the implementations set forth here; rather, these implementations are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and implementations of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of implementations of the present disclosure, the term "comprising" and its variants should be understood as open-ended inclusion, i.e. "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "an implementation" or "the implementation" should be understood as "at least one implementation". The terms "first", "second", etc. may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
For ease of description, static content in a multimedia platform (website or application) is first described with reference to Fig. 1. Fig. 1 schematically illustrates a diagram of static content 100 including text and pictures. As shown in Fig. 1, static content such as an article 110 may include text 120 and a picture 130. A user needs to read the text line by line to understand the article. Typically, the article 110 may span multiple pages, and the user has to perform scrolling or page-turning operations using a keyboard, mouse or touch screen to control the display of the static content. Sometimes it is inconvenient for the user to perform such input control, and a large number of users prefer to obtain information from dynamic video. Especially for introductory, educational or illustrative content, users prefer watching a dynamic video to reading a tedious static article. Accordingly, it is desirable to develop a technical solution that generates video in a more efficient manner.
Various technical solutions for generating video have been proposed. For example, given a specified topic, related text and pictures may be searched for manually, and a video may be generated from the obtained text and pictures. To improve the efficiency of manual processing, solutions that automatically search for text and pictures and generate a video have also been proposed. In such solutions, however, the amounts of obtained text and pictures may not be well matched. On the one hand, a large amount of textual introduction but only a small amount of picture content may be found, so that additional pictures have to be retrieved afterwards. On the other hand, the text and pictures may lack relevance to each other; generating a video from such text and pictures leads to an inconsistency between the audio content generated from the text and the visuals generated from the pictures. How to collect text and pictures, and how to generate a dynamic video from the collected text and pictures, thus becomes a problem to be solved.
In order to at least partially address the deficiencies of the above solutions, a technical solution for generating video is provided according to example implementations of the present disclosure. Hereinafter, the example implementations of the present disclosure are outlined with reference to Fig. 2. Fig. 2 schematically illustrates a block diagram 200 of a technical solution for generating video according to example implementations of the present disclosure. As shown in Fig. 2, a search operation may be performed in a network 220 based on a video topic 210. The video topic 210 may specify the subject of the video content to be generated, for example "Introduction to the Summer Palace", "Introduction to the Forbidden City", and so on. Hereinafter, generating a video about "Introduction to the Summer Palace" is used as an example to describe the implementations of the present disclosure in more detail.
A search may be performed in the network 220 based on a search engine (for example, the "Baidu" search engine or another search engine), so as to obtain a group of articles 230 associated with the video topic 210. At this point, a large number of web articles introducing the Summer Palace may be found. Each article in the group of articles 230 may, for example, be an article 110 including text 120 and a picture 130 as shown in Fig. 1. Then, a score 240 for each article in the group of articles 230 may be determined based on a knowledge model 250. The knowledge model 250 may, for example, be a pre-trained model for determining whether an article is suitable for generating a video. Then, at least one article 260 that is more suitable for generating a video may be selected based on the scores 240. The text and pictures included in the selected article(s) 260 may be used to generate a video 270. When the video 270 is generated based on the text 120 and the picture 130 shown in Fig. 1, the visuals of the video may show the picture 130, and the audio part may play the speech generated from the text 120.
With the above example implementation, a massive number of articles in the network 220 may be searched in order to determine articles associated with the video topic 210. According to example implementations of the present disclosure, searching the massive articles in the network 220 ensures that abundant material is available for generating the video 270, avoiding the need for supplementary searches at a later stage. Further, the scores 240 provide a quantitative criterion for determining which articles are suitable for generating a video. According to example implementations of the present disclosure, the scores 240 may cover various aspects, whereby the quality of the material used to generate the video 270 can be improved and material better suited to the video 270 can be selected. In a video 270 generated in this way, it can be ensured that the visual content and the audio content are closely related, i.e. the audio content narrates and explains the visual content.
Hereinafter, more details about generating the video 270 are described with reference to Fig. 3. Fig. 3 schematically illustrates a flowchart of a method 300 for generating video according to example implementations of the present disclosure. At block 310, a group of articles 230 including text and pictures associated with the video topic 210 is obtained. Various search techniques that are currently available or to be developed in the future may be used here to obtain the group of articles 230.
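By way of illustration only, the following Python sketch outlines blocks 310 to 340 of the method 300 as plain functions. The function names, the stubbed search, and the simple picture-count score are assumptions made for the example and are not part of the disclosed implementation.

```python
from typing import Dict, List

Article = Dict[str, object]  # e.g. {"text": [paragraphs], "pictures": [urls]}

def search_articles(topic: str) -> List[Article]:
    """Block 310: obtain candidate articles for the video topic (stubbed here)."""
    return []

def score_article(article: Article) -> float:
    """Block 320: knowledge-model score; a real model would combine the quality,
    emotion and harmful-content factors discussed below."""
    return float(len(article.get("pictures", [])))

def select_and_rank(topic: str, top_k: int = 3) -> List[Article]:
    """Blocks 310-330: search, score, and keep the best-scoring articles;
    block 340 would then render a video from their text and pictures."""
    articles = search_articles(topic)
    return sorted(articles, key=score_article, reverse=True)[:top_k]
```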
According to example implementations of the present disclosure, the domain in which the search is performed may be determined based on the category of the video to be generated. For example, to generate a scenic-spot introduction video, the search may be performed on tourism websites; to generate an educational video, the search may be performed on encyclopedia websites. In this way, material more relevant to the video topic 210 can be found in a more accurate manner.
At block 320, a score for the group of articles 230 may be determined based on the knowledge model 250. It will be understood that the knowledge model 250 may take many factors into account, for example a quality score of the article, an emotion score, a harmful-content score, and other related factors.
According to example implementations of the present disclosure, the quality score may be determined based on the number of pictures in an article. Given the needs of video generation, articles including more pictures may be preferred. Thus, a higher quality score may be given to articles including more pictures, and a lower quality score to articles including fewer pictures.
According to example implementations of the present disclosure, the quality score may be determined based on the amount of text in an article. Since the audio content in the video is generated from the text, the amount of text determines, to some extent, the length of the audio in the video and thus the length of the generated video. Typically, users more readily accept videos between 2 and 4 minutes in length, so the quality score may be determined based on the word count of the article. For example, a higher score may be given to articles of up to 500 words, and a lower score to articles of more than 2000 words.
According to example implementations of the present disclosure, the quality score may be determined based on the relationship between the text and the pictures in an article. Typically, the pictures in an article serve to illustrate the text, so articles in which text and pictures are distributed more evenly may be preferred. Suppose an article includes 10 paragraphs of text with a picture shown below each paragraph; a higher score may be given to that article. Suppose another article includes 10 paragraphs of text but contains only one picture each at the beginning and at the end of the article; a lower score may be given to that article.
It will be understood that although separate examples of the quality score are schematically illustrated above, the above examples may also be used in combination according to example implementations of the present disclosure. For example, a corresponding weight may be set for each aspect, and the final quality score may be determined based on, for example, a weighted sum.
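By way of illustration only, the following Python sketch combines the three quality cues described above (picture count, word count, and the evenness of the text/picture layout) into one weighted quality score. The weights, thresholds and the `gap_variance` layout measure are assumptions made for the example, not values taken from this disclosure.

```python
def quality_score(num_pictures: int, num_words: int, gap_variance: float,
                  weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of three quality cues; all constants are illustrative."""
    picture_part = min(num_pictures / 10.0, 1.0)       # more pictures -> higher score
    if num_words <= 500:                               # short articles preferred
        length_part = 1.0
    elif num_words >= 2000:                            # overly long articles penalized
        length_part = 0.2
    else:
        length_part = 1.0 - 0.8 * (num_words - 500) / 1500
    layout_part = 1.0 / (1.0 + gap_variance)           # even text/picture spacing -> higher score
    w_pic, w_len, w_layout = weights
    return w_pic * picture_part + w_len * length_part + w_layout * layout_part
```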
According to example implementations of the present disclosure, an emotion score of an article may be determined. The emotion score indicates whether the article content takes a positive, neutral or negative attitude toward a topic. For example, the emotion score may be expressed in the interval [-1, 1], where "-1" indicates a negative attitude and "1" indicates a positive attitude. For introductory videos, articles with a positive attitude are preferred for generating the video, so a higher emotion score may be set for articles with a positive attitude and a lower emotion score for articles with a negative attitude. According to example implementations of the present disclosure, a predefined emotion keyword database may be obtained; the emotion keyword database may include keywords indicating positive, negative and neutral emotions respectively. Then, one or more keywords extracted from the article may be compared against the emotion keyword database to determine the emotion score of the article.
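By way of illustration only, the following sketch computes a keyword-based emotion score in [-1, 1]; the word lists stand in for the predefined emotion keyword database mentioned above and are assumptions made for the example.

```python
POSITIVE_KEYWORDS = {"beautiful", "famous", "magnificent", "worth visiting"}
NEGATIVE_KEYWORDS = {"crowded", "disappointing", "scam", "overpriced"}

def emotion_score(words: list) -> float:
    """Return -1 (negative) .. +1 (positive); 0 when no emotion keywords are found."""
    pos = sum(1 for w in words if w in POSITIVE_KEYWORDS)
    neg = sum(1 for w in words if w in NEGATIVE_KEYWORDS)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```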
According to example implementations of the present disclosure, a score may also be given for whether an article includes harmful content. Harmful content here may include, but is not limited to, advertising content, pornographic content, politically reactionary content, fraudulent content, and so on. A lower score may be given to articles including such harmful content.
According to example implementations of the present disclosure, the emotion score and the harmful-content score of an article may be determined by analyzing the text and/or pictures in the article. For example, the text of an article may relate to an introduction of the Summer Palace while a picture contains an advertisement such as "second-hand house for sale"; in this case the picture may also be recognized in order to determine whether harmful content is included.
According to example implementations of the present disclosure, a filtering operation may also be performed on the group of articles found, so as to remove articles including duplicate content. The above-described scoring operation may then be performed only on the filtered articles.
At block 330, at least one article is selected from the group of articles 230 based on the scores. It has been described in detail above how to determine the scores relating to quality, emotion and harmful content of an article. According to example implementations of the present disclosure, the scores of these different aspects may also be used in combination. For example, a corresponding weight may be set for each aspect, and a final score may be determined based on, for example, a weighted sum.
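By way of illustration only, the per-aspect scores may be folded into a final score and used to rank the articles as in the sketch below; the weights and the tuple layout of `scored_articles` are assumptions made for the example.

```python
def final_score(quality: float, emotion: float, harmless: float,
                weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of quality, emotion and harmlessness (inverse harmful-content) scores."""
    w_q, w_e, w_h = weights
    return w_q * quality + w_e * emotion + w_h * harmless

def select_articles(scored_articles, top_k: int = 2):
    """scored_articles: iterable of (article, quality, emotion, harmless) tuples."""
    ranked = sorted(scored_articles,
                    key=lambda item: final_score(*item[1:]), reverse=True)
    return [article for article, *_ in ranked[:top_k]]
```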
At block 340, a video 270 may be generated based on the text and pictures in the selected article(s). Hereinafter, the details of generating the video 270 from the text and pictures in one article are described. According to example implementations of the present disclosure, for a given article 110 among the selected article(s), a picture 130 in the given article 110 may first be selected, and then text 120 associated with the picture 130 may be determined.
Fig. 4 schematically illustrates a block diagram 400 of text and pictures having an association relationship according to example implementations of the present disclosure. As shown in Fig. 4, the article 110 may include multiple pictures 130 and 430, and the article 110 may include text 120 and 420 of multiple paragraphs. Which text describes a given picture may be determined in various ways.
According to example implementations of the present disclosure, text associated with a given picture may be determined based on the positional relationship between the picture and the text in the given article. Typically, the text describing the content of a picture is located in a paragraph near the picture. Thus, text and pictures having an association relationship may be determined based on the positional relationship between the text and the pictures. For example, the article 110 in Fig. 4 includes text 120 giving a general introduction of the Summer Palace, and close below the text 120 there is a picture 130 of a landmark building of the Summer Palace. It may then be determined that the text 120 and the picture 130 have an association relationship 410. As another example, the article 110 in Fig. 4 includes text 420 introducing the Seventeen-Arch Bridge, a famous scenic spot of the Summer Palace, and close below the text 420 there is a picture 430 of the Seventeen-Arch Bridge. It may then be determined that the text 420 and the picture 430 have an association relationship 440.
Hereinafter, more examples of determining text and pictures having an association relationship are described with reference to Figs. 5A and 5B. Fig. 5A schematically illustrates a block diagram 500A of determining associated text and pictures based on the positional relationship between text and pictures according to example implementations of the present disclosure. As shown in Fig. 5A, an article 510A includes text 520A and a picture 530A, and the text 520A wraps around the picture 530A. It may then be determined that the text 520A and the picture 530A have an association relationship.
Fig. 5B schematically illustrates a block diagram 500B of determining associated text and pictures based on the positional relationship between text and pictures according to example implementations of the present disclosure. As shown in Fig. 5B, an article 510B includes text 520B and a picture 530B, and the text 520B is close above the picture 530B. Typically, the paragraphs above and below a picture describe the content of the picture, so it may be determined that the text 520B and the picture 530B have an association relationship.
According to example implementations of the present disclosure, in order to determine text and pictures having an association relationship more accurately, further processing may be performed based on text recognition and image recognition techniques. For example, keywords may be extracted from the text 520B, and the content of the picture 530B may be recognized. Suppose the extracted keywords include "Seventeen-Arch Bridge" and the content of the picture 530B is determined to be a photo of the Seventeen-Arch Bridge; it may then be determined that the text 520B and the picture 530B have an association relationship. Suppose the extracted keywords do not include "Seventeen-Arch Bridge"; further processing may then be performed on the text 540B of the paragraph below the picture 530B. Suppose the text 540B includes "Seventeen-Arch Bridge"; it may then be determined that the text 540B and the picture 530B have an association relationship. Alternatively and/or additionally, the paragraphs in the article that include the keyword "Seventeen-Arch Bridge" may be retrieved, and the found paragraph(s) may be set as associated with the picture 530B.
According to example implementations of the present disclosure, text associated with a given picture may be determined based on the reference relationship between the picture and the text in the given article. Expressions such as "see the figure above", "as shown below", "refer to Fig. 1", etc. often appear in an article to indicate a reference relationship between text and a picture. In this way, text and pictures having an association relationship may be determined in a more accurate manner.
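By way of illustration only, the two cues above (position in the article and explicit references such as "as shown below") may be combined as in the following sketch; the regular expression and the fallback rule are assumptions made for the example.

```python
import re

REFERENCE_PATTERN = re.compile(r"(as shown below|see the figure above|refer to fig)", re.I)

def associate_text(paragraphs: list, picture_index: int) -> str:
    """paragraphs: article paragraphs in reading order; picture_index: the
    picture's position in that order. Returns the paragraph judged to describe it."""
    # Prefer a nearby paragraph that explicitly references a figure.
    for offset in (-1, 1, -2, 2):
        i = picture_index + offset
        if 0 <= i < len(paragraphs) and REFERENCE_PATTERN.search(paragraphs[i]):
            return paragraphs[i]
    # Otherwise fall back to the paragraph immediately above the picture.
    return paragraphs[max(picture_index - 1, 0)]
```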
With the above example implementation, since the picture and the text are located in the same article, the text and the picture have a strong correlation, and text and pictures with strong correlation can thus be selected more effectively. In this way, the correlation between the visuals and the audio of the video generated from the selected text and pictures can be guaranteed. Compared with a video "pieced together" from text and pictures selected manually or retrieved from a large number of separate articles, a video generated in this way has a clearer theme and higher consistency between visuals and audio.
According to example implementations of the present disclosure, a frame sequence in the video may be generated based on the selected picture, and audio content in the video may be generated based on the determined text. Hereinafter, more details about generating a video clip from, for example, the picture 130 and its associated text 120 in Fig. 4 are described.
According to example implementations of the present disclosure, the text 120 may first be converted into speech to serve as the audio content. Various speech generation techniques may be used here to implement this step. For example, the TTS technique developed by Baidu, or other text-to-speech conversion techniques, may be used.
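By way of illustration only, the following sketch converts a paragraph to speech using the gTTS package as a stand-in for any text-to-speech service (the disclosure mentions Baidu TTS); the library choice and language code are assumptions made for the example.

```python
from gtts import gTTS

def synthesize_narration(text: str, out_path: str = "narration.mp3") -> str:
    """Convert the paragraph to an MP3 narration file."""
    gTTS(text=text, lang="zh-CN").save(out_path)  # language code may vary by gTTS version
    return out_path
```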
According to example implementations of the present disclosure, a frame sequence in the video may be generated based on the picture 130. Specifically, the duration of the speech may be determined, and a frame sequence of that duration may be generated based on the picture. In other words, the picture 130 is displayed for that duration, i.e. the static picture 130 is shown for the whole time span. Suppose that after the text 120 is converted into speech, the duration of the speech is 12 seconds; a video clip may then be generated in which, for the 12-second duration, the visual content is the static picture 130 and the audio content is the 12-second narration obtained from the text 120.
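By way of illustration only, a single clip that shows the still picture for the duration of the narration may be assembled as follows, assuming the moviepy 1.x API as the rendering backend (the disclosure does not name a library).

```python
from moviepy.editor import AudioFileClip, ImageClip

def build_clip(picture_path: str, narration_path: str, out_path: str = "clip.mp4") -> str:
    """Show the static picture for exactly the duration of the speech."""
    audio = AudioFileClip(narration_path)
    clip = (ImageClip(picture_path)
            .set_duration(audio.duration)
            .set_audio(audio))
    clip.write_videofile(out_path, fps=24)
    return out_path
```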
According to example implementations of the present disclosure, the frame sequence of the given duration may also be generated based on an animated presentation of the picture 130. For example, the visuals of the video may be generated in a manner similar to slideshow pages. In one implementation, the picture 130 may dynamically move into or out of the screen, or various fade-in/fade-out effects may be set, so that the video gradually transitions from the previous picture to the picture 130.
According to example implementations of the present disclosure, subtitles may also be added to the video 270. Specifically, the subtitle length of subtitles associated with the frame sequence may be determined based on the resolution of the video. The subtitle length here refers to the length of text suitable to be displayed in one frame of the video: the higher the resolution of the video, the larger the subtitle length; the lower the resolution, the smaller the subtitle length. According to example implementations of the present disclosure, sentence breaks in the text 120 may also be taken into account when determining the subtitle length. For example, the subtitle length may be set to a value between 10 and 20 words.
The text may be divided into at least one subtitle segment based on the subtitle length. Hereinafter, more details about generating subtitles are described with reference to Fig. 6. Fig. 6 schematically illustrates a block diagram 600 for determining subtitles in a video according to example implementations of the present disclosure. As shown in Fig. 6, the text 120 may include: "The Summer Palace, an imperial garden of China's Qing Dynasty, formerly the Qingyi Garden, is located in the western suburbs of Beijing, 15 kilometers from the city, covering an area of about 290 hectares." The text 120 may then be divided into 3 subtitle segments. Subtitle segment 610 may include the text "The Summer Palace, an imperial garden of China's Qing Dynasty"; subtitle segment 620 may include the text "formerly the Qingyi Garden, located in the western suburbs of Beijing"; and subtitle segment 630 may include the text "15 kilometers from the city, covering an area of about 290 hectares".
It will be understood that, assuming the duration of the audio covering the whole text 120 is 12 seconds, the length of the entire video clip may be 12 seconds. Further, the time data of each subtitle segment in the at least one subtitle segment may be determined based on the 12-second duration. Fig. 6 shows a time axis 640 representing the 12 seconds of audio playback for the whole text 120. Since the subtitle segments 610, 620 and 630 contain roughly similar amounts of text, the audio associated with each subtitle segment may be played for similar lengths of time. For example, the audio of subtitle segment 610 may be played during the 0th to 4th second, the audio of subtitle segment 620 during the 4th to 8th second, and the audio of subtitle segment 630 during the 8th to 12th second.
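By way of illustration only, the following sketch derives the subtitle length from the video resolution, splits the text into subtitle segments, and assigns each segment a share of the audio duration proportional to its length; the thresholds and the clause-based split are assumptions made for the example.

```python
def subtitle_length_for(frame_width: int) -> int:
    """Higher resolution -> longer subtitles (values are illustrative)."""
    return 20 if frame_width >= 1280 else 10

def split_subtitles(text: str, max_len: int) -> list:
    """Greedy split on clause boundaries so each segment fits within max_len."""
    parts, current = [], ""
    for piece in filter(None, text.replace("。", "，").split("，")):
        candidate = f"{current}，{piece}" if current else piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                parts.append(current)
            current = piece
    if current:
        parts.append(current)
    return parts

def timed_subtitles(segments: list, total_seconds: float) -> list:
    """Return (start, end, text) tuples covering the whole audio duration."""
    total_chars = sum(len(s) for s in segments) or 1
    t, timeline = 0.0, []
    for seg in segments:
        duration = total_seconds * len(seg) / total_chars
        timeline.append((t, t + duration, seg))
        t += duration
    return timeline
```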
Based on the time data associated with each subtitle segment as shown in Fig. 6, the subtitle segment may be inserted at the position in the frame sequence corresponding to the time data. Fig. 7 schematically illustrates a schematic diagram 700 of a frame sequence with subtitles according to example implementations of the present disclosure. As shown in Fig. 7, the visual of the frame sequence from the 0th to the 4th second is shown by reference numeral 710; this visual includes the content of subtitle segment 610, "The Summer Palace, an imperial garden of China's Qing Dynasty".
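By way of illustration only, the timed subtitle segments may be overlaid onto the frame sequence as in the sketch below, again assuming moviepy 1.x (TextClip additionally requires ImageMagick); the position, font size and color are assumptions made for the example.

```python
from moviepy.editor import CompositeVideoClip, TextClip

def add_subtitles(clip, timeline):
    """timeline: list of (start, end, text) tuples as built above."""
    overlays = [
        TextClip(text, fontsize=36, color="white")
        .set_start(start)
        .set_end(end)
        .set_position(("center", "bottom"))
        for start, end, text in timeline
    ]
    return CompositeVideoClip([clip, *overlays])
```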
According to example implementations of the present disclosure, a color different from the colors of the picture 130 may be used to make the subtitles stand out. According to example implementations of the present disclosure, various animation effects may be set for the subtitles: for example, the text of a subtitle may appear gradually along with the audio playback, or the subtitles may be shown with fade-in/fade-out effects, and so on. Similar to the visual 710, reference numerals 720 and 730 respectively show the visual of the frame sequence from the 4th to the 8th second and the visual of the frame sequence from the 8th to the 12th second. Here, the visual 720 includes the text of subtitle segment 620, "formerly the Qingyi Garden, located in the western suburbs of Beijing", and the visual 730 includes the text of subtitle segment 630, "15 kilometers from the city, covering an area of about 290 hectares".
According to example implementations of the present disclosure, background sound may also be added to the audio content of the video. For example, background music may be set for the narration, or various audio processing effects may be applied.
It will be understood that although only the details of generating one video clip from one picture 130 and its associated text 120 are schematically described above, according to example implementations of the present disclosure, more video clips may also be generated from more pictures, and their associated text, from one or more articles. According to example implementations of the present disclosure, multiple video clips may be combined to form a new video.
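By way of illustration only, the per-picture clips may be concatenated into the final video as follows, again assuming moviepy 1.x; crossfades or other transitional content could be inserted between the clips at this point.

```python
from moviepy.editor import concatenate_videoclips

def combine_clips(clips, out_path: str = "video.mp4") -> str:
    """Join the per-picture clips into one video."""
    final = concatenate_videoclips(clips, method="compose")
    final.write_videofile(out_path, fps=24)
    return out_path
```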
According to example implementations of the present disclosure, text and pictures may also be selected from one or more articles according to the predetermined duration of the video to be generated, and the video may be generated based on the selected text and pictures. The amount of text to be selected from the articles may be determined based on the reading speed at a normal narration pace. Suppose the predetermined duration is 2 minutes and the reading speed is 150 words per minute; then about 300 words of content may be selected from the articles.
According to example implementations of the present disclosure, when the article 110 includes multiple pictures, the word count of the text associated with each picture may also be determined based on the number of pictures. For example, suppose the article 110 includes 6 pictures; then about 50 words of text may be selected for each picture. As another example, appropriate text may also be selected based on the importance of each picture or other factors. According to example implementations of the present disclosure, transitional content may also be added between the video clips generated from the individual pictures.
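By way of illustration only, the word budget implied by the example above (2 minutes at 150 words per minute gives 300 words, split across 6 pictures gives about 50 words each) can be computed as follows.

```python
def words_per_picture(duration_minutes: float, words_per_minute: int, num_pictures: int) -> int:
    """Total word budget from the target duration, split evenly across the pictures."""
    total_words = int(duration_minutes * words_per_minute)   # e.g. 2 * 150 = 300
    return max(total_words // max(num_pictures, 1), 1)       # e.g. 300 // 6 = 50
```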
Multiple implementations of the method 300 for generating the video 270 have been described in detail above. According to example implementations of the present disclosure, an apparatus for generating the video 270 is also provided. Hereinafter, it is described in detail with reference to Fig. 8.
Fig. 8 schematically illustrates a block diagram of an apparatus 800 for generating video according to example implementations of the present disclosure. The apparatus 800 includes: an obtaining module 810 configured to obtain a group of articles including text and pictures associated with a video topic; a determining module 820 configured to determine a score for the group of articles based on a knowledge model; a selecting module 830 configured to select at least one article from the group of articles based on the score; and a generating module 840 configured to generate a video based on the text and pictures in the selected article(s).
According to example implementations of the present disclosure, the determining module 820 includes a scoring module configured to determine a score for at least any one of the following: a quality score, an emotion score, a harmful-content score.
According to example implementations of the present disclosure, the generating module 840 includes: a picture determining module configured to, for a given article among the at least one article, select a picture in the given article, so as to generate a frame sequence in the video based on the selected picture; and a text determining module configured to determine text associated with the picture, so as to generate audio content in the video based on the determined text.
According to example implementations of the present disclosure, the text determining module includes: a first determining module configured to determine text associated with the picture based on the positional relationship between the picture and the text in the given article; and a second determining module configured to determine text associated with the picture based on the reference relationship between the picture and the text in the given article.
According to example implementations of the present disclosure, the generating module 840 further comprises a speech generating module configured to convert the text into speech to serve as the audio content.
According to example implementations of the present disclosure, the generating module 840 further comprises: a time module configured to determine the duration of the speech; and a frame generating module configured to generate, based on the picture, a frame sequence having the duration.
According to example implementations of the present disclosure, the apparatus further comprises: a subtitle length determining module configured to determine, based on the resolution of the video, the subtitle length of subtitles associated with the frame sequence; a subtitle dividing module configured to divide the text into at least one subtitle segment based on the subtitle length; a time information determining module configured to determine the time data of a subtitle segment in the at least one subtitle segment based on the duration; and a subtitle inserting module configured to insert, based on the time data, the subtitle segment at the position in the frame sequence corresponding to the time data.
According to example implementations of the present disclosure, the generating module 840 further comprises a background module configured to add background sound to the audio content of the video.
According to example implementations of the present disclosure, the generating module 840 further comprises an animation module configured to generate the frame sequence based on an animation effect of the picture.
According to example implementations of the present disclosure, the apparatus further comprises a video length determining module configured to determine a predetermined duration associated with the video.
According to example implementations of the present disclosure, the generating module 840 further comprises: an identifying module configured to identify text and pictures in the selected article(s) based on the predetermined duration; and a video generating module configured to generate the video based on the identified text and pictures.
Fig. 9 shows a block diagram of a computing device 900 capable of implementing multiple implementations of the present disclosure. The device 900 may be used to implement the methods described with reference to Fig. 3 and Fig. 6. As shown, the device 900 includes a central processing unit (CPU) 901, which may perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random access memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 may also be stored. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Multiple components in the device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard, a mouse, etc.; an output unit 907, such as various types of displays, loudspeakers, etc.; a storage unit 908, such as a magnetic disk, an optical disk, etc.; and a communication unit 909, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 909 allows the device 900 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 901 performs the various methods and processes described above, such as the methods 300 and 600. For example, in some implementations, the methods 300 and 600 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some implementations, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the CPU 901, one or more steps of the methods described above may be performed. Alternatively, in other implementations, the CPU 901 may be configured to perform the methods 300 and 600 by any other appropriate means (for example, by means of firmware).
According to example implementations of the present disclosure, a computer-readable storage medium having a computer program stored thereon is provided. The program, when executed by a processor, implements the methods described in the present disclosure.
The functions described herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may be executed entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a standalone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (22)

1. A method for generating video, comprising:
obtaining a group of articles including text and pictures associated with a video topic;
determining a score for the group of articles based on a knowledge model;
selecting at least one article from the group of articles based on the score; and
generating a video based on the text and pictures in the selected at least one article.
2. The method according to claim 1, wherein determining a score for the group of articles based on a knowledge model comprises determining a score for at least any one of the following:
a quality score, an emotion score, a harmful-content score.
3. The method according to claim 1, wherein generating a video based on the text and pictures in the selected at least one article comprises: for a given article in the at least one article,
selecting a picture in the given article, so as to generate a frame sequence in the video based on the selected picture; and
determining text associated with the picture, so as to generate audio content in the video based on the determined text.
4. The method according to claim 3, wherein determining text associated with the picture comprises determining the text associated with the picture based on at least any one of the following:
a positional relationship between the picture and the text in the given article; and
a reference relationship between the picture and the text in the given article.
5. The method according to claim 3, wherein generating audio content in the video based on the determined text comprises:
converting the text into speech to serve as the audio content.
6. The method according to claim 5, wherein generating a frame sequence in the video based on the selected picture comprises:
determining a duration of the speech; and
generating, based on the picture, the frame sequence having the duration.
7. The method according to claim 6, further comprising:
determining, based on a resolution of the video, a subtitle length of subtitles associated with the frame sequence;
dividing the text into at least one subtitle segment based on the subtitle length;
determining time data of a subtitle segment in the at least one subtitle segment based on the duration; and
inserting, based on the time data, the subtitle segment at a position in the frame sequence corresponding to the time data.
8. The method according to claim 5, wherein generating the audio content in the video based on the text further comprises:
adding background sound to the audio content of the video.
9. The method according to claim 5, wherein generating the frame sequence having the duration based on the picture further comprises:
generating the frame sequence based on an animation effect of the picture.
10. The method according to claim 1, further comprising: determining a predetermined duration associated with the video;
wherein generating a video based on the text and pictures in the selected at least one article further comprises:
identifying text and pictures in the selected at least one article based on the predetermined duration; and
generating the video based on the identified text and pictures.
11. An apparatus for generating video, comprising:
an obtaining module configured to obtain a group of articles including text and pictures associated with a video topic;
a determining module configured to determine a score for the group of articles based on a knowledge model;
a selecting module configured to select at least one article from the group of articles based on the score; and
a generating module configured to generate a video based on the text and pictures in the selected at least one article.
12. The apparatus according to claim 11, wherein the determining module comprises: a scoring module configured to determine a score for at least any one of the following: a quality score, an emotion score, a harmful-content score.
13. The apparatus according to claim 11, wherein the generating module comprises:
a picture determining module configured to, for a given article in the at least one article, select a picture in the given article, so as to generate a frame sequence in the video based on the selected picture; and
a text determining module configured to determine text associated with the picture, so as to generate audio content in the video based on the determined text.
14. The apparatus according to claim 13, wherein the text determining module comprises:
a first determining module configured to determine text associated with the picture based on a positional relationship between the picture and the text in the given article; and
a second determining module configured to determine text associated with the picture based on a reference relationship between the picture and the text in the given article.
15. The apparatus according to claim 13, wherein the generating module further comprises:
a speech generating module configured to convert the text into speech to serve as the audio content.
16. The apparatus according to claim 15, wherein the generating module further comprises:
a time module configured to determine a duration of the speech; and
a frame generating module configured to generate, based on the picture, the frame sequence having the duration.
17. The apparatus according to claim 16, further comprising:
a subtitle length determining module configured to determine, based on a resolution of the video, a subtitle length of subtitles associated with the frame sequence;
a subtitle dividing module configured to divide the text into at least one subtitle segment based on the subtitle length;
a time information determining module configured to determine time data of a subtitle segment in the at least one subtitle segment based on the duration; and
a subtitle inserting module configured to insert, based on the time data, the subtitle segment at a position in the frame sequence corresponding to the time data.
18. The apparatus according to claim 15, wherein the generating module further comprises:
a background module configured to add background sound to the audio content of the video.
19. The apparatus according to claim 15, wherein the generating module further comprises:
an animation module configured to generate the frame sequence based on an animation effect of the picture.
20. The apparatus according to claim 11, further comprising:
a video length determining module configured to determine a predetermined duration associated with the video;
wherein the generating module further comprises:
an identifying module configured to identify text and pictures in the selected at least one article based on the predetermined duration; and
a video generating module configured to generate the video based on the identified text and pictures.
21. A device for generating video, the device comprising:
one or more processors; and
a storage unit for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 10.
22. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
CN201811489221.4A 2018-12-06 2018-12-06 Method, apparatus, device and storage medium for generating video Pending CN109614537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811489221.4A CN109614537A (en) 2018-12-06 2018-12-06 Method, apparatus, device and storage medium for generating video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811489221.4A CN109614537A (en) 2018-12-06 2018-12-06 Method, apparatus, device and storage medium for generating video

Publications (1)

Publication Number Publication Date
CN109614537A true CN109614537A (en) 2019-04-12

Family

ID=66006746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811489221.4A Pending CN109614537A (en) Method, apparatus, device and storage medium for generating video

Country Status (1)

Country Link
CN (1) CN109614537A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103650002A (en) * 2011-05-06 2014-03-19 西尔股份有限公司 Video generation based on text
US20180277166A1 (en) * 2013-02-22 2018-09-27 Fuji Xerox Co., Ltd. Systems and methods for creating and using navigable spatial overviews for video
CN103559214A (en) * 2013-10-11 2014-02-05 中国农业大学 Method and device for automatically generating video
CN108090111A (en) * 2016-11-23 2018-05-29 谷歌有限责任公司 It is taken passages for the animation of search result
CN108509457A (en) * 2017-02-28 2018-09-07 阿里巴巴集团控股有限公司 A kind of recommendation method and apparatus of video data
CN107193805A (en) * 2017-06-06 2017-09-22 北京百度网讯科技有限公司 Article Valuation Method, device and storage medium based on artificial intelligence
CN107943839A (en) * 2017-10-30 2018-04-20 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on picture and word generation video

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112153418A (en) * 2019-06-26 2020-12-29 阿里巴巴集团控股有限公司 Streaming media generation method and device, terminal and server
CN111177542A (en) * 2019-12-20 2020-05-19 贝壳技术有限公司 Introduction information generation method and device, electronic equipment and storage medium
CN111177542B (en) * 2019-12-20 2021-07-20 贝壳找房(北京)科技有限公司 Introduction information generation method and device, electronic equipment and storage medium
CN113132781A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Video generation method and apparatus, electronic device, and computer-readable storage medium
CN111538851A (en) * 2020-04-16 2020-08-14 北京捷通华声科技股份有限公司 Method, system, device and storage medium for automatically generating demonstration video
CN111538851B (en) * 2020-04-16 2023-09-12 北京捷通华声科技股份有限公司 Method, system, equipment and storage medium for automatically generating demonstration video
CN112565875A (en) * 2020-11-30 2021-03-26 北京百度网讯科技有限公司 Method, device, equipment and computer readable storage medium for automatically generating video
CN113423010A (en) * 2021-06-22 2021-09-21 深圳市大头兄弟科技有限公司 Video conversion method, device and equipment based on document and storage medium
CN113438543A (en) * 2021-06-22 2021-09-24 深圳市大头兄弟科技有限公司 Matching method, device and equipment for converting document into video and storage medium
CN113497899A (en) * 2021-06-22 2021-10-12 深圳市大头兄弟科技有限公司 Character and picture matching method, device and equipment and storage medium
CN113438543B (en) * 2021-06-22 2023-02-03 深圳市大头兄弟科技有限公司 Matching method, device and equipment for converting document into video and storage medium

Similar Documents

Publication Publication Date Title
CN109614537A (en) For generating the method, apparatus, equipment and storage medium of video
CN109801193B (en) Follow-up teaching system with voice evaluation function
US10096145B2 (en) Method and system for assembling animated media based on keyword and string input
Durand et al. The Oxford handbook of corpus phonology
KR20180107147A (en) Multi-variable search user interface
CN105224581B (en) The method and apparatus of picture are presented when playing music
JP2001014306A (en) Method and device for electronic document processing, and recording medium where electronic document processing program is recorded
JP2011215964A (en) Server apparatus, client apparatus, content recommendation method and program
CN109359287A (en) The online recommender system of interactive cultural tour scenic area and scenic spot and method
Lüpke Data collection methods for field-based language documentation
Arnold et al. Storyboarding serious games for large-scale training applications
Zhang et al. Visual storytelling of song ci and the poets in the social–cultural context of song dynasty
JP2011128362A (en) Learning system
Yuan et al. Speechlens: A visual analytics approach for exploring speech strategies with textural and acoustic features
JP2003208083A (en) Method and device for generating teaching material, teaching material generating program, and storage medium with the teaching material generating program stored therein
Poovizhi et al. M-Learning First Word Android Application to support Autistic Children with their Education
Androutsopoulos et al. Generating multilingual personalized descriptions of museum exhibits-The M-PIRO project
Zähres et al. Broadcasting your variety
JP7427405B2 (en) Idea support system and its control method
Origlia et al. Human, all too human: Towards a disfluent virtual tourist guide
Tang Producing informative text alternatives for images
Tang Chinese diaspora narrative histories: Expanding local coproducer knowledge and digital story archival development
Pettit Passing traditions: Child-directed music as an index of cultural change in metropolitan India
Jayalakshmi et al. Augmenting Kannada Educational Video with Indian Sign Language Captions Using Synthetic Animation
CN110209918A (en) A kind of text handling method based on date event, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination