CN109614537A - Method, apparatus, device, and storage medium for generating video - Google Patents
Method, apparatus, device, and storage medium for generating video
- Publication number
- CN109614537A (application number CN201811489221.4A)
- Authority
- CN
- China
- Prior art keywords
- picture
- text
- video
- article
- subtitle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
Landscapes
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
This disclosure relates to a method, apparatus, device, and storage medium for generating video. According to an example implementation, a method for generating video is provided. In the method, a group of articles including text and pictures associated with a video subject is obtained. A score for each article in the group is determined based on a knowledge model. At least one article is selected from the group based on the scores. A video is generated based on the text and pictures of the selected article(s). According to example implementations of the present disclosure, an apparatus, a device, and a computer storage medium for generating video are also provided. With the above implementations, a video can be generated in a more convenient and efficient manner, without manual intervention.
Description
Technical field
Implementations of the present disclosure relate generally to video generation, and more particularly to a method, apparatus, device, and computer storage medium for generating video based on text and pictures.
Background technique
At present there are more and more multimedia platforms (websites or applications) through which users can browse static content such as text and pictures. However, the presentation of such static content is rather monotonous. For introductory or persuasive content in particular, users are more inclined to browse dynamic content in video form. Thus, for a given video subject, how to obtain corresponding text and pictures based on the subject, and how to generate a dynamic video from the obtained text and pictures, becomes a problem to be solved. It is therefore desirable to provide a technical solution that can generate videos in a more convenient and effective manner.
Summary of the invention
According to example implementations of the present disclosure, a solution for generating video is provided.
In a first aspect of the present disclosure, a method for generating video is provided. In the method, a group of articles including text and pictures associated with a video subject is obtained. A score for each article in the group is determined based on a knowledge model. At least one article is selected from the group based on the scores. A video is generated based on the text and pictures of the selected article(s).
In a second aspect of the present disclosure, an apparatus for generating video is provided. The apparatus includes: an acquisition module configured to obtain a group of articles including text and pictures associated with a video subject; a determination module configured to determine, based on a knowledge model, a score for each article in the group; a selection module configured to select at least one article from the group based on the scores; and a generation module configured to generate a video based on the text and pictures of the selected article(s).
In a third aspect of the present disclosure, a device is provided. The device includes one or more processors and a storage unit for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium having a computer program stored thereon is provided. The program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
It should be appreciated that the content described in this Summary is not intended to identify key or essential features of implementations of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will be readily understood from the description below.
Detailed description of the invention
The above and other features, advantages, and aspects of the implementations of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 schematically illustrates a diagram of static content including text and a picture;
Fig. 2 schematically illustrates a block diagram of a technical solution for generating video according to example implementations of the present disclosure;
Fig. 3 schematically illustrates a flowchart of a method for generating video according to example implementations of the present disclosure;
Fig. 4 schematically illustrates a block diagram of text and pictures having association relationships according to example implementations of the present disclosure;
Figs. 5A and 5B schematically illustrate block diagrams of determining associated text and pictures based on the positional relationship of the text and pictures according to example implementations of the present disclosure;
Fig. 6 schematically illustrates a block diagram for determining the subtitles in a video according to example implementations of the present disclosure;
Fig. 7 schematically illustrates a schematic diagram of a frame sequence with subtitles according to example implementations of the present disclosure;
Fig. 8 schematically illustrates a block diagram of an apparatus for generating video according to example implementations of the present disclosure; and
Fig. 9 shows a block diagram of a computing device capable of implementing multiple implementations of the present disclosure.
Specific embodiment
Implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain implementations of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be realized in various forms and should not be construed as limited to the implementations set forth here; rather, these implementations are provided so that the present disclosure will be more thorough and complete. It should be understood that the drawings and implementations of the present disclosure are for illustration only and are not intended to limit its scope of protection.
In describing implementations of the present disclosure, the term "including" and similar terms should be understood as open-ended, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one implementation" or "an implementation" should be understood as "at least one implementation". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
For ease of description, static content in a multimedia platform (website or application) is first described with reference to Fig. 1, which schematically illustrates a diagram of static content 100 including text and a picture. As shown in Fig. 1, static content such as an article 110 may include text 120 and a picture 130. The user needs to read the text line by line to understand the article. Typically, the article 110 may span multiple pages, and the user then has to use a keyboard, mouse, or touch screen to perform scrolling, page-turning, and similar operations to control the display of the static content. Sometimes it is inconvenient for users to perform such input control, and a large number of users are more inclined to obtain information from dynamic videos. Especially for introductory, educational, or illustrative content, users prefer watching a dynamic video to reading a monotonous static article. Accordingly, it is desirable to develop a technical solution for generating videos in a more efficient manner.
Various approaches for generating videos have been proposed. For example, given a designated subject, relevant text and pictures can be searched for manually, and a video generated from the obtained text and pictures. To improve on the efficiency of manual processing, technical solutions that automatically search for text and pictures and generate a video have also been proposed. In such solutions, however, the amounts of obtained text and pictures may not match well. On the one hand, a large amount of textual introduction may be found but only a small amount of picture content, so that additional pictures have to be retrieved again. On the other hand, the text and pictures may lack relevance: if a video is generated from them, the audio content generated from the text and the visual content generated from the pictures will describe the subject inconsistently. How to collect text and pictures, and how to generate a dynamic video from the collected text and pictures, thus becomes a problem to be solved.
In order at least be partially solved the deficiency in above-mentioned technical proposal, according to the exemplary realization of the disclosure, provide
It is a kind of for generating the technical solution of video.Hereinafter, it will refer to the exemplary realization that Fig. 2 is broadly described the disclosure.Fig. 2 shows
Meaning property shows the block diagram 200 for being used to generate the technical solution of video according to the example implementations of present disclosure.Such as Fig. 2
It is shown, search operation can be executed in network 220 based on video subject 210.Video subject 210 can specify expectation and generate
Video content topic, for example, " Summer Palace introduction ", " the Forbidden City introduction " etc..It hereinafter, will be to generate the related " Summer Palace
The video of introduction " is example, and the more details of the implementation according to the disclosure are described in detail.
A search can be executed in the network 220 based on a search engine (for example, the "Baidu" search engine or another search engine) to obtain a group of articles 230 associated with the video subject 210. A large number of web articles introducing the Summer Palace may be found in this way. Each article in the group 230 can be, for example, an article 110 including text 120 and a picture 130 as shown in Fig. 1. Then, a score 240 for each article in the group 230 can be determined based on a knowledge model 250. The knowledge model 250 here can be, for example, a pre-trained model for determining whether each article is suitable for generating a video. At least one article 260 that is better suited to generating a video can then be selected based on the scores 240, and a video 270 can be generated using the text and pictures included in the selected article(s) 260. When the video is generated based on the text 120 and picture 130 shown in Fig. 1, the visual part of the video can show the picture 130, and the audio part can play the audio of the text 120.
With the above example implementations, a massive number of articles can be searched in the network 220 for use in determining articles associated with the video subject 210. According to example implementations of the present disclosure, searching the massive articles in the network 220 ensures that abundant material is available for generating the video 270, avoiding the need for supplementary searches at a later stage. Further, the scores 240 provide a quantitative criterion for determining which articles are suitable for generating the video. According to example implementations of the present disclosure, the scores 240 may cover various aspects; in this way the quality of the material used to generate the video 270 can be improved, helping to select material better suited to generating the video 270. In the video 270 generated in this way, it can be ensured that the visual content and the audio content are closely related, that is, the audio content explains and illustrates the visual content.
Hereinafter, more details about generating the video 270 will be described with reference to Fig. 3, which schematically illustrates a flowchart of a method 300 for generating video according to example implementations of the present disclosure. At block 310, a group of articles 230 including text and pictures associated with the video subject 210 is obtained. A variety of search techniques that have been or will be developed can be used here to obtain the group of articles 230.
According to example implementations of the present disclosure, the domain in which to execute the search can be determined based on the category of the video that is desired to be generated. For example, if a scenic-spot introduction video is to be generated, the search can be executed on travel websites; if an educational video is to be generated, the search can be executed on encyclopedia websites. In this way, material more relevant to the video subject 210 can be found more accurately.
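The category-to-domain routing just described can be sketched as a simple lookup. This is a minimal illustration only; the category names and site domains below are invented placeholders, not part of this disclosure.

```python
# Illustrative sketch: choose which sites to search based on the desired
# video category. Category keys and domains are hypothetical examples.
CATEGORY_DOMAINS = {
    "scenic_spot": ["travel.example.com"],      # tourism sites for sight-seeing videos
    "education": ["encyclopedia.example.com"],  # encyclopedia sites for educational videos
}

def pick_search_domains(category: str) -> list:
    """Return the site domains to which the article search is restricted."""
    return CATEGORY_DOMAINS.get(category, [])
```

In practice the returned domains would be passed to the search engine as a site restriction when collecting the group of articles 230.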
At block 320, scores for the group of articles 230 can be determined based on the knowledge model 250. It will be understood that the knowledge model 250 may consider many factors, including, for example, a quality score for the article, an emotion score, a harmful-content score, and other aspects.
According to example implementations of the present disclosure, the quality score can be determined based on the number of pictures in the article. To meet the needs of generating a video, articles including more pictures can be preferentially selected. Therefore, a higher quality score can be given to articles including more pictures, and a lower quality score to articles including fewer pictures.
According to example implementations of the present disclosure, the quality score can also be determined based on the amount of text in the article. Since the audio content of the video is generated from the textual content, the amount of text determines, to a certain extent, the length of the audio in the video and thereby the length of the generated video. Typically, users more readily accept videos between 2 and 4 minutes long, so the quality score can be determined based on the word count of the article. For example, a higher score can be given to articles within 500 words, and a lower score to articles exceeding 2000 words.
According to example implementations of the present disclosure, the quality score can be determined based on the relationship between the text and the pictures in the article. Typically, the pictures in an article illustrate the accompanying text, so articles in which text and pictures are distributed more uniformly can be preferentially selected. Suppose an article includes 10 paragraphs of text, with a picture shown below each paragraph; a higher score can then be given to this article. Suppose instead that an article includes 10 paragraphs of text but only 1 picture at the beginning and 1 at the end; a lower score can then be given to that article.
It will be understood that although separate examples of the quality score have been illustrated above, these examples can also be used in combination according to example implementations of the present disclosure. For example, a corresponding weight can be set for each aspect, and the final quality score determined by, e.g., a weighted sum.
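The weighted combination of the picture-count, word-count, and distribution sub-scores can be sketched as follows. The particular weights, the saturation point for picture count, and the linear word-count penalty are illustrative assumptions; the disclosure only specifies the general preference (more pictures, roughly 500-2000 words, even distribution).

```python
# Illustrative sketch of a weighted quality score. Weights and sub-score
# formulas are assumptions, not taken from the disclosure.
def quality_score(num_pictures: int, num_words: int,
                  distribution_uniformity: float,
                  weights=(0.4, 0.3, 0.3)) -> float:
    """Combine three sub-scores (each in [0, 1]) into one quality score."""
    picture_score = min(num_pictures / 5.0, 1.0)  # saturate at 5 pictures (assumed)
    if num_words <= 500:
        word_score = 1.0                          # short articles preferred
    elif num_words >= 2000:
        word_score = 0.2                          # overly long articles penalised
    else:
        word_score = 1.0 - 0.8 * (num_words - 500) / 1500.0
    w_pic, w_word, w_dist = weights
    return w_pic * picture_score + w_word * word_score + w_dist * distribution_uniformity
```

Here `distribution_uniformity` stands in for some measure (in [0, 1]) of how evenly pictures are interleaved with paragraphs.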
According to example implementations of the present disclosure, an emotion score of an article can be determined. The emotion score here can indicate whether the article's content takes a positive, neutral, or negative attitude towards a given subject. For example, the emotion score can be represented in the interval [-1, 1], where "-1" indicates a negative attitude and "1" indicates a positive attitude. For an introductory video, it is preferable to generate the video from articles with a positive attitude, so a higher emotion score can be set for articles with a positive attitude, and a lower emotion score for articles with a negative attitude. According to example implementations of the present disclosure, a predefined emotion keyword database can be obtained, which may include keywords indicating positive, negative, and neutral emotion respectively. One or more keywords extracted from the article can then be compared against the emotion keyword database to determine the emotion score of the article.
According to example implementations of the present disclosure, a score can also be given for whether the article includes harmful content. Harmful content here can include, but is not limited to, advertising content, pornographic content, subversive content, fraudulent content, and the like. A lower score can be given to articles that include such harmful content.
According to example implementations of the present disclosure, the emotion score and the harmful-content score of an article can be determined by analyzing the text and/or the pictures in the article. For example, it may happen that the textual part of an article concerns the introduction of the Summer Palace, while a picture includes advertising such as "second-hand house for sale"; in this case the picture can also be recognized to determine whether harmful content is included.
According to example implementations of the present disclosure, a filter operation can also be executed on the group of articles found, so as to remove articles with duplicate content. The scoring operations described above can then be executed only for the filtered articles.
At block 330, at least one article is selected from the group of articles 230 based on the scores. How to determine the scores of an article in terms of quality, emotion, and harmful content has been described in detail above. According to example implementations of the present disclosure, the scores of these different aspects can also be used in combination. For example, a corresponding weight can be set for each aspect, and the final score determined by, e.g., a weighted sum.
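The per-aspect weighting and the selection at block 330 can be sketched together. The weights are illustrative assumptions; the disclosure only states that each aspect receives a weight and the final score is, e.g., a weighted sum.

```python
# Illustrative sketch: combine per-aspect scores and select top articles.
# The weights (0.5, 0.3, 0.2) are assumed, not specified by the disclosure.
def final_score(quality: float, emotion: float, harmless: float,
                weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the quality, emotion, and harmful-content scores."""
    wq, we, wh = weights
    return wq * quality + we * emotion + wh * harmless

def select_articles(scored, top_n=1):
    """scored: list of (article_id, score); return the top_n ids by score."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [article_id for article_id, _ in ranked[:top_n]]
```

For example, scoring each filtered article with `final_score` and passing the results to `select_articles` yields the article(s) 260 used at block 340.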
At block 340, a video 270 can be generated based on the text and pictures of the selected article(s). Hereinafter, the details of generating the video 270 based on the text and pictures in one article will be described. According to example implementations of the present disclosure, for a given article 110 among the selected articles, a picture 130 in the article 110 can first be selected, and then the text 120 associated with the picture 130 determined.
Fig. 4 schematically illustrates a block diagram 400 of text and pictures having association relationships according to example implementations of the present disclosure. As shown in Fig. 4, the article 110 may include multiple pictures 130 and 430, and multiple paragraphs of text 120 and 420. Which text describes a given picture can be determined in various ways.
According to example implementations of the present disclosure, the text associated with a given picture can be determined based on the positional relationship between the picture and the text in the article. Typically, the text describing a picture's content is located in a paragraph near the picture; thus, text and pictures having an association relationship can be determined based on their positional relationship. For example, the article 110 in Fig. 4 includes a text 120 giving a general introduction to the Summer Palace and, immediately below the text 120, a picture 130 of a landmark building of the Summer Palace. It can then be determined that the text 120 and the picture 130 have an association relationship 410. As another example, the article 110 in Fig. 4 includes a text 420 introducing the Seventeen-Arch Bridge, a famous sight of the Summer Palace, and, immediately below the text 420, a picture 430 of the Seventeen-Arch Bridge. It can then be determined that the text 420 and the picture 430 have an association relationship 440.
Hereinafter, more examples of determining text and pictures with association relationships will be described with reference to Figs. 5A and 5B. Fig. 5A schematically illustrates a block diagram 500A of determining associated text and pictures based on their positional relationship according to example implementations of the present disclosure. As shown in Fig. 5A, an article 510A includes text 520A and a picture 530A, and the text 520A wraps around the picture 530A. It can then be determined that the text 520A and the picture 530A have an association relationship.
Fig. 5B schematically illustrates a block diagram 500B of determining associated text and pictures based on their positional relationship according to example implementations of the present disclosure. As shown in Fig. 5B, an article 510B includes text 520B and a picture 530B, and the text 520B is immediately above the picture 530B. Typically, the paragraphs above and below a picture describe the picture's content, so it can be determined that the text 520B and the picture 530B have an association relationship.
According to example implementations of the present disclosure, in order to determine text and pictures with association relationships more accurately, further processing can be executed based on text recognition and image recognition techniques. For example, keywords can be extracted from the text 520B, and the content of the picture 530B identified. Suppose the extracted keywords include "Seventeen-Arch Bridge" and the content of the picture 530B is determined to be a photo of the Seventeen-Arch Bridge; it can then be determined that the text 520B and the picture 530B have an association relationship. Suppose instead that the extracted keywords do not include "Seventeen-Arch Bridge"; further processing can then be carried out on the text 540B of the paragraph below the picture 530B. If the text 540B includes "Seventeen-Arch Bridge", it can be determined that the text 540B and the picture 530B have an association relationship. Alternatively and/or additionally, the paragraph including the keyword "Seventeen-Arch Bridge" can be retrieved from the article, and the paragraph found can be set as associated with the picture 530B.
According to example implementations of the present disclosure, the text associated with a given picture can be determined based on reference relationships between the picture and the text in the article. Expressions such as "see the figure on the right", "as shown below", "referring to Fig. 1", and the like, which indicate a reference relationship between text and pictures, often appear in articles. In this way, text and pictures with association relationships can be determined more accurately.
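The keyword-matching fallback described above (scan nearby paragraphs for the picture's recognised label) can be sketched as follows. The nearest-first scanning order and the plain substring match are illustrative simplifications of the recognition step.

```python
# Illustrative sketch: associate a picture with the first paragraph that
# mentions the picture's recognised label (e.g. from image recognition).
def associate_paragraph(picture_label: str, paragraphs: list) -> int:
    """Return the index of the first paragraph mentioning the label,
    or -1 if no paragraph matches."""
    for i, text in enumerate(paragraphs):
        if picture_label in text:
            return i
    return -1
```

In a fuller implementation the paragraphs would be ordered by distance from the picture, and the reference-expression patterns ("as shown below", etc.) checked as well.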
With the above example implementations, since the picture and the text are located in the same article, the text and the picture have a strong correlation, and text and pictures with strong correlation can thus be selected more effectively. In this way, the correlation between the visual content and the audio of the video generated from the selected text and pictures can be assured. Compared with a "pieced-together" video whose text and pictures are selected manually or retrieved by search from a large number of articles, the subject of a video generated in this way is clearer, and the consistency between its visuals and audio is higher.
According to example implementations of the present disclosure, a frame sequence in the video can be generated based on the selected picture, and the audio content of the video can be generated based on the determined text. Hereinafter, more details of generating a video clip from, e.g., the picture 130 and its associated text 120 in Fig. 4 will be described.
According to example implementations of the present disclosure, the text 120 can first be converted into speech to serve as the audio content. A variety of speech generation techniques can be used here to realize this step; for example, the TTS technology developed by Baidu, or other text-to-speech conversion technologies, can be adopted.
According to example implementations of the present disclosure, the frame sequence in the video can be generated based on the picture 130. Specifically, the time length of the speech can be determined, and a frame sequence with that time length generated based on the picture. In other words, the picture 130 is displayed for that time length, that is, the static picture 130 is shown for the duration. Suppose that, after the text 120 is converted into speech, the time length of the speech is 12 seconds; a video clip can then be generated in which, over its 12-second length, the visual content is the static picture 130 and the audio content is the 12-second narration obtained from the text 120.
According to example implementations of the present disclosure, a frame sequence with the given time length can also be generated based on an animated display of the picture 130. For example, the visuals of the video can be generated in a manner similar to a slide show. In one implementation, the picture 130 can be made to move into or out of the screen dynamically, or various fade effects can be set, so that the video transitions gradually from the previous picture to the picture 130.
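Deriving the frame sequence from the speech duration is simple arithmetic: the picture is held on screen for the whole narration. A minimal sketch, where the frame rate is an assumption (the disclosure does not specify one):

```python
# Illustrative sketch: how many frames are needed to show the picture
# for the duration of the synthesised speech. fps=25 is an assumption.
def frame_count(speech_seconds: float, fps: int = 25) -> int:
    """Number of frames that display the picture for the speech length."""
    return round(speech_seconds * fps)
```

For the 12-second example above, a 25 fps clip would consist of 300 frames of the static picture 130.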
According to example implementations of the present disclosure, subtitles can also be added to the video 270. Specifically, the subtitle length of the subtitles associated with the frame sequence can be determined based on the resolution of the video. The subtitle length here refers to the length of text suitable for display within one frame of the video: the higher the resolution of the video, the larger the subtitle length; the lower the resolution, the smaller the subtitle length. According to example implementations of the present disclosure, the clause breaks of each sentence in the text 120 can also be considered when determining the subtitle length. For example, the subtitle length can be set to a value between 10 and 20 words.
The text can be divided into at least one subtitle segment based on the subtitle length. Hereinafter, more details about generating subtitles will be described with reference to Fig. 6, which schematically illustrates a block diagram 600 for determining the subtitles in a video according to example implementations of the present disclosure. As shown in Fig. 6, the text 120 may read: "The Summer Palace, an imperial garden of China's Qing Dynasty, formerly known as the Garden of Clear Ripples, is located in the western suburbs of Beijing, 15 kilometers from the city, and covers about 290 hectares." The text 120 can then be divided into 3 subtitle segments. Subtitle segment 610 may include the text "The Summer Palace, an imperial garden of China's Qing Dynasty"; subtitle segment 620 may include the text "formerly known as the Garden of Clear Ripples, located in the western suburbs of Beijing"; and subtitle segment 630 may include the text "15 kilometers from the city, covering about 290 hectares".
It will be understood that, assuming the time length of the audio of the whole text 120 is 12 seconds, the length of the entire video can be 12 seconds. Further, time data for each of the subtitle segments can be determined based on the 12-second time length. Fig. 6 shows a time axis 640 representing the 12 seconds of audio playing the whole text 120. Since the subtitle segments 610, 620, and 630 include roughly similar amounts of text, the audio associated with each of the subtitle segments 610, 620, and 630 can be played for a similar length of time. For example, the audio of subtitle segment 610 can be played during the period from the 0th to the 4th second, the audio of subtitle segment 620 during the period from the 4th to the 8th second, and the audio of subtitle segment 630 during the period from the 8th to the 12th second.
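The two steps just described, splitting the text into caption-sized segments and allocating each a share of the audio time, can be sketched as follows. Splitting on commas and allocating time in proportion to segment length are illustrative simplifications of the clause-break and "similar length" rules above.

```python
# Illustrative sketch of subtitle segmentation and timing.
def split_subtitles(text: str, max_len: int = 20) -> list:
    """Greedily merge comma-separated clauses into captions <= max_len chars."""
    clauses = [c.strip() for c in text.split(",") if c.strip()]
    segments, current = [], ""
    for clause in clauses:
        candidate = (current + ", " + clause) if current else clause
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                segments.append(current)
            current = clause
    if current:
        segments.append(current)
    return segments

def time_subtitles(segments: list, total_seconds: float) -> list:
    """Return (start, end) pairs, splitting the audio time in proportion
    to each segment's character count."""
    total_chars = sum(len(s) for s in segments)
    times, start = [], 0.0
    for s in segments:
        end = start + total_seconds * len(s) / total_chars
        times.append((start, end))
        start = end
    return times
```

With equally long segments and a 12-second narration, this reproduces the 0-4 s / 4-8 s / 8-12 s schedule of Fig. 6.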
Based on the time data associated with each subtitle segment as shown in Fig. 6, the subtitle segments can be inserted into the frame sequence at the positions corresponding to the time data. Fig. 7 schematically illustrates a schematic diagram 700 of a frame sequence with subtitles according to example implementations of the present disclosure. As shown in Fig. 7, the picture of the frame sequence for the 0th to the 4th second is shown at reference numeral 710; this picture includes the content of subtitle segment 610, "The Summer Palace, an imperial garden of China's Qing Dynasty".
According to example implementations of the present disclosure, a color different from the colors of the picture 130 can be used to display the subtitles prominently. According to example implementations of the present disclosure, various animation effects can also be set for the subtitles: for example, the text of a subtitle can be revealed gradually as the audio plays, or the subtitles can be shown with fade-in/fade-out effects, and so on. Similarly to the picture 710, reference numerals 720 and 730 respectively show the picture of the frame sequence for the 4th to the 8th second and the picture of the frame sequence for the 8th to the 12th second. The picture 720 includes the text of subtitle segment 620, "formerly known as the Garden of Clear Ripples, located in the western suburbs of Beijing", and the picture 730 includes the text of subtitle segment 630, "15 kilometers from the city, covering about 290 hectares".
According to example implementations of the present disclosure, background sound can also be added to the audio content of the video. For example, background music can be set for the narration, or various audio processing effects can be applied.
It will be understood that although only the details of generating one video segment based on a single picture 130 and its associated text 120 have been schematically illustrated above, according to example implementations of the present disclosure, more video segments may also be generated based on more pictures, and their associated text, from one or more articles. According to an example implementation of the present disclosure, multiple video segments may be combined to form a new video.
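Combining multiple generated video segments into a new video can be as simple as concatenating their frame sequences and audio tracks in order. This is a schematic data-structure sketch; the per-clip `frames`/`audio` dictionary representation is an assumption for illustration:

```python
def combine_clips(clips):
    """Concatenate video clips, each a dict with 'frames' (a list of
    frame images) and 'audio' (a list of audio samples), into one clip."""
    combined = {"frames": [], "audio": []}
    for clip in clips:
        combined["frames"].extend(clip["frames"])
        combined["audio"].extend(clip["audio"])
    return combined

a = {"frames": ["f1", "f2"], "audio": [0.1]}
b = {"frames": ["f3"], "audio": [0.2, 0.3]}
print(combine_clips([a, b])["frames"])  # → ['f1', 'f2', 'f3']
```

The transitional content mentioned later in the description would be inserted between clips at this concatenation step.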
According to an example implementation of the present disclosure, text and pictures may also be selected from one or more articles according to a predetermined time length desired for the generated video, and the video may be generated based on the selected text and pictures. In this case, the amount of text to select from the articles may be determined based on the reading-aloud rate at a normal speaking speed. Assuming the predetermined time length is 2 minutes and the reading rate is 150 words per minute, then about 300 words of content may be selected from the articles.
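The arithmetic in this paragraph — word budget = target duration × reading rate — can be captured directly (a sketch; 150 words per minute is simply the example rate from the text):

```python
def word_budget(duration_minutes, words_per_minute=150):
    """Number of words to select so the narration fills the target duration."""
    return int(duration_minutes * words_per_minute)

print(word_budget(2))  # → 300, matching the 2-minute example above
```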
According to an example implementation of the present disclosure, when multiple pictures are included in article 110, the number of words of text associated with each picture may also be determined based on the number of pictures. For example, assuming that article 110 includes 6 pictures, about 50 words of text content may be selected for each picture. As another example, appropriate text may also be selected based on the importance of each picture or on other factors. According to an example implementation of the present disclosure, transitional content may also be added between the video segments generated from the individual pictures.
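Splitting the word budget across the pictures of an article, optionally weighted by per-picture importance, could look like the following sketch. The equal-weight default reproduces the 6-pictures/50-words example; the proportional weighting scheme itself is an assumption:

```python
def words_per_picture(total_words, importances):
    """Allocate a total word budget across pictures in proportion to
    each picture's importance weight."""
    total_weight = sum(importances)
    return [int(total_words * w / total_weight) for w in importances]

# Six equally important pictures sharing a 300-word budget:
print(words_per_picture(300, [1] * 6))  # → [50, 50, 50, 50, 50, 50]
```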
Multiple implementations of the method 300 for generating the video 270 have been described in detail above. According to example implementations of the present disclosure, an apparatus for generating the video 270 is additionally provided. It will be described in detail below with reference to Fig. 8.
Fig. 8 schematically shows a block diagram of an apparatus 800 for generating a video according to an example implementation of the present disclosure. The apparatus 800 includes: an obtaining module 810 configured to obtain a group of articles, associated with a video subject, that include text and pictures; a determining module 820 configured to determine a score for the group of articles based on a knowledge model; a selecting module 830 configured to select at least one article from the group of articles based on the score; and a generating module 840 configured to generate the video based on the text and pictures in the selected at least one article.
According to an example implementation of the present disclosure, the determining module 820 includes: a scoring module configured to determine a score for at least any one of the following: a quality score, a sentiment score, and a harmful-content score.
According to an example implementation of the present disclosure, the generating module 840 includes: a picture determining module configured to, for a given article in the at least one article, select a picture in the given article so as to generate a frame sequence in the video based on the selected picture; and a text determining module configured to determine text associated with the picture so as to generate audio content in the video based on the determined text.
According to an example implementation of the present disclosure, the text determining module includes: a first determining module configured to determine the text associated with the picture based on the positional relationship between the picture and the text in the given article; and a second determining module configured to determine the text associated with the picture based on the citation relationship between the picture and the text in the given article.
According to an example implementation of the present disclosure, the generating module 840 further includes: a speech generating module configured to convert the text into speech to serve as the audio content.
According to an example implementation of the present disclosure, the generating module 840 further includes: a time module configured to determine the time length of the speech; and a frame generating module configured to generate, based on the picture, a frame sequence having the time length.
According to an example implementation of the present disclosure, the apparatus further includes: a subtitle length determining module configured to determine, based on the resolution of the video, the subtitle length of subtitles associated with the frame sequence; a subtitle dividing module configured to divide the text into at least one subtitle segment based on the subtitle length; a time information determining module configured to determine, based on the time length, time data for a subtitle segment in the at least one subtitle segment; and a subtitle inserting module configured to insert the subtitle segment, based on the time data, at a position in the frame sequence corresponding to the time data.
According to an example implementation of the present disclosure, the generating module 840 further includes: a background module configured to add background sound to the audio content of the video.
According to an example implementation of the present disclosure, the generating module 840 further includes: an animation module configured to generate the frame sequence based on an animation effect of the picture.
According to an example implementation of the present disclosure, the apparatus further includes: a video length determining module configured to determine a predetermined time length associated with the video.
According to an example implementation of the present disclosure, the generating module 840 further includes: an identifying module configured to identify text and pictures in the selected at least one article based on the predetermined time length; and a video generating module configured to generate the video based on the identified text and pictures.
Fig. 9 shows a block diagram of a computing device 900 capable of implementing multiple implementations of the present disclosure. The device 900 may be used to implement the methods described with reference to Fig. 3 and Fig. 6. As shown, the device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 may also store various programs and data required for the operation of the device 900. The CPU 901, the ROM 902, and the RAM 903 are connected to one another through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Multiple components in the device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard or a mouse; an output unit 907, such as various types of displays and loudspeakers; a storage unit 908, such as a magnetic disk or an optical disc; and a communication unit 909, such as a network card, a modem, or a wireless communication transceiver. The communication unit 909 allows the device 900 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunication networks.
The processing unit 901 executes the respective methods and processes described above, such as the methods 300 and 600. For example, in some implementations, the methods 300 and 600 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 908. In some implementations, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the CPU 901, one or more steps of the methods described above may be executed. Alternatively, in other implementations, the CPU 901 may be configured to execute the methods 300 and 600 in any other appropriate manner (for example, by means of firmware).
According to an example implementation of the present disclosure, a computer-readable storage medium having a computer program stored thereon is provided. When the program is executed by a processor, the methods described in the present disclosure are implemented.
The functions described herein may be executed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are carried out. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (22)
1. A method for generating a video, comprising:
obtaining a group of articles, associated with a video subject, that include text and pictures;
determining a score for the group of articles based on a knowledge model;
selecting at least one article from the group of articles based on the score; and
generating the video based on the text and pictures in the selected at least one article.
2. The method according to claim 1, wherein determining the score for the group of articles based on the knowledge model comprises determining a score for at least any one of the following:
a quality score, a sentiment score, and a harmful-content score.
3. The method according to claim 1, wherein generating the video based on the text and pictures in the selected at least one article comprises: for a given article in the at least one article,
selecting a picture in the given article, so as to generate a frame sequence in the video based on the selected picture; and
determining text associated with the picture, so as to generate audio content in the video based on the determined text.
4. The method according to claim 3, wherein determining the text associated with the picture comprises determining the text associated with the picture based on at least any one of the following:
a positional relationship between the picture and the text in the given article; and
a citation relationship between the picture and the text in the given article.
5. The method according to claim 3, wherein generating the audio content in the video based on the determined text comprises:
converting the text into speech to serve as the audio content.
6. The method according to claim 5, wherein generating the frame sequence in the video based on the selected picture comprises:
determining a time length of the speech; and
generating, based on the picture, the frame sequence having the time length.
7. The method according to claim 6, further comprising:
determining, based on a resolution of the video, a subtitle length of subtitles associated with the frame sequence;
dividing the text into at least one subtitle segment based on the subtitle length;
determining, based on the time length, time data for a subtitle segment in the at least one subtitle segment; and
inserting the subtitle segment, based on the time data, at a position in the frame sequence corresponding to the time data.
8. The method according to claim 5, wherein generating the audio content in the video based on the text further comprises:
adding background sound to the audio content of the video.
9. The method according to claim 5, wherein generating the frame sequence having the time length based on the picture further comprises:
generating the frame sequence based on an animation effect of the picture.
10. The method according to claim 1, further comprising: determining a predetermined time length associated with the video;
and wherein generating the video based on the text and pictures in the selected at least one article further comprises:
identifying text and pictures in the selected at least one article based on the predetermined time length; and
generating the video based on the identified text and pictures.
11. An apparatus for generating a video, comprising:
an obtaining module configured to obtain a group of articles, associated with a video subject, that include text and pictures;
a determining module configured to determine a score for the group of articles based on a knowledge model;
a selecting module configured to select at least one article from the group of articles based on the score; and
a generating module configured to generate the video based on the text and pictures in the selected at least one article.
12. The apparatus according to claim 11, wherein the determining module includes: a scoring module configured to determine a score for at least any one of the following: a quality score, a sentiment score, and a harmful-content score.
13. The apparatus according to claim 11, wherein the generating module includes:
a picture determining module configured to, for a given article in the at least one article, select a picture in the given article, so as to generate a frame sequence in the video based on the selected picture; and
a text determining module configured to determine text associated with the picture, so as to generate audio content in the video based on the determined text.
14. The apparatus according to claim 13, wherein the text determining module includes:
a first determining module configured to determine the text associated with the picture based on a positional relationship between the picture and the text in the given article; and
a second determining module configured to determine the text associated with the picture based on a citation relationship between the picture and the text in the given article.
15. The apparatus according to claim 13, wherein the generating module further includes:
a speech generating module configured to convert the text into speech to serve as the audio content.
16. The apparatus according to claim 15, wherein the generating module further includes:
a time module configured to determine a time length of the speech; and
a frame generating module configured to generate, based on the picture, the frame sequence having the time length.
17. The apparatus according to claim 16, further comprising:
a subtitle length determining module configured to determine, based on a resolution of the video, a subtitle length of subtitles associated with the frame sequence;
a subtitle dividing module configured to divide the text into at least one subtitle segment based on the subtitle length;
a time information determining module configured to determine, based on the time length, time data for a subtitle segment in the at least one subtitle segment; and
a subtitle inserting module configured to insert the subtitle segment, based on the time data, at a position in the frame sequence corresponding to the time data.
18. The apparatus according to claim 15, wherein the generating module further includes:
a background module configured to add background sound to the audio content of the video.
19. The apparatus according to claim 15, wherein the generating module further includes:
an animation module configured to generate the frame sequence based on an animation effect of the picture.
20. The apparatus according to claim 11, further comprising:
a video length determining module configured to determine a predetermined time length associated with the video;
and wherein the generating module further includes:
an identifying module configured to identify text and pictures in the selected at least one article based on the predetermined time length; and
a video generating module configured to generate the video based on the identified text and pictures.
21. A device for generating a video, the device comprising:
one or more processors; and
a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 10.
22. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811489221.4A CN109614537A (en) | 2018-12-06 | 2018-12-06 | For generating the method, apparatus, equipment and storage medium of video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109614537A true CN109614537A (en) | 2019-04-12 |
Family
ID=66006746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811489221.4A Pending CN109614537A (en) | 2018-12-06 | 2018-12-06 | For generating the method, apparatus, equipment and storage medium of video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109614537A (en) |
2018-12-06: CN application CN201811489221.4A filed; published as CN109614537A (en); status: active, Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103650002A (en) * | 2011-05-06 | 2014-03-19 | 西尔股份有限公司 | Video generation based on text |
US20180277166A1 (en) * | 2013-02-22 | 2018-09-27 | Fuji Xerox Co., Ltd. | Systems and methods for creating and using navigable spatial overviews for video |
CN103559214A (en) * | 2013-10-11 | 2014-02-05 | 中国农业大学 | Method and device for automatically generating video |
CN108090111A (en) * | 2016-11-23 | 2018-05-29 | 谷歌有限责任公司 | It is taken passages for the animation of search result |
CN108509457A (en) * | 2017-02-28 | 2018-09-07 | 阿里巴巴集团控股有限公司 | A kind of recommendation method and apparatus of video data |
CN107193805A (en) * | 2017-06-06 | 2017-09-22 | 北京百度网讯科技有限公司 | Article Valuation Method, device and storage medium based on artificial intelligence |
CN107943839A (en) * | 2017-10-30 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and storage medium based on picture and word generation video |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112153418A (en) * | 2019-06-26 | 2020-12-29 | 阿里巴巴集团控股有限公司 | Streaming media generation method and device, terminal and server |
CN111177542A (en) * | 2019-12-20 | 2020-05-19 | 贝壳技术有限公司 | Introduction information generation method and device, electronic equipment and storage medium |
CN111177542B (en) * | 2019-12-20 | 2021-07-20 | 贝壳找房(北京)科技有限公司 | Introduction information generation method and device, electronic equipment and storage medium |
CN113132781A (en) * | 2019-12-31 | 2021-07-16 | 阿里巴巴集团控股有限公司 | Video generation method and apparatus, electronic device, and computer-readable storage medium |
CN111538851A (en) * | 2020-04-16 | 2020-08-14 | 北京捷通华声科技股份有限公司 | Method, system, device and storage medium for automatically generating demonstration video |
CN111538851B (en) * | 2020-04-16 | 2023-09-12 | 北京捷通华声科技股份有限公司 | Method, system, equipment and storage medium for automatically generating demonstration video |
CN112565875A (en) * | 2020-11-30 | 2021-03-26 | 北京百度网讯科技有限公司 | Method, device, equipment and computer readable storage medium for automatically generating video |
CN113423010A (en) * | 2021-06-22 | 2021-09-21 | 深圳市大头兄弟科技有限公司 | Video conversion method, device and equipment based on document and storage medium |
CN113438543A (en) * | 2021-06-22 | 2021-09-24 | 深圳市大头兄弟科技有限公司 | Matching method, device and equipment for converting document into video and storage medium |
CN113497899A (en) * | 2021-06-22 | 2021-10-12 | 深圳市大头兄弟科技有限公司 | Character and picture matching method, device and equipment and storage medium |
CN113438543B (en) * | 2021-06-22 | 2023-02-03 | 深圳市大头兄弟科技有限公司 | Matching method, device and equipment for converting document into video and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||