
WO2023142917A1 - Video generation method, apparatus, device, medium and product (一种视频生成方法、装置、设备、介质及产品) - Google Patents

Video generation method, apparatus, device, medium and product Download PDF

Info

Publication number
WO2023142917A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
candidate
target
description information
videos
Prior art date
Application number
PCT/CN2023/070291
Other languages
English (en)
French (fr)
Inventor
周文
张帆
李亚
杜臣
谢荣昌
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2023142917A1 publication Critical patent/WO2023142917A1/zh
Priority to US18/393,567 priority Critical patent/US12160643B2/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/475End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a video generation method, device, equipment, computer-readable storage medium and computer program product.
  • APP application
  • video application: a user can publish a video based on the video application, and can also watch videos posted by other users based on the video application.
  • the purpose of the present disclosure is to provide a video generation method, device, equipment, computer-readable storage medium and computer program product, which can automatically create videos, simplify user operations, and reduce the difficulty of video creation.
  • the present disclosure provides a video generation method, the method comprising:
  • the target segment corresponding to the description information is obtained from a plurality of video segments in the video material library, and the plurality of video segments have corresponding description information;
  • the target video is generated according to the score of each candidate video in the plurality of candidate videos.
  • the present disclosure provides a video generation device, including:
  • the obtaining module is configured to obtain the description information of the target video to be generated; obtain, according to the description information of the target video, the target segment corresponding to the description information from a plurality of video segments in the video material library, the plurality of video segments having corresponding description information; and determine a plurality of candidate segments based on the target segment and the plurality of video segments;
  • a splicing module configured to splice the target segment with each candidate segment in the plurality of candidate segments to obtain multiple candidate videos
  • An evaluation module configured to evaluate the score of each candidate video in the plurality of candidate videos
  • a generating module configured to generate the target video according to the score of each candidate video in the plurality of candidate videos.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of any one of the methods described in the first aspect of the present disclosure are implemented.
  • an electronic device including:
  • a processing device configured to execute the computer program in the storage device to implement the steps of any one of the methods in the first aspect of the present disclosure.
  • the present disclosure provides a computer program product including instructions, which, when run on a device, cause the device to execute the method described in any one of the above first aspects.
  • the present disclosure has the following advantages:
  • the present disclosure provides a video generation method.
  • the method first obtains the description information of the target video to be generated, then obtains the target segment corresponding to the description information from the video material library based on that description information, and splices the target segment with each of the multiple candidate segments in the video material library to obtain multiple candidate videos.
  • the multiple candidate videos are then evaluated to obtain the score of each candidate video, and the target video is obtained based on the scores.
  • the top 3 candidate videos may be used as target videos. It can be seen that in this method, the target video can be generated without the user performing complex operations such as editing and stitching the original video, which simplifies the user's operation and reduces the difficulty of video creation.
  • FIG. 1 is a schematic diagram of a video generation system provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a video creation interface provided by an embodiment of the present disclosure
  • FIG. 3 is a flow chart of a video generation method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of processing historical video provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a Monte Carlo tree provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a video generation device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • first and second in the embodiments of the present disclosure are used for description purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Thus, a feature defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • the video application can provide users with functions such as posting videos and playing videos.
  • a high-quality video such as a video with a large number of playbacks and a high completion rate
  • the user needs to first shoot the original video, and then edit, splice, dub, etc. the original video to finally obtain the above-mentioned complete video.
  • an embodiment of the present disclosure provides a method for generating a video, the method comprising: obtaining the description information of the target video to be generated; obtaining, according to the description information of the target video, the target segment corresponding to the description information from the video material library; splicing the target segment with each of the multiple candidate segments in the video material library to obtain multiple candidate videos; evaluating the score of each candidate video in the multiple candidate videos; and finally generating the target video based on the score of each candidate video in the multiple candidate videos.
  • the top 3 candidate videos may be used as target videos.
  • the target video can be generated without the user performing complex operations such as editing and stitching the original video, which simplifies the user's operation and reduces the difficulty of video creation.
  • the higher the score, the higher the playback volume and completion rate of the candidate video, which meets users' needs and increases their enthusiasm for creating videos.
  • the method can be applied to video generation systems.
  • the method is specifically implemented in the form of a computer program.
  • the computer program may be independent, for example, it may be an independent application with corresponding functions.
  • the computer program may be a functional module or a plug-in, etc., attached to an existing application to run.
  • the video generation system can automatically generate the target video based on the description information of the target video.
  • the descriptive information of the target video may be the descriptive information input by the user, or may be generated randomly or based on historical descriptive information (for example, the user does not input the descriptive information of the target video).
  • the video generation method provided by the embodiments of the present disclosure can be executed by the terminal alone, by the server alone, or by the terminal and the server in cooperation.
  • when the video generation method is executed by the terminal alone, it indicates that the video generation system can run offline.
  • the video generation method is executed cooperatively by a terminal and a server as an example for illustration below.
  • the video generation system 100 includes a terminal 110 , a terminal 120 , a terminal 130 and a server 140 .
  • the terminal 110, the terminal 120, and the server 140 are connected through a network.
  • terminal 110 may be a terminal of a video creator (such as a target user)
  • terminal 120 may be a terminal of a small number of video viewers (such as a first user range)
  • terminal 130 may be a terminal of a large number of video viewers (such as a second user range)
  • Terminals include but are not limited to smartphones, tablet computers, notebook computers, personal digital assistants (personal digital assistant, PDA) or smart wearable devices.
  • the server 140 may be a cloud server, for example, a central server in a central cloud computing cluster, or an edge server in an edge cloud computing cluster. Certainly, the server 140 may also be a server in a local data center.
  • An on-premises data center refers to a data center directly controlled by the user.
  • the terminal 110 may present a human-computer interaction interface to the target user, so that the target user can create the target video.
  • FIG. 2 which is a schematic diagram of a video creation interface provided by an embodiment of the present disclosure, the video creation interface includes: a description information input area 201 and a creation control 202 .
  • the description information input area 201 is used to input the description information of the target video to be generated, and the creation control 202 is used to automatically generate the target video.
  • the target user can input the description information of the target video in the description information input area 201 and then click the creation control 202; based on the target user's click operation on the creation control 202, the terminal 110 can send the description information input by the target user to the server 140.
  • the terminal 110 may carry the description information input by the target user in the video generation request. In this way, the server 140 can acquire the description information of the target video to be generated.
  • the target user may also not input any description information in the description information input area 201 and instead directly click the creation control 202; based on the click operation, the terminal 110 detects whether any description information has been entered in the description information input area 201.
  • when it exists, the terminal 110 can send the entered description information to the server 140; when it does not exist, the terminal 110 can randomly generate the description information or generate it based on historical description information, and then send it to the server 140.
  • the terminal 110 may also send a video generation request that does not carry description information to the server; after the server 140 receives such a request, it generates the description information randomly or based on historical description information, so that the server 140 can obtain the description information of the target video to be generated.
  • after the server 140 obtains the description information of the target video, it can obtain the target segment corresponding to the description information from the video material library, and then splice the target segment with each of the multiple candidate segments in the video material library to obtain multiple candidate videos.
  • finally, the multiple candidate videos are scored, and a target video with a higher score is generated based on the scores. For example, one target video may be generated (the video with the highest score), or multiple target videos may be generated (the videos with the top N scores, where N is the number of target videos).
  • video creation through the video generation system can greatly simplify manual operations, reduce the difficulty of video creation, meet users' needs, further ensure the completion rate and playback volume of the target video, and increase users' enthusiasm for creating videos.
  • the following describes the video generation method provided by the embodiment of the present disclosure from the perspectives of the terminal 110 , the terminal 120 , the terminal 130 and the server 140 .
  • this figure is a video generation method provided by an embodiment of the present disclosure, the method includes:
  • the terminal 110 generates a video generation request based on an operation of a target user.
  • the terminal 110 sends a video generation request to the server 140.
  • the terminal 110 may present a video creation interface to the target user. As shown in FIG. 2 , the target user clicks the creation control 202 of the video creation interface, and then the terminal 110 may obtain the operation of the target user. Based on this operation, a video generation request is then generated. Wherein, when there is descriptive information input by the target user in the descriptive information input area 201, the terminal 110 may include the descriptive information input by the target user in the video generation request.
  • the descriptive information input by the target user in the descriptive information input area 201 may be script, topic, film title, action and so on.
  • the script includes characters and lines (for example, what the characters say), so that the server 140 can retrieve video clips corresponding to the script from the video material library and then splice them to obtain the target video.
  • topics include economy, science and technology, etc.; the server 140 can retrieve video clips corresponding to the topic (for example, videos of ships, ports, and robots) from the video material library and then splice them to obtain the target video.
  • the film title may be the title of a film or TV drama; the server 140 may extract highlight segments from the film or TV drama and then splice them to obtain the target video.
  • the action may be a classic action of a character in a film or TV drama.
  • the server 140 may search the video material library for video clips corresponding to the action and then splice them to obtain the target video. Furthermore, the server 140 may also add a target-user video uploaded by the target user during the splicing of the video clips, which will be described in detail later.
  • the terminal 110 may randomly generate descriptive information or generate descriptive information based on historical descriptive information, and then carry the generated descriptive information in the video generation request .
  • the terminal 110 may not generate description information, and after the server 140 parses the video generation request, the server 140 randomly generates description information or generates description information based on historical description information.
  • the historical description information may be description information historically input by the target user, and generating description information based on the historical description information may be generating description information with a high degree of similarity to the historical description information.
  • the present disclosure does not limit this, and those skilled in the art can choose an appropriate way to generate description information according to actual needs.
  • the server 140 obtains description information of the target video to be generated according to the video generation request.
  • the server 140 may analyze the video generation request.
  • when the terminal 110 carries the description information of the target video in the video generation request, the server 140 can obtain the description information of the target video to be generated from the request; when the terminal 110 does not carry the description information of the target video in the video generation request, the server 140 may generate the description information randomly or based on historical description information. In this way, the server 140 can obtain the description information of the target video to be generated.
  • the server 140 acquires the target segment corresponding to the description information from the video material library.
  • the video material library can be obtained based on historical videos, and the historical videos can be all types of videos, such as entertainment videos, game videos, beauty makeup videos, teaching videos, commentary videos, and so on.
  • this figure is a schematic diagram of processing historical videos provided by the present disclosure.
  • historical video can be semantically storyboarded.
  • ASR automatic speech recognition
  • speech recognition can be performed on the historical video to obtain the text corresponding to the historical video.
  • the text includes multiple sentences and the punctuation between sentences (for example, commas, full stops, and semicolons); the timestamp on the video corresponding to a punctuation mark can be used as a split point. Once the split points are determined, the timestamps are determined, and the video is then divided based on the timestamps to obtain multiple video segments.
  • the timestamps corresponding to full stops can also be used as the split points; segmenting the historical video at the split points corresponding to full stops further ensures the semantic coherence of each resulting video segment, avoiding the problem of individual video clips of poor quality.
  • the above segment features may be text features, image features, or fusion features obtained by fusing text features and image features.
  • text features can be BERT (Bidirectional Encoder Representations from Transformers) features
  • image features can be computer vision (CV) features.
  • CV computer vision
  • the above segment features can be represented by embeddings.
  • action recognition can be performed on each video segment, and then the description information of each video segment can be obtained, such as clapping, smiling, and running.
  • face recognition may be performed on each video segment, and then the description information of each video segment, such as person 1, may be obtained.
  • scene recognition may also be performed on each video segment, and then description information of each video segment, such as geographical location and scenery, may be obtained.
  • S403 may be executed first, and then S402 may be executed, or S402 and S403 may be executed simultaneously.
  • for each video segment, there is an identifier of the cluster center corresponding to the video segment as well as description information.
  • the description information is used by the target user to search for video clips through the terminal 110, and the identifier of the cluster center is used by the server 140 to retrieve video clips from the video material library.
  • the video clips in the video material library can also be scored, and then the videos in the video material library can be sorted based on the score, for example, video clips with higher scores are sorted first.
  • the scoring may be based on the click-through rate of the video segments.
  • after obtaining the description information of the target video to be generated, the target segment corresponding to the description information may be obtained from the video material library.
  • for example, the identifier of the cluster center corresponding to the target video is obtained based on the description information, and then, based on that identifier, the video segment corresponding to the identifier is retrieved from the video material library as the above-mentioned target segment.
  • the server 140 splices the target segment with each of the candidate segments in the video material library to obtain multiple candidate videos.
  • the server 140 may obtain multiple candidate videos based on a Monte Carlo tree search manner. For example, the video material library is searched with the target segment as the root node, and each node corresponds to a video segment in the video material library.
  • the search process of Monte Carlo tree includes: selection (selection), node expansion (node expansion) and backpropagation (backpropagation), which are introduced respectively below.
  • the nodes to be searched can be selected through the upper confidence bound (UCB) algorithm. Specifically, the node to be searched can be determined by the following formula:
  • c 1 , c 2 and c 3 respectively control different prior weights
  • a and b represent the child nodes of node s
  • a_k indicates that node a is the optimal child node of node s
  • k represents the layer of the current tree
  • N(s, a) represents the number of visits to node a
  • N(s, b) represents the number of visits to node b
  • R(s, a) represents the first evaluation value of node a, for example, it can be characterized by CTR
  • C(s, a) represents the second evaluation value of node a, for example, can be characterized by a semantic coherence score
  • Q(s, a) represents the average value of the first evaluation value and the second evaluation value of node a.
  • node expansion If the above-mentioned node to be searched is a leaf node, perform node expansion, that is, add a new node under the leaf node.
  • the node expansion of the root node may be splicing the target segment with multiple candidate segments in the video material library, and the multiple sub-nodes obtained after the root node is expanded may be multiple candidate videos.
  • this figure is a schematic diagram of a Monte Carlo tree provided by an embodiment of the present disclosure.
  • node 1 is the root node, and this node 1 corresponds to the target segment.
  • Node 2 and node 3 are new nodes obtained by node expansion of node 1.
  • node 4 and node 5 are new nodes obtained by node expansion of node 2.
  • each node corresponds to a video clip; as shown in Figure 5, node 1 - node 2 - node 4 can correspond to one candidate video, node 1 - node 2 - node 5 can correspond to another candidate video, and node 1 - node 3 can also correspond to a candidate video.
  • the target user may upload a video of the target user, for example, it may be a video taken by the target user, or a video stored in the terminal 110 . Then the server 140 splices the video of the target user with the plurality of candidate videos obtained above to obtain spliced candidate videos.
  • the video of the target user may be spliced to the start position of the candidate video, or may be spliced to the middle position of the candidate video, or may be spliced to the end position of the candidate video.
  • the server 140 evaluates the score of each candidate video in the plurality of candidate videos.
  • a candidate video is obtained by splicing multiple video clips, and each video clip has an identification of its corresponding cluster center. Based on this, each candidate video can be represented by an identification sequence.
  • each candidate video can be scored by a pre-trained evaluation model; for example, the identifier sequence of each candidate video is input into the evaluation model, and the evaluation model outputs the score of each candidate video.
  • the scoring process may be based on the semantic coherence, CTR, etc. of the candidate videos.
  • the spliced candidate videos may also be scored.
  • the server 140 generates a target video based on the score of each candidate video in the plurality of candidate videos.
  • a candidate video with the highest score may be selected from multiple candidate videos as the target video, or multiple candidate videos with top N scores may be selected from the multiple candidate videos as the target video.
  • the embodiment of the present disclosure does not limit this, and those skilled in the art may choose to generate one target video or multiple target videos according to actual needs. After the target video is obtained, other candidate videos among the plurality of candidate videos are removed.
  • the server 140 delivers the target video to the terminal 120.
  • the terminal 120 sends the behavior information for the target video to the server 140.
  • the terminal 120 is a terminal in a first user range, and the number of users in the first user range is less than a preset number, that is, the number of users in the first user range is relatively small.
  • the server 140 first releases the target video to the range of the first user.
  • the behavior information includes the completion rate and/or the playback duration.
  • the behavior information can also be characterized by like behavior, repost behavior, viewing information, dislike behavior, and so on.
  • the server 140 judges whether the behavior information meets the preset condition; if yes, execute S312; if not, execute S313.
  • the behavior information meeting the preset condition may mean that the completion rate is greater than the first preset threshold and/or the playback duration is greater than the second preset threshold. Thus, when the completion rate is greater than the first preset threshold, it indicates that the feedback of users within the first user range on the target video is good; similarly, when the playback duration is greater than the second preset threshold, it indicates that the feedback of users within the first user range on the target video is good.
  • the server 140 delivers the target video to the terminal 130.
  • the terminal 130 is a terminal in the second user range, and the number of users in the second user range is higher than the preset number, that is, the number of users in the second user range is relatively large. Moreover, the number of users within the second user range is greater than the number within the first user range.
  • the behavior information of the users within the first user range on the target video meets the preset condition, it indicates that the feedback from the users within the first user range on the target video is good feedback, that is, the effect of small traffic delivery is better. Then, the target video is delivered within the scope of the second user, that is, a large traffic delivery is performed, so that a better delivery effect can be achieved and waste of resources required for video delivery can be reduced.
  • the server 140 stops delivering the target video.
  • the behavior information of the users within the first user range on the target video does not meet the preset conditions, it indicates that the feedback of the users within the first user range on the target video is poor feedback, that is, the effect of small traffic delivery is poor. Then, large traffic delivery is no longer performed, thereby reducing the waste of resources required for video delivery.
  • the server 140 may also receive adjustment information of the target video from the target user, so as to adjust the target video according to the adjustment information, and then deliver small traffic to the target video again. If the effect of small traffic delivery is better, large traffic delivery will be performed; if the effect of small traffic delivery is poor, large traffic delivery will not be performed.
  • an embodiment of the present disclosure provides a video generation method.
  • the method first obtains the description information of the target video to be generated, and then obtains the target segment corresponding to the description information from the video material library based on the description information of the target video.
  • the target segment is spliced with each of the multiple candidate segments in the video material library to obtain multiple candidate videos.
  • evaluate the plurality of candidate videos respectively obtain the score of each candidate video in the plurality of candidate videos, and obtain the target video based on the scores.
  • the top 3 candidate videos may be used as target videos. It can be seen that in this method, the target video can be generated without the user performing complex operations such as editing and stitching the original video, which simplifies the user's operation and reduces the difficulty of video creation.
  • Fig. 6 is a schematic diagram of a video generation device according to an exemplary disclosed embodiment. As shown in Fig. 6, the video generation device 600 includes:
  • Obtaining module 601 for obtaining the descriptive information of the target video to be generated; according to the descriptive information of the target video, obtain the target segment corresponding to the descriptive information from the video material library;
  • a splicing module 602 configured to splice the target segment with each candidate segment in a plurality of candidate segments in the video material library to obtain a plurality of candidate videos;
  • An evaluation module 603, configured to evaluate the score of each candidate video in the plurality of candidate videos
  • a generation module 604 configured to generate the target video according to the score of each candidate video in the plurality of candidate videos.
  • the evaluation module 603 is specifically configured to evaluate the score of each candidate video in the plurality of candidate videos based on the semantic continuity of adjacent segments of each candidate video and/or the click-through rate of each segment.
  • the obtaining module 601 is specifically configured to obtain description information input by the target user for the target video to be generated.
  • the description information is randomly generated or generated based on historical description information.
  • the acquiring module 601 is also configured to acquire the target-user video of the target user;
  • the splicing module 602 is further configured to splice the target user video with the plurality of candidate videos to obtain spliced candidate videos;
  • the evaluation module 603 is specifically configured to evaluate the scores of the spliced candidate videos
  • the generating module 604 is specifically configured to generate the target video according to the scores of the spliced candidate videos.
  • the device also includes a delivery module
  • the delivery module is used to deliver the target video within the scope of the first user
  • the acquiring module 601 is further configured to acquire behavior information of users within the range of the first user on the target video;
  • the delivery module is also used to deliver the target video within the scope of the second user when the behavior information meets the preset condition;
  • the users in the first user range are less than the users in the second user range.
  • the behavior information includes a completion rate and/or a playback duration
  • the preset condition includes that the completion rate is greater than a first preset threshold and/or the playback duration is greater than a second preset threshold.
  • the acquiring module 601 is further configured to receive adjustment information of the target video from the target user; and adjust the target video according to the adjustment information.
  • FIG. 7 shows a schematic structural diagram of an electronic device 700 suitable for implementing an embodiment of the present disclosure.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device 700 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 701, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700.
  • the processing device 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704 .
  • the following devices can be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 708 including, for example, a magnetic tape and a hard disk; and a communication device 709.
  • the communication means 709 may allow the electronic device 700 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 7 shows electronic device 700 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 709, or from storage means 708, or from ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the terminal and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • HTTP HyperText Transfer Protocol
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any network currently known or developed in the future.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: obtains the description information of the target video to be generated; according to the description information of the target video , obtaining the target segment corresponding to the description information from the video material library; splicing the target segment with each of the multiple candidate segments in the video material library to obtain multiple candidate videos; evaluating the The score of each candidate video in the plurality of candidate videos; and generating the target video according to the score of each candidate video in the plurality of candidate videos.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages, such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider: for example, used to connect to an external computer via the Internet.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation of the module itself under certain circumstances, for example, the first obtaining module may also be described as "a module for obtaining at least two Internet Protocol addresses".
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logical device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • Example 1 provides a video generation method: obtain the description information of the target video to be generated; according to the description information of the target video, obtain the target segment corresponding to the description information from the video material library; splice the target segment with each of multiple candidate segments in the video material library to obtain multiple candidate videos; evaluate the score of each candidate video in the multiple candidate videos; and generate the target video according to the score of each candidate video in the multiple candidate videos.
  • example 2 provides the method of example 1, and the evaluation of the score of each candidate video in the plurality of candidate videos includes:
  • Example 3 provides the method of Example 1 or 2, and the acquisition of the description information of the target video to be generated includes:
  • Example 4 provides the method of Example 1 or 2, and the description information is randomly generated or generated based on historical description information.
  • Example 5 provides the method of Examples 1-4, the method further comprising:
  • the evaluation of the scoring of each candidate video in the plurality of candidate videos includes:
  • the generating the target video according to the scoring of each candidate video in the plurality of candidate videos includes:
  • the target video is generated according to the scores of the spliced candidate videos.
  • Example 6 provides the methods of Examples 1 to 5, the method further comprising:
  • the users in the first user range are less than the users in the second user range.
  • Example 7 provides the method of Example 6, the behavior information includes the completion rate and/or the playback duration, and the preset condition includes that the completion rate is greater than the first preset The threshold and/or the playing duration are greater than a second preset threshold.
  • Example 8 provides the method of Examples 1-7, the method further comprising:

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure provides a video generation method, apparatus, device, medium and product, relating to the field of computer technology. The method includes: obtaining description information of a target video to be generated; obtaining, according to the description information of the target video, a target segment corresponding to the description information from a video material library; splicing the target segment with each of a plurality of candidate segments in the video material library to obtain a plurality of candidate videos; evaluating a score of each of the plurality of candidate videos; and generating the target video according to the score of each of the plurality of candidate videos. It can be seen that the method can create videos automatically, thereby simplifying user operations and reducing the difficulty of video creation.

Description

Video generation method, apparatus, device, medium and product
This application claims priority to Chinese Patent Application No. 202210111702.1, entitled "Video generation method, apparatus, device, medium and product" and filed on January 29, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a video generation method, apparatus, device, computer-readable storage medium and computer program product.
Background
With the continuous development of computer technology, and of mobile Internet technology in particular, a wide variety of applications (APPs) have emerged. Taking a video application as an example, a user can publish videos based on the video application and can also watch videos published by other users.
When creating a video, a user first needs to shoot raw footage, and only after editing, splicing and dubbing the raw footage is the complete video finally obtained. It can be seen that this creation process requires tedious user operations, making video creation rather difficult.
Summary
The purpose of the present disclosure is to provide a video generation method, apparatus, device, computer-readable storage medium and computer program product that can create videos automatically, simplifying user operations and reducing the difficulty of video creation.
In a first aspect, the present disclosure provides a video generation method, the method comprising:
obtaining description information of a target video to be generated;
obtaining, according to the description information of the target video, a target segment corresponding to the description information from a plurality of video segments in a video material library, the plurality of video segments having corresponding description information;
determining a plurality of candidate segments based on the target segment and the plurality of video segments;
splicing the target segment with each of the plurality of candidate segments to obtain a plurality of candidate videos;
evaluating a score of each of the plurality of candidate videos;
generating the target video according to the score of each of the plurality of candidate videos.
In a second aspect, the present disclosure provides a video generation apparatus, comprising:
an obtaining module configured to obtain description information of a target video to be generated; obtain, according to the description information of the target video, a target segment corresponding to the description information from a plurality of video segments in a video material library, the plurality of video segments having corresponding description information; and determine a plurality of candidate segments based on the target segment and the plurality of video segments;
a splicing module configured to splice the target segment with each of the plurality of candidate segments to obtain a plurality of candidate videos;
an evaluation module configured to evaluate a score of each of the plurality of candidate videos;
a generation module configured to generate the target video according to the score of each of the plurality of candidate videos.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, and the program, when executed by a processing device, implements the steps of any of the methods of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device on which a computer program is stored;
a processing device configured to execute the computer program in the storage device to implement the steps of any of the methods of the first aspect of the present disclosure.
In a fifth aspect, the present disclosure provides a computer program product containing instructions which, when run on a device, cause the device to execute the method of any of the implementations of the first aspect.
It can be seen from the above technical solutions that the present disclosure has the following advantages:
The present disclosure provides a video generation method. The method first obtains the description information of the target video to be generated, then obtains, based on that description information, the target segment corresponding to the description information from a video material library, and splices the target segment with each of a plurality of candidate segments in the video material library to obtain a plurality of candidate videos. The plurality of candidate videos are then evaluated to obtain the score of each candidate video, and the target video is obtained based on the scores. For example, the top three candidate videos by score may be used as target videos. It can be seen that in this method the target video can be generated without the user performing complex operations such as editing and splicing raw footage, which simplifies user operations and reduces the difficulty of video creation.
Other features and advantages of the present disclosure will be described in detail in the following detailed description.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a video generation system provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a video creation interface provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of a video generation method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of processing historical videos provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a Monte Carlo tree provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a video generation apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
The terms "first" and "second" in the embodiments of the present disclosure are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature.
Some technical terms involved in the embodiments of the present disclosure are introduced first.
A video application, as one type of entertainment application, can provide users with functions such as publishing and playing videos. To produce a high-quality video (for example, a video with a high playback volume and a high completion rate), a user first needs to shoot raw footage and then edit, splice and dub it, finally obtaining the complete video described above.
It can be seen that this creation process requires tedious user operations, and video creation is difficult. Moreover, the video obtained after the user manually edits the raw footage carries considerable uncertainty: for example, it may fail to meet users' needs, resulting in a low playback volume and a low completion rate, which in turn discourages the user from creating videos again.
In view of this, an embodiment of the present disclosure provides a video generation method, comprising: obtaining the description information of a target video to be generated; obtaining, according to the description information of the target video, the target segment corresponding to the description information from a video material library; splicing the target segment with each of a plurality of candidate segments in the video material library to obtain a plurality of candidate videos; evaluating the score of each of the plurality of candidate videos; and finally generating the target video based on the score of each of the plurality of candidate videos. For example, the top three candidate videos by score may be used as target videos.
It can be seen that in this method the target video can be generated without the user performing complex operations such as editing and splicing raw footage, which simplifies user operations and reduces the difficulty of video creation. Further, a higher score indicates a higher playback volume and a higher completion rate for the candidate video, which meets users' needs and increases their enthusiasm for creating videos.
The method can be applied to a video generation system, where it is implemented specifically in the form of a computer program. In some embodiments, the computer program may be independent, for example a standalone application with the corresponding functions. In other embodiments, the computer program may be a functional module, a plug-in or the like that runs attached to an existing application.
For example, taking a video generation system attached to a short-video application (one type of video application) as an example, the system can automatically generate the target video based on the description information of the target video. The description information may be input by the user, or be generated randomly or based on historical description information (for example, when the user does not input the description information of the target video). This simplifies the operations a user performs to create a video, reduces the difficulty of creation, and increases users' enthusiasm for creating videos.
The video generation method provided by the embodiments of the present disclosure may be executed by a terminal alone, by a server alone, or by a terminal and a server in cooperation. When the method is executed by a terminal alone, the video generation system can run offline. For ease of understanding, the following takes cooperative execution by a terminal and a server as an example.
To make the technical solutions of the present disclosure clearer and easier to understand, the architecture of the video generation system provided by the embodiments of the present disclosure is introduced below with reference to the drawings.
Referring to the system architecture diagram of the video generation system 100 shown in FIG. 1, the video generation system 100 includes a terminal 110, a terminal 120, a terminal 130 and a server 140. The terminal 110, the terminal 120 and the server 140 are connected through a network. The terminal 110 may be the terminal of a video creator (such as a target user), the terminal 120 may be a terminal of a small number of video viewers (such as a first user range), and the terminal 130 may be a terminal of a large number of video viewers (such as a second user range). Terminals include but are not limited to smartphones, tablet computers, notebook computers, personal digital assistants (PDAs) and smart wearable devices. The server 140 may be a cloud server, for example a central server in a central cloud computing cluster or an edge server in an edge cloud computing cluster; it may also be a server in an on-premises data center, i.e., a data center directly controlled by the user.
In some examples, the terminal 110 may present a human-computer interaction interface to the target user so that the target user can create the target video. As shown in FIG. 2, a schematic diagram of a video creation interface provided by an embodiment of the present disclosure, the video creation interface includes a description information input area 201 and a creation control 202. The description information input area 201 is used to input the description information of the target video to be generated, and the creation control 202 is used to trigger automatic generation of the target video.
In some embodiments, the target user can input the description information of the target video in the description information input area 201 and then click the creation control 202; based on the click operation, the terminal 110 sends the description information input by the target user to the server 140, for example by carrying it in a video generation request. In this way, the server 140 can obtain the description information of the target video to be generated.
In other embodiments, the target user may click the creation control 202 directly without entering anything in the description information input area 201. Based on the click operation, the terminal 110 detects whether any description information has been entered in the input area 201: if so, the terminal 110 sends the entered description information to the server 140; if not, the terminal 110 can generate description information randomly or based on historical description information and then send it to the server 140. Alternatively, when no description information has been entered, the terminal 110 may send the server a video generation request that carries no description information, and upon receiving such a request the server 140 generates description information randomly or based on historical description information. In this way, the server 140 can obtain the description information of the target video to be generated.
After obtaining the description information of the target video, the server 140 can obtain the target segment corresponding to the description information from the video material library, splice the target segment with each of a plurality of candidate segments in the library to obtain a plurality of candidate videos, and finally score the candidate videos and generate the higher-scoring target video based on those scores. For example, one target video may be generated (the video with the highest score), or multiple target videos may be generated (the top-N videos by score, where N is the number of target videos).
It can be seen that creating videos through this video generation system greatly simplifies manual operations, reduces the difficulty of video creation and meets users' needs; it can further ensure the completion rate and playback volume of the target video, increasing users' enthusiasm for creating videos.
To make the technical solutions of the present disclosure clearer and easier to understand, the video generation method provided by the embodiments of the present disclosure is introduced below from the perspectives of the terminal 110, the terminal 120, the terminal 130 and the server 140.
As shown in FIG. 3, a flow chart of a video generation method provided by an embodiment of the present disclosure, the method includes:
S301: The terminal 110 generates a video generation request based on an operation of the target user.
S302: The terminal 110 sends the video generation request to the server 140.
In some examples, the terminal 110 may present the video creation interface to the target user. As shown in FIG. 2, the target user clicks the creation control 202 of the video creation interface, and the terminal 110 can then capture the target user's operation and generate the video generation request based on it. When description information input by the target user exists in the description information input area 201, the terminal 110 may carry that description information in the video generation request.
In some embodiments, the description information entered by the target user in the description information input area 201 may be a script, a topic, a film title, an action and so on. A script includes characters and lines (for example, what the characters say), so the server 140 can retrieve video segments corresponding to the script from the video material library and splice them to obtain the target video. Topics include economy, science and technology, and so on; the server 140 can retrieve video segments corresponding to the topic from the library (for example, videos of ships, ports or robots) and splice them to obtain the target video. A film title may be the name of a film or TV drama; the server 140 can extract highlight segments from that drama and splice them to obtain the target video. An action may be a classic action of a character in a film or TV drama; the server 140 can retrieve video segments corresponding to the action from the library and splice them to obtain the target video. Further, the server 140 may also add a target-user video uploaded by the target user while splicing the video segments, which is described in detail later.
In other examples, when no description information input by the target user exists in the description information input area 201, the terminal 110 may generate description information randomly or based on historical description information and carry the generated description information in the video generation request. Of course, the terminal 110 may also generate no description information; after the server 140 parses the video generation request, the server 140 generates description information randomly or based on historical description information.
The historical description information may be description information historically input by the target user, and generating description information based on it may mean generating description information with a high similarity to the historical description information. The present disclosure does not limit this; those skilled in the art can choose an appropriate way of generating description information according to actual needs.
S303: The server 140 obtains the description information of the target video to be generated according to the video generation request.
As described above, after receiving the video generation request, the server 140 may parse it. When the terminal 110 carries the description information of the target video in the video generation request, the server 140 can obtain the description information of the target video to be generated from the request; when it does not, the server 140 can generate the description information randomly or based on historical description information. In this way, the server 140 obtains the description information of the target video to be generated.
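The fallback logic of S303 can be summarized in a few lines. The sketch below is illustrative only: the helper name resolve_description and the candidate pool are assumptions, since the disclosure does not specify how random or history-based description information is produced.

```python
# Minimal sketch of S303: resolve the description information for a request.
# resolve_description and the fallback pool are hypothetical, not from the filing.
import random

def resolve_description(request: dict, history: list) -> str:
    """Return the description information of the target video to be generated."""
    desc = request.get("description")
    if desc:                          # the request carries user-entered description
        return desc
    if history:                       # generate from historical descriptions
        return random.choice(history)
    # otherwise generate randomly from a predefined pool
    return random.choice(["travel highlights", "funny pets", "tech news"])
```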
S304: The server 140 obtains, according to the description information, the target segment corresponding to the description information from the video material library.
The video material library may be built from historical videos, which may be videos of all types, such as entertainment videos, game videos, beauty and makeup videos, teaching videos, commentary videos and so on.
As shown in FIG. 4, a schematic diagram of processing historical videos provided by the present disclosure:
S401: Divide historical videos into multiple video segments in advance, to serve as the video material in the video material library.
In some examples, semantic shot segmentation may be performed on a historical video. For example, based on automatic speech recognition (ASR) technology, speech recognition is performed on the historical video to obtain the text corresponding to it; the text includes multiple sentences and the punctuation between them (for example, commas, full stops and semicolons), and the timestamp on the video corresponding to a punctuation mark can be used as a split point. Once the split points are determined, the timestamps are determined, and the video is then divided based on the timestamps to obtain multiple video segments. In some examples, only the timestamps corresponding to full stops may be used as split points; dividing the historical video at the split points corresponding to full stops further ensures the semantic coherence of each resulting video segment, avoiding the problem of individual segments of poor quality.
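As one illustration of this step, the sketch below assumes the ASR system returns sentences paired with end timestamps; only full-stop timestamps are kept as split points, matching the variant described above. The data and function names are assumptions, not part of the disclosure.

```python
# Sketch of S401: derive split points from ASR output and cut the video.
def split_points_from_asr(sentences, terminals=("。", ".")):
    """sentences: list of (text, end_time_seconds) pairs from ASR."""
    return [end for text, end in sentences
            if text and text.endswith(terminals)]   # full stop -> split point

def segment_video(duration, points):
    """Turn split points into (start, end) intervals for the video segments."""
    bounds = [0.0] + sorted(points) + [duration]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]

# Example: the middle sentence ends with a comma, so there is no split at 6.0 s.
asr = [("Hello everyone.", 3.2), ("Today we introduce a method,", 6.0),
       ("and it works well.", 9.5)]
print(segment_video(60.0, split_points_from_asr(asr)))
# [(0.0, 3.2), (3.2, 9.5), (9.5, 60.0)]
```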
S402: Extract the segment features of each of the above video segments, then cluster the segment features to obtain cluster centers; different cluster centers can be represented by cluster-center identifiers.
The segment features may be text features, image features, or fused features obtained by fusing text and image features. In some examples, the text features may be BERT (Bidirectional Encoder Representations from Transformers) features, and the image features may be computer vision (CV) features. The segment features may be represented by embeddings.
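A minimal sketch of the clustering in S402 follows, using scikit-learn k-means over placeholder embeddings; the number of clusters, the feature dimension and the random features are assumptions, since the disclosure does not specify them.

```python
# Sketch of S402: cluster segment embeddings; each label is a cluster-center ID.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.random((1000, 768))            # stand-in for BERT/CV embeddings
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(features)
cluster_ids = kmeans.labels_                  # cluster-center ID per video segment
centers = kmeans.cluster_centers_             # one center per identifier
```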
S403: Perform recognition processing on each video segment to obtain the description information of each segment (for example, video tags, keywords and the like).
In some examples, action recognition may be performed on each video segment to obtain its description information, such as clapping, smiling or running. In other examples, face recognition may be performed on each video segment to obtain description information such as person 1. In still other examples, scene recognition may be performed on each video segment to obtain description information such as geographic location or scenery.
It should be noted that the embodiments of the present disclosure do not limit the execution order of S402 and S403; in other examples, S403 may be executed before S402, or S402 and S403 may be executed simultaneously.
S404: Store the description information of each video segment and the identifier of its cluster center, thereby obtaining the video material library.
For each video segment there exist the identifier of the cluster center corresponding to that segment and its description information. The description information is used by the target user to search for video segments through the terminal 110, and the cluster-center identifier is used by the server 140 to retrieve video segments from the video material library.
In some embodiments, the video segments in the video material library may also be scored, and the videos in the library sorted based on that score, for example with higher-scoring segments ranked first. In some examples, the scoring may be based on the click-through rate of the video segments.
In some examples, after the server obtains the description information of the target video to be generated, it can obtain, based on that description information, the target segment corresponding to the description information from the video material library. For example, the identifier of the cluster center corresponding to the target video is obtained based on the description information, and the video segment corresponding to that identifier is then retrieved from the video material library as the above-mentioned target segment.
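The retrieval step can be pictured as a lookup from description information to a cluster-center identifier and then to the matching segments. The sketch below uses illustrative data only; the mapping and the file names are assumptions.

```python
# Sketch of S304: retrieve the target segment via the cluster-center identifier.
material_library = [
    {"clip": "clip_001.mp4", "cluster_id": 7, "description": "running"},
    {"clip": "clip_002.mp4", "cluster_id": 7, "description": "running"},
    {"clip": "clip_003.mp4", "cluster_id": 3, "description": "clapping"},
]
desc_to_cluster = {"running": 7, "clapping": 3}   # built when the library is stored

def retrieve_target_segments(description):
    cid = desc_to_cluster[description]            # description -> cluster-center ID
    return [m for m in material_library if m["cluster_id"] == cid]

print(retrieve_target_segments("running"))        # the two "running" clips
```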
S305: The server 140 splices the target segment with each of a plurality of candidate segments in the video material library to obtain a plurality of candidate videos.
In some embodiments, the server 140 may obtain the plurality of candidate videos through a Monte Carlo tree search. For example, the video material library is searched with the target segment as the root node, and each node corresponds to one video segment in the library. The Monte Carlo tree search process includes selection, node expansion and backpropagation, which are introduced separately below.
Selection: the node to be searched can be selected through the upper confidence bound (UCB) algorithm. Specifically, the node to be searched can be determined by the following formula:
[The selection formula is filed as image PCTCN2023070291-appb-000001 in the original document; its symbols are defined below.]
Here, c1, c2 and c3 control the weights of different priors; a and b denote child nodes of node s; a_k indicates that node a is the optimal child node of node s; k denotes the layer of the current tree; N(s, a) denotes the number of visits to node a; N(s, b) denotes the number of visits to node b; R(s, a) denotes the first evaluation value of node a, which can for example be characterized by CTR; C(s, a) denotes the second evaluation value of node a, which can for example be characterized by a semantic-coherence score; and Q(s, a) denotes the average of the first evaluation value and the second evaluation value of node a.
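Since the formula itself is filed as an image, the function below is only a plausible reconstruction from the symbol definitions above: Q(s, a) exploits accumulated value, R(s, a) and C(s, a) enter as priors weighted by c1 and c2, and c3 scales a standard visit-count exploration bonus. The exact combination in the filing may differ.

```python
# Hedged reconstruction of the UCB-style selection score; the precise form of
# the filed formula is unknown, so this combination is an assumption.
import math

def ucb_score(Q, R, C, N_a, N_siblings, c1=1.0, c2=1.0, c3=1.4):
    explore = c3 * math.sqrt(math.log(N_siblings + 1) / (N_a + 1))
    return Q + c1 * R + c2 * C + explore

def select_child(children):
    """children: list of dicts with keys Q, R, C, N for each child a of node s."""
    total = sum(ch["N"] for ch in children)       # sum of N(s, b) over siblings b
    return max(children, key=lambda ch: ucb_score(
        ch["Q"], ch["R"], ch["C"], ch["N"], total))
```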
Node expansion: if the node to be searched is a leaf node, node expansion is performed, i.e., new nodes are added under that leaf node. In the embodiments of the present disclosure, expanding the root node may mean concatenating the target segment with multiple candidate segments in the video material library, and the multiple child nodes obtained by expanding the root node may be multiple candidate videos.
As shown in FIG. 5, which is a schematic diagram of a Monte Carlo tree provided by an embodiment of the present disclosure. Node 1 is the root node and corresponds to the target segment; node 2 and node 3 are new nodes obtained by expanding node 1; similarly, node 4 and node 5 are new nodes obtained by expanding node 2. Each node corresponds to one video segment. As shown in FIG. 5, node 1, node 2, node 4 may correspond to one candidate video; node 1, node 2, node 5 may correspond to another candidate video; and node 1, node 3 may correspond to a third candidate video.
Backpropagation: after each node expansion, the data of every node is updated, for example Q(s,a), R(s,a), C(s,a), N(s,a), and N(s,b).
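A minimal sketch of this selection/expansion/backpropagation loop follows, assuming a simple single-constant UCB form; since the exact selection formula appears only as an image in the original filing, the scoring line below is an illustrative reconstruction, not the disclosed formula.

```python
# A minimal sketch of Monte Carlo tree search over segment concatenations.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    segment_id: int
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0        # N(s, a)
    reward: float = 0.0    # running mean of R (CTR) and C (coherence) values

def ucb_score(node: Node, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    total = sum(ch.visits for ch in node.parent.children)  # sum over b of N(s, b)
    return node.reward + c * math.sqrt(math.log(total) / node.visits)

def search_step(root: Node, candidate_ids: list[int], evaluate) -> None:
    # Selection: descend by UCB score until a leaf is reached.
    node = root
    while node.children:
        node = max(node.children, key=ucb_score)
    # Node expansion: concatenate with each candidate segment.
    node.children = [Node(segment_id=c, parent=node) for c in candidate_ids]
    # Backpropagation: update statistics along the path back to the root.
    value = evaluate(node)  # e.g. average of CTR and coherence scores
    while node is not None:
        node.visits += 1
        node.reward += (value - node.reward) / node.visits
        node = node.parent

root = Node(segment_id=0)
search_step(root, candidate_ids=[1, 2, 3], evaluate=lambda n: 0.5)  # toy evaluator
```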
Of course, in other embodiments, the target user may upload a target user video, which may be, for example, a video shot by the target user or a video stored in the terminal 110. The server 140 then concatenates the target user video with the multiple candidate videos obtained above to obtain concatenated candidate videos. In some examples, the target user video may be concatenated at the beginning, the middle, or the end of a candidate video.
S306: The server 140 evaluates the score of each of the multiple candidate videos.
A candidate video is obtained by concatenating multiple video segments, and each video segment has a corresponding cluster-center identifier; accordingly, each candidate video can be represented by an identifier sequence.
In some embodiments, after the identifier sequence of each candidate video is determined, each candidate video may be scored by a pre-trained evaluation model; for example, the identifier sequence of each candidate video is input to the evaluation model, which outputs the score of each candidate video. The scoring process may be based on the semantic coherence, CTR, and other properties of the candidate video.
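A minimal sketch of scoring identifier sequences with a learned model follows; the architecture below (an LSTM over learned cluster-ID embeddings with a scalar head) is an illustrative stand-in for the evaluation model, whose structure the disclosure does not specify.

```python
# A minimal sketch: score candidate videos from their identifier sequences.
import torch
import torch.nn as nn

class CandidateScorer(nn.Module):
    def __init__(self, n_clusters: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_clusters, dim)   # one vector per cluster ID
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)                # scalar score per video

    def forward(self, id_seq: torch.Tensor) -> torch.Tensor:
        # id_seq: (batch, seq_len) of cluster-center identifiers.
        x = self.embed(id_seq)
        _, (h, _) = self.encoder(x)
        return self.head(h[-1]).squeeze(-1)          # (batch,) scores

scorer = CandidateScorer(n_clusters=256)
scores = scorer(torch.tensor([[3, 17, 42], [3, 17, 99]]))  # two toy candidates
```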
Of course, in other embodiments, after the target user video is concatenated with the multiple candidate videos, the concatenated candidate videos may also be scored.
S307: The server 140 generates the target video based on the score of each of the multiple candidate videos.
In some embodiments, the candidate video with the highest score may be selected from the multiple candidate videos as the target video, or the top-N candidate videos by score may be selected as target videos. The embodiments of the present disclosure do not limit this; those skilled in the art can choose to generate one target video or multiple target videos according to actual needs. After the target video is obtained, the other candidate videos are discarded.
S309: The server 140 delivers the target video to the terminal 120.
S310: The terminal 120 sends behavior information about the target video to the server 140.
The terminal 120 is a terminal within a first user range, and the number of users in the first user range is below a preset number, i.e., the first user range contains relatively few users. The server 140 first delivers the target video within the first user range and obtains the behavior information of users in that range for the target video. The behavior information includes the completion rate and/or the play duration; the behavior information may also be characterized by like, share, viewing, and dislike behaviors.
S311: The server 140 determines whether the behavior information meets a preset condition; if yes, S312 is executed; if no, S313 is executed.
The behavior information meeting the preset condition may mean that the completion rate is greater than a first preset threshold and/or the play duration is greater than a second preset threshold. Thus, when the completion rate is greater than the first preset threshold, the feedback of users in the first user range on the target video is good; similarly, when the play duration is greater than the second preset threshold, the feedback of users in the first user range on the target video is good.
S312: The server 140 delivers the target video to the terminal 130.
The terminal 130 is a terminal within a second user range, and the number of users in the second user range is above the preset number, i.e., the second user range contains relatively many users. Moreover, the number of users in the second user range is greater than that in the first user range.
When the behavior information of users in the first user range meets the preset condition, the feedback on the target video within the first user range is good, i.e., the small-traffic delivery performs well. The target video is then delivered within the second user range, i.e., large-traffic delivery is performed, which achieves a good delivery effect and reduces the waste of resources required for video delivery.
S313: The server 140 stops delivering the target video.
When the behavior information of users in the first user range does not meet the preset condition, the feedback on the target video within the first user range is poor, i.e., the small-traffic delivery performs poorly. Large-traffic delivery is then not performed, thereby reducing the waste of resources required for video delivery.
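A minimal sketch of this staged-delivery decision in S309-S313 follows; the threshold values and field names are illustrative assumptions rather than values from the patent.

```python
# A minimal sketch: decide whether to scale up delivery from small traffic.
from dataclasses import dataclass

@dataclass
class BehaviorInfo:
    completion_rate: float  # fraction of viewers who watched to the end
    play_duration: float    # average play duration in seconds

COMPLETION_THRESHOLD = 0.6   # first preset threshold (assumed value)
DURATION_THRESHOLD = 15.0    # second preset threshold (assumed value)

def decide_rollout(info: BehaviorInfo) -> str:
    ok = (info.completion_rate > COMPLETION_THRESHOLD
          or info.play_duration > DURATION_THRESHOLD)
    # Good small-traffic feedback: deliver to the larger second user range;
    # otherwise stop delivery to avoid wasting delivery resources.
    return "deliver_to_second_user_range" if ok else "stop_delivery"
```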
In some embodiments, the server 140 may also receive adjustment information from the target user for the target video, adjust the target video according to the adjustment information, and then perform small-traffic delivery of the target video again. If the small-traffic delivery performs well, large-traffic delivery is performed; if the small-traffic delivery performs poorly, large-traffic delivery is not performed.
Based on the above description, the embodiments of the present disclosure provide a video generation method. The method first obtains the description information of the target video to be generated, then acquires, based on that description information, the target segment corresponding to it from the video material library, and concatenates the target segment with each of multiple candidate segments in the library to obtain multiple candidate videos. The multiple candidate videos are then evaluated separately to obtain the score of each, and the target video is obtained based on the scores; for example, the top-3 candidate videos by score may be taken as target videos. It can be seen that with this method the target video can be generated without the user performing complex operations such as editing and concatenating original videos, which simplifies user operations and reduces the difficulty of video creation.
FIG. 6 is a schematic diagram of a video generation apparatus according to an exemplary disclosed embodiment. As shown in FIG. 6, the video generation apparatus 600 includes:
an acquisition module 601, configured to obtain description information of a target video to be generated, and to acquire, according to the description information of the target video, a target segment corresponding to the description information from a video material library;
a concatenation module 602, configured to concatenate the target segment with each of multiple candidate segments in the video material library to obtain multiple candidate videos;
an evaluation module 603, configured to evaluate a score of each of the multiple candidate videos;
a generation module 604, configured to generate the target video according to the score of each of the multiple candidate videos.
Optionally, the evaluation module 603 is specifically configured to evaluate the score of each of the multiple candidate videos based on the semantic continuity of adjacent segments of each candidate video and/or the click-through rate of each segment.
Optionally, the acquisition module 601 is specifically configured to obtain description information entered by a target user for the target video to be generated.
Optionally, the description information is generated randomly or based on historical description information.
Optionally, the acquisition module 601 is further configured to obtain a target user video of the target user;
the concatenation module 602 is further configured to concatenate the target user video with the multiple candidate videos respectively to obtain concatenated candidate videos;
the evaluation module 603 is specifically configured to evaluate scores of the concatenated candidate videos;
the generation module 604 is specifically configured to generate the target video according to the scores of the concatenated candidate videos.
Optionally, the apparatus further includes a delivery module;
the delivery module is configured to deliver the target video within a first user range;
the acquisition module 601 is further configured to obtain behavior information of users in the first user range for the target video;
the delivery module is further configured to deliver the target video within a second user range when the behavior information meets a preset condition;
wherein the users in the first user range are fewer than the users in the second user range.
Optionally, the behavior information includes a completion rate and/or a play duration, and the preset condition includes the completion rate being greater than a first preset threshold and/or the play duration being greater than a second preset threshold.
Optionally, the acquisition module 601 is further configured to receive adjustment information of the target user for the target video, and to adjust the target video according to the adjustment information.
The functions of the above modules have been described in detail in the method steps of the previous embodiment and are not repeated here.
Referring now to FIG. 7, which shows a schematic structural diagram of an electronic device 700 suitable for implementing embodiments of the present disclosure. The electronic device shown in FIG. 7 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 700 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage apparatus 708 into a random access memory (RAM) 703. Various programs and data required for the operation of the electronic device 700 are also stored in the RAM 703. The processing apparatus 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following apparatuses may be connected to the I/O interface 705: input apparatuses 706 including, for example, a touchscreen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output apparatuses 707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage apparatuses 708 including, for example, a magnetic tape and a hard disk; and a communication apparatus 709. The communication apparatus 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows the electronic device 700 with various apparatuses, it should be understood that it is not required to implement or have all of the illustrated apparatuses; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 709, installed from the storage apparatus 708, or installed from the ROM 702. When the computer program is executed by the processing apparatus 701, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), and the like, or any suitable combination of the above.
In some implementations, the terminal and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: obtain description information of a target video to be generated; acquire, according to the description information of the target video, a target segment corresponding to the description information from a video material library; concatenate the target segment with each of multiple candidate segments in the video material library to obtain multiple candidate videos; evaluate a score of each of the multiple candidate videos; and generate the target video according to the score of each of the multiple candidate videos.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not in some cases constitute a limitation on the module itself; for example, the first acquisition module may also be described as "a module for obtaining at least two Internet Protocol addresses".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, Example 1 provides a video generation method: obtaining description information of a target video to be generated; acquiring, according to the description information of the target video, a target segment corresponding to the description information from a video material library; concatenating the target segment with each of multiple candidate segments in the video material library to obtain multiple candidate videos; evaluating a score of each of the multiple candidate videos; and generating the target video according to the score of each of the multiple candidate videos.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein evaluating the score of each of the multiple candidate videos includes:
evaluating the score of each of the multiple candidate videos based on the semantic continuity of adjacent segments of each candidate video and/or the click-through rate of each segment.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1 or 2, wherein obtaining the description information of the target video to be generated includes:
obtaining description information entered by a target user for the target video to be generated.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 1 or 2, wherein the description information is generated randomly or based on historical description information.
According to one or more embodiments of the present disclosure, Example 5 provides the method of any one of Examples 1-4, the method further including:
obtaining a target user video of a target user;
concatenating the target user video with the multiple candidate videos respectively to obtain concatenated candidate videos;
wherein evaluating the score of each of the multiple candidate videos includes:
evaluating scores of the concatenated candidate videos;
and generating the target video according to the score of each of the multiple candidate videos includes:
generating the target video according to the scores of the concatenated candidate videos.
According to one or more embodiments of the present disclosure, Example 6 provides the method of any one of Examples 1-5, the method further including:
delivering the target video within a first user range, and obtaining behavior information of users in the first user range for the target video;
delivering the target video within a second user range when the behavior information meets a preset condition;
wherein the users in the first user range are fewer than the users in the second user range.
According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 6, wherein the behavior information includes a completion rate and/or a play duration, and the preset condition includes the completion rate being greater than a first preset threshold and/or the play duration being greater than a second preset threshold.
According to one or more embodiments of the present disclosure, Example 8 provides the method of any one of Examples 1-7, the method further including:
receiving adjustment information of the target user for the target video;
adjusting the target video according to the adjustment information.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims. Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method and will not be elaborated here.

Claims (12)

  1. A video generation method, characterized in that the method comprises:
    obtaining description information of a target video to be generated;
    acquiring, according to the description information of the target video, a target segment corresponding to the description information from multiple video segments in a video material library, the multiple video segments having corresponding description information;
    determining multiple candidate segments based on the target segment and the multiple video segments;
    concatenating the target segment with each of the multiple candidate segments to obtain multiple candidate videos;
    evaluating a score of each of the multiple candidate videos; and
    generating the target video according to the score of each of the multiple candidate videos.
  2. The method according to claim 1, characterized in that evaluating the score of each of the multiple candidate videos comprises:
    evaluating the score of each of the multiple candidate videos based on the semantic continuity of adjacent segments of each candidate video and/or the click-through rate of each segment.
  3. The method according to claim 1 or 2, characterized in that obtaining the description information of the target video to be generated comprises:
    obtaining description information entered by a target user for the target video to be generated.
  4. The method according to claim 1 or 2, characterized in that the description information is generated randomly or based on historical description information.
  5. The method according to any one of claims 1-4, characterized in that the method further comprises:
    obtaining a target user video of a target user;
    concatenating the target user video with the multiple candidate videos respectively to obtain concatenated candidate videos;
    wherein evaluating the score of each of the multiple candidate videos comprises:
    evaluating scores of the concatenated candidate videos;
    and generating the target video according to the score of each of the multiple candidate videos comprises:
    generating the target video according to the scores of the concatenated candidate videos.
  6. The method according to any one of claims 1-5, characterized in that the method further comprises:
    delivering the target video within a first user range, and obtaining behavior information of users in the first user range for the target video;
    delivering the target video within a second user range when the behavior information meets a preset condition;
    wherein the users in the first user range are fewer than the users in the second user range.
  7. The method according to claim 6, characterized in that the behavior information comprises a completion rate and/or a play duration, and the preset condition comprises the completion rate being greater than a first preset threshold and/or the play duration being greater than a second preset threshold.
  8. The method according to any one of claims 1-7, characterized in that the method further comprises:
    receiving adjustment information of the target user for the target video;
    adjusting the target video according to the adjustment information.
  9. A video generation apparatus, characterized by comprising:
    an acquisition module, configured to obtain description information of a target video to be generated; acquire, according to the description information of the target video, a target segment corresponding to the description information from multiple video segments in a video material library, the multiple video segments having corresponding description information; and determine multiple candidate segments based on the target segment and the multiple video segments;
    a concatenation module, configured to concatenate the target segment with each of the multiple candidate segments to obtain multiple candidate videos;
    an evaluation module, configured to evaluate a score of each of the multiple candidate videos;
    a generation module, configured to generate the target video according to the score of each of the multiple candidate videos.
  10. An electronic device, characterized by comprising:
    a storage apparatus having a computer program stored thereon;
    a processing apparatus, configured to execute the computer program in the storage apparatus to implement the method according to any one of claims 1 to 8.
  11. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processing apparatus, the program implements the method according to any one of claims 1 to 8.
  12. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
PCT/CN2023/070291 2022-01-29 2023-01-04 Video generation method, apparatus, device, medium and product WO2023142917A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/393,567 US12160643B2 (en) 2022-01-29 2023-12-21 Video generation method and apparatus, and device, medium and product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210111702.1 2022-01-29
CN202210111702.1A CN114501064B (zh) 2022-01-29 2022-01-29 Video generation method, apparatus, device, medium and product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/393,567 Continuation US12160643B2 (en) 2022-01-29 2023-12-21 Video generation method and apparatus, and device, medium and product

Publications (1)

Publication Number Publication Date
WO2023142917A1 true WO2023142917A1 (zh) 2023-08-03

Family

ID=81479303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070291 WO2023142917A1 (zh) 2022-01-29 2023-01-04 一种视频生成方法、装置、设备、介质及产品

Country Status (3)

Country Link
US (1) US12160643B2 (zh)
CN (1) CN114501064B (zh)
WO (1) WO2023142917A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708008B (zh) * 2021-12-30 2024-12-03 北京有竹居网络技术有限公司 Promotion content processing method, apparatus, device, medium and product
CN114501064B (zh) * 2022-01-29 2023-07-14 北京有竹居网络技术有限公司 Video generation method, apparatus, device, medium and product
CN115348459A (zh) * 2022-08-16 2022-11-15 支付宝(杭州)信息技术有限公司 Short video processing method and apparatus
CN117786159A (zh) * 2022-09-22 2024-03-29 北京有竹居网络技术有限公司 Text material acquisition method, apparatus, device, medium and program product


Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015925A1 (en) * 2000-03-28 2006-01-19 Gotuit Media Corp Sales presentation video on demand system
WO2005107264A1 (en) * 2004-04-30 2005-11-10 British Broadcasting Corporation Media content and enhancement data delivery
US9071859B2 (en) * 2007-09-26 2015-06-30 Time Warner Cable Enterprises Llc Methods and apparatus for user-based targeted content delivery
US10178422B1 (en) * 2017-09-20 2019-01-08 Rovi Guides, Inc. Systems and methods for generating aggregated media assets based on related keywords
CN108307240B (zh) * 2018-02-12 2019-10-22 北京百度网讯科技有限公司 Video recommendation method and apparatus
US10860860B1 (en) * 2019-01-03 2020-12-08 Amazon Technologies, Inc. Matching videos to titles using artificial intelligence
CN111798879B (zh) * 2019-04-08 2022-05-03 百度(美国)有限责任公司 Method and apparatus for generating video
CN109874029B (zh) * 2019-04-22 2021-02-12 腾讯科技(深圳)有限公司 Video description generation method, apparatus, device, and storage medium
CN112235631B (zh) * 2019-07-15 2022-05-03 北京字节跳动网络技术有限公司 Video processing method and apparatus, electronic device, and storage medium
CN113038149A (zh) 2019-12-09 2021-06-25 上海幻电信息科技有限公司 Live video interaction method and apparatus, and computer device
CN111683209B (zh) * 2020-06-10 2023-04-18 北京奇艺世纪科技有限公司 Method and apparatus for generating mashup video, electronic device, and computer-readable storage medium
CN112784078A (zh) 2021-01-22 2021-05-11 哈尔滨玖楼科技有限公司 Automatic video editing method based on semantic recognition

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188479A1 (en) * 2017-12-14 2019-06-20 Google Llc Generating synthesis videos
CN113259754A (zh) * 2020-02-12 2021-08-13 北京达佳互联信息技术有限公司 Video generation method and apparatus, electronic device, and storage medium
US20210264446A1 (en) * 2020-02-20 2021-08-26 Adobe Inc. Enhancing media content effectiveness using feedback between evaluation and content editing
CN112004163A (zh) * 2020-08-31 2020-11-27 北京市商汤科技开发有限公司 Video generation method and apparatus, electronic device, and storage medium
CN114501064A (zh) * 2022-01-29 2022-05-13 北京有竹居网络技术有限公司 Video generation method, apparatus, device, medium and product

Also Published As

Publication number Publication date
US20240147023A1 (en) 2024-05-02
US12160643B2 (en) 2024-12-03
CN114501064B (zh) 2023-07-14
CN114501064A (zh) 2022-05-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 23745802; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 23745802; Country of ref document: EP; Kind code of ref document: A1)