WO2024160260A1 - Video processing method and apparatus, device, and storage medium - Google Patents
Video processing method and apparatus, device, and storage medium
- Publication number
- WO2024160260A1 · PCT/CN2024/075333
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- segment
- video segment
- time interval
- semantic
- Prior art date
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 4
- 238000000034 method Methods 0.000 claims abstract description 37
- 230000004044 response Effects 0.000 claims abstract description 11
- 238000012545 processing Methods 0.000 claims description 69
- 238000004458 analytical method Methods 0.000 claims description 8
- 238000004590 computer program Methods 0.000 claims description 8
- 230000000007 visual effect Effects 0.000 claims description 4
- 238000010586 diagram Methods 0.000 description 16
- 230000008569 process Effects 0.000 description 12
- 230000006870 function Effects 0.000 description 11
- 238000004891 communication Methods 0.000 description 7
- 238000012549 training Methods 0.000 description 6
- 238000013459 approach Methods 0.000 description 2
- 238000013475 authorization Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000013480 data collection Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4667—Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Definitions
- Example embodiments of the present disclosure generally relate to the field of computers, and more particularly, to methods, devices, apparatuses, and computer-readable storage media for video processing.
- a method for video processing comprises: determining a video segment from the first video based on access information of the first video, wherein the access information indicates the distribution of access statistics of the first video over video time; determining semantic continuity between the video segment and the first video; and in response to the semantic continuity being higher than a threshold, generating a second video by appending the video segment to the front of the first video.
- a device for video processing includes: a determination module configured to determine a video segment from a first video based on access information of the first video, the access information indicating the distribution of access statistics of the first video over video time; a judgment module configured to determine semantic continuity between the video segment and the first video; and an editing module configured to, in response to the semantic continuity being higher than a threshold, generate a second video by appending the video segment to the front of the first video.
- an electronic device, in a third aspect of the present disclosure, includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When the instructions are executed by the at least one processing unit, the device performs the method of the first aspect.
- a computer-readable storage medium wherein a computer program is stored on the computer-readable storage medium, and the computer program can be executed by a processor to implement the method of the first aspect.
- FIG1 shows a schematic diagram of an example environment in which embodiments according to the present disclosure may be implemented
- FIG2 shows a flow chart of an example process of video processing according to some embodiments of the present disclosure
- FIG3 shows a schematic structural block diagram of a device for video processing according to some embodiments of the present disclosure.
- FIG. 4 illustrates a block diagram of an electronic device capable of implementing various embodiments of the present disclosure.
- the embodiments of the present disclosure may involve user data, data acquisition and/or use, etc. These aspects are subject to the corresponding laws, regulations and relevant provisions.
- all data collection, acquisition, processing, handling, forwarding, use, etc. are carried out on the premise that the user is informed and has confirmed. Accordingly, when implementing each embodiment of the present disclosure, the type of data or information that may be involved, the scope of use, the usage scenario, etc. should be communicated to the user and the user's authorization should be obtained in an appropriate manner in accordance with the relevant laws and regulations.
- the specific notification and/or authorization method can vary according to the actual situation and application scenario, and the scope of the present disclosure is not limited in this respect.
- the initial content of the video has a great influence on whether the user continues to watch the video.
- the content that is attractive to users is not always located at the beginning of the video, which may cause people to miss such parts.
- manually selecting the highlights in the video requires a lot of manpower and may have the defect of insufficient accuracy.
- a video segment can be determined from a first video based on access information of the first video, wherein the access information indicates the distribution of access statistics of the first video over the video time. Further, the semantic continuity between the video segment and the first video can be determined. If the semantic continuity is higher than a threshold, a second video can be generated by appending the video segment to the front of the first video.
- the embodiments of the present disclosure can automatically determine, from a video, the video segments that users may be interested in, and, if the semantic continuity between such a video segment and the original video is good, attach the video segment to the front of the original video. Therefore, the embodiments of the present disclosure can improve the appeal of the opening of the video content and ensure the semantic continuity of the video content.
- FIG1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
- the environment 100 may include a processing device 120.
- the processing device 120 may include any suitable electronic device, examples of which may include but are not limited to: a mobile device, a tablet computer, a laptop computer, a desktop computer, a cloud server, an edge computing device, etc.
- the processing device 120 may obtain the first video 110 and the access information 130 of the first video 110.
- the first video 110 may be a video released by a creator.
- the first video 110 may include an advertisement video, which may receive a click operation from a user and perform corresponding interactions, such as guiding to a corresponding promotional page or purchase page.
- the access information 130 may indicate the distribution of access statistics of the first video 110 over the video time.
- Such access statistics may include, for example, the user click-through rate and/or user churn rate of the first video 110. It should be understood that if the user click-through rate of a video at a certain moment is high, it may indicate that users are more interested in the content at that moment; conversely, if the user churn rate at a certain moment is high (that is, the proportion of users who stop watching the video at that moment), it may indicate that users are less interested in the content at that moment.
- the processing device 120 may determine a video segment 140 from the first video 110 according to the access information 130 of the first video 110.
- a video segment 140 may be part of the content of the first video 110, or may be generated based on part of the content of the first video 110.
- the processing device 120 may determine the semantic continuity between the video segment 140 and the first video 110 , and if the semantic continuity is above a threshold, append the video segment 140 to the first video 110 to generate the second video 150 .
- the video segment 140 may be edited to serve as the opening of the second video 150.
- the process of generating the video segment 140 and determining the semantic continuity will be described in detail below in conjunction with FIG. 2.
- FIG. 2 shows a flow chart of an example process 200 for video processing according to some embodiments of the present disclosure.
- Process 200 may be implemented at processing device 120.
- Process 200 is described below with reference to FIG. 1.
- the processing device 120 determines a video segment 140 from the first video 110 based on access information 130 of the first video 110 , wherein the access information 130 indicates distribution of access statistics of the first video 110 over video time.
- the processing device 120 may obtain the first video 110, which may be, for example, a video that has already been published.
- the processing device 120 may determine the first video 110 from the video collection based on the number of views and/or clicks of the video in the video collection.
- the processing device 120 may obtain the first video 110 whose number of views is greater than a preset number from a video library accessible to the public based on the number of views and/or clicks of the video.
- processing device 120 may also obtain access information 130 of the first video 110.
- access information 130 is used to indicate the distribution of the user's interest in the first video 110 over the video time.
- the access information 130 may include, for example, the distribution of the video click-through rate and/or video churn rate of the first video 110 over the video time of the first video 110.
- the processing device 120 may determine the time interval corresponding to the segment based on the access statistics. Specifically, the processing device 120 may determine the target moment of the first video 110 based on the access information 130, wherein the access statistics of the first video at the target moment meet the threshold requirement. Exemplarily, such a target moment may be the moment when the user click rate of the first video 110 is the highest and/or the moment when the user churn rate is the lowest.
- the processing device 120 may also determine a time interval associated with the target moment based on semantic recognition of the text content of the first video 110, wherein the text segment corresponding to the time interval has continuous semantics.
- the processing device 120 may recognize the speech of the first video 110 to obtain its text content. Additionally, the processing device 120 may obtain a text segment with continuous semantics associated with the target moment based on semantic recognition of the text content.
- the time length of such a text segment (i.e., the length of the determined time interval) needs to fall within a preset length range.
- the processing device 120 may determine a time interval of 3 seconds to 7 seconds based on the target time and semantic information.
- the processing device 120 may also add punctuation to the text content, and determine a single complete sentence associated with the target moment based on the text content after the punctuation is added.
- the embodiments of the present disclosure can ensure that the text content corresponding to the determined time interval is semantically continuous and complete.
- the processing device 120 may obtain a video segment 140 corresponding to the time interval.
- the processing device 120 may directly determine the segment of the first video 110 corresponding to the time interval as the video segment 140.
- in order to ensure that the generated video segment 140 is visually coherent, the processing device 120 may also use an appropriate storyboard model (e.g., the TransNet V2 model) to divide the first video 110 into a group of storyboard segments, each of which may correspond, for example, to a different shot.
- the processing device 120 may generate a video segment based on the plurality of storyboard segments. For example, the processing device 120 may generate the video segment 140 by combining the plurality of storyboard segments. Alternatively, the processing device 120 may also add a smoothing effect such as fade-in and fade-out between the plurality of storyboard segments to construct the video segment 140.
- the embodiments of the present disclosure can provide a video segment 140 that is semantically continuous, semantically complete, and storyboard-continuous.
- if the determined target moment falls within a target time range, the processing device 120 may refrain from generating the video segment 140.
- a target time range may, for example, include a first preset duration associated with the start moment of the first video 110 (e.g., the first five seconds of the video), and/or a second preset duration associated with the end moment of the first video 110 (e.g., the last five seconds of the video).
- conversely, if the determined target moment does not fall within the target time range, the processing device 120 may proceed to determine the time interval in order to generate the video segment 140.
- the processing device 120 determines semantic continuity of the video segment 140 with the first video 110 .
- the processing device 120 may determine semantic continuity based on the features of the video segment 140 and the features of the first video 110 .
- the processing device 120 may process features of the video segment 140 and the first video 110 using an analysis model to determine semantic continuity, wherein the features include at least one of the following: visual features of the video, speech features of the video, or text features of the video.
- the processing device 120 may utilize an appropriate machine learning model as an analysis model, and may use visual features, speech features, text features and/or other appropriate features or feature combinations of the two videos as inputs to the analysis model to determine the semantic continuity between the two videos.
- in order to train a machine learning model to have the ability to judge the semantic continuity of videos, a suitable training device (which may be the same as or different from the processing device 120) may use sample data to train the analysis model.
- sample data may include positive sample data and/or negative sample data, for example.
- such positive sample data may include, for example, a first video segment and a second video segment, wherein the first video segment is related to the first semantically continuous shot of a reference video, and the second video segment is another video segment in the reference video that is different from the first video segment.
- the training device can extract the first semantically continuous shot from a published reference video based on the storyboard model. Further, the training device can determine the video segment corresponding to such a shot and other video segments of the reference video as positive sample data to indicate that such two video segments are semantically continuous.
- such negative sample data may include a third video segment and a fourth video segment from different videos to indicate that such video segments are discontinuous.
- the embodiments of the present disclosure can automatically and efficiently determine the semantic continuity between the generated video clip and the original video content.
- in response to the semantic continuity being above the threshold, the processing device 120 generates the second video 150 by appending the video segment 140 to the front of the first video 110.
- the analysis model may output the semantic continuity as a continuous numerical value (e.g., a value between 0 and 1 indicating its degree of semantic continuity), or may output the semantic continuity as a discrete value (e.g., 0 for discontinuity and 1 for continuity).
- the processing device 120 may determine that the generated video segment 140 is suitable for being appended to the first video 110 .
- the processing device 120 generates the second video 150 by editing the video segment 140 onto the front of the first video 110.
- the generated second video 150 can thereby have opening content that is more attractive to users, without affecting the continuous viewing experience of the second video 150.
- the processing device 120 may further publish the edited second video 150 .
- the embodiments of the present disclosure can utilize post-hoc knowledge about the video (e.g., its access information) to perform intelligent editing of the video, thereby creating video content that is more attractive to users while preserving the viewing experience of such video content.
- the embodiments of the present disclosure also provide corresponding devices for implementing the above methods or processes.
- FIG. 3 shows a schematic structural block diagram of an apparatus 300 for video processing according to some embodiments of the present disclosure.
- the apparatus 300 may be implemented as or included in the processing device 120.
- Each module/component in the apparatus 300 may be implemented by hardware, software, firmware or any combination thereof.
- the apparatus 300 comprises a determination module 310 configured to determine a video segment from the first video based on access information of the first video, wherein the access information indicates a distribution of access statistics of the first video over video time.
- the apparatus 300 further includes a judgment module 320 configured to determine semantic continuity between the video segment and the first video.
- the apparatus 300 further comprises an editing module 330 configured to generate a second video by appending a video segment to the front of the first video in response to the semantic continuity being higher than a threshold.
- the determination module 310 is further configured to: determine the target moment of the first video based on the access information, wherein the access statistics of the first video at the target moment meet the threshold requirement; determine the time interval associated with the target moment based on the semantic recognition of the text content of the first video, wherein the text segment corresponding to the time interval has continuous semantics; and obtain the video segment corresponding to the time interval.
- the determination module 310 is further configured to: divide the first video into a group of storyboard segments using a storyboard model; and generate a video segment based on the multiple storyboard segments in response to the time interval being associated with the multiple storyboard segments.
- the length of the time interval is within a preset length range.
- the text segment corresponds to a single complete sentence in the text content, and the single complete sentence is determined based on adding punctuation to the text content.
- the determination module 310 is further configured to: in response to the target moment not falling within the target time range, determine the time interval, wherein the target time range includes: a first preset duration associated with a start moment of the first video, and/or a second preset duration associated with an end moment of the first video.
- the judgment module 320 is further configured to: use the analysis model to process features of the video segment and the first video to determine semantic continuity, the features including at least one of the following: visual features of the video, speech features of the video, or text features of the video.
- the analysis model is trained based on the following sample data: positive sample data, including a first video segment and a second video segment, the first video segment being related to the first semantically continuous shot of a reference video, and the second video segment being another video segment in the reference video that is different from the first video segment; or negative sample data, including a third video segment and a fourth video segment from different videos.
- the access statistics indicate at least: video click-through rate and/or user churn rate.
- the apparatus 300 further includes a video selection module configured to determine a first video from the video set based on the number of plays and/or clicks of the videos in the video set.
- the first video comprises an advertisement video.
- FIG4 shows a block diagram of an electronic device 400 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 400 shown in FIG4 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein. The electronic device 400 shown in FIG4 may be used to implement the processing device 120 of FIG1 .
- the electronic device 400 is in the form of a general electronic device.
- the components of the electronic device 400 may include, but are not limited to, one or more processors or processing units 410, a memory 420, a storage device 430, one or more communication units 440, one or more input devices 450, and one or more output devices 460.
- the processing unit 410 may be an actual or virtual processor and is capable of performing various processes according to a program stored in the memory 420. In a multi-processor system, multiple processing units execute computer executable instructions in parallel to improve the parallel processing capability of the electronic device 400.
- the electronic device 400 typically includes a plurality of computer storage media. Such media may be any accessible media that the electronic device 400 can access, including but not limited to volatile and non-volatile media, removable and non-removable media.
- the memory 420 may be a volatile memory.
- the storage device 430 may be a removable or non-removable medium and may include a machine-readable medium such as a flash drive, a disk, or any other medium that is capable of storing information and/or data (e.g., training data) and that may be accessed within the electronic device 400.
- the electronic device 400 may further include additional removable/non-removable, volatile/non-volatile storage media.
- a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk") and an optical drive for reading from or writing to a removable, non-volatile optical disk may be provided.
- each drive may be connected to a bus (not shown) by one or more data media interfaces.
- the memory 420 may include a computer program product 425 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
- the communication unit 440 implements communication with other electronic devices through a communication medium. Additionally, the functions of the components of the electronic device 400 can be implemented with a single computing cluster or multiple computing machines that can communicate through a communication connection. Therefore, the electronic device 400 can operate in a networked environment using a logical connection with one or more other servers, a network personal computer (PC), or another network node.
- the input device 450 may be one or more input devices, such as a mouse, a keyboard, a tracking ball, etc.
- the output device 460 may be one or more output devices, such as a display, a speaker, a printer, etc.
- the electronic device 400 may also communicate with one or more external devices (not shown) through the communication unit 440 as needed, such as a storage device, a display device, etc., communicate with one or more devices that allow a user to interact with the electronic device 400, or communicate with any device that allows the electronic device 400 to communicate with one or more other electronic devices (e.g., a network card, a modem, etc.). Such communication may be performed via an input/output (I/O) interface (not shown).
- a computer-readable storage medium on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the method described above.
- a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
- These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processing unit of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated.
- These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
- Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
- each box in the flowchart or block diagram may represent a module, a program segment, or a portion of an instruction, which contains one or more executable instructions for implementing a specified logical function.
- the functions marked in the boxes may also occur in an order different from that marked in the accompanying drawings. For example, two consecutive boxes may actually be executed substantially in parallel, and they may sometimes be executed in the opposite order, depending on the functions involved.
- each block in the block diagram and/or flow chart, and combinations of blocks in the block diagram and/or flow chart can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
The embodiments of the present disclosure relate to a video processing method and apparatus, a device, and a storage medium. The method provided herein comprises: determining a video segment from a first video based on access information of the first video, the access information indicating the distribution of access statistics of the first video over video time; determining the semantic continuity between the video segment and the first video; and in response to the semantic continuity being higher than a threshold, generating a second video by appending the video segment to the front of the first video. Based on the above method, the embodiments of the present disclosure can make a video more attractive to a user by adding, in front of the video, a specific segment taken from the video.
Description
This application claims priority to Chinese invention patent application No. 202310126804.5, entitled "Video Processing Method, Apparatus, Device and Storage Medium", filed on February 1, 2023, the entire contents of which are incorporated herein by reference.
Example embodiments of the present disclosure generally relate to the field of computers, and more particularly, to methods, apparatuses, devices, and computer-readable storage media for video processing.
With the development of computer technology, various kinds of video content have become one of the main ways for people to obtain information. Especially for short video content, people usually decide, based on the content played at the very beginning, whether they are interested in a video, and whether to continue watching the subsequent content or switch to other video content.
Summary of the Invention
In a first aspect of the present disclosure, a method for video processing is provided. The method comprises: determining a video segment from a first video based on access information of the first video, the access information indicating the distribution of access statistics of the first video over video time; determining semantic continuity between the video segment and the first video; and in response to the semantic continuity being higher than a threshold, generating a second video by appending the video segment to the front of the first video.
In a second aspect of the present disclosure, an apparatus for video processing is provided. The apparatus includes: a determination module configured to determine a video segment from a first video based on access information of the first video, the access information indicating the distribution of access statistics of the first video over video time; a judgment module configured to determine semantic continuity between the video segment and the first video; and an editing module configured to, in response to the semantic continuity being higher than a threshold, generate a second video by appending the video segment to the front of the first video.
In a third aspect of the present disclosure, an electronic device is provided. The device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. A computer program is stored on the computer-readable storage medium, and the computer program can be executed by a processor to implement the method of the first aspect.
It should be understood that the content described in this summary section is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the accompanying drawings, the same or similar reference numerals denote the same or similar elements, in which:
FIG. 1 shows a schematic diagram of an example environment in which embodiments according to the present disclosure may be implemented;
FIG. 2 shows a flow chart of an example process of video processing according to some embodiments of the present disclosure;
FIG. 3 shows a schematic structural block diagram of an apparatus for video processing according to some embodiments of the present disclosure; and
FIG. 4 shows a block diagram of an electronic device capable of implementing multiple embodiments of the present disclosure.
The embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be noted that the headings of any sections/subsections provided herein are not restrictive. Various embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection. In addition, the embodiments described in any section/subsection may be combined in any manner with any other embodiments described in the same section/subsection and/or different sections/subsections.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". The terms "first", "second", etc. may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
The embodiments of the present disclosure may involve user data, the acquisition and/or use of data, and so on. These aspects all comply with the corresponding laws, regulations and relevant provisions. In the embodiments of the present disclosure, all data collection, acquisition, processing, handling, forwarding, use, etc. are carried out on the premise that the user is informed and has confirmed. Accordingly, when implementing the embodiments of the present disclosure, the types of data or information that may be involved, the scope of use, the usage scenarios, etc. should be communicated to the user and the user's authorization should be obtained in an appropriate manner in accordance with the relevant laws and regulations. The specific notification and/or authorization method may vary according to the actual situation and application scenario, and the scope of the present disclosure is not limited in this respect.
Where the solutions in this specification and the embodiments involve the processing of personal information, such processing is carried out on the premise of having a legal basis (for example, with the consent of the subject of the personal information, or as necessary for the performance of a contract), and is only carried out within the specified or agreed scope. A user's refusal to allow processing of personal information other than the information necessary for basic functions does not affect the user's use of the basic functions.
As briefly mentioned above, for video content, the initial content of a video has a great influence on whether a user continues to watch the video. Conventionally, the content of a video that is attractive to users is not always located at the beginning of the video, which may cause people to miss such parts. In addition, manually selecting the highlights in a video requires a lot of manpower and may suffer from insufficient accuracy.
The embodiments of the present disclosure propose a solution for video processing. According to this solution, a video segment can be determined from a first video based on access information of the first video, where the access information indicates the distribution of access statistics of the first video over video time. Further, the semantic continuity between the video segment and the first video can be determined. If the semantic continuity is higher than a threshold, a second video can be generated by appending the video segment to the front of the first video.
In this way, the embodiments of the present disclosure can automatically determine, from a video, the video segments that users may be interested in, and, when the semantic continuity between such a video segment and the original video is good, attach the video segment to the front of the original video. Thereby, the embodiments of the present disclosure can improve the appeal of the opening of the video content while ensuring the semantic continuity of the video content.
Various example implementations of this solution are described in further detail below in conjunction with the accompanying drawings.
Example Environment
FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. As shown in FIG. 1, the environment 100 may include a processing device 120. The processing device 120 may include any suitable electronic device, examples of which may include, but are not limited to, a mobile device, a tablet computer, a laptop computer, a desktop computer, a cloud server, an edge computing device, and the like.
As shown in FIG. 1, the processing device 120 may obtain a first video 110 and access information 130 of the first video 110. In some embodiments, the first video 110 may be a video published by a creator. By way of example, the first video 110 may include an advertisement video; such an advertisement video may, for example, receive a click operation from a user and perform a corresponding interaction, such as guiding the user to a corresponding promotional page or purchase page.
In some embodiments, the access information 130 may indicate the distribution of access statistics of the first video 110 over video time. Such access statistics may include, for example, the user click-through rate and/or user churn rate of the first video 110. It should be understood that if the user click-through rate of the video at a certain moment is high, it may indicate that users are more interested in the content at that moment; conversely, if the user churn rate at a certain moment is high (that is, the proportion of users who stop watching the video at that moment), it may indicate that users are less interested in the content at that moment.
As shown in FIG. 1, the processing device 120 may determine a video segment 140 from the first video 110 according to the access information 130 of the first video 110. Such a video segment 140 may be part of the content of the first video 110, or may be generated based on part of the content of the first video 110.
Further, the processing device 120 may determine the semantic continuity between the video segment 140 and the first video 110, and, if the semantic continuity is higher than a threshold, append the video segment 140 to the front of the first video 110 to generate a second video 150.
By way of example, the video segment 140 may be edited to serve as the opening of the second video 150. The process of generating the video segment 140 and determining the semantic continuity will be described in detail below in conjunction with FIG. 2.
It should be understood that the structure and functionality of the environment 100 are described for exemplary purposes only, without implying any limitation on the scope of the present disclosure.
Example Process
FIG. 2 shows a flow chart of an example process 200 for video processing according to some embodiments of the present disclosure. The process 200 may be implemented at the processing device 120. The process 200 is described below with reference to FIG. 1.
As shown in FIG. 2, at block 210, the processing device 120 determines a video segment 140 from the first video 110 based on access information 130 of the first video 110, where the access information 130 indicates the distribution of access statistics of the first video 110 over video time.
In some embodiments, the processing device 120 may obtain the first video 110, which may be, for example, a video that has already been published. By way of example, the processing device 120 may determine the first video 110 from a video set based on the number of plays and/or clicks of the videos in the video set. For example, the processing device 120 may obtain, from a publicly accessible video library and according to the numbers of plays and/or clicks of the videos, a first video 110 whose number of plays is greater than a preset number.
Furthermore, the processing device 120 may also obtain the access information 130 of the first video 110. Such access information 130 is used to indicate the distribution of the users' degree of interest in the first video 110 over video time.
Taking the first video 110 being an advertisement video as an example, the access information 130 may include, for example, the distribution of the video click-through rate and/or video churn rate of the first video 110 over the video time of the first video 110.
In order to determine, from the first video 110, a video segment 140 that users may be more interested in, the processing device 120 may determine the time interval corresponding to the segment according to the access statistics. Specifically, the processing device 120 may determine a target moment of the first video 110 based on the access information 130, where the access statistics of the first video at the target moment meet a threshold requirement. By way of example, such a target moment may be the moment at which the user click-through rate of the first video 110 is the highest and/or the moment at which the user churn rate is the lowest.
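Purely as an illustration, and not as part of the patented method, a minimal sketch of such target-moment selection might look as follows. It assumes the access information is available as per-second click-through-rate and churn-rate series; the function name, threshold value, and data layout are hypothetical.

```python
from typing import Optional, Sequence


def find_target_moment(
    click_rate: Sequence[float],    # per-second click-through rate of the first video
    churn_rate: Sequence[float],    # per-second churn rate (share of viewers who stop watching)
    click_threshold: float = 0.05,  # hypothetical minimum CTR for a "highlight" moment
) -> Optional[int]:
    """Return the second of video time whose access statistics best meet the threshold requirement.

    Simple heuristic: prefer the moment with the highest click-through rate,
    falling back to the moment with the lowest churn rate.
    """
    if not click_rate or not churn_rate:
        return None

    best_click = max(range(len(click_rate)), key=lambda t: click_rate[t])
    if click_rate[best_click] >= click_threshold:
        return best_click

    # Fall back to the moment at which the fewest viewers drop off.
    return min(range(len(churn_rate)), key=lambda t: churn_rate[t])


# Example usage with toy access information for a 10-second video.
if __name__ == "__main__":
    ctr = [0.01, 0.02, 0.08, 0.03, 0.02, 0.01, 0.06, 0.02, 0.01, 0.01]
    churn = [0.20, 0.10, 0.02, 0.05, 0.08, 0.12, 0.03, 0.09, 0.15, 0.25]
    print(find_target_moment(ctr, churn))  # -> 2 (the second with the highest CTR)
```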
Further, in order to ensure that the determined video segment is semantically continuous, the processing device 120 may also determine, based on semantic recognition of the text content of the first video 110, a time interval associated with the target moment, where the text segment corresponding to the time interval has continuous semantics.
By way of example, the processing device 120 may perform speech recognition on the first video 110 to obtain its text content. Additionally, the processing device 120 may obtain, based on semantic recognition of the text content, a semantically continuous text segment associated with the target moment.
In some embodiments, the time length of such a text segment (i.e., the length of the determined time interval) needs to fall within a preset length range. For example, the processing device 120 may determine a time interval of 3 seconds to 7 seconds based on the target moment and the semantic information.
In some embodiments, in addition to considering semantic continuity, the processing device 120 may also add punctuation to the text content, and determine, based on the punctuated text content, a single complete sentence associated with the target moment.
In this manner, the embodiments of the present disclosure can ensure that the text content corresponding to the determined time interval is semantically continuous and complete.
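As a hedged sketch only (the disclosure does not prescribe any particular speech-recognition or punctuation tool), the interval-selection step might be approximated as follows, assuming word-level timestamps with punctuation already restored; the Word dataclass is hypothetical and the 3-7 second bounds follow the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def sentence_interval_around(
    words: List[Word],
    target_moment: float,
    min_len: float = 3.0,
    max_len: float = 7.0,
) -> Optional[Tuple[float, float]]:
    """Pick the time interval of the single complete sentence that covers the target moment.

    Assumes punctuation has already been restored in ``Word.text`` so that
    sentence-ending marks ('.', '!', '?', '。', '！', '？') delimit sentences.
    Returns None if the covering sentence falls outside the preset length range.
    """
    sentences: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        current.append(w)
        if w.text and w.text[-1] in ".!?。！？":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)

    for sent in sentences:
        start, end = sent[0].start, sent[-1].end
        if start <= target_moment <= end:
            return (start, end) if min_len <= end - start <= max_len else None
    return None
```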
Further, the processing device 120 may obtain the video segment 140 corresponding to the time interval. By way of example, the processing device 120 may directly determine the segment of the first video 110 corresponding to the time interval as the video segment 140.
In some embodiments, in order to ensure that the generated video segment 140 is visually coherent, the processing device 120 may also use an appropriate storyboard model (for example, the TransNet V2 model) to divide the first video 110 into a group of storyboard segments, where each such storyboard segment may correspond, for example, to a different shot.
Additionally, if the determined time interval is associated with multiple storyboard segments, the processing device 120 may generate the video segment based on the multiple storyboard segments. For example, the processing device 120 may generate the video segment 140 by combining the multiple storyboard segments. Alternatively, the processing device 120 may, for example, add smoothing effects such as fade-in and fade-out between the multiple storyboard segments to construct the video segment 140.
In this way, the embodiments of the present disclosure can produce a video segment 140 that is semantically continuous, semantically complete, and storyboard-continuous.
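The following is an illustrative sketch only: it assumes shot boundaries have already been produced by a shot-segmentation model such as TransNet V2 and are given as (start, end) pairs, and it simply selects the storyboard segments that overlap the chosen time interval. The actual cutting and fade transitions could then be performed with a video editor or ffmpeg.

```python
from typing import List, Tuple

# (start, end) in seconds; boundaries assumed to come from a shot-boundary model such as TransNet V2.
Shot = Tuple[float, float]


def shots_covering_interval(shots: List[Shot], interval: Shot) -> List[Shot]:
    """Return the storyboard segments that overlap the selected time interval.

    If the interval spans several shots, the whole group is returned so the
    resulting clip can be assembled from complete shots (optionally joined
    with fade transitions) instead of cutting mid-shot.
    """
    lo, hi = interval
    return [(s, e) for (s, e) in shots if e > lo and s < hi]


# Toy example: a 30-second video split into four shots, interval picked as (11.5, 16.0).
if __name__ == "__main__":
    shots = [(0.0, 5.2), (5.2, 12.4), (12.4, 19.0), (19.0, 30.0)]
    print(shots_covering_interval(shots, (11.5, 16.0)))  # -> [(5.2, 12.4), (12.4, 19.0)]
```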
In some embodiments, if the determined target moment falls within a target time range, the processing device 120 may refrain from generating the video segment 140. Such a target time range may, for example, include a first preset duration associated with the start moment of the first video 110 (e.g., the first five seconds of the video), and/or a second preset duration associated with the end moment of the first video 110 (e.g., the last five seconds of the video).
Conversely, if the determined target moment does not fall within the target time range, the processing device 120 may proceed to determine the time interval so as to generate the video segment 140.
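A minimal, assumption-laden sketch of this check; the five-second windows mirror the example above and the function name is hypothetical.

```python
def should_generate_clip(target_moment: float, video_duration: float,
                         head_window: float = 5.0, tail_window: float = 5.0) -> bool:
    """Skip clip generation when the target moment already lies in the opening or closing window."""
    in_head = target_moment <= head_window
    in_tail = target_moment >= video_duration - tail_window
    return not (in_head or in_tail)
```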
Continuing to refer to FIG. 2, at block 220, the processing device 120 determines the semantic continuity between the video segment 140 and the first video 110.
In some embodiments, in order to ensure that the added opening has good continuity with the original video, the processing device 120 may determine the semantic continuity based on the features of the video segment 140 and the features of the first video 110.
In some embodiments, the processing device 120 may use an analysis model to process the features of the video segment 140 and the first video 110 to determine the semantic continuity, where the features include at least one of the following: visual features of the video, speech features of the video, or text features of the video.
By way of example, the processing device 120 may use an appropriate machine learning model as the analysis model, and may use the visual features, speech features, text features and/or other appropriate features or feature combinations of the two videos as inputs to the analysis model to determine the semantic continuity between the two videos.
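For illustration only, a toy PyTorch analysis model that fuses pre-computed visual, speech, and text features of the two videos into a single continuity score might look like this; the feature dimensions and architecture are assumptions, not part of the disclosure.

```python
import torch
import torch.nn as nn


class ContinuityModel(nn.Module):
    """Toy analysis model: fuses clip-level features of two videos and scores their semantic continuity."""

    def __init__(self, visual_dim=512, speech_dim=128, text_dim=256, hidden=256):
        super().__init__()
        in_dim = 2 * (visual_dim + speech_dim + text_dim)  # features of the segment and of the first video
        self.scorer = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # continuity score in [0, 1]
        )

    def forward(self, seg_feats: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        return self.scorer(torch.cat([seg_feats, video_feats], dim=-1)).squeeze(-1)


# Random tensors stand in for real visual/speech/text embeddings.
if __name__ == "__main__":
    model = ContinuityModel()
    seg = torch.randn(4, 512 + 128 + 256)
    vid = torch.randn(4, 512 + 128 + 256)
    print(model(seg, vid).shape)  # torch.Size([4])
```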
In some embodiments, in order to train a machine learning model to have the ability to judge the semantic continuity of videos, a suitable training device (which may be the same as or different from the processing device 120) may use sample data to train the analysis model. Such sample data may include, for example, positive sample data and/or negative sample data.
In some embodiments, such positive sample data may include, for example, a first video segment and a second video segment, where the first video segment is related to the first semantically continuous shot of a reference video, and the second video segment is another video segment in the reference video that is different from the first video segment.
By way of example, the training device may, based on the storyboard model, extract the first semantically continuous shot from an already published reference video. Further, the training device may determine the video segment corresponding to such a shot and other video segments of the reference video as positive sample data, to indicate that such two video segments are semantically continuous.
In some embodiments, such negative sample data may include a third video segment and a fourth video segment from different videos, to indicate that such video segments are not semantically continuous.
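A hedged sketch of how such positive and negative pairs might be assembled, assuming each reference video is represented as an ordered list of segment identifiers with the first semantically continuous shot at index 0; all names here are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Each reference video is represented as an ordered list of segment identifiers,
# with index 0 assumed to be its first semantically continuous shot.
VideoSegments = List[str]


def build_samples(videos: Dict[str, VideoSegments]) -> List[Tuple[str, str, int]]:
    """Build (segment_a, segment_b, label) pairs: 1 = semantically continuous, 0 = not."""
    samples: List[Tuple[str, str, int]] = []
    names = list(videos)

    for name, segments in videos.items():
        if len(segments) < 2:
            continue
        first_shot = segments[0]
        # Positive pairs: the first continuous shot vs. other segments of the same reference video.
        for other in segments[1:]:
            samples.append((first_shot, other, 1))
        # Negative pairs: segments drawn from two different videos.
        other_names = [n for n in names if n != name]
        if other_names:
            other_video = random.choice(other_names)
            samples.append((first_shot, random.choice(videos[other_video]), 0))

    return samples
```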
基于这样的方式,本公开的实施例能够自动且高效地判断所生成的视频片段与原视频内容之间的语义连续性。Based on this approach, the embodiments of the present disclosure can automatically and efficiently determine the semantic continuity between the generated video clip and the original video content.
在框230,响应于语义连续性高于阈值,处理设备120通过将视频片段140附加至第一视频110前,生成第二视频150。At block 230 , in response to the semantic continuity being above the threshold, the processing device 120 generates the second video 150 by appending the video segment 140 to the front of the first video 110 .
示例性地,分析模型例如可以将语义连续性输出作为连续的数值(例如,0至1之间的值,以指示其语义连续性),或者也可以将语义连续性输出为离散的数值(例如,0代表不连续,1代表连续)。Exemplarily, the analysis model may output the semantic continuity as a continuous numerical value (eg, a value between 0 and 1 to indicate its semantic continuity), or may output the semantic continuity as a discrete numerical value (eg, 0 for discontinuity and 1 for continuity).
如果这样的语义连续性高于阈值(例如,连续数值大于某个阈值数值,离散数值大于0),则处理设备120可以确定所生成的视频片段140适于被附加到第一视频110之前。If such semantic continuity is above a threshold (e.g., the continuous value is greater than a certain threshold value, or the discrete value is greater than 0), the processing device 120 may determine that the generated video segment 140 is suitable to be appended in front of the first video 110.
进一步地,处理设备120通过将视频片段140编辑至第一视频110前,来生成第二视频150。由此,所生成的第二视频150能够具有更加吸引用户的片头内容,并且并不影响第二视频150的连续观看体验。Further, the processing device 120 generates the second video 150 by splicing the video segment 140 in front of the first video 110. Thus, the generated second video 150 can have opening content that is more attractive to users, without affecting the continuous viewing experience of the second video 150.
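One possible shape of this threshold check and splicing step is sketched below using ffmpeg's concat demuxer; the threshold value, output file name, and the choice of ffmpeg are assumptions for the example, not the disclosed implementation.

```python
# A minimal sketch of the threshold check and the splicing step. The 0.8
# threshold, the output file name, and the use of ffmpeg's concat demuxer are
# illustrative assumptions; re-encoding may be needed if the two inputs do not
# share the same codec and parameters.
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def maybe_prepend(segment_path: str, first_video_path: str,
                  score: float, threshold: float = 0.8) -> Optional[str]:
    if score <= threshold:
        return None  # not continuous enough; keep the first video unchanged
    out_path = str(Path(first_video_path).with_name("second_video.mp4"))
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(f"file '{Path(segment_path).resolve()}'\n")      # opening segment first
        f.write(f"file '{Path(first_video_path).resolve()}'\n")  # then the original video
        list_path = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", out_path],
        check=True,
    )
    return out_path
```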
在一些实施例中,处理设备120还可以进一步发布经编辑的第二视频150。In some embodiments, the processing device 120 may further publish the edited second video 150 .
由此,本公开的实施例能够利用视频的后延知识(例如,视频的访问信息)来执行对视频的智能剪辑,从而创作出能够更加吸引用户的视频内容,并能够保证这样的视频内容的观看体验。Thus, the embodiments of the present disclosure can utilize after-the-fact knowledge of a video (e.g., the access information of the video) to perform intelligent editing of the video, thereby creating video content that is more attractive to users while ensuring the viewing experience of such video content.
示例装置和设备Example Apparatus and Devices
本公开的实施例还提供了用于实现上述方法或过程的相应装置。The embodiments of the present disclosure also provide a corresponding apparatus for implementing the above methods or processes.
图3示出了根据本公开的某些实施例的用于视频处理的装置300的示意性结构框图。装置300可以被实现为或者被包括在处理设备120中。装置300中的各个模块/组件可以由硬件、软件、固件或者它们的任意组合来实现。FIG. 3 shows a schematic structural block diagram of an apparatus 300 for video processing according to some embodiments of the present disclosure. The apparatus 300 may be implemented as or included in the processing device 120. Each module/component in the apparatus 300 may be implemented by hardware, software, firmware or any combination thereof.
装置300包括确定模块310,被配置为基于第一视频的访问信息,从第一视频中确定视频片段,访问信息指示第一视频的访问统计数据随视频时间的分布。The apparatus 300 comprises a determination module 310 configured to determine a video segment from the first video based on access information of the first video, wherein the access information indicates a distribution of access statistics of the first video over video time.
装置300还包括判断模块320,被配置为确定视频片段与第一视频的语义连续性。The apparatus 300 further includes a judgment module 320 configured to determine semantic continuity between the video segment and the first video.
此外,装置300还包括编辑模块330,被配置为响应于语义连续性高于阈值,通过将视频片段附加至第一视频前,生成第二视频。In addition, the apparatus 300 further comprises an editing module 330 configured to generate a second video by appending a video segment to the front of the first video in response to the semantic continuity being higher than a threshold.
在一些实施例中,确定模块310还被配置为:基于访问信息,确定第一视频的目标时刻,其中第一视频在目标时刻的访问统计数据满足阈值要求;基于对第一视频的文本内容的语义识别,确定与目标时刻相关联的时间区间,其中时间区间对应的文本片段具有连续的语义;以及获取与时间区间对应的视频片段。In some embodiments, the determination module 310 is further configured to: determine the target moment of the first video based on the access information, wherein the access statistics of the first video at the target moment meet the threshold requirement; determine the time interval associated with the target moment based on the semantic recognition of the text content of the first video, wherein the text segment corresponding to the time interval has continuous semantics; and obtain the video segment corresponding to the time interval.
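As a hedged illustration of these three steps, the sketch below picks a target moment from per-second access statistics and expands it to an interval covered by one complete sentence; the TimedSentence structure and the simple peak rule are assumptions made for the example, not the disclosed implementation.

```python
# A minimal sketch of the determination module's three steps: pick a target
# moment from per-second access statistics, expand it to a time interval whose
# text reads as one complete sentence, and return that interval for cutting.
# The data structures and the simple peak rule are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimedSentence:
    start: float  # seconds
    end: float
    text: str     # one complete sentence after punctuation restoration


def target_moment(stats_per_second: List[float], min_value: float) -> Optional[int]:
    """Second with the highest access statistic, if it meets the threshold."""
    if not stats_per_second:
        return None
    best = max(range(len(stats_per_second)), key=stats_per_second.__getitem__)
    return best if stats_per_second[best] >= min_value else None


def time_interval(moment: float, sentences: List[TimedSentence],
                  max_len: float = 10.0) -> Optional[Tuple[float, float]]:
    """Interval of the sentence covering the moment, capped by a preset length."""
    for s in sentences:
        if s.start <= moment <= s.end and (s.end - s.start) <= max_len:
            return (s.start, s.end)
    return None
```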
在一些实施例中,确定模块310还被配置为:利用分镜模型,将第一视频切分为一组分镜片段;以及响应于时间区间与多个分镜片段相关联,基于多个分镜片段生成视频片段。In some embodiments, the determination module 310 is further configured to: divide the first video into a group of storyboard segments using a storyboard model; and generate a video segment based on the multiple storyboard segments in response to the time interval being associated with the multiple storyboard segments.
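A minimal sketch of how a time interval spanning several storyboard segments might be merged into a single video segment is shown below; the shot boundaries are assumed to come from a separate shot-segmentation model.

```python
# A minimal sketch: given shot boundaries from a storyboard model, collect all
# shots that overlap the target time interval and merge them into one segment.
from typing import List, Tuple

Shot = Tuple[float, float]  # (start_sec, end_sec)


def segment_from_shots(interval: Shot, shots: List[Shot]) -> Shot:
    """Union span of all storyboard shots overlapping the interval."""
    lo, hi = interval
    hit = [(s, e) for s, e in shots if e > lo and s < hi]
    if not hit:
        return interval  # fall back to the raw interval
    return (min(s for s, _ in hit), max(e for _, e in hit))


# An interval crossing two shots is expanded to the union of those shots.
print(segment_from_shots((4.0, 9.0), [(0.0, 5.0), (5.0, 12.0), (12.0, 20.0)]))
# -> (0.0, 12.0)
```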
在一些实施例中,时间区间的长度在预设的长度范围内。In some embodiments, the length of the time interval is within a preset length range.
在一些实施例中,文本片段对应于文本内容中的单个完整语句,单个完整语句是基于对文本内容添加标点而被确定的。In some embodiments, the text segment corresponds to a single complete sentence in the text content, and the single complete sentence is determined based on adding punctuation to the text content.
在一些实施例中,确定模块310还被配置为:响应于目标时刻未落入目标时间范围内,确定时间区间,其中目标时间范围包括:与第一视频的起始时刻相关的第一预设时长,和/或与第一视频的结束时刻相关的第二预设时长。In some embodiments, the determination module 310 is further configured to: in response to the target moment not falling within a target time range, determine the time interval, wherein the target time range includes: a first preset duration associated with the start time of the first video, and/or a second preset duration associated with the end time of the first video.
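The guard described here could look like the following sketch; the 5-second preset durations are placeholder assumptions.

```python
# A minimal sketch of this guard: the time interval is only determined when the
# target moment lies outside the head/tail ranges of the first video. The
# 5-second preset durations are placeholder assumptions.
def outside_target_range(moment: float, duration: float,
                         head_sec: float = 5.0, tail_sec: float = 5.0) -> bool:
    """True if the moment is not within the first or last preset duration."""
    return head_sec < moment < (duration - tail_sec)


assert outside_target_range(moment=42.0, duration=120.0)
assert not outside_target_range(moment=2.0, duration=120.0)    # too close to the start
assert not outside_target_range(moment=118.0, duration=120.0)  # too close to the end
```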
在一些实施例中,判断模块320还被配置为:利用分析模型处理视频片段和第一视频的特征,确定语义连续性,特征包括以下至少一项:视频的视觉特征,视频的语音特征或视频的文本特征。In some embodiments, the judgment module 320 is further configured to: use the analysis model to process features of the video clip and the first video to determine semantic continuity, the features including at least one of the following: visual features of the video, voice features of the video, or text features of the video.
在一些实施例中,分析模型基于以下样本数据而被训练:正样本数据,包括第一视频片段和第二视频片段,第一视频片段与参考视频的首个语义连续镜头相关,第二视频是参考视频中不同于第一视频片段的其它视频片段;或者负样本数据,包括来自于不同视频的第三视频片段和第四视频片段。In some embodiments, the analysis model is trained based on the following sample data: positive sample data, including a first video segment and a second video segment, where the first video segment is related to the first semantically continuous shot of a reference video, and the second video is another video segment of the reference video different from the first video segment; or negative sample data, including a third video segment and a fourth video segment from different videos.
在一些实施例中,访问统计数据至少指示:视频点击率和/或用户流失率。In some embodiments, the access statistics indicate at least: video click-through rate and/or user churn rate.
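As an illustration of what such per-second access statistics might look like, the sketch below derives a churn curve from watch records; the record format (the last second each viewer watched) is an assumption made only for this example.

```python
# A minimal sketch of deriving a per-second churn curve from watch records.
# Each record is assumed to be the last second a viewer watched; the record
# format is an assumption made only for this example.
from typing import List


def churn_rate_per_second(exit_seconds: List[int], duration: int) -> List[float]:
    """Fraction of the remaining audience lost at each second of the video."""
    remaining = len(exit_seconds)
    churn = []
    for t in range(duration):
        left_now = sum(1 for s in exit_seconds if s == t)
        churn.append(left_now / remaining if remaining else 0.0)
        remaining -= left_now
    return churn


print(churn_rate_per_second([3, 3, 7, 9, 9, 9], duration=10))
# high values mark moments where many of the remaining viewers drop off
```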
在一些实施例中,装置300还包括视频选择模块,被配置为基于视频集中视频的播放数和/或点击数,从视频集中确定第一视频。In some embodiments, the apparatus 300 further includes a video selection module configured to determine a first video from the video set based on the number of plays and/or clicks of the videos in the video set.
在一些实施例中,第一视频包括广告视频。In some embodiments, the first video comprises an advertisement video.
图4示出了其中可以实施本公开的一个或多个实施例的电子设备400的框图。应当理解,图4所示出的电子设备400仅仅是示例性的,而不应当构成对本文所描述的实施例的功能和范围的任何限制。图4所示出的电子设备400可以用于实现图1的处理设备120。FIG. 4 shows a block diagram of an electronic device 400 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 400 shown in FIG. 4 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein. The electronic device 400 shown in FIG. 4 may be used to implement the processing device 120 of FIG. 1.
如图4所示,电子设备400是通用电子设备的形式。电子设备400的组件可以包括但不限于一个或多个处理器或处理单元410、存储器420、存储设备430、一个或多个通信单元440、一个或多个输入设备450以及一个或多个输出设备460。处理单元410可以是实际或虚拟处理器并且能够根据存储器420中存储的程序来执行各种处理。在多处理器系统中,多个处理单元并行执行计算机可执行指令,以提高电子设备400的并行处理能力。As shown in FIG. 4, the electronic device 400 is in the form of a general-purpose electronic device. The components of the electronic device 400 may include, but are not limited to, one or more processors or processing units 410, a memory 420, a storage device 430, one or more communication units 440, one or more input devices 450, and one or more output devices 460. The processing unit 410 may be an actual or virtual processor and is capable of performing various processes according to a program stored in the memory 420. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the electronic device 400.
电子设备400通常包括多个计算机存储介质。这样的介质可以是电子设备400可访问的任何可以获取的介质,包括但不限于易失性和非易失性介质、可拆卸和不可拆卸介质。存储器420可以是易失性存储器(例如寄存器、高速缓存、随机访问存储器(RAM))、非易失性存储器(例如,只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、闪存)或它们的某种组合。存储设备430可以是可拆卸或不可拆卸的介质,并且可以包括机器可读介质,诸如闪存驱动、磁盘或者任何其他介质,其可以能够用于存储信息和/或数据(例如用于训练的训练数据)并且可以在电子设备400内被访问。The electronic device 400 typically includes a plurality of computer storage media. Such media may be any available media accessible to the electronic device 400, including but not limited to volatile and non-volatile media, removable and non-removable media. The memory 420 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 430 may be a removable or non-removable medium, and may include a machine-readable medium such as a flash drive, a magnetic disk, or any other medium that can be used to store information and/or data (e.g., training data for training) and that can be accessed within the electronic device 400.
电子设备400可以进一步包括另外的可拆卸/不可拆卸、易失性/非易失性存储介质。尽管未在图4中示出,可以提供用于从可拆卸、非易失性磁盘(例如“软盘”)进行读取或写入的磁盘驱动和用于从可拆卸、非易失性光盘进行读取或写入的光盘驱动。在这些情况中,每个驱动可以由一个或多个数据介质接口被连接至总线(未示出)。存储器420可以包括计算机程序产品425,其具有一个或多个程序模块,这些程序模块被配置为执行本公开的各种实施例的各种方法或动作。The electronic device 400 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in FIG. 4 , a disk drive for reading or writing from a removable, non-volatile disk (e.g., a “floppy disk”) and an optical drive for reading or writing from a removable, non-volatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The memory 420 may include a computer program product 425 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
通信单元440实现通过通信介质与其他电子设备进行通信。附加地,电子设备400的组件的功能可以以单个计算集群或多个计算机器来实现,这些计算机器能够通过通信连接进行通信。因此,电子设备400可以使用与一个或多个其他服务器、网络个人计算机(PC)或者另一个网络节点的逻辑连接来在联网环境中进行操作。The communication unit 440 implements communication with other electronic devices through a communication medium. Additionally, the functions of the components of the electronic device 400 can be implemented with a single computing cluster or multiple computing machines that can communicate through a communication connection. Therefore, the electronic device 400 can operate in a networked environment using a logical connection with one or more other servers, a network personal computer (PC), or another network node.
输入设备450可以是一个或多个输入设备,例如鼠标、键盘、追踪球等。输出设备460可以是一个或多个输出设备,例如显示器、扬声器、打印机等。电子设备400还可以根据需要通过通信单元440与一个或多个外部设备(未示出)进行通信,外部设备诸如存储设备、显示设备等,与一个或多个使得用户与电子设备400交互的设备进行通信,或者与使得电子设备400与一个或多个其他电子设备通信的任何设备(例如,网卡、调制解调器等)进行通信。这样的通信可以经由输入/输出(I/O)接口(未示出)来执行。The input device 450 may be one or more input devices, such as a mouse, a keyboard, a trackball, etc. The output device 460 may be one or more output devices, such as a display, a speaker, a printer, etc. The electronic device 400 may also communicate with one or more external devices (not shown) through the communication unit 440 as needed, such as a storage device, a display device, etc., communicate with one or more devices that allow a user to interact with the electronic device 400, or communicate with any device that allows the electronic device 400 to communicate with one or more other electronic devices (e.g., a network card, a modem, etc.). Such communication may be performed via an input/output (I/O) interface (not shown).
根据本公开的示例性实现方式,提供了一种计算机可读存储介质,其上存储有计算机可执行指令,其中计算机可执行指令被处理器执行以实现上文描述的方法。根据本公开的示例性实现方式,还提供了一种计算机程序产品,计算机程序产品被有形地存储在非瞬态计算机可读介质上并且包括计算机可执行指令,而计算机可执行指令被处理器执行以实现上文描述的方法。According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
这里参照根据本公开实现的方法、装置、设备和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。Various aspects of the present disclosure are described herein with reference to the flowcharts and/or block diagrams of the methods, devices, equipment, and computer program products implemented according to the present disclosure. It should be understood that each box in the flowchart and/or block diagram and the combination of each box in the flowchart and/or block diagram can be implemented by computer-readable program instructions.
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理单元,从而生产出一种机器,使得这些指令在通过计算机或其他可编程数据处理装置的处理单元执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processing unit of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
可以把计算机可读程序指令加载到计算机、其他可编程数据处理装置、或其他设备上,使得在计算机、其他可编程数据处理装置或其他设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其他可编程数据处理装置、或其他设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
附图中的流程图和框图显示了根据本公开的多个实现的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions, and operations of systems, methods, and computer program products according to multiple implementations of the present disclosure. In this regard, each box in the flowchart or block diagram may represent a module, a program segment, or a portion of an instruction, which contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the boxes may also occur in an order different from that marked in the accompanying drawings. For example, two consecutive boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flow chart, and combinations of blocks in the block diagram and/or flow chart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.
以上已经描述了本公开的各实现,上述说明是示例性的,并非穷尽性的,并且也不限于所公开的各实现。在不偏离所说明的各实现的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实现的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其他普通技术人员能理解本文公开的各个实现方式。
The above descriptions of various implementations of the present disclosure are exemplary, non-exhaustive, and not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The selection of terms used herein is intended to best explain the principles of the implementations, practical applications, or improvements to the technology in the market, or to enable other persons of ordinary skill in the art to understand the various implementations disclosed herein.
Claims (14)
- 一种视频处理的方法,包括:A video processing method, comprising:基于第一视频的访问信息,从所述第一视频中确定视频片段,所述访问信息指示所述第一视频的访问统计数据随视频时间的分布;Determining a video segment from the first video based on access information of the first video, the access information indicating a distribution of access statistics of the first video over video time;确定所述视频片段与所述第一视频的语义连续性;以及determining semantic continuity between the video segment and the first video; and响应于所述语义连续性高于阈值,通过将所述视频片段附加至所述第一视频前,生成第二视频。In response to the semantic continuity being higher than a threshold, a second video is generated by appending the video segment to the front of the first video.
- 根据权利要求1所述的方法,其中确定所述视频片段包括:The method of claim 1, wherein determining the video segment comprises:基于所述访问信息,确定所述第一视频的目标时刻,其中所述第一视频在所述目标时刻的所述访问统计数据满足阈值要求;Determining a target time of the first video based on the access information, wherein the access statistics data of the first video at the target time meets a threshold requirement;基于对所述第一视频的文本内容的语义识别,确定与所述目标时刻相关联的时间区间,其中所述时间区间对应的文本片段具有连续的语义;以及Determining a time interval associated with the target moment based on semantic recognition of text content of the first video, wherein a text segment corresponding to the time interval has continuous semantics; and获取与所述时间区间对应的所述视频片段。The video segment corresponding to the time interval is obtained.
- 根据权利要求2所述的方法,其中获取与所述时间区间对应的所述视频片段包括:The method according to claim 2, wherein obtaining the video segment corresponding to the time interval comprises:利用分镜模型,将所述第一视频切分为一组分镜片段;以及Using the storyboard model, dividing the first video into a group of storyboard segments; and响应于所述时间区间与多个分镜片段相关联,基于所述多个分镜片段生成所述视频片段。In response to the time interval being associated with a plurality of storyboard segments, the video segment is generated based on the plurality of storyboard segments.
- 根据权利要求2所述的方法,其中所述时间区间的长度在预设的长度范围内。The method according to claim 2, wherein the length of the time interval is within a preset length range.
- 根据权利要求2所述的方法,其中所述文本片段对应于所述文本内容中的单个完整语句,所述单个完整语句是基于对所述文本内容添加标点而被确定的。The method according to claim 2, wherein the text segment corresponds to a single complete sentence in the text content, and the single complete sentence is determined based on adding punctuation to the text content.
- 根据权利要求2所述的方法,其中确定所述时间区间包括:The method according to claim 2, wherein determining the time interval comprises:响应于所述目标时刻未落入目标时间范围内,确定所述时间区间,其中所述目标时间范围包括:与所述第一视频的起始时刻相关的第一预设时长,和/或与所述第一视频的结束时刻相关的第二预设时长。 In response to the target moment not falling within the target time range, the time interval is determined, wherein the target time range includes: a first preset duration related to the start moment of the first video, and/or a second preset duration related to the end moment of the first video.
- 根据权利要求1所述的方法,其中确定所述视频片段与所述第一视频的语义连续性包括:The method of claim 1, wherein determining semantic continuity between the video segment and the first video comprises:利用分析模型处理所述视频片段和所述第一视频的特征,确定所述语义连续性,所述特征包括以下至少一项:视频的视觉特征,视频的语音特征或视频的文本特征。The semantic continuity is determined by processing features of the video clip and the first video using an analysis model, wherein the features include at least one of the following: visual features of the video, voice features of the video, or text features of the video.
- 根据权利要求7所述的方法,其中所述分析模型基于以下样本数据而被训练:The method according to claim 7, wherein the analysis model is trained based on the following sample data:正样本数据,包括第一视频片段和第二视频片段,所述第一视频片段与参考视频的首个语义连续镜头相关,所述第二视频是所述参考视频中不同于所述第一视频片段的其它视频片段;或者Positive sample data includes a first video segment and a second video segment, wherein the first video segment is related to the first semantic continuous shot of a reference video, and the second video segment is another video segment in the reference video that is different from the first video segment; or负样本数据,包括来自于不同视频的第三视频片段和第四视频片段。The negative sample data includes a third video segment and a fourth video segment from different videos.
- 根据权利要求1所述的方法,其中所述访问统计数据至少指示:视频点击率和/或用户流失率。The method according to claim 1, wherein the access statistics at least indicate: video click-through rate and/or user churn rate.
- 根据权利要求1所述的方法,还包括:The method according to claim 1, further comprising:基于视频集中视频的播放数和/或点击数,从所述视频集中确定第一视频。A first video is determined from the video set based on the number of views and/or clicks of the videos in the video set.
- 根据权利要求1所述的方法,其中所述第一视频包括广告视频。The method of claim 1, wherein the first video comprises an advertisement video.
- 一种用于视频处理的装置,包括:A device for video processing, comprising:确定模块,被配置为基于第一视频的访问信息,从所述第一视频中确定视频片段,所述访问信息指示所述第一视频的访问统计数据随视频时间的分布;a determination module configured to determine a video segment from a first video based on access information of the first video, wherein the access information indicates a distribution of access statistics of the first video over video time;判断模块,被配置为确定所述视频片段与所述第一视频的语义连续性;以及a judgment module, configured to determine semantic continuity between the video segment and the first video; and编辑模块,被配置为响应于所述语义连续性高于阈值,通过将所述视频片段附加至所述第一视频前,生成第二视频。an editing module, configured to generate a second video by appending the video segment to the front of the first video in response to the semantic continuity being higher than a threshold.
- 一种电子设备,包括:An electronic device, comprising:至少一个处理单元;以及 at least one processing unit; and至少一个存储器,所述至少一个存储器被耦合到所述至少一个处理单元并且存储用于由所述至少一个处理单元执行的指令,所述指令在由所述至少一个处理单元执行时使所述电子设备执行根据权利要求1至11中任一项所述的方法。At least one memory, the at least one memory being coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions causing the electronic device to perform the method according to any one of claims 1 to 11 when executed by the at least one processing unit.
- 一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序可由处理器执行以实现根据权利要求1至11中任一项所述的方法。 A computer-readable storage medium having a computer program stored thereon, wherein the computer program can be executed by a processor to implement the method according to any one of claims 1 to 11.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310126804.5A CN116112743A (en) | 2023-02-01 | 2023-02-01 | Video processing method, device, equipment and storage medium |
CN202310126804.5 | 2023-02-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024160260A1 (en) | 2024-08-08 |
Family
ID=86265334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/075333 WO2024160260A1 (en) | 2023-02-01 | 2024-02-01 | Video processing method and apparatus, device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116112743A (en) |
WO (1) | WO2024160260A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030234803A1 (en) * | 2002-06-19 | 2003-12-25 | Kentaro Toyama | System and method for automatically generating video cliplets from digital video |
WO2018062795A1 (en) * | 2016-09-28 | 2018-04-05 | (주) 프람트 | Timeline-based social network service providing system |
US10455297B1 (en) * | 2018-08-29 | 2019-10-22 | Amazon Technologies, Inc. | Customized video content summary generation and presentation |
CN111935503A (en) * | 2020-06-28 | 2020-11-13 | 百度在线网络技术(北京)有限公司 | Short video generation method and device, electronic equipment and storage medium |
US10917704B1 (en) * | 2019-11-12 | 2021-02-09 | Amazon Technologies, Inc. | Automated video preview generation |
CN114245229A (en) * | 2022-01-29 | 2022-03-25 | 北京百度网讯科技有限公司 | Short video production method, device, equipment and storage medium |
CN115460455A (en) * | 2022-09-06 | 2022-12-09 | 上海硬通网络科技有限公司 | Video editing method, device, equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542845B (en) * | 2020-04-16 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Information display method, device, equipment and storage medium |
CN111988638B (en) * | 2020-08-19 | 2022-02-18 | 北京字节跳动网络技术有限公司 | Method and device for acquiring spliced video, electronic equipment and storage medium |
CN113099129A (en) * | 2021-01-27 | 2021-07-09 | 北京字跳网络技术有限公司 | Video generation method and device, electronic equipment and storage medium |
CN115086709A (en) * | 2021-03-10 | 2022-09-20 | 上海哔哩哔哩科技有限公司 | Dynamic cover setting method and system |
CN114445754A (en) * | 2022-01-29 | 2022-05-06 | 北京有竹居网络技术有限公司 | Video processing method and device, readable medium and electronic equipment |
CN115052188B (en) * | 2022-05-09 | 2024-08-27 | 北京有竹居网络技术有限公司 | Video editing method, device, equipment and medium |
- 2023-02-01: CN CN202310126804.5A patent/CN116112743A/en active Pending
- 2024-02-01: WO PCT/CN2024/075333 patent/WO2024160260A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN116112743A (en) | 2023-05-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24749739; Country of ref document: EP; Kind code of ref document: A1 |