
WO2024193538A1 - Video data processing method, apparatus, device, and readable storage medium - Google Patents

Video data processing method, apparatus, device, and readable storage medium

Info

Publication number
WO2024193538A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
work
processed
videos
attribute information
Application number
PCT/CN2024/082438
Other languages
English (en)
French (fr)
Inventor
余萌
邓昉熙
潘德辉
Original Assignee
北京搜狗科技发展有限公司
Application filed by 北京搜狗科技发展有限公司
Publication of WO2024193538A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8455: Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream

Definitions

  • the present application relates to the field of computer technology, and in particular to a video data processing method, device, equipment and readable storage medium.
  • when a user queries a movie or TV series, the query result display box will also recommend film and television commentary videos related to that movie or TV series IP.
  • because video editing is easy to get started with, most editing bloggers are not professionals and have no fixed editing direction or arrangement when editing videos. As a result, the edited film and television commentary videos may lack video information (such as a title or episode number), or the added video information may not be accurate enough.
  • under the current query mechanism, the film and television commentary videos associated with the search keywords entered by the user are queried, and all of these videos are mixed together in the query result display box. Commentary videos with incomplete video information may therefore appear in the query result display box, making it difficult to present the viewing order among the commentary videos under the same movie or TV series IP. Users have to click on the commentary videos in the query result display box one by one to determine which videos are of interest and in what order they should be watched, resulting in a poor presentation of the searched film and television commentary videos.
  • the embodiments of the present application provide a video data processing method, apparatus, device and readable storage medium, which can improve the presentation effect of searched film and television commentary videos.
  • An embodiment of the present application provides a video data processing method, including:
  • acquiring M videos to be processed, where M is a positive integer;
  • performing feature extraction on each of the M videos to be processed to obtain video attribute information corresponding to each video to be processed, and obtaining source label information corresponding to each video to be processed;
  • the video attribute information includes work attribute information and episode attribute information;
  • classifying the M videos to be processed according to the source label information to obtain an initial video set, and determining the videos to be processed that have target work attribute information in the initial video set as videos to be sorted; and
  • sorting the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted to obtain sorted videos; if the episode attribute information corresponding to the sorted videos meets the episode legality condition, determining the sorted videos as ordered album videos and generating a video album set containing the ordered album videos; the video album set is used to be displayed in the query result display box when query data matches the work attribute information or source label information corresponding to the ordered album videos.
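To make the claimed flow easier to follow, here is a minimal Python sketch of the group-sort-validate pipeline, under the assumption that episodes must run 1..N without gaps and end at the work's total episode count. All names (`VideoMeta`, `build_album_sets`, the attribute fields) are hypothetical illustrations, not the application's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class VideoMeta:
    source: str          # source label information (e.g., the editing blogger)
    work: str            # work attribute information (the movie / TV series IP)
    episode: int         # episode attribute information
    total_episodes: int  # total episodes of the work the video belongs to

def build_album_sets(videos: list[VideoMeta]) -> dict[str, dict[str, list[VideoMeta]]]:
    """Group by source label, then by work, sort by episode, and keep only
    groups whose episodes are continuous and complete (the legality condition)."""
    albums: dict[str, dict[str, list[VideoMeta]]] = defaultdict(dict)
    by_source: dict[str, list[VideoMeta]] = defaultdict(list)
    for v in videos:                          # initial video sets: same source label
        by_source[v.source].append(v)
    for source, initial_set in by_source.items():
        by_work: dict[str, list[VideoMeta]] = defaultdict(list)
        for v in initial_set:                 # videos to be sorted: same work attribute
            by_work[v.work].append(v)
        for work, group in by_work.items():
            group.sort(key=lambda v: v.episode)            # sorted videos
            episodes = [v.episode for v in group]
            continuous = episodes == list(range(1, len(episodes) + 1))
            complete = bool(episodes) and episodes[-1] == group[0].total_episodes
            if continuous and complete:                    # episode legality condition
                albums[source][work] = group               # ordered album videos
    return albums
```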
  • An embodiment of the present application provides a video data processing method, including:
  • displaying input target query data in a query box of an application page; in response to a trigger operation for the target query data, if the intent type of the target query data is a video intent type, displaying a recommendation result display area in a query result display box on the application page;
  • displaying, in sequence in the recommendation result display area, the ordered album videos contained in the target video album set;
  • the target video album set is a video album set whose work attribute information or source tag information matches the target query data, and the target video album set includes one or more ordered album videos corresponding to the work attribute information;
  • the display order of the ordered album videos with the same work attribute information is sorted according to the episode number order between the corresponding episode attribute information;
  • the ordered album videos in the target video album set belong to the commentary video type.
  • An embodiment of the present application provides a video data processing device, including:
  • the acquisition module is used to acquire M videos to be processed; M is a positive integer;
  • a feature extraction module is used to extract features from the M videos to be processed, obtain video attribute information corresponding to each video to be processed, and obtain source label information corresponding to each video to be processed;
  • the video attribute information includes work attribute information and episode attribute information;
  • the video determination module is used to classify the M videos to be processed according to the source label information to obtain an initial video set, and determine the videos to be processed with the target work attribute information in the initial video set as the videos to be sorted; each video to be processed in the initial video set has the same source label information; the work attribute information involved in the M videos to be processed includes the target work attribute information;
  • a generation module is used to sort the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, so as to obtain the sorted videos. If the episode attribute information corresponding to the sorted videos meets the legal episode condition, the sorted videos are determined to be ordered album videos, and a video album set containing the ordered album videos is generated; the video album set is used to be displayed in the query result display box when the query data matches the work attribute information or source tag information corresponding to the ordered album videos.
  • the M videos to be processed include a video to be processed Mi, where i is a positive integer less than or equal to M;
  • the feature extraction module includes:
  • a first extraction unit, used to perform work attribute extraction on the video to be processed Mi to obtain the work attribute information corresponding to the video to be processed Mi;
  • a second extraction unit, used to perform episode attribute extraction on the video to be processed Mi to obtain the episode attribute information corresponding to the video to be processed Mi.
  • the first extraction unit includes:
  • a frame retrieval subunit, used to perform sampling processing on the video to be processed Mi to obtain a video frame image;
  • the frame retrieval subunit is further used to perform picture matching processing on the video frame image and the video works in the video work library, respectively, to obtain the picture similarity between each video work in the video work library and the video frame image;
  • the frame retrieval subunit is further used to determine the video work with the highest picture similarity to the video frame image as the target video work;
  • the frame retrieval subunit is further used to determine the video work attribute information corresponding to the target video work as the work attribute information corresponding to the video to be processed Mi if the picture similarity between the video frame image and the target video work is greater than or equal to the picture similarity threshold.
  • the first extraction unit is specifically used to perform equal-interval sampling processing on the video to be processed Mi to obtain multiple video frame images, traverse the multiple video frame images, and perform picture matching processing on the i-th video frame image among the multiple video frame images with the video works in the video work library to obtain the picture similarity between each video work in the video work library and the i-th video frame image, where i is a positive integer less than or equal to the number of the multiple video frame images;
  • the first extraction unit is specifically used to obtain the video work with the highest picture similarity to the i-th video frame image as the pending video work corresponding to the i-th video frame image and mark that pending video work; when the pending video work corresponding to each of the multiple video frame images has been marked, the video work attribute information corresponding to the pending video work with the largest number of markings is determined as the work attribute information corresponding to the video to be processed Mi.
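The frame-retrieval voting just described can be pictured with the following sketch. It assumes frames have already been sampled at equal intervals and embedded as vectors; the embedding model, the `library` layout, and the 0.85 threshold are illustrative assumptions, not the application's actual values.

```python
import numpy as np

def vote_work_attribute(frame_embeddings: np.ndarray,
                        library: dict[str, np.ndarray],
                        sim_threshold: float = 0.85) -> str | None:
    """For each sampled frame, find the library work with the highest picture
    similarity, mark (vote for) it, and return the most-marked work."""
    votes: dict[str, int] = {}
    for frame in frame_embeddings:                  # frames sampled at equal intervals
        best_work, best_sim = None, -1.0
        for work, work_frames in library.items():   # frames of each candidate work
            # picture similarity: max cosine similarity over the work's frames
            sims = work_frames @ frame / (
                np.linalg.norm(work_frames, axis=1) * np.linalg.norm(frame) + 1e-9)
            if sims.max() > best_sim:
                best_work, best_sim = work, float(sims.max())
        if best_work is not None and best_sim >= sim_threshold:
            votes[best_work] = votes.get(best_work, 0) + 1
    return max(votes, key=votes.get) if votes else None
```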
  • the first extraction unit includes:
  • a template matching subunit, used to obtain the video title information corresponding to the video to be processed Mi;
  • the template matching subunit is further used to perform structural matching processing on the video title information and the title templates in the title template library to obtain the structural similarity between each title template in the title template library and the video title information;
  • the template matching subunit is further used to determine the title template with the highest structural similarity to the video title information as the target title template;
  • the template matching subunit is further used to extract information from the video title information according to the target title template to obtain the work attribute information corresponding to the video to be processed Mi if the structural similarity between the video title information and the target title template is greater than or equal to the structural similarity threshold.
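One plausible reading of "structural matching" between a title and a title template is to compare the template's fixed skeleton with the title and then pull the work name out of the variable slot. The sketch below is only an illustration under that assumption; the template strings and the 0.3 threshold are invented for the example.

```python
import difflib
import re

# Hypothetical title templates: "{IP}" marks the slot holding the work name.
TEMPLATES = ["《{IP}》解说", "{IP} 第{EP}集 解说", "[{IP}] 影视解说"]

def extract_work_by_template(title: str, threshold: float = 0.3):
    """Pick the template whose fixed skeleton is most similar to the title,
    then extract the work name from the slot position."""
    best_tpl, best_sim = None, -1.0
    for tpl in TEMPLATES:
        skeleton = re.sub(r"\{[A-Z]+\}", "", tpl)          # drop variable slots
        sim = difflib.SequenceMatcher(None, skeleton, title).ratio()
        if sim > best_sim:
            best_tpl, best_sim = tpl, sim
    if best_tpl is None or best_sim < threshold:
        return None                                         # no structural match
    # turn the winning template into a regex with named capture groups
    pattern = re.escape(best_tpl)
    pattern = pattern.replace(re.escape("{IP}"), r"(?P<IP>.+?)")
    pattern = pattern.replace(re.escape("{EP}"), r"(?P<EP>\d+)")
    m = re.fullmatch(pattern, title)
    return m.group("IP") if m else None

print(extract_work_by_template("《TV series B》解说"))  # "TV series B"
```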
  • the first extraction unit comprises:
  • a propagation matching subunit, used to traverse and obtain the k-th sample video in the sample video library, where k is a positive integer;
  • the propagation matching subunit is further used to perform picture matching processing on the video to be processed Mi and the k-th sample video to obtain the video picture similarity;
  • the propagation matching subunit is further used to calculate the similarity between the video title information of the video to be processed Mi and the video title information corresponding to the k-th sample video to obtain the video title similarity;
  • the propagation matching subunit is further used to obtain the video click log associated with the video to be processed Mi and the k-th sample video, and perform click analysis on the video click log to obtain the video click similarity;
  • the propagation matching subunit is further used to determine the video similarity between the video to be processed Mi and the k-th sample video according to the video picture similarity, the video title similarity and the video click similarity;
  • the propagation matching subunit is further used for, if the video similarity is greater than the video similarity threshold, weighting the video work confidence of the k-th sample video for the associated work according to the video similarity to obtain the work confidence of the video to be processed Mi for the associated work; the video work confidence of the k-th sample video for the associated work is used to characterize the credibility of the k-th sample video belonging to the associated work;
  • the propagation matching subunit is further used to determine the video work attribute information corresponding to the associated work as the work attribute information corresponding to the video to be processed Mi if the work confidence is greater than or equal to the work confidence threshold.
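The label-propagation step above fuses three similarities and, past a threshold, transfers the sample video's work confidence to the video being processed. Here is a sketch under assumed fusion weights; the weights and thresholds are not specified in the application.

```python
def propagate_work_confidence(picture_sim: float,
                              title_sim: float,
                              click_sim: float,
                              sample_confidence: float,
                              sim_threshold: float = 0.7,
                              conf_threshold: float = 0.6,
                              weights=(0.5, 0.3, 0.2)) -> float | None:
    """Fuse the three similarities; if the fused video similarity passes the
    threshold, weight the sample's work confidence to obtain the confidence of
    the video to be processed for the same associated work."""
    video_sim = (weights[0] * picture_sim
                 + weights[1] * title_sim
                 + weights[2] * click_sim)
    if video_sim <= sim_threshold:
        return None                      # too dissimilar: no label propagation
    confidence = video_sim * sample_confidence
    return confidence if confidence >= conf_threshold else None

# e.g. a near-duplicate of a sample that is 0.9 confident for "TV series B":
print(propagate_work_confidence(0.95, 0.8, 0.7, 0.9))  # ~0.77 -> accepted
```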
  • the second extraction unit comprises:
  • a frame matching subunit, used to obtain, from the video work library, the video work having the work attribute information corresponding to the video to be processed Mi, and use the obtained video work as the video work to be matched;
  • the frame matching subunit is further used to perform sampling processing on the video to be processed Mi to obtain a video frame image;
  • the frame matching subunit is further used to perform picture matching processing on the video frame image and the video work pictures in the video work to be matched, so as to obtain the video work picture matching the video frame image;
  • the frame matching subunit is further used to determine the episode number information corresponding to the video work picture that matches the video frame image as the episode attribute information corresponding to the video to be processed Mi.
  • the second extraction unit comprises:
  • a title matching subunit, used to perform video layout character recognition processing on the cover image of the video to be processed Mi to obtain the cover title information corresponding to the video to be processed Mi;
  • the title matching subunit is further used to perform structural matching processing on the cover title information and the episode templates in the episode template library, respectively, to obtain the structural similarity between each episode template in the episode template library and the cover title information;
  • the title matching subunit is further used to determine the episode template with the highest structural similarity to the cover title information as the target episode template;
  • the title matching subunit is further used to extract information from the cover title information according to the target episode template, if the structural similarity between the cover title information and the target episode template is greater than or equal to the structural similarity threshold, so as to obtain the episode attribute information corresponding to the video to be processed Mi.
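For the episode-template branch, a simple concrete realization is to run the OCR'd cover title through a small set of episode patterns. The patterns below are hypothetical examples of Chinese episode formats, not the application's actual template library.

```python
import re

# Hypothetical episode templates, expressed as regexes over the OCR'd cover title.
EPISODE_PATTERNS = [
    re.compile(r"第\s*(?P<ep>\d+)\s*集"),                 # "第3集"  -> episode 3
    re.compile(r"(?P<ep>\d+)\s*-\s*(?P<ep2>\d+)\s*集"),   # "1-2集" -> episodes 1-2
    re.compile(r"[Ee][Pp]\s*(?P<ep>\d+)"),                # "EP03"  -> episode 3
]

def extract_episode(cover_title: str):
    """Return the episode number range found in the cover title, or None."""
    for pat in EPISODE_PATTERNS:
        m = pat.search(cover_title)
        if m:
            first = int(m.group("ep"))
            last = int(m.group("ep2")) if "ep2" in m.groupdict() else first
            return (first, last)
    return None

print(extract_episode("《TV series B》第3集 高能解说"))  # (3, 3)
```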
  • the generation module includes:
  • a sorting unit, used to sort the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, to obtain sorted videos;
  • a detection unit, used to perform continuity detection on the episode attribute information corresponding to the sorted videos to obtain a continuity detection result;
  • a version identification unit, used to perform video version identification processing on the sorted videos according to the target work knowledge graph to obtain a target video version corresponding to the sorted videos if the continuity detection result is an episode-continuous result;
  • the target work knowledge graph is a work knowledge graph associated with the work attribute information corresponding to the sorted videos;
  • an episode determination unit, used to determine the total episode information corresponding to the sorted videos according to the target video version in the target work knowledge graph;
  • a video determination unit, configured to determine that the episode attribute information corresponding to the sorted videos meets the episode legality condition if the largest episode attribute information among the episode attribute information corresponding to the sorted videos is the same as the total episode information;
  • the video determination unit is further configured to determine the sorted videos as ordered album videos if the episode attribute information corresponding to the sorted videos meets the episode legality condition;
  • an album generation unit, used to generate a video album set containing the ordered album videos.
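The continuity detection and episode legality condition can be summarized in a few lines. This sketch assumes episodes are integers starting at 1; the application only states continuity plus "largest episode equals total episodes", so the starts-at-one check is an added assumption.

```python
def episodes_are_legal(sorted_episodes: list[int], total_episodes: int) -> bool:
    """Continuity detection plus the episode legality condition: episodes must
    run 1, 2, ..., N without gaps, and the largest episode must equal the total
    episode count recorded for the identified video version."""
    if not sorted_episodes:
        return False
    continuous = all(b - a == 1 for a, b in zip(sorted_episodes, sorted_episodes[1:]))
    starts_at_one = sorted_episodes[0] == 1
    complete = sorted_episodes[-1] == total_episodes
    return continuous and starts_at_one and complete

print(episodes_are_legal([1, 2, 3], 3))   # True  -> ordered album videos
print(episodes_are_legal([1, 3], 3))      # False -> discontinuous, discarded
```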
  • the target work knowledge graph includes one or more video versions and a video object list corresponding to each video version;
  • the version identification unit includes:
  • an overlap determination subunit, used to perform object recognition processing on the sorted videos to obtain multiple video objects contained in the sorted videos and the appearance duration corresponding to each video object;
  • the overlap determination subunit is further used to obtain R target video objects from the multiple video objects according to the duration order among the appearance durations corresponding to the video objects, where R is a positive integer;
  • the overlap determination subunit is further used to determine the object overlap between the R target video objects and each video object list in the target work knowledge graph; the object overlap refers to the overlap between the video objects included in a video object list and the R target video objects;
  • a version determination subunit, used to determine the video version corresponding to the video object list with the largest object overlap as the target video version corresponding to the sorted videos.
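Version identification by object overlap reduces to picking the version whose object list shares the most members with the R recognized objects. Here is a minimal sketch with an invented knowledge-graph fragment.

```python
def identify_version(target_objects: set[str],
                     version_object_lists: dict[str, set[str]]) -> str | None:
    """Pick the video version whose object list overlaps most with the R target
    video objects recognized in the sorted videos (e.g., the main cast)."""
    best_version, best_overlap = None, -1
    for version, objects in version_object_lists.items():
        overlap = len(target_objects & objects)   # object overlap degree
        if overlap > best_overlap:
            best_version, best_overlap = version, overlap
    return best_version

graph = {  # hypothetical knowledge-graph entries for one work
    "2005 version": {"actor A", "actor B", "actor C"},
    "2017 remake":  {"actor D", "actor E", "actor C"},
}
print(identify_version({"actor A", "actor C"}, graph))  # "2005 version"
```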
  • the number of ordered album videos is at least two;
  • the album generation unit includes:
  • the cover determination subunit is used to traverse at least two ordered album videos and sequentially obtain the j-th ordered album video, where j is a positive integer;
  • the cover determination subunit is further used to perform relevance matching on the video cover corresponding to the j-th ordered album video and the video title corresponding to the j-th ordered album video to obtain a relevance matching result;
  • the cover determination subunit is further used to determine the video cover corresponding to the j-th ordered album video as the album video cover corresponding to the j-th ordered album video if the relevance matching result is a relevance matching success result;
  • the cover determination subunit is further used for, if the relevance matching result is a relevance matching failure result, performing video frame screening processing on the j-th ordered album video, obtaining a video frame picture matching the video title corresponding to the j-th ordered album video, and determining the video frame picture as the album video cover corresponding to the j-th ordered album video;
  • the generating subunit is used to generate a video album set containing the album video cover corresponding to each ordered album video when the album video cover corresponding to each ordered album video is obtained.
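The cover-selection logic above can be sketched as follows, assuming some image-text relevance scorer is available (e.g., a CLIP-style model). `image_text_relevance` is a hypothetical callable and the threshold is illustrative.

```python
def choose_album_cover(video_cover, video_title: str, candidate_frames,
                       image_text_relevance, threshold: float = 0.5):
    """If the existing cover is relevant to the title, keep it; otherwise screen
    the video frames and pick the one most relevant to the title.
    image_text_relevance(image, text) -> float is an assumed scorer."""
    if image_text_relevance(video_cover, video_title) >= threshold:
        return video_cover                     # relevance matching success
    # relevance matching failure: video frame screening
    return max(candidate_frames,
               key=lambda frame: image_text_relevance(frame, video_title))
```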
  • the above-mentioned video data processing device further includes:
  • a filtering module used for obtaining a first initial video set
  • the filtering module is further used to perform black edge detection on the first initial video set to obtain a black edge ratio corresponding to each initial video in the first initial video set;
  • the filtering module is further used to filter the initial videos whose black edge ratio is greater than a black edge ratio threshold from the first initial video set to obtain a second initial video set;
  • the filtering module is further used to perform watermark detection on the second initial video set to obtain the watermark area ratio corresponding to each initial video in the second initial video set;
  • the filtering module is further used to filter the initial videos whose watermark area ratio is greater than the watermark area ratio threshold from the second initial video set to obtain a third initial video set;
  • the filtering module is further used to perform definition (sharpness) recognition on the third initial video set to obtain the definition corresponding to each initial video in the third initial video set;
  • the filtering module is further used to filter out, from the third initial video set, the initial videos whose definition is lower than the definition threshold, to obtain the M videos to be processed.
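The three-stage quality screen (black edges, then watermarks, then definition) is a straightforward cascade of filters. A sketch with invented thresholds and a hypothetical `InitialVideo` record:

```python
from dataclasses import dataclass

@dataclass
class InitialVideo:
    black_edge_ratio: float   # fraction of the picture occupied by black borders
    watermark_ratio: float    # fraction of the picture occupied by watermarks
    sharpness: float          # definition score, higher is clearer

def quality_filter(videos: list[InitialVideo],
                   max_black_edge: float = 0.15,
                   max_watermark: float = 0.05,
                   min_sharpness: float = 0.6) -> list[InitialVideo]:
    """Apply the three screening stages in order: black edge detection,
    watermark detection, then definition (sharpness) recognition."""
    second = [v for v in videos if v.black_edge_ratio <= max_black_edge]
    third = [v for v in second if v.watermark_ratio <= max_watermark]
    return [v for v in third if v.sharpness >= min_sharpness]  # M videos to process
```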
  • An embodiment of the present application provides a video data processing device, including:
  • a first display module used to display the input target query data in the query box of the application page
  • a response module used to respond to a trigger operation for target query data, and if the intent type of the target query data is a video intent type, a recommendation result display area is displayed in the query result display box of the application page;
  • the second display module is used to sequentially display the ordered album videos included in the target video album set in the recommendation result display area;
  • the target video album set is a video album set whose work attribute information or source tag information matches the target query data, and the target video album set includes one or more ordered album videos corresponding to the work attribute information;
  • the display order of the ordered album videos with the same work attribute information is sorted according to the episode order between the corresponding episode attribute information;
  • the ordered album videos in the target video album set belong to the commentary video type.
  • an embodiment of the present application provides a computer device, including: a processor, a memory, and a network interface;
  • the processor is connected to the memory and the network interface, where the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored.
  • the computer program is suitable for being loaded by a processor and executing the method in the embodiment of the present application.
  • an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method in the embodiment of the present application.
  • feature extraction can be performed on the M videos to be processed respectively to obtain the video attribute information and source label information corresponding to each video to be processed, where the video attribute information includes work attribute information and episode attribute information. Then, the videos to be processed with the same source label information can be added to the same video set to obtain an initial video set, and the videos to be processed with the target work attribute information in the initial video set are determined as videos to be sorted. Finally, the videos to be sorted are sorted according to their corresponding episode attribute information to obtain sorted videos; if the episode attribute information corresponding to the sorted videos meets the episode legality condition, the sorted videos are determined as ordered album videos, and a video album set containing the ordered album videos is generated.
  • the ordered album videos contained in the video album set obtained by the method provided in the embodiment of the present application correspond to the same work attribute information and source label information.
  • when the query data matches the work attribute information or source label information corresponding to the ordered album videos, the video album set can be displayed in the query result display box, thereby realizing structured video output and improving the display effect of the video corresponding to the query data.
  • the ordered album videos in the video album set are sorted according to the episode attribute information, and there is no need to click to watch one by one to determine the viewing order of the ordered album videos, thereby improving the presentation effect of the searched film and television commentary videos.
  • FIG. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application.
  • FIG. 2a is a schematic diagram of a scenario for generating a video album set provided in an embodiment of the present application.
  • FIG. 2b is a schematic diagram of a video query scenario provided in an embodiment of the present application.
  • FIG. 3 is a flow chart of a video data processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the overall process of a video clustering mining method provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of a video data processing device provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of a computer device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the structure of another video data processing device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of another computer device provided in an embodiment of the present application.
  • Artificial Intelligence is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines so that machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operating/interactive systems, mechatronics and other technologies.
  • Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, as well as machine learning/deep learning, autonomous driving, smart transportation and other major directions.
  • Computer vision is a science that studies how to make machines "see". More specifically, it refers to machine vision, such as using cameras and computers in place of human eyes to identify and measure targets, and further performing graphics processing so that the processed image becomes one more suitable for human observation or for transmission to instruments for detection.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and smart transportation.
  • the key technologies of speech technology include automatic speech recognition technology, speech synthesis technology and voiceprint recognition technology. Enabling computers to listen, see, speak and feel is the future development direction of human-computer interaction, among which speech has become one of the most promising human-computer interaction methods in the future.
  • Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that can achieve effective communication between people and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Therefore, research in this field will involve natural language, that is, the language people use in daily life, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, robot question answering, knowledge graph and other technologies.
  • the network architecture may include a server 100 and a terminal device cluster, and the terminal device cluster may include terminal device 200a, terminal device 200b, terminal device 200c, ..., terminal device 200n. Any terminal device in the terminal device cluster may have a communication connection with the server 100; for example, there is a communication connection between terminal device 200a and server 100. The communication connection does not limit the connection mode: it may be a direct or indirect connection by wired communication, a direct or indirect connection by wireless communication, or a connection by other modes, which is not limited in the present application.
  • each terminal device in the terminal cluster shown in FIG. 1 can be installed with an application client.
  • the application client can be an instant messaging application, a live broadcast application, a short video application, a video application, a music application, a social application, a shopping application, a game application, a novel application, a payment application, a browsing application, and other application clients with query functions.
  • the application client can be an independent client or an embedded sub-client integrated in a client (such as an instant messaging client, a social client, a video client, etc.), which is not limited here.
  • the server 100 can be used to respond to a query request sent by a terminal device through the short video application, and perform query processing on the query data of the video intent type contained in the query request. Therefore, each terminal device can transmit data with the server 100 through the short video application; for example, each terminal device can obtain, through the short video application, the data stream corresponding to the video album set matching the query data.
  • the terminal device 200a can display an application page through a short video application, and a query box can be displayed in the application page. After the terminal device 200a responds to the input operation, the input target query data can be displayed in the query box.
  • the intent type of the target query data is a video intent type; that is, the target query data can refer to data related to film and television works such as movies or TV series, for example, the name of a film and television IP, the actors in a film or TV series, or an editing blogger the user wants to watch.
  • the terminal device 200a can respond to the trigger operation for the target query data and send a query request containing the target query data to the server 100.
  • the server 100 can obtain a video album set whose work attribute information or source tag information matches the target query data in the video album set library as the target video album set, and then return the data stream corresponding to the target video album set to the terminal device 200a.
  • the work attribute information refers to the film and television IP information.
  • the source tag information refers to the source information of the video, for example, which editing blogger it comes from, or which website it comes from, etc.
  • the terminal device 200a can display the recommendation result display area in the query result display box of the application page; in the recommendation result display area, the ordered album videos of the commentary type contained in the target video album set are displayed in sequence.
  • the ordered album videos can be the commentary videos corresponding to each episode of a TV series; that is, if the TV series has 30 episodes in total, the ordered album videos corresponding to the TV series may be 30 commentary videos (e.g., each commentary video is edited from one episode of the TV series), and these 30 commentary videos are presented in the recommendation result display area in episode order.
  • the video album set in the video album set library can be generated by the server 100 according to the video data processing method provided in the embodiment of the present application.
  • the server 100 can obtain M videos to be processed, where M is a positive integer, and then perform feature extraction on the M videos to be processed respectively to obtain the video attribute information and source tag information corresponding to each video to be processed, and the video attribute information includes work attribute information and episode attribute information; then, the server 100 can add the videos to be processed with the same source tag information to the same video set to obtain an initial video set; optionally, if there is only one video to be processed associated with a certain source tag information, an initial video set containing only this one video to be processed can also be generated.
  • the server 100 can determine the videos to be processed with the target work attribute information in the initial video set as the videos to be sorted; finally, according to the episode attribute information corresponding to the videos to be sorted, the videos to be sorted are sorted and filtered to obtain ordered album videos, and a video album set containing ordered album videos is generated.
  • the process of sorting and filtering can be: sorting the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, obtaining sorted videos (i.e., sorting these videos to be sorted in the order of episodes), and then further detecting whether the episodes corresponding to these sorted videos (derived from the episode attribute information) are continuous, and whether the largest episode in these sorted videos is the same as the total episodes of the works to which they belong (such as the TV series to which these sorted videos belong).
  • the server 100 can also sort and filter the videos to be sorted that have other work attribute information in the initial video set according to the above process; if the episode legality condition is also met, the corresponding ordered album videos are generated as well. Therefore, a video album set can contain ordered album videos corresponding to multiple pieces of work attribute information.
  • the server 100 can associate the generated video album set with its corresponding work attribute information and source tag information and write them into the video album set library for storage, so that after receiving the query data sent by the terminal device, after determining the work attribute information or source tag information corresponding to the query data, the video album set matching the query data can be quickly obtained, and the corresponding data stream is returned to the terminal device.
  • the episode attribute information corresponding to the ordered album videos contained in the video album set obtained by the embodiment of the present application is continuous, which is convenient for quickly determining the viewing order, thereby improving the viewing efficiency.
  • the method provided in the embodiment of the present application can be executed by a computer device, and the computer device includes but is not limited to a terminal device or a server.
  • the server can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud database, cloud service, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and big data and artificial intelligence platform.
  • the terminal device can be a smart phone, tablet computer, laptop computer, desktop computer, PDA, mobile Internet device (mobile internet device, MID), wearable device (such as smart watch, smart bracelet, etc.), smart TV, smart car, etc., which can run instant messaging applications or social applications.
  • the terminal device and the server can be directly or indirectly connected by wired or wireless means, and the embodiment of the present application is not limited here.
  • when the query data and other related data involved are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • to facilitate understanding of the above-mentioned video album set generation and the display process of the target video album set when querying target query data, please refer to Figures 2a-2b.
  • the implementation process of Figures 2a-2b can be performed in the server 100 as shown in Figure 1, or in the terminal device (terminal device 200a, terminal device 200b, terminal device 200c or terminal device 200n as shown in Figure 1), and can also be performed jointly by the terminal device and the server.
  • the embodiment of the present application takes the joint execution of the terminal device 200b and the server 100 as an example for explanation.
  • Figure 2a is a schematic diagram of a scene for generating a video album set provided by an embodiment of the present application.
  • the server 100 can obtain M videos to be processed: video 1 to be processed, video 2 to be processed, ..., video M to be processed.
  • the video to be processed can be a video of film and television commentary, that is, a video edited according to part of the content of a movie or TV series and accompanied by a commentary.
  • the M videos to be processed can be videos obtained by the server 100 after quality screening of all videos that can be obtained in large quantities, so the source of each video to be processed may be different, the content of the film and television works involved may be different, and the corresponding video content presentation method and video publishing method may also be different. Therefore, the server 100 can classify and sort the M videos to be processed to obtain an orderly video album set. Specifically, after obtaining the M videos to be processed, the server 100 can first perform feature extraction on the M videos to be processed respectively to obtain the video attribute information and source tag information corresponding to each video to be processed.
  • the video attribute information may include work attribute information and episode attribute information.
  • the work attribute information is used to describe the film and television work involved in the video to be processed
  • the episode attribute information is used to describe which part of the content of the corresponding film and television work (such as which episode) the video to be processed involves
  • the source tag information refers to the source information of the video, for example, which editing blogger it comes from, or which website it comes from, etc.
  • the video attribute information 201 corresponding to the video to be processed 1 may be "TV series A, Episode 2", indicating that the video to be processed 1 is a film and television commentary video for the film and television content of the second episode of TV series A;
  • the video attribute information 202 corresponding to the video to be processed 2 may be "TV series B, Episode 1", indicating that the video to be processed 2 is a film and television commentary video for the film and television content of the first episode of TV series B;
  • the video attribute information 203 corresponding to the video to be processed M may be "Movie C, Part 1", indicating that the video to be processed M is a film and television commentary video for the film and television content of the first half of Movie C.
  • the server 100 can first classify the sources of the M videos to be processed, that is, the videos to be processed can be classified according to the source tag information, such as first adding the videos to be processed with the same source tag information to the same initial video set, so that each video to be processed in an initial video set has the same source tag information.
  • the server 100 can obtain multiple initial video sets, for example, an initial video set 204.
  • the initial video set 204 may include video to be processed 2, ..., and video to be processed a; that is, video to be processed 2, ..., and video to be processed a have the same source tag information, and the same is true for the other initial video sets.
  • the server 100 will determine the videos to be processed with the same work attribute information in each initial video set as videos to be sorted.
  • taking the initial video set 204 as an example, as shown in FIG. 2a, assume that the work attribute information corresponding to video to be processed 2, ..., and video to be processed c is TV series B, and the work attribute information corresponding to video to be processed 3 and video to be processed a is movie D.
  • the server 100 can determine the to-be-processed videos 2, ..., and to-be-processed videos c as to-be-sorted videos 205, and determine the to-be-processed videos 3 and to-be-processed videos a as to-be-sorted videos 206, and so on.
  • the server 100 can sort and filter each group of to-be-sorted videos (a group of to-be-sorted videos corresponds to one work attribute information, that is, the to-be-processed videos in a group of to-be-sorted videos all have the same work attribute information), that is, sort the group of to-be-sorted videos according to the episode attribute information corresponding to a group of to-be-sorted videos, and obtain sorted videos. If the episode attribute information corresponding to these sorted videos is continuous and complete, these sorted videos can be determined as ordered album videos, and then a video album set containing ordered album videos is generated.
  • the video album set can contain ordered album videos corresponding to one or more work attribute information, that is, it can be understood that the to-be-sorted videos corresponding to other work attribute information can also generate corresponding ordered album videos in the same way.
  • suppose that, after sorting, the video following video to be processed 2 is video to be processed c, but the episode attribute information corresponding to video to be processed 2 is episode 1 while the episode attribute information corresponding to video to be processed c is episode 3; that is to say, the videos to be sorted 205 contain no video to be processed related to the content of the second episode of TV series B.
  • the server 100 can consider that the video to be sorted 205 is out of order, and the server 100 can give up subsequent processing of the video to be sorted 205, that is, it will not generate its corresponding ordered album video.
  • after the server 100 sorts the videos to be sorted 206, it obtains video to be processed a followed by video to be processed 3, where the episode attribute information corresponding to video to be processed a is "Part 1" and the episode attribute information corresponding to video to be processed 3 is "Part 2". The server 100 can therefore determine that the episode attribute information corresponding to the videos in the videos to be sorted 206 is continuous and complete, determine video to be processed a and video to be processed 3 as ordered album videos, and generate a video album set 207 containing the ordered album videos (i.e., video to be processed a and video to be processed 3).
  • the server 100 can finally obtain multiple ordered video album sets, that is, one video album set corresponds to one source tag information, and one video album set can also contain ordered album videos corresponding to one or more work attribute information. There can be multiple ordered album videos corresponding to one work attribute information, and the display order of these multiple ordered album videos is displayed in the order of the episodes.
  • the work attribute information or source tag information matching the query data can be determined first, and then the ordered album videos corresponding to the work attribute information or source tag information matching the query data are returned to the terminal device.
  • the terminal device can display the input target query data in the query box of the application page, and then respond to the trigger operation for the target query data.
  • the intent type of the target query data is a video intent type
  • the recommended result display area is displayed in the query result display box of the application page; in the recommended result display area, the ordered album videos contained in the target video album set are displayed in sequence.
  • the target video album set is a video album set whose work attribute information or source tag information matches the target query data; the display order of the ordered album videos in the target video album set is sorted according to the episode order between the episode attribute information corresponding to the ordered album video; the ordered album videos in the target video album set belong to the commentary video type.
  • Figure 2b is a scene schematic diagram of a video query provided in an embodiment of the present application.
  • the object having an association relationship with the terminal device 200b is object 1, and a short video application is integrated and installed on the terminal device 200b.
  • Object 1 can exchange data with server 100 through the short video application of terminal device 200b.
  • terminal device 200b can display application page 31, which includes query box 311 and query result display box 312.
  • query box 311 is used to provide query function
  • query result display box 312 is used to display query result.
  • object 1 wants to watch film and television works, it can perform input operation through query box 311.
  • Terminal device 200b can display query content 311a input by object 1 in query box 311 of application page 31.
  • query content 311a can be "movie D".
  • a trigger operation can be performed on query content 311a.
  • the trigger operation can be a trigger operation on query control 311b.
  • after terminal device 200b responds to the trigger operation on query content 311a, it can send query content 311a to server 100.
  • the server 100 can perform query processing on the query content 311a, obtain query result data, and then return the query result data on the query content 311a to the terminal device 200b, and the terminal device 200b can display the query result in the result display box according to the query result data.
  • a feasible process of query processing is to first determine the intent type of query content 311a. If it is determined that the intent type of query content 311a is a video intent type, the server 100 will not only search the large amount of available video data for video data matching query content 311a as the first video data, but will also search the multiple ordered video album sets obtained in the scene shown in FIG. 2a for a video album set matching query content 311a, that is, a video album set matching "Movie D", such as the video album set 207 shown in FIG. 2a; the server 100 will use the video data corresponding to the video album set 207 as the second video data. Then, the server 100 will determine the first video data and the second video data as the query result data.
  • the terminal device 200b after receiving the query result data, the terminal device 200b will display the recommended result display area in the query result display box 312 of the application page 31, for example, the recommended result display area 312a and the recommended result display area 312b.
  • different recommended result display areas are used to display different video data, and the display level of the second video data takes precedence over the first video data. Therefore, the terminal device 200b will display the second video data in the recommended result display area 312a and the first video data in the recommended result display area 312b.
  • the terminal device 200b will display the video cover corresponding to each ordered album video in the recommended result display area 312a according to the position order of the ordered album videos contained in the video album set 207 (the position order matches the episode order corresponding to the ordered album videos). Because the video album set 207 sequentially contains the video a to be processed and the video 3 to be processed, the video cover 313 is the video cover corresponding to the video a to be processed, and the video cover 314 is the video cover corresponding to the video 3 to be processed. Then, the terminal device 200b will display the video cover corresponding to the first video data in the recommendation result display area 312b.
  • the terminal device when responding to query data for video intent type, the terminal device will first give priority to displaying an ordered video album collection, thereby realizing structured ordered video output, improving the display effect of the video corresponding to the query data, and the ordered album videos in the video album collection are sorted according to the episode attribute information, and there is no need to click to watch one by one to determine the viewing order of the ordered album videos, thereby improving the presentation effect of the searched film and television commentary videos.
  • FIG. 3 is a flow chart of a video data processing method provided in an embodiment of the present application.
  • the video data processing method can be executed by a computer device, and the computer device can include a terminal device or a server as shown in FIG. 1.
  • the method can include the following steps S101-S104:
  • Step S101 obtaining M videos to be processed; M is a positive integer.
  • the video to be processed refers to an edited video associated with a film or television work (i.e., the above-mentioned movie or TV series IP).
  • the video to be processed may be a film and television commentary video, that is, a video generated by a blogger editing part of the film and television content in a film and television work and adding corresponding commentary (which may be text commentary, voice commentary, or video commentary, etc.). It can be understood that a film and television commentary video can help users quickly understand the content outline of the film and television work.
  • Step S102 perform feature extraction on the M videos to be processed respectively, obtain video attribute information corresponding to each video to be processed respectively, and obtain source label information corresponding to each video to be processed respectively;
  • the video attribute information includes work attribute information and episode attribute information.
  • the work attribute information refers to the film and television work information corresponding to the video to be processed.
  • the work attribute information corresponding to the video to be processed A can be the name of the TV series, such as "BBB", which means that the video content of the video to be processed A belongs to the TV series "BBB".
  • the episode attribute information is used to indicate which time period of the film and television content the video to be processed corresponds to.
  • the episode attribute information corresponding to the video A to be processed is episode 1-2, which means that the video content in the video A to be processed involves the film and television content of the first and second episodes of the TV series "BBB".
  • the video A to be processed is generated based on the film and television content of the first and second episodes of the TV series "BBB".
  • the computer device can also obtain the source tag information corresponding to each video to be processed.
  • the source tag information refers to the source information of the video, for example, which editing blogger it comes from, or which website it comes from, etc.
  • the M videos to be processed include a video to be processed Mi, where i is a positive integer less than or equal to M.
  • feature extraction is described below by taking the video to be processed Mi as an example.
  • a feasible implementation process of performing feature extraction on the M videos to be processed respectively to obtain the video attribute information corresponding to each video to be processed may be: performing work attribute extraction on the video to be processed Mi to obtain the work attribute information corresponding to the video to be processed Mi; and performing episode attribute extraction on the video to be processed Mi to obtain the episode attribute information corresponding to the video to be processed Mi.
  • the work attribute extraction process may be performed in a variety of ways, such as video frame retrieval, title template matching, and label propagation.
  • the episode attribute extraction process may be performed in a variety of ways, such as video frame retrieval, title template matching, and label propagation.
  • a feasible implementation process of performing work attribute extraction on the video to be processed Mi to obtain the work attribute information corresponding to the video to be processed Mi can be: sampling the video to be processed Mi to obtain a video frame image; performing picture matching processing on the video frame image and the video works in the video work library respectively to obtain the picture similarities between the video works in the video work library and the video frame image; determining the video work with the highest picture similarity to the video frame image as the target video work; and, if the picture similarity between the video frame image and the target video work is greater than or equal to the picture similarity threshold, determining the video work attribute information corresponding to the target video work as the work attribute information corresponding to the video to be processed Mi.
  • For example, the video frame images may include a video frame image X, and the film and television works in the film and television work library may include a film and television work Y. Among the frame images included in the film and television work Y, the frame image with the highest similarity to the video frame image X is obtained as the target frame image, and the similarity between the target frame image and the video frame image X is determined as the screen similarity between the film and television work Y and the video frame image X.
  • the image similarity between the video frame image and the frame image may be calculated by the image representation vectors corresponding to the two images respectively, or may be obtained by other similarity comparison models, which are not limited here.
  • the feasible implementation process of determining the video work with the highest screen similarity to the video frame image can be: traverse multiple video frame images, perform screen matching processing on the i-th video frame image among the multiple video frame images and the video works in the video work library, and obtain the screen similarity between the video works in the video work library and the i-th video frame image; i is a positive integer less than or equal to the number of multiple video frame images; obtain the video work with the highest screen similarity to the i-th video frame image as the pending video work corresponding to the i-th video frame image, and mark the pending video work corresponding to the i-th video frame image.
  • the pending video work with the most marking times is determined as the video work with the highest screen similarity to the video frame image, and then the video work attribute information corresponding to the pending video work with the most marking times can be determined as the work attribute information corresponding to the video to be processed M i .
  • Using a larger number of video frame images to determine the most similar video work yields higher accuracy, that is, it helps ensure that the work attribute information determined for the video to be processed Mi is sufficiently accurate.
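  • As a minimal illustration of this frame-retrieval-plus-voting procedure, the following Python sketch assumes frames have already been sampled and embedded into vectors; the embedding representation, the library layout, and the 0.85 threshold are illustrative assumptions, not part of the application:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Compare image representation vectors by cosine similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def extract_work_attribute(frame_vecs, work_library, sim_threshold=0.85):
    """frame_vecs: embedding vectors of frames sampled from the video to be
    processed. work_library: dict mapping work name -> list of frame embeddings.
    Returns the mined work name, or None if nothing passes the threshold."""
    marks = {}  # pending video work -> number of times it was marked
    for vec in frame_vecs:
        best_work, best_sim = None, -1.0
        for work, lib_vecs in work_library.items():
            # The screen similarity between a work and a frame image is the
            # highest similarity among the work's own frame images.
            sim = max(cosine_sim(vec, lv) for lv in lib_vecs)
            if sim > best_sim:
                best_work, best_sim = work, sim
        if best_sim >= sim_threshold:
            marks[best_work] = marks.get(best_work, 0) + 1
    if not marks:
        return None  # no usable work attribute information
    # The pending video work with the most marks is taken as the result.
    return max(marks, key=marks.get)
```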
  • When the work attribute extraction process adopts the title template matching method, a feasible implementation of performing work attribute extraction on the video to be processed Mi to obtain its work attribute information can be: obtaining the video title information corresponding to the video to be processed Mi; performing structural matching between the video title information and the title templates in the title template library to obtain the structural similarity between each title template in the library and the video title information; determining the title template with the highest structural similarity to the video title information as the target title template; and, if the structural similarity between the video title information and the target title template is greater than or equal to the structural similarity threshold, performing information extraction on the video title information according to the target title template to obtain the work attribute information corresponding to the video to be processed Mi.
  • the title template in the title template library refers to a predefined text template used to extract the work attribute information in the video title information, that is, IP information.
  • For example, the title templates may include: "《IP》", "<IP>", "[IP]", "IP+number:", and "IP+number".
  • Suppose the video title information C corresponding to the video to be processed Mi is "《XXX》". After the computer device calculates the structural similarity between the video title information C and each title template in the title template library, it can determine that the target title template most similar to the video title information C is "《IP》". The computer device can then perform information extraction on the video title information C according to the target title template, obtaining the work attribute information corresponding to the video to be processed Mi as XXX.
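  • A minimal sketch of the title-template matching idea, using regular expressions as a stand-in for the structural matching described above; the concrete regexes and template spellings are illustrative assumptions:

```python
import re

# Each entry pairs a template form from the title template library with an
# illustrative regex whose first capture group is the work (IP) name.
TITLE_TEMPLATES = [
    ("《IP》", re.compile(r"《([^》]+)》")),
    ("<IP>", re.compile(r"<([^>]+)>")),
    ("[IP]", re.compile(r"\[([^\]]+)\]")),
    ("IP+number:", re.compile(r"^(.+?)\s*\d+\s*[::]")),
    ("IP+number", re.compile(r"^(.+?)\s*\d+\s*$")),
]

def extract_work_from_title(title: str):
    """Return the work name extracted from the video title, or None."""
    for _form, pattern in TITLE_TEMPLATES:
        match = pattern.search(title)
        if match:
            return match.group(1).strip()
    return None

print(extract_work_from_title("《XXX》episode 3 commentary"))  # -> XXX
```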
  • Label propagation predicts the label information of unlabeled nodes from the label information of labeled nodes by utilizing the relationships between samples.
  • When the work attribute extraction process adopts the label propagation method, a feasible implementation of performing work attribute extraction on the video to be processed Mi to obtain its work attribute information can be as follows: traverse the sample video library to obtain the kth sample video, where k is a positive integer; perform screen matching between the video to be processed Mi and the kth sample video to obtain the video screen similarity; perform similarity calculation between the video title information of the video to be processed Mi and the video title information corresponding to the kth sample video to obtain the video title similarity; obtain the video click logs associated with the video to be processed Mi and the kth sample video, and perform click analysis on the video click logs to obtain the video click similarity; determine the video similarity between the video to be processed Mi and the kth sample video according to the video screen similarity, the video title similarity and the video click similarity; if the video similarity is greater than the video similarity threshold, weight the video work confidence of the kth sample video for its associated work according to the video similarity to obtain the work confidence of the video to be processed Mi for the associated work; and, if the work confidence is greater than or equal to the work confidence threshold, determine the video work attribute information corresponding to the associated work as the work attribute information corresponding to the video to be processed Mi.
  • the video work confidence of the kth sample video for the associated work is used to characterize the credibility of the kth sample video belonging to the associated work.
  • the sample video and the video to be processed belong to the same type of video.
  • The sample videos in the sample video library can be regarded as nodes, each corresponding to an associated work label (the associated work label indicates the associated work corresponding to the sample video), and each associated work label carries a video work confidence (generated by the algorithm when the label was computed).
  • the video work confidence is used to characterize the credibility of the sample video belonging to the associated work indicated by the associated work label.
  • a video click log refers to a user's click behavior analysis log for a video within a certain period of time. It can be understood that there can be multiple video click logs associated with the video to be processed Mi and the kth sample video. Through these video click logs, the possibility of the user clicking on the video to be processed Mi and the kth sample video at the same time can be analyzed as the video click similarity.
  • The video similarity between the to-be-processed video Mi and the kth sample video can be determined from the video picture similarity, video title similarity and video click similarity either by adding the three and averaging, or by weighting the three before adding and averaging.
  • the specific method can be determined based on actual conditions and this application does not impose any restrictions on this.
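  • The following sketch shows one way the similarity fusion and confidence weighting could fit together; the thresholds and the multiplicative weighting are illustrative assumptions:

```python
def video_similarity(picture_sim, title_sim, click_sim, weights=(1.0, 1.0, 1.0)):
    # With equal weights this is the plain add-and-average variant; other
    # weights give the weighted add-and-average variant described above.
    w1, w2, w3 = weights
    return (w1 * picture_sim + w2 * title_sim + w3 * click_sim) / (w1 + w2 + w3)

def propagate_label(sim, sample_work, sample_confidence,
                    sim_threshold=0.7, conf_threshold=0.6):
    """Propagate the kth sample video's associated-work label to the video to
    be processed; returns the work name if the label is accepted, else None."""
    if sim <= sim_threshold:
        return None
    # Weight the sample's video work confidence by the video similarity.
    work_confidence = sim * sample_confidence
    return sample_work if work_confidence >= conf_threshold else None
```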
  • a feasible implementation process of extracting the episode attribute of the video to be processed Mi to obtain the episode attribute information corresponding to the video to be processed Mi can be as follows: from the video work library, obtain the video work with the work attribute information corresponding to the video to be processed Mi , and use the obtained video work as the video work to be matched; perform sampling processing on the video to be processed Mi to obtain a video frame image; perform screen matching processing on the video frame image and the video work screen in the video work to be matched to obtain the video work screen matching the video frame image; determine the episode information corresponding to the video work screen matching the video frame image as the episode attribute information corresponding to the video to be processed Mi.
  • the video work screen of the episode, minute and second in the video work to be matched corresponding to the video frame image can be located, so that it can be determined which part of the content of the video work to be matched is involved in the video to be processed Mi , thereby determining the episode attribute information.
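  • A minimal sketch of this episode-locating step, assuming frame embeddings and a pre-built index of (episode number, embedding) pairs for the matched work; both are assumptions for illustration:

```python
import numpy as np

def locate_episodes(frame_vecs, work_frame_index):
    """frame_vecs: embeddings of frames sampled from the video to be processed.
    work_frame_index: list of (episode_number, embedding) pairs built from the
    pictures of the matched video work. Returns the episodes the video spans."""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    episodes = set()
    for vec in frame_vecs:
        # The most similar work picture tells us which episode (and, in a
        # fuller version, which minute and second) the sampled frame comes from.
        episode, _ = max(work_frame_index, key=lambda pair: sim(vec, pair[1]))
        episodes.add(episode)
    return sorted(episodes)  # e.g. [1, 2] -> episode attribute "episodes 1-2"
```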
  • a feasible implementation process of performing episode attribute extraction processing on the video to be processed Mi to obtain the episode attribute information corresponding to the video to be processed Mi can be: performing video layout character recognition processing on the cover image of the video to be processed Mi to obtain the cover title information corresponding to the video to be processed Mi ; performing structural matching processing on the cover title information and the episode templates in the episode template library to obtain the structural similarity between the episode templates in the episode template library and the cover title information; determining the episode template with the highest structural similarity with the cover title information as the target episode template; if the structural similarity between the cover title information and the target episode template is greater than or equal to the structural similarity threshold, performing information extraction processing on the cover title information according to the target episode template to obtain the episode attribute information corresponding to the video to be processed Mi.
  • the video layout character recognition processing refers to the use of VideoLayout_OCR (video layout character recognition) technology, which can not only obtain the text information on the cover image, but also recognize the layout attributes of the regional text, such as title, subtitle, background text, etc., so as to determine the cover title information corresponding to the video to be processed Mi according to the layout attributes and text information.
  • VideoLayout_OCR refers to the technology that uses a three-branch multi-task neural network to combine text detection and attribute classification tasks into one.
  • the specific implementation process can refer to the implementation process when the work attribute extraction process adopts the title template matching method.
  • The template used here is the episode template for the episode attribute information. The episode attribute information can include two parts, episode and part, where episode represents the episode number and part represents the part number (such as upper/middle/lower, or 1/2/3). Therefore, the episode templates can be divided into episode-type templates for extracting episode information and part-type templates for extracting part information.
  • The episode-type templates may include: an ordinal prefix + Arabic/Chinese numerals + "issue", "episode" or "case", such as "the 1st issue" or "the second episode"; an ordinal prefix + Arabic numerals + "-" or "~" + Arabic numerals + "issue", "episode" or "case", such as "episodes 1-2"; and "EP" or "Part" + Arabic numerals, such as EP1 or Part1. In addition, if the video title contains the string "Grand Finale", the episode is considered to be the last episode.
  • The part-type templates may include: "(upper/middle/lower/number)", "upper/middle/lower/number", "[upper/middle/lower/number]", and number + "/" + number, such as 1/3. By matching these templates, the part information of the video is obtained, such as the upper, middle and lower parts, or 1/3, 2/3, 3/3. It can be understood that the computer device can match the two types of episode templates against the cover title information separately, and the two matches do not affect each other: if a part-type template matches the cover title information, the part information can be extracted; if an episode-type template matches the cover title information, the episode information can be extracted.
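  • The two template types could be sketched with regular expressions as follows; the concrete patterns are illustrative assumptions, and the two matches are applied independently, as described above:

```python
import re

# Episode-type templates extract the episode number; part-type templates
# extract the part. The concrete regexes are illustrative assumptions.
EPISODE_TEMPLATES = [
    re.compile(r"(?:EP|Part)\s*(\d+)", re.IGNORECASE),              # EP1, Part1
    re.compile(r"(\d+)\s*[-~]\s*(\d+)\s*(?:issue|episode|case)"),   # episodes 1-2
    re.compile(r"(\d+)\s*(?:issue|episode|case)"),                  # episode 1
]
PART_TEMPLATES = [
    re.compile(r"[(\[]?(upper|middle|lower)[)\]]?"),                # (upper)
    re.compile(r"(\d+)\s*/\s*(\d+)"),                               # 1/3, 2/3, 3/3
]

def extract_episode_attributes(cover_title: str):
    """Match both template types independently; the two do not affect each
    other. Returns (episode, part), either of which may be None."""
    episode = "last" if "Grand Finale" in cover_title else None
    for pattern in EPISODE_TEMPLATES:
        m = pattern.search(cover_title)
        if m and episode is None:
            episode = m.groups()  # a single episode or an episode range
            break
    part = None
    for pattern in PART_TEMPLATES:
        m = pattern.search(cover_title)
        if m:
            part = m.groups()
            break
    return episode, part
```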
  • When the episode attribute extraction process adopts the title template matching method, the extraction can also be performed on the video title information corresponding to the video to be processed Mi, where the video title information refers to the title information with which the video to be processed Mi was released.
  • To summarize, the work attribute extraction process can adopt methods such as video frame retrieval, title template matching and label propagation, while the episode attribute extraction process can adopt methods such as video frame retrieval and title template matching. One or more of these methods can be used simultaneously for the work attribute extraction of the video to be processed Mi, and likewise one or more can be used simultaneously for the episode attribute extraction of the video to be processed Mi; the present application does not impose any restrictions on this.
  • some videos to be processed may not be able to extract usable work attribute information or episode attribute information.
  • the computer device can determine these videos to be processed as invalid videos to be processed and filter them out directly, that is, they will no longer participate in the processing of subsequent steps.
  • Step S103 classify the M videos to be processed according to the source label information to obtain an initial video set, and determine the videos to be processed in the initial video set that have the target work attribute information as the videos to be sorted; each video to be processed in the initial video set has the same source label information; the work attribute information involved in the M videos to be processed includes the target work attribute information.
  • the source tag information refers to the source information of the video to be processed, for example, the ID (Identity document, identity or account) of the author who published the video to be processed.
  • the computer device can first classify the valid videos to be processed according to the source tag information, that is, classify the M videos to be processed according to the source tag information.
  • the videos to be processed with the same source tag information are added to the same initial video set, so as to obtain multiple initial video sets, that is, one initial video set corresponds to one source tag information. In other words, each video to be processed in an initial video set has the same source tag information.
  • Within each initial video set, the videos to be processed with the same work attribute information are then classified and determined as the same batch of videos to be sorted. For example, the videos to be processed that have the target work attribute information in a certain initial video set can be determined as one batch of videos to be sorted; that is, the videos to be processed in that initial video set belonging to the same film and television work (they all have the same source label information and the same work attribute information) will be processed together as subsequent ordered album videos. The videos to be processed in the same initial video set belonging to another film and television work will likewise be processed together as another group of ordered album videos, as in the grouping sketch below.
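  • A minimal grouping sketch, assuming each mined video is represented as a dictionary; the field names are illustrative:

```python
from collections import defaultdict

def batch_videos_to_sort(videos):
    """videos: iterable of dicts with 'source_tag' and 'work' keys (the field
    names are illustrative). Returns one batch of videos to be sorted per
    (source tag information, work attribute information) pair."""
    batches = defaultdict(list)
    for video in videos:
        # Classification by source tag yields the initial video sets; the
        # second key splits each initial set by work attribute information.
        batches[(video["source_tag"], video["work"])].append(video)
    return batches
```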
  • Step S104 sorting the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted to obtain sorted videos. If the episode attribute information corresponding to the sorted videos meets the legal episode condition, the sorted videos are determined to be ordered album videos, and a video album set containing the ordered album videos is generated; the video album set is used to be displayed in the query result display box when the query data matches the work attribute information or source tag information corresponding to the ordered album videos.
  • the computer device can sort the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted to obtain the sorted videos, and then perform continuity detection on the episode attribute information corresponding to the sorted videos to obtain the continuity detection result; if the continuity detection result is an episode continuous result, it can be determined that the episode attribute information corresponding to the sorted videos meets the episode legality condition, that is, the episode legality condition can refer to whether the episode attribute information between the sorted videos meets the episode continuous result condition, and then the sorted videos that meet the episode legality condition are determined as ordered album videos, and a video album set containing the ordered album videos is generated; if the continuity detection result is an episode discontinuity result, the sorted videos are determined as unordered videos, and there is no need to generate a video album set for the unordered videos.
  • the sorting process can be in ascending order from small to large, or in descending order from large to small, which is not limited here.
  • the continuity detection is to determine whether the episode attribute information corresponding to all adjacent sorted videos is continuous.
  • For example, if the episode attribute information corresponding to sorted video 1 is the first episode, and the episode attribute information corresponding to the adjacent sorted video 2 is the third episode, the two are not continuous: the second episode is missing in between, so the continuity detection result is the episode discontinuity result. It can be seen that an episode continuous result means that the episode attribute information of every two adjacent sorted videos corresponds to adjacent episodes.
  • In some embodiments, the computer device can also identify the episode attribute information corresponding to the first sorted video, so as to determine whether the first sorted video is the first video of the work, that is, whether its episode attribute information is the first episode. Similarly, the computer device can obtain the total episode information corresponding to the work attribute information of the sorted videos, identify the episode attribute information corresponding to the last sorted video, and determine whether the last sorted video is the last video of the work, that is, whether its episode attribute information equals the total episode information of the film and television work to which it belongs. If any of these checks fails, the continuity detection result can be determined as an episode discontinuity result. It can be seen that, in this case, an episode continuous result means that the episode attribute information of every two adjacent sorted videos corresponds to adjacent episodes, the first sorted video is the first video of the film and television work to which it belongs (such as the first episode), and the last sorted video is the last video of that work (such as the last episode). A sorting-and-continuity sketch follows below.
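  • Putting the sorting, continuity and first/last checks together, a hedged sketch might look like this; integer episode numbers and field names are assumptions:

```python
def check_episode_legality(videos_to_sort, total_episodes=None):
    """videos_to_sort: one batch, each with an integer 'episode' attribute.
    Returns the sorted videos if the episode legality condition holds,
    otherwise None (no video album set is generated)."""
    sorted_videos = sorted(videos_to_sort, key=lambda v: v["episode"])
    episodes = [v["episode"] for v in sorted_videos]
    # Every two adjacent sorted videos must have adjacent episode numbers.
    if any(later - earlier != 1 for earlier, later in zip(episodes, episodes[1:])):
        return None  # episode discontinuity result
    if episodes[0] != 1:
        return None  # the first sorted video is not the first episode
    if total_episodes is not None and episodes[-1] != total_episodes:
        return None  # the last sorted video is not the work's last episode
    return sorted_videos  # ordered album videos
```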
  • The following description takes as an example the case where the video to be processed is a film and television commentary video, and a complete video commentary album for a movie or TV series is generated.
  • Figure 4 is a schematic diagram of the overall process of a video clustering mining method provided by an embodiment of the present application. As shown in Figure 4, the entire video clustering mining method mainly includes the following processes:
  • Step T1 input M videos to be processed.
  • the M videos to be processed are the M videos to be processed in step S101 in the embodiment corresponding to FIG. 3 .
  • the computer device needs to extract the episode attribute information and work attribute information corresponding to each video to be processed, and also obtain the source tag information corresponding to each video to be processed.
  • the computer device will perform steps T2 to T9 for each video to be processed. If the work attribute information or episode attribute information of the video to be processed is not finally extracted, the video to be processed is determined to be an invalid video.
  • steps T2 to T9 are described using a single video to be processed as an example.
  • Step T2 performing video frame retrieval processing on the video to be processed.
  • the implementation process of the video frame retrieval processing can refer to the description of the above step S102 when the work attribute extraction processing adopts the video frame retrieval method, which will not be repeated here.
  • Step T3 determine whether the work attribute information corresponding to the video to be processed is extracted; if so, execute step T8; if not, execute step T4.
  • Step T4 performing template matching processing on the video to be processed.
  • the implementation process of the template matching process can refer to the description of the above step S102 when the work attribute extraction process adopts the template matching method, which will not be repeated here.
  • Step T5 determining whether step T4 has extracted the work attribute information corresponding to the video to be processed; if so, executing step T8; if not, executing step T6.
  • Step T6 perform label propagation processing on the video to be processed.
  • the implementation process of the label propagation processing can refer to the description of the above step S102 when the work attribute extraction processing adopts the label propagation method, which will not be repeated here.
  • Step T7 determining whether step T6 has extracted the work attribute information corresponding to the video to be processed; if so, executing step T8; if not, determining that the video to be processed is invalid.
  • Step T8 extracting the episode attribute information of the video to be processed.
  • the implementation process of the episode attribute information extraction process can refer to the description of the implementation of the episode attribute information extraction process in the above step S102, which will not be repeated here.
  • Step T9 determining whether step T8 has extracted the episode attribute information corresponding to the video to be processed; if so, executing step T10; if not, determining that the video to be processed is invalid.
  • Step T10 album aggregation of valid videos to be processed.
  • the valid videos to be processed are classified according to the author ID (i.e., the source tag information in FIG. 3 above), and then classified again under the same video IP (i.e., the work attribute information) under the same author, so that valid videos to be sorted with a unique video author + video IP can be obtained.
  • Step T11 generate a video album.
  • step T11 may refer to the description of step S104 in the embodiment corresponding to FIG. 3 , which will not be described in detail here.
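  • The progressive fallback of steps T2 to T9 above can be summarized in a short sketch; each extractor argument is a hypothetical callable standing in for one of the mining methods described earlier:

```python
def mine_single_video(video, frame_retrieval, template_matching,
                      label_propagation, episode_extraction):
    """Progressive mining for one video, mirroring steps T2-T9: each argument
    is a callable returning the mined information or None (all hypothetical)."""
    work = None
    for extractor in (frame_retrieval, template_matching, label_propagation):
        work = extractor(video)
        if work is not None:
            break  # an earlier stage succeeded; later stages are skipped
    if work is None:
        return None  # invalid video: no work attribute information extracted
    episode = episode_extraction(video, work)
    if episode is None:
        return None  # invalid video: no episode attribute information
    return {"work": work, "episode": episode}  # valid video for album aggregation
```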
  • the ordered album videos contained in the obtained video album set correspond to the same work attribute information and source label information.
  • The video album set can be displayed in the query result display box, realizing structured video output and improving the display effect of the videos corresponding to the query data. Because the ordered album videos are sorted according to the episode attribute information within the video album set, it is no longer necessary to click and watch them one by one to determine their viewing order, thereby improving the presentation effect of the searched film and television commentary videos.
  • In summary, the present application can more accurately mine the work attribute information and episode attribute information of the videos to be processed through video frame retrieval, title template matching and label propagation, better ensuring the accuracy of the generated ordered album videos; and because video frame retrieval, title template matching and label propagation form a progressive mining mechanism, work attribute information and episode attribute information can be mined from more videos to be processed, ensuring the number of ordered album videos. Moreover, checking against the episode legality condition better ensures that the episodes of the ordered album videos corresponding to a film and television work are continuous and complete, further ensuring the accuracy of the ordered album videos.
  • FIG. 5 is a flow chart of a video data processing method provided by an embodiment of the present application.
  • the video data processing method can be executed by a computer device, and the computer device can include a terminal device or a server as shown in FIG. 1.
  • the method may include the following steps S201-S204:
  • Step S201 performing quality screening processing on a first initial video set to obtain M videos to be processed; the first initial video set includes at least two initial videos.
  • the videos in the first initial video set may be quality screened to filter out some videos with substandard quality.
  • the quality screening process may include black edge detection, watermark detection, and definition recognition.
  • black edge detection requires that the black edge ratio of the initial video cannot exceed a certain range, otherwise the content screen ratio is too small, affecting the user's viewing experience.
  • Black edge detection mainly extracts frames from the initial video at a fixed sampling rate, binarizes each frame image against a black-pixel threshold, and detects the proportion of continuous black pixels along the video width/height at the top, middle and bottom of the frame, filtering the initial videos accordingly.
  • the black edge ratio threshold can be determined according to the length and width of the video.
  • For example, the black edge ratio threshold corresponding to a short video (a video whose width is greater than its height) can be 1/3, while the black edge ratio threshold corresponding to a small video (a video whose width is less than its height) can be 2/3, as in the sketch below.
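  • A sketch of the black edge check for horizontal bars follows (vertical bars would be handled analogously); the binarization threshold and the 95% black-row rule are illustrative assumptions:

```python
import numpy as np

def black_edge_ratio(gray_frame: np.ndarray, black_pixel_threshold=16):
    """gray_frame: one HxW grayscale frame sampled at a fixed rate.
    Returns the fraction of frame height occupied by continuous black rows."""
    is_black_row = (gray_frame < black_pixel_threshold).mean(axis=1) > 0.95
    top = 0
    while top < len(is_black_row) and is_black_row[top]:
        top += 1  # continuous black rows from the top edge
    bottom = 0
    while bottom < len(is_black_row) - top and is_black_row[-1 - bottom]:
        bottom += 1  # continuous black rows from the bottom edge
    return (top + bottom) / gray_frame.shape[0]

def passes_black_edge_check(gray_frame: np.ndarray) -> bool:
    height, width = gray_frame.shape
    # 1/3 for a short video (width > height), 2/3 for a small video.
    ratio_threshold = 1 / 3 if width > height else 2 / 3
    return black_edge_ratio(gray_frame) <= ratio_threshold
```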
  • Watermark detection requires that there is no watermark that is too large in the initial video, otherwise it will seriously block the main body of the video.
  • Watermark detection mainly obtains candidate areas through pixel comparison between consecutive frames of the video, binarizes the frame images through edge detection, mean filtering and the Otsu threshold method, and then uses a connected-component algorithm and a clustering algorithm to screen out the area with the largest connected region, which is taken to be the watermark. The watermark area is then compared with the picture area; if it exceeds 1/25, the watermark is considered too large and occluding.
  • clarity recognition refers to calculating the gradient between pixels in the video screen, counting the global gradient mean, and then normalizing to obtain clarity.
  • the clarity can take values 0-4, with 4 being the clearest.
  • the clarity threshold can be set to 2, that is, the clarity corresponding to the initial video cannot be lower than 2.
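  • A hedged sketch of the clarity computation follows; the normalization constant is an assumption chosen only to map typical gradient means into the 0-4 range:

```python
import numpy as np

def clarity(gray_frame: np.ndarray) -> float:
    """Clarity from the global gradient mean, normalized into [0, 4]
    (4 is the clearest). The normalization constant is an assumption."""
    gy, gx = np.gradient(gray_frame.astype(np.float32))
    global_gradient_mean = float(np.mean(np.hypot(gx, gy)))
    return min(4.0, 4.0 * global_gradient_mean / 32.0)

def passes_clarity_check(gray_frame: np.ndarray, threshold=2.0) -> bool:
    # The clarity corresponding to the initial video cannot be lower than 2.
    return clarity(gray_frame) >= threshold
```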
  • the computer device can select one or more processing processes of black edge detection, watermark detection or clarity recognition to perform quality screening on the videos in the first initial video set, and can also add other quality screening processes according to actual conditions.
  • A feasible implementation process of performing quality screening on the first initial video set to obtain the M videos to be processed can be: obtain the first initial video set; perform black edge detection on the first initial video set to obtain the black edge ratio corresponding to each initial video in the first initial video set; filter out the initial videos whose black edge ratio is greater than the black edge ratio threshold from the first initial video set to obtain the second initial video set; perform watermark detection on the second initial video set to obtain the watermark area ratio corresponding to each initial video in the second initial video set; filter out the initial videos whose watermark area ratio is greater than the watermark area ratio threshold from the second initial video set to obtain the third initial video set; perform clarity recognition on the third initial video set to obtain the clarity corresponding to each initial video in the third initial video set; and filter out the initial videos whose clarity is lower than the clarity threshold from the third initial video set to obtain the M videos to be processed. It can be understood that by filtering the initial videos layer by layer through these three screening processes, the quality of the resulting M videos to be processed can be better guaranteed.
  • Step S202 perform feature extraction on the M videos to be processed respectively, obtain video attribute information corresponding to each video to be processed respectively, and obtain source label information corresponding to each video to be processed respectively;
  • the video attribute information includes work attribute information and episode attribute information.
  • step S202 may refer to the implementation process of step S102, which will not be described in detail here.
  • Step S203 classify the M videos to be processed according to the source label information to obtain an initial video set, and determine the videos to be processed in the initial video set that have the target work attribute information as the videos to be sorted; each video to be processed in the initial video set has the same source label information; the work attribute information involved in the M videos to be processed includes the target work attribute information.
  • step S203 may refer to the implementation process of step S103 above, which will not be described in detail here.
  • Step S204 sorting the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, to obtain sorted videos; if the episode attribute information corresponding to the sorted videos meets the legal episode condition, the sorted videos are determined to be ordered album videos, and a video album set containing the ordered album videos is generated; the video album set is used to be displayed in the query result display box when the query data matches the work attribute information or source tag information corresponding to the ordered album videos.
  • the video to be sorted is sorted to obtain the sorted video; the episode attribute information corresponding to the sorted video is detected for continuity to obtain the continuity detection result; if the continuity detection result is a continuous episode result, the sorted video is identified for the video version according to the target work knowledge graph to obtain the target video version corresponding to the sorted video; the target work knowledge graph is the work knowledge graph associated with the work attribute information corresponding to the sorted video; in the target work knowledge graph, the total episode information corresponding to the sorted video is determined according to the target video version; if the largest episode attribute information in the episode attribute information corresponding to the sorted video is the same as the total episode information, it is determined that the episode attribute information corresponding to the sorted video meets the episode legality condition, and then the sorted video is determined as an ordered album video; a video album set containing ordered album videos is generated.
  • the episode legality condition not only requires that the episodes between the sorted videos are continuous, but also requires that the last episode in the sorted video corresponds to the last episode of the film and television work to which it belongs, so as to better ensure the accuracy of the ordered album video.
  • the implementation process of the continuity detection can refer to the description of step S104 in the embodiment corresponding to Figure 3 above.
  • the knowledge graph is a semantic network that describes various entities and concepts that exist in the real world and the relationships between them. That is, it is a semantic network that describes the relationship between a film or TV work and various entities associated with the film or TV work.
  • the episode attribute information corresponding to the sorted video includes episode information or part information
  • the continuity detection result is a continuous episode result
  • the first video ranked first in the sorted video is the first video
  • The computer device can check whether the last-ranked video in the sorted videos has part information; if it does, it checks whether that part information is the last part. If the part information contains "upper" or "middle", or only the number "1", it is determined that the last video is not the last part, and the sorted videos are determined to be invalid videos.
  • If the part information corresponding to the sorted videos is of the N/M type, it is also necessary to determine whether N equals M for the last-ranked video in the sorted videos (such as 4/4). If not, it is determined that the last video is not the last part, and the sorted videos are determined to be invalid videos.
  • the computer device can also check whether the title (article title) of each video in the sorted video appears more than once. If the title appears multiple times and there is no part information, the sorted video is determined to be an invalid video. It should be noted that if the computer device determines that the sorted video is an invalid video, it is not necessary to generate the corresponding video album set.
  • the same film and television work may be performed by different actors.
  • That is, the same film and television work may have multiple video versions, and the actor lists corresponding to the different video versions differ considerably. Therefore, the computer device can first use object recognition technology to identify the R target video objects that appear the most times or for the longest time in the sorted videos, and then calculate the object overlap between the R target video objects and the actor list (i.e., video object list) corresponding to each video version in the target work knowledge graph of the film and television work.
  • the R target video objects can be used as the actor list to be matched, and then the object overlap between the actor list to be matched and the actor list corresponding to a certain video version in the target work knowledge graph can be calculated.
  • the video version corresponding to the video object list with the largest object overlap is the target video version corresponding to the sorted video.
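  • A minimal sketch of the version identification by object overlap; normalizing the overlap by the candidate list to be matched is an assumption:

```python
def identify_target_version(target_objects, knowledge_graph_versions):
    """target_objects: the R video objects appearing most often or longest in
    the sorted videos. knowledge_graph_versions: dict mapping video version ->
    video object (actor) list from the target work knowledge graph."""
    def object_overlap(cast_list):
        # Overlap between the actor list to be matched and a version's cast.
        shared = set(target_objects) & set(cast_list)
        return len(shared) / max(len(target_objects), 1)
    # The version with the largest object overlap is the target video version.
    return max(knowledge_graph_versions,
               key=lambda version: object_overlap(knowledge_graph_versions[version]))
```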
  • the computer device can also obtain the total number of episodes corresponding to the target video version through the target work knowledge graph, and then compare it with the largest number of episodes attribute information corresponding to the sorted video to determine whether the sorted video is complete. If the largest number of episodes attribute information is not less than the total number of episodes corresponding to the target video version, the sorted video can be determined as an ordered video album.
  • the computer device may also determine whether the time difference between the current system time and the release time of the target video version exceeds 90 days. If it exceeds 90 days, the sorted video is determined to be an invalid video. If it is less than 90 days, the sorted video can still be determined as an ordered video album.
  • A feasible implementation process for generating a video album set containing the ordered album videos can be: traverse at least two ordered album videos and sequentially obtain the jth ordered album video, where j is a positive integer; perform correlation matching on the video cover corresponding to the jth ordered album video and the video title corresponding to the jth ordered album video to obtain a correlation matching result; if the correlation matching result is a successful correlation matching result, determine the video cover corresponding to the jth ordered album video as the album video cover corresponding to the jth ordered album video; if the correlation matching result is a failed correlation matching result, perform video frame screening processing on the jth ordered album video to obtain a video frame image that matches the video title corresponding to the jth ordered album video, and determine that video frame image as the album video cover corresponding to the jth ordered album video; when the album video cover corresponding to each ordered album video is obtained, generate a video album set containing the album video cover corresponding to each ordered album video.
  • the computer device can also select an album video cover for each ordered album video in the video album set.
  • the original video cover of the ordered album video is no longer displayed, but the album video cover corresponding to the ordered album video is displayed.
  • the j-th ordered album video is screened for video frames to obtain a video frame picture matching the video title corresponding to the j-th ordered album video, and the video frame picture is determined as the album video cover corresponding to the j-th ordered album video.
  • the feasible implementation process can be: the top three (or other numbers, without limitation) video frames that are most relevant to the video title corresponding to the j-th ordered album video are screened through the image-text correlation model, and then the highest quality video frame picture is selected through the aesthetics model as the album video cover corresponding to the j-th ordered album video.
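  • A sketch combining the correlation matching and frame screening steps; the model callables, threshold, and field names are hypothetical stand-ins for the image-text correlation model and aesthetics model described above:

```python
def choose_album_cover(video, title, relevance_model, aesthetics_model,
                       relevance_threshold=0.5, top_k=3):
    """relevance_model(image, text) and aesthetics_model(image) stand in for
    the image-text correlation model and aesthetics model; both are
    hypothetical callables, as are the field names used here."""
    if relevance_model(video["cover"], title) >= relevance_threshold:
        return video["cover"]  # correlation matching succeeded: keep the cover
    # Otherwise screen the top-k video frames most relevant to the title...
    candidates = sorted(video["frames"],
                        key=lambda frame: relevance_model(frame, title),
                        reverse=True)[:top_k]
    # ...and take the highest-quality one as the album video cover.
    return max(candidates, key=aesthetics_model)
```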
  • the video data processing method provided in the embodiment of the present application can help users understand a movie or TV series with a low threshold, in a complete, fast and concise manner. It solves the problem that when users search for their favorite videos, they cannot find subsequent related content or the content is missing. It also solves the problem of misattribution of video titles and content and low video quality, thereby improving the overall user experience.
  • FIG. 6 is a schematic diagram of the structure of a video data processing device provided in an embodiment of the present application.
  • the video data processing device can be a computer program (including program code) running on a computer device, for example, the video data processing device is an application software; the device can be used to execute the corresponding steps in the data processing method provided in an embodiment of the present application.
  • the video data processing device 1 may include: an acquisition module 11, a feature extraction module 12, a video determination module 13 and a generation module 14. The acquisition module 11 is used to obtain M videos to be processed, where M is a positive integer.
  • the feature extraction module 12 is used to extract features from the M videos to be processed, obtain video attribute information corresponding to each video to be processed, and obtain source label information corresponding to each video to be processed;
  • the video attribute information includes work attribute information and episode attribute information;
  • the video determination module 13 is used to classify the M videos to be processed according to the source label information to obtain an initial video set, and determine the videos to be processed with the target work attribute information in the initial video set as the videos to be sorted; each video to be processed in the initial video set has the same source label information; the work attribute information involved in the M videos to be processed includes the target work attribute information;
  • the generation module 14 is used to sort the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, so as to obtain the sorted videos. If the episode attribute information corresponding to the sorted videos meets the legal condition of the episode, the sorted videos are determined to be ordered album videos, and a video album set containing the ordered album videos is generated; the video album set is used to be displayed in the query result display box when the query data matches the work attribute information or source tag information corresponding to the ordered album videos.
  • the specific functional implementation of the acquisition module 11, the feature extraction module 12, the video determination module 13 and the generation module 14 can refer to the description of steps S101 to S104 in the corresponding embodiment of Figure 3, and will not be repeated here.
  • the M videos to be processed include a video to be processed M i , where i is a positive integer less than or equal to M;
  • the feature extraction module 12 includes: a first extraction unit 121 and a second extraction unit 122 .
  • the first extraction unit 121 is used to extract the work attribute of the video to be processed M i and obtain the work attribute information corresponding to the video to be processed M i ;
  • the second extraction unit 122 is used to perform episode attribute extraction processing on the video to be processed Mi to obtain episode attribute information corresponding to the video to be processed Mi.
  • The specific functional implementations of the first extraction unit 121 and the second extraction unit 122 can refer to the description of step S102 in the embodiment corresponding to FIG. 3, which will not be described again here.
  • the first extraction unit 121 includes: a frame retrieval subunit 1211.
  • the frame retrieval subunit 1211 is used to perform sampling processing on the video to be processed M i to obtain a video frame image
  • the frame retrieval subunit 1211 is further used to perform picture matching processing on the video frame image and the video works in the video work library, and obtain the picture similarity between the video works in the video work library and the video frame image;
  • the frame retrieval subunit 1211 is further used to determine the video work with the highest screen similarity to the video frame image as the target video work;
  • the frame retrieval subunit 1211 is further configured to determine the video work attribute information corresponding to the target video work as the work attribute information corresponding to the to-be-processed video M i if the picture similarity between the video frame image and the target video work is greater than or equal to the picture similarity threshold.
  • the specific functional implementation of the frame retrieval subunit 1211 can refer to the description of step S102 in the embodiment corresponding to FIG. 3, which will not be described in detail here.
  • the first extraction unit 121 may be specifically used to perform sampling processing on the video to be processed M i at equal intervals to obtain multiple video frame images, traverse the multiple video frame images, perform screen matching processing on the i-th video frame image in the multiple video frame images and the video works in the video work library, respectively, to obtain the screen similarity between the video works in the video work library and the i-th video frame image; i is a positive integer less than or equal to the number of the multiple video frame images;
  • the first extraction unit 121 is specifically used to obtain the video work with the highest picture similarity with the i-th video frame image as the pending video work corresponding to the i-th video frame image, mark the pending video work corresponding to the i-th video frame image, and when the pending video work corresponding to each video frame image in multiple video frame images is marked, the video work attribute information corresponding to the pending video work with the largest number of markings is determined as the work attribute information corresponding to the video to be processed M i .
  • the first extraction unit 121 includes: a template matching subunit 1212 .
  • the template matching subunit 1212 is used to obtain the video title information corresponding to the video M i to be processed;
  • the template matching subunit 1212 is further used to perform structural matching processing on the video title information and the title templates in the title template library to obtain the structural similarity between the title templates in the title template library and the video title information;
  • the template matching subunit 1212 is further used to determine the title template with the highest structural similarity to the video title information as the target title template;
  • the template matching subunit 1212 is further configured to, if the structural similarity between the video title information and the target title template is greater than or equal to the structural similarity threshold, perform information extraction on the video title information according to the target title template to obtain the work attribute information corresponding to the video to be processed M i .
  • the specific functional implementation of the template matching subunit 1212 can refer to the description of step S102 in the embodiment corresponding to FIG. 3 , which will not be described in detail here.
  • the first extraction unit 121 includes: a propagation matching subunit 1213 .
  • the propagation matching subunit 1213 is used to traverse and obtain the kth sample video in the sample video library; k is a positive integer;
  • the propagation matching subunit 1213 is further used to perform picture matching processing on the to-be-processed video M i and the kth sample video to obtain the video picture similarity;
  • the propagation matching subunit 1213 is further used to calculate the similarity between the video title information of the to-be-processed video M i and the video title information corresponding to the k-th sample video to obtain the video title similarity;
  • the propagation matching subunit 1213 is further used to obtain the video click log associated with the to-be-processed video M i and the kth sample video, perform click analysis on the video click log, and obtain the video click similarity;
  • the propagation matching subunit 1213 is further used to determine the video similarity between the to-be-processed video M i and the kth sample video according to the video picture similarity, the video title similarity and the video click similarity;
  • the propagation matching subunit 1213 is further used for, if the video similarity is greater than the video similarity threshold, weighting the video work confidence of the k-th sample video for the associated work according to the video similarity to obtain the work confidence of the to-be-processed video M i for the associated work; the video work confidence of the k-th sample video for the associated work is used to characterize the credibility of the k-th sample video belonging to the associated work;
  • the propagation matching subunit 1213 is further configured to determine the video work attribute information corresponding to the associated work as the work attribute information corresponding to the to-be-processed video M i if the work confidence is greater than or equal to the work confidence threshold.
  • The specific functional implementation of the propagation matching subunit 1213 can refer to the description of step S102 in the embodiment corresponding to FIG. 3, which will not be described in detail here.
  • the second extraction unit 122 includes: a frame matching subunit 1221.
  • the frame matching subunit 1221 is used to obtain a video work having work attribute information corresponding to the to-be-processed video M i from the video work library, and use the obtained video work as the to-be-matched video work;
  • the frame matching subunit 1221 is further used to perform equal-interval sampling processing on the video M i to be processed to obtain a video frame image;
  • the frame matching subunit 1221 is further used to perform picture matching processing on the video frame image and the video work picture in the video work to be matched, so as to obtain the video work picture matching the video frame image;
  • the frame matching sub-unit 1221 is further used to determine the episode number information corresponding to the video work picture that matches the video frame image as the episode number attribute information corresponding to the video to be processed Mi.
  • the specific functional implementation of the frame matching subunit 1221 can refer to the description of step S102 in the embodiment corresponding to FIG. 3 , which will not be described in detail here.
  • the second extraction unit 122 includes: a title matching subunit 1222.
  • the title matching subunit 1222 is used to perform video layout character recognition processing on the cover image of the video to be processed M i to obtain the cover title information corresponding to the video to be processed M i ;
  • the title matching subunit 1222 is further used to perform structural matching processing on the cover title information and the episode templates in the episode template library to obtain the structural similarity between the episode templates in the episode template library and the cover title information;
  • the title matching subunit 1222 is further used to determine the episode template with the highest structural similarity to the cover title information as the target episode template;
  • the title matching subunit 1222 is also used to extract information from the cover title information according to the target episode template to obtain the episode attribute information corresponding to the video M i to be processed if the structural similarity between the cover title information and the target episode template is greater than or equal to the structural similarity threshold.
  • The specific functional implementation of the title matching subunit 1222 can refer to the description of step S102 in the embodiment corresponding to FIG. 3, which will not be described in detail here.
  • the generation module 14 includes: a sorting unit 141 , a detection unit 142 , a version identification unit 143 , an episode determination unit 144 , a video determination unit 145 and an album generation unit 146 .
  • a sorting unit 141 is used to sort the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted, so as to obtain sorted videos;
  • a detection unit 142 configured to perform continuity detection on the episode attribute information corresponding to the sorted videos to obtain a continuity detection result
  • the version identification unit 143 is used to, if the continuity detection result is an episode continuous result, perform video version identification processing on the sorted videos according to the target work knowledge graph to obtain the target video version corresponding to the sorted videos; the target work knowledge graph is the work knowledge graph associated with the work attribute information corresponding to the sorted videos;
  • the episode number determination unit 144 is used to determine the total episode number information corresponding to the sorted videos according to the target video version in the target work knowledge graph;
  • the video determination unit 145 is configured to determine that the episode number attribute information corresponding to the sorted videos meets the episode number legality condition if the largest episode number attribute information among the episode number attribute information corresponding to the sorted videos is the same as the total episode number information;
  • the video determination unit 145 is further configured to determine the sorted video as an ordered album video if the episode number attribute information corresponding to the sorted video meets the episode number legality condition;
  • the album generation unit 146 is used to generate a video album set containing the ordered album videos.
  • the specific functional implementation methods of the sorting unit 141, the detection unit 142, the version identification unit 143, the episode determination unit 144, the video determination unit 145 and the album generation unit 146 can be found in the description of step S204 in the corresponding embodiment of Figure 5, and will not be repeated here.
  • the target work knowledge graph includes one or more video versions and a list of video objects corresponding to each video version
  • the version identification unit 143 includes: an overlap determination subunit 1431 and a version determination subunit 1432 .
  • the overlap determination subunit 1431 is used to perform object recognition processing on the sorted video to obtain multiple video objects contained in the sorted video and the appearance duration corresponding to each video object;
  • the overlap determination subunit 1431 is further used to obtain R target video objects from the multiple video objects according to the duration sequence between the appearance durations corresponding to each video object; R is a positive integer;
  • the overlap determination subunit 1431 is further used to determine the object overlap between the R target video objects and each video object list in the target work knowledge graph; the object overlap refers to the overlap between the video objects included in a video object list and the R target video objects;
  • the version determination subunit 1432 is used to determine the video version corresponding to the video object list with the largest object overlap as the target video version corresponding to the sorted video.
  • the number of ordered album videos is at least two;
  • the album generation unit 146 includes a cover determination subunit 1461 and a generation subunit 1462 .
  • the cover determination subunit 1461 is used to traverse at least two ordered album videos and sequentially obtain the j-th ordered album video, where j is a positive integer;
  • the cover determination subunit 1461 is further used to perform correlation matching on the video cover corresponding to the j-th ordered album video and the video title corresponding to the j-th ordered album video to obtain a correlation matching result;
  • the cover determination subunit 1461 is further configured to, if the relevance matching result is a relevance matching failure result, perform video frame screening processing on the j-th ordered album video to obtain a video frame picture matching the video title corresponding to the j-th ordered album video, and determine the video frame picture as the album video cover corresponding to the j-th ordered album video;
  • the generating subunit 1462 is used to generate a video album set including the album video cover corresponding to each ordered album video when the album video cover corresponding to each ordered album video is obtained.
  • The specific functional implementations of the cover determination subunit 1461 and the generation subunit 1462 can be found in the description of step S204 in the corresponding embodiment of Figure 5, and will not be repeated here.
  • the video data processing device 1 further includes: a filtering module 15 .
  • a filtering module 15 configured to obtain a first initial video set
  • the filtering module 15 is further used to perform black edge detection on the first initial video set to obtain a black edge ratio corresponding to each initial video in the first initial video set;
  • the filtering module 15 is further used to filter the initial videos whose black edge ratio is greater than a black edge ratio threshold from the first initial video set to obtain a second initial video set;
  • the filtering module 15 is further used to perform watermark detection on the second initial video set to obtain the watermark area ratio corresponding to each initial video in the second initial video set;
  • the filtering module 15 is further used to filter the initial videos whose watermark area ratio is greater than the watermark area ratio threshold from the second initial video set to obtain a third initial video set;
  • the filtering module 15 is further used to perform clarity recognition on the third initial video set, and obtain the clarity corresponding to each initial video in the third initial video set;
  • the filtering module 15 is further configured to filter the initial videos whose definition is lower than the definition threshold from the third initial video set, to obtain M videos to be processed.
  • the specific functional implementation of the filtering module 15 can refer to the description of step S204 in the embodiment corresponding to FIG5 , which will not be described in detail here.
  • Figure 7 is a structural diagram of a computer device provided in an embodiment of the present application.
  • the video data processing device 1 in the embodiment corresponding to Figure 6 above can be applied to a computer device 1000, and the computer device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005.
  • the computer device 1000 may also include: a user interface 1003, and at least one communication bus 1002.
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001. As shown in FIG. 7 , the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide a network communication function;
  • the user interface 1003 is mainly used to provide an input interface for the user; and
  • the processor 1001 can be used to call the device control application stored in the memory 1005 to achieve:
  • M videos to be processed are acquired; M is a positive integer;
  • Feature extraction is performed on each of the M videos to be processed to obtain video attribute information corresponding to each video to be processed, and source label information corresponding to each video to be processed is obtained;
  • the video attribute information includes work attribute information and episode attribute information;
  • the M videos to be processed are classified according to the source label information to obtain an initial video set, and the videos to be processed that have target work attribute information in the initial video set are determined as videos to be sorted; each video to be processed in the initial video set has the same source label information; the work attribute information involved in the M videos to be processed includes the target work attribute information;
  • the videos to be sorted are sorted according to the episode attribute information corresponding to the videos to be sorted, to obtain sorted videos; if the episode attribute information corresponding to the sorted videos meets the episode legality condition, the sorted videos are determined as ordered album videos, and a video album set containing the ordered album videos is generated; the video album set is used to be displayed in the query result display box when the query data matches the work attribute information or source label information corresponding to the ordered album videos.
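Taken together, these steps form a group-sort-validate pipeline. The sketch below assumes dict-shaped inputs with `source`, `work` and `episode` keys; the real episode legality check also consults the work knowledge graph for the video version and total episode count, which is omitted here for brevity.

```python
from collections import defaultdict

def build_video_albums(videos):
    """Sketch of the flow implemented by processor 1001: group by source
    label and work attribute, sort by episode attribute, and keep only runs
    whose episode numbers start at 1 and contain no gaps."""
    grouped = defaultdict(list)
    for video in videos:
        grouped[(video["source"], video["work"])].append(video)
    albums = {}
    for key, candidates in grouped.items():
        ordered = sorted(candidates, key=lambda v: v["episode"])
        episodes = [v["episode"] for v in ordered]
        # Episode legality sketch: a continuous, complete run of episodes.
        if episodes == list(range(1, len(episodes) + 1)):
            albums[key] = ordered
    return albums

videos = [{"source": "blogger_1", "work": "drama_B", "episode": e}
          for e in (2, 1, 3)]
print(list(build_video_albums(videos)))  # -> [('blogger_1', 'drama_B')]
```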
  • the computer device 1000 described in the embodiment of the present application can execute the description of the video data processing method in any of the embodiments corresponding to FIG. 3 and FIG. 5 above, which will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the above-mentioned computer-readable storage medium stores a computer program executed by the video data processing device 1 mentioned above, and the above-mentioned computer program includes program instructions.
  • when the above-mentioned processor executes the above-mentioned program instructions, it can execute the description of the above-mentioned video data processing method in any of the embodiments corresponding to Figures 3 and 5 above, so it will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • FIG. 8 is a schematic diagram of the structure of another video data processing device provided in an embodiment of the present application.
  • the above-mentioned video data processing device can be a computer program (including program code) running in a computer device, for example, the video data processing device is an application software; the device can be used to execute the corresponding steps in the method provided in an embodiment of the present application.
  • the video data processing device 2 may include: a first display module 21, a response module 22 and a second display module 23.
  • the first display module 21 is used to display the input target query data in a query box of an application page;
  • the response module 22 is used to respond to a trigger operation for the target query data, and if the intent type of the target query data is a video intent type, display a recommendation result display area in the query result display box of the application page;
  • the second display module 23 is used to sequentially display the ordered album videos included in the target video album set in the recommendation result display area;
  • the target video album set is a video album set whose work attribute information or source tag information matches the target query data, and the target video album set includes one or more ordered album videos corresponding to the work attribute information;
  • the display order of the ordered album videos with the same work attribute information is sorted according to the episode order between the corresponding episode attribute information;
  • the ordered album videos in the target video album set belong to the commentary video type.
  • the specific functional implementation of the first display module 21, the response module 22 and the second display module 23 can refer to the scene description in the corresponding embodiment of FIG. 2b, which will not be repeated here.
  • FIG. 9 is a schematic diagram of the structure of another computer device provided in an embodiment of the present application.
  • the video data processing device 2 in the embodiment corresponding to FIG. 8 can be applied to a computer device 2000, and the computer device 2000 may include: a processor 2001, a network interface 2004 and a memory 2005.
  • the computer device 2000 also includes: a user interface 2003, and at least one communication bus 2002.
  • the communication bus 2002 is used to realize the connection and communication between these components.
  • the user interface 2003 may include a display screen (Display), a keyboard (Keyboard), and the optional user interface 2003 may also include a standard wired interface and a wireless interface.
  • the network interface 2004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 2005 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 2005 may also be at least one storage device located away from the aforementioned processor 2001.
  • the memory 2005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application.
  • the network interface 2004 can provide a network communication function;
  • the user interface 2003 is mainly used to provide an input interface for the user;
  • the processor 2001 can be used to call the device control application stored in the memory 2005 to achieve:
  • the input target query data is displayed in a query box of an application page;
  • in response to a trigger operation for the target query data, if the intent type of the target query data is a video intent type, a recommendation result display area is displayed in the query result display box of the application page;
  • the ordered album videos contained in the target video album set are displayed in sequence in the recommendation result display area;
  • the target video album set is a video album set whose work attribute information or source tag information matches the target query data, and the target video album set includes one or more ordered album videos corresponding to the work attribute information;
  • the display order of the ordered album videos with the same work attribute information is sorted according to the episode number order between the corresponding episode attribute information;
  • the ordered album videos in the target video album set belong to the commentary video type.
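On the query side, the matching step amounts to comparing the target query data against the work attribute information or source label information attached to each stored album. A hedged sketch follows, with the shape of the album index assumed:

```python
def find_target_album_set(query, album_index):
    """album_index: {(source_label, work_attribute): ordered_album_videos}.
    Return the first album whose work or source matches the query (sketch;
    real matching would be fuzzier than string equality)."""
    for (source, work), ordered_videos in album_index.items():
        if query in (source, work):
            return ordered_videos  # already sorted by episode attribute
    return None  # no album matched; show ordinary mixed results instead

index = {("blogger_1", "drama_B"): ["ep1.mp4", "ep2.mp4", "ep3.mp4"]}
print(find_target_album_set("drama_B", index))  # -> the three ordered videos
```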
  • the computer device 2000 described in the embodiment of the present application can execute the description of the video data processing method in the above embodiments, and can also execute the description of the video data processing device 2 in the above embodiments corresponding to FIG. 3 and FIG. 5, which will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the embodiment of the present application also provides a computer-readable storage medium, and the above-mentioned computer-readable storage medium stores a computer program executed by the video data processing device 2 mentioned above.
  • when the above-mentioned processor loads and executes the above-mentioned computer program, it can execute the description of the above-mentioned video data processing method in any of the above-mentioned embodiments, so it will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the computer-readable storage medium may be the internal storage unit of the video data processing device provided in any of the aforementioned embodiments or of the above-mentioned computer device, such as a hard disk or memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • the computer-readable storage medium may also include both the internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
  • the embodiment of the present application also provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method provided by any of the embodiments corresponding to Figures 3 and 5 above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a video data processing method, apparatus, device and readable storage medium. The method includes: acquiring source label information, work attribute information and episode attribute information corresponding to each of M videos to be processed; classifying the M videos to be processed according to the source label information to obtain an initial video set, and determining the videos to be processed that have target work attribute information in the initial video set as videos to be sorted, where each video to be processed in the initial video set has the same source label information; sorting the videos to be sorted according to the episode attribute information corresponding to the videos to be sorted to obtain sorted videos; and, if the episode attribute information corresponding to the sorted videos meets the episode legality condition, determining the sorted videos as ordered album videos and generating a video album set containing the ordered album videos. With the present invention, the presentation effect of searched videos can be improved.

Description

视频数据处理方法、装置、设备及可读存储介质
本申请要求于2023年3月20日提交中国专利局、申请号为202310272580.9、申请名称为“视频数据处理方法、装置、设备及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,尤其涉及一种视频数据处理方法、装置、设备及可读存储介质。
背景技术
目前电影或者电视剧的剪辑博主会在原有电影或者电视剧IP(Intellectual Property,知识产权)的基础上进行剪辑并配上真人解说文案,内容以对电视或电视剧完整的剧情讲解为主,帮助用户快速了解该电影/电视剧IP的内容梗概。
因此,在用户通过搜索引擎去搜索某个电影或者电视剧IP时,查询结果显示框中也会推荐与该电影或者电视剧IP相关的影视解说类视频。但是由于视频剪辑入门简单,大多数剪辑博主都不是专业人员,在剪辑视频时没有固定的剪辑方向和剪辑安排,导致剪辑出的影视解说类视频可能不会被添加视频信息(如标题或集数)或所添加的视频信息不够准确。而目前的查询机制是在查询到与用户输入的搜索关键词相关联的影视解说类视频后,会把这些影视解说类视频全部混排在查询结果显示框中,这就导致查询结果显示框中可能存在视频信息展示不全的影视解说类视频,进而也就难以呈现同一个电影或者电视剧IP下的影视解说类视频之间的观看顺序,这就使得用户需要逐一点击查询结果显示框中的影视解说类视频进行观看,才能确定感兴趣的影视解说类视频以及这些影视解说类视频之间的观看顺序,导致搜索出的影视解说类视频的呈现效果较差。
发明内容
本申请实施例提供了一种视频数据处理方法、装置、设备及可读存储介质,可以提高搜索出的影视解说类视频的呈现效果。
本申请实施例一方面提供了一种视频数据处理方法,包括:
获取M个待处理视频;M为正整数;
对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息;
根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息;
根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
本申请实施例一方面提供了一种视频数据处理方法,包括:
在应用页面的查询框中显示输入的目标查询数据;
响应针对目标查询数据的触发操作,若目标查询数据的意图类型为视频意图类型,则在应用页面的查询结果显示框中,显示推荐结果显示区域;
在推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频;目标视频专辑集合为作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合,目标视频专辑集合包括一个或多个作品属性信息分别对应的有序专辑视频;具有相同作品属性信息的有序专辑视频的显示顺序是按照所对应的集数属性信息之间的集数顺序进行排序的;目标视频专辑集合中的有序专辑视频属于解说视频类型。
本申请实施例一方面提供了一种视频数据处理装置,包括:
获取模块,用于获取M个待处理视频;M为正整数;
特征提取模块,用于对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息;
视频确定模块,用于根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息;
生成模块,用于根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
M个待处理视频包括待处理视频Mi,i为小于或等于M的正整数;
特征提取模块,包括:
第一提取单元,用于对待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息;
第二提取单元,用于对待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息。
其中,第一提取单元,包括:
帧检索子单元,用于对待处理视频Mi进行采样处理,得到视频帧图像;
帧检索子单元,还用于将视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与视频帧图像之间的画面相似度;
帧检索子单元,还用于将与视频帧图像之间的画面相似度最高的视频作品,确定为目标视频作品;
帧检索子单元,还用于若视频帧图像与目标视频作品之间的画面相似度大于或等于画面相似度阈值,则将目标视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
第一提取单元,具体用于对待处理视频Mi进行等间隔采样处理,得到多个视频帧图像,遍历多个视频帧图像,将多个视频帧图像中的第i个视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与第i个视频帧图像之间的画面相似度;i为小于或等于多个视频帧图像的数量的正整数;
第一提取单元,具体用于获取与第i个视频帧图像之间的画面相似度最高的视频作品,作为第i个视频帧图像对应的待定视频作品,对第i个视频帧图像对应的待定视频作品进行标记,当标记完多个视频帧图像中每个视频帧图像对应的待定视频作品时,将标记次数最多的待定视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
其中,第一提取单元,包括:
模板匹配子单元,用于获取待处理视频Mi对应的视频标题信息;
模板匹配子单元,还用于将视频标题信息与标题模板库中的标题模板分别进行结构匹配处理,得到标题模板库中的标题模板分别与视频标题信息之间的结构相似度;
模板匹配子单元,还用于将与视频标题信息之间结构相似度最高的标题模板,确定为目标标题模板;
模板匹配子单元,还用于若视频标题信息与目标标题模板之间的结构相似度大于或等于结构相似度阈值,则根据目标标题模板对视频标题信息进行信息提取处理,得到待处理视频Mi对应的作品属性信息。
其中,第一提取单元,包括:
传播匹配子单元,用于遍历获取样本视频库中的第k个样本视频;k为正整数;
传播匹配子单元,还用于对待处理视频Mi与第k个样本视频进行画面匹配处理,得到视频画面相似度;
传播匹配子单元,还用于对待处理视频Mi的视频标题信息与第k个样本视频对应的视频标题信息进行相似度计算,得到视频标题相似度;
传播匹配子单元,还用于获取与待处理视频Mi和第k个样本视频关联的视频点击日志,对视频点击日志进行点击分析处理,得到视频点击相似度;
传播匹配子单元,还用于根据视频画面相似度、视频标题相似度以及视频点击相似度,确定待处理视频Mi与第k个样本视频之间的视频相似度;
传播匹配子单元,还用于若视频相似度大于视频相似度阈值,则根据视频相似度对第k个样本视频针对关联作品的视频作品置信度进行加权处理,得到待处理视频Mi针对关联作品的作品置信度;第k个样本视频针对关联作品的视频作品置信度用于表征第k个样本视频属于关联作品的可信程度;
传播匹配子单元,还用于若作品置信度大于或等于作品置信度阈值,则将关联作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
其中,第二提取单元,包括:
帧匹配子单元,用于从视频作品库中,获取具有待处理视频Mi对应的作品属性信息的视频作品,将 获取到的视频作品作为待匹配视频作品;
帧匹配子单元,还用于对待处理视频Mi进行采样处理,得到视频帧图像;
帧匹配子单元,还用于对视频帧图像和待匹配视频作品中的视频作品画面进行画面匹配处理,得到与视频帧图像匹配的视频作品画面;
帧匹配子单元,还用于将与视频帧图像匹配的视频作品画面对应的集数信息,确定为待处理视频Mi对应的集数属性信息。
其中,第二提取单元,包括:
标题匹配子单元,用于对待处理视频Mi的封面图像进行视频布局字符识别处理,得到待处理视频Mi对应的封面标题信息;
标题匹配子单元,还用于将封面标题信息与集数模板库中的集数模板分别进行结构匹配处理,得到集数模板库中的集数模板分别与封面标题信息之间的结构相似度;
标题匹配子单元,还用于将与封面标题信息之间结构相似度最高的集数模板,确定为目标集数模板;
标题匹配子单元,还用于若封面标题信息与目标集数模板之间的结构相似度大于或等于结构相似度阈值,则根据目标集数模板对封面标题信息进行信息提取处理,得到待处理视频Mi对应的集数属性信息。
其中,生成模块,包括:
排序单元,用于根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频;
检测单元,用于对排序视频对应的集数属性信息进行连续性检测,得到连续性检测结果;
版本识别单元,用于若连续性检测结果为集数连续结果,则根据目标作品知识图谱对排序视频进行视频版本识别处理,得到排序视频对应的目标视频版本;目标作品知识图谱为排序视频对应的作品属性信息关联的作品知识图谱;
集数确定单元,用于在目标作品知识图谱中,根据目标视频版本确定排序视频对应的总集数信息;
视频确定单元,用于若排序视频对应的集数属性信息中最大的集数属性信息和总集数信息相同,则确定排序视频对应的集数属性信息满足集数合法条件;
视频确定单元,还用于若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频;
专辑生成单元,生成包含有序专辑视频的视频专辑集合。
其中,目标作品知识图谱包含一个或多个视频版本以及每个视频版本对应的视频对象列表;
版本识别单元,包括:
重合确定子单元,用于对排序视频进行对象识别处理,得到排序视频所包含的多个视频对象以及每个视频对象分别对应的出现时长;
重合确定子单元,还用于根据每个视频对象分别对应的出现时长之间的时长顺序,从多个视频对象中获取R个目标视频对象;R为正整数;
重合确定子单元,还用于确定R个目标视频对象分别与目标作品知识图谱中的每个视频对象列表之间的对象重合度;对象重合度是指一个视频对象列表所包含的视频对象与R个目标视频对象之间的重合度;
版本确定子单元,用于将对象重合度最大的视频对象列表对应的视频版本,确定为排序视频对应的目标视频版本。
其中,有序专辑视频的数量为至少两个;
专辑生成单元,包括:
封面确定子单元,用于遍历至少两个有序专辑视频,顺序获取第j个有序专辑视频,j为正整数;
封面确定子单元,还用于对第j个有序专辑视频对应的视频封面和第j个有序专辑视频对应的视频标题进行相关度匹配,得到相关度匹配结果;
封面确定子单元,还用于若相关度匹配结果为相关度匹配成功结果,则将第j个有序专辑视频对应的视频封面确定为第j个有序专辑视频对应的专辑视频封面;
封面确定子单元,还用于若相关度匹配结果为相关度匹配失败结果,则对第j个有序专辑视频进行视频帧筛选处理,得到与第j个有序专辑视频对应的视频标题匹配的视频帧画面,将视频帧画面确定为第j个有序专辑视频对应的专辑视频封面;
生成子单元,用于当获取到每个有序专辑视频分别对应的专辑视频封面时,生成包含每个有序专辑视频分别对应的专辑视频封面的视频专辑集合。
其中,上述视频数据处理装置,还包括:
过滤模块,用于获取第一初始视频集;
过滤模块,还用于对第一初始视频集进行黑边检测,得到第一初始视频集中每个初始视频分别对应的黑边占比;
过滤模块,还用于从第一初始视频集中,过滤黑边占比大于黑边占比阈值的初始视频,得到第二初始视频集;
过滤模块,还用于对第二初始视频集进行水印检测,得到第二初始视频集中每个初始视频分别对应的水印面积占比;
过滤模块,还用于从第二初始视频集中,过滤水印面积占比大于水印面积占比阈值的初始视频,得到第三初始视频集;
过滤模块,还用于对第三初始视频集进行清晰度识别,得到第三初始视频集中每个初始视频分别对应的清晰度;
过滤模块,还用于从第三初始视频集中,过滤清晰度低于清晰度阈值的初始视频,得到M个待处理视频。
本申请实施例一方面提供了一种视频数据处理装置,包括:
第一显示模块,用于在应用页面的查询框中显示输入的目标查询数据;
响应模块,用于响应针对目标查询数据的触发操作,若目标查询数据的意图类型为视频意图类型,则在应用页面的查询结果显示框中,显示推荐结果显示区域;
第二显示模块,用于在推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频;目标视频专辑集合为作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合,目标视频专辑集合包括一个或多个作品属性信息分别对应的有序专辑视频;具有相同作品属性信息的有序专辑视频的显示顺序是按照所对应的集数属性信息之间的集数顺序进行排序的;目标视频专辑集合中的有序专辑视频属于解说视频类型。
本申请实施例一方面提供了一种计算机设备,包括:处理器、存储器、网络接口;
上述处理器与上述存储器、上述网络接口相连,其中,上述网络接口用于提供数据通信网元,上述存储器用于存储计算机程序,上述处理器用于调用上述计算机程序,以执行本申请实施例中的方法。
本申请实施例一方面提供了一种计算机可读存储介质,上述计算机可读存储介质中存储有计算机程序,上述计算机程序适于由处理器加载并执行本申请实施例中的方法。
本申请实施例一方面提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例中的方法。
本申请实施例中,获取M个待处理视频后,可以对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息以及来源标签信息,该视频属性信息包括作品属性信息和集数属性信息;然后,可以将具有相同来源标签信息的待处理视频,加入同一个视频集合中,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;最后,根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合。可以理解,通过本申请实施例提供的方法得到的视频专辑集合中包含的有序专辑视频对应相同的作品属性信息和来源标签信息,当查询数据与有序专辑视频对应相同的作品属性信息和来源标签信息相匹配时,就可以将该视频专辑集合显示在查询结果显示框中,实现了结构化的视频输出,提高了查询数据对应视频的展示效果,且有序专辑视频在视频专辑集合中是按照集数属性信息排序的,不再需要逐一点击观看以确定有序专辑视频的观看顺序,从而提高了搜索出的影视解说类视频的呈现效果。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种网络架构示意图;
图2a是本申请实施例提供的一种视频专辑集合生成的场景示意图;
图2b是本申请实施例提供的一种视频查询的场景示意图;
图3是本申请实施例提供的一种视频数据处理方法的流程示意图;
图4是本申请实施例提供的一种视频聚类挖掘方法的整体流程示意图;
图5是本申请实施例提供的一种视频数据处理方法的流程示意图;
图6是本申请实施例提供的一种视频数据处理装置的结构示意图;
图7是本申请实施例提供的一种计算机设备的结构示意;
图8是本申请实施例提供的另一种视频数据处理装置的结构示意图;
图9是本申请实施例提供的另一种计算机设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习、自动驾驶、智慧交通等几大方向。
计算机视觉技术(Computer Vision,CV)计算机视觉是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR(Optical Character Recognition,文字识别)、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、虚拟现实、增强现实、同步定位与地图构建、自动驾驶以及智慧交通等技术。
语音技术(Speech Technology)的关键技术有自动语音识别技术和语音合成技术以及声纹识别技术。让计算机能听、能看、能说、能感觉,是未来人机交互的发展方向,其中语音成为未来最被看好的人机交互方式之一。
自然语言处理(Nature Language processing,NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。
本申请实施例提供的方案涉及人工智能的计算机视觉技术、语言技术以及自然语言处理等技术,具体通过如下实施例进行说明:
请参见图1,图1是本申请实施例提供的一种网络架构示意图。该网络架构可以包括服务器100以及终端设备集群,终端设备集群可以包括:终端设备200a、终端设备200b、终端设备200c、…、终端设备200n,其中,终端设备集群中的任一终端设备可以与服务器100存在通信连接,例如终端设备200a与服务器100之间存在通信连接,其中,上述通信连接不限定连接方式,可以通过有线通信方式进行直接或间接地连接,也可以通过无线通信方式进行直接或间接地连接,还可以通过其它方式,本申请在此不做限制。
应该理解,如图1所示的终端集群中的每个终端设备均可以安装有应用客户端,当该应用客户端运行于各终端设备中时,可以分别与上述图1所示的服务器100之间进行数据交互。其中,该应用客户端可以为即时通信应用、直播应用、短视频应用、视频应用、音乐应用、社交应用、购物应用、游戏应用、小说应用、支付应用、浏览应用等具有查询功能的应用客户端。其中,该应用客户端可以为独立的客户端,也可以为集成在某客户端(例如即时通信客户端、社交客户端、视频客户端等)中的嵌入式子客户端,在此不做限定。以短视频应用为例,服务器100可以用于响应终端设备通过短视频应用所发送的查询请求,以执行针对查询请求中包含的属于视频意图类型的查询数据的查询处理,因此,每个终端设备均可以通过该短视频应用与服务器100进行数据传输,如每个终端设备均可以通过该短视频应用获取与查询数据相匹配 的视频专辑集合对应的数据流。
以终端设备200a为例,终端设备200a可以通过短视频应用显示应用页面,该应用页面中可以显示有查询框,终端设备200a响应输入操作后,可以在该查询框中显示输入的目标查询数据。其中,目标查询数据的意图类型为视频意图类型,即目标查询数据可以是指与电影或者电视剧等影视作品有关的数据,例如,影视IP名称、影视的参演人员、想看的剪辑博主等。然后,终端设备200a可以响应针对目标查询数据的触发操作,将包含目标查询数据的查询请求发送至服务器100,服务器100可以在视频专辑集合库中,获取作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合,作为目标视频专辑集合,然后将目标视频专辑集合对应的数据流返回给终端设备200a。其中,作品属性信息是指影视IP信息。其中,来源标签信息是指视频的来源信息,例如,来源于哪个剪辑博主,或者来源于哪个网站等等。终端设备200a接收到相应的数据流后,就可以在应用页面的查询结果显示框中,显示推荐结果显示区域;在推荐结果显示区域中,顺序显示目标视频专辑集合包含的属于解说类型的有序专辑视频。例如,有序专辑视频可以是某电视剧中每一集视频分别对应的解说类视频,即若该电视剧一共有30集,则该电视剧对应的有序专辑视频就可能为30个解说类视频(如一个解说类视频是针对该电视剧其中一集所剪辑得到的视频),且这30个解说类视频是按照集数的顺序呈现在推荐结果显示区域中。其中,视频专辑集合库中的视频专辑集合,可以是服务器100根据本申请实施例提供的视频数据处理方法生成的。
具体的,本申请实施例中,服务器100可以获取M个待处理视频,M为正整数,然后对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息以及来源标签信息,该视频属性信息包括作品属性信息和集数属性信息;然后,服务器100可以将具有相同来源标签信息的待处理视频,加入相同的视频集合中,得到初始视频集合;可选的,若某个来源标签信息所关联的待处理视频只有一个,也可以生成只包含这一个待处理视频的初始视频集合。以目标作品属性信息(M个待处理视频所涉及的作品属性信息中包括目标作品属性信息)为例,服务器100可以将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;最后,根据待排序视频对应的集数属性信息,对待排序视频进行排序过滤处理,得到有序专辑视频,生成包含有序专辑视频的视频专辑集合。其中,排序过滤处理的过程可以为:根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频(即按照集数的先后顺序来排序这些待排序视频),再进一步检测这些排序视频对应的集数(来源于集数属性信息)是否连续、且这些排序视频中最大的集数是否与所属作品(如这些排序视频所属的电视剧)的总集数相同,若上述判断条件均能满足,则可以确定排序视频对应的集数属性信息满足集数合法条件,进而可以将排序视频确定为有序专辑视频;若上述判断条件未能均满足,则可以过滤这些排序视频,即不会将其作为有序专辑视频进行推荐,这样可以有效保证最终显示的有序专辑视频的质量。同样的,服务器100针对初始视频集合中的其他作品属性信息的待排序视频也可以按照上述过程进行排序过滤处理,若同样也能满足集数合法条件,则也会生成对应的有序专辑视频,因此,视频专辑集合可以包含多个作品属性信息分别对应的有序专辑视频。可选的,针对某作品属性信息对应的排序视频只有1个的情况,可以直接默认该排序视频是满足集数合法条件的,即可以将该排序视频直接作为有序专辑视频。可以理解,服务器100可以将生成的视频专辑集合和其对应的作品属性信息和来源标签信息关联写入视频专辑集合库中进行存储,以便于在接收到终端设备发送的查询数据后,确定与该查询数据对应的作品属性信息或者来源标签信息后,快速获取与该查询数据匹配的视频专辑集合,并返回对应的数据流给终端设备。可以理解,通过本申请实施例得到的视频专辑集合中包含的有序专辑视频对应的集数属性信息是连续的,便于快速确定观看顺序,从而提高了观看效率。
可以理解的是,本申请实施例提供的方法可以由计算机设备执行,计算机设备包括但不限于终端设备或服务器。其中,服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云数据库、云服务、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。终端设备可以是智能手机、平板电脑、笔记本电脑、台式计算机、掌上电脑、移动互联网设备(mobile internet device,MID)、可穿戴设备(例如智能手表、智能手环等)、智能电视、智能车载等可以运行即时通信应用或社交应用的智能终端。终端设备和服务器可以通过有线或无线方式进行直接或间接地连接,本申请实施例在此不做限制。
可以理解的是,本申请实施例可应用于各种场景,包括但不限于云技术、人工智能、智慧交通、区块链等场景。
可以理解的是,在本申请的具体实施方式中,涉及到的查询数据等相关的数据,当本申请以上实施例 运用到具体产品或技术中时,需要获得用户许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
为便于理解上述视频专辑集合生成以及查询目标查询数据时目标视频专辑集合的显示过程,请一并参见图2a-图2b,图2a-图2b的实现过程可以在如图1所示的服务器100中进行,也可以在终端设备(如图1所示的终端设备200a、终端设备200b、终端设备200c或终端设备200n)中进行,还可以由终端设备和服务器共同执行,此处不做限制,本申请实施例以终端设备200b、服务器100共同执行为例进行说明。
首先,请参见图2a,图2a是本申请实施例提供的一种视频专辑集合生成的场景示意图。如图2a所示,服务器100可以获取到M个待处理视频:待处理视频1、待处理视频2、……、待处理视频M。其中,待处理视频可以是影视解说类视频,即根据某部电影或者电视剧的部分内容剪辑并配上解说的视频。可以理解,M个待处理视频可以是服务器100对大量能够获取到的所有视频进行质量筛选后得到的视频,因此每个待处理视频的来源可能不同,其涉及的影视作品的内容可能不同,其对应的视频内容呈现方式、视频的发布方式也可能不同。因此,服务器100可以对M个待处理视频进行分类整理,得到有序的视频专辑集合,具体的,服务器100在获取到M个待处理视频后,可以先对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息以及来源标签信息。其中,视频属性信息可以包括作品属性信息和集数属性信息,作品属性信息用于描述待处理视频涉及的影视作品,集数属性信息用于描述待处理视频涉及对应的影视作品的哪部分内容(如哪一集);其中,来源标签信息是指视频的来源信息,例如,来源于哪个剪辑博主,或者来源于哪个网站等等。如图2a所示,待处理视频1对应的视频属性信息201可以为“电视剧A,第2集”,表明待处理视频1是针对电视剧A的第2集的影视内容的影视解说类视频;待处理视频2对应的视频属性信息202可以为“电视剧B,第1集”,表明待处理视频2是针对电视剧B的第1集的影视内容的影视解说类视频;待处理视频M对应的视频属性信息203可以为“电影C,上”,表明待处理视频M是针对电影C的上半部分的影视内容的影视解说类视频。
在得到每个待处理视频的视频属性信息以及来源标签信息后,服务器100可以先对M个待处理视频进行来源分类,即可以根据来源标签信息对待处理视频分类,如先将具有相同来源标签信息的待处理视频,加入相同的初始视频集合中,使得一个初始视频集合中的每个待处理视频具有相同的来源标签信息。如图2a所示,服务器100可以得到多个初始视频集合,例如,初始视频集合204。其中,初始视频集合204中可以包括有待处理视频2、…、待处理视频a,也就是说,待处理视频2、…、待处理视频a具有相同来源标签信息,其他的初始视频集合同理。然后,服务器100会将每个初始视频集合中,具有相同作品属性信息的待处理视频确定为待排序视频。以初始视频集合204为例,如图2a所示,假设待处理视频2、…、待处理视频c对应的作品属性信息均为电视剧B,待处理视频3、待处理视频a对应的作品属性信息均为电影D,则服务器100可以将待处理视频2、…、待处理视频c确定为待排序视频205,将待处理视频3、待处理视频a确定为待排序视频206,以此类推。然后,服务器100可以对每组待排序视频进行排序过滤处理(一组待排序视频对应一个作品属性信息,即一组待排序视频中的待处理视频均具有相同的作品属性信息),即根据某一组待排序视频对应的集数属性信息对该组待排序视频进行排序,得到排序视频,若这些排序视频对应的集数属性信息是连续且完整的,则可以将这些排序视频确定为有序专辑视频,然后生成包含有序专辑视频的视频专辑集合,视频专辑集合可以包含与一个或多个作品属性信息分别对应的有序专辑视频,即可以理解为其他作品属性信息对应的待排序视频也可以通过同样的方式生成对应的有序专辑视频。如图2a所示,假设服务器100在对待排序视频205进行排序后,待处理视频2后面的待处理视频为待处理视频c,但是待处理视频2对应的集数属性信息为第1集,待处理视频c对应的集数属性信息为第3集,也就是说,在待排序视频205中,没有与电视剧B第2集内容相关的待处理视频,则服务器100可以认为待排序视频205是无序的,服务器100可以放弃对待排序视频205进行后续处理,即不会生成其对应的有序专辑视频。如图2a所示,假设服务器100在对待排序视频206进行排序后,得到待处理视频a、待处理视频3,其中,待处理视频a对应的集数属性信息为上,待处理视频3对应的集数属性信息为下,则服务器100可以确定待排序视频206中的待处理视频对应的集数属性信息是连续且完整的,因此可以将待处理视频a、待处理视频3确定为有序专辑视频,然后生成包含该有序专辑视频(即待处理视频a、待处理视频3)的视频专辑集合207。
可以理解,当M的数量足够时,在对M个待处理视频进行分类整理后,服务器100最终可以得到多个有序的视频专辑集合,即一个视频专辑集合对应一种来源标签信息,针对一个视频专辑集合也可以包含与一个或多个作品属性信息分别对应的有序专辑视频,一个作品属性信息对应的有序专辑视频可以为多个,且这多个有序专辑视频的显示顺序是按照集数顺序进行显示的。当服务器100在接收到针对视频意图类型 的查询数据时,就可以先确定与查询数据匹配的作品属性信息或者来源标签信息,然后将与查询数据匹配的作品属性信息或者来源标签信息对应的有序专辑视频返回至终端设备。
进一步地,终端设备可以在应用页面的查询框中显示输入的目标查询数据,然后可以响应针对目标查询数据的触发操作,若目标查询数据的意图类型为视频意图类型,则在应用页面的查询结果显示框中,显示推荐结果显示区域;在推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频。其中,目标视频专辑集合为作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合;目标视频专辑集合中的有序专辑视频的显示顺序,是按照有序专辑视频对应的集数属性信息之间的集数顺序进行排序的;目标视频专辑集合中的有序专辑视频属于解说视频类型。为便于理解,请一并参见图2b,图2b是本申请实施例提供的一种视频查询的场景示意图。如图2b所示,与终端设备200b具有关联关系的对象是对象1,终端设备200b上集成安装有短视频应用。对象1可以通过终端设备200b的短视频应用和服务器100进行数据交互,例如,对象1通过终端设备200b打开短视频应用后,如图2b所示,终端设备200b可以显示应用页面31,该应用页面31包括查询框311和查询结果显示框312。其中,查询框311用于提供查询功能,查询结果显示框312用于显示查询结果。假设对象1想要观看影视作品,就可以通过查询框311进行输入操作,终端设备200b可以在应用页面31的查询框311中显示对象1输入的查询内容311a,例如,查询内容311a可以为“电影D”。对象1输入完成后,就可以进行针对查询内容311a的触发操作,例如,该触发操作可以为针对查询控件311b的触发操作。终端设备200b响应针对查询内容311a的触发操作后,就可以将查询内容311a发送至服务器100。服务器100可以进行针对查询内容311a的查询处理,得到查询结果数据,然后向终端设备200b返回针对查询内容311a的查询结果数据,终端设备200b可以根据查询结果数据在结果显示框中显示查询的结果。其中,查询处理的一个可行过程,就是先确定查询内容311a的意图类型,若确定查询内容311a的意图类型为视频意图类型,则服务器100除了在能够获取的大量视频数据中查找与查询内容311a相匹配的视频数据,作为第一视频数据,还会在由图2a所示的场景中得到的多个有序的视频专辑集合中,查找与查询内容311a匹配的视频专辑集合,即与“电影D”匹配的视频专辑集合,如上述图2a所示的视频专辑集合207,服务器100会将视频专辑集合207对应的视频数据,作为第二视频数据,然后,服务器100会将第一视频数据和第二视频数据,确定为查询结果数据。
如图2b所示,终端设备200b接收到查询结果数据后,会在应用页面31的查询结果显示框312中显示推荐结果展示区域,例如,推荐结果展示区域312a,推荐结果展示区域312b。其中,不同的推荐结果展示区域用于展示不同的视频数据,第二视频数据的展示级优先于第一视频数据,因此,终端设备200b会在推荐结果展示区域312a中展示第二视频数据,在推荐结果展示区域312b中展示第一视频数据。如图2b所示,终端设备200b会在推荐结果展示区域312a中,根据视频专辑集合207中包含的有序专辑视频的位置顺序(位置顺序与有序专辑视频对应的集数顺序是相匹配的),依次展示每个有序专辑视频对应的视频封面,因为视频专辑集合207顺序包含待处理视频a和待处理视频3,因此,视频封面313为待处理视频a对应的视频封面,视频封面314为待处理视频3对应的视频封面。然后,终端设备200b才会在推荐结果展示区域312b中展示第一视频数据对应的视频封面。
由此可见,通过本申请实施例提供的视频数据处理方法,在响应针对视频意图类型的查询数据时,终端设备会先优先显示有序的视频专辑集合,实现了结构化的有序视频输出,提高了查询数据对应视频的展示效果,且有序专辑视频在视频专辑集合中是按照集数属性信息排序的,不再需要逐一点击观看以确定有序专辑视频的观看顺序,从而提高搜索出的影视解说类视频的呈现效果。
进一步地,请参见图3,请参见图3,图3是本申请实施例提供的一种视频数据处理方法的流程示意图。该视频数据处理方法可以由计算机设备执行,计算机设备可以包括如图1的终端设备或服务器。该方法可以包括以下步骤S101-步骤S104:
步骤S101,获取M个待处理视频;M为正整数。
具体的,待处理视频是指与影视作品(即上述电影或者电视剧IP)相关联的剪辑视频。
一个可行的实施例中,待处理视频可以为影视解说类视频,即剪辑博主对影视作品中的部分影视内容进行剪辑并配上对应的解说(可以为文字解说、语音解说或者视频解说等)而生成的视频。可以理解,影视解说类视频能帮助用户快速了解该影视作品的内容梗概。
步骤S102,对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息。
具体的，作品属性信息是指待处理视频所对应的影视作品信息，例如，待处理视频A对应的作品属性信息可以为电视剧的名称，如，为《BBB》，说明待处理视频A的视频内容属于电视剧《BBB》。集数属性信息用于表征待处理视频的视频内容对应于影视作品中哪个时间段的影视内容，例如，待处理视频A对应的集数属性信息为第1-2集，说明待处理视频A中的视频内容涉及到电视剧《BBB》中第一集与第二集的影视内容，即可以理解为，待处理视频A是基于电视剧《BBB》的第一集和第二集的影视内容剪辑生成的。计算机设备还可以获取每个待处理视频分别对应的来源标签信息，来源标签信息是指视频的来源信息，例如，来源于哪个剪辑博主，或者来源于哪个网站等等。
具体的,当作品属性提取处理采用视频帧检索方式时,对待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息的一个可行实施过程,可以为:对待处理视频Mi进行采样处理,得到视频帧图像;将视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与视频帧图像之间的画面相似度;将与视频帧图像之间的画面相似度最高的视频作品,确定为目标视频作品;若视频帧图像与目标视频作品之间的画面相似度大于或等于画面相似度阈值,则将目标视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。其中,采样处理可以是等时间间隔采样,即所采样得到的视频帧图像可以为多个,且相邻的视频帧图像在待处理视频Mi中对应的播放时间之间的时间间隔是相等的,例如,待处理视频Mi的播放时长为20s(即20秒),采样时间间隔为5s,则获取到的视频帧图像为待处理视频Mi第5s、第10s、第15s以及第20s分别对应的帧图像。因此,可以理解,视频帧图像的数量为一个或多个。其中,视频作品是指电影或者电视剧对应的完整影视视频。
其中,视频帧图像可以包括视频帧图像X,影视作品库中的影视作品可以包括影视作品Y,那么在影视作品Y包含的画面帧图像中,获取与视频帧图像X之间相似度最高的画面帧图像,作为目标画面帧图像,将目标画面帧图像与视频帧图像X之间的相似度确定为影视作品Y与视频帧图像X之间的画面相似度。其中,视频帧图像与画面帧图像之间的图像相似度,可以通过两个图像分别对应的图像表示向量来计算,也可以通过其他相似度比较模型来获取,这里不作限制。
可选的,当视频帧图像的数量为多个时,确定与视频帧图像之间的画面相似度最高的视频作品的可行实现过程,可以为:遍历多个视频帧图像,将多个视频帧图像中的第i个视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与第i个视频帧图像之间的画面相似度;i为小于或等于多个视频帧图像的数量的正整数;获取与第i个视频帧图像之间的画面相似度最高的视频作品,作为第i个视频帧图像对应的待定视频作品,对第i个视频帧图像对应的待定视频作品进行标记。当标记完多个视频帧图像中每个视频帧图像对应的待定视频作品时,将标记次数最多的待定视频作品确定为与视频帧图像之间的画面相似度最高的视频作品,进而可以将标记次数最多的待定视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。通过数量更多的视频帧图像所确定出的相似度最高的待定视频作品可以具有更高的准确性,即可以保证所确定出的待处理视频Mi对应的作品属性信息足够准确。
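A short sketch of the per-frame tagging and voting described above is given below; the frame-level similarity search itself is abstracted away, and the fragment is illustrative only:

```python
from collections import Counter

def vote_work_attribute(best_match_per_frame):
    """best_match_per_frame: the pending work tagged for each sampled frame,
    e.g. ["drama_B", "drama_B", "movie_C"]. The most-tagged work wins."""
    if not best_match_per_frame:
        return None
    work, _count = Counter(best_match_per_frame).most_common(1)[0]
    return work

print(vote_work_attribute(["drama_B", "drama_B", "movie_C"]))  # -> "drama_B"
```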
可选的,当作品属性提取处理采用标题模板匹配方式时,对待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息的一个可行实施过程,可以为:获取待处理视频Mi对应的视频标题信息;将视频标题信息与标题模板库中的标题模板分别进行结构匹配处理,得到标题模板库中的标题模板分别与视频标题信息之间的结构相似度;将与视频标题信息之间结构相似度最高的标题模板,确定为目标标题模板;若视频标题信息与目标标题模板之间的结构相似度大于或等于结构相似度阈值,则根据目标标题模板对视频标题信息进行信息提取处理,得到待处理视频Mi对应的作品属性信息。其中,标题模板库中的标题模板是指预先定义的文字模板,用于提取视频标题信息中的作品属性信息,即IP信息,例如,标题模板可以包括:“《IP》”、“<IP>”、“[IP]”、“IP+数字:”、“IP+数字”。假设,待处理视频Mi对应的视频标题信息C为“《XXX》”,计算机设备计算该视频标题信息C与标题模板库中的标题模板之间的结构相似度后,可以确定与该视频标题信息C最相似的目标标题模板为“《IP》”,则计算机设备可以根据目标标题模板对视频标题信息C进行信息提取处理,得到待处理视频Mi对应的作品属性信息为XXX。
具体的,标签传播就是通过利用样本间的关系,从已标记的节点标签信息来预测未标记的节点标签信息,当作品属性提取处理是采用标签传播方式时,对待处理视频Mi进行作品属性提取处理,得到待处理 视频Mi对应的作品属性信息的一个可行实施过程,可以为:遍历获取样本视频库中的第k个样本视频;k为正整数;对待处理视频Mi与第k个样本视频进行画面匹配处理,得到视频画面相似度;对待处理视频Mi的视频标题信息与第k个样本视频对应的视频标题信息进行相似度计算,得到视频标题相似度;获取与待处理视频Mi和第k个样本视频关联的视频点击日志,对视频点击日志进行点击分析处理,得到视频点击相似度;根据视频画面相似度、视频标题相似度以及视频点击相似度,确定待处理视频Mi与第k个样本视频之间的视频相似度;若视频相似度大于视频相似度阈值,则根据视频相似度对第k个样本视频针对关联作品的视频作品置信度进行加权处理,得到待处理视频Mi针对关联作品的作品置信度;若作品置信度大于或等于作品置信度阈值,则将关联作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。其中,第k个样本视频针对关联作品的视频作品置信度用于表征第k个样本视频属于关联作品的可信程度。其中,样本视频与待处理视频属于相同类型的视频,例如,待处理视频为影视解说类视频时,样本视频也为影视解说类视频。可以理解,样本视频库中的样本视频可以看为一个节点,且对应有一个关联作品标签(该关联作品标签可以指示样本视频所对应的关联视频),同时关联作品标签都会有一个视频作品置信度(计算标签时算法生成的)。其中,视频作品置信度用于表征样本视频属于关联作品标签所指示的关联作品的可信程度,当视频作品置信度大于作品置信度阈值时,可以认为样本视频属于该关联作品。其中,一个视频点击日志是指一个用户在某段时间内对视频的点击行为分析日志,可以理解,与待处理视频Mi和第k个样本视频关联的视频点击日志的数量可以有多个,通过这些视频点击日志,可以分析出用户同时点击待处理视频Mi和第k个样本视频的可能性,作为视频点击相似度。其中,根据视频画面相似度、视频标题相似度以及视频点击相似度,确定待处理视频Mi与第k个样本视频之间的视频相似度的过程可以是指:对视频画面相似度、视频标题相似度以及视频点击相似度相加求平均,也可以是指:对视频画面相似度、视频标题相似度以及视频点击相似度加权后再相加求平均,具体可以根据实际情况决定,本申请在此不作限制。
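The label-propagation arithmetic above fits in a few lines. In the sketch below, the equal-weight average and both thresholds are assumptions (the text explicitly allows either plain or weighted averaging of the three similarities):

```python
def propagate_work_label(frame_sim, title_sim, click_sim, sample_work_conf,
                         video_sim_threshold=0.8, work_conf_threshold=0.9):
    """Sketch: fuse frame/title/click similarities into one video similarity,
    then weight the sample's work confidence by it; return the propagated
    work confidence, or None when propagation is not allowed."""
    video_sim = (frame_sim + title_sim + click_sim) / 3  # simple average
    if video_sim <= video_sim_threshold:
        return None  # videos not similar enough to propagate the label
    work_conf = video_sim * sample_work_conf
    return work_conf if work_conf >= work_conf_threshold else None

print(propagate_work_label(0.95, 0.90, 0.92, 0.99))  # -> ~0.914
```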
具体的,当集数属性提取处理是采用视频帧检索方式时,当集数属性提取处理采用视频帧检索方式时,对待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息的一个可行实施过程,可以为:从视频作品库中,获取具有待处理视频Mi对应的作品属性信息的视频作品,将获取到的视频作品作为待匹配视频作品;对待处理视频Mi进行采样处理,得到视频帧图像;对视频帧图像和待匹配视频作品中的视频作品画面进行画面匹配处理,得到与视频帧图像匹配的视频作品画面;将与视频帧图像匹配的视频作品画面对应的集数信息,确定为待处理视频Mi对应的集数属性信息。可以理解,通过对视频帧图像和待匹配视频作品进行画面匹配处理,可以定位到视频帧图像对应待匹配视频作品中的第几集第几分几秒的视频作品画面,从而可以确定出待处理视频Mi涉及到待匹配视频作品的哪部分内容,从而确定集数属性信息。
可选的,当集数属性提取处理是采用标题模板匹配方式时,对待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息的一个可行实施过程,可以为:对待处理视频Mi的封面图像进行视频布局字符识别处理,得到待处理视频Mi对应的封面标题信息;将封面标题信息与集数模板库中的集数模板分别进行结构匹配处理,得到集数模板库中的集数模板分别与封面标题信息之间的结构相似度;将与封面标题信息之间结构相似度最高的集数模板,确定为目标集数模板;若封面标题信息与目标集数模板之间的结构相似度大于或等于结构相似度阈值,则根据目标集数模板对封面标题信息进行信息提取处理,得到待处理视频Mi对应的集数属性信息。其中,视频布局字符识别处理,是指通过VideoLayout_OCR(视频布局字符识别)技术,在获取封面图上的文字信息的同时,还可以识别出区域文字的布局属性,例如标题、字幕、背景文字等,从而根据布局属性和文字信息确定出待处理视频Mi对应的封面标题信息。其中,VideoLayout_OCR是指使用三分支多任务的神经网络,将文字检测和属性分类任务合二为一的技术。
可以理解,集数属性提取处理采用标题模板匹配方式时,具体实现过程可以参考作品属性提取处理采用标题模板匹配方式时的实现过程。在集数属性提取处理中,采用的模板是针对集数属性信息的集数模板。因为集数属性信息可以包括两部分,episode(集数)和part(部分),其中,Episode代表第几集,part代表第几部分,例如上/中/下,1/2/3。因此,集数模板可以分为用于提取pattern信息的pattern类型模板和用于提取part信息的part类型模板。其中,pattern类型模板可以包括有:第"+阿拉伯数字/中文数字+"期"或者"集"或者"案",如第1期或者第二集;"第"+阿拉伯数字+"-"或者"~"+阿拉伯数字+"期"或者"集"或者"案",如第1-2集;"EP"或者"Part"+阿拉伯数字,如EP1或者Part1;或者视频标题带有“大结局”的字符串,则认为该集为最后一集。其中,part类型模板可以包括有:“(上/中/下/数字)”、“《上/中/下/数字》”、“[上/中/下/数字]”、数字+"/"+数字,如1/3、数字+"|"+数字,如1|3;数字+"-"+数字,如3-1。如 果标题文字能匹配上part类型模板,则获取到该视频的part信息,如上中下集,或者1/3,2/3,3/3。可以理解,计算机设备可以将两种类型的集数模板分别与封面标题信息进行匹配,两者之间的匹配互不影响。如果有与封面标题信息匹配的part类型模板,则可以提取出part信息;如果有与封面标题信息匹配的pattern类型模板,则可以提取出pattern信息。
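The pattern/part templates enumerated above map naturally onto regular expressions. The sketch below covers only a subset of the listed templates and simplifies their precedence; it is an illustration, not the template library itself.

```python
import re

EPISODE_PATTERNS = [                                      # "pattern" templates
    re.compile(r"第\s*(\d+)\s*[-~]\s*(\d+)\s*[期集案]"),   # 第1-2集
    re.compile(r"第\s*(\d+)\s*[期集案]"),                  # 第1期 (digits only here)
    re.compile(r"(?:EP|Part)\s*(\d+)", re.IGNORECASE),    # EP1 / Part1
]
PART_PATTERNS = [                                         # "part" templates
    re.compile(r"[（(《\[]\s*([上中下]|\d+)\s*[）)》\]]"),  # （上）/《2》/[下]
    re.compile(r"(\d+)\s*[/|]\s*(\d+)"),                  # 1/3, 1|3
]

def extract_episode_attributes(title):
    episode = next((m.groups() for p in EPISODE_PATTERNS
                    if (m := p.search(title))), None)
    part = next((m.groups() for p in PART_PATTERNS
                 if (m := p.search(title))), None)
    is_finale = "大结局" in title    # finale marker, per the description
    return episode, part, is_finale

print(extract_episode_attributes("某剧解说 第1-2集（上）"))
# -> (('1', '2'), ('上',), False)
```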
具体的,当集数属性提取处理是采用标题模板匹配方式时,除了针对待处理视频Mi对应的封面标题信息进行集数属性提取,还可以对待处理视频Mi对应的视频标题信息进行集数属性提取。其中,视频标题信息是指待处理视频Mi发布时对应的标题信息。
具体的,由上述描述可知,作品属性提取处理可以采取视频帧检索、标题模板匹配以及标签传播等方式,集数属性提取处理可以采取视频帧检索、标题模板匹配等方式,在待处理视频Mi对应的作品属性信息和集数属性信息的实际提取过程中,可以同时使用上述一种或多种方式来进行待处理视频Mi的作品属性提取处理,并同时使用上述一种或多种方式来进行待处理视频Mi的集数属性提取处理,本申请在此不作限制。
可选的,在上述作品属性提取处理过程或者集数属性提取处理过程中,有些待处理视频可能无法提取出可用的作品属性信息或者集数属性信息,计算机设备可以将这些待处理视频确定为无效待处理视频,直接过滤掉,即不再参与后续步骤的处理。
步骤S103,根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息。
具体的,来源标签信息是指待处理视频的来源信息,例如,发布待处理视频的作者ID(Identity document,身份标识或者账号)。可以理解,计算机设备可以先将有效的待处理视频根据来源标签信息进行归类,即根据来源标签信息对M个待处理视频进行分类,也可以理解为是将具有相同来源标签信息的待处理视频加入到同个初始视频集合中,以此得到多个初始视频集合,即一个初始视频集合对应一种来源标签信息,换言之,一个初始视频集合中的每个待处理视频均具有相同的来源标签信息。然后再在初始视频集合中,根据作品属性信息进行归类,将具有相同作品属性信息的待处理视频确定为同一批待排序视频,如可以将某初始视频集合中具有目标作品属性信息的待处理视频确定为某一批待排序视频,即某初始视频集合中属于同一个影视作品的待处理视频(它们都具有相同的来源标签信息,也具有相同的作品属性信息)会被共同处理为后续的有序专辑视频,同样的,该初始视频集合中属于另一个影视作品的待处理视频也会被共同处理为另一组有序专辑视频。
步骤S104,根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
具体的,针对同一个作品属性信息(如目标作品属性信息)对应的待排序视频,计算机设备可以根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,然后对排序视频对应的集数属性信息进行连续性检测,得到连续性检测结果;若连续性检测结果为集数连续结果,则可以确定排序视频对应的集数属性信息满足集数合法条件,即集数合法条件可以是指排序视频之间的集数属性信息是否满足集数连续结果的条件,进而将满足集数合法条件的排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;若连续性检测结果为集数不连续结果,则将排序视频确定为无序视频,不用生成针对无序视频的视频专辑集合。其中,排序处理可以是从小到大进行升序,也可以是从大到小进行降序,这里不作限制。其中,连续性检测即确定所有相邻的排序视频对应的集数属性信息是否连续,例如,排序视频1对应的集数属性信息为第一集,与其相邻的排序视频2对应的集数属性信息为第三集,显然,两者不连续,中间缺失了第二集,此时连续性检测结果就为集数不连续结果。可见,此时的集数连续结果是指每相邻两个排序视频的集数属性信息之间均互为相邻集数。
可选的,计算机设备在进行连续性检测时,还可以对排序视频中的第一个排序视频对应的集数属性信息进行识别,从而确定第一个排序视频是否为作品首视频,即确定其对应的集数属性信息是否为第一集;同理的,计算机设备可以获取与排序视频对应的作品属性信息对应的总集数信息,然后计算机设备可以对排序视频中的最后一个排序视频对应的集数属性信息进行识别,确定最后一个排序视频是否为作品尾视频,即确定其对应的集数属性信息是否等于所属影视作品的总集数信息。若第一个排序视频不为作品首视频或者最后一个排序视频不为作品尾视频,可以将连续性检测结果确定为集数不连续结果。可见,此时的集数 连续结果是指每相邻两个排序视频的集数属性信息之间均互为相邻集数,且第一个排序视频为所属影视作品的作品首视频(如第一集),且最后一个排序视频为所属影视作品的作品尾视频(如最后一集)。
为便于理解上述过程,以待处理视频为影视解说类视频,生成针对电影或电视剧作品的完整视频解说专辑为例进行说明。请一并参见图4,图4是本申请实施例提供的一种视频聚类挖掘方法的整体流程示意图。如图4所示,整个视频聚类挖掘方法主要包括以下流程:
步骤T1,输入M个待处理视频。
具体的,M个待处理视频即上述图3所对应实施例中步骤S101中的M个待处理视频。
然后,计算机设备需要提取每个待处理视频对应的集数属性信息和作品属性信息,还会获取每个待处理视频对应的来源标签信息,计算机设备会对每个待处理视频分别执行步骤T2-步骤T9,若最终未提取到待处理视频的作品属性信息或者集数属性信息,则确定该待处理视频为无效视频。为便于理解,下述步骤T2-步骤T9均以单个的待处理视频为例进行描述。
步骤T2,对待处理视频进行视频帧检索处理。
具体的,视频帧检索处理的实现过程,可以参见上述步骤S102中当作品属性提取处理采用视频帧检索方式时的描述,这里不再进行赘述。
步骤T3,确定是否提取到待处理视频对应的作品属性信息;若是,则执行步骤T8;若不是,则执行步骤T4。
步骤T4,对待处理视频进行模板匹配处理。
具体的,模板匹配处理的实现过程,可以参见上述步骤S102中当作品属性提取处理是采用模板匹配方式时的描述,这里不再进行赘述。
步骤T5,确定步骤T4是否提取到待处理视频对应的作品属性信息;若是,则执行步骤T8;若不是,则执行步骤T6。
步骤T6,对待处理视频进行标签传播处理。
具体的,标签传播处理的实现过程,可以参见上述步骤S102中当作品属性提取处理是采用标签传播方式时的描述,这里不再进行赘述。
步骤T7,确定步骤T6是否提取到待处理视频对应的作品属性信息;若是,则执行步骤T8;若不是,则确定待处理视频无效。
步骤T8,对待处理视频进行集数属性信息提取处理。
具体的,集数属性信息提取处理的实现过程,可以参见上述步骤S102中集数属性信息提取处理实现的描述,这里不再进行赘述。
步骤T9,确定步骤T8是否提取到待处理视频对应的集数属性信息;若是,则执行步骤T10;若不是,则确定待处理视频无效。
步骤T10,对有效的待处理视频进行专辑聚合。
具体的,首先将有效的待处理视频,根据作者ID(即上述图3的来源标签信息)归类,然后在同作者下的同一个视频IP(即作品属性信息)再进行归类,这样可以得到唯一的视频作者+视频IP的有效的待排序视频。步骤T10的具体实现过程,可以参见上述步骤S103的描述。
步骤T11,生成视频专辑。
具体的,步骤T11的实现,可以参见上述图3所对应实施例中步骤S104的描述,这里不再进行赘述。
由此可见,采用本申请实施例提供的方法,得到的视频专辑集合中包含的有序专辑视频对应相同的作品属性信息和来源标签信息,当查询数据与有序专辑视频对应相同的作品属性信息和来源标签信息相匹配时,就可以将该视频专辑集合显示在查询结果显示框中,实现了结构化的视频输出,提高了查询数据对应视频的展示效果,且有序专辑视频在视频专辑集合中是按照集数属性信息排序的,不再需要逐一点击观看以确定有序专辑视频的观看顺序,从而提高了搜索出的影视解说类视频的呈现效果。而且本申请通过视频帧检索、标题模板匹配以及标签传播等方式可以更准确挖掘出待处理视频的作品属性信息和集数属性信息,从而可以更好的保证所生成的有序专辑视频的准确性,而且由于具有视频帧检索、标题模板匹配以及标签传播这样的递进挖掘机制,就可以在更多的待处理视频中挖掘出作品属性信息和集数属性信息,从而可以保证有序专辑视频的数量。而且通过集数合法条件进行判断检测,可以更好的保证一个影视作品对应的有序专辑视频之间的集数是连续且完整的,进一步保证了有序专辑视频的准确性。
进一步地,请参见图5,请参见图5,图5是本申请实施例提供的一种视频数据处理方法的流程示意图。该视频数据处理方法可以由计算机设备执行,计算机设备可以包括如图1的终端设备或服务器。该方 法可以包括以下步骤S201-步骤S204:
步骤S201,对第一初始视频集进行质量筛选处理,得到M个待处理视频;第一初始视频集包括至少两个初始视频。
具体的,为了生成质量更好的视频专辑集合,可以先对第一初始视频集中的视频进行质量筛选,过滤掉一些质量不合格的视频。质量筛选处理可以包括黑边检测、水印检测、清晰度识别。
其中,黑边检测要求初始视频的黑边占比不能超过一定范围,否则内容画面占比过小,影响用户观看体验。黑边检测主要是通过固定采样率在初始视频中进行抽帧,然后设定黑边占比阈值进行图像二值化,检测连续黑色像素点在视频宽/高中低占比来对初始视频进行过滤。其中,黑边占比阈值可以根据视频的长宽来决定,例如,短视频对应的黑边占比阈值可以为1/3,小视频对应的黑边占比阈值可以为2/3。其中,短视频是指视频的宽的长度大于高的长度的视频;小视频是指视频的宽的长度小于高的长度的视频。
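As a toy illustration of the black-edge measurement (binarize the sampled frame, then measure how much of the height or width is taken by fully dark border rows or columns), with the darkness threshold assumed; the real detection also samples many frames at a fixed rate:

```python
import numpy as np

def black_edge_ratio(gray_frame, dark_threshold=16):
    """Sketch: fraction of the frame height occupied by fully dark rows at
    the top and bottom; a width-direction check would mirror this on columns."""
    dark_rows = (gray_frame < dark_threshold).all(axis=1)
    if dark_rows.all():
        return 1.0                                # degenerate all-black frame
    top = int(np.argmax(~dark_rows))              # first non-dark row
    bottom = int(np.argmax(~dark_rows[::-1]))     # dark rows from the bottom
    return (top + bottom) / gray_frame.shape[0]

frame = np.full((8, 8), 255, dtype=np.uint8)
frame[:2], frame[-1] = 0, 0                       # letterboxing: 2 rows + 1 row
print(black_edge_ratio(frame))                    # -> 0.375
```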
其中,水印检测要求初始视频中没有过大的水印,否则会对视频画面主体有严重遮挡。水印检测主要是通过视频连续帧之间的像素对比获取候选区域,然后通过边缘检测、均值滤波和大津阈值法对视频中的帧图像进行二值化,再使用联通域算法和聚类算法,筛选得到联通面积最大的区域,即认为是水印部位。然后对比水印面积和画面面积,超过1/25则认为水印过大,存在遮挡。
其中,清晰度识别是指通过计算视频画面内像素点之间的梯度,统计全局梯度均值,再归一化得到清晰度,清晰度可以取值0-4,4表示最清晰,此时清晰度阈值可以设置为2,即初始视频对应的清晰度不可以低于2。
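The gradient-based definition score sketched below follows the description (global mean of per-pixel gradient magnitudes, normalized onto a 0-4 scale); the normalization constant is an assumption:

```python
import numpy as np

def definition_score(gray_frame, norm=64.0):
    """Sketch: mean gradient magnitude over the frame, scaled to 0-4.
    `norm` is an assumed scaling constant, not a value from the text."""
    gy, gx = np.gradient(gray_frame.astype(np.float32))
    mean_grad = float(np.mean(np.hypot(gx, gy)))
    return min(mean_grad / norm, 1.0) * 4.0

sharp = np.tile(np.array([0, 0, 255, 255], dtype=np.uint8), (32, 8))  # bands
blurry = np.full((32, 32), 128, dtype=np.uint8)                       # flat
print(definition_score(sharp) > definition_score(blurry))             # -> True
```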
可以理解,计算机设备可以选择黑边检测、水印检测或清晰度识别中的一种或多种处理过程对第一初始视频集中的视频进行质量筛选,也可以根据实际情况加入其他质量筛选处理。
具体的,若同时采用黑边检测、水印检测、清晰度识别三种处理方式,则对第一初始视频集进行质量筛选处理,得到M个待处理视频的一个可行实施过程,可以为:获取第一初始视频集;对第一初始视频集进行黑边检测,得到第一初始视频集中每个初始视频分别对应的黑边占比;从第一初始视频集中,过滤黑边占比大于黑边占比阈值的初始视频,得到第二初始视频集;对第二初始视频集进行水印检测,得到第二初始视频集中每个初始视频分别对应的水印面积占比;从第二初始视频集中,过滤水印面积占比大于水印面积占比阈值的初始视频,得到第三初始视频集;对第三初始视频集进行清晰度识别,得到第三初始视频集中每个初始视频分别对应的清晰度;从第三初始视频集中,过滤清晰度低于清晰度阈值的初始视频,得到M个待处理视频。可以理解,通过三种过滤方式一层一层对初始视频进行过滤,剩下的M个待处理视频的视频质量有一定保证,能够提升最后生成的视频专辑集合的专辑质量。
步骤S202,对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息。
具体的,步骤S202的实现过程可以参见上述步骤S102的实现过程,这里不再进行赘述。
步骤S203,根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息。
具体的,步骤S203的实现过程,可以参见上述步骤S103的实现过程,这里不再进行赘述。
步骤S204,根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
具体的，根据待排序视频对应的集数属性信息，对待排序视频进行排序处理，得到排序视频；对排序视频对应的集数属性信息进行连续性检测，得到连续性检测结果；若连续性检测结果为集数连续结果，则根据目标作品知识图谱对排序视频进行视频版本识别处理，得到排序视频对应的目标视频版本；目标作品知识图谱为排序视频对应的作品属性信息关联的作品知识图谱；在目标作品知识图谱中，根据目标视频版本确定排序视频对应的总集数信息；若排序视频对应的集数属性信息中最大的集数属性信息和总集数信息相同，则确定排序视频对应的集数属性信息满足集数合法条件，进而将排序视频确定为有序专辑视频；生成包含有序专辑视频的视频专辑集合。此时，集数合法条件不仅仅要求排序视频之间的集数是连续的，而且还要求排序视频中的最后一集与所属影视作品的最后一集是对应上的，以此可以更好的保证有序专辑视频的准确性。其中，连续性检测的实现过程，可以参见上述图3所对应实施例中步骤S104的描述。其中，知识图谱是描述真实世界中存在的各种实体和概念，以及它们之间的关系的一种语义网络，作品知识图谱即是描述某部影视作品以及影视作品相关联的各种实体之间的关系的语义网络。
具体的,目标作品知识图谱可以包含一个或多个视频版本以及每个视频版本对应的视频对象列表;对排序视频进行视频版本识别处理,得到排序视频对应的视频版本的一个可行实施过程,可以为:对排序视频进行对象识别处理,得到排序视频所包含的多个视频对象以及每个视频对象分别对应的出现时长;根据每个视频对象分别对应的出现时长之间的时长顺序,从多个视频对象中获取R个目标视频对象;R为正整数;确定R个目标视频对象分别与目标作品知识图谱中的每个视频对象列表之间的对象重合度;对象重合度是指一个视频对象列表所包含的视频对象与R个目标视频对象之间的重合度;将对象重合度最大的视频对象列表对应的视频版本,确定为排序视频对应的目标视频版本。可以理解,同个影视作品可能出现不同的演员演绎的情况,此时同个影视作品具有多个视频版本,不同视频版本的影视作品对应的演员列表有很大差异。因此,可以先通过对象识别技术,识别出排序视频中出现次数最多或者出现时长最长的R个目标视频对象,然后将R个目标视频对象和该影视作品对应的目标作品知识图谱中各个视频版本分别对应的演员列表(即视频对象列表),计算相互间的对象重合度,如可以将R个目标视频对象作为待匹配演员列表,然后可以计算待匹配演员列表与目标作品知识图谱中某个视频版本对应的演员列表之间的对象重合度,对象重合度最大的视频对象列表对应的视频版本,即为排序视频对应的目标视频版本。
在确定目标视频版本后,计算机设备还可以通过目标作品知识图谱获取该目标视频版本对应的总集数信息,再和排序视频对应的集数属性信息中最大的集数属性信息进行比对,判别排序视频是否完结。若最大的集数属性信息不小于目标视频版本对应的总集数信息,则可以将排序视频确定为有序视频专辑。
可选的,若最大的集数属性信息小于目标视频版本对应的总集数信息,计算机设备还可以去确定当前的系统时间与目标视频版本的上映时间的时间之差是否超过90天。如果超过90天,则将排序视频确定为无效视频,如果小于90天,则仍然可以将排序视频确定有序视频专辑。
具体的,有序专辑视频的数量为至少两个时,生成包含有序专辑视频的视频专辑集合的一个可行实施过程,可以为:遍历至少两个有序专辑视频,顺序获取第j个有序专辑视频,j为正整数;对第j个有序专辑视频对应的视频封面和第j个有序专辑视频对应的视频标题进行相关度匹配,得到相关度匹配结果;若相关度匹配结果为相关度匹配成功结果,则将第j个有序专辑视频对应的视频封面确定为第j个有序专辑视频对应的专辑视频封面;若相关度匹配结果为相关度匹配失败结果,则对第j个有序专辑视频进行视频帧筛选处理,得到与第j个有序专辑视频对应的视频标题匹配的视频帧画面,将视频帧画面确定为第j个有序专辑视频对应的专辑视频封面;当获取到每个有序专辑视频分别对应的专辑视频封面时,生成包含每个有序专辑视频分别对应的专辑视频封面的视频专辑集合。简言之,为了更好地呈现效果,计算机设备还可以为视频专辑集合中的每个有序专辑视频均挑选一个专辑视频封面,在最终展示视频专辑集合时,就不再展示有序专辑视频原本的视频封面,而是展示有序专辑视频对应的专辑视频封面。其中,对第j个有序专辑视频进行视频帧筛选处理,得到与第j个有序专辑视频对应的视频标题匹配的视频帧画面,将视频帧画面确定为第j个有序专辑视频对应的专辑视频封面的可行实现过程,可以为:通过图文相关性模型筛选出与第j个有序专辑视频对应的视频标题最相关的前三个(也可以是其他数量,不作限制)视频帧画面,然后通过美观度模型挑选出质量最高的视频帧画面,作为第j个有序专辑视频对应的专辑视频封面。
通过本申请实施例提供的视频数据处理方法,可以帮助用户低门槛、完整、快速、精简地了解一部电影或者电视剧,解决了用户搜索喜欢的视频的时候,查找不到后续的相关内容或者内容有缺失的情况,也解决了视频标题和内容张冠李戴、视频质量低的问题,提升了整体的用户体验。
请参见图6,图6是本申请实施例提供的一种视频数据处理装置的结构示意图。该视频数据处理装置可以是运行于计算机设备的一个计算机程序(包括程序代码),例如该视频数据处理装置为一个应用软件;该装置可以用于执行本申请实施例提供的数据处理方法中的相应步骤。如图6所示,该视频数据处理装置1可以包括:获取模块11、特征提取模块12、视频确定模块13以及生成模块14。
获取模块11,用于获取M个待处理视频;M为正整数;
特征提取模块12,用于对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息;
视频确定模块13,用于根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息;
生成模块14,用于根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
其中,获取模块11、特征提取模块12、视频确定模块13以及生成模块14的具体功能实现方式可以参见图3对应实施例中的步骤S101-步骤S104的描述,这里不再进行赘述。
M个待处理视频包括待处理视频Mi,i为小于或等于M的正整数;
特征提取模块12,包括:第一提取单元121以及第二提取单元122。
第一提取单元121,用于对待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息;
第二提取单元122,用于对待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息。
其中,第一提取单元121以及第二提取单元122的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
其中,第一提取单元121,包括:帧检索子单元1211。
帧检索子单元1211,用于对待处理视频Mi进行采样处理,得到视频帧图像;
帧检索子单元1211,还用于将视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与视频帧图像之间的画面相似度;
帧检索子单元1211,还用于将与视频帧图像之间的画面相似度最高的视频作品,确定为目标视频作品;
帧检索子单元1211,还用于若视频帧图像与目标视频作品之间的画面相似度大于或等于画面相似度阈值,则将目标视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
其中,帧检索子单元1211的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
或者,第一提取单元121,可以具体用于对待处理视频Mi进行等间隔采样处理,得到多个视频帧图像,遍历多个视频帧图像,将多个视频帧图像中的第i个视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到视频作品库中的视频作品分别与第i个视频帧图像之间的画面相似度;i为小于或等于多个视频帧图像的数量的正整数;
第一提取单元121,具体用于获取与第i个视频帧图像之间的画面相似度最高的视频作品,作为第i个视频帧图像对应的待定视频作品,对第i个视频帧图像对应的待定视频作品进行标记,当标记完多个视频帧图像中每个视频帧图像对应的待定视频作品时,将标记次数最多的待定视频作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
其中,第一提取单元121,包括:模板匹配子单元1212。
模板匹配子单元1212,用于获取待处理视频Mi对应的视频标题信息;
模板匹配子单元1212,还用于将视频标题信息与标题模板库中的标题模板分别进行结构匹配处理,得到标题模板库中的标题模板分别与视频标题信息之间的结构相似度;
模板匹配子单元1212,还用于将与视频标题信息之间结构相似度最高的标题模板,确定为目标标题模板;
模板匹配子单元1212,还用于若视频标题信息与目标标题模板之间的结构相似度大于或等于结构相似 度阈值,则根据目标标题模板对视频标题信息进行信息提取处理,得到待处理视频Mi对应的作品属性信息。
其中,模板匹配子单元1212的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
其中,第一提取单元121,包括:传播匹配子单元1213。
传播匹配子单元1213,用于遍历获取样本视频库中的第k个样本视频;k为正整数;
传播匹配子单元1213,还用于对待处理视频Mi与第k个样本视频进行画面匹配处理,得到视频画面相似度;
传播匹配子单元1213,还用于对待处理视频Mi的视频标题信息与第k个样本视频对应的视频标题信息进行相似度计算,得到视频标题相似度;
传播匹配子单元1213,还用于获取与待处理视频Mi和第k个样本视频关联的视频点击日志,对视频点击日志进行点击分析处理,得到视频点击相似度;
传播匹配子单元1213,还用于根据视频画面相似度、视频标题相似度以及视频点击相似度,确定待处理视频Mi与第k个样本视频之间的视频相似度;
传播匹配子单元1213,还用于若视频相似度大于视频相似度阈值,则根据视频相似度对第k个样本视频针对关联作品的视频作品置信度进行加权处理,得到待处理视频Mi针对关联作品的作品置信度;第k个样本视频针对关联作品的视频作品置信度用于表征第k个样本视频属于关联作品的可信程度;
传播匹配子单元1213,还用于若作品置信度大于或等于作品置信度阈值,则将关联作品对应的视频作品属性信息,确定为待处理视频Mi对应的作品属性信息。
其中,传播匹配子单元1213的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
其中,第二提取单元122,包括:帧匹配子单元1221。
帧匹配子单元1221,用于从视频作品库中,获取具有待处理视频Mi对应的作品属性信息的视频作品,将获取到的视频作品作为待匹配视频作品;
帧匹配子单元1221,还用于对待处理视频Mi进行等间隔采样处理,得到视频帧图像;
帧匹配子单元1221,还用于对视频帧图像和待匹配视频作品中的视频作品画面进行画面匹配处理,得到与视频帧图像匹配的视频作品画面;
帧匹配子单元1221,还用于将与视频帧图像匹配的视频作品画面对应的集数信息,确定为待处理视频Mi对应的集数属性信息。
其中,帧匹配子单元1221的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
其中,第二提取单元122,包括:标题匹配子单元1222。
标题匹配子单元1222,用于对待处理视频Mi的封面图像进行视频布局字符识别处理,得到待处理视频Mi对应的封面标题信息;
标题匹配子单元1222,还用于将封面标题信息与集数模板库中的集数模板分别进行结构匹配处理,得到集数模板库中的集数模板分别与封面标题信息之间的结构相似度;
标题匹配子单元1222,还用于将与封面标题信息之间结构相似度最高的集数模板,确定为目标集数模板;
标题匹配子单元1222,还用于若封面标题信息与目标集数模板之间的结构相似度大于或等于结构相似度阈值,则根据目标集数模板对封面标题信息进行信息提取处理,得到待处理视频Mi对应的集数属性信息。
其中,标题匹配子单元1222的具体功能实现方式可以参见图3对应实施例中的步骤S102的描述,这里不再进行赘述。
其中,生成模块14,包括:排序单元141、检测单元142、版本识别单元143、集数确定单元144、视频确定单元145以及专辑生成单元146。
排序单元141,用于根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频;
检测单元142,用于对排序视频对应的集数属性信息进行连续性检测,得到连续性检测结果;
版本识别单元143,用于若连续性检测结果为集数连续结果,则根据目标作品知识图谱对排序视频进 行视频版本识别处理,得到排序视频对应的目标视频版本;目标作品知识图谱为排序视频对应的作品属性信息关联的作品知识图谱;
集数确定单元144,用于在目标作品知识图谱中,根据目标视频版本确定排序视频对应的总集数信息;
视频确定单元145,用于若排序视频对应的集数属性信息中最大的集数属性信息和总集数信息相同,则确定排序视频对应的集数属性信息满足集数合法条件;
视频确定单元145,还用于若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频;
专辑生成单元146,生成包含有序专辑视频的视频专辑集合。
其中,排序单元141、检测单元142、版本识别单元143、集数确定单元144、视频确定单元145以及专辑生成单元146的具体功能实现方式可以参见图5对应实施例中的步骤S204的描述,这里不再进行赘述。
其中,目标作品知识图谱包含一个或多个视频版本以及每个视频版本对应的视频对象列表;
版本识别单元143,包括:重合确定子单元1431以及版本确定子单元1432。
重合确定子单元1431,用于对排序视频进行对象识别处理,得到排序视频所包含的多个视频对象以及每个视频对象分别对应的出现时长;
重合确定子单元1431,还用于根据每个视频对象分别对应的出现时长之间的时长顺序,从多个视频对象中获取R个目标视频对象;R为正整数;
重合确定子单元1431,还用于确定R个目标视频对象分别与目标作品知识图谱中的每个视频对象列表之间的对象重合度;对象重合度是指一个视频对象列表所包含的视频对象与R个目标视频对象之间的重合度;
版本确定子单元1432,用于将对象重合度最大的视频对象列表对应的视频版本,确定为排序视频对应的目标视频版本。
其中,重合确定子单元1431以及版本确定子单元1432的具体功能实现方式可以参见图5对应实施例中的步骤S204的描述,这里不再进行赘述。
其中,有序专辑视频的数量为至少两个;
专辑生成单元146,包括:封面确定子单元1461以及生成子单元1462。
封面确定子单元1461,用于遍历至少两个有序专辑视频,顺序获取第j个有序专辑视频,j为正整数;
封面确定子单元1461,还用于对第j个有序专辑视频对应的视频封面和第j个有序专辑视频对应的视频标题进行相关度匹配,得到相关度匹配结果;
封面确定子单元1461,还用于若相关度匹配结果为相关度匹配成功结果,则将第j个有序专辑视频对应的视频封面确定为第j个有序专辑视频对应的专辑视频封面;
封面确定子单元1461,还用于若相关度匹配结果为相关度匹配失败结果,则对第j个有序专辑视频进行视频帧筛选处理,得到与第j个有序专辑视频对应的视频标题匹配的视频帧画面,将视频帧画面确定为第j个有序专辑视频对应的专辑视频封面;
生成子单元1462,用于当获取到每个有序专辑视频分别对应的专辑视频封面时,生成包含每个有序专辑视频分别对应的专辑视频封面的视频专辑集合。
其中,封面确定子单元1461以及生成子单元1462的具体功能实现方式可以参见图5对应实施例中的步骤S204的描述,这里不再进行赘述。
其中,上述视频数据处理装置1,还包括:过滤模块15。
过滤模块15,用于获取第一初始视频集;
过滤模块15,还用于对第一初始视频集进行黑边检测,得到第一初始视频集中每个初始视频分别对应的黑边占比;
过滤模块15,还用于从第一初始视频集中,过滤黑边占比大于黑边占比阈值的初始视频,得到第二初始视频集;
过滤模块15,还用于对第二初始视频集进行水印检测,得到第二初始视频集中每个初始视频分别对应的水印面积占比;
过滤模块15,还用于从第二初始视频集中,过滤水印面积占比大于水印面积占比阈值的初始视频,得到第三初始视频集;
过滤模块15,还用于对第三初始视频集进行清晰度识别,得到第三初始视频集中每个初始视频分别对 应的清晰度;
过滤模块15,还用于从第三初始视频集中,过滤清晰度低于清晰度阈值的初始视频,得到M个待处理视频。
其中,过滤模块15的具体功能实现方式可以参见图5对应实施例中的步骤S204的描述,这里不再进行赘述。
请参见图7,图7是本申请实施例提供的一种计算机设备的结构示意图。如图7所示,上述图6所对应实施例中的视频数据处理装置1可以应用于计算机设备1000,该计算机设备1000可以包括:处理器1001,网络接口1004和存储器1005,此外,上述计算机设备1000还可以包括:用户接口1003,和至少一个通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。其中,用户接口1003可以包括显示屏(Display)、键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1005可选的还可以是至少一个位于远离前述处理器1001的存储装置。如图7所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在如图7所示的计算机设备1000中,网络接口1004可提供网络通讯网元;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现:
获取M个待处理视频;M为正整数;
对M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取每个待处理视频分别对应的来源标签信息;视频属性信息包括作品属性信息和集数属性信息;
根据来源标签信息对M个待处理视频进行分类,得到初始视频集合,将初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;初始视频集合中的每个待处理视频具有相同的来源标签信息;M个待处理视频所涉及的作品属性信息中包括目标作品属性信息;
根据待排序视频对应的集数属性信息,对待排序视频进行排序处理,得到排序视频,若排序视频对应的集数属性信息满足集数合法条件,则将排序视频确定为有序专辑视频,生成包含有序专辑视频的视频专辑集合;视频专辑集合用于在查询数据与有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
应当理解,本申请实施例中所描述的计算机设备1000可执行前文图3、图5任一个所对应实施例中对该视频数据处理方法的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且上述计算机可读存储介质中存储有前文提及的视频数据处理装置1所执行的计算机程序,且上述计算机程序包括程序指令,当上述处理器执行上述程序指令时,能够执行前文图3、图5任一个所对应实施例中对上述视频数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
进一步地,请参见图8,图8是本申请实施例提供的另一种视频数据处理装置的结构示意图。上述视频数据处理装置可以是运行于计算机设备中的一个计算机程序(包括程序代码),例如该视频数据处理装置为一个应用软件;该装置可以用于执行本申请实施例提供的方法中的相应步骤。如图8所示,该数据处理装置2可以包括:第一显示模块21、响应模块22以及第二显示模块23。
第一显示模块21,用于在应用页面的查询框中显示输入的目标查询数据;
响应模块22,用于响应针对目标查询数据的触发操作,若目标查询数据的意图类型为视频意图类型,则在应用页面的查询结果显示框中,显示推荐结果显示区域;
第二显示模块23,用于在推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频;目标视频专辑集合为作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合,目标视频专辑集合包括一个或多个作品属性信息分别对应的有序专辑视频;具有相同作品属性信息的有序专辑视频的显示顺序是按照所对应的集数属性信息之间的集数顺序进行排序的;目标视频专辑集合中的有序专辑视频属于解说视频类型。
其中,第一显示模块21、响应模块22以及第二显示模块23的具体功能实现方式可以参见图2b对应实施例中的场景描述,这里不再进行赘述。
进一步地,请参见图9,图9是本申请实施例提供的另一种计算机设备的结构示意图。如图9所示,上述图8所对应实施例中的视频数据处理装置2可以应用于计算机设备2000,该计算机设备2000可以包 括:处理器2001,网络接口2004和存储器2005,此外,上述计算机设备2000还包括:用户接口2003,和至少一个通信总线2002。其中,通信总线2002用于实现这些组件之间的连接通信。其中,用户接口2003可以包括显示屏(Display)、键盘(Keyboard),可选用户接口2003还可以包括标准的有线接口、无线接口。网络接口2004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器2005可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器2005可选的还可以是至少一个位于远离前述处理器2001的存储装置。如图9所示,作为一种计算机可读存储介质的存储器2005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图9所示的计算机设备2000中,网络接口2004可提供网络通讯功能;而用户接口2003主要用于为用户提供输入的接口;而处理器2001可以用于调用存储器2005中存储的设备控制应用程序,以实现:
在应用页面的查询框中显示输入的目标查询数据;
响应针对目标查询数据的触发操作,若目标查询数据的意图类型为视频意图类型,则在应用页面的查询结果显示框中,显示推荐结果显示区域;
在推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频;目标视频专辑集合为作品属性信息或来源标签信息与目标查询数据相匹配的视频专辑集合,目标视频专辑集合包括一个或多个作品属性信息分别对应的有序专辑视频;具有相同作品属性信息的有序专辑视频的显示顺序是按照所对应的集数属性信息之间的集数顺序进行排序的;目标视频专辑集合中的有序专辑视频属于解说视频类型。
应当理解,本申请实施例中所描述的计算机设备2000可执行前文各个实施例中对该视频数据处理方法的描述,也可执行前文图3、图5所对应实施例中对该视频数据处理装置2的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且上述计算机可读存储介质中存储有前文提及的视频数据处理装置2所执行的计算机程序,当上述处理器加载并执行上述计算机程序时,能够执行前文任一实施例对上述视频数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
上述计算机可读存储介质可以是前述任一实施例提供的视频数据处理装置或者上述计算机设备的内部存储单元,例如计算机设备的硬盘或内存。该计算机可读存储介质也可以是该计算机设备的外部存储设备,例如该计算机设备上配备的插接式硬盘,智能存储卡(smart media card,SMC),安全数字(secure digital,SD)卡,闪存卡(flash card)等。进一步地,该计算机可读存储介质还可以既包括该计算机设备的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该计算机设备所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
此外,这里需要指出的是:本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行前文图3、图5任一个所对应实施例提供的方法。
本申请实施例的说明书和权利要求书及附图中的术语“第一”、“第二”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、装置、产品或设备没有限定于已列出的步骤或模块,而是可选地还包括没有列出的步骤或模块,或可选地还包括对于这些过程、方法、装置、产品或设备固有的其他步骤单元。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照网元一般性地描述了各示例的组成及步骤。这些网元究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的网元,但是这种实现不应认为超出本申请的范围。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (17)

  1. 一种视频数据处理方法,其特征在于,包括:
    获取M个待处理视频;M为正整数;
    对所述M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取所述每个待处理视频分别对应的来源标签信息;所述视频属性信息包括作品属性信息和集数属性信息;
    根据所述来源标签信息对所述M个待处理视频进行分类,得到初始视频集合,将所述初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;所述初始视频集合中的每个待处理视频具有相同的来源标签信息;所述M个待处理视频所涉及的作品属性信息中包括所述目标作品属性信息;
    根据所述待排序视频对应的集数属性信息,对所述待排序视频进行排序处理,得到排序视频,若所述排序视频对应的集数属性信息满足集数合法条件,则将所述排序视频确定为有序专辑视频,生成包含所述有序专辑视频的视频专辑集合;所述视频专辑集合用于在查询数据与所述有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
  2. 根据权利要求1所述的方法,其特征在于,所述M个待处理视频包括待处理视频Mi,i为小于或等于M的正整数;
    所述对所述M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,包括:
    对所述待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息;
    对所述待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息。
  3. 根据权利要求1或2所述的方法,其特征在于,所述对所述待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息,包括:
    对所述待处理视频Mi进行采样处理,得到视频帧图像;
    将所述视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到所述视频作品库中的视频作品分别与所述视频帧图像之间的画面相似度;
    将与所述视频帧图像之间的画面相似度最高的视频作品,确定为目标视频作品;
    若所述视频帧图像与所述目标视频作品之间的画面相似度大于或等于画面相似度阈值,则将所述目标视频作品对应的视频作品属性信息,确定为所述待处理视频Mi对应的作品属性信息。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述对所述待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息,包括:
    对所述待处理视频Mi进行等间隔采样处理,得到多个视频帧图像;
    遍历所述多个视频帧图像,将所述多个视频帧图像中的第i个视频帧图像与视频作品库中的视频作品分别进行画面匹配处理,得到所述视频作品库中的视频作品分别与所述第i个视频帧图像之间的画面相似度;i为小于或等于所述多个视频帧图像的数量的正整数;
    获取与所述第i个视频帧图像之间的画面相似度最高的视频作品,作为所述第i个视频帧图像对应的待定视频作品,对第i个视频帧图像对应的待定视频作品进行标记;
    当标记完所述多个视频帧图像中每个视频帧图像对应的待定视频作品时,将标记次数最多的待定视频作品对应的视频作品属性信息,确定为所述待处理视频Mi对应的作品属性信息。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述对所述待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息,包括:
    获取所述待处理视频Mi对应的视频标题信息;
    将所述视频标题信息与标题模板库中的标题模板分别进行结构匹配处理,得到所述标题模板库中的标题模板分别与所述视频标题信息之间的结构相似度;
    将与所述视频标题信息之间结构相似度最高的标题模板,确定为目标标题模板;
    若所述视频标题信息与所述目标标题模板之间的结构相似度大于或等于结构相似度阈值,则根据所述目标标题模板对所述视频标题信息进行信息提取处理,得到待处理视频Mi对应的作品属性信息。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述对所述待处理视频Mi进行作品属性提取处理,得到待处理视频Mi对应的作品属性信息,包括:
    遍历获取样本视频库中的第k个样本视频;k为正整数;
    对所述待处理视频Mi与所述第k个样本视频进行画面匹配处理,得到视频画面相似度;
    对所述待处理视频Mi的视频标题信息与第k个样本视频对应的视频标题信息进行相似度计算,得到视频标题相似度;
    获取与所述待处理视频Mi和所述第k个样本视频关联的视频点击日志,对所述视频点击日志进行点击分析处理,得到视频点击相似度;
    根据所述视频画面相似度、所述视频标题相似度以及视频点击相似度,确定所述待处理视频Mi与所述第k个样本视频之间的视频相似度;
    若所述视频相似度大于视频相似度阈值,则根据所述视频相似度对所述第k个样本视频针对关联作品的视频作品置信度进行加权处理,得到所述待处理视频Mi针对所述关联作品的作品置信度;所述第k个样本视频针对关联作品的视频作品置信度用于表征所述第k个样本视频属于所述关联作品的可信程度;
    若所述作品置信度大于或等于作品置信度阈值,则将所述关联作品对应的视频作品属性信息,确定为所述待处理视频Mi对应的作品属性信息。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述对所述待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息,包括:
    从视频作品库中,获取具有所述待处理视频Mi对应的作品属性信息的视频作品,将获取到的视频作品作为待匹配视频作品;
    对所述待处理视频Mi进行采样处理,得到视频帧图像;
    对所述视频帧图像和所述待匹配视频作品中的视频作品画面进行画面匹配处理,得到与所述视频帧图像匹配的视频作品画面;
    将与所述视频帧图像匹配的视频作品画面对应的集数信息,确定为所述待处理视频Mi对应的集数属性信息。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述对所述待处理视频Mi进行集数属性提取处理,得到待处理视频Mi对应的集数属性信息,包括:
    对所述待处理视频Mi的封面图像进行视频布局字符识别处理,得到所述待处理视频Mi对应的封面标题信息;
    将所述封面标题信息与集数模板库中的集数模板分别进行结构匹配处理,得到所述集数模板库中的集数模板分别与所述封面标题信息之间的结构相似度;
    将与所述封面标题信息之间结构相似度最高的集数模板,确定为目标集数模板;
    若所述封面标题信息与所述目标集数模板之间的结构相似度大于或等于结构相似度阈值,则根据所述目标集数模板对所述封面标题信息进行信息提取处理,得到待处理视频Mi对应的集数属性信息。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述方法还包括:
    对所述排序视频对应的集数属性信息进行连续性检测,得到连续性检测结果;
    若所述连续性检测结果为集数连续结果,则根据目标作品知识图谱对所述排序视频进行视频版本识别处理,得到所述排序视频对应的目标视频版本;所述目标作品知识图谱为所述排序视频对应的作品属性信息关联的作品知识图谱;
    在所述目标作品知识图谱中,根据所述目标视频版本确定所述排序视频对应的总集数信息;
    若所述排序视频对应的集数属性信息中最大的集数属性信息和所述总集数信息相同,则确定所述排序视频对应的集数属性信息满足集数合法条件。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述目标作品知识图谱包含一个或多个视频版本以及每个视频版本对应的视频对象列表;
    所述根据目标作品知识图谱对所述排序视频进行视频版本识别处理,得到所述排序视频对应的目标视频版本,包括:
    对所述排序视频进行对象识别处理,得到所述排序视频所包含的多个视频对象以及每个视频对象分别对应的出现时长;
    根据所述每个视频对象分别对应的出现时长之间的时长顺序,从所述多个视频对象中获取R个目标视频对象;R为正整数;
    确定所述R个目标视频对象分别与所述目标作品知识图谱中的每个视频对象列表之间的对象重合度;所述对象重合度是指一个视频对象列表所包含的视频对象与所述R个目标视频对象之间的重合度;
    将对象重合度最大的视频对象列表对应的视频版本,确定为所述排序视频对应的目标视频版本。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述有序专辑视频的数量为至少两个;所述生成包含所述有序专辑视频的视频专辑集合,包括:
    遍历至少两个有序专辑视频,顺序获取第j个有序专辑视频,j为正整数;
    对所述第j个有序专辑视频对应的视频封面和所述第j个有序专辑视频对应的视频标题进行相关度匹配,得到相关度匹配结果;
    若所述相关度匹配结果为相关度匹配成功结果,则将所述第j个有序专辑视频对应的视频封面确定为所述第j个有序专辑视频对应的专辑视频封面;
    若所述相关度匹配结果为相关度匹配失败结果,则对所述第j个有序专辑视频进行视频帧筛选处理,得到与所述第j个有序专辑视频对应的视频标题匹配的视频帧画面,将所述视频帧画面确定为所述第j个有序专辑视频对应的专辑视频封面;
    当获取到每个有序专辑视频分别对应的专辑视频封面时,生成包含所述每个有序专辑视频分别对应的专辑视频封面的视频专辑集合。
  12. 根据权利要求1至11任一项所述的方法,其特征在于,还包括:
    获取第一初始视频集;
    对所述第一初始视频集进行黑边检测,得到所述第一初始视频集中每个初始视频分别对应的黑边占比;
    从所述第一初始视频集中,过滤黑边占比大于黑边占比阈值的初始视频,得到第二初始视频集;
    对所述第二初始视频集进行水印检测,得到所述第二初始视频集中每个初始视频分别对应的水印面积占比;
    从所述第二初始视频集中,过滤水印面积占比大于水印面积占比阈值的初始视频,得到第三初始视频集;
    对所述第三初始视频集进行清晰度识别,得到所述第三初始视频集中每个初始视频分别对应的清晰度;
    从所述第三初始视频集中,过滤清晰度低于清晰度阈值的初始视频,得到M个待处理视频。
  13. 一种视频数据处理方法,其特征在于,包括:
    在应用页面的查询框中显示输入的目标查询数据;
    响应针对所述目标查询数据的触发操作,若所述目标查询数据的意图类型为视频意图类型,则在所述应用页面的查询结果显示框中,显示推荐结果显示区域;
    在所述推荐结果显示区域中,顺序显示目标视频专辑集合包含的有序专辑视频;所述目标视频专辑集合为作品属性信息或来源标签信息与所述目标查询数据相匹配的视频专辑集合,所述目标视频专辑集合包括一个或多个作品属性信息分别对应的有序专辑视频;具有相同作品属性信息的有序专辑视频的显示顺序是按照所对应的集数属性信息之间的集数顺序进行排序的;所述目标视频专辑集合中的有序专辑视频属于解说视频类型。
  14. 一种视频数据处理装置,其特征在于,包括:
    获取模块,用于获取M个待处理视频;M为正整数;
    特征提取模块,用于对所述M个待处理视频分别进行特征提取,得到每个待处理视频分别对应的视频属性信息,获取所述每个待处理视频分别对应的来源标签信息;所述视频属性信息包括作品属性信息和集数属性信息;
    视频确定模块,用于根据所述来源标签信息对所述M个待处理视频进行分类,得到初始视频集合,将所述初始视频集合中具有目标作品属性信息的待处理视频确定为待排序视频;所述初始视频集合中的每个待处理视频具有相同的来源标签信息;所述M个待处理视频所涉及的作品属性信息中包括所述目标作品属性信息;
    生成模块,用于根据所述待排序视频对应的集数属性信息,对所述待排序视频进行排序处理,得到排序视频,若所述排序视频对应的集数属性信息满足集数合法条件,则将所述排序视频确定为有序专辑视频,生成包含所述有序专辑视频的视频专辑集合;所述视频专辑集合用于在查询数据与所述有序专辑视频对应的作品属性信息或来源标签信息相匹配时显示在查询结果显示框中。
  15. 一种计算机设备,其特征在于,包括:处理器、存储器以及网络接口;
    所述处理器与所述存储器、所述网络接口相连,其中,所述网络接口用于提供数据通信功能,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行权利要求1-13任一项所述的方法。
  16. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,所述计算机程序适于由处理器加载并执行权利要求1-13任一项所述的方法。
  17. 一种计算机程序产品,包括计算机程序/指令,其特征在于,所述计算机程序/指令被处理器执行时,可以执行权利要求1-13任一项所述的方法。
PCT/CN2024/082438 2023-03-20 2024-03-19 视频数据处理方法、装置、设备及可读存储介质 WO2024193538A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310272580.9 2023-03-20
CN202310272580.9A CN116980646A (zh) 2023-03-20 2023-03-20 视频数据处理方法、装置、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2024193538A1 true WO2024193538A1 (zh) 2024-09-26

Family

ID=88483822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/082438 WO2024193538A1 (zh) 2023-03-20 2024-03-19 视频数据处理方法、装置、设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN116980646A (zh)
WO (1) WO2024193538A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980646A (zh) * 2023-03-20 2023-10-31 北京搜狗科技发展有限公司 视频数据处理方法、装置、设备及可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929874A (zh) * 2011-08-08 2013-02-13 深圳市快播科技有限公司 检索数据的排序方法及装置
CN104008139A (zh) * 2014-05-08 2014-08-27 北京奇艺世纪科技有限公司 视频索引表的创建方法和装置,视频的推荐方法和装置
US20140344293A1 (en) * 2012-03-30 2014-11-20 Rakuten, Inc. Information providing device, information providing method, program, information storage medium, and information providing system
CN106033417A (zh) * 2015-03-09 2016-10-19 深圳市腾讯计算机系统有限公司 视频搜索系列剧的排序方法和装置
CN112015948A (zh) * 2020-08-05 2020-12-01 北京奇艺世纪科技有限公司 视频推荐方法、装置、电子设备及存储介质
CN116980646A (zh) * 2023-03-20 2023-10-31 北京搜狗科技发展有限公司 视频数据处理方法、装置、设备及可读存储介质

Also Published As

Publication number Publication date
CN116980646A (zh) 2023-10-31

Similar Documents

Publication Publication Date Title
US10277946B2 (en) Methods and systems for aggregation and organization of multimedia data acquired from a plurality of sources
CN113010703B (zh) 一种信息推荐方法、装置、电子设备和存储介质
CN109117777B (zh) 生成信息的方法和装置
CN105677735B (zh) 一种视频搜索方法及装置
US9176987B1 (en) Automatic face annotation method and system
Mahrishi et al. Video index point detection and extraction framework using custom YoloV4 Darknet object detection model
CN112100438A (zh) 一种标签抽取方法、设备及计算机可读存储介质
CN109408672B (zh) 一种文章生成方法、装置、服务器及存储介质
US20230032728A1 (en) Method and apparatus for recognizing multimedia content
CN113806588A (zh) 搜索视频的方法和装置
WO2024193538A1 (zh) 视频数据处理方法、装置、设备及可读存储介质
CN116975340A (zh) 信息检索方法、装置、设备、程序产品及存储介质
CN116955707A (zh) 内容标签的确定方法、装置、设备、介质及程序产品
US20190258629A1 (en) Data mining method based on mixed-type data
CN116977992A (zh) 文本信息识别方法、装置、计算机设备和存储介质
CN113407775B (zh) 视频搜索方法、装置及电子设备
WO2024188044A1 (zh) 视频标签生成方法、装置、电子设备及存储介质
CN117763510A (zh) 网页识别方法、装置、设备、介质及程序产品
TW201523421A (zh) 決定用於擷取的文章之圖像
CN116978028A (zh) 视频处理方法、装置、电子设备及存储介质
CN117009578A (zh) 视频数据的标注方法、装置、电子设备及存储介质
Xu et al. RETRACTED: Crowd Sensing Based Semantic Annotation of Surveillance Videos
CN117648504A (zh) 媒体资源序列的生成方法、装置、计算机设备和存储介质
CN116483946B (zh) 数据处理方法、装置、设备及计算机程序产品
Seng Enriching Existing Educational Video Datasets to Improve Slide Classification and Analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24774103

Country of ref document: EP

Kind code of ref document: A1