
CN112668561B - Teaching video segmentation determination method and device - Google Patents


Info

Publication number
CN112668561B
CN112668561B (application CN202110278404.7A)
Authority
CN
China
Prior art keywords
frame image
image
segmentation
similarity
temporary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110278404.7A
Other languages
Chinese (zh)
Other versions
CN112668561A (en)
Inventor
王鑫龙
卢波
王凯夫
彭守业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202110278404.7A priority Critical patent/CN112668561B/en
Publication of CN112668561A publication Critical patent/CN112668561A/en
Application granted granted Critical
Publication of CN112668561B publication Critical patent/CN112668561B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The present application provides a segmentation determination method and device for teaching videos. The method includes: extracting frame images of a teaching video according to a first period to form an ordered image set; comparing adjacent frame images in the ordered image set to determine temporary segmentation points; determining a first frame image and a second frame image according to each temporary segmentation point; comparing the blackboard-writing areas of the first frame image and the second frame image to determine a first similarity, where the first frame image is the frame image before the temporary segmentation point and the second frame image is the frame image after it; deleting the corresponding temporary segmentation points according to the first similarity; and adopting the remaining temporary segmentation points as the actual segmentation points of the teaching video. Because the content of the blackboard-writing area is directly related to the teaching topic, that content can be used to screen the temporary segmentation points relatively accurately, so that the teaching video is segmented according to its content characteristics.

Description

Teaching video segmentation determination method and device
Technical Field
The application relates to the technical field of video processing, in particular to a segmentation determination method and device for teaching videos.
Background
In order to quickly locate a specific content area in a video and meet the requirements of quick search and specific content extraction, a method for identifying the content of videos such as movies and films and determining segmentation according to the content of the videos is provided at present.
Existing methods for identifying videos such as movies and documentaries in order to determine index points rely mainly on large-scale changes between image frames: such videos contain frequent scene changes, and the scenes before and after a change are clearly distinguishable, so segmentation indexes can be determined from these changes.
Because a teaching video is characterized by scene features that remain basically unchanged, the segmentation determination method suitable for movies and documentaries is not suitable for it; index points for teaching videos still have to be determined by manual indexing.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, the present application provides a segmentation method and a segmentation apparatus for teaching videos.
On one hand, the application provides a segmentation method of teaching videos, which comprises the following steps:
extracting frame images of the teaching video according to a first period to form an ordered image set;
comparing the adjacent frame images in the ordered image set to determine a temporary segmentation point;
comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point;
deleting the corresponding temporary segmentation points according to the first similarity;
and adopting the residual temporary segmentation points as actual segmentation points of the teaching video.
Optionally, comparing the blackboard-writing areas of the first frame image and the second frame image, and determining a first similarity, includes:
comparing the blackboard-writing areas of the first frame image and the second frame image by using the SSIM algorithm, and determining the first similarity; and/or,
comparing the blackboard-writing areas of the first frame image and the second frame image by using a cosine distance method, and determining the first similarity; and/or,
text content recognition is carried out on the blackboard writing areas of the first frame image and the second frame image, and two recognition texts are obtained;
and determining the first similarity according to the two recognition texts.
Optionally, comparing the two recognized texts to determine the first similarity includes:
comparing the two recognized texts to obtain an edit distance;
and determining the first similarity according to the edit distance and the length of one recognized text.
Optionally, in the case that the teaching type video has a character image,
comparing adjacent frame images in the ordered image set to determine temporary segmentation points, comprising: comparing the regions of the frame images that do not include the character image, and determining the temporary segmentation points.
Optionally, comparing adjacent frame images of the ordered image set to determine a temporary segmentation point comprises:
comparing the adjacent frame images by using the SSIM algorithm to obtain a second similarity;
and determining to set the temporary dividing point between the adjacent frame images in the case that the second similarity is smaller than a second threshold value.
Optionally, the method further comprises: comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity;
deleting the temporary segmentation point corresponding to the third similarity under the condition that the third similarity is larger than a third threshold; and/or,
obtaining a residual image according to the first frame image and the second frame image;
and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
Optionally, obtaining a residual image according to the first frame image and the second frame image includes:
calculating a first residual image and a second residual image; the first residual image is a difference image between the first frame image and a second frame image, and the second residual image is a difference image between the second frame image and the first frame image;
determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image, including:
calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image;
determining the larger value and the smaller value of the fourth similarity and the fifth similarity;
and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than a fourth threshold value or the smaller value is smaller than a fifth threshold value.
Optionally, the method further comprises: determining the sound intensity in each second period in the teaching video;
setting the audio identifier of the corresponding second period as a mute identifier under the condition that the sound intensity is smaller than a preset intensity threshold; or, setting the audio identifier of the corresponding second period as a sound identifier under the condition that the sound intensity is greater than the preset intensity threshold;
determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods;
adopting the remaining temporary segmentation points as actual segmentation points of the teaching video, comprising: adopting the temporary segmentation points and the audio segmentation points as the actual segmentation points.
Optionally, determining an audio dividing point according to the variation characteristic of the audio identifier corresponding to each second period includes:
judging whether the audio identifiers of a specific number of consecutive second periods are all mute identifiers;
if yes, keeping the audio identifiers of the specific number of consecutive second periods unchanged;
if not, modifying the mute identifiers among the audio identifiers of the specific number of consecutive second periods into sound identifiers;
and setting the audio dividing point at the position where the audio identification changes.
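A minimal sketch of the smoothing-then-cut procedure above, assuming the audio identifiers are represented as booleans (True = mute identifier for one second period) and `k` is the "specific number" of consecutive periods; the function name, the boolean representation, and the default `k` are illustrative assumptions, not part of the patent:

```python
def audio_cut_points(mute_flags, k=3):
    """Flip mute runs shorter than k periods to sound identifiers, then place
    an audio dividing point wherever the smoothed identifier changes."""
    flags = list(mute_flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1                   # [i, j) is a run of mute periods
            if j - i < k:                # run too short: treat it as sound
                for t in range(i, j):
                    flags[t] = False
            i = j
        else:
            i += 1
    # a dividing point sits at every position where the identifier changes
    return [i for i in range(1, len(flags)) if flags[i] != flags[i - 1]]
```

For example, a single mute second surrounded by speech is smoothed away, while a three-second silence produces a dividing point at its start and end.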
Optionally, the method further comprises: and extracting a video clip of the teaching video between two adjacent actual segmentation points to serve as an extracted clip.
Optionally, in a case that the teaching type video has a character image, extracting a video segment of the teaching type video between two adjacent actual segmentation points includes:
determining the number of people in each frame image in the ordered image set;
counting the number of frame images in the ordered image set that lie within the time period determined by two adjacent actual segmentation points and contain more than a preset number of people;
and under the condition that the number of the frame images is less than the preset number, extracting a video clip positioned between two adjacent actual segmentation points.
Optionally, the method further comprises: processing the extracted segments, or processing the frame images corresponding to the extracted segments in the ordered image set, to determine segment topics;
and indexing the extracted segments with the segment topics.
On the other hand, this application provides a segmentation device of teaching type video, includes:
the extraction unit is used for extracting frame images of the teaching video according to a first period to form an ordered image set;
a segmentation point primary selection unit, configured to compare adjacent frame images in the ordered image set, and determine a temporary segmentation point;
a cut point deleting unit for determining a first frame image and a second frame image according to the temporary cut points; comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; deleting the corresponding temporary segmentation points according to the first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point;
and the actual segmentation point determining unit is used for adopting the residual temporary segmentation points as the actual segmentation points of the teaching video.
Optionally, the cut point deleting unit compares the blackboard-writing areas of the first frame image and the second frame image by using the SSIM algorithm, and determines the first similarity; and/or,
compares the blackboard-writing areas of the first frame image and the second frame image by using a cosine distance method, and determines the first similarity; and/or,
text content recognition is carried out on the blackboard writing areas of the first frame image and the second frame image, and two recognition texts are obtained; and determining the first similarity according to the two recognition texts.
Optionally, the cut point deleting unit is further configured to:
comparing the first frame image with the second frame image by using a cosine distance method to obtain a third similarity; and deleting the temporary segmentation point corresponding to the third similarity when the third similarity is larger than a third threshold; and/or,
obtaining a residual image according to the first frame image and the second frame image; and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
Optionally, the method further comprises: the sound intensity determining unit is used for determining the sound intensity in each second period in the teaching video;
a sound identifier determining unit, configured to set the audio identifier of the corresponding second period as a mute identifier when the sound intensity is smaller than a preset intensity threshold, or as a sound identifier when the sound intensity is greater than the preset intensity threshold;
the audio dividing point determining unit is used for determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods;
the actual segmentation point determining unit uses the temporary segmentation point and the audio segmentation point as the actual segmentation point.
Optionally, the method further comprises: and the extraction unit is used for extracting a video clip of the teaching video between two adjacent actual segmentation points as an extraction clip.
Optionally, the method further comprises: the theme determining unit is used for processing the extracted fragments or processing the frame images corresponding to the extracted fragments in the ordered image set to determine the theme of the fragments; and indexing the extracted segment with the segment theme.
According to the teaching video segmentation method and device, after the temporary segmentation points are determined, the contents of the blackboard-writing areas of the adjacent frames are used in a deletion operation that eliminates the temporary segmentation points unsuitable as actual segmentation points. Because the content of the blackboard-writing area is directly related to the teaching topic, the method can segment a teaching video accurately according to its content characteristics.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings without inventive labor;
fig. 1 is a flowchart of a segmentation method for teaching-type video according to an embodiment of the present application;
FIG. 2 is a flow chart of determining audio cut points according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a teaching-type video segmentation apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
wherein: 11-extraction unit, 12-segmentation point initial selection unit, 13-segmentation point deletion unit, 14-actual segmentation point determination unit, 21-processor, 22-memory, 23-communication interface, 24-bus system.
Detailed Description
In order that the above-mentioned objects, features and advantages of the present application may be more clearly understood, the solution of the present application will be further described below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein; it is to be understood that the embodiments described in this specification are only some embodiments of the present application and not all embodiments.
The embodiment of the application provides a segmentation method for teaching videos, which selects a specific segmentation strategy based on the characteristics of teaching videos to automatically determine their segmentation points, with optional subsequent label addition and segment extraction.
It should be noted that the teaching video in the embodiment of the present application is a specific kind of video, characterized by having a blackboard-writing area. The writing area described here should not be construed narrowly as a handwritten content area; it is any area in which content is presented to students through characters, graphics, or pictures in order to convey the teaching content, and may be a handwritten area or an area shown on a display or by projection.
It should be noted that the content of the blackboard-writing area gradually changes as the teaching content advances, including: (1) within the time corresponding to a specific teaching content, the blackboard writing content can be gradually increased until the teaching content is completely displayed; (2) when switching from one teaching content to the next teaching content, the display content in the blackboard-writing area is cleared.
Fig. 1 is a flowchart of a segmentation method for teaching videos according to an embodiment of the present application. As shown in fig. 1, the segmentation method for teaching videos in the embodiment of the present application includes steps S101 to S105.
S101: and extracting frame images of the teaching video according to a first period to form an ordered image set.
To ensure smooth video content, the frame rate of a teaching video is more than 24 frames/second (in practice it may be 30, 50, or 60 frames/second). Because the time interval between adjacent frames is small, the image content of two adjacent video frames changes little (the content may also change little within a certain set period), making it difficult to determine whether a segmentation point should be placed between two adjacent video frames.
To overcome the problem described in the previous paragraph, the embodiment of the present application uses periodic sampling: the frame images of the teaching video are extracted according to the first period, forming an ordered image set for subsequent processing.
In the embodiment of the application, the first period may be determined according to the length of the teaching video, its content type, and the computing capacity available for subsequent processing. In practical applications, the first period is typically set to 1 s; of course, if the teaching progress of the video is slow, or the computing capacity for subsequent processing is limited, the first period may be set to other values such as 2 s or 5 s.
It should also be noted that the ordered image set may be an image set including only two frame images, or may be an image set including more frame images, and the embodiment of the present application is not particularly limited.
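As an illustrative sketch of the periodic sampling in step S101 (not part of the patent; the helper name and the mention of OpenCV are assumptions), extracting one frame per first period reduces to picking evenly spaced frame indices:

```python
def sample_indices(total_frames: int, fps: float, period_s: float = 1.0) -> list:
    """Frame indices forming the ordered image set when one frame is taken
    per `period_s` seconds (the first period)."""
    step = max(1, round(fps * period_s))   # frames per first period
    return list(range(0, total_frames, step))

# A 4 s clip at 25 fps sampled with a 1 s first period:
print(sample_indices(100, 25, 1.0))        # → [0, 25, 50, 75]
```

With a decoder such as OpenCV's `cv2.VideoCapture`, each chosen index could then be read by seeking with `CAP_PROP_POS_FRAMES`; the sketch deliberately stops at index selection.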
S102: and comparing adjacent frame images in the ordered image set to determine a temporary segmentation point.
The adjacent frame images in the ordered image set are the images extracted in adjacent first periods in step S101.
A temporary segmentation point is a time point that may be used to segment the teaching video. In practical applications, once a temporary segmentation point is determined, it divides the two adjacent first periods into two different teaching video clips.
In step S102, the adjacent frame images are compared, and there are various methods for determining the temporary segmentation points; for convenience, the specific implementation of step S102 is described later. It suffices to understand that a temporary segmentation point determined in step S102 is a candidate determined with a looser criterion that may be used for segmenting the teaching video; consequently, the number of temporary segmentation points is larger than the number of actual segmentation points.
S103: and determining a first frame image and a second frame image according to the temporary segmentation points, comparing blackboard writing areas of the first frame image and the second frame image, and determining a first similarity.
The first frame image is a frame image of a first period before the corresponding temporal segmentation point in the ordered image set, and the second frame image is a frame image of a first period after the corresponding temporal segmentation point in the ordered image set.
As described above, the teaching video provided by the embodiment of the present application has a blackboard-writing area. In step S103, the blackboard-writing areas in the first frame image and the second frame image need to be recognized and extracted, and then processed to obtain the first similarity.
In the specific application of the embodiment of the application, there are several methods for identifying and extracting the blackboard-writing area from the first frame of image and the second frame of image.
(1) Method using deep learning or adaptive recognition
The method using deep learning or adaptive recognition is preferably used where the blackboard-writing region contrasts significantly with the other regions in the frame image, or where the edge features of the blackboard-writing region are significant. It is also preferred where the camera capturing the teaching video may change its viewing range.
In this case, if there is no blackboard-writing area in the first frame image or the second frame image, a null area may be obtained in place of the corresponding blackboard-writing area; in practice, determining the first similarity by comparing the blackboard-writing areas may then mean comparing one blackboard-writing area with one null area, or comparing two null areas.
(2) Method of extracting a specific position area
The method of extracting a specific area is mostly used where the teaching video is shot with a fixed camera viewing range. Its characteristic is that the position of the blackboard-writing area within the frame image is determined before the teaching video is processed, and that position area is then taken as the blackboard-writing area when each frame image is processed.
It should be noted that, when a specific area is extracted, the area is still regarded as the blackboard-writing area even if it is blocked by obstacles (for example, by a student or a teacher).
In this embodiment of the application, after determining the blackboard writing area, the method for determining the first similarity according to the blackboard writing areas of the first frame image and the second frame image may be selected as follows.
(1) Comparing the blackboard-writing areas of the first frame image and the second frame image by using the SSIM algorithm to determine the first similarity. In the embodiment of the present application, SSIM is short for the Structural SIMilarity method.
(2) Comparing the blackboard-writing areas of the first frame image and the second frame image by using the cosine distance to determine the first similarity. For example, if the pixel gray levels of the writing area of the first frame image are x_1, x_2, …, x_n and those of the second frame image are y_1, y_2, …, y_n, the cosine distance is

cos(θ) = (x_1·y_1 + x_2·y_2 + … + x_n·y_n) / ( sqrt(x_1² + x_2² + … + x_n²) · sqrt(y_1² + y_2² + … + y_n²) )
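A minimal pure-Python sketch of the cosine-distance comparison, treating each blackboard-writing area as a flattened list of pixel gray levels; returning 0 for a zero vector (e.g. an all-black null area) is an assumed convention, not stated in the text:

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between gray-level vectors x and y:
    dot(x, y) / (|x| * |y|), per the formula above."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0                 # assumed convention for empty/null areas
    return dot / (nx * ny)
```

Identical writing areas give 1.0; gray-level patterns with no overlap give 0.0.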
(3) And determining the first similarity by adopting a text comparison method. Specifically, steps S1031 to S1032 are included.
S1031: and performing text content identification on the blackboard writing areas of the first frame image and the second frame image to obtain two texts.
In specific applications of the embodiment of the present application, text content recognition of the blackboard-writing content may be carried out by various methods known in the art, such as the Optical Character Recognition (OCR) method; the details can be found in the relevant technical literature or in engineering practice and are not repeated here.
In other applications of the embodiments of the present application, other methods may be used to determine the first similarity.
S1032: a first similarity is determined based on the lengths of the two recognized texts.
In a specific application of the embodiment of the present application, determining the first similarity according to the lengths of the two recognition texts includes: and comparing the two recognition texts to obtain an editing distance, and taking the ratio of the editing distance to the length of one recognition text as a first similarity.
For example, the text content length of the blackboard-writing area in the first frame image issmThe text content of the blackboard writing area in the second frame image issnEdit distance oflThen the first similarity may bel/smOrl/sn
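A sketch of this text-comparison similarity, assuming the classic dynamic-programming Levenshtein distance for "comparing the two recognized texts" and the ratio l/s_m (dividing by the first text's length, one of the two options above); the function names are illustrative:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance between strings a and b (row-by-row DP)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                          # deletion
                         cur[j - 1] + 1,                       # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1])) # substitution
        prev = cur
    return prev[n]

def text_similarity(t1: str, t2: str) -> float:
    """First similarity l / s_m: edit distance over the first text's length."""
    return edit_distance(t1, t2) / max(len(t1), 1)
```

Note this "similarity" behaves like a distance: 0 for identical texts and larger for more divergent ones, which is why a temporary segmentation point is deleted when it falls below the threshold.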
After the execution of step S103 is completed, the execution of steps S104 to S105 may be continued.
S104: and deleting the temporary segmentation points corresponding to the first similarity according to the first similarity.
In step S104, whether a temporary segmentation point is deleted according to the first similarity depends on the similarity determination method adopted in step S103.
For example, when the first similarity is calculated by the SSIM algorithm or the cosine distance method, the corresponding temporary segmentation point is deleted if the first similarity is greater than the corresponding threshold; when the first similarity is calculated by the text comparison method (based on the edit distance), the corresponding temporary segmentation point is deleted if the first similarity is smaller than the corresponding threshold.
S105: and adopting the residual temporary segmentation points as actual segmentation points of the teaching video.
After the deletion in step S104 of the temporary segmentation points that are unsuitable as actual segmentation points, the temporary segmentation points that were not deleted are used in step S105 as the actual segmentation points of the teaching video, for segment extraction from or label addition to the teaching video.
According to the teaching video segmentation method provided by the embodiment of the application, after the temporary segmentation points are determined, the contents of the blackboard-writing areas of the adjacent frames are used in a deletion operation, so that the temporary segmentation points unsuitable as actual segmentation points are excluded according to the blackboard-writing content. Because the blackboard-writing content is directly related to the teaching topic, the method can segment a teaching video accurately according to its content characteristics.
In the teaching videos processed by some embodiments of the present application, there may be images of people such as instructors and students, in particular images that include the instructor. During teaching, the instructor shows various body movements as the teaching content develops, such as consulting the teaching plan, facing the students (facing the camera), or facing the blackboard-writing area. These movements may cause step S102 to determine an excessive number of temporary segmentation points.
To solve the foregoing problem, in some embodiments of the present application, step S102 may be executed as follows: comparing the regions of the frame images that do not include the character images, and determining the temporary segmentation points. That is, in step S102, the adjacent frame images are first processed to remove the character image portions, and feature identification is then performed on the remaining content.
In the embodiment of the present application, the method for determining the temporary segmentation point when performing step S102 may include the following methods: comparing the adjacent frame images by adopting an SSIM method to obtain a second similarity; then judging whether the second similarity is smaller than a second threshold value; if the second similarity is smaller than a second threshold value, setting a temporary segmentation point between adjacent frame images; if the second similarity is greater than the second threshold, a temporary cut point is not set between adjacent frame images.
In practical applications, because the SSIM algorithm is a statistics-based method that identifies image similarity quickly and compares images globally, it is preferably adopted for determining the temporary segmentation points. The number of temporary segmentation points determined can be adjusted by setting the size of the second threshold.
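As an illustrative sketch of this selection step (not the patent's implementation): the single-window SSIM below is a simplification computed once over whole flattened gray-level frames, whereas library implementations such as scikit-image average a sliding window; the 0.9 second threshold is an assumption:

```python
def global_ssim(x, y, data_range=255.0):
    """Simplified SSIM computed once over whole flattened gray-level frames."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def temporary_cut_points(frames, second_threshold=0.9):
    """Indices i with a temporary segmentation point between frames[i] and frames[i+1]."""
    return [i for i in range(len(frames) - 1)
            if global_ssim(frames[i], frames[i + 1]) < second_threshold]
```

Two identical frames score 1.0 (no cut point); a frame followed by a cleared, all-black board scores near 0, so a temporary segmentation point is placed between them.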
Of course, in other embodiments of the present application, other processing methods may be used to determine the temporary segmentation point.
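For illustration only, the SSIM-based selection of temporary segmentation points described above can be sketched as follows. This is a simplified global SSIM (no sliding window); the representation of frames as grayscale NumPy arrays, the function names, and the example threshold of 0.8 are assumptions for the sketch, not part of the claimed method.

```python
import numpy as np

def global_ssim(img_a, img_b):
    """Simplified global SSIM over two grayscale frames (no sliding window)."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2  # stabilizing constants from the SSIM paper
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2))

def mark_temporary_cuts(frames, second_threshold=0.8):
    """Set a temporary segmentation point between frames i and i+1 when their
    second similarity falls below the second threshold; return the indices i."""
    return [i for i in range(len(frames) - 1)
            if global_ssim(frames[i], frames[i + 1]) < second_threshold]
```

Raising the second threshold makes the check stricter and yields more temporary segmentation points, matching the tuning behavior described above.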
In some applications of the embodiment of the present application, in addition to deleting temporary segmentation points in steps S103 and S104, other steps may be provided for deleting temporary segmentation points; the following methods may be adopted.
(1) Method for using cosine distance
The method using cosine distance comparison includes steps S106-S108.
S106: comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity;
s107: judging whether the third similarity is smaller than a third threshold value; if yes, go to step S108.
S108: and deleting the temporary segmentation point corresponding to the third similarity.
The cosine distance calculation mentioned in step S106 is as described above and is not repeated here. It should be noted that, as can be seen from the formula for calculating the cosine distance, the cosine distance method compares corresponding pixels; if the gray-scale change between corresponding pixels of the two images is small, the cosine distance between the two images is large, so it can be conveniently determined whether certain temporary segmentation points can be deleted.
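As a hedged illustration of steps S106-S108, the pixel-wise cosine comparison might look like the following sketch; flattening the frames into vectors and the example threshold of 0.95 are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(img_a, img_b):
    """Cosine of the angle between the flattened pixel vectors of two frames;
    a value close to 1.0 means the corresponding pixels barely changed."""
    a = img_a.astype(np.float64).ravel()
    b = img_b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 1.0

def delete_cut_point(first_frame, second_frame, third_threshold=0.95):
    """S107/S108: delete the temporary segmentation point when the third
    similarity exceeds the third threshold (the two frames are essentially alike)."""
    return cosine_similarity(first_frame, second_frame) > third_threshold
```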
(2) Method using residual comparison
The method using the residual comparison includes steps S108 to S109.
S108: and obtaining a residual image according to the first frame image and the second frame image.
In a specific application, the residual image calculated in step 108 includes a first residual image and a second residual image; the first residual image is a difference image between the first frame image and the second frame image, and the second residual image is a difference image between the second frame image and the first frame image.
S109: and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
In a specific application of some embodiments of the present application, step S109 can be subdivided into S1091-S1093.
S1091: and calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image.
In a specific embodiment, the calculation of the fourth similarity and the calculation of the fifth similarity may both adoptSSIMThe method is calculated or obtained by other algorithms known in the image processing field.
S1092: and determining the larger value and the smaller value of the fourth similarity and the fifth similarity.
S1093: and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than the fourth threshold value or the smaller value is smaller than the fifth threshold value.
It should be noted that, in specific applications of the embodiment of the present application, the actual segmentation points can be obtained by combining the foregoing ways of deleting temporary segmentation points.
In practical application, a teaching video includes audio content in addition to image content; the audio content is strongly correlated with the teaching content, and there is usually a noticeable time gap between different teaching subjects. Based on the foregoing analysis, in the embodiment of the present application, in addition to determining actual segmentation points from the frame-image content, audio segmentation points may also be determined from sound features and used as actual segmentation points.
Fig. 2 is a flow chart of determining audio segmentation points according to an embodiment of the present application. As shown in Fig. 2, determining the audio segmentation points includes steps S201-S205.
S201: and determining the sound intensity in each second period in the teaching video.
In the embodiment of the application, the second period is a time period determined according to factors such as the content progress of the teaching video and the like; in practical applications, the second period may be the same as the first period in the foregoing, or may be different from the first period.
S202: judging whether the sound intensity is smaller than a preset intensity threshold value or not; if yes, go to S203; if not, go to step S204.
S203: and setting the audio identifier of the corresponding second period as a mute identifier.
S204: and setting the audio identifier of the corresponding second period as a sound identifier.
S205: and determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods.
By adopting the foregoing steps S201 to S205, after the audio identifier of each second period is determined using the intensity threshold, the sound-change characteristics of the teaching video can be determined from the distribution of the audio identifiers. According to the foregoing analysis, there is usually a noticeable time gap between different teaching subjects, so the audio segmentation points can be determined according to the change characteristics of the audio identifiers.
In the embodiment of the present application, step S205 can be subdivided into steps S2051-S2054.
S2051: judging whether the audio identifiers of the second period of a certain number of continuous periods are mute identifiers or not; if yes, go to S2052; if not, go to S2053.
In the embodiment of the application, the specific number is set according to the teaching speed of the teaching video; in one particular application, the specific number may be set to 5-10.
S2052: the audio identification is maintained for a certain number of consecutive second periods.
S2053: and modifying the mute identifier in the audio identifier in a second period of a certain number of continuous periods into the sound identifier.
S2054: and setting an audio dividing point at the position where the audio identification changes.
After step S2053, the audio-identifier sequence of the teaching video consists of runs of voiced identifiers and runs of mute identifiers; a run of mute identifiers characterizes a longer silent region, which may correspond to the interval of switching from one teaching subject to another, so each position where the audio identifier changes is set as an audio segmentation point.
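Steps S201-S205 together with S2051-S2054 can be sketched as follows; the intensity values, the threshold, and the run length of 5 are placeholders for the document's per-second-period sound intensities, preset intensity threshold, and specific number.

```python
def audio_cut_points(intensities, intensity_threshold, run_length=5):
    """Label each second period silent/voiced (S202-S204), relabel silent runs
    shorter than run_length as voiced (S2051-S2053), and place an audio
    segmentation point wherever the identifier changes (S2054)."""
    marks = ['silent' if v < intensity_threshold else 'voiced' for v in intensities]
    i = 0
    while i < len(marks):
        if marks[i] == 'silent':
            j = i
            while j < len(marks) and marks[j] == 'silent':
                j += 1  # find the end of this silent run
            if j - i < run_length:  # too short to be a topic gap: treat as voiced
                marks[i:j] = ['voiced'] * (j - i)
            i = j
        else:
            i += 1
    return [k for k in range(1, len(marks)) if marks[k] != marks[k - 1]]
```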
Based on the foregoing steps S201 to S205, step S105 in some embodiments of the present application specifically includes: adopting the temporary segmentation points and the audio segmentation points as the actual segmentation points.
In some applications of the embodiment of the application, after the actual segmentation points are determined, they can be added at the corresponding positions in the teaching video as index points, so that students can subsequently find the content they want to watch quickly through the index points. In other applications of the embodiment of the present application, after the actual segmentation points are determined, it may be necessary to extract the video segment located between two adjacent actual segmentation points as an extracted segment.
In the embodiment of the present application, when obtaining extracted segments, some segments that do not contain teaching content need to be deleted. In order to remove segments that do not include teaching content, the method provided by the embodiment of the present application may include steps S301-S305.
S301: the number of people in each frame image in the ordered image set is determined.
S302: and counting the number of the frame images in the ordered image set, wherein the frame images are positioned in the two adjacent actual segmentation points to determine the time period and contain more than the preset number of people.
S303: judging whether the number of frame images containing more than a preset number of people is larger than a preset number; if yes, go to step S304; if not, go to S305.
S304: the segment between the two actual cut points is discarded.
S305: and extracting the video segment between two adjacent actual segmentation points as an extracted segment.
In some teaching videos processed by applications of the embodiment of the application, generally only the instructor appears in the scene during normal teaching; occasionally, when a student answers a question, more than one person may appear in the scene, but not for long. If more than the set number of people appear in the scene for a long time, the segment is probably not normal teaching. Based on this, in some embodiments of the present application, some video segments are discarded through steps S301-S305, and only the remaining video segments are taken as extracted segments.
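Steps S301-S305 reduce to a simple counting rule once a per-frame person count is available (the person detector itself is outside this sketch); the parameter names and default values are illustrative assumptions.

```python
def keep_segments(cut_points, person_counts, max_people=1, max_crowded_frames=3):
    """Between each pair of adjacent actual segmentation points, count the frames
    whose person count exceeds max_people (S301-S302); keep the segment only if
    that count does not exceed max_crowded_frames (S303-S305)."""
    kept = []
    for start, end in zip(cut_points, cut_points[1:]):
        crowded = sum(1 for c in person_counts[start:end] if c > max_people)
        if crowded <= max_crowded_frames:
            kept.append((start, end))
    return kept
```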
In addition to the foregoing method, in some applications of the embodiments of the present application, some video segments that cannot be extracted may be removed by the following method.
(1) Discarding the video segment between two adjacent actual segmentation points whose audio identifiers are all mute identifiers.
(2) Discarding the video segment whose length between two adjacent actual segmentation points is smaller than a specific length, wherein the specific length is determined according to the teaching and training content.
(3) In the case that the video frames of the segment between two adjacent actual segmentation points contain certain image content or audio content identifying non-teaching time, discarding the video segment between these two adjacent actual segmentation points.
In this embodiment of the present application, after determining the extracted segment, the method provided in this embodiment of the present application may further include adding a theme index to the extracted segment, so as to conveniently and quickly find the corresponding teaching extracted segment subsequently.
In the embodiment of the present application, the process of adding the theme index may include steps S401 to S402.
S401: and processing the extracted segment, or processing the frame image corresponding to the extracted segment in the ordered image set, and determining the segment theme.
In the embodiment of the application, for some specific frame images in the extraction segment or in the corresponding ordered image set, content extraction can be performed on the specific frame images to determine the segment topic. For example, in an application of the embodiment of the present application, the start frame image of the extracted segment, the frame image 1/4, the frame image 1/2, and the frame image 3/4 may be processed to obtain a segment topic.
S402: and extracting the fragments by adopting the fragment topic indexing.
Step S402 is to add the clip topic as a title or attribute content to the extracted clip.
In addition, in some embodiments of the application, when no extracted segment is obtained, a segment topic may also be determined from the frame images between the actual segmentation points of the teaching video, and the segment topic may be used as a label of the corresponding segment of the teaching video.
Besides the teaching video segmentation method, the embodiment of the application also provides a teaching video segmentation apparatus based on the same inventive concept as the foregoing segmentation method.
Fig. 3 is a schematic structural diagram of a teaching-type video segmentation apparatus according to an embodiment of the present application; as shown in fig. 3, in some embodiments, the segmentation apparatus for teaching-type video includes an extraction unit 11, a segmentation point initial selection unit 12, a segmentation point deletion unit 13, and an actual segmentation point determination unit 14.
The extraction unit 11 is used for extracting frame images of the teaching video according to a first period to form an ordered image set;
In order to form smooth video content, the frame rate of a teaching video is greater than 24 frames per second; because the time interval between adjacent frames is small, the image content of two adjacent video frames changes little. For this reason, in the embodiment of the present application, the extraction unit 11 extracts frame images of the teaching video according to the first period using periodic sampling, and forms an ordered image set for subsequent processing.
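The extraction unit's periodic sampling can be illustrated by mapping the first period to frame indices; a real implementation would also decode the video (e.g. with a video library), which is omitted here, and the frame rate and period values are assumptions.

```python
def sample_indices(total_frames, fps, first_period_seconds):
    """Indices of the frames to extract from the video, one per first period."""
    step = max(1, round(fps * first_period_seconds))
    return list(range(0, total_frames, step))
```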
The segmentation point primary selection unit 12 is configured to compare adjacent frame images in the ordered image set, and determine a temporary segmentation point.
In the embodiment of the present application, the segmentation point initial selection unit 12 may compare the adjacent frame images using the SSIM method to obtain a second similarity; then judge whether the second similarity is smaller than a second threshold; if the second similarity is smaller than the second threshold, set a temporary segmentation point between the adjacent frame images; otherwise, set no temporary segmentation point between the adjacent frame images.
A cut point deletion unit 13 for determining a first frame image and a second frame image from the provisional cut points; comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; and deleting the corresponding temporary segmentation points according to the first similarity.
The first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point.
In the embodiment of the present application, the blackboard-writing areas of the first frame image and the second frame image may be processed by one or more of the SSIM algorithm, the cosine distance algorithm, and the text distance algorithm to determine the first similarity; whether the corresponding temporary segmentation point is deleted is determined according to the first similarity.
The text distance method comprises the following steps: performing text content recognition on the blackboard-writing areas of the first frame image and the second frame image to obtain two recognized texts; comparing the two recognized texts to obtain an edit distance; and then determining the first similarity based on the edit distance and the length of one recognized text.
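The text distance described above can be sketched with a standard Levenshtein edit distance; normalizing by the longer recognized text's length is one plausible reading of "the length of one recognized text" and is an assumption of this sketch.

```python
def edit_distance(s, t):
    """Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]

def text_similarity(text_a, text_b):
    """First similarity from the edit distance, normalized by text length."""
    if not text_a and not text_b:
        return 1.0
    return 1.0 - edit_distance(text_a, text_b) / max(len(text_a), len(text_b))
```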
And an actual segmentation point determining unit 14, configured to use the remaining temporary segmentation points as actual segmentation points of the teaching video.
The teaching-video segmentation apparatus provided by the embodiment of the application makes use of the fact that some teaching videos have blackboard-writing areas whose content is highly similar over short time spans; after the temporary segmentation points are determined, the contents of the blackboard-writing areas of adjacent frames are used to perform the deletion operation, eliminating temporary segmentation points that are not suitable as actual segmentation points according to the blackboard-writing content. Because the blackboard-writing content is directly related to the teaching subject, the apparatus can segment a teaching video accurately according to the content characteristics of the teaching video.
In some applications of the embodiment of the application, in order to eliminate image differences caused by human actions in teaching videos and avoid determining too many temporary segmentation points, the temporary segmentation points may be determined by comparing regions of the frame images that do not include human images.
In the embodiment of the present application, the segmentation point initial selection unit 12 may compare the adjacent frame images using the SSIM algorithm to obtain a second similarity, and set a temporary segmentation point between the adjacent frame images in the case that the second similarity is smaller than a second threshold.
In the embodiment of the present application, the cutting point deletion unit 13 may delete the temporary cutting points by the following method in addition to deleting the temporary cutting points by comparing the blackboard writing areas.
(1) Comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity; and deleting the temporary dividing point corresponding to the third similarity under the condition that the third similarity is larger than the third threshold.
(2) Obtaining a residual image according to the first frame image and the second frame image; and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
In a specific application, the residual image may include a first residual image and a second residual image. The first residual image is a difference image between the first frame image and the second frame image, and the second residual image is a difference image between the second frame image and the first frame image.
The step of determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image comprises: calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image; determining a larger value and a smaller value of the fourth similarity and the fifth similarity; and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than the fourth threshold or the smaller value is smaller than the fifth threshold.
In some embodiments of the present application, the video segmentation apparatus for teaching type may further include a sound intensity determination unit, a sound identification determination unit, and an audio segmentation point determination unit.
The sound intensity determining unit is used for determining the sound intensity in each second period in the teaching video.
The sound identifier determining unit is used for setting the corresponding audio identifier of the second period as a mute identifier under the condition that the sound intensity is smaller than a preset intensity threshold value; or, when the sound intensity is greater than the preset intensity threshold, setting the audio identifier of the corresponding second period as a sound identifier.
The audio dividing point determining unit is used for determining the audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods.
In the case of including the aforementioned sound intensity determination unit, sound identification determination unit, and audio cut point determination unit, the actual cut point determination unit 14 employs the provisional cut point and the audio cut point as the actual cut point.
In some applications of the embodiments of the present application, the audio segmentation point determination unit determines the audio segmentation points as follows: (1) judging whether the audio identifiers of a specific number of consecutive second periods are all mute identifiers; (2) if yes, keeping the audio identifiers of the specific number of consecutive second periods unchanged; if not, modifying the mute identifiers among them into voiced identifiers; (3) setting an audio segmentation point at each position where the audio identifier changes.
In some applications of the embodiments of the present application, the apparatus further includes a segment extraction unit configured to extract the video segment of the teaching video located between two adjacent actual segmentation points as an extracted segment. In one particular application, the extracted segment is obtained as follows: (1) determining the number of people in each frame image in the ordered image set; (2) counting the number of frame images that fall within the time period determined by two adjacent actual segmentation points and contain more than a preset number of people; (3) in the case that this number of frame images is smaller than a preset frame count, extracting the video segment located between the two adjacent actual segmentation points as an extracted segment.
In some applications of the application, the teaching-video segmentation apparatus further comprises a theme determination unit; the theme determination unit is used for processing the extracted segment, or processing the frame images corresponding to the extracted segment in the ordered image set, to determine the segment topic, and for indexing the extracted segment with the segment topic.
Based on the same inventive concept, the application also provides an electronic device. Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in Fig. 4, the electronic device comprises at least one processor 21, at least one memory 22 and at least one communication interface 23; the communication interface 23 is used for information transmission with external devices.
The various components in the electronic device are coupled together by a bus system 24. Understandably, the bus system 24 is used to enable connection and communication between these components. In addition to a data bus, the bus system 24 includes a power bus, a control bus, and a status-signal bus. For clarity of illustration, the various buses are labeled as the bus system 24 in Fig. 4.
It will be appreciated that the memory 22 in this embodiment may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. In some embodiments, the memory 22 stores the following elements, executable units or data structures, or a subset or an extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic tasks and processing hardware-based tasks. The application programs include various applications, such as a media player (MediaPlayer) and a browser (Browser), for implementing various application tasks. A program implementing the teaching-video segmentation method provided by the embodiments of the present disclosure may be included in an application program.
In the embodiment of the present disclosure, the processor 21 is configured to call programs or instructions stored in the memory 22, specifically programs or instructions stored in an application program, and to execute the steps of the teaching-video segmentation method provided by the embodiments of the present disclosure.
The teaching-video segmentation method provided by the embodiments of the present disclosure may be applied to the processor 21 or implemented by the processor 21. The processor 21 may be an integrated circuit chip having signal-processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 21 or by instructions in the form of software. The processor 21 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The steps of the teaching-video segmentation method provided by the embodiments of the present disclosure can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware in the decoding processor and software units. The software units may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory 22, and the processor 21 reads the information in the memory 22 and completes the steps of the method in combination with its hardware.
The embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing a program or instructions that cause a computer to execute the steps of the teaching-video segmentation method of each embodiment; to avoid repetition, the details are not repeated here.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (18)

1. A method for segmenting a teaching video, comprising:
extracting frame images of the teaching video according to a first period to form an ordered image set, wherein the first period is determined according to the length of the teaching video, the video content type of the teaching video, and the processing capability of the computer performing subsequent processing;
comparing adjacent frame images in the ordered image set to determine temporary segmentation points;
comparing blackboard-writing areas of a first frame image and a second frame image to determine a first similarity, wherein the first frame image is the frame image before a temporary segmentation point and the second frame image is the frame image after the temporary segmentation point, and the comparing comprises: comparing the blackboard-writing areas of the first frame image and the second frame image by adopting the SSIM algorithm to determine the first similarity; and/or comparing the blackboard-writing areas of the first frame image and the second frame image by adopting the cosine distance method to determine the first similarity; and/or performing text content recognition on the blackboard-writing areas of the first frame image and the second frame image to obtain two recognized texts, and determining the first similarity according to the two recognized texts;
deleting the corresponding temporary segmentation points according to the first similarity; and
adopting the remaining temporary segmentation points as actual segmentation points of the teaching video.

2. The method for segmenting a teaching video according to claim 1, wherein comparing the two recognized texts to determine the first similarity comprises:
comparing the two recognized texts to obtain an edit distance; and
determining the first similarity according to the edit distance and the length of one of the recognized texts.

3. The method for segmenting a teaching video according to claim 1, wherein, in the case that the teaching video contains person images, comparing adjacent frame images in the ordered image set to determine temporary segmentation points comprises: comparing regions of the frame images that do not include the person images to determine the temporary segmentation points.

4. The method for segmenting a teaching video according to any one of claims 1-3, wherein comparing adjacent frame images of the ordered image set to determine temporary segmentation points comprises:
comparing the adjacent frame images by adopting the SSIM algorithm to obtain a second similarity; and
setting a temporary segmentation point between the adjacent frame images in the case that the second similarity is smaller than a second threshold.

5. The method for segmenting a teaching video according to any one of claims 1-3, further comprising:
comparing the first frame image and the second frame image by adopting the cosine distance method to obtain a third similarity, and deleting the temporary segmentation point corresponding to the third similarity in the case that the third similarity is greater than a third threshold; and/or
obtaining a residual image according to the first frame image and the second frame image, and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image, and the second frame image.

6. The method for segmenting a teaching video according to claim 5, wherein
obtaining a residual image according to the first frame image and the second frame image comprises:
calculating a first residual image and a second residual image, wherein the first residual image is the difference image between the first frame image and the second frame image, and the second residual image is the difference image between the second frame image and the first frame image;
and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image, and the second frame image comprises:
calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image;
determining the larger value and the smaller value of the fourth similarity and the fifth similarity; and
deleting the corresponding temporary segmentation point in the case that the larger value is smaller than a fourth threshold or the smaller value is smaller than a fifth threshold.

7. The method for segmenting a teaching video according to claim 1, further comprising:
determining the sound intensity in each second period of the teaching video;
setting the audio identifier of the corresponding second period as a mute identifier in the case that the sound intensity is smaller than a preset intensity threshold; or setting the audio identifier of the corresponding second period as a voiced identifier in the case that the sound intensity is greater than the preset intensity threshold; and
determining audio segmentation points according to the change characteristics of the audio identifiers corresponding to the second periods;
wherein adopting the remaining temporary segmentation points as the actual segmentation points of the teaching video comprises: adopting the temporary segmentation points and the audio segmentation points as the actual segmentation points.

8. The method for segmenting a teaching video according to claim 7, wherein determining audio segmentation points according to the change characteristics of the audio identifiers corresponding to the second periods comprises:
judging whether the audio identifiers of a specific number of consecutive second periods are all mute identifiers;
if yes, keeping the audio identifiers of the specific number of consecutive second periods unchanged;
if not, modifying the mute identifiers among the audio identifiers of the specific number of consecutive second periods into voiced identifiers; and
setting an audio segmentation point at each position where the audio identifier changes.

9. The method for segmenting a teaching video according to claim 1 or 7, further comprising:
extracting the video segment of the teaching video located between two adjacent actual segmentation points as an extracted segment.

10. The method for segmenting a teaching video according to claim 1 or 7, wherein, in the case that the teaching video contains person images, extracting the video segment of the teaching video located between two adjacent said
Video clips between actual breakpoints, including: 确定所述有序图像集中的各个帧图像中的人物数量;determining the number of people in each frame image in the ordered image set; 统计有序图像集中位于相邻的两个所述实际切分点确定的时间段内,并且包含超过预设人数的帧图像的数量;Statistically ordered image sets are located within a time period determined by two adjacent actual segmentation points, and include the number of frame images that exceed a preset number of people; 在所述帧图像数量小于预设数量的情况下,提取位于相邻的两个所述实际切分点之间的视频片段。In the case that the number of the frame images is less than the preset number, extract the video segments located between the two adjacent actual segmentation points. 11.根据权利要求9所述的教学类视频的切分方法,其特征在于,还包括:11. The segmentation method of teaching video according to claim 9, is characterized in that, also comprises: 处理所述提取片段,或者处理所述有序图像集中对应所述提取片段的帧图像,确定片段主题;Process the extracted segment, or process the frame image corresponding to the extracted segment in the ordered image set, to determine the segment theme; 采用所述片段主题标引所述提取片段。The extracted fragments are indexed with the fragment subject. 12.一种教学类视频的切分装置,其特征在于,包括:12. A segmentation device for teaching video, characterized in that, comprising: 提取单元,用于按照第一周期提取教学类视频的帧图像,形成有序图像集;所述第一周期根据教学类视频的长度、教学类视频的视频内容类型、用于执行后续处理的计算机处理能力确定;The extraction unit is used to extract the frame images of the teaching video according to the first cycle to form an ordered image set; the first cycle is based on the length of the teaching video, the video content type of the teaching video, and the computer for performing subsequent processing. 
Processing capacity determination; 切分点初选单元,用于比较所述有序图像集中相邻的所述帧图像,确定临时切分点;A segmentation point preliminary selection unit, configured to compare the adjacent frame images in the ordered image set, and determine a temporary segmentation point; 切分点删减单元,用于比较第一帧图像和第二帧图像的板书区域,确定第一相似度;以及,根据所述第一相似度删减对应的所述临时切分点;所述第一帧图像为所述临时切分点前的帧图像,所述第二帧图像为所述临时切分点后的帧图像;所述切分点删减单元,采用SSIM算法比较所述第一帧图像和所述第二帧图像的板书区域,确定所述第一相似度;和/或,采用余弦距离方法比较所述第一帧图像和所述第二帧图像的板书区域,确定所述第一相似度;和/或,对所述第一帧图像和所述第二帧图像的板书区域进行文本内容识别,得到两个识别文本;以及,根据两个所述识别文本,确定所述第一相似度;A segmentation point deletion unit, used for comparing the blackboard writing area of the first frame image and the second frame image, and determining a first similarity; and deleting the corresponding temporary segmentation point according to the first similarity; The first frame image is the frame image before the temporary segmentation point, and the second frame image is the frame image after the temporary segmentation point; the segmentation point deletion unit uses the SSIM algorithm to compare the The blackboard writing area of the first frame image and the second frame image, to determine the first similarity; and/or, using the cosine distance method to compare the blackboard writing area of the first frame image and the second frame image, to determine the first similarity; and/or, performing text content recognition on the blackboard writing area of the first frame image and the second frame image to obtain two recognized texts; and, according to the two recognized texts, determine the first similarity; 实际切分点确定单元,用于采用剩余的所述临时切分点作为所述教学类视频的实际切分点。The actual split point determination unit is configured to use the remaining temporary split points as the actual split points of the teaching video. 13.根据权利要求12所述教学类视频的切分装置,其特征在于,切分点删减单元还用于:13. 
according to the segmentation device of the described teaching video of claim 12, it is characterized in that, segmentation point deletion unit is also used for: 采用余弦距离法比较所述第一帧图像和第二帧图像,得到第三相似度;以及,在所述第三相似度大于第三阈值的情况下,删除所述第三相似度对应的所述临时切分点;和/或,Using the cosine distance method to compare the first frame image and the second frame image to obtain a third degree of similarity; and, in the case that the third degree of similarity is greater than a third threshold, delete all the images corresponding to the third degree of similarity the temporary cut-off point; and/or, 根据所述第一帧图像和所述第二帧图像得到残差图像;根据所述残差图像、所述第一帧图像和所述第二帧图像,确定是否删除对应的所述临时切分点。Obtain a residual image according to the first frame image and the second frame image; determine whether to delete the corresponding temporary segmentation according to the residual image, the first frame image and the second frame image point. 14.根据权利要求12教学类视频的切分装置,其特征在于,还包括:14. according to the segmentation device of claim 12 teaching video, it is characterized in that, also comprises: 声音强度确定单元,用于确定所述教学类视频中各个第二周期中的声音强度;a sound intensity determination unit, configured to determine the sound intensity in each second cycle in the teaching video; 声音标识确定单元,用于在所述声音强度小于预设强度阈值的情况下,将对应的所述第二周期的音频标识设为静音标识;或者,在所述声音强度大于预设强度阈值的情况下,将对应的所述第二周期的音频标识设为有声标识;A sound identification determining unit, configured to set the corresponding audio identification of the second period as a mute identification when the sound intensity is less than a preset intensity threshold; or, when the sound intensity is greater than a preset intensity threshold Under the circumstance, the audio mark of the corresponding described second period is set as the sound mark; 音频切分点确定单元,用于根据各个所述第二周期对应的所述音频标识的变化特性,确定音频切分点;an audio segmentation point determination unit, configured to determine an audio segmentation point according to the variation characteristics of the audio identifiers corresponding to each of the second periods; 所述实际切分点确定单元采用所述临时切分点和所述音频切分点作为所述实际切分点。The actual split point 
determination unit adopts the temporary split point and the audio split point as the actual split point. 15.根据权利要求12-14任一项所述的教学类视频的切分装置,其特征在于,还包括:15. The device for segmenting a teaching video according to any one of claims 12-14, further comprising: 提取单元,用于提取所述教学类视频位于相邻的两个所述实际切分点之间的视频片段,作为提取片段。An extraction unit, configured to extract the video segments of the teaching video located between two adjacent actual segmentation points, as extraction segments. 16.根据权利要求15所述的教学类视频的切分装置,其特征在于,还包括:16. The device for segmenting a teaching video according to claim 15, further comprising: 主题确定单元,用于处理所述提取片段,或者处理所述有序图像集中对应所述提取片段的帧图像,确定片段主题;以及,a theme determination unit, configured to process the extracted segment, or process the frame images corresponding to the extracted segment in the ordered image set, to determine the segment theme; and, 采用所述片段主题标引所述提取片段。The extracted fragments are indexed with the fragment subject. 17.一种电子设备,其特征在于,包括处理器和存储器;17. An electronic device, comprising a processor and a memory; 所述处理器通过调用所述存储器存储的程序或指令,用于执行如权利要求1至11任一项所述教学类视频的切分方法的步骤。The processor is configured to execute the steps of the method for segmenting a teaching video according to any one of claims 1 to 11 by calling a program or an instruction stored in the memory. 18.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储程序或指令,所述程序或指令使计算机执行如权利要求1至11任一项所述教学类视频的切分方法的步骤。18. A computer-readable storage medium, characterized in that, the computer-readable storage medium stores a program or an instruction, and the program or instruction causes a computer to perform the cutting of the instructional video according to any one of claims 1 to 11. steps of the method.
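The text-based first similarity of claim 2 can be sketched in a few lines: compute the edit (Levenshtein) distance between the two recognized texts, then normalize by a text length. The claim only states that the similarity is determined from the edit distance and the length of one recognized text; the exact normalization below (dividing by the longer length) is an assumption for illustration, not the patented formula.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """First-similarity sketch: 1 - edit_distance / text length.

    Normalizing by the longer of the two lengths is an assumption; the
    claim leaves the exact combination of distance and length open.
    """
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

With this normalization, identical blackboard texts score 1.0 and fully different texts score near 0.0, so a low first similarity signals that the blackboard content changed across the temporary segmentation point.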
CN202110278404.7A 2021-03-16 2021-03-16 Teaching video segmentation determination method and device Active CN112668561B (en)
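The audio-side procedure of claims 7 and 8 can be sketched as three steps: threshold each second-period's sound intensity into a mute or voiced identifier, flip mute runs shorter than a specific number of periods back to voiced (so brief pauses in speech do not split the video), and place an audio segmentation point wherever the identifier changes. The function and parameter names below are hypothetical; this is a minimal illustration of the claimed logic, not the actual implementation.

```python
def audio_split_points(intensities, threshold, min_silent_periods):
    """Sketch of the silence-based audio segmentation in claims 7-8.

    `intensities` holds one sound-intensity value per second period.
    Periods below `threshold` are marked mute; mute runs shorter than
    `min_silent_periods` are treated as pauses and flipped to voiced;
    a split point is returned at every index where the mark changes.
    """
    # Step 1: threshold each period into mute (False) / voiced (True).
    voiced = [x >= threshold for x in intensities]

    # Step 2: keep only silences lasting at least min_silent_periods.
    i = 0
    while i < len(voiced):
        if not voiced[i]:
            j = i
            while j < len(voiced) and not voiced[j]:
                j += 1                      # end of this mute run
            if j - i < min_silent_periods:
                for k in range(i, j):       # too short: flip to voiced
                    voiced[k] = True
            i = j
        else:
            i += 1

    # Step 3: a split point wherever the smoothed mark changes.
    return [i for i in range(1, len(voiced)) if voiced[i] != voiced[i - 1]]
```

For example, with a two-period minimum silence, a one-period dip in intensity is absorbed into the surrounding speech, while a three-period silence yields split points at its start and end.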

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110278404.7A CN112668561B (en) 2021-03-16 2021-03-16 Teaching video segmentation determination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110278404.7A CN112668561B (en) 2021-03-16 2021-03-16 Teaching video segmentation determination method and device

Publications (2)

Publication Number Publication Date
CN112668561A CN112668561A (en) 2021-04-16
CN112668561B CN112668561B (en) 2022-03-29

Family

ID=75399421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110278404.7A Active CN112668561B (en) 2021-03-16 2021-03-16 Teaching video segmentation determination method and device

Country Status (1)

Country Link
CN (1) CN112668561B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245229B (en) * 2022-01-29 2024-02-06 北京百度网讯科技有限公司 Short video production method, device, equipment and storage medium
CN116704392B (en) * 2022-02-28 2024-10-15 腾讯科技(深圳)有限公司 Video processing method, device, equipment, storage medium and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902289A (en) * 2019-01-23 2019-06-18 汕头大学 A news video topic segmentation method for fuzzy text mining
US20190287415A1 (en) * 2018-03-14 2019-09-19 At&T Intellectual Property I, L.P. Content curation for course generation
CN112261477A (en) * 2020-10-22 2021-01-22 新东方教育科技集团有限公司 Video processing method and device, training method and storage medium
CN112289321A (en) * 2020-12-29 2021-01-29 平安科技(深圳)有限公司 Explanation synchronization video highlight processing method and device, computer equipment and medium
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190287415A1 (en) * 2018-03-14 2019-09-19 At&T Intellectual Property I, L.P. Content curation for course generation
CN109902289A (en) * 2019-01-23 2019-06-18 汕头大学 A news video topic segmentation method for fuzzy text mining
CN112261477A (en) * 2020-10-22 2021-01-22 新东方教育科技集团有限公司 Video processing method and device, training method and storage medium
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium
CN112289321A (en) * 2020-12-29 2021-01-29 平安科技(深圳)有限公司 Explanation synchronization video highlight processing method and device, computer equipment and medium

Also Published As

Publication number Publication date
CN112668561A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
US8433136B2 (en) Tagging video using character recognition and propagation
CN111931775B (en) Method, system, computer device and storage medium for automatically acquiring news headlines
CN114465737B (en) Data processing method and device, computer equipment and storage medium
Shekhar et al. Show and recall: Learning what makes videos memorable
CN112668561B (en) Teaching video segmentation determination method and device
CN109583443B (en) Video content judgment method based on character recognition
CN111813998B (en) Video data processing method, device, equipment and storage medium
CN118470717B (en) Method, device, computer program product, equipment and medium for generating annotation text
CN110198482A (en) A kind of video emphasis bridge section mask method, terminal and storage medium
CN113435438B (en) Image and subtitle fused video screen plate extraction and video segmentation method
CN109471955B (en) Video clip positioning method, computing device and storage medium
CN113254708A (en) Video searching method and device, computer equipment and storage medium
CN115438223B (en) Video processing method, device, electronic equipment and storage medium
CN110795597A (en) Video keyword determination method, video retrieval method, video keyword determination device, video retrieval device, storage medium and terminal
CN116524906A (en) Training data generation method and system for voice recognition and electronic equipment
CN114257757B (en) Automatic video clipping and switching method and system, video player and storage medium
CN111522992A (en) Method, device and equipment for putting questions into storage and storage medium
CN111914760A (en) A method and system for analyzing the composition of online course video resources
CN113301382B (en) Video processing method, device, medium, and program product
CN116226453B (en) Method, device and terminal equipment for identifying dancing teaching video clips
EP4447469A2 (en) Processing method and apparatus, terminal device and medium
CN111008295A (en) Page retrieval method and device, electronic equipment and storage medium
CN113194333B (en) Video editing method, device, equipment and computer readable storage medium
CN116017088A (en) Video subtitle processing method, device, electronic device and storage medium
CN115665508A (en) Video abstract generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant