CN113269854A - Method for intelligently generating interview-type comprehensive programs - Google Patents
- Publication number
- CN113269854A (application CN202110803384.0A)
- Authority
- CN
- China
- Prior art keywords
- face
- frame
- video
- channel
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Television Signal Processing For Recording (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for intelligently generating interview-type comprehensive programs, which comprises the following steps: S1, recording program videos shot by a plurality of cameras at the program site through multichannel recording software; S2, setting the role played by each channel material according to the camera's shot picture in the program video; S3, extracting video features from each channel material; S4, generating a plurality of candidate video clips in each channel according to the extracted video features; S5, selecting candidate video clips according to predefined rules and synthesizing an initial program cut. The invention can quickly generate an initial cut for post-production editors to edit and finish quickly, thereby reducing manual workload.
Description
Technical Field
The invention relates to the field of video program synthesis, in particular to a method for intelligently generating interview-type integrated art programs.
Background
An interview-type program is a television program format in which, in a relaxed and pleasant atmosphere, a host and guests converse around a certain theme, with conversation as the main form. An interview-type comprehensive (variety) program is an interview program aimed mainly at entertainment, relaxation and leisure, to which more variety elements and comedic situation design are added to achieve a dramatic effect. Its guests are mainly film, television and sports stars, so such programs tend to be very popular among young audiences. Although these programs, unlike many other variety programs, are usually shot on a single stage and scene, a large number of cameras still have to be arranged on site; during shooting, the pictures captured at different angles and scene sizes are exploited to synthesize the initial program through a series of complicated operations, such as real-time coordination between the on-site director and each camera crew and shot cutting, which usually requires the director to have rich commanding experience and on-site ability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for intelligently generating interview-type comprehensive programs, which can quickly generate an initial cut for post-production editors to edit and finish quickly, thereby reducing manual workload.
The purpose of the invention is realized by the following scheme:
a method for intelligently generating interview-type integrated art programs comprises the following steps:
s1, recording program video materials shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips.
Further, in step S2, setting the role played by each channel material includes the following steps: dividing the channel materials into three categories according to scene size, namely close shot, medium shot and long shot; the close-shot picture is a close-up of a guest or the host; the medium-shot picture shows interactions between guests, between guests and the host, and between hosts; the long-shot picture is the whole stage.
Further, in step S3, the following steps are included:
s31, establishing a face library containing the host and the guest of the field program;
S32, performing face recognition analysis on the video material of each channel, and extracting the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names in each frame;
S33, performing picture stability analysis on the video material of each channel, and marking blurred pictures caused by camera movement or focusing errors;
S34, using the data from step S31 and the face key point data of the same person along the continuous time dimension, performing mouth-shape analysis to judge whether the person is speaking within a set time.
Further, in step S31, if the program involves $M$ persons in total, single photos of the host and the guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as each person's representation, yielding an $M \times 512$ feature matrix $Q$ and an $M \times 1$ name matrix $C$; $i$ and $j$ are integers, and $Q_{i,j}$ and $C_{i,j}$ respectively denote the element in row $i$, column $j$ of the matrices $Q$ and $C$.
Further, in step S32, suppose there are $N$ channels of video material, each consisting of $T$ frames, with every frame aligned on the timeline; for the $t$-th frame image $V_n^t$ of the $n$-th material $V_n$, face recognition processing is performed to obtain the frame's processing result set $R_n^t = \{F_n^t, B_n^t, P_n^t, I_n^t\}$,
where $F_n^t$ denotes the face feature matrix extracted from the frame, $k$ is the number of detected faces, $F_{n,i}^t$ denotes the $i$-th face feature extracted from the frame, $B_n^t$ denotes all face boxes detected in the frame, $B_{n,i}^t$ denotes the $i$-th detected face box, $P_n^t$ denotes the key points of all faces detected in the frame, $P_{n,i}^t$ denotes the key points of the $i$-th detected face, $I_n^t$ denotes the names recognized for the faces detected in the frame, and $I_{n,i}^t$ denotes the name of the person corresponding to the $i$-th detected face, with $I_{n,i}^t = C_{j^*}$, $j^* = \arg\max_j \mathrm{sim}(F_{n,i}^t, Q_j)$,
i.e., the name with the highest similarity in the face library is taken as the name corresponding to the face, where $C_j$ denotes the $j$-th person's name, $\arg\max$ denotes taking the index corresponding to the maximum value, and $\mathrm{sim}$ denotes a similarity calculation function; the result of extracting video features from all materials is expressed as $R = \{R_n^t\}$.
Further, in step S33, for the second stepAn individual materialTo (1) aFrame imageGiven its width ofHigh isBy counting the picture stability scoresTo characterize whether the frame of image picture is stable,
wherein,is to show toThe frame image is taken as a gray-scale image,which represents the fourier transform of the signal,representing the conversion of the 0 frequency component to the center of the spectrum,it is indicated that the absolute value is taken,is composed ofThe absolute value of (a) is,is composed ofThe grayscale map of (a) is transformed to the frequency domain and the 0-frequency component is converted to the result of the center of the frequency spectrum,is a threshold value set asOf medium maximum value,Is composed ofThe number of pixels greater than the threshold value inIf the value is larger than the set empirical value, the image is representedAnd (5) stabilizing the picture.
Further, in step S34, for the $n$-th material $V_n$, a fixed time window of size $T_w$ (i.e., of fixed duration) is taken together with the face key point data of the same person within it, i.e.
$A_c = \frac{1}{T_w} \sum_{t} \mathrm{area}(\mathrm{mouth}(P_c^t))$,
where $A_c$ denotes the average mouth area of the person $c$ over the window, $P_c^t$ denotes person $c$'s face key points at time $t$, and $\mathrm{area}(\cdot)$ denotes the computed area of the mouth-shape key points; when $A_c$ is greater than a set empirical value, the person named $c$ is judged to be speaking during the time period and is marked as a speaker.
Further, in step S4, the following steps are included:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the $n$-th material $V_n$ and its all-frame analysis results $S_n = \{S_n^1, \dots, S_n^T\}$, all results are traversed; when $S_n^t$ is greater than a set empirical value, $t$ is marked as the in-point of a new candidate segment; traversal of subsequent results continues, and when $S_n^t$ is less than or equal to the set empirical value, $t$ is marked as the out-point of that candidate segment; and so on, generating material $V_n$'s initial candidate segment list $L_n$ containing $K$ candidate segments;
S42, traversing the initial candidate segment list $L_n$ generated in S41, comparing the out-point $o_i$ of the current segment $e_i$ with the in-point $p_{i+1}$ of the next segment $e_{i+1}$; if the result of the comparison exceeds the set empirical value, segments $e_i$ and $e_{i+1}$ are merged into a segment whose in-point is $e_i$'s in-point and whose out-point is $e_{i+1}$'s out-point; and so on, generating the final candidate segment list $L_n'$.
Further, in step S5, the following steps are included:
S51, setting priorities according to the scene type of the shot picture of each channel material;
S52, integrating the final candidate segment list $L_n'$ of each channel material from step S42 with the speaker marking results from step S34, the segments in each channel material's final candidate list are filled into the final-cut timeline according to the following rules (higher priority first) to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is no more than 3;
the segment is a long shot.
Further, in step S51, the priority is set: close range>Middle view>And (5) distant view. Further, in step S52, a time line gap filling method is adopted, i.e. the current time, according to the above ruleAnd selecting the most suitable candidate segment, filling the segment into the corresponding time line for generating the initial segment, updating the current time as the corresponding time of the candidate segment out point, and repeating the steps until all time lines for generating the initial segment are filled.
The beneficial effects of the invention include:
(1) By observing the on-site commanding and shot-cutting logic of a director when shooting interview-type variety programs, the method of the invention provides an initial-cut generation method based on video face recognition, speaker recognition and picture stability analysis; it extracts the most suitable shot segments from pictures shot at different angles and automatically generates an initial cut of the interview-type program, so as to reduce the workload of the director and of post-production editors.
(2) The invention provides a simple and efficient method for automatically synthesizing an initial cut of an interview-type variety program with only a small amount of presetting. Specifically, the pictures shot by the different on-site cameras are assigned roles according to their scene type, the host and guests are marked through face recognition processing, speakers are marked through mouth-shape analysis, invalid shots are filtered out by computing picture stability scores to generate candidate video segment lists, and finally all candidate video segments are combined according to rules to generate the initial program cut. The method thus quickly generates an initial cut, allows post-production editors to edit and finish the program quickly, and reduces manual workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of steps in an embodiment of a method of the present invention;
FIG. 2 is a flow chart of the method embodiment of the present invention for extracting visual features from a channel material.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
As shown in fig. 1 and 2, a method for intelligently generating interview-type integrated art programs includes the steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
For example, in this step, the program videos shot by 6 cameras at the program site are recorded and denoted $V_1, V_2, \dots, V_6$ respectively; other programs may also be recorded, and the number of cameras may be 8, 10, 12, and so on, which is not described in detail here.
S2, setting the role played by each channel material according to the camera shooting picture in the program video;
The role played by each channel material is set according to the picture shot by its camera. Specifically, some cameras are fixed and their shot pictures are close shots, some cameras are fixed and their shot pictures are medium shots, some cameras are fixed and their shot pictures are long shots, and one camera is a rocker-arm (jib) camera whose shot picture is a long shot.
In step S2, setting the role played by each channel material includes the following steps: dividing each channel material into three categories according to scene size, namely close shot, medium shot and long shot; the close-shot picture is a close-up of a guest or the host; the medium-shot picture shows interactions between guests, between guests and the host, and between hosts; the long-shot picture is the whole stage.
S3, extracting video features for each channel material, in step S3, the method includes the following steps:
s31, establishing a face library containing the host and the guest of the field program;
In step S31, if the program involves $M$ persons in total, single photos of the host and the guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as each person's representation, yielding an $M \times 512$ feature matrix $Q$ and an $M \times 1$ name matrix $C$; $i$ and $j$ are integers, and $Q_{i,j}$ and $C_{i,j}$ respectively denote the element in row $i$, column $j$ of the matrices $Q$ and $C$.
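By way of illustration only, step S31 can be sketched in Python as follows; the function extract_face_embedding is a hypothetical placeholder for whichever 512-dimensional face recognition network is used, which the patent does not name.

```python
import numpy as np

# Hypothetical embedding function: in practice this wraps whatever face
# recognition network is chosen (the patent does not name one); it must
# return a 512-dimensional feature vector for a single-face photo.
def extract_face_embedding(photo_path: str) -> np.ndarray:
    raise NotImplementedError("plug in a 512-d face recognition network here")

def build_face_library(people: dict[str, str]):
    """people maps a person's name to the path of their single photo.

    Returns an (M, 512) feature matrix Q and a length-M name list C,
    mirroring the feature matrix and name matrix described in step S31.
    """
    names = list(people.keys())
    feats = np.stack([extract_face_embedding(people[name]) for name in names])
    # L2-normalise so that a dot product later acts as cosine similarity.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats, names
```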
S32, performing face recognition analysis on the video material of each channel, and extracting the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names in each frame. In step S32, suppose there are $N$ channels of video material (here $N = 6$, though it may be another number), each consisting of $T$ frames, with every frame aligned on the timeline; for the $t$-th frame image $V_n^t$ of the $n$-th material $V_n$, face recognition processing is performed to obtain the frame's processing result set $R_n^t = \{F_n^t, B_n^t, P_n^t, I_n^t\}$,
where $F_n^t$ denotes the face feature matrix extracted from the frame, $k$ is the number of detected faces, $F_{n,i}^t$ denotes the $i$-th face feature extracted from the frame, $B_n^t$ denotes all face boxes detected in the frame, $B_{n,i}^t$ denotes the $i$-th detected face box, $P_n^t$ denotes the key points of all faces detected in the frame, $P_{n,i}^t$ denotes the key points of the $i$-th detected face, $I_n^t$ denotes the names recognized for the faces detected in the frame, and $I_{n,i}^t$ denotes the name of the person corresponding to the $i$-th detected face, with $I_{n,i}^t = C_{j^*}$, $j^* = \arg\max_j \mathrm{sim}(F_{n,i}^t, Q_j)$,
i.e., the name with the highest similarity in the face library is taken as the name corresponding to the face, where $C_j$ denotes the $j$-th person's name, $\arg\max$ denotes taking the index corresponding to the maximum value, and $\mathrm{sim}$ denotes a similarity calculation function; the result of extracting video features from all materials is expressed as $R = \{R_n^t\}$.
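The naming rule of step S32 amounts to a nearest-neighbour lookup in the face library. The sketch below assumes the per-frame detector already yields boxes, 68-point landmarks and 512-dimensional features, and uses cosine similarity as the similarity function, which is one common choice rather than something the patent prescribes.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class FrameResult:
    features: np.ndarray   # (k, 512) face features detected in the frame
    boxes: np.ndarray      # (k, 4) face boxes
    keypoints: np.ndarray  # (k, 68, 2) facial key points
    names: list[str]       # recognised name for each detected face

def name_faces(frame_feats: np.ndarray, frame_boxes: np.ndarray,
               frame_kpts: np.ndarray, lib_feats: np.ndarray,
               lib_names: list[str]) -> FrameResult:
    # Normalise detected features so the dot product is cosine similarity.
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sims = f @ lib_feats.T                 # (k, M) similarity to each library person
    idx = sims.argmax(axis=1)              # index of the most similar library entry
    names = [lib_names[j] for j in idx]    # name with the highest similarity wins
    return FrameResult(frame_feats, frame_boxes, frame_kpts, names)
```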
S33, performing picture stability analysis on the video material of each channel, and marking blurred pictures caused by camera movement or focusing errors. In step S33, for the $t$-th frame image $V_n^t$ of the $n$-th material $V_n$, with width $W$ and height $H$, a picture stability score $S_n^t$ is computed to characterize whether the frame picture is stable:
$G = |\mathrm{shift}(\mathrm{FFT}(\mathrm{gray}(V_n^t)))|$, $S_n^t = \#\{(x, y) : G_{x,y} > \theta\} / (W \cdot H)$,
where $\mathrm{gray}(\cdot)$ converts the frame image to a grayscale image, $\mathrm{FFT}$ denotes the Fourier transform, $\mathrm{shift}$ denotes moving the 0-frequency component to the center of the spectrum, $|\cdot|$ denotes taking the absolute value, $G$ is the result of transforming the grayscale image to the frequency domain and shifting the 0-frequency component to the center of the spectrum, $\theta$ is a threshold set in proportion to the maximum value in $G$, and $S_n^t$ is the proportion of pixels in $G$ greater than the threshold; when $S_n^t$ is greater than a certain preset value, the image $V_n^t$ is considered picture-stable. In this embodiment the preset value is, for example, 0.002, i.e., when $S_n^t > 0.002$ the image $V_n^t$ is considered picture-stable.
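A minimal NumPy sketch of this stability score follows; the 1/1000 factor used to derive the spectrum threshold from its maximum is an assumed proportion (the exact value is not legible in the source), while 0.002 echoes the example acceptance value of this embodiment.

```python
import numpy as np

def stability_score(frame_rgb: np.ndarray) -> float:
    """Fraction of spectrum pixels above a threshold tied to the spectrum maximum.

    frame_rgb: (H, W, 3) image array. A sharp, stable frame keeps more
    high-frequency energy, so more spectrum pixels exceed the threshold.
    """
    gray = frame_rgb.mean(axis=2)                      # simple grayscale conversion
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray)))  # |shift(FFT(gray))|
    theta = spec.max() / 1000.0                        # assumed proportionality factor
    h, w = gray.shape
    return float((spec > theta).sum()) / (h * w)

def is_stable(frame_rgb: np.ndarray, empirical_value: float = 0.002) -> bool:
    # 0.002 follows the example threshold given in this embodiment.
    return stability_score(frame_rgb) > empirical_value
```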
S34, using the data from step S31 and the face key point data of the same person along the continuous time dimension, performing mouth-shape analysis to judge whether the person is speaking within a set time. In step S34, for the $n$-th material $V_n$, a fixed time window of duration $T_w$ is taken together with the face key point data of the same person within it, i.e.
$A_c = \frac{1}{T_w} \sum_{t} \mathrm{area}(\mathrm{mouth}(P_c^t))$,
where $A_c$ denotes the average mouth area of the person $c$ over the time period, $P_c^t$ denotes person $c$'s face key points at time $t$, and $\mathrm{area}(\cdot)$ denotes the computed area of the mouth-shape key points; when $A_c$ is greater than a predetermined value $V$ ($V$ may be 500), the person named $c$ is judged to be speaking during the time period and is marked as a speaker. In this embodiment, $T_w$ may be, for example, 250 units, selected according to actual conditions.
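The mouth-shape analysis can be sketched as below, assuming the standard 68-point landmark layout in which indices 48-59 trace the outer lip contour; the shoelace formula gives the mouth polygon area, and the 500 threshold repeats the example value of this embodiment.

```python
import numpy as np

OUTER_LIP = slice(48, 60)  # outer-lip points in the common 68-landmark scheme (assumed layout)

def polygon_area(pts: np.ndarray) -> float:
    """Shoelace formula for the area of a closed polygon given (k, 2) points."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def is_speaking(kpts_over_window: np.ndarray, area_threshold: float = 500.0) -> bool:
    """kpts_over_window: (T_w, 68, 2) landmarks of one person over the window.

    The person is marked as a speaker when the average mouth area over the
    window exceeds the empirical threshold (500 in the example embodiment).
    """
    areas = [polygon_area(frame_kpts[OUTER_LIP]) for frame_kpts in kpts_over_window]
    return float(np.mean(areas)) > area_threshold
```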
S4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the $n$-th material $V_n$ and its all-frame analysis results $S_n = \{S_n^1, \dots, S_n^T\}$, all results are traversed; when $S_n^t$ is greater than a certain preset value (here the preset value may be 0.002, depending on the program), $t$ is marked as the in-point of a new candidate segment; traversal of subsequent results continues, and when $S_n^t$ is less than or equal to the preset value (again, e.g., 0.002, depending on the program), $t$ is marked as the out-point of that candidate segment; and so on, generating material $V_n$'s initial candidate segment list $L_n$ containing $K$ candidate segments;
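Step S41 reduces to a threshold-crossing scan over the per-frame stability scores; a sketch with the 0.002 example threshold:

```python
def initial_candidate_segments(scores, empirical_value: float = 0.002):
    """scores: per-frame stability scores S_n^1 .. S_n^T for one channel.

    Opens a segment when the score rises above the threshold and closes it
    when the score falls back to or below the threshold.
    """
    segments, in_point = [], None
    for t, s in enumerate(scores):
        if s > empirical_value and in_point is None:
            in_point = t                    # mark the in-point of a new segment
        elif s <= empirical_value and in_point is not None:
            segments.append((in_point, t))  # mark the out-point, close the segment
            in_point = None
    if in_point is not None:                # segment still open at the end of the material
        segments.append((in_point, len(scores) - 1))
    return segments
```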
S42, traversing the initial candidate segment list $L_n$ generated in S41, comparing the out-point $o_i$ of the current segment $e_i$ with the in-point $p_{i+1}$ of the next segment $e_{i+1}$; if the result of the comparison exceeds a certain preset value (here, 50 frames), segments $e_i$ and $e_{i+1}$ are merged into a segment whose in-point is $e_i$'s in-point and whose out-point is $e_{i+1}$'s out-point; and so on, generating the final candidate segment list $L_n'$.
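For step S42, the exact comparison between the current out-point and the next in-point is not fully legible in the source; the sketch below merges consecutive segments whose gap is within the empirical value (50 frames here), which is one plausible reading and should be adapted if a different condition is intended.

```python
def merge_candidate_segments(segments, gap_threshold: int = 50):
    """segments: list of (in_point, out_point) pairs sorted by in_point.

    One plausible reading of step S42: consecutive segments whose gap lies
    within gap_threshold frames are fused into a single segment spanning
    from the first segment's in-point to the second segment's out-point.
    """
    if not segments:
        return []
    merged = [segments[0]]
    for in_pt, out_pt in segments[1:]:
        prev_in, prev_out = merged[-1]
        if in_pt - prev_out <= gap_threshold:   # gap small enough -> merge
            merged[-1] = (prev_in, out_pt)
        else:
            merged.append((in_pt, out_pt))
    return merged
```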
And S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips. In step S5, the method includes the steps of:
S51, setting priorities according to the scene type of the shot picture of each channel material; specifically, for the 6 channel materials $V_1$ to $V_6$, the close-shot channels have the highest priority, the medium-shot channels the second priority, and the long-shot channels the lowest priority;
S52, integrating the final candidate segment list $L_n'$ of each channel material from step S42 with the speaker marking results from step S34, the segments in each channel material's final candidate list are filled into the final-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is no more than 3;
the segment is a long shot.
Further, in step S51, the priorities are set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted according to the above rules: at the current time, the most suitable candidate segment is selected and filled into the corresponding position of the initial-cut timeline, the current time is updated to the time corresponding to that candidate segment's out-point, and this is repeated until the entire initial-cut timeline is filled.
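Putting the priorities and the four rules together, the timeline gap-filling of step S52 can be sketched as follows; the Candidate representation (scene type, speaker names, speaker roles) and the numeric ranking of the rules are illustrative assumptions layered on the rules stated above.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    channel: int
    scene: str            # "close", "medium" or "long"
    in_point: int
    out_point: int
    speakers: list[str]   # names of people marked as speaking in this segment
    roles: dict[str, str] # name -> "guest" or "host"

def rule_rank(c: Candidate) -> int:
    """Lower is better; encodes the four selection rules in priority order."""
    speaking_roles = {c.roles.get(name) for name in c.speakers}
    if c.scene == "close" and "guest" in speaking_roles:
        return 0
    if c.scene == "close" and "host" in speaking_roles:
        return 1
    if c.scene == "medium" and 0 < len(c.speakers) <= 3:
        return 2
    if c.scene == "long":
        return 3
    return 4  # falls outside the stated rules; used only as a last resort

def fill_timeline(candidates: list[Candidate], total_frames: int):
    """Timeline gap filling: repeatedly pick the best candidate covering the
    current time, append it, and jump the current time to its out-point."""
    timeline, now = [], 0
    while now < total_frames:
        covering = [c for c in candidates if c.in_point <= now < c.out_point]
        if not covering:
            now += 1                  # no candidate covers this instant; skip ahead
            continue
        best = min(covering, key=rule_rank)
        timeline.append((best.channel, now, best.out_point))
        now = best.out_point          # update current time to the segment's out-point
    return timeline
```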
The parts not involved in the present invention are the same as or can be implemented using the prior art.
The above-described embodiment is only one embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be easily made based on the application and principle of the present invention disclosed in the present application, and the present invention is not limited to the method described in the above-described embodiment of the present invention, so that the above-described embodiment is only preferred, and not restrictive.
Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.
The functionality of the present invention, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium, such that all or part of the steps of the method according to the embodiments of the present invention are executed by a computer device (which may be a personal computer, a server, or a network device) running the corresponding software. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.
Claims (10)
1. A method for intelligently generating interview-type comprehensive programs is characterized by comprising the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips.
2. The method of claim 1, wherein in step S2, setting the role played by each channel material comprises the following steps: dividing the channel materials into three categories according to scene size, namely close shot, medium shot and long shot; the close-shot picture is a close-up of a guest or the host; the medium-shot picture shows interactions between guests, between guests and the host, and between hosts; the long-shot picture is the whole stage.
3. The method for intelligently generating interview-like integrated art programs according to claim 1, wherein in step S3, the method comprises the following steps:
s31, establishing a face library containing the host and the guest of the field program;
S32, performing face recognition analysis on the video material of each channel, and extracting the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names in each frame;
S33, performing picture stability analysis on the video material of each channel, and marking blurred pictures caused by camera movement or focusing errors;
S34, using the data from step S31 and the face key point data of the same person along the continuous time dimension, performing mouth-shape analysis to judge whether the person is speaking within a set time.
4. The method of claim 3, wherein in step S31, if the program involves $M$ persons in total, single photos of the host and the guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as each person's representation, yielding an $M \times 512$ feature matrix $Q$ and an $M \times 1$ name matrix $C$; $i$ and $j$ are integers, and $Q_{i,j}$ and $C_{i,j}$ respectively denote the element in row $i$, column $j$ of the matrices $Q$ and $C$.
5. The method of claim 4, wherein in step S32, suppose there are $N$ channels of video material, each consisting of $T$ frames, with every frame aligned on the timeline; for the $t$-th frame image $V_n^t$ of the $n$-th material $V_n$, face recognition processing is performed to obtain the frame's processing result set $R_n^t = \{F_n^t, B_n^t, P_n^t, I_n^t\}$,
where $F_n^t$ denotes the face feature matrix extracted from the frame, $k$ is the number of detected faces, $F_{n,i}^t$ denotes the $i$-th face feature extracted from the frame, $B_n^t$ denotes all face boxes detected in the frame, $B_{n,i}^t$ denotes the $i$-th detected face box, $P_n^t$ denotes the key points of all faces detected in the frame, $P_{n,i}^t$ denotes the key points of the $i$-th detected face, $I_n^t$ denotes the names recognized for the faces detected in the frame, and $I_{n,i}^t$ denotes the name of the person corresponding to the $i$-th detected face, with $I_{n,i}^t = C_{j^*}$, $j^* = \arg\max_j \mathrm{sim}(F_{n,i}^t, Q_j)$,
i.e., the name with the highest similarity in the face library is taken as the name corresponding to the face, where $C_j$ denotes the $j$-th person's name, $\arg\max$ denotes taking the index corresponding to the maximum value, and $\mathrm{sim}$ denotes a similarity calculation function; the result of extracting video features from all materials is expressed as $R = \{R_n^t\}$.
6. The method of claim 5, wherein in step S33, for the $t$-th frame image $V_n^t$ of the $n$-th material $V_n$, with width $W$ and height $H$, a picture stability score $S_n^t$ is computed to characterize whether the frame picture is stable:
$G = |\mathrm{shift}(\mathrm{FFT}(\mathrm{gray}(V_n^t)))|$, $S_n^t = \#\{(x, y) : G_{x,y} > \theta\} / (W \cdot H)$,
where $\mathrm{gray}(\cdot)$ converts the frame image to a grayscale image, $\mathrm{FFT}$ denotes the Fourier transform, $\mathrm{shift}$ denotes moving the 0-frequency component to the center of the spectrum, $|\cdot|$ denotes taking the absolute value, $G$ is the result of transforming the grayscale image to the frequency domain and shifting the 0-frequency component to the center of the spectrum, $\theta$ is a threshold set in proportion to the maximum value in $G$, and $S_n^t$ is the proportion of pixels in $G$ greater than the threshold; if $S_n^t$ is greater than a set empirical value, the image $V_n^t$ is considered picture-stable.
7. The method of claim 6, wherein in step S34, for the $n$-th material $V_n$, a fixed time window of size $T_w$ is taken together with the face key point data of the same person within it, i.e.
$A_c = \frac{1}{T_w} \sum_{t} \mathrm{area}(\mathrm{mouth}(P_c^t))$,
where $A_c$ denotes the average mouth area of the person $c$ over the window, $P_c^t$ denotes person $c$'s face key points at time $t$, and $\mathrm{area}(\cdot)$ denotes the computed area of the mouth-shape key points; when $A_c$ is greater than a set empirical value, the person named $c$ is judged to be speaking during the time period and is marked as a speaker.
8. The method for intelligently generating interview-like integrated art programs according to claim 7, wherein in step S4, the method comprises the following steps:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the $n$-th material $V_n$ and its all-frame analysis results $S_n = \{S_n^1, \dots, S_n^T\}$, all results are traversed; when $S_n^t$ is greater than a set empirical value, $t$ is marked as the in-point of a new candidate segment; traversal of subsequent results continues, and when $S_n^t$ is less than or equal to the set empirical value, $t$ is marked as the out-point of that candidate segment; and so on, generating material $V_n$'s initial candidate segment list $L_n$ containing $K$ candidate segments;
S42, traversing the initial candidate segment list $L_n$ generated in S41, comparing the out-point $o_i$ of the current segment $e_i$ with the in-point $p_{i+1}$ of the next segment $e_{i+1}$; if the result of the comparison exceeds the set empirical value, segments $e_i$ and $e_{i+1}$ are merged into a segment whose in-point is $e_i$'s in-point and whose out-point is $e_{i+1}$'s out-point; and so on, generating the final candidate segment list $L_n'$.
9. The method for intelligently generating interview-like integrated art programs according to claim 8, wherein in step S5, the method comprises the following steps:
S51, setting priorities according to the scene type of the shot picture of each channel material;
S52, integrating the final candidate segment list $L_n'$ of each channel material from step S42 with the speaker marking results from step S34, the segments in each channel material's final candidate list are filled into the final-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is no more than 3;
the segment is a long shot.
10. The method of claim 9, wherein in step S51, priority is set as: short shot > medium shot > long shot.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110803384.0A CN113269854B (en) | 2021-07-16 | 2021-07-16 | Method for intelligently generating interview-type comprehensive programs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113269854A (en) | 2021-08-17
CN113269854B CN113269854B (en) | 2021-10-15 |
Family
ID=77236586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110803384.0A Active CN113269854B (en) | 2021-07-16 | 2021-07-16 | Method for intelligently generating interview-type comprehensive programs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113269854B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110004513A1 (en) * | 2003-02-05 | 2011-01-06 | Hoffberg Steven M | System and method |
WO2005091211A1 (en) * | 2004-03-16 | 2005-09-29 | 3Vr Security, Inc. | Interactive system for recognition analysis of multiple streams of video |
US20070265968A1 (en) * | 2006-05-15 | 2007-11-15 | The Directv Group, Inc. | Methods and apparatus to conditionally authorize content delivery at content servers in pay delivery systems |
CN104732991A (en) * | 2015-04-08 | 2015-06-24 | 成都索贝数码科技股份有限公司 | System and method for rapidly sorting, selecting and editing entertainment program massive materials |
US20170032559A1 (en) * | 2015-10-16 | 2017-02-02 | Mediatek Inc. | Simulated Transparent Device |
CN105307028A (en) * | 2015-10-26 | 2016-02-03 | 新奥特(北京)视频技术有限公司 | Video editing method and device specific to video materials of plurality of lenses |
CN106682617A (en) * | 2016-12-28 | 2017-05-17 | 电子科技大学 | Image definition judgment and feature extraction method based on frequency spectrum section information |
CN108875602A (en) * | 2018-05-31 | 2018-11-23 | 珠海亿智电子科技有限公司 | Monitor the face identification method based on deep learning under environment |
CN111191484A (en) * | 2018-11-14 | 2020-05-22 | 普天信息技术有限公司 | Method and device for recognizing human speaking in video image |
CN110691258A (en) * | 2019-10-30 | 2020-01-14 | 中央电视台 | Program material manufacturing method and device, computer storage medium and electronic equipment |
Non-Patent Citations (3)
Title |
---|
FÉLICIEN VALLET et al.: "ROBUST VISUAL FEATURES FOR THE MULTIMODAL IDENTIFICATION OF UNREGISTERED SPEAKERS IN TV TALK-SHOWS", 《2010 IEEE 17TH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
无 (Anon.): "索贝AI剪辑应用于总台综艺访谈类节目" [Sobey AI editing applied to CMG interview-type variety programs], 《现代电视技术》 *
王炳锡等 (WANG Bingxi et al.): "说话人辨认中有效参数的研究" [Study on effective parameters in speaker identification], 《应用声学》 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115174962A (en) * | 2022-07-22 | 2022-10-11 | 湖南芒果无际科技有限公司 | Rehearsal simulation method and device, computer equipment and computer readable storage medium |
CN115174962B (en) * | 2022-07-22 | 2024-05-24 | 湖南芒果融创科技有限公司 | Method, device, computer equipment and computer readable storage medium for previewing simulation |
Also Published As
Publication number | Publication date |
---|---|
CN113269854B (en) | 2021-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |