
CN113269854A - Method for intelligently generating interview-type comprehensive programs - Google Patents

Method for intelligently generating interview-type comprehensive programs

Info

Publication number
CN113269854A
CN113269854A (application CN202110803384.0A)
Authority
CN
China
Prior art keywords
face
frame
video
channel
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110803384.0A
Other languages
Chinese (zh)
Other versions
CN113269854B (en)
Inventor
袁琦
李杰
杨瀚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobei Video Cloud Computing Co ltd
Original Assignee
Chengdu Sobei Video Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobei Video Cloud Computing Co ltd filed Critical Chengdu Sobei Video Cloud Computing Co ltd
Priority to CN202110803384.0A priority Critical patent/CN113269854B/en
Publication of CN113269854A publication Critical patent/CN113269854A/en
Application granted granted Critical
Publication of CN113269854B publication Critical patent/CN113269854B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/60 - Editing figures and text; Combining figures or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for intelligently generating interview-type comprehensive programs, which comprises the following steps: S1, recording the program videos shot by a plurality of cameras at the program site through multichannel recording software; S2, setting the role played by each channel material according to the camera's shot picture in the program video; S3, extracting video features from each channel material; S4, generating a plurality of candidate video clips in each channel according to the extracted video features; S5, selecting candidate video clips according to predefined rules and synthesizing the program's initial cut, among other steps. The invention can quickly generate an initial cut, gives post-production editors a fast path to a finished program, and reduces manual workload.

Description

Method for intelligently generating interview-type comprehensive programs
Technical Field
The invention relates to the field of video program synthesis, and in particular to a method for intelligently generating interview-type comprehensive programs.
Background
An interview-type program is a television format with a relaxed, pleasant atmosphere, built mainly around conversation between a host and guests on a given theme. An interview-type comprehensive program is an interview program aimed chiefly at entertainment and relaxation, with additional variety elements and comic situation design added for dramatic effect, and it is favored by entertainers. Its guests are mainly celebrities and sports stars, so it tends to be very popular among young people. Although, unlike other variety programs, such shows are usually shot on a single set and stage, a large number of cameras must still be arranged on site. During shooting, the pictures captured from different angles and shot scales must be fully exploited to synthesize the initial program through a series of complicated operations such as real-time coordination between the on-site director and each camera crew and shot cutting, which typically requires the director to have rich command experience and strong on-site ability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for intelligently generating interview-type comprehensive programs, which can quickly generate an initial cut and give post-production editors a fast path to a finished program, thereby reducing manual workload.
The purpose of the invention is realized by the following scheme:
a method for intelligently generating interview-type integrated art programs comprises the following steps:
s1, recording program video materials shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips.
Further, in step S2, setting the role played by each channel material includes the following steps: the channel materials are divided into three categories by shot scale, namely close shot, medium shot and long shot; the close shot frames a close-up of a guest or the host; the medium shot frames interaction between guests, between guests and the host, or between hosts; the long shot frames the whole stage.
Further, in step S3, the following steps are included:
s31, establishing a face library containing the host and the guest of the field program;
s32, performing face recognition analysis on the video material of each channel, and extracting face frame coordinates, face 68 key point coordinates and corresponding names in each frame;
s33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error;
and S34, using the data from step S31 together with the same person's face key point data along the continuous time dimension, performing mouth-shape analysis to judge whether that person is speaking within the set time.
Further, in step S31, suppose the program involves M persons in total, where M is an integer. A single photo of each host and guest associated with the program is collected from the Internet, one photo per person, and a 512-dimensional face feature is extracted from each photo by a face recognition network to serve as that person's representation, giving an M×512 feature matrix F and an M×1 name matrix P, where F_{i,j} and P_{i,j} denote the element in row i and column j of the matrices F and P, respectively.
Further, in step S32, suppose there are N channels of video material, each containing T frames, with all frames aligned on the timeline. Face recognition is performed on the t-th frame image I_{k,t} of the k-th material V_k to obtain the frame's processing result set R_{k,t} = {A_t, B_t, L_t, G_t}, where A_t denotes the face feature matrix extracted from frame t, with n the number of detected faces, and a_{t,j} denotes the j-th face feature extracted from frame t; B_t denotes all face boxes detected in frame t, and b_{t,j} denotes the j-th face box detected in frame t; L_t denotes all face key points detected in frame t, and l_{t,j} denotes the j-th person's face key points detected in frame t; G_t denotes the names identified for the faces detected in frame t, and g_{t,j} denotes the name of the j-th person detected in frame t, with g_{t,j} = P_{argmax_i sim(a_{t,j}, F_i)}, i.e. the name with the highest similarity in the face library is taken as the name corresponding to the face, where P_i denotes the i-th name, argmax takes the index corresponding to the maximum value, and sim(·,·) is a similarity calculation function. The result of extracting video features from all materials is denoted R = {R_{k,t}}.
Further, in step S33, for the t-th frame image I_{k,t} of the k-th material V_k, with width w and height h, a picture stability score S_{k,t} is computed to characterize whether the frame's picture is stable:

g = gray(I_{k,t}),
D = |fftshift(FFT(g))|,
θ = a set fraction of max(D),
S_{k,t} = n_θ / (w·h),

where gray(·) converts the frame image to a grayscale image, FFT(·) denotes the Fourier transform, fftshift(·) shifts the zero-frequency component to the center of the spectrum, |·| denotes the absolute value, D is the absolute value of the grayscale image transformed to the frequency domain with the zero-frequency component shifted to the center, θ is a threshold set in proportion to the maximum value of D, and n_θ is the number of pixels of D greater than the threshold. When S_{k,t} is greater than a set empirical value, the frame image I_{k,t} has a stable picture.
Further, in step S34, for the k-th material V_k, take a fixed time window of size W (i.e. a fixed duration W). From the face key point data of the same person q over that window, {l_{q,1}, ..., l_{q,W}}, compute the mouth area at each moment, s_{q,j} = area(l_{q,j}), and from these the variance of the person's mouth area over the window,

Var_q = (1/W) · Σ_{j=1..W} (s_{q,j} − s̄_q)²,

where s̄_q denotes the mean of the person's mouth area over the window, l_{q,j} denotes person q's face key points at moment j, and area(·) computes the area they enclose. When Var_q is greater than a set empirical value, the person named q is speaking during the window W and is marked as a speaker.
Further, in step S4, the following steps are included:
S41, generating initial candidate video clips for each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33; for the k-th material V_k, traverse the per-frame analysis results {S_{k,1}, ..., S_{k,T}}: when S_{k,t} is greater than a set empirical value, mark t as the in point of the current candidate segment and continue traversing the subsequent results; when S_{k,t} is less than or equal to the set empirical value, mark t as the out point of the current candidate segment. Proceeding in this way yields, for material V_k, an initial candidate segment list C_k containing c_k candidate segments.
S42, traversing the initial candidate segment list C_k generated in S41, comparing the out point out(c_i) of the current segment c_i with the in point in(c_{i+1}) of the next segment c_{i+1}; if the difference between them is greater than a set empirical value, segments c_i and c_{i+1} are merged into a new segment whose in point is the in point of c_i and whose out point is the out point of c_{i+1}. Proceeding in this way yields the final candidate segment list C'_k.
Further, in step S5, the following steps are included:
S51, setting priorities by shot scale according to the shooting picture category of each channel material;
S52, combining the final candidate segment lists C'_1, ..., C'_N of the N channel materials obtained in step S42 with the speaker marking results of step S34, filling the segments from each channel material's final candidate list into the timeline of the initial cut according to the following rules (higher priority first) to obtain the final composite video:
the segment is a close shot, a speaker is present, and the speaker is a guest;
the segment is a close shot, a speaker is present, and the speaker is the host;
the segment is a medium shot, speakers are present, and the number of speakers is not more than 3;
the segment is a long shot.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted: at the current time, the most suitable candidate segment is selected from the final candidate segment lists according to the above rules, the segment is filled into the corresponding position of the initial-cut timeline, the current time is updated to the time corresponding to that segment's out point, and this is repeated until the entire initial-cut timeline is filled.
The beneficial effects of the invention include:
(1) By observing a director's on-site commands and shot-cutting logic when shooting interview-type comprehensive programs, the method provides a way to generate an initial program cut using video face recognition, speaker recognition and picture stability analysis: it extracts the most suitable shot segments from the pictures shot at different angles and automatically generates the initial cut of the interview-type comprehensive program, reducing the workload of the director and of post-production program editors.
(2) The invention provides a simple and efficient method for automatically synthesizing the initial cut of an interview-type comprehensive video program with only a small amount of presetting. Specifically, the pictures shot by the different on-site cameras are assigned roles by shot scale, the host and guests are labelled through face recognition, speakers are marked through mouth-shape analysis, invalid shots are filtered out by computing picture stability scores to produce a candidate video clip list, and finally all candidate video clips are combined according to the rules to generate the initial program cut. The method thus quickly generates an initial cut, gives post-production editors a fast path to a finished program, and reduces manual workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of steps in an embodiment of a method of the present invention;
FIG. 2 is a flow chart of extracting video features from a channel material in an embodiment of the method of the present invention.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
As shown in fig. 1 and 2, a method for intelligently generating interview-type comprehensive programs includes the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
For example, in this step, the program videos shot by 6 cameras at the scene of the program "when going on the spring and evening" are recorded, denoted V_1 to V_6. Other programs may also be recorded, and the number of cameras may be 8, 10, 12, and so on, which is not repeated here.
S2, setting the role played by each channel material according to the camera shooting picture in the program video;
The role played by each channel material is set according to the picture shot by each camera. Specifically, among V_1 to V_6, some fixed cameras shoot close shots, some fixed cameras shoot medium shots, another fixed camera shoots a long shot, and the rocker-arm camera also shoots a long shot.
In step S2, setting the role played by each channel material includes the following steps: the channel materials are divided into three categories by shot scale, namely close shot, medium shot and long shot; the close shot frames a close-up of a guest or the host; the medium shot frames interaction between guests, between guests and the host, or between hosts; the long shot frames the whole stage.
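The channel-to-role assignment of step S2 can be captured as a small configuration table. A minimal sketch follows; the exact split of V_1–V_6 into close, medium and long shots is an assumed example, since the embodiment only states that the six channels cover all three shot scales:

```python
# Hypothetical channel-role configuration for step S2.
# The mapping of V1..V6 to shot scales is an assumed example, not the patent's exact split.
CHANNEL_ROLES = {
    "V1": "close",   # fixed camera, close-up of a guest or the host
    "V2": "close",
    "V3": "medium",  # fixed camera, interaction between guests and/or host
    "V4": "medium",
    "V5": "long",    # fixed camera, whole stage
    "V6": "long",    # rocker-arm camera, whole stage
}

# Priority used later in step S51: close > medium > long (lower value = higher priority).
ROLE_PRIORITY = {"close": 0, "medium": 1, "long": 2}
```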
S3, extracting video features from each channel material; step S3 includes the following steps:
s31, establishing a face library containing the host and the guest of the field program;
In step S31, suppose the program involves M persons in total, where M is an integer. A single photo of each host and guest associated with the program is collected from the Internet, one photo per person, and a 512-dimensional face feature is extracted from each photo by a face recognition network to serve as that person's representation, giving an M×512 feature matrix F and an M×1 name matrix P, where F_{i,j} and P_{i,j} denote the element in row i and column j of the matrices F and P, respectively.
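A compact sketch of the face-library construction of step S31 is given below. The embedding function is a stand-in for any face recognition network that returns a 512-dimensional feature; the helper names and photo paths are assumptions and not part of the patent:

```python
import numpy as np

def extract_face_embedding(image_path: str) -> np.ndarray:
    """Placeholder for a face recognition network that returns a 512-d embedding.
    In practice this would run face detection, alignment and an embedding model."""
    raise NotImplementedError

def build_face_library(photos: dict[str, str]) -> tuple[np.ndarray, list[str]]:
    """photos maps person name -> path of a single photo collected from the Internet.
    Returns an (M, 512) feature matrix F and a length-M name list P."""
    names = list(photos)
    features = np.stack([extract_face_embedding(photos[name]) for name in names])
    # L2-normalise so a dot product later acts as cosine similarity.
    features /= np.linalg.norm(features, axis=1, keepdims=True)
    return features, names
```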
S32, face recognition analysis is performed on the video material of each channel, and the face box coordinates, the 68 face key point coordinates and the corresponding name are extracted for each frame. In step S32, suppose there are N channels of video material (here N = 6, though it may be another number), each containing T frames, with all frames aligned on the timeline. Face recognition is performed on the t-th frame image I_{k,t} of the k-th material V_k to obtain the frame's processing result set R_{k,t} = {A_t, B_t, L_t, G_t}, where A_t denotes the face feature matrix extracted from frame t, with n the number of detected faces, and a_{t,j} denotes the j-th face feature extracted from frame t; B_t denotes all face boxes detected in frame t, and b_{t,j} denotes the j-th face box detected in frame t; L_t denotes all face key points detected in frame t, and l_{t,j} denotes the j-th person's face key points detected in frame t; G_t denotes the names identified for the faces detected in frame t, and g_{t,j} denotes the name of the j-th person detected in frame t, with g_{t,j} = P_{argmax_i sim(a_{t,j}, F_i)}, i.e. the name with the highest similarity in the face library is taken as the name corresponding to the face, where P_i denotes the i-th name, argmax takes the index corresponding to the maximum value, and sim(·,·) is a similarity calculation function. The result of extracting video features from all materials is denoted R = {R_{k,t}}.
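A sketch of the per-frame name assignment of step S32: each detected face feature is matched against the library and labelled with the most similar entry. The detector interface is hypothetical, and cosine similarity is assumed as the similarity function sim(·,·):

```python
import numpy as np

def assign_names(frame_features: np.ndarray, library_features: np.ndarray,
                 library_names: list[str]) -> list[str]:
    """frame_features: (n, 512) features of faces detected in one frame.
    library_features: (M, 512) L2-normalised library matrix F.
    Returns, for each detected face, the library name with the highest similarity."""
    if frame_features.size == 0:
        return []
    feats = frame_features / np.linalg.norm(frame_features, axis=1, keepdims=True)
    sims = feats @ library_features.T          # (n, M) cosine similarities
    best = sims.argmax(axis=1)                 # argmax over library entries
    return [library_names[i] for i in best]
```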
S33, picture stability analysis is performed on the video material of each channel, and blurred pictures caused by camera movement or focusing errors are marked. In step S33, for the t-th frame image I_{k,t} of the k-th material V_k, with width w and height h, a picture stability score S_{k,t} is computed to characterize whether the frame's picture is stable:

g = gray(I_{k,t}),
D = |fftshift(FFT(g))|,
θ = a set fraction of max(D),
S_{k,t} = n_θ / (w·h),

where gray(·) converts the frame image to a grayscale image, FFT(·) denotes the Fourier transform, fftshift(·) shifts the zero-frequency component to the center of the spectrum, |·| denotes the absolute value, D is the absolute value of the grayscale image transformed to the frequency domain with the zero-frequency component shifted to the center, θ is a threshold set in proportion to the maximum value of D, and n_θ is the number of pixels of D greater than the threshold. When S_{k,t} is greater than a preset value, the frame image I_{k,t} has a stable picture; in this embodiment a fixed preset value is chosen, and when the score exceeds it the image I_{k,t} is considered stable.
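A sketch of the picture-stability score of step S33, following the frequency-domain reading above. The threshold fraction and the normalisation by w·h are assumptions chosen so that the score is comparable to the small preset values (e.g. 0.002) used later; the original preset value is given only as a formula image:

```python
import numpy as np

def picture_stability_score(frame: np.ndarray, threshold_fraction: float = 1e-3) -> float:
    """Score one frame as the share of frequency-domain magnitudes above a threshold.
    A blurred frame (camera moving or out of focus) has little high-frequency
    energy, so its score falls below the empirical preset value."""
    # Luminance-weighted grayscale conversion (frame assumed H x W x 3, RGB order).
    gray = frame.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    spectrum = np.fft.fftshift(np.fft.fft2(gray))     # zero-frequency moved to the centre
    magnitude = np.abs(spectrum)
    threshold = magnitude.max() * threshold_fraction  # assumed fraction of the maximum
    h, w = gray.shape
    return float((magnitude > threshold).sum()) / (w * h)
```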
S34, using the data from step S31 together with the same person's face key point data along the continuous time dimension, mouth-shape analysis is performed to judge whether that person is speaking within the set time. In step S34, for the k-th material V_k, take a fixed duration W. From the face key point data of the same person q over that window, {l_{q,1}, ..., l_{q,W}}, compute the mouth area at each moment, s_{q,j} = area(l_{q,j}), and from these the variance of the person's mouth area over the period,

Var_q = (1/W) · Σ_{j=1..W} (s_{q,j} − s̄_q)²,

where s̄_q denotes the mean of the person's mouth area over the period, l_{q,j} denotes person q's face key points at moment j, and area(·) computes the area they enclose. When Var_q is greater than a preset value (which may be, for example, 500), the person named q is speaking during the period W and is marked as a speaker. In this embodiment W may be, for example, 250 time units, chosen according to the actual situation.
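A sketch of the mouth-shape analysis of step S34. It assumes the common 68-point face landmark layout in which indices 48–67 outline the mouth; the window length and the variance threshold follow the example values of this embodiment but are otherwise assumptions:

```python
import numpy as np

MOUTH_IDX = slice(48, 68)  # assumed 68-point convention: points 48-67 outline the mouth

def polygon_area(points: np.ndarray) -> float:
    """Shoelace formula for the area enclosed by an ordered set of 2-D points."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def is_speaking(keypoints_over_window: list[np.ndarray], variance_threshold: float = 500.0) -> bool:
    """keypoints_over_window: one (68, 2) array per moment for the same person over a
    fixed window (e.g. 250 time units). The person is marked as a speaker when the
    variance of the mouth area over the window exceeds the threshold."""
    areas = np.array([polygon_area(kp[MOUTH_IDX]) for kp in keypoints_over_window])
    return areas.var() > variance_threshold
```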
S4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, initial candidate video clips are generated for each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33; for the k-th material V_k, traverse the per-frame analysis results {S_{k,1}, ..., S_{k,T}}: when S_{k,t} is greater than a preset value (the preset value may be 0.002 here, depending on the program), mark t as the in point of the current candidate segment and continue traversing the subsequent results; when S_{k,t} is less than or equal to the preset value (again 0.002 here, depending on the program), mark t as the out point of the current candidate segment. Proceeding in this way yields, for material V_k, an initial candidate segment list C_k containing c_k candidate segments.

S42, traverse the initial candidate segment list C_k generated in S41, comparing the out point out(c_i) of the current segment c_i with the in point in(c_{i+1}) of the next segment c_{i+1}; if the difference between them is greater than a preset value (here, 50 frames), segments c_i and c_{i+1} are merged into a new segment whose in point is the in point of c_i and whose out point is the out point of c_{i+1}. Proceeding in this way yields the final candidate segment list C'_k.
And S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips. In step S5, the method includes the steps of:
S51, priorities are set by shot scale according to the shooting picture category of each channel material; specifically, for the 6 channel materials V_1 to V_6, the close-shot channels have the highest priority, the medium-shot channels the second priority, and the long-shot channels the lowest priority;
S52, combining the final candidate segment lists C'_1, ..., C'_N of the N channel materials obtained in step S42 with the speaker marking results of step S34, the segments in each channel material's final candidate list are filled into the timeline of the initial cut according to the following rules to obtain the final composite video:
the segment is a close shot, a speaker is present, and the speaker is a guest;
the segment is a close shot, a speaker is present, and the speaker is the host;
the segment is a medium shot, speakers are present, and the number of speakers is not more than 3;
the segment is a long shot.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted: at the current time, the most suitable candidate segment is selected from the final candidate segment lists according to the above rules, the segment is filled into the corresponding position of the initial-cut timeline, the current time is updated to the time corresponding to that segment's out point, and this is repeated until the entire initial-cut timeline is filled.
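A sketch of the timeline gap-filling of step S52 under the rule order of S51. The candidate representation (channel role, speaker names, guest set) and the helper names are assumptions; the selection order mirrors the four rules listed above:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    channel: str
    role: str                 # "close", "medium" or "long"
    in_point: int
    out_point: int
    speakers: list[str]       # names marked as speakers inside this segment
    guests: set[str]          # names known to be guests (from the face library)

def rule_priority(c: Candidate) -> int:
    """Lower value = preferred, following the rules of step S52."""
    if c.role == "close" and c.speakers and any(s in c.guests for s in c.speakers):
        return 0              # close shot, a guest is speaking
    if c.role == "close" and c.speakers:
        return 1              # close shot, the host is speaking
    if c.role == "medium" and 0 < len(c.speakers) <= 3:
        return 2              # medium shot with at most 3 speakers
    if c.role == "long":
        return 3              # long shot
    return 4                  # anything else is used only as a last resort

def fill_timeline(candidates: list[Candidate], total_frames: int) -> list[Candidate]:
    """Greedy gap filling: at the current time pick the best-ranked candidate that
    covers it, append it to the initial cut, and jump to its out point."""
    timeline, now = [], 0
    while now < total_frames:
        covering = [c for c in candidates if c.in_point <= now < c.out_point]
        if not covering:
            now += 1          # no candidate covers this instant; move on
            continue
        best = min(covering, key=rule_priority)
        timeline.append(best)
        now = best.out_point
    return timeline
```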
The parts not involved in the present invention are the same as or can be implemented using the prior art.
The above-described embodiment is only one embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be easily made based on the application and principle of the present invention disclosed in the present application, and the present invention is not limited to the method described in the above-described embodiment of the present invention, so that the above-described embodiment is only preferred, and not restrictive.
Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.
The functionality of the present invention, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium, with all or part of the steps of the method according to the embodiments of the present invention executed by a computer device (which may be a personal computer, a server, or a network device) running the corresponding software. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.

Claims (10)

1. A method for intelligently generating interview-type comprehensive programs is characterized by comprising the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips.
2. The method of claim 1, wherein in step S2, setting the role played by each channel material comprises the following steps: dividing the channel materials into three categories by shot scale, namely close shot, medium shot and long shot; the close shot frames a close-up of a guest or the host; the medium shot frames interaction between guests, between guests and the host, or between hosts; the long shot frames the whole stage.
3. The method for intelligently generating interview-type comprehensive programs according to claim 1, wherein step S3 comprises the following steps:
s31, establishing a face library containing the host and the guest of the field program;
s32, performing face recognition analysis on the video material of each channel, and extracting face frame coordinates, face 68 key point coordinates and corresponding names in each frame;
s33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error;
and S34, using the data from step S31 together with the same person's face key point data along the continuous time dimension, performing mouth-shape analysis to judge whether that person is speaking within the set time.
4. The method of claim 3, wherein in step S31, supposing the program involves M persons in total, where M is an integer, a single photo of each host and guest associated with the program is collected from the Internet, one photo per person, and a 512-dimensional face feature is extracted from each photo by a face recognition network to serve as that person's representation, giving an M×512 feature matrix F and an M×1 name matrix P, where F_{i,j} and P_{i,j} denote the element in row i and column j of the matrices F and P, respectively.
5. The method of claim 4, wherein in step S32, supposing there are N channels of video material, each containing T frames, with all frames aligned on the timeline, face recognition is performed on the t-th frame image I_{k,t} of the k-th material V_k to obtain the frame's processing result set R_{k,t} = {A_t, B_t, L_t, G_t}, where A_t denotes the face feature matrix extracted from frame t, with n the number of detected faces, and a_{t,j} denotes the j-th face feature extracted from frame t; B_t denotes all face boxes detected in frame t, and b_{t,j} denotes the j-th face box detected in frame t; L_t denotes all face key points detected in frame t, and l_{t,j} denotes the j-th person's face key points detected in frame t; G_t denotes the names identified for the faces detected in frame t, and g_{t,j} denotes the name of the j-th person detected in frame t, with g_{t,j} = P_{argmax_i sim(a_{t,j}, F_i)}, i.e. the name with the highest similarity in the face library is taken as the name corresponding to the face, where P_i denotes the i-th name, argmax takes the index corresponding to the maximum value, and sim(·,·) is a similarity calculation function; the result of extracting video features from all materials is denoted R = {R_{k,t}}.
6. The method of claim 5, wherein in step S33, for the t-th frame image I_{k,t} of the k-th material V_k, with width w and height h, a picture stability score S_{k,t} is computed to characterize whether the frame's picture is stable:

g = gray(I_{k,t}),
D = |fftshift(FFT(g))|,
θ = a set fraction of max(D),
S_{k,t} = n_θ / (w·h),

where gray(·) converts the frame image to a grayscale image, FFT(·) denotes the Fourier transform, fftshift(·) shifts the zero-frequency component to the center of the spectrum, |·| denotes the absolute value, D is the absolute value of the grayscale image transformed to the frequency domain with the zero-frequency component shifted to the center, θ is a threshold set in proportion to the maximum value of D, and n_θ is the number of pixels of D greater than the threshold; when S_{k,t} is greater than a set empirical value, the frame image I_{k,t} has a stable picture.
7. The method of claim 6, wherein in step S34, for the k-th material V_k, a fixed time window of size W is taken; from the face key point data of the same person q over that window, {l_{q,1}, ..., l_{q,W}}, the mouth area at each moment is computed, s_{q,j} = area(l_{q,j}), and from these the variance of the person's mouth area over the window,

Var_q = (1/W) · Σ_{j=1..W} (s_{q,j} − s̄_q)²,

where s̄_q denotes the mean of the person's mouth area over the window, l_{q,j} denotes person q's face key points at moment j, and area(·) computes the area they enclose; when Var_q is greater than a set empirical value, the person named q is speaking during the window W and is marked as a speaker.
8. The method for intelligently generating interview-type comprehensive programs according to claim 7, wherein step S4 comprises the following steps:

S41, generating initial candidate video clips for each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33; for the k-th material V_k, traversing the per-frame analysis results {S_{k,1}, ..., S_{k,T}}: when S_{k,t} is greater than a set empirical value, marking t as the in point of the current candidate segment and continuing to traverse the subsequent results; when S_{k,t} is less than or equal to the set empirical value, marking t as the out point of the current candidate segment; proceeding in this way yields, for material V_k, an initial candidate segment list C_k containing c_k candidate segments;

S42, traversing the initial candidate segment list C_k generated in S41, comparing the out point out(c_i) of the current segment c_i with the in point in(c_{i+1}) of the next segment c_{i+1}; if the difference between them is greater than a set empirical value, merging segments c_i and c_{i+1} into a new segment whose in point is the in point of c_i and whose out point is the out point of c_{i+1}; proceeding in this way yields the final candidate segment list C'_k.
9. The method for intelligently generating interview-type comprehensive programs according to claim 8, wherein step S5 comprises the following steps:

S51, setting priorities by shot scale according to the shooting picture category of each channel material;

S52, combining the final candidate segment lists C'_1, ..., C'_N of the N channel materials obtained in step S42 with the speaker marking results of step S34, filling the segments in each channel material's final candidate list into the timeline of the initial cut according to the following rules to obtain the final composite video:

the segment is a close shot, a speaker is present, and the speaker is a guest;
the segment is a close shot, a speaker is present, and the speaker is the host;
the segment is a medium shot, speakers are present, and the number of speakers is not more than 3;
the segment is a long shot.
10. The method of claim 9, wherein in step S51, the priority is set as: close shot > medium shot > long shot.
CN202110803384.0A 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs Active CN113269854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110803384.0A CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110803384.0A CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Publications (2)

Publication Number Publication Date
CN113269854A true CN113269854A (en) 2021-08-17
CN113269854B CN113269854B (en) 2021-10-15

Family

ID=77236586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110803384.0A Active CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Country Status (1)

Country Link
CN (1) CN113269854B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174962A (en) * 2022-07-22 2022-10-11 湖南芒果无际科技有限公司 Rehearsal simulation method and device, computer equipment and computer readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005091211A1 (en) * 2004-03-16 2005-09-29 3Vr Security, Inc. Interactive system for recognition analysis of multiple streams of video
US20070265968A1 (en) * 2006-05-15 2007-11-15 The Directv Group, Inc. Methods and apparatus to conditionally authorize content delivery at content servers in pay delivery systems
US20110004513A1 (en) * 2003-02-05 2011-01-06 Hoffberg Steven M System and method
CN104732991A (en) * 2015-04-08 2015-06-24 成都索贝数码科技股份有限公司 System and method for rapidly sorting, selecting and editing entertainment program massive materials
CN105307028A (en) * 2015-10-26 2016-02-03 新奥特(北京)视频技术有限公司 Video editing method and device specific to video materials of plurality of lenses
US20170032559A1 (en) * 2015-10-16 2017-02-02 Mediatek Inc. Simulated Transparent Device
CN106682617A (en) * 2016-12-28 2017-05-17 电子科技大学 Image definition judgment and feature extraction method based on frequency spectrum section information
CN108875602A (en) * 2018-05-31 2018-11-23 珠海亿智电子科技有限公司 Monitor the face identification method based on deep learning under environment
CN110691258A (en) * 2019-10-30 2020-01-14 中央电视台 Program material manufacturing method and device, computer storage medium and electronic equipment
CN111191484A (en) * 2018-11-14 2020-05-22 普天信息技术有限公司 Method and device for recognizing human speaking in video image

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110004513A1 (en) * 2003-02-05 2011-01-06 Hoffberg Steven M System and method
WO2005091211A1 (en) * 2004-03-16 2005-09-29 3Vr Security, Inc. Interactive system for recognition analysis of multiple streams of video
US20070265968A1 (en) * 2006-05-15 2007-11-15 The Directv Group, Inc. Methods and apparatus to conditionally authorize content delivery at content servers in pay delivery systems
CN104732991A (en) * 2015-04-08 2015-06-24 成都索贝数码科技股份有限公司 System and method for rapidly sorting, selecting and editing entertainment program massive materials
US20170032559A1 (en) * 2015-10-16 2017-02-02 Mediatek Inc. Simulated Transparent Device
CN105307028A (en) * 2015-10-26 2016-02-03 新奥特(北京)视频技术有限公司 Video editing method and device specific to video materials of plurality of lenses
CN106682617A (en) * 2016-12-28 2017-05-17 电子科技大学 Image definition judgment and feature extraction method based on frequency spectrum section information
CN108875602A (en) * 2018-05-31 2018-11-23 珠海亿智电子科技有限公司 Monitor the face identification method based on deep learning under environment
CN111191484A (en) * 2018-11-14 2020-05-22 普天信息技术有限公司 Method and device for recognizing human speaking in video image
CN110691258A (en) * 2019-10-30 2020-01-14 中央电视台 Program material manufacturing method and device, computer storage medium and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FÉLICIEN VALLET et al.: "Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows", 2010 IEEE 17th International Conference on Image Processing *
Anonymous: "Sobey AI editing applied to China Media Group variety interview programs" ("索贝AI剪辑应用于总台综艺访谈类节目"), Modern Television Technology (《现代电视技术》) *
Wang Bingxi et al.: "Research on effective parameters in speaker identification" ("说话人辨认中有效参数的研究"), Applied Acoustics (《应用声学》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174962A (en) * 2022-07-22 2022-10-11 湖南芒果无际科技有限公司 Rehearsal simulation method and device, computer equipment and computer readable storage medium
CN115174962B (en) * 2022-07-22 2024-05-24 湖南芒果融创科技有限公司 Method, device, computer equipment and computer readable storage medium for previewing simulation

Also Published As

Publication number Publication date
CN113269854B (en) 2021-10-15

Similar Documents

Publication Publication Date Title
JP7252362B2 (en) Method for automatically editing video and portable terminal
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
Chen et al. What comprises a good talking-head video generation?: A survey and benchmark
US7949188B2 (en) Image processing apparatus, image processing method, and program
CN106686452B (en) Method and device for generating dynamic picture
JP5510167B2 (en) Video search system and computer program therefor
Sah et al. Semantic text summarization of long videos
US8879788B2 (en) Video processing apparatus, method and system
CN107430780B (en) Method for output creation based on video content characteristics
US6492990B1 (en) Method for the automatic computerized audio visual dubbing of movies
CN111683209A (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
US20070165022A1 (en) Method and system for the automatic computerized audio visual dubbing of movies
US20030085901A1 (en) Method and system for the automatic computerized audio visual dubbing of movies
CN110505498A (en) Processing, playback method, device and the computer-readable medium of video
WO2020029883A1 (en) Method and device for generating video fingerprint
CN113255628B (en) Scene identification recognition method for news scene
JP6389296B1 (en) VIDEO DATA PROCESSING DEVICE, VIDEO DATA PROCESSING METHOD, AND COMPUTER PROGRAM
CN113269854B (en) Method for intelligently generating interview-type comprehensive programs
US9542976B2 (en) Synchronizing videos with frame-based metadata using video content
JP2010039877A (en) Apparatus and program for generating digest content
CN116708055B (en) Intelligent multimedia audiovisual image processing method, system and storage medium
CN113992973A (en) Video abstract generation method and device, electronic equipment and storage medium
Choi et al. Automated video editing for aesthetic quality improvement
CN116916089B (en) Intelligent video editing method integrating voice features and face features
CN116844562A (en) Short video background music editing method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant