CN113490058A - Intelligent subtitle matching system applied to later stage of movie and television - Google Patents
- Publication number: CN113490058A
- Application number: CN202110960220.9A
- Authority: CN (China)
- Prior art keywords: subsystem, matching, movie, subtitle, unit
- Prior art date: 2021-08-20
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Studio Circuits (AREA)
Abstract
The invention discloses an intelligent subtitle matching system applied to the later stage of movie and television, comprising an input subsystem, a recognition subsystem, a subtitle matching subsystem and an output subsystem. The input subsystem selects and inputs the film and television video to be processed; the recognition subsystem performs speech recognition and lip-reading recognition on the video; the subtitle matching subsystem automatically generates and matches subtitles for the video according to the data recognized by the recognition subsystem; and the output subsystem finally outputs the subtitled video. The design is reasonable: by pairing speech recognition with lip-reading recognition of the video, scene subtitles are matched automatically and accurately between the text generated by speech recognition and the specific scenes of the video, which greatly reduces the workload of editors and improves their working efficiency.
Description
Technical Field
The invention relates to the field of film and television editing, and in particular to an intelligent subtitle matching system applied to the later stage of movie and television.
Background
Subtitles are non-video content, such as the dialogue of television, film and stage works, displayed as text; the term also covers the text added to film and television works in post-production. Commentary and other text appearing below the movie or television screen, such as the film title, credits, lyrics, dialogue and captions, as well as explanatory text such as character introductions, place names and dates, are all referred to as subtitles. Subtitles for film and television works generally appear at the bottom of the screen, whereas subtitles for stage works may appear at either side of, or above, the stage. Good subtitles have five characteristics: accuracy, consistency, clarity, readability and equivalence. Accuracy means the finished product is free of elementary mistakes such as typographical errors. Consistency means the subtitles are uniform in form and presentation, which is critical to the viewer's understanding. Clarity means the audio is rendered completely, including speaker identification and non-dialogue content, in a clearly legible form. Readability means the subtitles are displayed long enough for the audience to read, are synchronized with the audio, and do not cover the significant content of the picture. Equivalence means the subtitles convey the full content and intent of the video material.
At present, subtitles require a large amount of manual input when a video is shot, and later-stage speech recognition produces only a matching sequence for the speech, making it difficult to match specific frames. This greatly increases the workload of editors and hurts their editing efficiency, so certain shortcomings exist.
Disclosure of Invention
The invention aims to provide an intelligent subtitle matching system applied to the later stage of movie and television that accurately matches subtitles to the video and thereby improves the working efficiency of editors.
The invention is realized as follows:
the intelligent subtitle matching system applied to the later stage of movies and televisions comprises an input subsystem, an identification subsystem, a subtitle matching subsystem and an output subsystem, wherein the input subsystem is used for selecting and inputting movie videos to be processed, the identification subsystem is used for carrying out voice identification and lip language identification on the movie videos, the subtitle matching subsystem is used for carrying out automatic matching generation on the subtitles of the movie videos according to data identified by the identification subsystem, and the output subsystem is used for carrying out final output on the movie videos with the subtitles.
The recognition subsystem comprises a speech recognition unit and a lip-reading unit. The speech recognition unit recognizes the speech in the video and converts it into text in real time; the lip-reading unit recognizes lip movement in the video frame by frame.
The recognition subsystem further comprises a text conversion unit, which converts or matches the text generated by speech recognition into different languages.
The subtitle matching subsystem comprises a calibration matching unit and a subtitle insertion unit. The calibration matching unit calibrates and matches the subtitle timing of each scene according to the times at which the lip-reading unit detects lip movement; the subtitle insertion unit inserts into each scene the segment of the speech-recognized text that corresponds to that time.
The subtitle matching subsystem further comprises a subtitle editing unit, which edits and modifies the position, size and other characteristics of the subtitles generated in the video.
By pairing speech recognition with lip-reading recognition of the video, the invention matches the text generated by speech recognition automatically and accurately to the specific scenes of the video, greatly reducing the workload of editors and improving their working efficiency.
Drawings
Fig. 1 is a structural block diagram of the intelligent subtitle matching system applied to the later stage of movie and television.
In the figure: 1. input subsystem; 2. recognition subsystem; 3. subtitle matching subsystem; 4. output subsystem; 5. speech recognition unit; 6. lip-reading unit; 7. calibration matching unit; 8. subtitle insertion unit; 9. subtitle editing unit; 10. text conversion unit.
Detailed Description
The invention is further described below with reference to the figure and a specific embodiment.
Referring to Fig. 1, the intelligent subtitle matching system applied to the later stage of movie and television comprises an input subsystem 1, a recognition subsystem 2, a subtitle matching subsystem 3 and an output subsystem 4. The input subsystem 1 selects and inputs the film and television video to be processed; the recognition subsystem 2 performs speech recognition and lip-reading recognition on the video; the subtitle matching subsystem 3 automatically generates and matches subtitles for the video according to the data recognized by the recognition subsystem 2; and the output subsystem 4 finally outputs the subtitled video.
The recognition subsystem 2 comprises a speech recognition unit 5 and a lip-reading unit 6. The speech recognition unit 5 recognizes the speech in the video and converts it into text in real time; the lip-reading unit 6 recognizes lip movement in the video frame by frame. In this embodiment, speech recognition mainly uses a pattern-matching method: in the training stage, the user speaks each word in the vocabulary in turn, and the feature vector of each word is stored as a template in a template library; in the recognition stage, the feature vector of the input speech is compared with each template in the library in turn, and the word with the highest similarity is output as the recognition result. Lip-reading uses machine vision to continuously detect faces in the images, determine which person in the picture is speaking, and extract that person's continuous mouth-shape features.
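The template-matching stage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the embodiment's code: it assumes each vocabulary word has already been reduced to a fixed-length feature vector (real systems compare variable-length feature sequences, for example with dynamic time warping or hidden Markov models), and all names are ours.

```python
import numpy as np

class TemplateMatcher:
    """Toy pattern-matching recognizer: one stored feature vector per word."""

    def __init__(self):
        self.templates: dict[str, np.ndarray] = {}

    def train(self, word: str, features: np.ndarray) -> None:
        # Training stage: store each vocabulary word's normalized feature
        # vector as a template in the template library.
        self.templates[word] = features / np.linalg.norm(features)

    def recognize(self, features: np.ndarray) -> str:
        # Recognition stage: compare the input feature vector with every
        # template in turn (cosine similarity) and output the best match.
        query = features / np.linalg.norm(features)
        return max(self.templates, key=lambda w: float(self.templates[w] @ query))
```

Cosine similarity is used here purely to keep the sketch short; the patent only specifies "similarity comparison" without naming a measure.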
The recognition subsystem 2 further comprises a text conversion unit 10, which converts or matches the text generated by speech recognition into different languages. In this embodiment, converting the text between languages lets the editor choose the language of the inserted subtitles as required; a sketch of such a unit is given below.
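The patent does not specify a translation backend, so this sketch injects one as a callable; `Translator`, `convert_subtitle_text` and the segment fields are hypothetical names of ours.

```python
from typing import Callable

# (text, target_language) -> translated text. The backend is an assumption:
# it could wrap any machine-translation service or an offline model.
Translator = Callable[[str, str], str]

def convert_subtitle_text(segments: list[dict], target_lang: str,
                          translate: Translator) -> list[dict]:
    # Translate only the text of each recognized segment and keep its timing,
    # so the editor can freely choose the language of the inserted subtitles.
    return [{**seg, "text": translate(seg["text"], target_lang)}
            for seg in segments]
```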
The subtitle matching subsystem 3 comprises a calibration matching unit 7 and a subtitle insertion unit 8. The calibration matching unit 7 calibrates and matches the subtitle timing of each scene according to the times at which the lip-reading unit 6 detects lip movement; the subtitle insertion unit 8 inserts into each scene the segment of the speech-recognized text that corresponds to that time. In this embodiment, the calibration matching unit 7 allocates the subtitle timing according to the times at which lip movement is recognized in the scene and, together with the correspondingly timed speech-recognized text extracted by the subtitle insertion unit 8, ensures that the subtitles match the scene, as sketched below.
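The alignment itself reduces to an interval-overlap test. The sketch below is our reading of the calibration step, with hypothetical names: each continuous lip-movement interval becomes the timing of one subtitle, and its text is the speech-recognized text whose timestamps overlap that interval.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float       # seconds from the start of the video
    end: float
    text: str = ""

def align_subtitles(lip_intervals: list[Segment],
                    asr_segments: list[Segment]) -> list[Segment]:
    """Assign each lip-movement interval the ASR text overlapping it in time."""
    subtitles = []
    for lip in lip_intervals:
        # Two intervals [a, b) and [c, d) overlap iff a < d and c < b.
        words = [seg.text for seg in asr_segments
                 if seg.start < lip.end and seg.end > lip.start]
        subtitles.append(Segment(lip.start, lip.end, " ".join(words)))
    return subtitles
```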
The subtitle matching subsystem 3 further comprises a subtitle editing unit 9, which edits and modifies the position, size and other characteristics of the subtitles generated in the video. In this embodiment, the subtitle editing unit 9 provides a feature-modification function: the editor can adjust characteristics of the subtitles such as size and position according to the actual situation, to guarantee their display effect. A minimal data model for such edits follows.
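The fields and defaults here are illustrative assumptions of ours, not values from the patent.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SubtitleStyle:
    x: float = 0.5        # horizontal anchor as a fraction of frame width
    y: float = 0.9        # vertical anchor, near the bottom of the frame
    font_size: int = 36   # arbitrary default for the sketch

def edit_style(style: SubtitleStyle, **changes) -> SubtitleStyle:
    # Return an edited copy so earlier versions of the styling survive undo.
    return replace(style, **changes)

# e.g. move the subtitles up and enlarge them:
# style = edit_style(SubtitleStyle(), y=0.85, font_size=42)
```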
A preferred embodiment of the intelligent subtitle matching system applied to the later stage of movie and television is listed below to illustrate the content of the invention clearly. It should be understood that the invention is not limited to this embodiment; other modifications achievable by those skilled in the art through conventional technical means fall within the scope of the inventive concept.
An embodiment of the invention provides the following operating procedure for the intelligent subtitle matching system applied to the later stage of movie and television:
S1, the original film and television video, containing audio and frontal views of the speakers' faces, is selected and fed into the system through the input subsystem 1;
S2, the speech recognition unit 5 of the recognition subsystem 2 recognizes the audio in the video and converts it into text in real time, while the lip-reading unit 6 of the recognition subsystem 2 recognizes the lip movements in the video frame by frame;
S3, the calibration matching unit 7 calibrates and matches the corresponding subtitle timing according to the times at which lip movement appears in each scene, and the subtitle insertion unit 8 then retrieves the recognized text of the corresponding time point from the speech-recognition transcript as the subtitle for that scene;
S4, the output subsystem 4 outputs the video once subtitle matching is complete.
To make the above process easier to follow, an example is given below:
Take one scene in a film and television video as an example. The speech recognition unit 5 recognizes the dialogue in the scene and converts it into text in real time. The lip-reading unit 6 recognizes the lip movements in the scene and identifies several runs of continuous lip movement. The calibration matching unit 7 takes the duration of each run of continuous lip movement as the display interval of a subtitle, and the subtitle insertion unit 8 then retrieves the correspondingly timed speech-recognized text as that subtitle. For example, if a run of continuous lip movement starts at the third minute of the video and ends at three minutes twenty seconds, the subtitle appears at 3:00 and disappears at 3:20, and the subtitle insertion unit 8 inserts the text recognized from the speech between 3:00 and 3:20 into the video. This continues until every picture with continuous lip movement has had matching text inserted at the matching time, after which the output subsystem 4 outputs the video. The example plays out as follows using the helpers sketched earlier.
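This hypothetical end-to-end run reuses `Segment` and `align_subtitles` from the sketch above; the dialogue strings are placeholders, and the times come from the example in the text.

```python
# One run of continuous lip movement from 3:00 to 3:20, in seconds.
lip_intervals = [Segment(start=180.0, end=200.0)]

# Speech recognition output for the same stretch of audio (placeholder text).
asr_segments = [
    Segment(180.0, 186.5, "First line of dialogue."),
    Segment(187.0, 199.5, "Second line of dialogue."),
]

for sub in align_subtitles(lip_intervals, asr_segments):
    print(f"{sub.start:.0f}s-{sub.end:.0f}s: {sub.text}")
# prints: 180s-200s: First line of dialogue. Second line of dialogue.
```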
The invention is not limited to the above embodiment; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention shall be included in its scope of protection.
Claims (5)
1. An intelligent subtitle matching system for film and television post-production, characterized in that: the system comprises an input subsystem (1), a recognition subsystem (2), a subtitle matching subsystem (3) and an output subsystem (4), wherein the input subsystem (1) is used for selecting and inputting the film and television video to be processed, the recognition subsystem (2) is used for performing speech recognition and lip-reading recognition on the video, the subtitle matching subsystem (3) is used for automatically generating and matching subtitles for the video according to the data recognized by the recognition subsystem, and the output subsystem (4) is used for finally outputting the subtitled video.
2. The intelligent subtitle matching system for film and television post-production according to claim 1, characterized in that: the recognition subsystem (2) comprises a speech recognition unit (5) and a lip-reading unit (6), wherein the speech recognition unit (5) is used for recognizing the speech in the video and converting it into text in real time, and the lip-reading unit (6) is used for recognizing lip movement in the video frame by frame.
3. The intelligent subtitle matching system for film and television post-production according to claim 2, characterized in that: the recognition subsystem (2) further comprises a text conversion unit (10), and the text conversion unit (10) is used for converting or matching the text generated by speech recognition into different languages.
4. The intelligent subtitle matching system for film and television post-production according to claim 2, characterized in that: the subtitle matching subsystem (3) comprises a calibration matching unit (7) and a subtitle insertion unit (8), wherein the calibration matching unit (7) is used for calibrating and matching the subtitle timing of a specific scene according to the lip-reading times of the lip-reading unit, and the subtitle insertion unit (8) is used for inserting into each scene the segment of the speech-recognized text that corresponds to that time.
5. The intelligent subtitle matching system for film and television post-production according to claim 4, characterized in that: the subtitle matching subsystem (3) further comprises a subtitle editing unit (9), and the subtitle editing unit (9) is used for editing and modifying the position and size characteristics of the subtitles generated in the video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110960220.9A | 2021-08-20 | 2021-08-20 | Intelligent subtitle matching system applied to the later stage of movie and television |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110960220.9A | 2021-08-20 | 2021-08-20 | Intelligent subtitle matching system applied to the later stage of movie and television |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113490058A (en) | 2021-10-08 |
Family
ID=77946937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110960220.9A (pending) | Intelligent subtitle matching system applied to the later stage of movie and television | 2021-08-20 | 2021-08-20 |
Country Status (1)
Country | Link |
---|---|
CN | CN113490058A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000322077A (en) * | 1999-05-12 | 2000-11-24 | Sony Corp | Television device |
CN105100647A (en) * | 2015-07-31 | 2015-11-25 | 深圳市金立通信设备有限公司 | Subtitle correction method and terminal |
CN105512348A (en) * | 2016-01-28 | 2016-04-20 | 北京旷视科技有限公司 | Method and device for processing videos and related audios and retrieving method and device |
CN105704538A (en) * | 2016-03-17 | 2016-06-22 | 广东小天才科技有限公司 | Audio and video subtitle generation method and system |
CN107770598A (en) * | 2017-10-12 | 2018-03-06 | 维沃移动通信有限公司 | A kind of detection method synchronously played, mobile terminal |
CN110035326A (en) * | 2019-04-04 | 2019-07-19 | 北京字节跳动网络技术有限公司 | Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment |
CN110691204A (en) * | 2019-09-09 | 2020-01-14 | 苏州臻迪智能科技有限公司 | Audio and video processing method and device, electronic equipment and storage medium |
CN111401101A (en) * | 2018-12-29 | 2020-07-10 | 上海智臻智能网络科技股份有限公司 | Video generation system based on portrait |
CN111813998A (en) * | 2020-09-10 | 2020-10-23 | 北京易真学思教育科技有限公司 | Video data processing method, device, equipment and storage medium |
US20200404386A1 (en) * | 2018-02-26 | 2020-12-24 | Google Llc | Automated voice translation dubbing for prerecorded video |
CN112714348A (en) * | 2020-12-28 | 2021-04-27 | 深圳市亿联智能有限公司 | Intelligent audio and video synchronization method |
CN113033357A (en) * | 2021-03-11 | 2021-06-25 | 深圳市鹰硕技术有限公司 | Subtitle adjusting method and device based on mouth shape features |
- 2021-08-20: application CN202110960220.9A filed in China; published as CN113490058A; status: pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |