
CN113490058A - Intelligent subtitle matching system applied to later stage of movie and television

Info

Publication number: CN113490058A
Application number: CN202110960220.9A
Authority: CN (China)
Prior art keywords: subsystem, matching, movie, subtitle, unit
Prior art date: 2021-08-20
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 马晨光 (Ma Chenguang)
Current assignee: Unisound Shanghai Intelligent Technology Co Ltd
Original assignee: Unisound Shanghai Intelligent Technology Co Ltd
Application filed by: Unisound Shanghai Intelligent Technology Co Ltd
Filing date: 2021-08-20
Publication date: 2021-10-08

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/24 Speech recognition using non-acoustical features
    • G10L 15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Studio Circuits (AREA)

Abstract

The invention discloses an intelligent subtitle matching system applied to the later stage of movie and television. The system comprises an input subsystem, a recognition subsystem, a subtitle matching subsystem and an output subsystem: the input subsystem selects and inputs the video to be processed, the recognition subsystem performs speech recognition and lip-reading recognition on the video, the subtitle matching subsystem automatically generates and matches the video's subtitles according to the data recognized by the recognition subsystem, and the output subsystem outputs the final subtitled video. By pairing speech recognition with video lip-reading, the system automatically and accurately matches the text generated by speech recognition to the specific scenes of the video, greatly reducing editors' workload and improving their working efficiency.

Description

Intelligent subtitle matching system applied to later stage of movie and television
Technical Field
The invention relates to the field of movie and television editing, and in particular to an intelligent subtitle matching system applied to the later stage of movie and television.
Background
Subtitles are the textual presentation of non-image content such as the dialogue in television, film and stage works; the term also covers the text added to film and television works in post-production. Commentary and the various text elements that appear below the movie or television screen, such as the title, credits, lyrics, dialogue, captions, and explanatory text introducing people, place names or dates, are all called subtitles. Subtitles for film and television works generally appear at the bottom of the screen, whereas subtitles for stage works may appear at the sides of or above the stage. Good subtitles have five characteristics: accuracy, consistency, clarity, readability and equivalence. Accuracy means the finished product contains no low-level errors such as typos. Consistency means the subtitles are uniform in form and presentation, which is critical to the viewer's understanding. Clarity means the audio is rendered completely in text, including speaker identification and non-dialogue content. Readability means the subtitles stay on screen long enough to be read, are synchronized with the audio, and do not cover the meaningful content of the picture. Equivalence means the subtitles fully convey the content and intent of the video material.
At present, subtitles require a large amount of manual input when a video is shot, and post-production speech recognition only produces a matching sequence for the speech itself, making it difficult to match subtitles to specific frames. This greatly increases editors' workload and hurts their editing efficiency, so the current approach has clear shortcomings.
Disclosure of Invention
The invention aims to provide an intelligent subtitle matching system applied to the later stage of movie and television, which accurately matches subtitles to video and improves editors' working efficiency.
The invention is realized as follows:
the intelligent subtitle matching system applied to the later stage of movies and televisions comprises an input subsystem, an identification subsystem, a subtitle matching subsystem and an output subsystem, wherein the input subsystem is used for selecting and inputting movie videos to be processed, the identification subsystem is used for carrying out voice identification and lip language identification on the movie videos, the subtitle matching subsystem is used for carrying out automatic matching generation on the subtitles of the movie videos according to data identified by the identification subsystem, and the output subsystem is used for carrying out final output on the movie videos with the subtitles.
The recognition subsystem comprises a voice recognition unit and a lip language recognition unit, the voice recognition unit is used for recognizing voice in the movie and television video and generating the voice into text in real time, and the lip language recognition unit is used for recognizing lip language in the movie and television video frame by frame.
The recognition subsystem also comprises a text conversion unit which is used for freely converting or matching different languages for the text generated by speech recognition.
The subtitle matching subsystem comprises a calibration matching unit and a subtitle inserting unit, the calibration matching unit is used for calibrating and matching the subtitle time of a specific scene according to the lip language recognition time of the lip language recognition unit, and the subtitle inserting unit is used for inserting a text segment of a text generated by voice recognition and corresponding to a time point into each scene.
The subtitle matching subsystem further comprises a subtitle editing unit, and the subtitle editing unit is used for editing and modifying the position and size characteristics of the subtitles generated in the movie and television videos.
According to the method and the device, the text generated by voice recognition and the specific scene of the video can be automatically and accurately matched with the scene subtitle through voice recognition and video lip language recognition, so that the workload of editing personnel is greatly reduced, and the working efficiency of the editing personnel is improved.
Drawings
Fig. 1 is a structural block diagram of the intelligent subtitle matching system applied to the later stage of movie and television.
In the figure: 1. input subsystem; 2. recognition subsystem; 3. subtitle matching subsystem; 4. output subsystem; 5. speech recognition unit; 6. lip-reading unit; 7. calibration matching unit; 8. subtitle insertion unit; 9. subtitle editing unit; 10. text conversion unit.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
Referring to fig. 1, the intelligent subtitle matching system applied to the later stage of movie and television comprises an input subsystem 1, a recognition subsystem 2, a subtitle matching subsystem 3 and an output subsystem 4. The input subsystem 1 selects and inputs the video to be processed; the recognition subsystem 2 performs speech recognition and lip-reading recognition on the video; the subtitle matching subsystem 3 automatically generates and matches the video's subtitles according to the data recognized by the recognition subsystem 2; and the output subsystem 4 outputs the final subtitled video.
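Purely as an illustration of this structure, the four subsystems can be sketched as Python interfaces. The class and method names below are assumptions made for this sketch, not terms from the patent, which specifies only each subsystem's responsibility:

```python
from abc import ABC, abstractmethod
from typing import Any

class InputSubsystem(ABC):
    """(1) Selects the source video to be processed and loads it."""
    @abstractmethod
    def load(self, path: str) -> Any: ...

class RecognitionSubsystem(ABC):
    """(2) Runs speech recognition and frame-by-frame lip-reading."""
    @abstractmethod
    def recognize(self, video: Any) -> dict: ...

class SubtitleMatchingSubsystem(ABC):
    """(3) Generates subtitles and matches them to scene timing."""
    @abstractmethod
    def match(self, recognition: dict) -> list: ...

class OutputSubsystem(ABC):
    """(4) Renders the matched subtitles and writes the final video."""
    @abstractmethod
    def export(self, video: Any, subtitles: list, out_path: str) -> None: ...
```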
The recognition subsystem 2 comprises a speech recognition unit 5 and a lip-reading unit 6. The speech recognition unit 5 recognizes the speech in the video and converts it into text in real time, and the lip-reading unit 6 recognizes lip movement in the video frame by frame. In this embodiment, speech recognition mainly uses a pattern matching method: in the training stage, the user speaks each word in the vocabulary in turn, and the feature vector of each word is stored in a template library as a template; in the recognition stage, the feature vector of the input speech is compared for similarity against each template in the library in turn, and the most similar template is output as the recognition result. Lip-reading uses machine vision to continuously detect faces in the images, determine which person in the picture is speaking, and extract that person's continuous mouth-shape features.
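To make the pattern matching concrete, here is a minimal sketch, assuming feature vectors (e.g. MFCC frames) have already been extracted from the audio. The DTW comparison and all names are illustrative choices for this sketch, not details taken from the patent:

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two feature sequences
    (frames x dims); a smaller value means more similar."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def recognize(features: np.ndarray, templates: dict) -> str:
    """Recognition stage: compare the input against every stored template
    in turn and return the word whose template is most similar."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Training stage: store one feature sequence per vocabulary word (toy data).
rng = np.random.default_rng(0)
template_library = {word: rng.normal(i, 1.0, size=(20, 13))
                    for i, word in enumerate(["hello", "scene", "subtitle"])}

# A noisy utterance of "scene" is matched to the right template.
utterance = template_library["scene"] + rng.normal(0.0, 0.1, size=(20, 13))
print(recognize(utterance, template_library))  # -> scene
```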
The recognition subsystem 2 further comprises a text conversion unit 10, which freely converts or matches the text generated by speech recognition across different languages. In this embodiment, converting the text between languages lets editors choose the language of the inserted subtitles as needed.
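The patent does not say how the conversion itself is performed; a minimal sketch, assuming a pluggable translation backend, could look like this (the `TextConversionUnit` name and `translate_fn` parameter are hypothetical):

```python
from typing import Callable

class TextConversionUnit:
    """Converts speech-recognition text between languages by delegating to
    an injected translation backend (assumed, not specified in the patent)."""
    def __init__(self, translate_fn: Callable[[str, str, str], str]):
        self._translate = translate_fn  # (text, src_lang, dst_lang) -> text

    def convert(self, text: str, src_lang: str, dst_lang: str) -> str:
        if src_lang == dst_lang:
            return text  # nothing to convert
        return self._translate(text, src_lang, dst_lang)

# Usage with a stand-in backend; a real system would plug in an MT model.
unit = TextConversionUnit(lambda text, src, dst: f"[{dst}] {text}")
print(unit.convert("字幕匹配", "zh", "en"))
```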
The subtitle matching subsystem 3 comprises a calibration matching unit 7 and a subtitle insertion unit 8. The calibration matching unit 7 calibrates and matches the subtitle timing of a specific scene according to the lip-reading times from the lip-reading unit 6, and the subtitle insertion unit 8 inserts into each scene the segment of the speech-recognition text that corresponds to that time point. In this embodiment, the calibration matching unit 7 assigns the subtitle timing according to when lip movement is recognized in the scene and, together with the speech-recognition text that the subtitle insertion unit 8 extracts for the corresponding time, ensures that the subtitles match the scene.
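One plausible reading of this step, sketched below: the subtitle's timing is taken from an interval of recognized lip movement, and its text is whatever speech-recognition segments overlap that interval in time. The interval-overlap rule is an assumption; the patent states only that the timing comes from lip-reading and the text from speech recognition:

```python
from dataclasses import dataclass

@dataclass
class AsrSegment:
    start: float  # seconds
    end: float
    text: str

@dataclass
class SubtitleCue:
    start: float
    end: float
    text: str

def match_subtitles(lip_intervals: list, asr_segments: list) -> list:
    """Calibration matching: for each interval of continuous lip movement,
    gather the overlapping speech-recognition text and emit a cue whose
    timing is taken from the lip interval."""
    cues = []
    for lip_start, lip_end in lip_intervals:
        words = [seg.text for seg in asr_segments
                 if seg.start < lip_end and seg.end > lip_start]  # time overlap
        if words:
            cues.append(SubtitleCue(lip_start, lip_end, " ".join(words)))
    return cues

segments = [AsrSegment(180.0, 190.0, "Where have you been?"),
            AsrSegment(190.5, 200.0, "I was waiting at the station.")]
print(match_subtitles([(180.0, 200.0)], segments))
```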
The subtitle matching subsystem 3 further comprises a subtitle editing unit 9, which edits and modifies characteristics of the generated subtitles such as their position and size. In this embodiment, the subtitle editing unit 9 provides a feature-modification function, so editors can adjust characteristics such as subtitle size and position to the actual situation and ensure the subtitles display well.
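As a small illustration of the state such a unit might manage, the sketch below models a subtitle's position and size with an edit operation; the fields and defaults are assumptions, since the patent names only position and size as editable characteristics:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SubtitleStyle:
    x: float = 0.5       # horizontal anchor, as a fraction of frame width
    y: float = 0.92      # vertical anchor, as a fraction of frame height
    font_size: int = 36  # in pixels

def edit_style(style: SubtitleStyle, **changes) -> SubtitleStyle:
    """Return a copy with the requested characteristics changed; keeping
    the original untouched makes undo in an editor trivial."""
    return replace(style, **changes)

default = SubtitleStyle()
print(edit_style(default, y=0.85, font_size=42))
```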
A preferred embodiment of the intelligent subtitle matching system applied to the later stage of movie and television is listed below to illustrate the invention clearly. It should be understood that the invention is not limited to the following embodiment, and other modifications made by those skilled in the art using conventional technical means fall within the scope of the inventive idea.
An embodiment of the invention provides the operating process of the intelligent subtitle matching system applied to the later stage of movie and television, which comprises the following steps:
s1, selecting the original film and video with audio and the front face identification picture of people through the input subsystem 1 to carry out system input;
s2, recognizing the audio in the movie video by the voice recognition unit 5 in the subsystem 2, converting the audio into a text in real time, and recognizing the lip language action in the movie video frame by the lip language recognition unit 6 in the subsystem 2;
s3, the calibration matching unit 7 calibrates and matches the corresponding caption time according to the time when the lip language appears in the video scene, and then the caption inserting unit 8 invokes the recognition text of the corresponding time point in the text converted by the voice recognition as the caption of the matching scene;
and S4, the output subsystem 4 outputs the video after the subtitle matching is finished.
To make the above process easier to follow, consider the following example:
taking a certain scene in a movie video as an example, the voice recognition unit 5 recognizes dialogue voice in the scene and converts the dialogue voice into text in real time, the lip language recognition unit 6 recognizes lip language actions in the scene and recognizes a plurality of sets of consecutive lip language actions, the calibration matching unit 7 counts the running time of each set of consecutive lip language actions as the appearance time of the subtitle, then the subtitle insertion unit 8 calls the text converted by the corresponding voice recognition as the subtitle, for example, the starting time of one set of consecutive lip language actions is the third minute of the video, the ending time is the third minute of the video and twenty seconds, the appearance time of the subtitle is the third minute, the ending time is the third twentieth seconds, the subtitle insertion unit 8 selects the text generated by three-to-twenty-second voice recognition to insert into the video, and so on until all pictures of the consecutive lip language actions are inserted with matching text of matching time, the output subsystem 4 performs video output.
The present invention is not limited to the above embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An intelligent subtitle matching system applied to the later stage of movie and television, characterized in that it comprises an input subsystem (1), a recognition subsystem (2), a subtitle matching subsystem (3) and an output subsystem (4), wherein the input subsystem (1) selects and inputs the video to be processed, the recognition subsystem (2) performs speech recognition and lip-reading recognition on the video, the subtitle matching subsystem (3) automatically generates and matches the video's subtitles according to the data recognized by the recognition subsystem, and the output subsystem (4) outputs the final subtitled video.
2. The intelligent subtitle matching system applied to the later stage of movie and television according to claim 1, characterized in that the recognition subsystem (2) comprises a speech recognition unit (5) and a lip-reading unit (6), wherein the speech recognition unit (5) recognizes the speech in the video and converts it into text in real time, and the lip-reading unit (6) recognizes lip movement in the video frame by frame.
3. The intelligent subtitle matching system applied to the later stage of movie and television according to claim 2, characterized in that the recognition subsystem (2) further comprises a text conversion unit (10), which freely converts or matches the text generated by speech recognition across different languages.
4. The intelligent subtitle matching system applied to the later stage of movie and television according to claim 2, characterized in that the subtitle matching subsystem (3) comprises a calibration matching unit (7) and a subtitle insertion unit (8), wherein the calibration matching unit (7) calibrates and matches the subtitle timing of a specific scene according to the lip-reading times from the lip-reading unit, and the subtitle insertion unit (8) inserts into each scene the segment of the speech-recognition text that corresponds to that time point.
5. The intelligent subtitle matching system applied to the later stage of movie and television according to claim 4, characterized in that the subtitle matching subsystem (3) further comprises a subtitle editing unit (9), which edits and modifies characteristics of the generated subtitles such as their position and size.
Application: CN202110960220.9A, filed 2021-08-20 (priority date 2021-08-20); status: Pending

Priority Applications (1)

CN202110960220.9A (priority date 2021-08-20, filed 2021-08-20): Intelligent subtitle matching system applied to later stage of movie and television

Publications (1)

CN113490058A, published 2021-10-08

Family ID: 77946937

Family Applications (1)

CN202110960220.9A (pending): Intelligent subtitle matching system applied to later stage of movie and television

Country Status (1)

CN: CN113490058A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322077A (en) * 1999-05-12 2000-11-24 Sony Corp Television device
CN105100647A (en) * 2015-07-31 2015-11-25 深圳市金立通信设备有限公司 Subtitle correction method and terminal
CN105512348A (en) * 2016-01-28 2016-04-20 北京旷视科技有限公司 Method and device for processing videos and related audios and retrieving method and device
CN105704538A (en) * 2016-03-17 2016-06-22 广东小天才科技有限公司 Audio and video subtitle generation method and system
CN107770598A (en) * 2017-10-12 2018-03-06 维沃移动通信有限公司 A kind of detection method synchronously played, mobile terminal
CN110035326A (en) * 2019-04-04 2019-07-19 北京字节跳动网络技术有限公司 Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment
CN110691204A (en) * 2019-09-09 2020-01-14 苏州臻迪智能科技有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN111401101A (en) * 2018-12-29 2020-07-10 上海智臻智能网络科技股份有限公司 Video generation system based on portrait
CN111813998A (en) * 2020-09-10 2020-10-23 北京易真学思教育科技有限公司 Video data processing method, device, equipment and storage medium
US20200404386A1 (en) * 2018-02-26 2020-12-24 Google Llc Automated voice translation dubbing for prerecorded video
CN112714348A (en) * 2020-12-28 2021-04-27 深圳市亿联智能有限公司 Intelligent audio and video synchronization method
CN113033357A (en) * 2021-03-11 2021-06-25 深圳市鹰硕技术有限公司 Subtitle adjusting method and device based on mouth shape features


Similar Documents

Publication Title
CN105245917B (en) A kind of system and method for multi-media voice subtitle generation
KR101990023B1 (en) Method for chunk-unit separation rule and display automated key word to develop foreign language studying, and system thereof
JP3844431B2 (en) Caption system based on speech recognition
EP3226245B1 (en) System and method to insert visual subtitles in videos
CN105704538A (en) Audio and video subtitle generation method and system
CN100469109C (en) Automatic translation method for digital video captions
CA2956566C (en) Custom video content
KR101492816B1 (en) Apparatus and method for providing auto lip-synch in animation
KR20010072936A (en) Post-Synchronizing an information stream
CN114157920B (en) Method and device for playing sign language, intelligent television and storage medium
CN112714348A (en) Intelligent audio and video synchronization method
Haikuo Film translation in China: Features and technical constraints of dubbing and subtitling English into Chinese
US20110243447A1 (en) Method and apparatus for synthesizing speech
CN110781346A (en) News production method, system, device and storage medium based on virtual image
EP3839953A1 (en) Automatic caption synchronization and positioning
CN117596433B (en) International Chinese teaching audiovisual courseware editing system based on time axis fine adjustment
CN118283367A (en) Conversational video editing method, device and equipment capable of customizing story line
CN113490058A (en) Intelligent subtitle matching system applied to later stage of movie and television
KR102160117B1 (en) a real-time broadcast content generating system for disabled
CN113055734A (en) Smart screen with voice recognition and online subtitle display functions
CN113033357B (en) Subtitle adjusting method and device based on mouth shape characteristics
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
US11948555B2 (en) Method and system for content internationalization and localization
Park et al. Automatic subtitles localization through speaker identification in multimedia system
Aleksandrova et al. Audiovisual content analysis in the translation process

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination