CN112735430A

CN112735430A - Multilingual online simultaneous interpretation system

Info

Publication number: CN112735430A
Application number: CN202011582495.5A
Authority: CN
Inventors: 彭川
Original assignee: Transn Iol Technology Co ltd
Current assignee: Transn Iol Technology Co ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-04-30

Abstract

The invention provides a multilingual online simultaneous interpretation system comprising an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end. The audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end. The interpreter end is used for entering the target audio that an interpreter produces by interpreting the original video, and for sending the target audio to the voice recognition end and the audience end, wherein the language selected by the audience at the audience end is the same as the language of the target audio. The voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending that text to the audience end. The audience end is used for playing the target audio and the original video and for displaying the text corresponding to the target audio. The invention enables interpreters to perform simultaneous interpretation remotely, saving time and cost and improving conference quality.

Description

Multilingual online simultaneous interpretation system

Technical Field

The invention relates to the technical field of simultaneous interpretation, in particular to a multilingual online simultaneous interpretation system.

Background

With the development of science, technology and society, the demand for translation is increasing, especially the demand for simultaneous interpretation. First, in a tightly paced conference, simultaneous interpretation saves time and improves efficiency. Second, some conferences involve more than two foreign languages; in that case consecutive interpretation is clearly impractical, and relay simultaneous interpretation is needed.

Simultaneous interpretation is the most difficult of all interpretation activities and is currently a widely used mode of interpretation. Its defining feature is that the speaker speaks continuously while the interpreter interprets while listening; the average lag between the original speech and its interpretation is three to four seconds, and at most a little over ten seconds. Listening, watching, note-taking and speaking happen almost simultaneously, and the interpreter has only the brief pauses between the speaker's adjacent sentences in which to complete the interpretation, so the demands on the practitioner's skill are very high.

At present, in a conference that needs simultaneous interpretation, the interpreter must be present at the conference site to deliver high-quality simultaneous interpretation, and usually needs to arrive in the city where the conference is held one day in advance to prepare, which incurs high costs.

Disclosure of Invention

The invention provides a multilingual online simultaneous interpretation system to overcome the drawbacks of the prior art, in which a simultaneous interpreter must travel to the conference site and prepare in advance, which is time-consuming, laborious and costly. The system enables the interpreter to perform simultaneous interpretation remotely.

The invention provides a multilingual online simultaneous interpretation system, which comprises an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end;

the audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end;

the interpreter end is used for entering a target audio interpreted by an interpreter from the original video and sending the target audio to the voice recognition end and the audience end; wherein the language selected by the audience at the audience end is the same as the language of the target audio;

the voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending the text corresponding to the target audio to the audience end;

and the audience end is used for playing the target audio and the original video and displaying the text corresponding to the target audio.

According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the voice recognition end;

the voice recognition end is further used for recognizing the original audio, obtaining the text corresponding to the original audio and sending the text corresponding to the original audio to the audience end;

and the audience end is further used for displaying the text corresponding to the original audio.

According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the audience end;

and the audience end is further used for playing the original audio.

According to the multilingual online simultaneous interpretation system provided by the invention, there are a plurality of interpreter ends, and each interpreter end is used for entering the target audio that its interpreter produces by interpreting a corresponding segment of the original video; the target audio of all the segments entered at the interpreter ends together forms the target audio of the whole original video.

According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for obtaining the language of the speaker's speech as entered by a user, and for switching the language in which the audio and video acquisition end captures the original video to that language.

According to the multilingual online simultaneous interpretation system provided by the invention, the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end can obtain both texts from the cloud and display them.

According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end collects the original video of the speaker's speech and sends it to the interpreter end, and the interpreter interprets the original video at the remote interpreter end and enters the interpreted audio, so that the interpreter can perform simultaneous interpretation remotely, saving time and cost. Through the audience end, the audience can hear the interpreter's audio and also see subtitles of that audio, which reduces the sense of distance felt by audience members who do not understand the foreign language, improves conference quality, and requires no additional equipment to be worn at the conference site.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of a framework of a multilingual online simultaneous interpretation system provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The multilingual online simultaneous interpretation system of the present invention is described below with reference to FIG. 1. It includes an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end. The audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end.

The multilingual online simultaneous interpretation system in this embodiment comprises four ends, namely the audio and video acquisition end, the voice recognition end, the interpreter end and the audience end. The scenario in which the speaker speaks may be simultaneous interpretation for an on-site conference, for an online conference, for a live webcast, and the like.

The audio and video acquisition end runs on a Windows system and is mainly responsible for acquiring audio and video. The acquired audio and video data are transmitted to the interpreter end and the audience end as a live stream. The audio and video acquisition end is the starting point of the whole multilingual online simultaneous interpretation system. The video it collects is taken as the original video, and the audio it collects as the original audio.

The interpreter end is available in Windows and Mac versions and is mainly used for recording and sending the target audio that the interpreter produces by interpreting the original video. The interpreter can work normally from any place with a computer, without traveling to the conference site. The interpreter can see or hear the conference site, or see a remotely shared screen, with low delay, which assists the interpretation work. Conventional simultaneous interpretation equipment is not required.

The target audio is audio in the language the audience wants. Because different audience members understand different languages, each can select the required language at the audience end. Target audio in one or more languages is entered at the interpreter end as needed. The target audio in the corresponding language is then sent to the audience end according to each audience member's selection. This embodiment can thus realize multilingual simultaneous interpretation.
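The per-language delivery described above can be illustrated with a minimal sketch. The class name TargetAudioRouter, its methods and the viewer identifiers are hypothetical names chosen for illustration; the patent does not specify how the selection is stored or how the stream is transported.

```python
from collections import defaultdict


class TargetAudioRouter:
    """Tracks which audience members should receive audio in each language."""

    def __init__(self):
        self._viewers_by_language = defaultdict(set)
        self._language_by_viewer = {}

    def select_language(self, viewer_id, language):
        """Record (or change) the language an audience member wants to hear."""
        previous = self._language_by_viewer.get(viewer_id)
        if previous is not None:
            self._viewers_by_language[previous].discard(viewer_id)
        self._language_by_viewer[viewer_id] = language
        self._viewers_by_language[language].add(viewer_id)

    def recipients(self, language):
        """Return the audience members that should receive audio in `language`."""
        return sorted(self._viewers_by_language.get(language, ()))


router = TargetAudioRouter()
router.select_language("viewer-1", "en")
router.select_language("viewer-2", "ja")
router.select_language("viewer-3", "en")
# Target audio entered in English would be sent to viewer-1 and viewer-3 only.
english_listeners = router.recipients("en")
```

Keeping the selection on the routing side means an audience member can switch language mid-conference without any change at the interpreter end.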

The voice recognition end runs on an Android system, is mainly responsible for recognizing audio, and pushes the recognized text to the audience end through Instant Messaging (IM).

The audience end is an APP (Application) used by the end user. Through the APP, audience members can hear the target audio in the language they selected, see the text of the target audio, and watch the original video of the speaker's speech.

In this embodiment, the audio and video acquisition end collects the original video of the speaker's speech and sends it to the interpreter end, and the interpreter interprets the original video at the remote interpreter end and enters the interpreted audio, so that remote simultaneous interpretation is realized and time and cost are saved. Through the audience end, the audience can hear the interpreter's audio and also see subtitles of that audio, which reduces the sense of distance felt by audience members who do not understand the foreign language, improves conference quality, and requires no additional equipment to be worn at the conference site.
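The data flow among the four ends in this base embodiment can be summarized in a short sketch. The string labels, the ROUTES table and the two callables standing in for the human interpreter and the speech recognizer are illustrative assumptions; the patent describes which end sends what to which end, not a concrete message format.

```python
from collections import defaultdict

# Who receives what, as stated in the embodiment. The labels are
# illustrative names, not identifiers taken from the patent.
ROUTES = {
    ("acquisition", "original_video"): ("interpreter", "audience"),
    ("interpreter", "target_audio"): ("recognition", "audience"),
    ("recognition", "target_text"): ("audience",),
}


def deliver(inboxes, sender, kind, payload):
    """Copy one message into the inbox of every end that should receive it."""
    for receiver in ROUTES[(sender, kind)]:
        inboxes[receiver].append((kind, payload))


def simulate_one_utterance(video_frame, interpret, recognize):
    """Push a single utterance through all four ends and return the inboxes.

    `interpret` stands in for the human interpreter and `recognize` for the
    speech recognizer; both are caller-supplied placeholders.
    """
    inboxes = defaultdict(list)
    deliver(inboxes, "acquisition", "original_video", video_frame)
    target_audio = interpret(video_frame)
    deliver(inboxes, "interpreter", "target_audio", target_audio)
    target_text = recognize(target_audio)
    deliver(inboxes, "recognition", "target_text", target_text)
    return inboxes


inboxes = simulate_one_utterance(
    "frame-001",
    interpret=lambda frame: f"audio-for-{frame}",
    recognize=lambda audio: f"text-of-{audio}",
)
```

After the call, the audience inbox holds the original video, the target audio and the target text, which are the three items the audience end plays or displays in this embodiment.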

On the basis of the above embodiment, in this embodiment the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the voice recognition end; the voice recognition end is further used for recognizing the original audio, obtaining the text corresponding to the original audio and sending that text to the audience end; and the audience end is further used for displaying the text corresponding to the original audio.

Specifically, the interpreted target audio is sent to the voice recognition end so that the corresponding text is recognized, and the collected original audio is likewise sent to the voice recognition end so that its corresponding text is recognized. The text of the original audio is displayed at the audience end as well, so that real-time subtitles are provided for both the original audio and the target audio.
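A minimal sketch of this two-track subtitle path follows. The names Subtitle and RecognitionEnd, the track labels "original" and "target", and the injected recognize and push callables are assumptions for illustration; the patent does not name a recognition engine or an IM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtitle:
    track: str      # "original" (speaker) or "target" (interpreter)
    language: str
    text: str


class RecognitionEnd:
    """Turns audio from either track into a subtitle and pushes it onward."""

    def __init__(self, recognize, push):
        self._recognize = recognize  # callable(audio, language) -> str, a stand-in
        self._push = push            # callable(Subtitle) -> None, stands in for IM

    def handle_audio(self, track, language, audio):
        if track not in ("original", "target"):
            raise ValueError(f"unknown track: {track!r}")
        subtitle = Subtitle(track, language, self._recognize(audio, language))
        self._push(subtitle)
        return subtitle


# Usage with placeholder callables: the "recognizer" just upper-cases its input.
displayed = []
recognition_end = RecognitionEnd(lambda audio, lang: audio.upper(), displayed.append)
recognition_end.handle_audio("original", "zh", "ni hao")
recognition_end.handle_audio("target", "en", "hello")
```

Tagging each subtitle with its track lets the audience end show the speaker's words and the interpreter's words as separate subtitle lines.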

On the basis of the above embodiment, the audio and video acquisition end in this embodiment is further used for acquiring the original audio of the speaker's speech and sending the original audio to the audience end; and the audience end is further used for playing the original audio.

Specifically, the original video or the original audio is sent to the audience end, so that the audience can remotely watch the original video or hear the original audio of the conference through the audience end.

On the basis of the above embodiments, in this embodiment there are a plurality of interpreter ends, and each interpreter end is used for entering the target audio that its interpreter produces by interpreting a corresponding segment of the original video; the target audio of all the segments entered at the interpreter ends together forms the target audio of the whole original video.

Specifically, because simultaneous interpretation is highly demanding work, when the original video is interpreted into target audio in a given language, several interpreters work together online through their respective interpreter ends. For example, two interpreters take turns interpreting.
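The turn-taking arrangement can be sketched as follows. The fixed shift length, the round-robin order and the names Segment, assign_shifts and assemble_target_audio are assumptions made for illustration; the patent only states that interpreters take turns and that the segments together form the target audio of the whole video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    interpreter: str
    audio: bytes


def assign_shifts(duration_s, interpreters, shift_s):
    """Split [0, duration_s) into shifts and hand them out round-robin."""
    if shift_s <= 0 or not interpreters:
        raise ValueError("need a positive shift length and at least one interpreter")
    shifts = []
    start = 0.0
    while start < duration_s:
        end = min(start + shift_s, duration_s)
        who = interpreters[len(shifts) % len(interpreters)]
        shifts.append((start, end, who))
        start = end
    return shifts


def assemble_target_audio(segments):
    """Concatenate segment audio in time order, rejecting gaps and overlaps."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    for previous, current in zip(ordered, ordered[1:]):
        if current.start_s != previous.end_s:
            raise ValueError(f"segments do not join at {previous.end_s}s")
    return b"".join(s.audio for s in ordered)


# Two interpreters alternating in 20-minute shifts over a 45-minute talk.
schedule = assign_shifts(45 * 60, ["interpreter-A", "interpreter-B"], 20 * 60)
```

The check in assemble_target_audio is a design choice of this sketch: it makes a missing or overlapping hand-over visible instead of silently producing target audio that no longer lines up with the original video.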

On the basis of the foregoing embodiments, in this embodiment the audio and video acquisition end is further used for obtaining the language of the speaker's speech as entered by a user, and for switching the language in which the audio and video acquisition end captures the original video to that language.

Specifically, the audio and video acquisition end cannot itself identify the language being used on site. Therefore, once a person has determined the language used on site, that language is entered manually, and the audio and video acquisition end is switched to it for audio and video acquisition.
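As a rough illustration of this manual setting, the sketch below models an acquisition end whose source language is entered by an operator and attached to every captured chunk. The class AcquisitionEnd, its methods and the dictionary layout of a captured chunk are hypothetical; the patent does not describe the configuration interface.

```python
class AcquisitionEnd:
    """Capture side whose source language is set by an operator, not detected."""

    def __init__(self, supported_languages):
        self._supported = frozenset(supported_languages)
        self.source_language = None

    def set_source_language(self, language):
        """Apply the language the operator entered after listening on site."""
        if language not in self._supported:
            raise ValueError(f"unsupported language: {language!r}")
        self.source_language = language

    def capture(self, audio_chunk):
        """Tag a captured chunk with the configured language for downstream use."""
        if self.source_language is None:
            raise RuntimeError("source language has not been set")
        return {"language": self.source_language, "audio": audio_chunk}


acquisition_end = AcquisitionEnd({"zh", "en", "ja"})
acquisition_end.set_source_language("zh")
chunk = acquisition_end.capture(b"\x00\x01")
# If the speaker changes, the operator switches the language and capture continues.
acquisition_end.set_source_language("en")
```

Carrying the language with each chunk is one way to let the voice recognition end pick a matching recognizer for the original audio without detecting the language itself.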

On the basis of the foregoing embodiment, in this embodiment the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end obtains both texts from the cloud and displays them.

Specifically, this embodiment uploads the real-time subtitles from the conference site to the cloud synchronously. After the conference ends, all the text corresponding to the audio of the speaker and of the interpreter in that conference can be reviewed, so that a conference summary can be produced quickly.
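The post-conference review can be sketched with an in-memory stand-in for the cloud store. The names TranscriptLine and CloudTranscript, the timestamp field and the export format are assumptions for illustration; the patent does not specify the cloud service or how the summary is generated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptLine:
    time_s: float
    track: str   # "original" (speaker) or "target" (interpreter)
    text: str


class CloudTranscript:
    """In-memory stand-in for the cloud store of conference subtitles."""

    def __init__(self):
        self._lines = []

    def upload(self, line):
        """Store one subtitle line as it is recognized during the conference."""
        self._lines.append(line)

    def export(self, track=None):
        """Return the stored text in time order, optionally for one track."""
        chosen = [line for line in self._lines if track is None or line.track == track]
        chosen.sort(key=lambda line: line.time_s)
        return "\n".join(f"[{line.time_s:.1f}s] {line.track}: {line.text}" for line in chosen)


store = CloudTranscript()
store.upload(TranscriptLine(4.0, "target", "Hello everyone"))
store.upload(TranscriptLine(1.0, "original", "大家好"))
full_text = store.export()             # both tracks, ordered by time
interpreted_only = store.export("target")
```

Exporting a single track gives a one-language transcript, which is a convenient starting point for drafting the conference summary mentioned above.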

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A multilingual online simultaneous interpretation system is characterized by comprising an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end;

2. The multilingual online simultaneous interpretation system according to claim 1, wherein the audio/video acquisition end is further configured to acquire an original audio of a speech of a speaker and send the original audio to the voice recognition end;

3. The multilingual online simultaneous interpretation system according to claim 1, wherein the audio and video acquisition end is further used for acquiring an original audio of the speaker's speech and sending the original audio to the audience end;

and the audience end is further used for playing the original audio.

4. The multilingual online simultaneous interpretation system according to any one of claims 1 to 3, wherein there are a plurality of said interpreter ends, each of said interpreter ends being used for entering the target audio that its interpreter produces by interpreting a corresponding segment of said original video; and the target audio of all the segments entered at the interpreter ends together forms the target audio of the whole original video.

5. The multilingual online simultaneous interpretation system according to any one of claims 1 to 3, wherein the audio and video acquisition end is further used for obtaining the language of the speaker's speech as entered by a user, and for switching the language in which the audio and video acquisition end captures the original video to that language.

6. The multilingual online simultaneous interpretation system according to claim 2, wherein the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end can obtain the text corresponding to the target audio and the text corresponding to the original audio from the cloud and display them.