CN109274922A - Video conference control system based on speech recognition - Google Patents
- Publication number
- CN109274922A (application number CN201811380150.4A)
- Authority
- CN
- China
- Prior art keywords
- meeting
- place
- unit
- audio
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention belongs to the field of video conference control systems and discloses a video conference control system based on speech recognition. The system comprises a voice command input system, a speech analysis and processing system, and a site control system. The voice command input system receives voice commands from each conference site and transmits them to the speech analysis and processing system; the speech analysis and processing system recognizes the voice commands and issues control signals to the site control system; and the site control system controls the equipment in the conference site after receiving a control signal. The voice command input system comprises a plurality of voice receiving devices, with at least one voice receiving device provided in each participating conference site. The video conference control system provided by the invention can accurately identify conference sites with poor order and issue warnings to them, which both saves the cost of maintaining order at the sites and reduces external interference with video conference control.
Description
Technical field
The present invention relates to the technical field of video conference control systems, and in particular to a video conference control system based on speech recognition.
Background art
With the development of real-time video technology, video conferencing has become very common in modern business activities. In the prior art, however, a considerable number of conference support staff must be assigned to each meeting. So many people are difficult to coordinate, and any mistake in the handoffs between them can disrupt conference support work; the problem is even more pronounced in multi-site video conferences. Meanwhile, as the business needs of each department grow, participants' demand for self-service conferencing is becoming increasingly urgent. In existing conference systems, the agenda must be fixed in advance to ensure that the meeting proceeds as planned, so it is difficult to make changes based on participants' on-the-spot decisions, and the user experience is poor.
Therefore, a technical problem that those skilled in the art need to solve is how to provide a system capable of intelligent control of video conferences that, while simplifying the conference scheduling process, allows participants to schedule the conference process on their own, improves the efficiency of holding video conferences, and reduces errors.
Summary of the invention
To address technical problems in the prior art such as the weak autonomous operation capability of video conferencing systems and the inability of participants to schedule meetings on their own, the present invention provides a video conference control system that has strong autonomous operation capability and allows participants to schedule the conference process on their own.
To solve the above technical problems, the present invention adopts the following technical solution:
A video conference control system based on speech recognition is designed, comprising a voice command input system, a speech analysis and processing system, a site control system, and a control instruction recording system. The voice command input system is connected to the speech analysis and processing system, the speech analysis and processing system is connected to the site control system, and the site control system is connected to the control instruction recording system.
The voice command input system receives voice commands from each conference site and transmits them to the speech analysis and processing system. The speech analysis and processing system recognizes the voice commands and issues control signals to the site control system. After receiving a control signal, the site control system issues control instructions to the equipment in the conference site according to that signal. The control instruction recording system, for each of a plurality of time points on the conference timeline, extracts the instruction information of each conference site based on a configuration file and performs interaction or editing according to the extracted instruction information.
The voice command input system comprises voice receiving devices, a voice receiving device control subsystem, and a speaker focusing subsystem. The speaker focusing subsystem allows the speaker's control instructions to be transmitted to the system more clearly, improving the accuracy of the whole conference control system. There are a plurality of voice receiving devices, and at least one is provided in each participating conference site. The voice receiving device control subsystem comprises a first signal acquisition unit and a control unit. The speaker focusing subsystem comprises a second signal acquisition unit, a signal generation unit, a signature computation unit, and a voice receiving device control unit. Because each conference site has several voice receiving devices and the number of participants differs from meeting to meeting, the voice receiving device control subsystem keeps the voice receiving devices that do not correspond to any participant switched off, which saves electricity costs and improves the utilization of the voice receiving devices.
The above technical solution is entirely different from one in which site support staff operate the voice receiving devices to issue instructions. In this solution, because a plurality of voice receiving devices are provided in each participating conference site, the participants at every site can take part in scheduling the conference. This eliminates the need for a scheduling body dedicated to scheduling the sites, makes the conference proceed more smoothly, and allows adjustments to be made according to the actual course of the meeting, rather than mechanically following the direction of a scheduling body.
Preferably, three voice receiving devices are provided in each conference site. This avoids the problem that arises when a site has only one voice receiving device, namely that the control instructions of participants far from the device cannot be received clearly, and it increases participant involvement. The voice receiving device is used to receive participants' voice information and also serves as the speaking device. This scheme unifies the voice receiving devices of the conference control system with the speaking equipment used in the meeting, which saves equipment and removes the need to switch between different devices during the meeting, making the whole video conference smoother.
Preferably, the first signal acquisition unit is configured to obtain the location information of the participants, and the control unit is configured to switch on the voice receiving devices within a set range corresponding to the participant location information obtained by the first signal acquisition unit.
Preferably, the second signal acquisition unit comprises a video acquisition unit and a first voice acquisition unit. The video acquisition unit is used to obtain video information of a plurality of participants, and the first voice acquisition unit is used to obtain the audio information of the meeting. The signal generation unit separately detects, in the video information obtained by the video acquisition unit, the visual signals related to each participant's speech activity and generates a visual activity detection signal matching each participant; it also detects the audio information obtained by the first voice acquisition unit to generate a voice activity detection signal. The signature computation unit compares each of the visual activity detection signals with the voice activity detection signal and determines the participant whose visual activity detection signal has the highest correlation with the voice activity detection signal to be the current speaker. The voice receiving device control unit receives the speaker determination result from the signature computation unit and controls the voice receiving devices in the conference site so that the speaker's voice is transmitted to the system more clearly.
Preferably, the voice command input system further comprises a warning subsystem, which comprises a third signal acquisition unit, an abnormal site determination unit, and a reminder unit. The third signal acquisition unit obtains, within a preset time interval, at least one conference signal from among the audio information and video information of each conference site in the video conference; the audio information is obtained by the voice receiving devices. The abnormal site determination unit analyzes the conference signals of each site obtained by the third signal acquisition unit and determines the sites that are affecting conference order. The abnormal site determination unit comprises a signal acquisition module for obtaining the audio information of each site within a preset time period, and a signal analysis module for analyzing that audio information and determining the abnormal sites that affect conference order. The reminder unit reminds the abnormal sites that affect conference order.
Preferably, the signal analysis module further comprises a first processing subunit and a first determination subunit. The first processing subunit obtains the audio state of each conference site from that site's audio information; the audio state is either a talking state or a non-talking state. When the first determination subunit detects that the audio state of two or more sites is the talking state, it determines those two or more sites to be abnormal sites that affect conference order.
Preferably, the voice command input system further comprises an echo processing subsystem, which comprises a second voice acquisition unit and an echo processing module. The second voice acquisition unit comprises several voice acquisition modules, audio vibration modules, speech detection modules, and a session control center; each voice acquisition module is connected to one audio vibration module and one speech detection module. The speech detection module detects the audio information of its voice acquisition module and sends it to the session control center. The audio vibration module detects the audio vibration information of its voice acquisition module and sends it to the session control center. The session control center receives and processes the audio information from the speech detection modules and the audio vibration information from the audio vibration modules and sends them to the echo processing module. The echo processing module receives the audio information, eliminates the echo, and sends the echo-cancelled audio information to an adaptive filtering module. The adaptive filtering module receives the audio information from the echo processing module and, after filtering, sends it to the speech analysis and processing system. In some meetings there are few participants, so the conference site is largely empty and the speaker's voice produces an echo in the room, which strongly affects the speech recognition of the conference control system. The echo processing subsystem reduces the effect of the echo and improves the accuracy of site control.
Preferably, the control instruction recording system comprises an extraction unit, an index point generation unit, a completion unit, and an interaction and editing unit. The extraction unit, for each of a plurality of time points on the conference timeline, extracts the instruction information of each conference site based on the configuration file, where the conference timeline is associated with the meeting time and the configuration file holds the instruction information set for the meeting. The index point generation unit combines the instruction information of the conference sites into a key index point, which serves as an index point for interacting with or editing the instruction record. The completion unit combines the key index points corresponding to the plurality of time points into the instruction record. The interaction and editing unit interacts with or edits the instruction record according to the key information in the instruction record.
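The recording flow of units 41 to 44 can be illustrated with a minimal Python sketch. The shape of the configuration file (a mapping from time point to per-site instruction text), the function names, and the keyword lookup are all assumptions made for illustration; the patent text does not specify a data format.

```python
from typing import Dict, List

# Hypothetical configuration shape: time point (seconds) -> site -> instruction text.
Config = Dict[int, Dict[str, str]]


def build_instruction_record(config: Config, time_points: List[int]) -> List[dict]:
    """Extract per-site instructions at each time point (extraction unit 41),
    combine them into one key index point per time point (unit 42), and
    collect the index points into an instruction record (unit 43)."""
    record = []
    for t in sorted(time_points):
        per_site = config.get(t, {})  # no instructions at this time point -> empty
        record.append({"time": t, "instructions": dict(per_site)})
    return record


def find_index_points(record: List[dict], keyword: str) -> List[int]:
    """A simple form of interaction (unit 44): return the time points whose
    key index point contains an instruction mentioning `keyword`."""
    return [
        point["time"]
        for point in record
        if any(keyword in text for text in point["instructions"].values())
    ]


config = {
    0: {"site-1": "start meeting"},
    60: {"site-1": "mute", "site-2": "share screen"},
}
record = build_instruction_record(config, [60, 0])
```

Sorting the time points keeps the record aligned with the conference timeline, so a lookup such as `find_index_points(record, "mute")` returns positions that can be used to jump to or edit the matching part of the record.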
The video conference control system based on speech recognition proposed by the present invention has the following beneficial effects:
(1) The video conference control system provided by the invention allows the participants at every conference site to take part in scheduling the conference. This eliminates the need for a scheduling body dedicated to scheduling the sites, makes the conference proceed more smoothly, and allows adjustments according to the actual course of the meeting, rather than mechanically following the direction of a scheduling body;
(2) The video conference control system provided by the invention can also determine the speaker more accurately, so that the speaker's control instructions are transmitted to the system more clearly, improving the accuracy of the whole conference control system;
(3) The video conference control system provided by the invention can accurately identify conference sites with poor order and issue warnings to them, which both saves the cost of maintaining order at the sites and reduces external interference with video conference control.
Brief description of the drawings
The present invention is described in further detail below with reference to the embodiments shown in the accompanying drawings, which do not constitute any limitation on the invention.
Fig. 1 is a structural diagram of an embodiment of the video conference control system of the present invention;
Fig. 2 is a structural diagram of a first embodiment of the voice command input system of the present invention;
Fig. 3 is a structural diagram of an embodiment of the voice command input system of the present invention;
Fig. 4 is a structural diagram of an embodiment of the voice command input system of the present invention;
Fig. 5 is a structural diagram of an embodiment of the second signal acquisition unit of the present invention;
Fig. 6 is a structural diagram of an embodiment of the voice command input system of the present invention;
Fig. 7 is a structural diagram of an embodiment of the signal analysis module of the present invention;
Fig. 8 is a structural diagram of an embodiment of the signal analysis module of the present invention;
Fig. 9 is a structural diagram of an embodiment of the signal analysis module of the present invention;
Fig. 10 is a structural diagram of an embodiment of the voice command input system of the present invention;
Fig. 11 is a structural diagram of an embodiment of the second voice acquisition unit;
Fig. 12 is a structural diagram of an embodiment of the control instruction recording system of the present invention;
Fig. 13 is a structural diagram of an embodiment of the video conferencing system of the present invention;
Fig. 14 is a structural diagram of an embodiment of the speech analysis and processing system of the present invention.
In the figures: voice command input system 1, voice receiving device 11, voice receiving device control subsystem 12, first signal acquisition unit 121, control unit 122, speaker focusing subsystem 13, second signal acquisition unit 131, video acquisition unit 1311, first voice acquisition unit 1312, signal generation unit 132, signature computation unit 133, voice receiving device control unit 134, warning subsystem 14, third signal acquisition unit 141, abnormal site determination unit 142, signal acquisition module 1421, signal analysis module 1422, first processing subunit 14221, first determination subunit 14222, second processing subunit 14223, second statistics subunit 14224, second determination subunit 14225, speech recognition subunit 14226, third determination subunit 14227, third processing subunit 14228, fourth determination subunit 14229, reminder unit 143, echo processing subsystem 15, second voice acquisition unit 151, voice acquisition module 1511, audio vibration module 1512, speech detection module 1513, session control center 1514, echo processing module 152, speech analysis and processing system 2, site control system 3, control instruction recording system 4, extraction unit 41, index point generation unit 42, completion unit 43, interaction and editing unit 44.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
Fig. 1 is a structural diagram of an embodiment of the video conference control system of the present invention. Referring to Fig. 1, the video conference control system comprises a voice command input system 1, a speech analysis and processing system 2, a site control system 3, and a control instruction recording system 4. The voice command input system 1 receives voice commands from each conference site and transmits them to the speech analysis and processing system 2. The speech analysis and processing system 2 recognizes the voice commands and issues control signals to the site control system 3. The site control system 3 controls the equipment in the conference site after receiving a control signal. The control instruction recording system 4, for each of a plurality of time points on the conference timeline, extracts the instruction information of each conference site based on a configuration file and performs interaction or editing according to the extracted instruction information.
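The signal flow between systems 1, 2, and 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the command vocabulary, the `ControlSignal` fields, and the assumption that speech has already been converted to text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical command vocabulary; the patent does not list concrete commands.
COMMANDS: Dict[str, Tuple[str, str]] = {
    "mute microphone": ("microphone", "mute"),
    "start projector": ("projector", "on"),
}


@dataclass
class ControlSignal:
    site: str
    device: str
    action: str


def analyze_speech(site: str, utterance: str) -> Optional[ControlSignal]:
    """Speech analysis and processing system (2): map recognized text to a
    control signal, or return None when the utterance is not a command."""
    entry = COMMANDS.get(utterance.strip().lower())
    if entry is None:
        return None
    device, action = entry
    return ControlSignal(site, device, action)


@dataclass
class SiteControlSystem:
    """Site control system (3): applies control signals to site equipment and
    keeps a log that a recording system (4) could consume."""
    device_state: Dict[Tuple[str, str], str] = field(default_factory=dict)
    log: List[ControlSignal] = field(default_factory=list)

    def apply(self, signal: ControlSignal) -> None:
        self.device_state[(signal.site, signal.device)] = signal.action
        self.log.append(signal)


def handle_voice_command(site: str, utterance: str, controller: SiteControlSystem) -> bool:
    """Voice command input system (1) hands an utterance down the chain.
    Returns True when a control signal was issued."""
    signal = analyze_speech(site, utterance)
    if signal is None:
        return False
    controller.apply(signal)
    return True


controller = SiteControlSystem()
issued = handle_voice_command("site-1", "Mute microphone", controller)
ignored = handle_voice_command("site-2", "hello everyone", controller)
```

Keeping recognition and device control in separate components mirrors the description: ordinary speech that is not a command passes through without changing any device state.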
Fig. 2 is a structural diagram of a first embodiment of the voice command input system 1 of the present invention. The voice command input system 1 comprises a plurality of voice receiving devices 11, and a plurality of voice receiving devices 11 are provided in each participating conference site. A voice receiving device 11 may be a fixed device installed in the conference site, a movable device, or a combination of fixed and movable devices. In this scheme, because a plurality of voice receiving devices 11 are provided in each participating conference site, the participants at every site can take part in scheduling the conference. This eliminates the need for a scheduling body dedicated to scheduling the sites, makes the conference proceed more smoothly, and allows adjustments according to the actual course of the meeting, rather than mechanically following the direction of a scheduling body. The plurality of voice receiving devices 11 in each site also ensure that the control instructions of participants in different positions can all be received clearly. Further, in this embodiment, the voice receiving device 11 also serves as the speaking device. This scheme unifies the voice receiving devices 11 of the conference control system with the speaking equipment used in the meeting, which saves equipment and removes the need to switch between different devices during the meeting, making the whole video conference smoother.
Fig. 3 is a structural diagram of another embodiment of the voice command input system 1 of the present invention. The voice command input system 1 further comprises a voice receiving device control subsystem 12, which comprises a first signal acquisition unit 121 and a control unit 122; the first signal acquisition unit 121 is installed in each conference site. In a conference site, a voice receiving device 11 can be arranged in each area where a participant may be seated. A common arrangement is for each seat to correspond to at least one voice receiving device 11, with the correspondence between seats and voice receiving devices 11 preset in the control unit 122. The first signal acquisition unit 121 obtains the location information of the participants when the meeting starts and transmits it to the control unit 122. The control unit 122 switches on the voice receiving devices 11 within the corresponding set range according to the participants' location information. With the voice receiving device control subsystem 12, voice receiving devices 11 that are not needed for the time being are not switched on at the start of the meeting, which saves electricity costs and improves the utilization of the voice receiving devices.
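The seat-based activation performed by the control unit 122 can be sketched in Python. The seat identifiers, device names, and the mapping itself are hypothetical placeholders for the correspondence that the description says is preset in the control unit.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical preset correspondence between seats and voice receiving devices.
SEAT_TO_DEVICES: Dict[str, List[str]] = {
    "A1": ["mic-1"],
    "A2": ["mic-1", "mic-2"],
    "B1": ["mic-3"],
}


def devices_to_enable(
    occupied_seats: Iterable[str],
    seat_to_devices: Dict[str, List[str]] = SEAT_TO_DEVICES,
) -> Set[str]:
    """Return the devices to switch on for the seats reported as occupied by
    the first signal acquisition unit; all other devices stay off."""
    enabled: Set[str] = set()
    for seat in occupied_seats:
        # Unknown seats map to no devices rather than raising an error.
        enabled.update(seat_to_devices.get(seat, []))
    return enabled


enabled = devices_to_enable(["A1", "A2"])
```

With participants only in seats A1 and A2, `mic-3` stays off, which corresponds to the power saving the description attributes to this subsystem.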
Fig. 4 is a structural diagram of a further embodiment of the voice command input system 1 of the present invention. The voice command input system 1 further comprises a speaker focusing subsystem 13, which comprises a second signal acquisition unit 131, a signal generation unit 132, a signature computation unit 133, and a voice receiving device control unit 134.
As shown in Fig. 5, the second signal acquisition unit 131 comprises a video acquisition unit 1311 for obtaining video information of a plurality of participants, and a first voice acquisition unit 1312 for obtaining the audio information of the meeting. Preferably, the first voice acquisition unit 1312 is the voice receiving device 11.
The signal generation unit 132 separately detects, in the video information obtained by the video acquisition unit 1311, the visual signals related to each participant's speech activity and generates a visual activity detection signal matching each participant, for example VVAD1, VVAD2, VVAD3, and so on. A speaker's speaking state is usually accompanied by rapid, continuous mouth movement, which causes the area of the gap between the lips to change continuously. In one scheme, therefore, the visual activity is preferably the participant's lip activity pattern. The video acquisition unit 1311 performs independent visual activity detection on each of the participants: it obtains the lip outline from the color difference between the lips and the face, and determines the area of the lip gap from the differences in brightness and color of the gap between the upper and lower lips. When the difference in this area between successive video frames exceeds a preset threshold, the participant's visual activity detection signal outputs "1"; otherwise it outputs "0". At the same time, the audio information obtained by the first voice acquisition unit 1312 is detected to generate a voice activity detection signal AVAD; the first voice acquisition unit 1312 obtains the voice activity detection signal by detecting the audio information. When speech is present in the audio information, the voice activity detection signal outputs "1"; otherwise it outputs "0".
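The two binary detection signals can be sketched as follows. The lip-gap rule follows the description (compare the area change between successive frames against a preset threshold). The audio rule is an assumption: the text only says the signal is 1 when speech is present, so a simple per-frame energy threshold stands in for an unspecified speech detector. Function and parameter names are hypothetical.

```python
from typing import List, Sequence


def visual_activity(lip_gap_areas: Sequence[float], threshold: float) -> List[int]:
    """VVAD for one participant: output 1 for a frame when the lip-gap area
    changed by more than `threshold` relative to the previous frame, else 0."""
    if not lip_gap_areas:
        return []
    signal = [0]  # the first frame has no predecessor to compare against
    for prev, cur in zip(lip_gap_areas, lip_gap_areas[1:]):
        signal.append(1 if abs(cur - prev) > threshold else 0)
    return signal


def voice_activity(frame_energies: Sequence[float], energy_threshold: float) -> List[int]:
    """AVAD: output 1 for a frame whose energy exceeds `energy_threshold`.
    The energy test is an assumed stand-in for a real speech detector."""
    return [1 if energy > energy_threshold else 0 for energy in frame_energies]


vvad1 = visual_activity([10, 10, 25, 12, 12], threshold=5)
avad = voice_activity([0.1, 0.9, 0.8, 0.05], energy_threshold=0.5)
```

Both functions emit one value per frame, so the resulting signals can be compared position by position in the next step.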
The signature computation unit 133 compares each of the visual activity detection signals with the voice activity detection signal and determines the participant whose visual activity detection signal has the highest correlation with the voice activity detection signal to be the current speaker. In one scheme, the signature computation unit 133 uses components such as comparison circuits and comparators to obtain the correlation between each visual activity detection signal (VVAD1, VVAD2, VVAD3, and so on) and the voice activity detection signal AVAD, and determines the participant with the greatest correlation to be the speaker.
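In software, the comparison performed by unit 133 could look like the sketch below. The description uses comparator hardware and does not define the correlation measure, so the fraction of frames on which a VVAD signal agrees with AVAD is used here purely as an assumed, simple substitute.

```python
from typing import Dict, Optional, Sequence


def agreement(vvad: Sequence[int], avad: Sequence[int]) -> float:
    """Fraction of frames on which the two binary signals have the same value.
    This is an assumed correlation measure, not one given in the patent."""
    if len(vvad) != len(avad):
        raise ValueError("signals must cover the same frames")
    if not avad:
        return 0.0
    return sum(1 for v, a in zip(vvad, avad) if v == a) / len(avad)


def current_speaker(
    vvad_by_participant: Dict[str, Sequence[int]], avad: Sequence[int]
) -> Optional[str]:
    """Return the participant whose VVAD signal agrees best with AVAD."""
    if not vvad_by_participant:
        return None
    return max(
        vvad_by_participant,
        key=lambda name: agreement(vvad_by_participant[name], avad),
    )


avad = [0, 1, 1, 1, 0]
signals = {
    "VVAD1": [0, 0, 0, 0, 0],  # lips never move
    "VVAD2": [0, 1, 1, 0, 0],  # lips move while speech is present
    "VVAD3": [1, 0, 0, 0, 1],  # lips move only while the audio is silent
}
speaker = current_speaker(signals, avad)
```

A measure such as Pearson correlation could be swapped in for `agreement` without changing the selection step.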
The voice receiving device control unit 134 receives the speaker determination result from the signature computation unit 133 and controls the voice receiving devices 11 in the conference site so that the speaker's voice is transmitted to the system more clearly. The control method may be to switch off the voice receiving devices 11 unrelated to the speaker, or to control all the voice receiving devices 11 in the site so that they are directed toward the speaker.
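The first control option, switching off devices unrelated to the speaker, can be sketched as below. The mapping from device to participant is a hypothetical input; the description does not say how a device is associated with a speaker.

```python
from typing import Dict


def focus_on_speaker(speaker: str, device_owner: Dict[str, str]) -> Dict[str, bool]:
    """Return the desired on/off state of each voice receiving device:
    on when it is associated with the current speaker, off otherwise.
    `device_owner` (device -> participant) is an assumed association."""
    return {device: owner == speaker for device, owner in device_owner.items()}


states = focus_on_speaker("alice", {"mic-1": "alice", "mic-2": "bob", "mic-3": "alice"})
```

The second option in the description, pointing every device toward the speaker, would need device positions and steering control and is not covered by this sketch.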
The speaker focusing subsystem 13 allows the speaker's control instructions to be transmitted to the system more clearly, improving the accuracy of the whole conference control system.
Fig. 6 is a structural diagram of a third embodiment of the voice command input system 1 of the present invention. The voice command input system 1 comprises a warning subsystem 14, which comprises a third signal acquisition unit 141, an abnormal site determination unit 142, and a reminder unit 143.
The third signal acquisition unit 141 obtains, within a preset time interval, at least one conference signal from among the audio information and video information of each conference site in the video conference; in this embodiment, the audio information can be obtained by the voice receiving devices 11. The abnormal site determination unit 142 analyzes the conference signals of each site obtained by the third signal acquisition unit 141 and determines the sites that are affecting conference order. The abnormal site determination unit 142 comprises a signal acquisition module 1421 for obtaining the audio information of each site within a preset time period, and a signal analysis module 1422 for analyzing that audio information and determining the abnormal sites that affect conference order. The reminder unit 143 reminds the abnormal sites that affect conference order. The reminder may take the form of voice or text, or the voice receiving devices 11 in the abnormal site may be switched off temporarily to prevent them from interfering with instruction recognition.
One embodiment of the signal analysis module 1422 is shown in Fig. 8, and its control flow is as follows:
S101: the first processing subunit 14221 obtains the audio state of each conference site from that site's audio information, the audio state being either a talking state or a non-talking state;
S102: when the first determination subunit 14222 detects that the audio state of two or more sites is the talking state, it determines those two or more sites to be abnormal sites that affect conference order.
Specifically, in S101 the voice state of each venue is obtained by determining, from the audio information of that venue, whether the venue is in a talking state. For a given venue at a given moment, if the audio information is judged to be speech, the voice activity of the venue at that moment is 1, indicating that the venue is in the talking state; otherwise the voice activity is 0, indicating that nobody in the venue is speaking and the venue is in the non-talking state. For S102, take a meeting with three venues as an example. If, during some period, venue 1 and venue 2 talk alternately, the participants in the two venues can be regarded as taking turns to speak, and instruction control of the whole meeting is in a normal state. If, during some period, venue 1 and venue 3 talk at the same time, venue 1 and venue 3 are considered to be disturbing meeting order during that period.
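The S101/S102 flow can be illustrated with a minimal sketch. It assumes that each venue (meeting place) has already been reduced to a per-frame voice activity sequence of 0s and 1s; the function name `simultaneous_talkers`, the venue labels, and the frame-by-frame comparison are illustrative assumptions, not details given in the source.

```python
from typing import Dict, List, Set


def simultaneous_talkers(activity: Dict[str, List[int]]) -> Set[str]:
    """Return the venues that talk at the same instant as at least one other venue.

    activity maps a venue name to its per-frame voice activity (1 = talking, 0 = not).
    """
    if not activity:
        return set()
    abnormal: Set[str] = set()
    # Only compare frames that every venue has reported.
    n_frames = min(len(frames) for frames in activity.values())
    for t in range(n_frames):
        talking = [venue for venue, frames in activity.items() if frames[t] == 1]
        if len(talking) >= 2:  # S102: two or more venues talking at once
            abnormal.update(talking)
    return abnormal


# Venues 1 and 2 take turns (normal); venues 1 and 3 overlap (abnormal).
alternating = {"venue1": [1, 0, 1, 0], "venue2": [0, 1, 0, 1], "venue3": [0, 0, 0, 0]}
overlapping = {"venue1": [1, 1, 0, 0], "venue2": [0, 0, 0, 0], "venue3": [0, 1, 1, 0]}
print(sorted(simultaneous_talkers(alternating)))
print(sorted(simultaneous_talkers(overlapping)))
```

A real system would probably require the overlap to last for some minimum duration before flagging a venue; the sketch flags any overlapping frame to keep the logic visible.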
The second embodiment of the signal analysis module 1422 is shown in Fig. 9. Its control flow is as follows:
S201: the second processing subunit 14223 obtains the audio state of each venue from the audio information of that venue, the audio state being either a talking state or a non-talking state.
S202: the second statistics subunit 14224 counts the talk duration of each venue whose audio state is the talking state.
S203: the second determination subunit 14225 calculates the ratio of each such venue's talk duration to the preset time interval and, when the ratio exceeds a preset ratio threshold, marks the venue as a candidate abnormal venue.
S204: the speech recognition subunit 14226 performs speech-to-text recognition on the speech in the audio information of each candidate abnormal venue.
S205: the third determination subunit 14227 compares the recognized text of each candidate abnormal venue with preset keywords and determines any candidate in whose text no keyword appears to be an abnormal venue that disturbs meeting order.
Specifically, in S203 a time interval can be preset, namely the usual length of time a participant needs to issue a control instruction. When the ratio of a venue's talk duration to this usual length exceeds a set threshold, the venue has been talking for too long, and speech that is not a control instruction, such as participants chatting, may be occurring. In S205, keywords for the content to be discussed in the meeting can be preset. After S204 has recognized the text corresponding to the speech of each venue, the text can be compared with the keywords. When the content discussed in a venue does not involve the keywords, that is, does not contain them, the venue can be judged to be discussing content unrelated to control instructions and can be determined to be an abnormal venue that disturbs meeting order. For example, suppose the topic of a meeting is energy saving in power transmission. Keywords for control instructions can then be determined in advance according to the topic, such as speaker information, agenda, discussion topic, screen switching, and tea break. After the meeting begins, the speech of each venue is recognized and semantically analyzed. When the recognized speech of the participants contains none of the preset keywords, the discussion in the corresponding venue is considered unrelated to the meeting's control instructions, the venue is treated as an abnormal venue that disturbs meeting order, and it can be reminded.
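Steps S203 and S205 can be sketched as two small filters. The sketch assumes that the talk durations (S202) and the recognized text (S204) are already available from other components; the function names, the 60-second window, the 0.5 ratio threshold, and the sample transcripts are hypothetical values chosen only for illustration.

```python
from typing import Dict, Iterable, List


def candidate_venues(talk_seconds: Dict[str, float],
                     window_seconds: float,
                     ratio_threshold: float) -> List[str]:
    """S203: venues whose talk time / preset window exceeds the ratio threshold."""
    return [venue for venue, seconds in talk_seconds.items()
            if seconds / window_seconds > ratio_threshold]


def abnormal_venues(candidates: Iterable[str],
                    transcripts: Dict[str, str],
                    keywords: Iterable[str]) -> List[str]:
    """S205: candidates whose recognized text contains none of the preset keywords."""
    keyword_list = [k.lower() for k in keywords]
    abnormal = []
    for venue in candidates:
        text = transcripts.get(venue, "").lower()
        if not any(k in text for k in keyword_list):
            abnormal.append(venue)
    return abnormal


# Hypothetical inputs: talk time per venue in a 60 s window, plus recognized text.
talk = {"A": 50.0, "B": 10.0, "C": 45.0}
texts = {
    "A": "please switch the screen to the agenda",
    "C": "did you watch the game last night",
}
keywords = ["agenda", "speaker", "tea break"]

candidates = candidate_venues(talk, window_seconds=60.0, ratio_threshold=0.5)
flagged = abnormal_venues(candidates, texts, keywords)
print(candidates, flagged)
```

Plain substring matching stands in here for the semantic analysis the text mentions; it is the simplest comparison that satisfies the keyword check and is not claimed to be the method of the source.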
The third embodiment of the signal analysis module 1422 is shown in Fig. 10. Its control flow is as follows:
S301: the third processing subunit 14228 obtains the audio volume of each venue from the audio information of that venue.
S302: the fourth determination subunit 14229 determines any venue whose audio volume exceeds a preset volume threshold to be an abnormal venue that disturbs meeting order.
Specifically, in S302 the volume of each venue can be used to judge whether the speech in that venue is normal. If the volume is too high, the speech is considered not to be a normal control instruction and may instead be a quarrel or disorderly noise. A venue whose volume is too high can therefore be determined to be disturbing meeting order and can be reminded. A volume threshold, such as 90 decibels or 100 decibels, can be preset; when the volume of a venue exceeds this preset threshold, the venue's volume is judged to be excessive.
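The threshold test of S302 can be sketched in a few lines. The sketch assumes that S301 has already produced one volume reading in decibels per venue; the function name `loud_venues` and the sample readings are hypothetical, and only the 90 dB default comes from the text.

```python
from typing import Dict, List


def loud_venues(volume_db: Dict[str, float], threshold_db: float = 90.0) -> List[str]:
    """S302: venues whose measured volume is strictly above the preset threshold."""
    return sorted(venue for venue, db in volume_db.items() if db > threshold_db)


readings = {"A": 72.0, "B": 95.5, "C": 90.0}  # hypothetical per-venue volumes in dB
print(loud_venues(readings))
```

A reading exactly at the threshold is not flagged here, following the wording "exceeds" in the text.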
The warning subsystem can accurately identify venues with poor order and warn them, or even temporarily switch off their voice receiving device 11. This not only reduces the cost of maintaining order in the venues but also reduces external interference with video conference control.
Fig. 10 is a structural schematic diagram of another embodiment of the voice instruction input system 1 of the present invention. The voice instruction input system 1 includes an echo processing subsystem 15, which comprises a second voice acquisition unit 151 and an echo processing module 152.
The structure of the second voice acquisition unit 151 is shown in Fig. 12. It includes a plurality of voice acquisition modules 1511, audio vibration modules 1512, speech detection modules 1513, and a session control center 1514. Each voice acquisition module 1511 is connected to one audio vibration module 1512 and one speech detection module 1513. In this embodiment, the second voice acquisition unit 151 is the voice receiving device 11.
The speech detection module 1513 detects the audio information of the corresponding voice acquisition module 1511 and sends it to the session control center 1514. The audio vibration module 1512 detects the audio vibration information of the corresponding voice acquisition module 1511 and sends it to the session control center 1514. The session control center 1514 receives the audio information from the speech detection module 1513 and compares it with a database to check whether it contains preset audio. If it does, the session control center sends a close-microphone instruction to the corresponding voice acquisition switch module, which receives the instruction and closes the corresponding voice acquisition module 1511; the session control center continues to receive the vibration information sent by the audio vibration module and then sends an open-microphone instruction to the voice acquisition switch module, which receives it and reopens the corresponding voice acquisition module 1511. If the audio information does not contain preset audio, the session control center continues to receive information from the speech detection module and does not receive vibration information from the audio vibration module. The session control center 1514 receives the voice information of the speech detection module and sends it to the echo processing module 152.
The echo processing module 152 receives the audio information, eliminates the echo, and sends the echo-cancelled audio information to an adaptive filtering module. The adaptive filtering module receives the audio information from the echo processing module, filters it, and sends it to the voice analysis processing system 2.
In some meetings, few participants attend and the venue is largely empty, so the speaker's voice produces echo within the venue, which seriously affects the speech recognition of the conference control system. The echo processing subsystem 15 reduces the influence of echo and improves the accuracy of venue control.
Fig. 14 shows a specific embodiment of the voice analysis processing system 2 provided by the present invention. Unless otherwise stated, the other embodiments of the present invention all use this method for voice analysis processing.
In speech recognition, as the number of words in the vocabulary increases, the likelihood of selecting a wrong word also increases. To improve the accuracy of speech-to-text conversion, a speech recognition system must become more intelligent by reducing the size of the vocabulary. One way to reduce the vocabulary is to personalize the system's vocabulary; for example, the system can preload the vocabulary of the field the meeting concerns, such as the petroleum, electric power, or intellectual property industry. Another way to reduce the vocabulary size is to personalize the vocabulary for a particular individual, for example by intelligently harvesting network data from the terminal device used by a participant to create a personal vocabulary.
Speech or audio is received at one or more endpoints. The decoder receives input from the acoustic model, the dictionary model, and the language model 107 in order to decode the speech. The decoder 10 converts the speech 101 into text, which is output as a word lattice. The decoder can also calculate a confidence score, which may be a confidence interval.
Speech can be an analog signal. The analog signal can be encoded at different sampling rates (that is, numbers of samples per second; the most common are 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz) and/or with different numbers of bits per sample (the most common are 8, 16, or 32 bits).
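The effect of these two parameters on the amount of data can be shown with a short calculation. The sketch assumes uncompressed linear PCM, which the text does not specify, and the function name `pcm_bytes_per_second` is illustrative.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Data rate of uncompressed PCM audio (an assumption; the source names no codec)."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Telephone-quality speech versus a common wideband speech setting.
narrowband = pcm_bytes_per_second(8_000, 8)
wideband = pcm_bytes_per_second(16_000, 16)
print(narrowband, wideband)
```

Under this assumption, moving from 8 kHz and 8 bits to 16 kHz and 16 bits quadruples the data sent from each endpoint to the decoder.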
One or more of the acoustic model, the dictionary model, and the language model can be stored in the decoder or received from an external database.
The acoustic model can be created from a statistical analysis of speech and human-produced transcriptions. The statistical analysis concerns the sounds that make up each word. The acoustic model can be created by a procedure called "training", in which the user speaks specified words to the speech recognition system. The dictionary model 105 is a pronunciation vocabulary. The same word can be pronounced in different ways; for example, the word "electric power" is pronounced differently in Shandong than in Fujian. The speech recognition system uses the dictionary model to identify the various pronunciations. The acoustic model and the language model are optional.
The language model defines the probability of a word appearing in a sentence. For example, the speech recognition system may recognize a piece of speech as either "conveying" or "comfortable", each with equal likelihood. However, if the following word is recognized as "electric power", the language model indicates that the earlier word is far more probably "conveying" than "comfortable". The language model can be built from text data. It may include a probability distribution over word sequences, which may be a conditional probability (that is, the probability of the next word given that another word has occurred).
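The "conveying" versus "comfortable" example can be reproduced with a toy bigram model built from text data. This is a minimal sketch under stated assumptions: the three-sentence corpus, the add-one smoothing, and the function names are all hypothetical and are not taken from the source, which does not say how its language model is estimated.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Model = Tuple[Counter, Counter, int]


def train_bigrams(sentences: Iterable[List[str]]) -> Model:
    """Count word pairs in tokenized sentences and record the vocabulary size."""
    first_counts: Counter = Counter()  # how often a word starts a pair
    pair_counts: Counter = Counter()   # how often (prev, next) occurs
    vocab = set()
    for words in sentences:
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            first_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    return first_counts, pair_counts, len(vocab)


def next_word_prob(model: Model, prev: str, nxt: str, alpha: float = 1.0) -> float:
    """Conditional probability P(nxt | prev) with add-alpha smoothing."""
    first_counts, pair_counts, vocab_size = model
    return (pair_counts[(prev, nxt)] + alpha) / (first_counts[prev] + alpha * vocab_size)


def pick_earlier_word(model: Model, candidates: List[str], next_word: str) -> str:
    """Choose the candidate that makes the already recognized next word most likely."""
    return max(candidates, key=lambda w: next_word_prob(model, w, next_word))


# Hypothetical training text; "electric power" is treated as a single token.
corpus = [
    ["conveying", "electric power", "saves", "energy"],
    ["conveying", "electric power", "over", "long", "lines"],
    ["a", "comfortable", "chair"],
]
model = train_bigrams(corpus)
best = pick_earlier_word(model, ["conveying", "comfortable"], "electric power")
print(best)
```

With equal acoustic scores for the two candidates, the conditional probability of the following word is what breaks the tie, which is the role the text assigns to the language model.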
The decoder can convert audio or speech during an ongoing meeting. In this way, specific points or text arising in the session are recorded quickly.
The decoder can be a network device, such as a cloud computing center. The decoder includes a controller, a memory, a database, and a communication interface comprising an input interface and an output interface. The input interface receives speech from the endpoints. The output interface can provide the decoded text to an external database or a search engine. Alternatively, the decoded text can be stored in the database.
One or more of the acoustic model, the dictionary model, and the language model can be stored in the memory or the database. The memory can be any known type of volatile or non-volatile memory. The memory may include one or more of read-only memory (ROM), dynamic random access memory (DRAM), static random access memory (SRAM), programmable read-only memory (PROM), flash memory, electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), or other types of memory. The memory may include optical or magnetic (hard disk drive) storage or any other form of data storage device. The memory may be located in a remote device or may be removable, such as a secure digital (SD) memory card.
The database can be located outside the decoder or contained within it. The database can be stored together with the memory or stored separately. The database can be implemented in hardware or software.
The memory can store computer-executable instructions, and the controller can execute them. The computer-executable instructions can be included in computer code, which can be stored in the memory. The computer code can be written in any computer language, such as C, C++, C#, Java, Pascal, Visual Basic, Perl, HyperText Markup Language (HTML), JavaScript, assembly language, Extensible Markup Language (XML), and any combination thereof.
The computer code can be logic encoded in one or more tangible media, or one or more non-transitory tangible media, for execution by the controller. Logic encoded in one or more tangible media for execution can be defined as instructions that are executable by the controller and that are provided on a computer-readable storage medium, a memory, or a combination thereof. Instructions for directing the network device can be stored in any logic. As used herein, "logic" includes but is not limited to hardware, firmware, software executed on a machine, and/or combinations of each, for performing one or more functions or actions and/or for causing a function or action from another logic, method, and/or system. Logic can include, for example, a software-controlled microprocessor, an ASIC, an analog circuit, a digital circuit, a programmed logic device, and a memory device containing instructions.
Instructions can be stored on any computer-readable medium. Computer-readable media include but are not limited to a floppy disk, a hard disk, an application-specific integrated circuit (ASIC), a compact disc (CD), other optical media, random access memory (RAM), read-only memory (ROM), a memory chip or card, a memory stick, and any other medium from which a computer, a processor, or another electronic device can read.
The controller may include a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, an analog circuit, a digital circuit, a server processor, a combination of the above, or another processor now known or later developed. The controller can be a single device or a combination of devices, such as those associated with a network or with distributed processing. In addition, those skilled in the art will appreciate that the controller can implement the Viterbi decoding algorithm for speech recognition. Any of various processing strategies can be used, such as multiprocessing, multitasking, parallel processing, remote processing, and centralized processing. The controller can respond to, or be operable to execute, instructions stored as software, hardware, integrated circuits, firmware, microcode, and the like. The functions, actions, methods, or tasks shown in the drawings or described herein can be performed by the controller executing instructions stored in the memory. These functions, actions, methods, or tasks are independent of the particular type of instruction set, storage medium, processor, or processing strategy, and can be performed by software, hardware, integrated circuits, firmware, microcode, and the like, operating alone or in combination. The instructions implement the processes, techniques, methods, or actions described herein.
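The Viterbi decoding algorithm mentioned above can be sketched on a toy hidden Markov model. The two hidden states, the three observation symbols, and all probabilities below are hypothetical values chosen for illustration; a real recognizer would decode over phoneme or word states with scores supplied by the acoustic, dictionary, and language models.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(observations: List[str],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}
    ]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (prob * emit_p[s][obs], back)
        best.append(layer)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path


# Toy model: is a frame silence or speech, given a coarse energy label?
states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.4, "speech": 0.6}}
emit_p = {"silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "speech": {"low": 0.1, "mid": 0.3, "high": 0.6}}

decoded = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(decoded)
```

For long utterances the products of probabilities underflow, so practical implementations usually sum log-probabilities instead; the sketch keeps plain products for readability.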
Those skilled in the art will understand that the voice receiving device control subsystem 12, the speaker focusing subsystem 13, the warning subsystem 14, and the echo processing subsystem 15 can be selected according to actual needs and can be used in combination; the present invention does not specifically limit how these subsystems are combined. Those skilled in the art should also understand that combining these subsystems gives rise to further embodiments, which likewise fall within the scope of protection of the present invention provided they do not depart from its principle.
Further, the video conference control system also includes a control instruction record system 4, which comprises an extraction unit 41, an index point generation unit 42, a completion unit 43, and an interaction and editing unit 44. Its workflow is as follows:
S401: at each of a plurality of time points on the meeting timeline, the extraction unit 41 extracts the key information of each venue based on a configuration file, where the meeting timeline is associated with the meeting time and the configuration file is used to set the instruction information of the meeting.
S402: the index point generation unit 42 combines the instruction information of each venue into a key index point, which serves as the index point for interacting with or editing the instruction record; that is, the key information of each venue is combined into a key index point that corresponds to all of the information contained in that key information.
S403: the completion unit 43 combines the key index points corresponding to the plurality of time points into an instruction record.
S404: the interaction and editing unit 44 interacts with or edits the instruction record according to the key information in the instruction record.
In S401, the configuration file generally includes a voice and video detection and identification module, a key information extraction module, and an event determination and analysis module. The key information includes one or more of the following: faces, body movements, voice, key frames, and custom events. A custom event can be a particular event in instruction control, for example a scene such as giving an instruction, refusing, or arguing, and can also be any other user-defined event. The format of the instruction record is a text file, an audio file, a video file, a Flash file, or a PPT file.
In S402, for example, if the key information defined by the configuration file includes faces and voice, then the face key information and the voice key information of each venue at a given time point on the meeting timeline are extracted and combined into one key index point.
In S403, the key index points generated at the plurality of time points are combined to produce the instruction record of the video conference. Specifically, the instruction record is generated by linking the key index points in series according to a certain motion pattern.
In S404, to obtain a more complete instruction record, participants can interact with or edit the instruction record according to the key information in it. For example, when a participant clicks a name in the instruction record, brief information about that person is displayed in real time, or a further reference index is provided, so that the participant can verify the instruction.
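Steps S401 to S403 can be sketched as a small data pipeline. The sketch assumes that detection and recognition have already produced, for every time point and venue, a dictionary of extracted items; the `KeyIndexPoint` class, the function `build_instruction_record`, and the sample values are hypothetical, and the configuration file is reduced to a set of key information types.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class KeyIndexPoint:
    """Key information of all venues at one time point (S402)."""
    time_s: float
    info: Dict[str, Dict[str, str]]  # venue -> {key information type -> value}


def build_instruction_record(timeline: List[float],
                             observations: Dict[float, Dict[str, Dict[str, str]]],
                             configured_types: Set[str]) -> List[KeyIndexPoint]:
    """S401-S403: keep the configured key information and chain the index points."""
    record = []
    for t in timeline:
        per_venue = observations.get(t, {})
        info = {
            venue: {k: v for k, v in items.items() if k in configured_types}
            for venue, items in per_venue.items()
        }
        record.append(KeyIndexPoint(time_s=t, info=info))
    return record


# Hypothetical configuration: only faces and voice count as key information.
configured = {"face", "voice"}
observed = {
    0.0: {"venue1": {"face": "participant_1", "voice": "switch screen", "keyframe": "f0"}},
    30.0: {"venue2": {"face": "participant_2", "voice": "next agenda item"}},
}
record = build_instruction_record([0.0, 30.0], observed, configured)
print(record[0])
```

The interaction of S404 could then look up `record[i].info` when a participant selects an entry; that part is left out because the text does not describe the user interface in enough detail to sketch it.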
The control instruction record system 4 helps participants record the key instructions issued throughout the meeting and summarize the course of the meeting. By interpreting the key instructions, participants can also analyze matters other than the meeting content itself, such as which venue kept good order and which venue issued more effective instructions.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or units, and can be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.
Claims (8)
1. A video conference control system based on speech recognition, characterized in that it comprises a voice instruction input system (1), a voice analysis processing system (2), a venue control system (3), and a control instruction record system (4); the voice instruction input system (1) is connected to the voice analysis processing system (2), the voice analysis processing system (2) is connected to the venue control system (3), and the venue control system (3) is connected to the control instruction record system (4);
the voice instruction input system (1) is used to receive the voice instructions of each venue and to transmit them to the voice analysis processing system (2); the voice analysis processing system (2) recognizes the voice instructions and issues a control signal to the venue control system (3); the venue control system (3) receives the control signal and issues control instructions to the equipment in the venue according to the control signal; the control instruction record system (4) is used to extract, at each of a plurality of time points on the meeting timeline, the instruction information of each venue based on a configuration file, and to perform interaction or editing according to the extracted instruction information;
the voice instruction input system (1) comprises voice receiving devices (11), a voice receiving device control subsystem (12), and a speaker focusing subsystem (13); there are a plurality of voice receiving devices (11), and each participating venue is provided with at least one voice receiving device (11); the voice receiving device control subsystem (12) comprises a first signal acquisition unit (121) and a control unit (122); the speaker focusing subsystem (13) comprises a second signal acquisition unit (131), a signal generation unit (132), a signature computation unit (133), and a voice receiving device control unit (134).
2. The video conference control system based on speech recognition according to claim 1, characterized in that no fewer than three voice receiving devices (11) are provided in each venue, for receiving the voice information of the participants.
3. The video conference control system based on speech recognition according to claim 1, characterized in that the first signal acquisition unit (121) is used to obtain the position information of a participant, and the control unit (122) is used to turn on the voice receiving devices (11) within a set range corresponding to the participant position information obtained by the first signal acquisition unit (121).
4. The video conference control system based on speech recognition according to claim 1, characterized in that the second signal acquisition unit (131) comprises a video acquisition unit (1311) and a first voice acquisition unit (1312); the video acquisition unit (1311) is used to obtain video information of a plurality of participants; the first voice acquisition unit (1312) is used to obtain the audio information of the meeting; the signal generation unit (132) detects, in the video information obtained by the video acquisition unit, the visual signals related to each participant's speech activity and generates a visual activity detection signal matched to each participant, and at the same time detects the audio information obtained by the first voice acquisition unit (1312) to generate a voice activity detection signal; the signature computation unit (133) is used to compare each of the visual activity detection signals with the voice activity detection signal and to determine the participant corresponding to the visual activity detection signal most highly correlated with the voice activity detection signal as the current speaker; the voice receiving device control unit (134) receives the speaker determination result of the signature computation unit and controls the voice receiving devices (11) in the venue so that the speaker's voice is transmitted into the system more clearly.
5. The video conference control system based on speech recognition according to claim 1, characterized in that the voice instruction input system (1) further comprises a warning subsystem (14), which comprises a third signal acquisition unit (141), an abnormal venue determination unit (142), and a reminding unit (143); the third signal acquisition unit (141) is used to obtain, within a preset time interval, at least one conference signal from among the audio information and video information of each venue in the video conference, the audio information being obtained by the voice receiving devices (11); the abnormal venue determination unit (142) is used to analyze the conference signals of each venue obtained by the third signal acquisition unit (141) and to determine the venues that disturb meeting order; the abnormal venue determination unit (142) comprises a signal acquisition module (1421) for obtaining the audio information of each venue within a preset time period; the abnormal venue determination unit (142) further comprises a signal analysis module (1422) for analyzing the audio information of each venue within the preset time period and determining the abnormal venues that disturb meeting order; the reminding unit (143) is used to remind the abnormal venues that disturb meeting order.
6. The video conference control system based on speech recognition according to claim 5, characterized in that the signal analysis module (1422) further comprises a first processing subunit (14221) and a first determination subunit (14222); the first processing subunit (14221) is used to obtain the audio state of each venue from the audio information of that venue, the audio state being either a talking state or a non-talking state; the first determination subunit (14222), upon detecting that the audio states of two or more venues are the talking state, determines those venues to be abnormal venues that disturb meeting order.
7. The video conference control system based on speech recognition according to claim 1, characterized in that the voice instruction input system (1) further comprises an echo processing subsystem (15); the echo processing subsystem (15) comprises a second voice acquisition unit (151) and an echo processing module (152); the second voice acquisition unit (151) comprises several voice acquisition modules (1511), audio vibration modules (1512), speech detection modules (1513), and a session control center (1514), each voice acquisition module (1511) being connected to one audio vibration module (1512) and one speech detection module (1513); the speech detection module (1513) is used to detect the audio information of the corresponding voice acquisition module (1511) and send it to the session control center (1514); the audio vibration module (1512) is used to detect the audio vibration information of the corresponding voice acquisition module (1511) and send it to the session control center (1514); the session control center (1514) receives and processes the audio information of the speech detection module (1513) and the audio vibration information of the audio vibration module (1512) and sends the result to the echo processing module (152); the echo processing module (152) receives the audio information, eliminates the echo, and sends the echo-cancelled audio information to an adaptive filtering module; the adaptive filtering module receives the audio information from the echo processing module (152), filters it, and sends it to the voice analysis processing system (2).
8. The video conference control system based on speech recognition according to claim 1, characterized in
that the control instruction recording system (4) includes an extraction unit (41), an index point
generation unit (42), a completion unit (43), and an interaction and editing unit (44); the extraction
unit (41) is used to extract, at each of multiple time points on the meeting timeline, the instruction
information of each meeting-place based on a configuration file, wherein the meeting timeline is associated
with the meeting time and the configuration file is used to configure the instruction information of the
meeting; the index point generation unit (42) is used to combine the instruction information of each
meeting-place into a key index point, the key index point serving as an index point for interacting with or
editing the instruction record; the completion unit (43) is used to combine the multiple key index points
corresponding to the multiple time points into the instruction record; the interaction and editing
unit (44) is used to interact with or edit the instruction record according to the key information in the
instruction record.
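The record-building flow of claim 8 (extract per time point, combine into key index points, combine into an instruction record) can be sketched as a small data structure. This is a hedged illustration only: the application does not disclose the configuration file format, so the nested dictionary used here, the class and function names, and the keyword lookup are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Assumed configuration shape: time point (seconds) -> meeting-place -> instruction.
Config = Dict[float, Dict[str, str]]


@dataclass
class KeyIndexPoint:
    time_point: float             # position on the meeting timeline (assumed seconds)
    instructions: Dict[str, str]  # meeting-place -> instruction information


@dataclass
class InstructionRecord:
    points: List[KeyIndexPoint] = field(default_factory=list)

    def find(self, keyword: str) -> List[KeyIndexPoint]:
        """Hypothetical lookup standing in for the interaction and editing unit (44)."""
        return [p for p in self.points
                if any(keyword in text for text in p.instructions.values())]


def extract(config: Config, time_point: float) -> Dict[str, str]:
    """Extraction unit (41): instruction information of each meeting-place at one time point."""
    return dict(config.get(time_point, {}))


def build_record(config: Config, time_points: Iterable[float]) -> InstructionRecord:
    """Units (42) and (43): one key index point per time point, combined in timeline order."""
    points = [KeyIndexPoint(t, extract(config, t)) for t in sorted(time_points)]
    return InstructionRecord(points)
```

Keeping one key index point per time point means a later interaction or edit can address the record by time or by instruction content without rescanning the configuration file.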
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811380150.4A CN109274922A (en) | 2018-11-19 | 2018-11-19 | A kind of Video Conference Controlling System based on speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811380150.4A CN109274922A (en) | 2018-11-19 | 2018-11-19 | A kind of Video Conference Controlling System based on speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109274922A (en) | 2019-01-25 |
Family
ID=65190087
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811380150.4A Pending CN109274922A (en) | 2018-11-19 | 2018-11-19 | A kind of Video Conference Controlling System based on speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109274922A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170006162A1 (en) * | 2011-04-29 | 2017-01-05 | Crestron Electronics, Inc. | Conference system including automated equipment setup |
US20140099075A1 (en) * | 2012-01-16 | 2014-04-10 | Huawei Technologies Co., Ltd. | Conference recording method and conference system |
CN103581608A (en) * | 2012-07-20 | 2014-02-12 | Polycom通讯技术(北京)有限公司 | Spokesman detecting system, spokesman detecting method and audio/video conference system |
CN102843543A (en) * | 2012-09-17 | 2012-12-26 | 华为技术有限公司 | Video conferencing reminding method, device and video conferencing system |
CN105516642A (en) * | 2015-12-14 | 2016-04-20 | 广东亿迅科技有限公司 | Video conference control system and method based on interactive voice response |
CN106683671A (en) * | 2016-12-21 | 2017-05-17 | 深圳启益新科技有限公司 | Center control voice interactive control system and control method |
CN206921471U (en) * | 2017-06-16 | 2018-01-23 | 青岛爱上办公集成有限公司 | A kind of intelligent meeting control system based on speech recognition |
CN206865475U (en) * | 2017-06-29 | 2018-01-09 | 安徽听见科技有限公司 | A kind of intelligent meeting system |
CN107249116A (en) * | 2017-08-09 | 2017-10-13 | 成都全云科技有限公司 | Noise echo eliminating device based on video conference |
CN108347536A (en) * | 2018-02-06 | 2018-07-31 | 北京智能管家科技有限公司 | Record system |
Non-Patent Citations (1)
Title |
---|
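郑广宁; 车四四; 魏永静; 刘鸿雁; 何子亨: "Autonomous control system for video conferences based on artificial intelligence", Electric Power Information and Communication Technology, no. 08 * |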
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110300001A (en) * | 2019-05-21 | 2019-10-01 | 深圳壹账通智能科技有限公司 | Conference audio control method, system, equipment and computer readable storage medium |
CN110300001B (en) * | 2019-05-21 | 2022-03-15 | 深圳壹账通智能科技有限公司 | Conference audio control method, system, device and computer readable storage medium |
WO2020248713A1 (en) * | 2019-06-12 | 2020-12-17 | 中兴通讯股份有限公司 | Method and mcu for orderly controlling conference terminal to speak, and storage medium |
CN111556279A (en) * | 2020-05-22 | 2020-08-18 | 腾讯科技(深圳)有限公司 | Monitoring method and communication method of instant session |
CN111833876A (en) * | 2020-07-14 | 2020-10-27 | 科大讯飞股份有限公司 | Conference speech control method, system, electronic device and storage medium |
CN114509157A (en) * | 2020-11-17 | 2022-05-17 | 丰田自动车株式会社 | Information processing system, information processing method, and program |
CN114509157B (en) * | 2020-11-17 | 2024-04-05 | 丰田自动车株式会社 | Information processing system, information processing method, and program |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109274922A (en) | A kind of Video Conference Controlling System based on speech recognition | |
CN105390136B (en) | Vehicle arrangement control device and method for user's adaptive type service | |
US8204759B2 (en) | Social analysis in multi-participant meetings | |
JP5381988B2 (en) | Dialogue speech recognition system, dialogue speech recognition method, and dialogue speech recognition program | |
WO2021051506A1 (en) | Voice interaction method and apparatus, computer device and storage medium | |
CN110136749A (en) | The relevant end-to-end speech end-point detecting method of speaker and device | |
US20130211826A1 (en) | Audio Signals as Buffered Streams of Audio Signals and Metadata | |
CN109036412A (en) | voice awakening method and system | |
EP4394761A1 (en) | Audio signal processing method and apparatus, electronic device, and storage medium | |
CN116417003A (en) | Voice interaction system, method, electronic device and storage medium | |
CN104766608A (en) | Voice control method and voice control device | |
CN103377651A (en) | Device and method for automatic voice synthesis | |
EP2763136B1 (en) | Method and system for obtaining relevant information from a voice communication | |
CN106067996A (en) | Voice reproduction method, voice dialogue device | |
CN109215642A (en) | Processing method, device and the electronic equipment of man-machine conversation | |
CN115735357A (en) | Voting questions for teleconference discussion | |
CN110602334A (en) | Intelligent outbound method and system based on man-machine cooperation | |
CN112700767B (en) | Man-machine conversation interruption method and device | |
CN114360485B (en) | Voice processing method, system, device and medium | |
US11996114B2 (en) | End-to-end time-domain multitask learning for ML-based speech enhancement | |
KR102592613B1 (en) | Automatic interpretation server and method thereof | |
CN112634879B (en) | Voice conference management method, device, equipment and medium | |
Li et al. | Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech | |
CN116527840B (en) | Live conference intelligent subtitle display method and system based on cloud edge collaboration | |
CN115552517A (en) | Non-hotword preemption of automated assistant response presentations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190125 |