CN103530432A - Conference recorder with speech extracting function and speech extracting method - Google Patents
- Publication number
- CN103530432A (application CN201310439113.7A)
- Authority
- CN
- China
- Prior art keywords
- speaker
- voice
- module
- voice segments
- segments
- Prior art date: 2013-09-24
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a conference recorder with a speaker speech extracting function. The recorder comprises a main control module, a recording and playback module, a removable storage module, an interaction and display module, and a speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module. The main control module transmits the conference speech stream to the speaker segmentation module, which detects the speaker change points in the stream and divides the stream into a plurality of speech segments at those points. The speaker clustering module then clusters the segments by speaker using a spectral clustering method and splices the segments of each speaker together in sequence, thereby obtaining the number of speakers and the speech of each speaker. The conference recorder and the speech extracting method can automatically extract each speaker's speech from the conference recording; they are comprehensive in function and convenient to use.
Description
Technical field
The present invention relates to the field of audio processing, and in particular to a conference recorder with a speech extracting function and a speech extracting method.
Background art
Conference recorders currently on the market offer only simple functions such as recording, playback, and file transfer; they cannot analyze or understand the speech content of individual speakers. When reviewing a meeting recording, a user who wants to collect and process the speeches of a particular speaker must listen through the entire recording and manually judge whether each passage comes from the same speaker. Fast-forwarding saves time but risks missing useful information, so manually marking and extracting speech data is very inconvenient for the user.
Therefore, users wish that a conference recorder could not only record, play back, and transfer files, but also analyze and understand the recorded content; in particular, they wish that it could automatically extract each participant's speech from the conference audio data.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a conference recorder with a speech extracting function that not only records, plays back, and transfers files, but can also automatically extract each speaker's speech.
Another object of the present invention is to provide a speech extracting method that can determine the number of speakers and sort each speaker's speech.
The object of the present invention is achieved through the following technical scheme: a conference recorder with a speech extracting function comprises a main control module, a recording and playback module, a removable storage module, an interaction and display module, and a speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module, wherein
the speaker segmentation module: the main control module transmits the conference speech stream to the speaker segmentation module, which detects the speaker change points in the stream and divides the stream into a plurality of speech segments at those change points;
the speaker clustering module clusters the segmented speech segments by speaker using a spectral clustering method and splices the segments of each speaker together in sequence, obtaining the number of speakers and the speech of each speaker.
The speaker segmentation module comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module, wherein
the silence and speech segment detection module uses a threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream;
the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment;
the speaker change point detection module uses the extracted audio features and the Bayesian information criterion (BIC) to judge the similarity between adjacent data windows in the long speech segment and thereby detect speaker change points;
the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker.
In the silence and speech segment detection module, the threshold-based silence detection algorithm comprises the following steps in order (a minimal code sketch follows these steps):
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment.
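As a concrete illustration of steps (1)-(3), the following Python sketch frames the stream, computes per-frame energies, and splices adjacent frames into segments. The patent does not fix how the threshold in step (2) is derived; the midpoint rule below, and the 32 ms / 16 ms framing (borrowed from the windowing described later), are assumptions for illustration only.

```python
import numpy as np

def detect_silence(stream, sr, frame_ms=32, shift_ms=16):
    """Threshold-based silence detection; returns (silent, speech) segment lists.

    Each segment is a (start_sample, end_sample) pair. The threshold rule
    (midpoint of the frame-energy range) is an illustrative assumption.
    """
    frame_len = int(sr * frame_ms / 1000)
    shift = int(sr * shift_ms / 1000)
    # Step (1): frame the stream and compute the energy of every frame.
    starts = range(0, len(stream) - frame_len + 1, shift)
    energy = np.array([float(np.sum(stream[s:s + frame_len] ** 2)) for s in starts])
    # Step (2): energy threshold (assumed rule).
    threshold = 0.5 * (energy.min() + energy.max())
    # Step (3): label frames, then splice runs of same-label frames into segments.
    is_speech = energy >= threshold
    silent, speech, seg_start = [], [], 0
    for i in range(1, len(is_speech) + 1):
        if i == len(is_speech) or is_speech[i] != is_speech[seg_start]:
            seg = (seg_start * shift, (i - 1) * shift + frame_len)
            (speech if is_speech[seg_start] else silent).append(seg)
            seg_start = i
    return silent, speech
```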
In the audio feature extraction module, the audio features comprise Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficients, MFCCs) and their first-order differences (Delta-MFCCs), both of which are well-known features in the field.
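For concreteness, the 12 MFCCs plus their first-order differences used later in the description (32 ms frames, 16 ms shift) could be computed with the librosa library roughly as follows; the actual front-end of the device is not specified in the patent, so treat this as a sketch under those assumptions.

```python
import numpy as np
import librosa

def extract_features(segment, sr):
    """Per-frame 12 MFCCs + 12 Delta-MFCCs -> feature matrix of width d = 24."""
    n_fft = int(0.032 * sr)   # 32 ms frame length
    hop = int(0.016 * sr)     # 16 ms frame shift
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=12,
                                n_fft=n_fft, hop_length=hop)
    delta = librosa.feature.delta(mfcc)   # first-order differences
    return np.vstack([mfcc, delta]).T     # shape: (num_frames, 24)
```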
The recording and playback module comprises a microphone, a loudspeaker, and an audio processing chip.
The interaction and display module comprises a touch screen and its control circuit; it provides a graphical user interface with control functions and interacts with the user through the touch screen.
The removable storage module uses an SD card to store data.
Another object of the present invention is achieved through the following technical scheme: a speech extracting method comprising the following steps in order:
(1) read in the speech stream: read in a speech stream containing the speech of multiple speakers;
(2) process the input speech stream with the speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module;
(3) detect the speaker change points in the speech stream with the speaker segmentation module, and divide the stream into a plurality of speech segments at those change points;
(4) cluster the segmented speech segments by speaker with the speaker clustering module using spectral clustering, and splice the segments of each speaker together in sequence, obtaining the number of speakers and the speech of each speaker.
Step (3) specifically comprises the following steps:
a. the speaker segmentation module comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module;
b. the silence and speech segment detection module uses the threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream;
c. the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment;
d. the speaker change point detection module uses the extracted audio features and the Bayesian information criterion to judge the similarity between adjacent data windows in the long speech segment and thereby detect speaker change points;
e. the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker.
In step b, the threshold-based silence detection algorithm comprises the following steps in order:
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment.
In step c, the audio features comprise Mel-frequency cepstral coefficients and their first-order differences.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
a. Convenient and time-saving: after the conference recorder collects speech data through the recording and playback module, it can automatically process the audio, distinguish the individual speakers, and sort and store each speaker's speech, so that the user can directly select a particular speaker and that speaker's speech as needed.
b. Full-featured: the conference recorder retains the functions of an ordinary conference recorder, such as recording, playback, and file transfer; in addition, speech data obtained elsewhere can be copied onto the recorder through the removable storage module for analysis.
Brief description of the drawings
Fig. 1 is a block diagram of a conference recorder with a speaker speech extracting function according to the present invention;
Fig. 2 is a workflow diagram of the conference recorder of Fig. 1;
Fig. 3 is a flow chart of the speech extracting method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the drawings. As shown in Figs. 1 and 2, a conference recorder with a speaker speech extracting function comprises a main control module, a recording and playback module, a removable storage module, an interaction and display module, and a speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module, wherein
the recording and playback module comprises a microphone, a loudspeaker, and an audio processing chip;
the interaction and display module comprises a touch screen and its control circuit; it provides a graphical user interface with control functions and interacts with the user through the touch screen;
the removable storage module uses an SD card to store data;
the recording and playback module is responsible for capturing and playing the audio data;
the main control module issues instructions and coordinates the operation of the other modules; it is implemented on a microcomputer platform based on the Samsung S5PV210 processor running an embedded Linux system;
the speaker segmentation module: the main control module transfers the input speech stream containing the speech of multiple speakers to the speaker segmentation module, which detects the speaker change points in the stream and divides the stream into a plurality of speech segments at those change points; the speaker segmentation module specifically comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module, wherein
the silence and speech segment detection module uses the threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream, the algorithm comprising the following steps in order:
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment;
the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment, the audio features comprising Mel-frequency cepstral coefficients and their first-order differences;
In the speaker change point detection module, the method of determining speaker change points with the Bayesian information criterion specifically comprises the following steps (a code sketch follows step (4) below):
(1) splice the speech segments obtained by silence detection in sequence into one long speech segment, and cut the long segment into data windows with a window length of 2 seconds and a window shift of 0.1 seconds; divide each data window into frames with a frame length of 32 milliseconds and a frame shift of 16 milliseconds, and extract MFCC and Delta-MFCC features from each frame of the speech signal; the dimension M of the MFCCs and of the Delta-MFCCs is 12, so the features of each data window form a feature matrix F whose dimension d = 2M is 24;
(2) compute the BIC distance between two adjacent data windows x and y:

$$\Delta \mathrm{BIC} = \frac{n_x + n_y}{2} \log \det\big(\mathrm{cov}(F_z)\big) - \frac{n_x}{2} \log \det\big(\mathrm{cov}(F_x)\big) - \frac{n_y}{2} \log \det\big(\mathrm{cov}(F_y)\big) - \alpha P,$$

$$P = \frac{1}{2}\left(d + \frac{d(d+1)}{2}\right) \log (n_x + n_y),$$

where z is the data window obtained by merging data windows x and y; $n_x$ and $n_y$ are the numbers of frames in x and y, respectively; $F_x$, $F_y$ and $F_z$ are the feature matrices of x, y and z; $\mathrm{cov}(F_x)$, $\mathrm{cov}(F_y)$ and $\mathrm{cov}(F_z)$ are the covariance matrices of $F_x$, $F_y$ and $F_z$, respectively; $\det(\cdot)$ denotes the determinant of a matrix; P is the model-complexity penalty term; and $\alpha$ is the penalty coefficient, whose experimental value is 2.0;
(3) if the BIC distance $\Delta \mathrm{BIC}$ is greater than zero, the two data windows are regarded as belonging to two different speakers (i.e., a speaker change point exists between them); otherwise the two windows are regarded as belonging to the same speaker and are merged;
(4) keep sliding the data window, judging whether the BIC distance between the two adjacent data windows is greater than zero and saving the speaker change points, until the BIC distances between all adjacent data windows of the long speech segment have been judged;
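A minimal sketch of this change-point test, assuming each 2-second data window has already been turned into a feature matrix (one row per frame, d = 24 columns) by the extraction step above; the single pass over adjacent windows below is a simplified reading of steps (2)-(4), not the device firmware.

```python
import numpy as np

def bic_distance(Fx, Fy, alpha=2.0):
    """Delta-BIC between adjacent data windows x and y (rows = frames)."""
    Fz = np.vstack([Fx, Fy])                  # merged window z
    nx, ny, d = len(Fx), len(Fy), Fx.shape[1]
    logdet = lambda F: np.linalg.slogdet(np.cov(F, rowvar=False))[1]
    P = 0.5 * (d + d * (d + 1) / 2) * np.log(nx + ny)   # penalty term
    return (0.5 * (nx + ny) * logdet(Fz)
            - 0.5 * nx * logdet(Fx)
            - 0.5 * ny * logdet(Fy)
            - alpha * P)

def detect_change_points(windows):
    """Indices i such that a speaker change lies between window i and i+1."""
    return [i for i in range(len(windows) - 1)
            if bic_distance(windows[i], windows[i + 1]) > 0]
```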
the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker.
In the speaker clustering module, the spectral clustering method specifically comprises the following steps (a code sketch follows step (8) below):
(1) extract the MFCC and Delta-MFCC audio features from every frame of each speech segment, the dimension M of the MFCCs and of the Delta-MFCCs being 12 as above, so that the features of the j-th speech segment form a feature matrix $F_j$ of dimension $d = 2M$;
(2) from the individual feature matrices $F_j$, form the set $F = \{F_1, \ldots, F_J\}$ of all speech segments to be clustered, where J is the total number of speech segments, and construct from F the affinity matrix $A \in \mathbb{R}^{J \times J}$, whose (i, j)-th element $A_{ij}$ is defined as

$$A_{ij} = \exp\left(-\frac{d(F_i, F_j)^2}{\sigma_i \sigma_j}\right) \ (i \neq j), \qquad A_{ii} = 0,$$

where $d(F_i, F_j)$ is the Euclidean distance between feature matrices $F_i$ and $F_j$; the scale parameter $\sigma_i$ (or $\sigma_j$) is defined as the variance of the vector of Euclidean distances between the i-th (or j-th) feature matrix and the other J-1 feature matrices; T denotes the total number of frames into which the multi-party conference speech is divided; and i and j number the speech segments;
(3) construct the diagonal matrix D whose (i, i)-th element equals the sum of all elements in the i-th row of the affinity matrix A, and then construct from D and A the normalized affinity matrix $L = D^{-1/2} A D^{-1/2}$;
(4) compute the $K_{\max}$ largest eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{K_{\max}}$ of the matrix L and the corresponding eigenvectors $v_1, \ldots, v_{K_{\max}}$, each $v_k$ being a column vector, and estimate the number of speakers K from the differences between adjacent eigenvalues:

$$K = \arg\max_{1 \leq k \leq K_{\max} - 1} (\lambda_k - \lambda_{k+1});$$

according to the estimated number of speakers K, construct the matrix $V = [v_1, v_2, \ldots, v_K] \in \mathbb{R}^{J \times K}$, where $1 \leq k \leq K_{\max}$;
(5) normalize every row of the matrix V to obtain the matrix $Y \in \mathbb{R}^{J \times K}$, whose (j, k)-th element is

$$Y_{jk} = \frac{V_{jk}}{\left(\sum_{k'=1}^{K} V_{jk'}^2\right)^{1/2}};$$

(6) treat each row of the matrix Y as a point in the space $\mathbb{R}^K$ and cluster the rows into K classes with the K-means algorithm;
(7) when the j-th row of Y is clustered into the k-th class, the speech segment corresponding to the feature matrix $F_j$ is assigned to the k-th class, i.e., to the k-th speaker;
(8) from the above clustering result, obtain the number of speakers, each speaker's speech duration, and each speaker's number of speech segments.
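The steps above can be condensed into the following NumPy/SciPy sketch. Each segment is summarized by mean-pooling its feature matrix before the Euclidean distances are taken, and K is picked by the largest eigengap; both choices are assumptions where the patent leaves the details open.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster_segments(features, k_max=10):
    """Cluster speech segments by speaker; features = list of (frames, 24) arrays."""
    X = np.array([f.mean(axis=0) for f in features])  # step (1): one vector per segment (assumed pooling)
    J = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise Euclidean distances
    sigma = dist.var(axis=1)                                # scale parameters sigma_i
    A = np.exp(-dist ** 2 / np.outer(sigma, sigma))         # step (2): affinity matrix
    np.fill_diagonal(A, 0.0)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))      # step (3): L = D^-1/2 A D^-1/2
    L = D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)                          # eigenvalues, ascending
    vals, vecs = vals[::-1], vecs[:, ::-1]                  # largest first
    k_max = min(k_max, J)
    K = int(np.argmax(vals[:k_max - 1] - vals[1:k_max])) + 1  # step (4): eigengap rule
    V = vecs[:, :K]
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)        # step (5): row-normalize
    _, labels = kmeans2(Y, K, minit='++', seed=0)           # step (6): K-means on the rows
    return K, labels                                        # step (7): labels[j] = speaker of segment j
```

With K and labels in hand, step (8) follows directly: count the segments and sum the durations per label to obtain each speaker's speech duration and segment count.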
As shown in Fig. 2, the workflow of the conference recorder with a speaker speech extracting function is as follows:
1) the conference recorder is powered on and performs system initialization;
2) through the interaction and display module, the recorder displays the interactive interface;
3) the user chooses through the interactive interface whether to record:
if recording, the main control module instructs the recording and playback module to start recording, the recorded material is stored in the removable storage module, and the interactive interface is displayed again when recording ends;
if not recording, the user selects a recorded file through the interactive interface, and the main control module then instructs the speaker speech processing module, i.e., the speaker segmentation module and the speaker clustering module, to segment and cluster the speakers' speech and extract each speaker's speech;
4) the interactive interface then prompts the user to choose whether to play the original audio:
if yes, the original audio is played;
if no, the interface further asks whether to play a particular speaker's speech: if yes, the user selects that speaker and the speaker's speech is played; if no, the interface returns to the main screen.
As shown in Fig. 3, a speech extracting method comprises the following steps in order:
(1) read in the speech stream: read in a speech stream containing the speech of multiple speakers;
(2) process the input speech stream with the speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module;
(3) detect the speaker change points in the speech stream with the speaker segmentation module and divide the stream into a plurality of speech segments at those change points, which specifically comprises the following steps:
a. the speaker segmentation module comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module;
b. the silence and speech segment detection module uses the threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream, the algorithm comprising the following steps in order:
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment;
c. the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment, the audio features comprising Mel-frequency cepstral coefficients and their first-order differences;
d. the speaker change point detection module uses the extracted audio features and the Bayesian information criterion to judge the similarity between adjacent data windows in the long speech segment and thereby detect speaker change points;
e. the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker;
(4) the speaker clustering module clusters the segmented speech segments by speaker using spectral clustering and splices the segments of each speaker together in sequence, obtaining the number of speakers and the speech of each speaker.
The embodiment described above is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and shall fall within the protection scope of the present invention.
Claims (10)
1. A conference recorder with a speech extracting function, comprising a main control module, a recording and playback module, a removable storage module, and an interaction and display module, characterized in that it further comprises a speaker speech processing module, the speaker speech processing module comprising a speaker segmentation module and a speaker clustering module, wherein
the speaker segmentation module: the main control module transfers the conference speech stream to the speaker segmentation module, which detects the speaker change points in the conference speech stream and divides the stream into a plurality of speech segments at those change points;
the speaker clustering module clusters the segmented speech segments by speaker using spectral clustering and splices the segments of each speaker together in sequence, obtaining the number of speakers and the speech of each speaker.
2. The conference recorder with a speech extracting function according to claim 1, characterized in that the speaker segmentation module comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module, wherein
the silence and speech segment detection module uses a threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream;
the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment;
the speaker change point detection module uses the extracted audio features and the Bayesian information criterion to judge the similarity between adjacent data windows in the long speech segment and thereby detect speaker change points;
the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker.
3. The conference recorder with a speech extracting function according to claim 2, characterized in that, in the silence and speech segment detection module, the threshold-based silence detection algorithm comprises the following steps in order:
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment.
4. The conference recorder with a speech extracting function according to claim 2, characterized in that, in the audio feature extraction module, the audio features comprise Mel-frequency cepstral coefficients and their first-order differences.
5. The conference recorder with a speech extracting function according to claim 1, characterized in that the recording and playback module comprises a microphone, a loudspeaker, and an audio processing chip.
6. The conference recorder with a speech extracting function according to claim 1, characterized in that the interaction and display module comprises a touch screen and its control circuit, provides a graphical user interface with control functions, and interacts with the user through the touch screen.
7. The conference recorder with a speech extracting function according to claim 1, characterized in that the removable storage module uses an SD card to store data.
8. A speech extracting method, comprising the following steps in order:
(1) read in the speech stream: read in a speech stream containing the speech of multiple speakers;
(2) process the input speech stream with the speaker speech processing module, where the speaker speech processing module comprises a speaker segmentation module and a speaker clustering module;
(3) detect the speaker change points in the speech stream with the speaker segmentation module and divide the stream into a plurality of speech segments at those change points;
(4) cluster the segmented speech segments by speaker with the speaker clustering module using spectral clustering, and splice the segments of each speaker together in sequence, obtaining the number of speakers and the speech of each speaker.
9. The speech extracting method according to claim 8, characterized in that step (3) specifically comprises the following steps:
a. the speaker segmentation module comprises a silence and speech segment detection module, an audio feature extraction module, a speaker change point detection module, and a speech segment partitioning module;
b. the silence and speech segment detection module uses the threshold-based silence detection algorithm to find the silent segments and speech segments in the input speech stream;
c. the audio feature extraction module splices the above speech segments in sequence into one long speech segment and extracts audio features from the long segment;
d. the speaker change point detection module uses the extracted audio features and the Bayesian information criterion to judge the similarity between adjacent data windows in the long speech segment and thereby detect speaker change points;
e. the speech segment partitioning module divides the speech stream into a plurality of speech segments at the speaker change points, each segment containing only one speaker.
10. The speech extracting method according to claim 9, characterized in that, in step b, the threshold-based silence detection algorithm comprises the following steps in order:
(1) divide the input speech stream into frames, compute the energy of each frame, and obtain the energy feature vector of the stream;
(2) compute the energy threshold;
(3) compare the energy of each frame against the threshold: a frame whose energy is below the threshold is a silent frame, otherwise it is a speech frame; adjacent silent frames are spliced in sequence into a silent segment, and adjacent speech frames are spliced in sequence into a speech segment;
and in step c, the audio features comprise Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coefficients, MFCCs) and their first-order differences (Delta-MFCCs).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310439113.7A CN103530432A (en) | 2013-09-24 | 2013-09-24 | Conference recorder with speech extracting function and speech extracting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310439113.7A CN103530432A (en) | 2013-09-24 | 2013-09-24 | Conference recorder with speech extracting function and speech extracting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103530432A true CN103530432A (en) | 2014-01-22 |
Family
ID=49932441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310439113.7A Pending CN103530432A (en) | 2013-09-24 | 2013-09-24 | Conference recorder with speech extracting function and speech extracting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103530432A (en) |
- 2013-09-24: application CN201310439113.7A filed; published as CN103530432A (status: pending)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
CN101211615A (en) * | 2006-12-31 | 2008-07-02 | 于柏泉 | Method, system and apparatus for automatic recording for specific human voice |
CN102682760A (en) * | 2011-03-07 | 2012-09-19 | 株式会社理光 | Overlapped voice detection method and system |
CN102543063A (en) * | 2011-12-07 | 2012-07-04 | 华南理工大学 | Method for estimating speech speed of multiple speakers based on segmentation and clustering of speakers |
CN102968986A (en) * | 2012-11-07 | 2013-03-13 | 华南理工大学 | Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104021785A (en) * | 2014-05-28 | 2014-09-03 | 华南理工大学 | Method of extracting speech of most important guest in meeting |
CN104409080B (en) * | 2014-12-15 | 2018-09-18 | 北京国双科技有限公司 | Sound end detecting method and device |
CN104409080A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Voice end node detection method and device |
WO2016165346A1 (en) * | 2015-09-16 | 2016-10-20 | 中兴通讯股份有限公司 | Method and apparatus for storing and playing audio file |
CN105161093B (en) * | 2015-10-14 | 2019-07-09 | 科大讯飞股份有限公司 | A kind of method and system judging speaker's number |
CN105161093A (en) * | 2015-10-14 | 2015-12-16 | 科大讯飞股份有限公司 | Method and system for determining the number of speakers |
WO2017080235A1 (en) * | 2015-11-15 | 2017-05-18 | 乐视控股(北京)有限公司 | Audio recording editing method and recording device |
CN105895102A (en) * | 2015-11-15 | 2016-08-24 | 乐视移动智能信息技术(北京)有限公司 | Recording editing method and recording device |
CN106375182A (en) * | 2016-08-22 | 2017-02-01 | 腾讯科技(深圳)有限公司 | Voice communication method and device based on instant messaging application |
CN106375182B (en) * | 2016-08-22 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Voice communication method and device based on instant messaging application |
CN107886955A (en) * | 2016-09-29 | 2018-04-06 | 百度在线网络技术(北京)有限公司 | A kind of personal identification method, device and the equipment of voice conversation sample |
CN107886955B (en) * | 2016-09-29 | 2021-10-26 | 百度在线网络技术(北京)有限公司 | Identity recognition method, device and equipment of voice conversation sample |
CN106610451B (en) * | 2016-12-23 | 2019-01-04 | 杭州电子科技大学 | Based on the extraction of the periodic signal fundamental frequency of cepstrum and Bayesian decision and matching process |
WO2019183904A1 (en) * | 2018-03-29 | 2019-10-03 | 华为技术有限公司 | Method for automatically identifying different human voices in audio |
CN109599120A (en) * | 2018-12-25 | 2019-04-09 | 哈尔滨工程大学 | One kind being based on large-scale farming field factory mammal abnormal sound monitoring method |
CN109599120B (en) * | 2018-12-25 | 2021-12-07 | 哈尔滨工程大学 | Abnormal mammal sound monitoring method based on large-scale farm plant |
CN109767757A (en) * | 2019-01-16 | 2019-05-17 | 平安科技(深圳)有限公司 | A kind of minutes generation method and device |
CN109960743A (en) * | 2019-01-16 | 2019-07-02 | 平安科技(深圳)有限公司 | Conference content differentiating method, device, computer equipment and storage medium |
WO2020147407A1 (en) * | 2019-01-16 | 2020-07-23 | 平安科技(深圳)有限公司 | Conference record generation method and apparatus, storage medium and computer device |
CN110021302A (en) * | 2019-03-06 | 2019-07-16 | 厦门快商通信息咨询有限公司 | A kind of Intelligent office conference system and minutes method |
CN110197665A (en) * | 2019-06-25 | 2019-09-03 | 广东工业大学 | A kind of speech Separation and tracking for police criminal detection monitoring |
CN110517667A (en) * | 2019-09-03 | 2019-11-29 | 龙马智芯(珠海横琴)科技有限公司 | A kind of method of speech processing, device, electronic equipment and storage medium |
CN110517694A (en) * | 2019-09-06 | 2019-11-29 | 北京清帆科技有限公司 | A kind of teaching scene voice conversion detection system |
CN110689906A (en) * | 2019-11-05 | 2020-01-14 | 江苏网进科技股份有限公司 | Law enforcement detection method and system based on voice processing technology |
CN110930984A (en) * | 2019-12-04 | 2020-03-27 | 北京搜狗科技发展有限公司 | Voice processing method and device and electronic equipment |
CN111883159A (en) * | 2020-08-05 | 2020-11-03 | 龙马智芯(珠海横琴)科技有限公司 | Voice processing method and device |
CN111968657A (en) * | 2020-08-17 | 2020-11-20 | 北京字节跳动网络技术有限公司 | Voice processing method and device, electronic equipment and computer readable medium |
CN112053691A (en) * | 2020-09-21 | 2020-12-08 | 广东迷听科技有限公司 | Conference assisting method and device, electronic equipment and storage medium |
CN112165599A (en) * | 2020-10-10 | 2021-01-01 | 广州科天视畅信息科技有限公司 | Automatic conference summary generation method for video conference |
CN112382282A (en) * | 2020-11-06 | 2021-02-19 | 北京五八信息技术有限公司 | Voice denoising processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103530432A (en) | Conference recorder with speech extracting function and speech extracting method | |
CN105405439B (en) | Speech playing method and device | |
CN107274916B (en) | Method and device for operating audio/video file based on voiceprint information | |
Heittola et al. | Supervised model training for overlapping sound events based on unsupervised source separation | |
US9514751B2 (en) | Speech recognition device and the operation method thereof | |
US8793127B2 (en) | Method and apparatus for automatically determining speaker characteristics for speech-directed advertising or other enhancement of speech-controlled devices or services | |
Temko et al. | Acoustic event detection in meeting-room environments | |
US8867891B2 (en) | Video concept classification using audio-visual grouplets | |
US7263485B2 (en) | Robust detection and classification of objects in audio using limited training data | |
EP2642427A2 (en) | Video concept classification using temporally-correlated grouplets | |
US20130089304A1 (en) | Video concept classification using video similarity scores | |
CN101470897B (en) | Sensitive film detection method based on audio/video amalgamation policy | |
US20060224438A1 (en) | Method and device for providing information | |
Imoto | Introduction to acoustic event and scene analysis | |
JP2004229283A (en) | Method for identifying transition of news presenter in news video | |
KR100792016B1 (en) | Apparatus and method for character based video summarization by audio and video contents analysis | |
Lailler et al. | Semi-supervised and unsupervised data extraction targeting speakers: From speaker roles to fame? | |
CN104021785A (en) | Method of extracting speech of most important guest in meeting | |
CN103559882A (en) | Meeting presenter voice extracting method based on speaker division | |
WO2023088448A1 (en) | Speech processing method and device, and storage medium | |
CN107025913A (en) | A kind of way of recording and terminal | |
JP4759745B2 (en) | Video classification device, video classification method, video classification program, and computer-readable recording medium | |
CN109997186B (en) | Apparatus and method for classifying acoustic environments | |
CN102509548B (en) | Audio indexing method based on multi-distance sound sensor | |
Ellis et al. | Accessing minimal-impact personal audio archives |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20140122 |