WO2017113680A1 - Voiceprint authentication processing method and apparatus - Google Patents
Voiceprint authentication processing method and apparatus
- Publication number
- WO2017113680A1 (application PCT/CN2016/088435)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voiceprint
- model
- gender
- feature vector
- voice
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/18—Artificial neural networks; Connectionist approaches
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0861—Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
Definitions
- the present application relates to the field of voiceprint authentication technologies, and in particular, to a voiceprint authentication processing method and apparatus.
- Voiceprint Recognition (VPR) can confirm whether a given utterance was spoken by a designated person, for example when clocking in for attendance or when confirming a user's identity by voice during banking transactions. Before voiceprint recognition can be performed, the speaker's voiceprint must first be modeled; this is the so-called "training" or "learning" process.
- in the current training process of voiceprint recognition, voiceprints are trained and recognized through a single, gender-independent model, so the accuracy is not high.
- the present application aims to solve at least one of the technical problems in the related art to some extent.
- the first object of the present application is to propose a voiceprint authentication processing method, which establishes a gender-based voiceprint authentication processing model in order to improve the efficiency and accuracy of voiceprint authentication.
- a second object of the present application is to provide a voiceprint authentication processing apparatus.
- a third object of the present application is to provide a storage medium.
- a fourth object of the present application is to provide a voiceprint authentication processing device.
- the first aspect of the present application provides a voiceprint authentication processing method, including: applying a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders respectively according to the voice data of different genders in the training set; and training a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis (PLDA) model for each gender respectively, according to the gender-specific DNN models and the voice data of different genders in the training set.
- the voiceprint authentication processing method of the embodiments of the present application extracts a first feature vector of each voice in the training set by applying a gender-mixed deep neural network (DNN) voiceprint baseline system; trains a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; trains DNN models of different genders according to the voice data of different genders in the training set; and, according to the gender-specific DNN models and the voice data of different genders in the training set, trains a universal background model, a feature vector extraction model, and a PLDA model for each gender. A gender-based voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
- the second aspect of the present application provides a voiceprint authentication processing apparatus, including: an extraction module configured to apply a gender-mixed deep neural network (DNN) voiceprint baseline system and extract a first feature vector of each voice in the training set; a generating module configured to train a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; a first training module configured to train DNN models of different genders respectively according to the voice data of different genders in the training set; and a second training module configured to train a universal background model, a feature vector extraction model, and a PLDA model for each gender, according to the gender-specific DNN models and the voice data of different genders in the training set.
- the voiceprint authentication processing apparatus of the embodiments of the present application extracts a first feature vector of each voice in the training set by applying a gender-mixed DNN voiceprint baseline system; trains a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; trains DNN models of different genders according to the voice data of different genders in the training set; and, according to the gender-specific DNN models and the voice data of different genders in the training set, trains a universal background model, a feature vector extraction model, and a PLDA model for each gender. A gender-based voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
- a storage medium according to a third aspect of the present application is configured to store an application program for performing the voiceprint authentication processing method of the first aspect.
- a voiceprint authentication processing device according to a fourth aspect of the present application includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations: applying a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders respectively according to the voice data of different genders in the training set; and training a universal background model, a feature vector extraction model, and a PLDA model for each gender, according to the gender-specific DNN models and the voice data of different genders in the training set.
- FIG. 1 is a flowchart of a voiceprint authentication processing method according to an embodiment of the present application.
- FIG. 2 is a schematic diagram of the generation of a gender classifier.
- FIG. 3 is a schematic diagram of the generation of a male voiceprint authentication processing model.
- FIG. 4 is a schematic diagram of the generation of a female voiceprint authentication processing model.
- FIG. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
- FIG. 6 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application.
- FIG. 1 is a flow chart of a voiceprint authentication processing method according to an embodiment of the present application.
- the voiceprint authentication processing method includes:
- Step 101: Apply a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in the training set.
- Step 102: Train the gender classifier according to the first feature vector of each voice and its pre-labeled gender tag.
- to establish the gender-based voiceprint authentication processing model, it is first necessary to apply the gender-mixed DNN voiceprint baseline system to train a gender classifier, so that the trained gender classifier can identify the gender of an input voice and assign it a gender tag.
- Figure 2 is a schematic diagram of the generation of the gender classifier. Referring to Figure 2, the gender-mixed DNN voiceprint baseline system is applied to generate the gender classifier as follows:
- a training set containing a plurality of voices is preset, and each piece of voice data in the training set is pre-labeled with corresponding gender information; for example, the gender corresponding to one piece of voice data is labeled male, and the gender corresponding to another is labeled female.
- Each piece of speech data in the training set is input into the gender-mixed DNN voiceprint baseline system, which processes each piece of speech data and extracts the first feature vector corresponding to each voice.
- the gender classifier is then trained according to the first feature vector of each voice and the pre-labeled gender of each voice, so that the trained classifier can identify the gender of an input voice and assign it a gender tag.
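- as a concrete illustration of steps 101-102, the minimal sketch below trains a gender classifier on the first feature vectors; it assumes the baseline system has already reduced each training voice to a fixed-length vector, and the logistic-regression choice is an illustrative stand-in rather than the patent's specified classifier:

```python
# Minimal sketch of steps 101-102 (illustrative, not the patent's
# exact classifier): train a gender classifier from the first
# feature vectors produced by the gender-mixed DNN baseline system.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_gender_classifier(first_feature_vectors, gender_tags):
    """first_feature_vectors: (n_voices, dim) array, one vector per voice.
    gender_tags: length-n_voices array of pre-labeled tags, 0=male, 1=female."""
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(first_feature_vectors, gender_tags)
    return classifier

# Later, the trained classifier assigns a gender tag to an input voice:
#   tag = classifier.predict(first_vector.reshape(1, -1))[0]
```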
- Step 103: Train DNN models of different genders according to the voice data of different genders in the training set.
- specifically, the DNN models of different genders are trained respectively according to the speech data of each gender in the training set and a preset deep neural network algorithm; that is, a male DNN model and a female DNN model are trained separately.
- the male DNN model is configured to receive male voice data, and output a posterior probability corresponding to the male voice data
- the female DNN model is configured to receive female voice data, and output a posterior probability corresponding to the female voice data.
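- as a toy illustration of the posterior probabilities these gender-specific DNN models output, a forward pass might look like the sketch below; the two-layer architecture, random weights, and all dimensions are hypothetical placeholders, not the patent's network:

```python
# Toy forward pass of a gender-specific DNN: maps acoustic frames to
# per-frame posterior probabilities via one hidden layer and softmax.
# Weights and dimensions are illustrative placeholders only.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((40, 64)) * 0.1, np.zeros(64)    # 40-dim frames in
W2, b2 = rng.standard_normal((64, 256)) * 0.1, np.zeros(256)  # 256 output classes

def dnn_posteriors(frames):
    """frames: (n_frames, 40) features -> (n_frames, 256) posteriors."""
    hidden = np.maximum(frames @ W1 + b1, 0.0)     # ReLU hidden layer
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)    # softmax: rows sum to 1
```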
- Step 104: According to the gender-specific DNN models and the voice data of different genders in the training set, train a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis (PLDA) model for each gender.
- that is, using each gender's DNN model and that gender's voice data in the training set, a universal background model, a feature vector extraction model, and a PLDA model are trained separately for each gender.
- the feature vector extraction model is configured to receive the posterior probabilities output by the DNN model together with the voice data input by the user, and to extract a second feature vector of the voice data according to a preset algorithm;
- the probabilistic linear discriminant analysis model is configured to compare the similarity between the second feature vector of the voice data input by the user and a pre-stored voiceprint registration template.
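- a runnable toy of step 104 is sketched below, with off-the-shelf stand-ins for the patent's components: a Gaussian mixture model plays the universal background model, and PCA over its posteriors plays the feature vector extraction model. Both substitutions are assumptions made purely for illustration, not the patent's actual algorithms:

```python
# Toy stand-ins for step 104, trained once per gender: a GMM as the
# universal background model and PCA over per-frame posteriors as
# the feature-vector (i-vector-like) extraction model. Illustrative
# substitutes only; the patent does not specify these estimators.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

def train_models_for_gender(frames, n_components=8, vector_dim=4):
    """frames: (n_frames, feat_dim) acoustic features of one gender."""
    ubm = GaussianMixture(n_components=n_components, random_state=0).fit(frames)
    posteriors = ubm.predict_proba(frames)          # per-frame posteriors
    extractor = PCA(n_components=vector_dim).fit(posteriors)
    return ubm, extractor

def extract_second_feature_vector(ubm, extractor, utterance_frames):
    """Collapse one utterance into a fixed-length 'second feature vector'."""
    posteriors = ubm.predict_proba(utterance_frames)
    return extractor.transform(posteriors).mean(axis=0)

# One model set per gender, trained only on that gender's speech:
#   models = {g: train_models_for_gender(frames_by_gender[g])
#             for g in ("male", "female")}
```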
- FIG. 3 is a schematic diagram of the generation of a male voiceprint authentication processing model. Referring to Figure 3, the details are as follows:
- the male DNN model processes the male speech data in the training set to output posterior probabilities, and the output posterior probabilities are normalized to train the universal background model in the male voiceprint authentication processing model.
- the posterior probabilities output by the DNN model and the male voice data are used to extract second feature vectors of the male voice data according to a preset algorithm, training the feature vector extraction model in the male voiceprint authentication processing model.
- the similarity between the second feature vectors of the male voice data and a pre-stored male voiceprint registration template is then compared, training the probabilistic linear discriminant analysis model in the male voiceprint authentication processing model.
- FIG. 4 is a schematic diagram of the generation of the female voiceprint authentication processing model. Referring to Figure 4, the details are as follows:
- the female DNN model processes the female speech data in the training set to output posterior probabilities, and the output posterior probabilities are normalized to train the universal background model in the female voiceprint authentication processing model.
- the posterior probabilities output by the DNN model and the female voice data are used to extract second feature vectors of the female voice data according to a preset algorithm, training the feature vector extraction model in the female voiceprint authentication processing model.
- the similarity between the second feature vectors of the female voice data and a pre-stored female voiceprint registration template is then compared, training the probabilistic linear discriminant analysis model in the female voiceprint authentication processing model.
- the voiceprint authentication processing method of this embodiment applies a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in the training set; trains a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; trains DNN models of different genders according to the voice data of different genders in the training set; and, according to the gender-specific DNN models and the voice data of different genders in the training set, trains a universal background model, a feature vector extraction model, and a PLDA model for each gender. A gender-based voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
- FIG. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
- the voiceprint authentication processing method further includes the following voiceprint registration steps:
- Step 201: Receive a voiceprint registration request that is sent by a user and carries a user identifier.
- Step 202: Acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information.
- a user who needs to perform voiceprint authentication needs to perform voiceprint registration in the voiceprint authentication processing model in advance.
- the user needs to send a voiceprint registration request carrying the user identification to the voiceprint authentication processing model.
- the voiceprint authentication processing model After receiving the voiceprint registration request sent by the user and carrying the user identifier, the voiceprint authentication processing model prompts the user to input the voice. The user transmits a plurality of voices for voiceprint registration to the voiceprint authentication processing model.
- the voiceprint authentication processing model extracts first feature information of the first voice and transmits the first feature information to a pre-generated gender classifier.
- the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the first voice.
- Step 203: Acquire a posterior probability of each voice according to the DNN model corresponding to the gender tag.
- Step 204: Extract a second feature vector of each voice according to the universal background model and the feature vector extraction model corresponding to the gender tag.
- Step 205: Acquire a voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices.
- Step 206: Store the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database.
- the plurality of voices input by the user are sent to the DNN model of the corresponding gender according to the gender tag that the gender classifier returned for the first voice; that is, if the first voice is classified as male, all of the voices are sent to the male DNN model, and if it is classified as female, they are sent to the female DNN model.
- a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
- the feature vector extraction model extracts a second feature vector of each speech according to each speech and a corresponding normalized posterior probability.
- An average feature vector of the plurality of second feature vectors is obtained as a voiceprint registration model of the user.
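- in code, this averaging step (step 205) is essentially a one-liner; the sketch below assumes the second feature vectors are already available as NumPy arrays:

```python
# Step 205 in miniature: the user's voiceprint registration model is
# the mean of the second feature vectors of the enrollment voices.
import numpy as np

def build_registration_model(second_feature_vectors):
    """second_feature_vectors: list of (dim,) arrays, one per voice."""
    return np.mean(np.stack(second_feature_vectors), axis=0)
```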
- the correspondence between the user identifier requested by the user, the gender label of the user, and the voiceprint registration model is stored in the voiceprint registration database, so that the voiceprint recognition is performed according to the voiceprint registration model.
- in the voiceprint registration process of this embodiment, the gender classifier is first applied to obtain the gender tag of the first voice input by the user; the posterior probability of each voice is obtained according to the DNN model corresponding to that gender tag; the universal background model and feature vector extraction model corresponding to the gender tag are used to extract a second feature vector of each voice; the voiceprint registration model of the user is acquired according to the plurality of second feature vectors; and the correspondence among the user identifier, the gender tag, and the voiceprint registration model is stored in the voiceprint registration database.
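- a minimal in-memory stand-in for the voiceprint registration database of step 206 could look like the sketch below; the dictionary is purely illustrative, and a real deployment would use persistent storage:

```python
# Step 206 in miniature: map each user identifier to its gender tag
# and voiceprint registration model. In-memory dict for illustration.
voiceprint_registration_db = {}

def register(user_id, gender_tag, registration_model):
    voiceprint_registration_db[user_id] = (gender_tag, registration_model)

def lookup(user_id):
    """Used at recognition time (step 302) to fetch the stored pair."""
    return voiceprint_registration_db[user_id]
```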
- FIG. 6 is a flow chart of a voiceprint authentication processing method according to another embodiment of the present application.
- the voiceprint authentication processing method includes:
- Step 301: Receive a voiceprint recognition request that is sent by a user and carries a user identifier.
- Step 302: Query the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier.
- the user who needs to perform voiceprint recognition needs to input a user identifier in the voiceprint authentication processing model and send a voiceprint recognition request carrying the user identifier.
- the voiceprint recognition request sent by the user is parsed to obtain the user identifier, and the voiceprint registration database is queried to obtain the gender tag and voiceprint registration model corresponding to that identifier.
- Step 303: Acquire the voice sent by the user for voiceprint recognition, and obtain a posterior probability of the voice according to the DNN model corresponding to the gender tag.
- the voice sent by the user for voiceprint recognition is acquired and sent to the DNN model corresponding to the user's gender tag; the DNN model processes the voice to obtain its posterior probability.
- Step 304: Apply the universal background model and feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice.
- the posterior probabilities of the voice are sent to the universal background model corresponding to the gender tag, which normalizes each posterior probability; the pre-trained feature vector extraction model is then applied to extract the second feature vector of the voice from the voice and its normalized posterior probabilities.
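- the normalization performed here can be sketched as follows; this is a simplified assumption, since practical systems typically normalize Baum-Welch statistics against the universal background model rather than the bare posteriors:

```python
# Simplified view of the posterior normalization in step 304: floor
# the per-frame posteriors and rescale each row to sum to one before
# the feature vector extraction model consumes them.
import numpy as np

def normalize_posteriors(posteriors, floor=1e-10):
    """posteriors: (n_frames, n_components) -> row-normalized copy."""
    floored = np.maximum(posteriors, floor)
    return floored / floored.sum(axis=1, keepdims=True)
```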
- Step 305: Apply the probabilistic linear discriminant analysis (PLDA) model corresponding to the gender tag to compare the similarity between the second feature vector of the voice and the voiceprint registration model.
- Step 306: Return a voiceprint recognition result to the user according to the similarity and a preset threshold.
- the second feature vector of the voice is sent to the probabilistic linear discriminant analysis model corresponding to the gender tag, and the PLDA model compares the similarity between the second feature vector of the voice and the user's pre-stored voiceprint registration model.
- if the similarity is greater than or equal to the preset threshold, a result indicating that voiceprint recognition succeeded is returned;
- if the similarity is less than the preset threshold, a result indicating that voiceprint recognition failed is returned.
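- the decision of steps 305-306 can be sketched as below, with cosine similarity standing in for the PLDA score; this substitution and the threshold value are assumptions for illustration, since the patent specifies a PLDA model and leaves the threshold preset:

```python
# Steps 305-306 in miniature: score the voice's second feature vector
# against the stored registration model and compare with a preset
# threshold. Cosine similarity is an illustrative stand-in for PLDA.
import numpy as np

def recognize(second_vector, registration_model, threshold=0.75):
    similarity = float(
        np.dot(second_vector, registration_model)
        / (np.linalg.norm(second_vector) * np.linalg.norm(registration_model))
    )
    return "success" if similarity >= threshold else "failure"
```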
- the voiceprint authentication processing method of this embodiment first queries the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier; applies the universal background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice; applies the PLDA model to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and returns the voiceprint recognition result to the user according to the similarity and the preset threshold. Gender-based voiceprint recognition is thereby implemented, improving the efficiency and accuracy of voiceprint authentication.
- the present application also proposes a voiceprint authentication processing apparatus.
- FIG. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application.
- the voiceprint authentication processing apparatus includes:
- the extraction module 11 is configured to apply a gender-mixed deep neural network (DNN) voiceprint baseline system and extract a first feature vector of each voice in the training set;
- the generating module 12 is configured to train the gender classifier according to the first feature vector of each voice and the pre-labeled gender tag;
- the first training module 13 is configured to separately train DNN models of different genders according to voice data of different genders in the training set;
- the second training module 14 is configured to train a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis model for each gender, according to the gender-specific DNN models and the voice data of different genders in the training set.
- the voiceprint authentication processing apparatus of the embodiments of the present application extracts a first feature vector of each voice in the training set by applying a gender-mixed deep neural network (DNN) voiceprint baseline system; trains a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; trains DNN models of different genders according to the voice data of different genders in the training set; and, according to the gender-specific DNN models and the voice data of different genders in the training set, trains a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis model for each gender. A gender-based voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
- FIG. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in FIG. 8, on the basis of the embodiment shown in FIG. 7, the apparatus further includes:
- the first receiving module 15 is configured to receive a voiceprint registration request that is sent by the user and that carries the user identifier.
- the gender labeling module 16 is configured to acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain the gender tag of the first feature information;
- the first processing module 17 is configured to obtain a posterior probability of each voice according to the DNN model corresponding to the gender tag, and to extract a second feature vector of each voice according to the universal background model and feature vector extraction model corresponding to the gender tag;
- the obtaining module 18 is configured to acquire the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices;
- the registration module 19 is configured to store the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database.
- the obtaining module 18 is specifically configured to obtain an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
- the voiceprint authentication processing apparatus of the embodiments of the present application first applies the gender classifier to obtain the gender tag of the first voice input by the user; obtains the posterior probability of each voice according to the DNN model corresponding to that gender tag; extracts a second feature vector of each voice according to the universal background model and feature vector extraction model corresponding to the gender tag; acquires the voiceprint registration model of the user according to the plurality of second feature vectors; and stores the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database.
- a gender-based voiceprint registration process is thereby implemented, so that the gender-specific voiceprint authentication processing model can be applied to improve the efficiency and accuracy of voiceprint authentication.
- FIG. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in FIG. 9, on the basis of the embodiment shown in FIG. 8, the apparatus further includes:
- the second receiving module 20 is configured to receive a voiceprint recognition request that is sent by the user and carries a user identifier.
- the query module 21 is configured to query the voiceprint registration database to obtain a gender label and a voiceprint registration model corresponding to the user identifier.
- the second processing module 22 is configured to acquire the voice sent by the user for voiceprint recognition, obtain a posterior probability of the voice according to the DNN model corresponding to the gender tag, and apply the universal background model and feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice;
- the comparison module 23 is configured to apply the probabilistic linear discriminant analysis model corresponding to the gender tag and compare the similarity between the second feature vector of the voice and the voiceprint registration model;
- the identification module 24 is configured to return a voiceprint recognition result to the user according to the similarity and a preset threshold.
- the identification module 24 is specifically configured to compare the similarity with the preset threshold; if the similarity is greater than or equal to the preset threshold, return a result that voiceprint recognition succeeded; and if the similarity is less than the preset threshold, return a result that voiceprint recognition failed.
- the voiceprint authentication processing apparatus of the embodiments of the present application first queries the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier; applies the universal background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice; applies the probabilistic linear discriminant analysis model to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and returns the voiceprint recognition result to the user according to the similarity and the preset threshold. Gender-based voiceprint recognition is thereby implemented, improving the efficiency and accuracy of voiceprint authentication.
- a storage medium according to an embodiment of the present application is configured to store an application program for performing the voiceprint authentication processing method described in the first aspect.
- the voiceprint authentication processing device of the fourth aspect of the present application includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the voiceprint authentication processing method described in the first aspect.
- the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
- a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature.
- the meaning of "a plurality" is at least two, such as two, three, and so on, unless specifically defined otherwise.
- a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
- computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM).
- the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
- portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
- multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
- for example, if implemented in hardware, as in another embodiment, implementation may be with any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
- each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
- the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
- the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are illustrative and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.
Claims (12)
- A voiceprint authentication processing method, characterized by comprising the following steps: applying a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders respectively according to the voice data of different genders in the training set; and respectively training a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis model for each gender, according to the DNN models of different genders and the voice data of different genders in the training set.
- The method according to claim 1, characterized by further comprising: receiving a voiceprint registration request carrying a user identifier sent by a user; acquiring a plurality of voices sent by the user for voiceprint registration, extracting first feature information of the first voice, and applying the gender classifier to obtain a gender tag of the first feature information; acquiring a posterior probability of each voice according to the DNN model corresponding to the gender tag; respectively extracting a second feature vector of each voice according to the universal background model and the feature vector extraction model corresponding to the gender tag; acquiring a voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices; and storing the correspondence among the user identifier, the gender tag, and the voiceprint registration model in a voiceprint registration database.
- The method according to claim 2, characterized in that acquiring the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices comprises: obtaining an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
- The method according to claim 2 or 3, characterized by further comprising: receiving a voiceprint recognition request carrying a user identifier sent by a user; querying the voiceprint registration database to obtain the gender tag and the voiceprint registration model corresponding to the user identifier; acquiring a voice sent by the user for voiceprint recognition, and acquiring a posterior probability of the voice according to the DNN model corresponding to the gender tag; applying the universal background model and the feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice; applying the probabilistic linear discriminant analysis model corresponding to the gender tag to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and returning a voiceprint recognition result to the user according to the similarity and a preset threshold.
- The method according to claim 4, characterized in that returning a voiceprint recognition result to the user according to the similarity and a preset threshold comprises: comparing the similarity with the preset threshold; if the similarity is greater than or equal to the preset threshold, returning that voiceprint recognition succeeded; and if the similarity is less than the preset threshold, returning that voiceprint recognition failed.
- A voiceprint authentication processing apparatus, characterized by comprising: an extraction module configured to apply a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; a generating module configured to train a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; a first training module configured to train DNN models of different genders respectively according to the voice data of different genders in the training set; and a second training module configured to respectively train a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis model for each gender, according to the DNN models of different genders and the voice data of different genders in the training set.
- The apparatus according to claim 6, characterized by further comprising: a first receiving module configured to receive a voiceprint registration request carrying a user identifier sent by a user; a gender labeling module configured to acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information; a first processing module configured to acquire a posterior probability of each voice according to the DNN model corresponding to the gender tag, and to respectively extract a second feature vector of each voice according to the universal background model and the feature vector extraction model corresponding to the gender tag; an obtaining module configured to acquire a voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices; and a registration module configured to store the correspondence among the user identifier, the gender tag, and the voiceprint registration model in a voiceprint registration database.
- The apparatus according to claim 7, characterized in that the obtaining module is configured to obtain an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
- The apparatus according to claim 7 or 8, characterized by further comprising: a second receiving module configured to receive a voiceprint recognition request carrying a user identifier sent by a user; a query module configured to query the voiceprint registration database to obtain the gender tag and the voiceprint registration model corresponding to the user identifier; a second processing module configured to acquire a voice sent by the user for voiceprint recognition, acquire a posterior probability of the voice according to the DNN model corresponding to the gender tag, and apply the universal background model and the feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice; a comparison module configured to apply the probabilistic linear discriminant analysis model corresponding to the gender tag to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and an identification module configured to return a voiceprint recognition result to the user according to the similarity and a preset threshold.
- The apparatus according to claim 9, characterized in that the identification module is configured to: compare the similarity with the preset threshold; if the similarity is greater than or equal to the preset threshold, return that voiceprint recognition succeeded; and if the similarity is less than the preset threshold, return that voiceprint recognition failed.
- A storage medium, characterized by being used to store an application program, the application program being configured to execute the voiceprint authentication processing method according to any one of claims 1 to 5.
- A voiceprint authentication processing device, characterized by comprising: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations: applying a gender-mixed deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders respectively according to the voice data of different genders in the training set; and respectively training a universal background model, a feature vector extraction model, and a probabilistic linear discriminant analysis model for each gender, according to the DNN models of different genders and the voice data of different genders in the training set.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017519504A JP6682523B2 (ja) | 2015-12-30 | 2016-07-04 | Voiceprint authentication processing method and apparatus |
US15/501,292 US10685658B2 (en) | 2015-12-30 | 2016-07-04 | Method and device for processing voiceprint authentication |
EP16829225.8A EP3296991B1 (en) | 2015-12-30 | 2016-07-04 | Method and device for voiceprint authentication processing |
KR1020177002005A KR101870093B1 (ko) | 2015-12-30 | 2016-07-04 | Voiceprint authentication processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511024873.7 | 2015-12-30 | ||
CN201511024873.7A CN105513597B (zh) | 2015-12-30 | 2015-12-30 | Voiceprint authentication processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017113680A1 true WO2017113680A1 (zh) | 2017-07-06 |
Family
ID=55721524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/088435 WO2017113680A1 (zh) | 2015-12-30 | 2016-07-04 | Voiceprint authentication processing method and apparatus |
Country Status (6)
Country | Link |
---|---|
US (1) | US10685658B2 (zh) |
EP (1) | EP3296991B1 (zh) |
JP (1) | JP6682523B2 (zh) |
KR (1) | KR101870093B1 (zh) |
CN (1) | CN105513597B (zh) |
WO (1) | WO2017113680A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545227A (zh) * | 2018-04-28 | 2019-03-29 | 华中师范大学 | Method and system for automatic speaker gender recognition based on a deep auto-encoder network |
CN110136726A (zh) * | 2019-06-20 | 2019-08-16 | 厦门市美亚柏科信息股份有限公司 | Voice gender estimation method, apparatus, system, and storage medium |
WO2019214047A1 (zh) * | 2018-05-08 | 2019-11-14 | 平安科技(深圳)有限公司 | Method and apparatus for establishing a voiceprint model, computer device, and storage medium |
CN111241512A (zh) * | 2020-01-09 | 2020-06-05 | 珠海格力电器股份有限公司 | Message information broadcasting method and apparatus, electronic device, and storage medium |
CN114141255A (zh) * | 2021-11-24 | 2022-03-04 | 中国电信股份有限公司 | Training method and apparatus for a voiceprint recognition model, and voiceprint recognition method and apparatus |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9875742B2 (en) * | 2015-01-26 | 2018-01-23 | Verint Systems Ltd. | Word-level blind diarization of recorded calls with arbitrary number of speakers |
CN105513597B (zh) * | 2015-12-30 | 2018-07-10 | 百度在线网络技术(北京)有限公司 | Voiceprint authentication processing method and apparatus |
CN107346568B (zh) * | 2016-05-05 | 2020-04-17 | 阿里巴巴集团控股有限公司 | Authentication method and apparatus for an access control system |
EP3460791A4 (en) * | 2016-05-16 | 2019-05-22 | Sony Corporation | INFORMATION PROCESSING DEVICE |
CN106297807B (zh) * | 2016-08-05 | 2019-03-01 | 腾讯科技(深圳)有限公司 | Method and apparatus for training a voiceprint recognition system |
CN106710599A (zh) * | 2016-12-02 | 2017-05-24 | 深圳撒哈拉数据科技有限公司 | Specific sound source detection method and system based on a deep neural network |
CN106710604A (zh) * | 2016-12-07 | 2017-05-24 | 天津大学 | Formant enhancement apparatus and method for improving speech intelligibility |
CN107610707B (zh) * | 2016-12-15 | 2018-08-31 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus |
CN108288470B (zh) * | 2017-01-10 | 2021-12-21 | 富士通株式会社 | Voiceprint-based identity verification method and apparatus |
CN108573698B (zh) * | 2017-03-09 | 2021-06-08 | 中国科学院声学研究所 | Speech noise reduction method based on gender fusion information |
AU2017305006A1 (en) * | 2017-06-13 | 2019-01-03 | Beijing Didi Infinity Technology And Development Co., Ltd. | Method, apparatus and system for speaker verification |
CN107610709B (zh) * | 2017-08-01 | 2021-03-19 | 百度在线网络技术(北京)有限公司 | Method and system for training a voiceprint recognition model |
CN107623614B (zh) * | 2017-09-19 | 2020-12-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for pushing information |
CN108694954A (zh) * | 2018-06-13 | 2018-10-23 | 广州势必可赢网络科技有限公司 | Gender and age recognition method, apparatus, device, and readable storage medium |
CN109036436A (zh) * | 2018-09-18 | 2018-12-18 | 广州势必可赢网络科技有限公司 | Voiceprint database establishment method, voiceprint recognition method, apparatus, and system |
JP7326033B2 (ja) * | 2018-10-05 | 2023-08-15 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Speaker recognition device, speaker recognition method, and program |
CN109473105A (zh) * | 2018-10-26 | 2019-03-15 | 平安科技(深圳)有限公司 | Text-independent voiceprint verification method and apparatus, and computer device |
CN109378007B (zh) * | 2018-12-28 | 2022-09-13 | 浙江百应科技有限公司 | Method for gender recognition based on intelligent voice dialogue |
CN109378006B (zh) * | 2018-12-28 | 2022-09-16 | 三星电子(中国)研发中心 | Cross-device voiceprint recognition method and system |
US11031017B2 (en) * | 2019-01-08 | 2021-06-08 | Google Llc | Fully supervised speaker diarization |
CN111462760B (zh) * | 2019-01-21 | 2023-09-26 | 阿里巴巴集团控股有限公司 | Voiceprint recognition system, method, and apparatus, and electronic device |
CN109637547B (zh) * | 2019-01-29 | 2020-11-03 | 北京猎户星空科技有限公司 | Audio data annotation method and apparatus, electronic device, and storage medium |
US11289098B2 (en) | 2019-03-08 | 2022-03-29 | Samsung Electronics Co., Ltd. | Method and apparatus with speaker recognition registration |
CN109994116B (zh) * | 2019-03-11 | 2021-01-19 | 南京邮电大学 | Accurate voiceprint recognition method for small-sample conditions in conference scenarios |
CN109920435B (zh) * | 2019-04-09 | 2021-04-06 | 厦门快商通信息咨询有限公司 | Voiceprint recognition method and voiceprint recognition apparatus |
CN113892136A (zh) * | 2019-05-28 | 2022-01-04 | 日本电气株式会社 | Signal extraction system, signal extraction learning method, and signal extraction learning program |
CN110660484B (zh) * | 2019-08-01 | 2022-08-23 | 平安科技(深圳)有限公司 | Bone age prediction method and apparatus, medium, and electronic device |
CN110517698B (zh) * | 2019-09-05 | 2022-02-01 | 科大讯飞股份有限公司 | Method, apparatus, device, and storage medium for determining a voiceprint model |
CN110956966B (zh) * | 2019-11-01 | 2023-09-19 | 平安科技(深圳)有限公司 | Voiceprint authentication method and apparatus, medium, and electronic device |
CN110660399A (zh) * | 2019-11-11 | 2020-01-07 | 广州国音智能科技有限公司 | Voiceprint recognition training method and apparatus, terminal, and computer storage medium |
CN111009262A (zh) * | 2019-12-24 | 2020-04-14 | 携程计算机技术(上海)有限公司 | Method and system for voice-based gender recognition |
CN111147484B (zh) * | 2019-12-25 | 2022-06-14 | 秒针信息技术有限公司 | Account login method and apparatus |
CN110797032B (zh) * | 2020-01-06 | 2020-05-12 | 深圳中创华安科技有限公司 | Voiceprint database establishment method and voiceprint recognition method |
CN111179942B (zh) * | 2020-01-06 | 2022-11-08 | 泰康保险集团股份有限公司 | Voiceprint recognition method, apparatus, and device, and computer-readable storage medium |
CN111243607A (zh) * | 2020-03-26 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device, and medium for generating speaker information |
WO2021192719A1 (ja) * | 2020-03-27 | 2021-09-30 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Speaker identification method, speaker identification device, speaker identification program, gender identification model generation method, and speaker identification model generation method |
JP7473910B2 (ja) | 2020-03-27 | 2024-04-24 | 株式会社フュートレック | Speaker recognition device, speaker recognition method, and program |
CN111489756B (zh) * | 2020-03-31 | 2024-03-01 | 中国工商银行股份有限公司 | Voiceprint recognition method and apparatus |
CN111583935A (zh) * | 2020-04-02 | 2020-08-25 | 深圳壹账通智能科技有限公司 | Intelligent loan application intake method and apparatus, and storage medium |
CN111933147B (zh) * | 2020-06-22 | 2023-02-14 | 厦门快商通科技股份有限公司 | Voiceprint recognition method and system, mobile terminal, and storage medium |
US11522994B2 (en) | 2020-11-23 | 2022-12-06 | Bank Of America Corporation | Voice analysis platform for voiceprint tracking and anomaly detection |
CN112637428A (zh) * | 2020-12-29 | 2021-04-09 | 平安科技(深圳)有限公司 | Invalid call determination method and apparatus, computer device, and storage medium |
US20220215834A1 (en) * | 2021-01-01 | 2022-07-07 | Jio Platforms Limited | System and method for speech to text conversion |
US11996087B2 (en) | 2021-04-30 | 2024-05-28 | Comcast Cable Communications, Llc | Method and apparatus for intelligent voice recognition |
KR102478076B1 (ko) * | 2022-06-13 | 2022-12-15 | 주식회사 액션파워 | Method for generating training data for detecting speech recognition errors |
JP7335651B1 (ja) | 2022-08-05 | 2023-08-30 | 株式会社Interior Haraguchi | Face authentication payment system and face authentication payment method |
CN117351484B (zh) * | 2023-10-12 | 2024-08-27 | 深圳市前海高新国际医疗管理有限公司 | AI-based tumor stem cell feature extraction and classification system |
CN117470976B (zh) * | 2023-12-28 | 2024-03-26 | 烟台宇控软件有限公司 | Transmission line defect detection method and system based on voiceprint features |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971690A (zh) * | 2013-01-28 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method and apparatus |
US20150127342A1 (en) * | 2013-11-04 | 2015-05-07 | Google Inc. | Speaker identification |
US20150149165A1 (en) * | 2013-11-27 | 2015-05-28 | International Business Machines Corporation | Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors |
CN105513597A (zh) * | 2015-12-30 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Voiceprint authentication processing method and apparatus |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7006605B1 (en) * | 1996-06-28 | 2006-02-28 | Ochopee Big Cypress Llc | Authenticating a caller before providing the caller with access to one or more secured resources |
US5897616A (en) * | 1997-06-11 | 1999-04-27 | International Business Machines Corporation | Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases |
US6665644B1 (en) * | 1999-08-10 | 2003-12-16 | International Business Machines Corporation | Conversational data mining |
US20030110038A1 (en) * | 2001-10-16 | 2003-06-12 | Rajeev Sharma | Multi-modal gender classification using support vector machines (SVMs) |
US7266497B2 (en) * | 2002-03-29 | 2007-09-04 | At&T Corp. | Automatic segmentation in speech synthesis |
US7620547B2 (en) * | 2002-07-25 | 2009-11-17 | Sony Deutschland Gmbh | Spoken man-machine interface with speaker identification |
US7404087B2 (en) * | 2003-12-15 | 2008-07-22 | Rsa Security Inc. | System and method for providing improved claimant authentication |
US7231019B2 (en) * | 2004-02-12 | 2007-06-12 | Microsoft Corporation | Automatic identification of telephone callers based on voice characteristics |
US20070299671A1 (en) * | 2004-03-31 | 2007-12-27 | Ruchika Kapur | Method and apparatus for analysing sound- converting sound into information |
CN101136199B (zh) * | 2006-08-30 | 2011-09-07 | 纽昂斯通讯公司 | Voice data processing method and device |
KR100864828B1 (ko) * | 2006-12-06 | 2008-10-23 | 한국전자통신연구원 | System and method for obtaining speaker information using the speaker's voice feature information |
US7949526B2 (en) * | 2007-06-04 | 2011-05-24 | Microsoft Corporation | Voice aware demographic personalization |
JP2009109712A (ja) * | 2007-10-30 | 2009-05-21 | National Institute Of Information & Communication Technology | Online sequential speaker discrimination system and computer program therefor |
US8433669B2 (en) * | 2007-11-14 | 2013-04-30 | International Business Machines Corporation | Configuring individual classifiers with multiple operating points for cascaded classifier topologies under resource constraints |
WO2011028844A2 (en) * | 2009-09-02 | 2011-03-10 | Sri International | Method and apparatus for tailoring the output of an intelligent automated assistant to a user |
JP5214679B2 (ja) * | 2010-08-30 | 2013-06-19 | 株式会社東芝 | Learning apparatus, method, and program |
US8559682B2 (en) * | 2010-11-09 | 2013-10-15 | Microsoft Corporation | Building a person profile database |
US8515750B1 (en) * | 2012-06-05 | 2013-08-20 | Google Inc. | Realtime acoustic adaptation using stability measures |
US9502038B2 (en) * | 2013-01-28 | 2016-11-22 | Tencent Technology (Shenzhen) Company Limited | Method and device for voiceprint recognition |
US9336781B2 (en) * | 2013-10-17 | 2016-05-10 | Sri International | Content-aware speaker recognition |
US20150154002A1 (en) * | 2013-12-04 | 2015-06-04 | Google Inc. | User interface customization based on speaker characteristics |
US9542948B2 (en) * | 2014-04-09 | 2017-01-10 | Google Inc. | Text-dependent speaker identification |
US9564123B1 (en) * | 2014-05-12 | 2017-02-07 | Soundhound, Inc. | Method and system for building an integrated user profile |
US9792899B2 (en) * | 2014-07-15 | 2017-10-17 | International Business Machines Corporation | Dataset shift compensation in machine learning |
US9373330B2 (en) * | 2014-08-07 | 2016-06-21 | Nuance Communications, Inc. | Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis |
US10476872B2 (en) * | 2015-02-20 | 2019-11-12 | Sri International | Joint speaker authentication and key phrase identification |
US11823658B2 (en) * | 2015-02-20 | 2023-11-21 | Sri International | Trial-based calibration for audio-based identification, recognition, and detection system |
US10146923B2 (en) * | 2015-03-20 | 2018-12-04 | Aplcomp Oy | Audiovisual associative authentication method, related system and device |
US9666183B2 (en) * | 2015-03-27 | 2017-05-30 | Qualcomm Incorporated | Deep neural net based filter prediction for audio event classification and extraction |
US9721559B2 (en) * | 2015-04-17 | 2017-08-01 | International Business Machines Corporation | Data augmentation method based on stochastic feature mapping for automatic speech recognition |
-
2015
- 2015-12-30 CN CN201511024873.7A patent/CN105513597B/zh active Active
-
2016
- 2016-07-04 US US15/501,292 patent/US10685658B2/en active Active
- 2016-07-04 EP EP16829225.8A patent/EP3296991B1/en active Active
- 2016-07-04 WO PCT/CN2016/088435 patent/WO2017113680A1/zh active Application Filing
- 2016-07-04 JP JP2017519504A patent/JP6682523B2/ja active Active
- 2016-07-04 KR KR1020177002005A patent/KR101870093B1/ko active IP Right Grant
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971690A (zh) * | 2013-01-28 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method and apparatus |
US20150127342A1 (en) * | 2013-11-04 | 2015-05-07 | Google Inc. | Speaker identification |
US20150149165A1 (en) * | 2013-11-27 | 2015-05-28 | International Business Machines Corporation | Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors |
CN105513597A (zh) * | 2015-12-30 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | Voiceprint authentication processing method and apparatus |
Non-Patent Citations (6)
Title |
---|
KENNY, P. ET AL.: "Deep Neural Networks for Extracting Baum-welch Statistics for Speaker Recognition", PROC. OF ODYSSEY 2014 : THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 19 June 2014 (2014-06-19), pages 293 - 298, XP055361192 * |
LEI, YUN ET AL.: "A Deep Neural Network Speaker Verification System Targeting Microphone Speech", INTERSPEECH 2014, 18 September 2014 (2014-09-18), pages 681 - 685, XP055396135 * |
LEI, YUN ET AL.: "A Novel Scheme for Speaker Recognition Using a Phonetically-aware Deep Neural Network", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014 IEEE INTERNATIONAL CONFERENCE ON, 9 May 2014 (2014-05-09), pages 1695 - 1699, XP032617660 * |
LI, LANTIAN ET AL.: "Gender-dependent Feature Extraction for Speaker Recognition", SIGNAL AND INFORMATION PROCESSING ( CHINA SIP), 2015 IEEE CHINA SUMMIT AND INTERNATIONAL CONFERENCE ON, 15 July 2015 (2015-07-15), pages 509 - 513, XP055396130 * |
SNYDER, D. ET AL.: "Time Delay Deep Neural Network-based Universal Background Models for Speaker Recognition", AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU, 17 December 2015 (2015-12-17), pages 92 - 97, XP032863530 * |
XU, YAN ET AL.: "Improved i-Vector Representation for Speaker Diarization", CIRCUITS SYSTEMS AND SIGNAL PROCESSING, vol. 9, no. 35, 22 December 2015 (2015-12-22), pages 3393 - 3404, XP036015063 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545227A (zh) * | 2018-04-28 | 2019-03-29 | 华中师范大学 | Method and system for automatic speaker gender recognition based on a deep auto-encoder network |
CN109545227B (zh) * | 2018-04-28 | 2023-05-09 | 华中师范大学 | Method and system for automatic speaker gender recognition based on a deep auto-encoder network |
WO2019214047A1 (zh) * | 2018-05-08 | 2019-11-14 | 平安科技(深圳)有限公司 | Method and apparatus for establishing a voiceprint model, computer device, and storage medium |
CN110136726A (zh) * | 2019-06-20 | 2019-08-16 | 厦门市美亚柏科信息股份有限公司 | Voice gender estimation method, apparatus, system, and storage medium |
CN111241512A (zh) * | 2020-01-09 | 2020-06-05 | 珠海格力电器股份有限公司 | Message information broadcasting method and apparatus, electronic device, and storage medium |
CN114141255A (zh) * | 2021-11-24 | 2022-03-04 | 中国电信股份有限公司 | Training method and apparatus for a voiceprint recognition model, and voiceprint recognition method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
EP3296991A1 (en) | 2018-03-21 |
CN105513597B (zh) | 2018-07-10 |
US10685658B2 (en) | 2020-06-16 |
KR101870093B1 (ko) | 2018-06-21 |
US20180293990A1 (en) | 2018-10-11 |
JP6682523B2 (ja) | 2020-04-15 |
CN105513597A (zh) | 2016-04-20 |
JP2018508799A (ja) | 2018-03-29 |
EP3296991B1 (en) | 2019-11-13 |
EP3296991A4 (en) | 2018-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017113680A1 (zh) | 2017-07-06 | Voiceprint authentication processing method and apparatus | |
EP3477519B1 (en) | Identity authentication method, terminal device, and computer-readable storage medium | |
WO2021068321A1 (zh) | Information push method and apparatus based on human-computer interaction, and computer device | |
CN107492379B (zh) | Voiceprint creation and registration method and apparatus | |
WO2017113658A1 (zh) | Artificial-intelligence-based voiceprint authentication method and apparatus | |
US20180197548A1 (en) | System and method for diarization of speech, automated generation of transcripts, and automatic information extraction | |
Dobrišek et al. | Towards efficient multi-modal emotion recognition | |
JP6567040B2 (ja) | Voiceprint login method and apparatus based on artificial intelligence | |
US11270698B2 (en) | Proactive command framework | |
US6141644A (en) | Speaker verification and speaker identification based on eigenvoices | |
US10509895B2 (en) | Biometric authentication | |
US20210082438A1 (en) | Convolutional neural network with phonetic attention for speaker verification | |
WO2021232594A1 (zh) | Speech emotion recognition method and apparatus, electronic device, and storage medium | |
CN110516083B (zh) | Album management method, storage medium, and electronic device | |
US20170294192A1 (en) | Classifying Signals Using Mutual Information | |
WO2021047319A1 (zh) | Voice-based personal credit evaluation method and apparatus, terminal, and storage medium | |
US11756572B2 (en) | Self-supervised speech representations for fake audio detection | |
CN112233648B (zh) | Data processing method, apparatus, and device combining RPA and AI, and storage medium | |
JP2015230455A (ja) | Voice classification device, voice classification method, and program | |
CA3199456A1 (en) | Embedded dictation detection | |
CN117235234B (zh) | Object information acquisition method and apparatus, computer device, and storage medium | |
CN111081261A (zh) | Text-independent voiceprint recognition method based on LDA | |
US11792365B1 (en) | Message data analysis for response recommendations | |
Trabelsi et al. | Training universal background models with restricted data for speech emotion recognition | |
Trabelsi et al. | Dynamic sequence-based learning approaches on emotion recognition systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| REEP | Request for entry into the european phase | Ref document number: 2016829225; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 2016829225; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 15501292; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 2017519504; Country of ref document: JP; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16829225; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |