
WO2017113680A1 - Voiceprint authentication processing method and apparatus (声纹认证处理方法及装置) - Google Patents

Voiceprint authentication processing method and apparatus (声纹认证处理方法及装置)

Info

Publication number
WO2017113680A1
WO2017113680A1 (PCT/CN2016/088435)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
model
gender
feature vector
voice
Prior art date
Application number
PCT/CN2016/088435
Other languages
English (en)
French (fr)
Inventor
李超
吴本谷
朱林
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority to JP2017519504A (patent JP6682523B2)
Priority to US15/501,292 (patent US10685658B2)
Priority to EP16829225.8A (patent EP3296991B1)
Priority to KR1020177002005A (patent KR101870093B1)
Publication of WO2017113680A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signal analysis-synthesis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/038 Vector quantisation, e.g. TwinVQ audio
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan

Definitions

  • the present application relates to the field of voiceprint authentication technologies, and in particular, to a voiceprint authentication processing method and apparatus.
  • With continuing technological progress, voiceprint recognition (VPR) technology is being applied in an increasingly wide range of fields.
  • Voiceprint recognition can confirm whether a given segment of speech was spoken by a designated person, for example when clocking in for attendance or when a user's voice must be confirmed for a banking transaction. Before voiceprint recognition can be performed, the speaker's voiceprint must first be modeled; this is the so-called "training" or "learning" process.
  • The current training process for voiceprint recognition trains and recognizes voiceprints with a single generic model, and its accuracy is not high.
  • the present application aims to solve at least one of the technical problems in the related art to some extent.
  • the first object of the present application is to propose a voiceprint authentication processing method, which establishes a gender-based voiceprint authentication processing model in order to improve the efficiency and accuracy of voiceprint authentication.
  • a second object of the present application is to provide a voiceprint authentication processing apparatus.
  • a third object of the present invention is to provide a storage medium.
  • a fourth object of the present invention is to provide a voiceprint authentication processing apparatus.
  • To achieve the above objects, an embodiment of the first aspect of the present application provides a voiceprint authentication processing method, including: applying a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set; training a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag; training DNN models of different genders according to the speech data of different genders in the training set; and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the speech data of different genders in the training set.
  • In the voiceprint authentication processing method of the embodiments of the present application, a mixed-gender deep neural network DNN voiceprint baseline system is applied to extract a first feature vector of each voice in the training set; a gender classifier is trained according to the first feature vector of each voice and a pre-labeled gender tag; DNN models of different genders are trained according to the voice data of different genders in the training set; and unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are trained according to the DNN models of different genders and the speech data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
  • To achieve the above objects, an embodiment of the second aspect of the present application provides a voiceprint authentication processing apparatus, including: an extraction module configured to apply a mixed-gender deep neural network DNN voiceprint baseline system and extract a first feature vector of each voice in the training set; a generating module configured to train a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag; a first training module configured to train DNN models of different genders according to the voice data of different genders in the training set; and a second training module configured to train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
  • The voiceprint authentication processing apparatus of the embodiments of the present application extracts a first feature vector of each voice in the training set by applying a mixed-gender deep neural network DNN voiceprint baseline system; trains the gender classifier according to the first feature vector of each voice and the pre-labeled gender tag; trains DNN models of different genders according to the voice data of different genders in the training set; and trains unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
  • a storage medium configured to store an application for performing a voiceprint authentication processing method according to the first aspect of the present invention.
  • A voiceprint authentication processing apparatus includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations: applying a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set; training the gender classifier according to the first feature vector of each voice and the pre-labeled gender tag; training DNN models of different genders according to the speech data of different genders in the training set; and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
  • FIG. 1 is a flowchart of a voiceprint authentication processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the generation of a gender classifier;
  • FIG. 3 is a schematic diagram of the generation of a male voiceprint authentication processing model;
  • FIG. 4 is a schematic diagram of the generation of a female voiceprint authentication processing model;
  • FIG. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
  • FIG. 6 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application.
  • FIG. 1 is a flow chart of a voiceprint authentication processing method according to an embodiment of the present application.
  • the voiceprint authentication processing method includes:
  • Step 101 Apply a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set.
  • Step 102 Train the gender classifier according to the first feature vector of each voice and the pre-labeled gender tag.
  • Specifically, to train and establish a gender-distinguishing voiceprint authentication processing model, a mixed-gender deep neural network DNN voiceprint baseline system must first be applied to train a gender classifier, so that the trained gender classifier can identify the gender of an input voice and assign the input voice a gender tag.
  • Figure 2 is a schematic diagram of the generation of the gender classifier. Referring to Figure 2, the mixed-gender DNN voiceprint baseline system is applied to generate the gender classifier as follows:
  • A training set containing a plurality of voices is preset, and each voice sample in the training set is pre-labeled with corresponding gender information; for example, the gender corresponding to the first voice sample is male, and the gender corresponding to the second voice sample is female.
  • Each voice sample in the training set is input into the mixed-gender DNN voiceprint baseline system, which performs data processing on each voice sample and extracts the first feature vector corresponding to each voice.
  • The gender classifier is then trained according to the first feature vector of each voice and the pre-labeled gender of each voice, so that the trained gender classifier can identify the gender of an input voice and assign the input voice a gender tag.
  • Step 103 Train DNN models of different genders according to voice data of different genders in the training set.
  • Specifically, the DNN models of different genders are respectively trained according to the speech data of different genders in the training set and a preset deep neural network algorithm; that is, a male DNN model and a female DNN model are trained separately.
  • The male DNN model is configured to receive male voice data and output the posterior probabilities corresponding to the male voice data;
  • the female DNN model is configured to receive female voice data and output the posterior probabilities corresponding to the female voice data.
  • Step 104 According to the DNN models of different genders and the speech data of different genders in the training set, train the unified background model, feature vector extraction model, and probabilistic linear discriminant analysis model of each gender.
  • Specifically, the unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are respectively trained. The functions of the individual models are as follows:
  • a unified background model, configured to normalize the posterior probabilities output by the DNN model;
  • a feature vector extraction model, configured to receive the posterior probabilities output by the DNN model and the voice data input by the user, and to extract a second feature vector of the voice data according to a preset algorithm;
  • a probabilistic linear discriminant analysis model, configured to compare the similarity between the second feature vector of the voice data input by the user and a pre-stored voiceprint registration template.
  • FIG. 3 is a schematic diagram of the generation of the male voiceprint authentication processing model. Referring to Figure 3, the details are as follows:
  • The male DNN model processes the male speech data in the training set to output posterior probabilities, the output posterior probabilities are normalized, and the unified background model in the male voiceprint authentication processing model is trained.
  • The posterior probabilities output by the DNN model and the male voice data are obtained, the second feature vector of the male voice data is extracted according to a preset algorithm, and the feature vector extraction model in the male voiceprint authentication processing model is trained.
  • The similarity between the second feature vector of the male voice data and the pre-stored male voiceprint registration template is compared, and the probabilistic linear discriminant analysis model in the male voiceprint authentication processing model is trained.
  • FIG. 4 is a schematic diagram of the generation of the female voiceprint authentication processing model. Referring to Figure 4, the details are as follows:
  • The female DNN model processes the female speech data in the training set to output posterior probabilities, the output posterior probabilities are normalized, and the unified background model in the female voiceprint authentication processing model is trained.
  • The posterior probabilities output by the DNN model and the female voice data are obtained, the second feature vector of the female voice data is extracted according to a preset algorithm, and the feature vector extraction model in the female voiceprint authentication processing model is trained.
  • The similarity between the second feature vector of the female voice data and the pre-stored female voiceprint registration template is compared, and the probabilistic linear discriminant analysis model in the female voiceprint authentication processing model is trained.
  • The voiceprint authentication processing method of this embodiment applies a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set, trains a gender classifier according to the first feature vector of each voice and the pre-labeled gender, trains DNN models of different genders according to the speech data of different genders in the training set, and trains unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the speech data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
  • FIG. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
  • Referring to FIG. 5, after step 104, the voiceprint authentication processing method further includes the following voiceprint registration steps:
  • Step 201 Receive a voiceprint registration request that is sent by a user and carries a user identifier.
  • Step 202 Acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information.
  • Specifically, a user who needs to perform voiceprint authentication must first register a voiceprint with the voiceprint authentication processing model.
  • To do so, the user sends a voiceprint registration request carrying a user identifier to the voiceprint authentication processing model.
  • After receiving the voiceprint registration request carrying the user identifier, the voiceprint authentication processing model prompts the user to input voice, and the user transmits a plurality of voices for voiceprint registration to the voiceprint authentication processing model.
  • the voiceprint authentication processing model extracts first feature information of the first voice and transmits the first feature information to a pre-generated gender classifier.
  • the gender classifier analyzes the first feature information, and obtains a gender tag of the first feature information, that is, a gender tag of the first voice.
  • Step 203 Acquire a posterior probability of each voice according to a DNN model corresponding to the gender tag.
  • Step 204 Extract a second feature vector of each voice according to the unified background model and the feature vector extraction model corresponding to the gender tag.
  • Step 205 Acquire a voiceprint registration model of the user according to a plurality of second feature vectors corresponding to the plurality of voices.
  • Step 206 Store the correspondence between the user identifier, the gender tag, and the voiceprint registration model to the voiceprint registration database.
  • Specifically, according to the gender tag that the gender classifier returns for the first voice, the plurality of voices input by the user are sent to the DNN model of the corresponding gender. That is, if the first voice corresponds to male speech, the voices are sent to the male DNN model; if the first voice corresponds to female speech, the voices are sent to the female DNN model.
  • a plurality of posterior probabilities corresponding to each speech are obtained according to the DNN model corresponding to the gender tag.
  • The unified background model corresponding to the gender tag normalizes each posterior probability, and the pre-trained feature vector extraction model extracts a second feature vector of each voice according to that voice and its corresponding normalized posterior probabilities.
  • An average feature vector of the plurality of second feature vectors is obtained as a voiceprint registration model of the user.
  • the correspondence between the user identifier requested by the user, the gender label of the user, and the voiceprint registration model is stored in the voiceprint registration database, so that the voiceprint recognition is performed according to the voiceprint registration model.
  • In the voiceprint authentication processing method of this embodiment, the gender classifier is first applied to obtain the gender tag of the first voice input by the user; the posterior probability of each voice is obtained according to the DNN model corresponding to the gender tag; the second feature vector of each voice is extracted according to the unified background model and feature vector extraction model corresponding to the gender tag; the voiceprint registration model of the user is obtained according to the plurality of second feature vectors; and the correspondence among the user identifier, the gender tag, and the voiceprint registration model is stored in the voiceprint registration database. A gender-distinguishing voiceprint registration process is thereby realized, so that applying the gender-distinguishing voiceprint authentication processing model improves the efficiency and accuracy of voiceprint authentication.
  • FIG. 6 is a flow chart of a voiceprint authentication processing method according to another embodiment of the present application.
  • the voiceprint authentication processing method includes:
  • Step 301 Receive a voiceprint identification request that is sent by a user and carries a user identifier.
  • Step 302 Query the voiceprint registration database to obtain a gender label and a voiceprint registration model corresponding to the user identifier.
  • the user who needs to perform voiceprint recognition needs to input a user identifier in the voiceprint authentication processing model and send a voiceprint recognition request carrying the user identifier.
  • the voiceprint identification request sent by the user is parsed to obtain a user identifier, and the voiceprint registration database is queried to obtain a gender label and a voiceprint registration model corresponding to the user identifier, so as to obtain the gender label and the voiceprint registration model of the user.
  • Step 303 Acquire a voice for voiceprint recognition sent by the user, and acquire a posterior probability of the voice according to a DNN model corresponding to the gender tag.
  • the voice sent by the user for voiceprint recognition is acquired, and the voice is sent to a DNN model corresponding to the gender tag of the user, and the DNN model processes the voice to obtain a posterior probability of the voice.
  • Step 304 Apply a unified background model and a feature vector extraction model corresponding to the gender tag, and extract a second feature vector of the voice.
  • Specifically, the posterior probability of the speech is sent to the unified background model corresponding to the gender tag, which normalizes each posterior probability; the pre-trained feature vector extraction model is then applied to extract the second feature vector of the speech according to the speech and its corresponding normalized posterior probabilities.
  • Step 305 Apply a probability linear discriminant analysis model corresponding to the gender tag, and compare the similarity between the second feature vector of the speech and the voiceprint registration model.
  • Step 306 Return a voiceprint recognition result to the user according to the similarity and a preset threshold.
  • Specifically, the second feature vector of the voice is sent to the probabilistic linear discriminant analysis model corresponding to the gender tag, which compares the similarity between the second feature vector of the voice and the pre-stored voiceprint registration model of the user.
  • If the similarity is greater than or equal to the preset threshold, success of voiceprint recognition is returned;
  • if the similarity is smaller than the preset threshold, failure of voiceprint recognition is returned.
  • The voiceprint authentication processing method of this embodiment first queries the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier; applies the unified background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice; applies the probabilistic linear discriminant analysis model to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and returns the voiceprint recognition result to the user according to the similarity and the preset threshold. A gender-distinguishing voiceprint recognition process is thereby realized, improving the efficiency and accuracy of voiceprint authentication.
  • the present application also proposes a voiceprint authentication processing apparatus.
  • FIG. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application.
  • the voiceprint authentication processing apparatus includes:
  • the extracting module 11 is configured to apply a mixed-gender deep neural network DNN voiceprint baseline system and extract a first feature vector of each voice in the training set;
  • the generating module 12 is configured to train the gender classifier according to the first feature vector of each voice and the pre-labeled gender tag;
  • the first training module 13 is configured to separately train DNN models of different genders according to voice data of different genders in the training set;
  • the second training module 14 is configured to train a unified background model, a feature vector extraction model, and a probability linear discriminant analysis model of different genders according to DNN models of different genders and voice data of different genders in the training set.
  • The voiceprint authentication processing apparatus of the embodiments of the present application applies a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set, trains a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag, trains DNN models of different genders according to the speech data of different genders in the training set, and trains unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
  • FIG. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in FIG. 8, according to the embodiment shown in FIG. 7, the method further includes:
  • the first receiving module 15 is configured to receive a voiceprint registration request that is sent by the user and that carries the user identifier.
  • the gender labeling module 16 is configured to acquire a plurality of voices sent by the user for the voiceprint registration, extract the first feature information of the first voice, and apply the gender classifier to obtain the gender tag of the first feature information;
  • the first processing module 17 is configured to obtain a posterior probability of each voice according to a DNN model corresponding to the gender tag, and to extract a second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag;
  • the obtaining module 18 is configured to acquire the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices;
  • the registration module 19 is configured to store the correspondence between the user identifier, the gender label, and the voiceprint registration model to the voiceprint registration database.
  • In one embodiment, the obtaining module 18 is configured to: obtain an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
  • The voiceprint authentication processing apparatus of the embodiments of the present application first applies the gender classifier to obtain the gender tag of the first voice input by the user; obtains the posterior probability of each voice according to the DNN model corresponding to the gender tag; extracts the second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag; obtains the voiceprint registration model of the user according to the plurality of second feature vectors; and stores the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database.
  • A gender-distinguishing voiceprint registration process is thereby realized, so that applying the gender-distinguishing voiceprint authentication processing model improves the efficiency and accuracy of voiceprint authentication.
  • FIG. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in FIG. 9, according to the embodiment shown in FIG. 8, the method further includes:
  • the second receiving module 20 is configured to receive a voiceprint identification request that is sent by the user and that carries the user identifier.
  • the query module 21 is configured to query the voiceprint registration database to obtain a gender label and a voiceprint registration model corresponding to the user identifier.
  • the second processing module 22 is configured to acquire a voice for voiceprint recognition sent by the user, obtain a posterior probability of the voice according to a DNN model corresponding to the gender tag, and apply a unified background model corresponding to the gender tag. And a feature vector extraction model, extracting a second feature vector of the speech;
  • a comparison module 23 configured to apply a probability linear discriminant analysis model corresponding to the gender tag, and compare a similarity between the second feature vector of the speech and the voiceprint registration model;
  • the identification module 24 is configured to return a voiceprint recognition result to the user according to the similarity and a preset threshold.
  • the identification module 24 is configured to: compare the similarity with the preset threshold;
  • if the similarity is greater than or equal to the preset threshold, return that voiceprint recognition succeeded;
  • if the similarity is smaller than the preset threshold, return that voiceprint recognition failed.
  • The voiceprint authentication processing apparatus of the embodiments of the present application first queries the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier; applies the unified background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice; applies the probabilistic linear discriminant analysis model to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and returns the voiceprint recognition result to the user according to the similarity and the preset threshold. A gender-distinguishing voiceprint recognition process is thereby realized, improving the efficiency and accuracy of voiceprint authentication.
  • a storage medium configured to store an application for performing a voiceprint authentication processing method according to the first aspect of the present invention.
  • the voiceprint authentication processing apparatus of the fourth aspect of the present invention includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations: applying a mixed-gender deep neural network DNN voiceprint baseline system to extract a first feature vector of each voice in the training set; training a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag; training DNN models of different genders according to the speech data of different genders in the training set; and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the speech data of different genders in the training set.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature.
  • In the description of the present application, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
  • More specific examples of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one of, or a combination of, the following techniques well known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Collating Specific Patterns (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A voiceprint authentication processing method and apparatus, the method including: applying a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set (101); training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag (102); training DNN models of different genders according to the voice data of different genders in the training set (103); and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set (104).

Description

Voiceprint authentication processing method and apparatus
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201511024873.7, entitled "Voiceprint authentication processing method and apparatus", filed on December 30, 2015 by 百度在线网络技术(北京)有限公司.
Technical field
The present application relates to the field of voiceprint authentication technologies, and in particular to a voiceprint authentication processing method and apparatus.
Background
With continuing technological progress, voiceprint recognition (VPR) technology is being applied in an increasingly wide range of fields.
Voiceprint recognition can confirm whether a segment of speech was spoken by a designated person, for example when clocking in for attendance or when a user's voice must be confirmed for a banking transaction. Before voiceprint recognition can be performed, the speaker's voiceprint must first be modeled; this is the so-called "training" or "learning" process.
The current training process for voiceprint recognition trains and recognizes voiceprints with a single generic model, and its accuracy is not high.
Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present application is to propose a voiceprint authentication processing method that establishes a gender-distinguishing voiceprint authentication processing model, so as to improve the efficiency and accuracy of voiceprint authentication.
A second object of the present application is to propose a voiceprint authentication processing apparatus.
A third object of the present invention is to propose a storage medium.
A fourth object of the present invention is to propose voiceprint authentication processing equipment.
To achieve the above objects, an embodiment of the first aspect of the present application proposes a voiceprint authentication processing method, including: applying a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders according to the voice data of different genders in the training set; and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
In the voiceprint authentication processing method of the embodiments of the present application, a mixed-gender deep neural network DNN voiceprint baseline system is applied to extract a first feature vector of each voice in the training set; a gender classifier is trained according to the first feature vector of each voice and a pre-labeled gender tag; DNN models of different genders are trained according to the voice data of different genders in the training set; and unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are trained according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
To achieve the above objects, an embodiment of the second aspect of the present application proposes a voiceprint authentication processing apparatus, including: an extraction module configured to apply a mixed-gender deep neural network (DNN) voiceprint baseline system and extract a first feature vector of each voice in a training set; a generating module configured to train a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; a first training module configured to train DNN models of different genders according to the voice data of different genders in the training set; and a second training module configured to train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
In the voiceprint authentication processing apparatus of the embodiments of the present application, a mixed-gender deep neural network DNN voiceprint baseline system is applied to extract a first feature vector of each voice in the training set; a gender classifier is trained according to the first feature vector of each voice and a pre-labeled gender tag; DNN models of different genders are trained according to the voice data of different genders in the training set; and unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are trained according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
To achieve the above objects, the storage medium of an embodiment of the third aspect of the present invention is configured to store an application program for performing the voiceprint authentication processing method described in the embodiments of the first aspect of the present invention.
To achieve the above objects, the voiceprint authentication processing equipment of an embodiment of the fourth aspect of the present invention includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations: applying a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set; training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag; training DNN models of different genders according to the voice data of different genders in the training set; and training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a voiceprint authentication processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the generation of a gender classifier;
Fig. 3 is a schematic diagram of the generation of a male voiceprint authentication processing model;
Fig. 4 is a schematic diagram of the generation of a female voiceprint authentication processing model;
Fig. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application;
Fig. 6 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application;
Fig. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application.
Detailed description
Embodiments of the present application are described in detail below; examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the present application; they are not to be construed as limiting the present application.
The voiceprint authentication processing method and apparatus of the embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a flowchart of a voiceprint authentication processing method according to an embodiment of the present application.
As shown in Fig. 1, the voiceprint authentication processing method includes:
Step 101: apply a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in the training set.
Step 102: train a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag.
Specifically, to train and establish a gender-distinguishing voiceprint authentication processing model, a mixed-gender DNN voiceprint baseline system must first be applied to train a gender classifier, so that the trained gender classifier can identify the gender of an input voice and assign the input voice a gender tag.
Fig. 2 is a schematic diagram of the generation of the gender classifier. Referring to Fig. 2, the mixed-gender DNN voiceprint baseline system is applied to generate the gender classifier as follows:
A training set containing multiple voices is preset, and each voice sample in the training set is pre-labeled with corresponding gender information; for example, the gender corresponding to the first voice sample is male, and the gender corresponding to the second voice sample is female.
Each voice sample in the training set is input into the mixed-gender DNN voiceprint baseline system, which processes each voice sample and extracts the first feature vector corresponding to each voice.
The gender classifier is then trained according to the first feature vector of each voice and the pre-labeled gender of each voice, so that the trained gender classifier can identify the gender of an input voice and assign the input voice a gender tag.
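The patent does not specify the classifier type, the feature dimensionality, or any training library; the following Python sketch is purely illustrative, using random stand-in data for the first feature vectors that the DNN baseline system would produce, and logistic regression as a stand-in for the unspecified gender classifier.

```python
# Purely illustrative sketch. Assumptions: the mixed-gender DNN baseline system
# has already produced one fixed-length "first feature vector" per training
# utterance (random stand-in data below), and logistic regression stands in for
# the unspecified gender classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
first_vectors = rng.normal(size=(1000, 400))   # hypothetical first feature vectors
gender_tags = rng.integers(0, 2, size=1000)    # pre-labeled tags: 0 = male, 1 = female

gender_classifier = LogisticRegression(max_iter=1000).fit(first_vectors, gender_tags)

def assign_gender_tag(first_vector):
    """Identify the gender of an input voice from its first feature vector."""
    return "male" if gender_classifier.predict(first_vector[None, :])[0] == 0 else "female"
```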
Step 103: train DNN models of different genders according to the voice data of different genders in the training set.
Specifically, the DNN models of different genders are respectively trained according to the voice data of different genders in the training set and a preset deep neural network algorithm; that is, a male DNN model and a female DNN model are trained separately.
The male DNN model receives male voice data and outputs the posterior probabilities corresponding to that male voice data; the female DNN model receives female voice data and outputs the posterior probabilities corresponding to that female voice data.
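As an illustrative sketch of this per-gender split (hypothetical frame features and senone-like labels, with sklearn's MLPClassifier standing in for the unspecified DNN architecture):

```python
# Illustrative sketch: partition the training voices by their pre-labeled gender
# and train one model per partition. MLPClassifier stands in for the unspecified
# DNN; the frame features and senone-like targets are hypothetical.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
frames = rng.normal(size=(2000, 40))            # hypothetical acoustic frames
senones = rng.integers(0, 10, size=2000)        # hypothetical senone labels
is_male = rng.integers(0, 2, size=2000).astype(bool)

male_dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50)
male_dnn.fit(frames[is_male], senones[is_male])
female_dnn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50)
female_dnn.fit(frames[~is_male], senones[~is_male])

# Each model outputs posterior probabilities for voice data of its own gender.
male_posteriors = male_dnn.predict_proba(frames[is_male][:5])
```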
Step 104: train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
Specifically, the unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are respectively trained according to the DNN models of different genders and the voice data of different genders in the training set.
The functions of the individual models are explained as follows:
the unified background model is used to normalize the posterior probabilities output by the DNN model;
the feature vector extraction model is used to receive the posterior probabilities output by the DNN model and the voice data input by the user, and to extract a second feature vector of the voice data according to a preset algorithm;
the probabilistic linear discriminant analysis model is used to compare the similarity between the second feature vector of the voice data input by the user and a pre-stored voiceprint registration template.
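The normalization and the "preset algorithm" are left unspecified by the patent; the sketch below shows one plausible reading, in which frame-level posteriors are pooled into normalized occupancy statistics and a posterior-weighted frame average stands in for the second feature vector.

```python
# Sketch under assumptions: the unified background model's role is read here as
# normalizing the DNN's frame-level posteriors, and a posterior-weighted frame
# average stands in for the unspecified "preset algorithm" that extracts the
# second feature vector.
import numpy as np

def normalize_posteriors(posteriors):
    """posteriors: (frames, senones) array output by the gender-specific DNN."""
    occupancy = posteriors.sum(axis=0)      # zeroth-order statistics per senone
    return occupancy / occupancy.sum()      # normalized occupancy

def extract_second_vector(voice_frames, posteriors):
    """voice_frames: (frames, dims) voice data; returns one vector per utterance."""
    weights = posteriors.max(axis=1, keepdims=True)   # per-frame confidence
    return (weights * voice_frames).sum(axis=0) / weights.sum()
```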
Fig. 3 is a schematic diagram of the generation of the male voiceprint authentication processing model. Referring to Fig. 3, the details are as follows:
The male DNN model processes the male voice data in the training set and outputs posterior probabilities; the output posterior probabilities are normalized, and the unified background model in the male voiceprint authentication processing model is trained.
The posterior probabilities output by the DNN model and the male voice data are obtained, the second feature vector of the male voice data is extracted according to a preset algorithm, and the feature vector extraction model in the male voiceprint authentication processing model is trained.
The similarity between the second feature vector of the male voice data and a pre-stored male voiceprint registration template is compared, and the probabilistic linear discriminant analysis model in the male voiceprint authentication processing model is trained.
Fig. 4 is a schematic diagram of the generation of the female voiceprint authentication processing model. Referring to Fig. 4, the details are as follows:
The female DNN model processes the female voice data in the training set and outputs posterior probabilities; the output posterior probabilities are normalized, and the unified background model in the female voiceprint authentication processing model is trained.
The posterior probabilities output by the DNN model and the female voice data are obtained, the second feature vector of the female voice data is extracted according to a preset algorithm, and the feature vector extraction model in the female voiceprint authentication processing model is trained.
The similarity between the second feature vector of the female voice data and a pre-stored female voiceprint registration template is compared, and the probabilistic linear discriminant analysis model in the female voiceprint authentication processing model is trained.
In the voiceprint authentication processing method of this embodiment, a mixed-gender deep neural network DNN voiceprint baseline system is applied to extract a first feature vector of each voice in the training set; a gender classifier is trained according to the first feature vector of each voice and the pre-labeled gender; DNN models of different genders are trained according to the voice data of different genders in the training set; and unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders are trained according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
Fig. 5 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
Referring to Fig. 5, after step 104, the voiceprint authentication processing method further includes the following voiceprint registration steps:
Step 201: receive a voiceprint registration request carrying a user identifier sent by a user.
Step 202: acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information.
Specifically, a user who needs to perform voiceprint authentication must register a voiceprint with the voiceprint authentication processing model in advance. First, the user sends a voiceprint registration request carrying a user identifier to the voiceprint authentication processing model.
After receiving the voiceprint registration request carrying the user identifier, the voiceprint authentication processing model prompts the user to input voice, and the user sends a plurality of voices for voiceprint registration to the model.
The voiceprint authentication processing model extracts the first feature information of the first voice and sends it to the pre-generated gender classifier. The gender classifier analyzes the first feature information and obtains its gender tag, that is, the gender tag of the first voice.
Step 203: obtain the posterior probability of each voice according to the DNN model corresponding to the gender tag.
Step 204: extract the second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag.
Step 205: obtain the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices.
Step 206: store the correspondence among the user identifier, the gender tag, and the voiceprint registration model in a voiceprint registration database.
Specifically, according to the gender tag returned by the gender classifier for the first voice, the plurality of voices input by the user are sent to the DNN model of the corresponding gender; that is, if the first voice corresponds to male speech, the voices are sent to the male DNN model, and if the first voice corresponds to female speech, the voices are sent to the female DNN model.
The posterior probabilities corresponding to each voice are obtained from the DNN model corresponding to the gender tag.
The unified background model corresponding to the gender tag normalizes each posterior probability, and the pre-trained feature vector extraction model extracts the second feature vector of each voice according to that voice and its corresponding normalized posterior probabilities.
The voiceprint registration model of the user can be obtained from the plurality of second feature vectors corresponding to the plurality of voices in many ways, chosen according to the needs of the application, for example:
the average feature vector of the plurality of second feature vectors is taken as the voiceprint registration model of the user.
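A minimal sketch of this averaging option, with an in-memory dict standing in for the voiceprint registration database of step 206 (the vector dimension and the user identifier are arbitrary assumptions):

```python
# Minimal sketch of enrollment by averaging, with a plain dict standing in for
# the voiceprint registration database of step 206; the vector dimension and the
# user identifier "user-123" are arbitrary assumptions.
import numpy as np

def enroll(second_vectors):
    """Average the second feature vectors of the enrollment voices."""
    return np.mean(np.stack(second_vectors), axis=0)

voiceprint_db = {}  # user identifier -> (gender tag, voiceprint registration model)
vectors = [np.random.default_rng(i).normal(size=400) for i in range(5)]
voiceprint_db["user-123"] = ("female", enroll(vectors))
```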
The correspondence among the user identifier requested for registration, the user's gender tag, and the voiceprint registration model is then stored in the voiceprint registration database, so that voiceprint recognition can subsequently be performed against the registration model.
In the voiceprint authentication processing method of this embodiment, the gender classifier is first applied to obtain the gender tag of the first voice input by the user; the posterior probability of each voice is obtained according to the DNN model corresponding to the gender tag; the second feature vector of each voice is extracted according to the unified background model and feature vector extraction model corresponding to the gender tag; the voiceprint registration model of the user is obtained from the plurality of second feature vectors; and the correspondence among the user identifier, the gender tag, and the voiceprint registration model is stored in the voiceprint registration database. A gender-distinguishing voiceprint registration process is thereby realized, so that applying the gender-distinguishing voiceprint authentication processing model improves the efficiency and accuracy of voiceprint authentication.
Fig. 6 is a flowchart of a voiceprint authentication processing method according to another embodiment of the present application.
Referring to Fig. 6, the voiceprint authentication processing method includes:
Step 301: receive a voiceprint recognition request carrying a user identifier sent by a user.
Step 302: query the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier.
Specifically, a user who needs to perform voiceprint recognition enters a user identifier into the voiceprint authentication processing model and sends a voiceprint recognition request carrying the user identifier.
The voiceprint recognition request sent by the user is parsed to obtain the user identifier, and the voiceprint registration database is queried to obtain the gender tag and voiceprint registration model corresponding to the user identifier, thereby obtaining the gender tag and voiceprint registration model of this user.
Step 303: acquire the voice sent by the user for voiceprint recognition, and obtain the posterior probability of the voice according to the DNN model corresponding to the gender tag.
Specifically, the voice sent by the user for voiceprint recognition is acquired and sent to the DNN model corresponding to the user's gender tag; the DNN model processes the voice and obtains the posterior probability of the voice.
Step 304: apply the unified background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice.
Specifically, the posterior probability of the voice is sent to the unified background model corresponding to the gender tag. The unified background model normalizes each posterior probability, and the pre-trained feature vector extraction model extracts the second feature vector of the voice according to the voice and its corresponding normalized posterior probabilities.
Step 305: apply the probabilistic linear discriminant analysis model corresponding to the gender tag to compare the similarity between the second feature vector of the voice and the voiceprint registration model.
Step 306: return a voiceprint recognition result to the user according to the similarity and a preset threshold.
Specifically, the second feature vector of the voice is sent to the probabilistic linear discriminant analysis model corresponding to the gender tag, which compares the similarity between the second feature vector of the voice and the pre-stored voiceprint registration model of the user.
The similarity is compared with the preset threshold;
if the similarity is greater than or equal to the preset threshold, success of voiceprint recognition is returned;
if the similarity is smaller than the preset threshold, failure of voiceprint recognition is returned.
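A sketch of this decision step: full PLDA scoring is beyond a short example, so cosine similarity stands in for the gender-specific PLDA score, and the threshold value is an arbitrary assumption; only the rule itself (success when the similarity is at least the threshold) follows the text above.

```python
# Sketch of the decision in steps 305-306. Cosine similarity stands in for the
# gender-specific PLDA score, and threshold=0.7 is an arbitrary assumption.
import numpy as np

def verify(second_vector, registration_model, threshold=0.7):
    similarity = float(second_vector @ registration_model /
                       (np.linalg.norm(second_vector) * np.linalg.norm(registration_model)))
    # similarity >= threshold -> recognition success, otherwise failure
    return "success" if similarity >= threshold else "failure"
```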
In the voiceprint authentication processing method of this embodiment, the voiceprint registration database is first queried to obtain the gender tag and voiceprint registration model corresponding to the user identifier; the unified background model and feature vector extraction model corresponding to the gender tag are applied to extract the second feature vector of the voice; the probabilistic linear discriminant analysis model is applied to compare the similarity between the second feature vector of the voice and the voiceprint registration model; and the voiceprint recognition result is returned to the user according to the similarity and the preset threshold. A gender-distinguishing voiceprint recognition process is thereby realized, improving the efficiency and accuracy of voiceprint authentication.
To implement the above embodiments, the present application further proposes a voiceprint authentication processing apparatus.
Fig. 7 is a schematic structural diagram of a voiceprint authentication processing apparatus according to an embodiment of the present application.
As shown in Fig. 7, the voiceprint authentication processing apparatus includes:
an extraction module 11, configured to apply a mixed-gender deep neural network (DNN) voiceprint baseline system and extract a first feature vector of each voice in the training set;
a generating module 12, configured to train a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag;
a first training module 13, configured to train DNN models of different genders according to the voice data of different genders in the training set;
a second training module 14, configured to train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
It should be noted that the foregoing explanation of the method embodiments also applies to the voiceprint authentication processing apparatus of this embodiment and is not repeated here.
The voiceprint authentication processing apparatus of the embodiments of the present application applies a mixed-gender DNN voiceprint baseline system to extract a first feature vector of each voice in the training set, trains a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag, trains DNN models of different genders according to the voice data of different genders in the training set, and trains unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set. A gender-distinguishing voiceprint authentication processing model is thereby established, improving the efficiency and accuracy of voiceprint authentication.
Fig. 8 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in Fig. 8, based on the embodiment shown in Fig. 7, the apparatus further includes:
a first receiving module 15, configured to receive a voiceprint registration request carrying a user identifier sent by a user;
a gender labeling module 16, configured to acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information;
a first processing module 17, configured to obtain the posterior probability of each voice according to the DNN model corresponding to the gender tag, and to extract the second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag;
an obtaining module 18, configured to obtain the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices;
a registration module 19, configured to store the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database.
In one embodiment, the obtaining module 18 is configured to:
obtain the average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
It should be noted that the foregoing explanation of the method embodiments also applies to the voiceprint authentication processing apparatus of this embodiment and is not repeated here.
The voiceprint authentication processing apparatus of the embodiments of the present application first applies the gender classifier to obtain the gender tag of the first voice input by the user, obtains the posterior probability of each voice according to the DNN model corresponding to the gender tag, extracts the second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag, obtains the voiceprint registration model of the user from the plurality of second feature vectors, and stores the correspondence among the user identifier, the gender tag, and the voiceprint registration model in the voiceprint registration database. A gender-distinguishing voiceprint registration process is thereby realized, so that applying the gender-distinguishing voiceprint authentication processing model improves the efficiency and accuracy of voiceprint authentication.
Fig. 9 is a schematic structural diagram of a voiceprint authentication processing apparatus according to another embodiment of the present application. As shown in Fig. 9, based on the embodiment shown in Fig. 8, the apparatus further includes:
a second receiving module 20, configured to receive a voiceprint recognition request carrying a user identifier sent by a user;
a query module 21, configured to query the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier;
a second processing module 22, configured to acquire the voice sent by the user for voiceprint recognition, obtain the posterior probability of the voice according to the DNN model corresponding to the gender tag, and apply the unified background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice;
a comparison module 23, configured to apply the probabilistic linear discriminant analysis model corresponding to the gender tag and compare the similarity between the second feature vector of the voice and the voiceprint registration model;
an identification module 24, configured to return a voiceprint recognition result to the user according to the similarity and a preset threshold.
The identification module 24 is configured to:
compare the similarity with the preset threshold;
if the similarity is greater than or equal to the preset threshold, return that voiceprint recognition succeeded;
if the similarity is smaller than the preset threshold, return that voiceprint recognition failed.
It should be noted that the foregoing explanation of the method embodiments also applies to the voiceprint authentication processing apparatus of this embodiment and is not repeated here.
The voiceprint authentication processing apparatus of the embodiments of the present application first queries the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier, applies the unified background model and feature vector extraction model corresponding to the gender tag to extract the second feature vector of the voice, applies the probabilistic linear discriminant analysis model to compare the similarity between the second feature vector of the voice and the voiceprint registration model, and returns the voiceprint recognition result to the user according to the similarity and the preset threshold. A gender-distinguishing voiceprint recognition process is thereby realized, improving the efficiency and accuracy of voiceprint authentication.
To implement the above embodiments, the storage medium of an embodiment of the third aspect of the present invention is configured to store an application program for performing the voiceprint authentication processing method described in the embodiments of the first aspect of the present invention.
To implement the above embodiments, the voiceprint authentication processing equipment of an embodiment of the fourth aspect of the present invention includes: one or more processors; a memory; and one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations:
S101': apply a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in the training set.
S102': train a gender classifier according to the first feature vector of each voice and the pre-labeled gender tag.
S103': train DNN models of different genders according to the voice data of different genders in the training set.
S104': train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
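Tying the sketches above together, a hypothetical end-to-end recognition flow; every component is an illustrative stand-in injected as a parameter, not the patent's actual models:

```python
# Hypothetical end-to-end recognition flow; every component is an illustrative
# stand-in injected as a parameter, not the patent's actual models.
import numpy as np

def recognize(user_id, voice_frames, first_vector,
              gender_classifier, gender_dnns, voiceprint_db, threshold=0.7):
    # Gender tag for the incoming voice (0 = male assumed, as in the earlier sketch)
    tag = "male" if gender_classifier.predict(first_vector[None, :])[0] == 0 else "female"
    posteriors = gender_dnns[tag].predict_proba(voice_frames)   # gender-specific DNN
    weights = posteriors.max(axis=1, keepdims=True)             # stand-in extraction
    second_vector = (weights * voice_frames).sum(axis=0) / weights.sum()
    stored_tag, registration_model = voiceprint_db[user_id]
    score = float(second_vector @ registration_model /
                  (np.linalg.norm(second_vector) * np.linalg.norm(registration_model)))
    return "success" if score >= threshold else "failure"
```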
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine different embodiments or examples described in this specification and features of different embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing a specific logical function or step of the process; and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of, or a combination of, the following techniques well known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
A person of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (12)

  1. A voiceprint authentication processing method, comprising the following steps:
    applying a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set;
    training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag;
    training DNN models of different genders according to the voice data of different genders in the training set;
    training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
  2. The method according to claim 1, further comprising:
    receiving a voiceprint registration request carrying a user identifier sent by a user;
    acquiring a plurality of voices sent by the user for voiceprint registration, extracting first feature information of the first voice, and applying the gender classifier to obtain a gender tag of the first feature information;
    obtaining a posterior probability of each voice according to the DNN model corresponding to the gender tag;
    extracting a second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag;
    obtaining a voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices;
    storing a correspondence among the user identifier, the gender tag, and the voiceprint registration model in a voiceprint registration database.
  3. The method according to claim 2, wherein obtaining the voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices comprises:
    obtaining an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
  4. The method according to claim 2 or 3, further comprising:
    receiving a voiceprint recognition request carrying a user identifier sent by a user;
    querying the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier;
    acquiring a voice sent by the user for voiceprint recognition, and obtaining a posterior probability of the voice according to the DNN model corresponding to the gender tag;
    applying the unified background model and feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice;
    applying the probabilistic linear discriminant analysis model corresponding to the gender tag to compare a similarity between the second feature vector of the voice and the voiceprint registration model;
    returning a voiceprint recognition result to the user according to the similarity and a preset threshold.
  5. The method according to claim 4, wherein returning a voiceprint recognition result to the user according to the similarity and a preset threshold comprises:
    comparing the similarity with the preset threshold;
    if the similarity is greater than or equal to the preset threshold, returning that voiceprint recognition succeeded;
    if the similarity is smaller than the preset threshold, returning that voiceprint recognition failed.
  6. A voiceprint authentication processing apparatus, comprising:
    an extraction module, configured to apply a mixed-gender deep neural network (DNN) voiceprint baseline system and extract a first feature vector of each voice in a training set;
    a generating module, configured to train a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag;
    a first training module, configured to train DNN models of different genders according to the voice data of different genders in the training set;
    a second training module, configured to train unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
  7. The apparatus according to claim 6, further comprising:
    a first receiving module, configured to receive a voiceprint registration request carrying a user identifier sent by a user;
    a gender labeling module, configured to acquire a plurality of voices sent by the user for voiceprint registration, extract first feature information of the first voice, and apply the gender classifier to obtain a gender tag of the first feature information;
    a first processing module, configured to obtain a posterior probability of each voice according to the DNN model corresponding to the gender tag, and to extract a second feature vector of each voice according to the unified background model and feature vector extraction model corresponding to the gender tag;
    an obtaining module, configured to obtain a voiceprint registration model of the user according to the plurality of second feature vectors corresponding to the plurality of voices;
    a registration module, configured to store a correspondence among the user identifier, the gender tag, and the voiceprint registration model in a voiceprint registration database.
  8. The apparatus according to claim 7, wherein the obtaining module is configured to:
    obtain an average feature vector of the plurality of second feature vectors as the voiceprint registration model of the user.
  9. The apparatus according to claim 7 or 8, further comprising:
    a second receiving module, configured to receive a voiceprint recognition request carrying a user identifier sent by a user;
    a query module, configured to query the voiceprint registration database to obtain the gender tag and voiceprint registration model corresponding to the user identifier;
    a second processing module, configured to acquire a voice sent by the user for voiceprint recognition, obtain a posterior probability of the voice according to the DNN model corresponding to the gender tag, and apply the unified background model and feature vector extraction model corresponding to the gender tag to extract a second feature vector of the voice;
    a comparison module, configured to apply the probabilistic linear discriminant analysis model corresponding to the gender tag and compare a similarity between the second feature vector of the voice and the voiceprint registration model;
    an identification module, configured to return a voiceprint recognition result to the user according to the similarity and a preset threshold.
  10. The apparatus according to claim 9, wherein the identification module is configured to:
    compare the similarity with the preset threshold;
    if the similarity is greater than or equal to the preset threshold, return that voiceprint recognition succeeded;
    if the similarity is smaller than the preset threshold, return that voiceprint recognition failed.
  11. A storage medium, configured to store an application program for performing the voiceprint authentication processing method according to any one of claims 1 to 5.
  12. Voiceprint authentication processing equipment, comprising:
    one or more processors;
    a memory;
    one or more modules stored in the memory which, when executed by the one or more processors, perform the following operations:
    applying a mixed-gender deep neural network (DNN) voiceprint baseline system to extract a first feature vector of each voice in a training set;
    training a gender classifier according to the first feature vector of each voice and a pre-labeled gender tag;
    training DNN models of different genders according to the voice data of different genders in the training set;
    training unified background models, feature vector extraction models, and probabilistic linear discriminant analysis models of different genders according to the DNN models of different genders and the voice data of different genders in the training set.
PCT/CN2016/088435 2015-12-30 2016-07-04 Voiceprint authentication processing method and apparatus WO2017113680A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2017519504A 2015-12-30 2016-07-04 Voiceprint authentication processing method and apparatus
US15/501,292 US10685658B2 (en) 2015-12-30 2016-07-04 Method and device for processing voiceprint authentication
EP16829225.8A EP3296991B1 (en) 2015-12-30 2016-07-04 Method and device for voiceprint authentication processing
KR1020177002005A 2015-12-30 2016-07-04 Voiceprint authentication processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511024873.7 2015-12-30
CN201511024873.7A 2015-12-30 2015-12-30 Voiceprint authentication processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017113680A1 (zh)

Family

ID=55721524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088435 WO2017113680A1 (zh) 2015-12-30 2016-07-04 Voiceprint authentication processing method and apparatus

Country Status (6)

Country Link
US (1) US10685658B2 (zh)
EP (1) EP3296991B1 (zh)
JP (1) JP6682523B2 (zh)
KR (1) KR101870093B1 (zh)
CN (1) CN105513597B (zh)
WO (1) WO2017113680A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545227A * 2018-04-28 2019-03-29 华中师范大学 Automatic speaker gender recognition method and system based on a deep auto-encoder network
CN110136726A * 2019-06-20 2019-08-16 厦门市美亚柏科信息股份有限公司 Voice gender estimation method, apparatus, system, and storage medium
WO2019214047A1 * 2018-05-08 2019-11-14 平安科技(深圳)有限公司 Method and apparatus for establishing a voiceprint model, computer device, and storage medium
CN111241512A * 2020-01-09 2020-06-05 珠海格力电器股份有限公司 Message broadcasting method and apparatus, electronic device, and storage medium
CN114141255A * 2021-11-24 2022-03-04 中国电信股份有限公司 Training method and apparatus for a voiceprint recognition model, and voiceprint recognition method and apparatus

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9875742B2 (en) * 2015-01-26 2018-01-23 Verint Systems Ltd. Word-level blind diarization of recorded calls with arbitrary number of speakers
CN105513597B (zh) * 2015-12-30 2018-07-10 百度在线网络技术(北京)有限公司 Voiceprint authentication processing method and apparatus
CN107346568B (zh) * 2016-05-05 2020-04-17 阿里巴巴集团控股有限公司 Authentication method and apparatus for an access control system
EP3460791A4 (en) * 2016-05-16 2019-05-22 Sony Corporation INFORMATION PROCESSING DEVICE
CN106297807B (zh) 2016-08-05 2019-03-01 腾讯科技(深圳)有限公司 Method and apparatus for training a voiceprint recognition system
CN106710599A (zh) * 2016-12-02 2017-05-24 深圳撒哈拉数据科技有限公司 Method and system for detecting a specific sound source based on a deep neural network
CN106710604A (zh) * 2016-12-07 2017-05-24 天津大学 Formant enhancement apparatus and method for improving speech intelligibility
CN107610707B (zh) * 2016-12-15 2018-08-31 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus
CN108288470B (zh) * 2017-01-10 2021-12-21 富士通株式会社 Voiceprint-based identity verification method and apparatus
CN108573698B (zh) * 2017-03-09 2021-06-08 中国科学院声学研究所 Speech noise reduction method based on fused gender information
AU2017305006A1 (en) * 2017-06-13 2019-01-03 Beijing Didi Infinity Technology And Development Co., Ltd. Method, apparatus and system for speaker verification
CN107610709B (zh) * 2017-08-01 2021-03-19 百度在线网络技术(北京)有限公司 Method and system for training a voiceprint recognition model
CN107623614B (zh) * 2017-09-19 2020-12-08 百度在线网络技术(北京)有限公司 Method and apparatus for pushing information
CN108694954A (zh) * 2018-06-13 2018-10-23 广州势必可赢网络科技有限公司 Gender and age recognition method, apparatus, device, and readable storage medium
CN109036436A (zh) * 2018-09-18 2018-12-18 广州势必可赢网络科技有限公司 Voiceprint database construction method, voiceprint recognition method, apparatus, and system
JP7326033B2 (ja) * 2018-10-05 2023-08-15 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Speaker recognition device, speaker recognition method, and program
CN109473105A (zh) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 Text-independent voiceprint verification method, apparatus, and computer device
CN109378007B (zh) * 2018-12-28 2022-09-13 浙江百应科技有限公司 Method for gender recognition based on intelligent voice dialogue
CN109378006B (zh) * 2018-12-28 2022-09-16 三星电子(中国)研发中心 Cross-device voiceprint recognition method and system
US11031017B2 (en) * 2019-01-08 2021-06-08 Google Llc Fully supervised speaker diarization
CN111462760B (zh) * 2019-01-21 2023-09-26 阿里巴巴集团控股有限公司 Voiceprint recognition system, method, apparatus, and electronic device
CN109637547B (zh) * 2019-01-29 2020-11-03 北京猎户星空科技有限公司 Audio data annotation method and apparatus, electronic device, and storage medium
US11289098B2 (en) 2019-03-08 2022-03-29 Samsung Electronics Co., Ltd. Method and apparatus with speaker recognition registration
CN109994116B (zh) * 2019-03-11 2021-01-19 南京邮电大学 Accurate voiceprint recognition method for meeting scenarios under small-sample conditions
CN109920435B (zh) * 2019-04-09 2021-04-06 厦门快商通信息咨询有限公司 Voiceprint recognition method and voiceprint recognition apparatus
CN113892136A (zh) * 2019-05-28 2022-01-04 日本电气株式会社 Signal extraction system, signal extraction learning method, and signal extraction learning program
CN110660484B (zh) * 2019-08-01 2022-08-23 平安科技(深圳)有限公司 Bone age prediction method, apparatus, medium, and electronic device
CN110517698B (zh) * 2019-09-05 2022-02-01 科大讯飞股份有限公司 Method, apparatus, device, and storage medium for determining a voiceprint model
CN110956966B (zh) * 2019-11-01 2023-09-19 平安科技(深圳)有限公司 Voiceprint authentication method, apparatus, medium, and electronic device
CN110660399A (zh) * 2019-11-11 2020-01-07 广州国音智能科技有限公司 Training method and apparatus for voiceprint recognition, terminal, and computer storage medium
CN111009262A (zh) * 2019-12-24 2020-04-14 携程计算机技术(上海)有限公司 Method and system for voice gender recognition
CN111147484B (zh) * 2019-12-25 2022-06-14 秒针信息技术有限公司 Account login method and apparatus
CN110797032B (zh) * 2020-01-06 2020-05-12 深圳中创华安科技有限公司 Voiceprint database construction method and voiceprint recognition method
CN111179942B (zh) * 2020-01-06 2022-11-08 泰康保险集团股份有限公司 Voiceprint recognition method, apparatus, device, and computer-readable storage medium
CN111243607A (zh) * 2020-03-26 2020-06-05 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating speaker information
WO2021192719A1 (ja) * 2020-03-27 2021-09-30 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Speaker identification method, speaker identification device, speaker identification program, gender identification model generation method, and speaker identification model generation method
JP7473910B2 (ja) 2020-03-27 2024-04-24 株式会社フュートレック Speaker recognition device, speaker recognition method, and program
CN111489756B (zh) * 2020-03-31 2024-03-01 中国工商银行股份有限公司 Voiceprint recognition method and apparatus
CN111583935A (zh) * 2020-04-02 2020-08-25 深圳壹账通智能科技有限公司 Intelligent loan application intake method, apparatus, and storage medium
CN111933147B (zh) * 2020-06-22 2023-02-14 厦门快商通科技股份有限公司 Voiceprint recognition method and system, mobile terminal, and storage medium
US11522994B2 (en) 2020-11-23 2022-12-06 Bank Of America Corporation Voice analysis platform for voiceprint tracking and anomaly detection
CN112637428A (zh) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Invalid call determination method and apparatus, computer device, and storage medium
US20220215834A1 (en) * 2021-01-01 2022-07-07 Jio Platforms Limited System and method for speech to text conversion
US11996087B2 (en) 2021-04-30 2024-05-28 Comcast Cable Communications, Llc Method and apparatus for intelligent voice recognition
KR102478076B1 (ko) * 2022-06-13 2022-12-15 주식회사 액션파워 Method for generating training data for detecting speech recognition errors
JP7335651B1 (ja) 2022-08-05 2023-08-30 株式会社Interior Haraguchi Face authentication payment system and face authentication payment method
CN117351484B (zh) * 2023-10-12 2024-08-27 深圳市前海高新国际医疗管理有限公司 AI-based tumor stem cell feature extraction and classification system
CN117470976B (zh) * 2023-12-28 2024-03-26 烟台宇控软件有限公司 Transmission line defect detection method and system based on voiceprint features

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971690A * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Voiceprint recognition method and apparatus
US20150127342A1 (en) * 2013-11-04 2015-05-07 Google Inc. Speaker identification
US20150149165A1 (en) * 2013-11-27 2015-05-28 International Business Machines Corporation Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
CN105513597A * 2015-12-30 2016-04-20 百度在线网络技术(北京)有限公司 Voiceprint authentication processing method and apparatus

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7006605B1 (en) * 1996-06-28 2006-02-28 Ochopee Big Cypress Llc Authenticating a caller before providing the caller with access to one or more secured resources
US5897616A (en) * 1997-06-11 1999-04-27 International Business Machines Corporation Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases
US6665644B1 (en) * 1999-08-10 2003-12-16 International Business Machines Corporation Conversational data mining
US20030110038A1 (en) * 2001-10-16 2003-06-12 Rajeev Sharma Multi-modal gender classification using support vector machines (SVMs)
US7266497B2 (en) * 2002-03-29 2007-09-04 At&T Corp. Automatic segmentation in speech synthesis
US7620547B2 (en) * 2002-07-25 2009-11-17 Sony Deutschland Gmbh Spoken man-machine interface with speaker identification
US7404087B2 (en) * 2003-12-15 2008-07-22 Rsa Security Inc. System and method for providing improved claimant authentication
US7231019B2 (en) * 2004-02-12 2007-06-12 Microsoft Corporation Automatic identification of telephone callers based on voice characteristics
US20070299671A1 (en) * 2004-03-31 2007-12-27 Ruchika Kapur Method and apparatus for analysing sound - converting sound into information
CN101136199B (zh) * 2006-08-30 2011-09-07 Nuance Communications, Inc. Voice data processing method and device
KR100864828B1 (ko) * 2006-12-06 2008-10-23 Electronics and Telecommunications Research Institute System and method for acquiring speaker information using the speaker's voice feature information
US7949526B2 (en) * 2007-06-04 2011-05-24 Microsoft Corporation Voice aware demographic personalization
JP2009109712A (ja) * 2007-10-30 2009-05-21 National Institute Of Information & Communication Technology Online sequential speaker discrimination system and computer program therefor
US8433669B2 (en) * 2007-11-14 2013-04-30 International Business Machines Corporation Configuring individual classifiers with multiple operating points for cascaded classifier topologies under resource constraints
WO2011028844A2 (en) * 2009-09-02 2011-03-10 Sri International Method and apparatus for tailoring the output of an intelligent automated assistant to a user
JP5214679B2 (ja) * 2010-08-30 2013-06-19 Toshiba Corporation Learning apparatus, method, and program
US8559682B2 (en) * 2010-11-09 2013-10-15 Microsoft Corporation Building a person profile database
US8515750B1 (en) * 2012-06-05 2013-08-20 Google Inc. Realtime acoustic adaptation using stability measures
US9502038B2 (en) * 2013-01-28 2016-11-22 Tencent Technology (Shenzhen) Company Limited Method and device for voiceprint recognition
US9336781B2 (en) * 2013-10-17 2016-05-10 Sri International Content-aware speaker recognition
US20150154002A1 (en) * 2013-12-04 2015-06-04 Google Inc. User interface customization based on speaker characteristics
US9542948B2 (en) * 2014-04-09 2017-01-10 Google Inc. Text-dependent speaker identification
US9564123B1 (en) * 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
US9792899B2 (en) * 2014-07-15 2017-10-17 International Business Machines Corporation Dataset shift compensation in machine learning
US9373330B2 (en) * 2014-08-07 2016-06-21 Nuance Communications, Inc. Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis
US10476872B2 (en) * 2015-02-20 2019-11-12 Sri International Joint speaker authentication and key phrase identification
US11823658B2 (en) * 2015-02-20 2023-11-21 Sri International Trial-based calibration for audio-based identification, recognition, and detection system
US10146923B2 (en) * 2015-03-20 2018-12-04 Aplcomp Oy Audiovisual associative authentication method, related system and device
US9666183B2 (en) * 2015-03-27 2017-05-30 Qualcomm Incorporated Deep neural net based filter prediction for audio event classification and extraction
US9721559B2 (en) * 2015-04-17 2017-08-01 International Business Machines Corporation Data augmentation method based on stochastic feature mapping for automatic speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971690A (zh) * 2013-01-28 2014-08-06 Tencent Technology (Shenzhen) Company Limited A voiceprint recognition method and apparatus
US20150127342A1 (en) * 2013-11-04 2015-05-07 Google Inc. Speaker identification
US20150149165A1 (en) * 2013-11-27 2015-05-28 International Business Machines Corporation Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
CN105513597A (zh) * 2015-12-30 2016-04-20 Baidu Online Network Technology (Beijing) Co., Ltd. Voiceprint authentication processing method and apparatus

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
KENNY, P. ET AL.: "Deep Neural Networks for Extracting Baum-Welch Statistics for Speaker Recognition", PROC. OF ODYSSEY 2014: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 19 June 2014 (2014-06-19), pages 293 - 298, XP055361192 *
LEI, YUN ET AL.: "A Deep Neural Network Speaker Verification System Targeting Microphone Speech", INTERSPEECH 2014, 18 September 2014 (2014-09-18), pages 681 - 685, XP055396135 *
LEI, YUN ET AL.: "A Novel Scheme for Speaker Recognition Using a Phonetically-aware Deep Neural Network", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014 IEEE INTERNATIONAL CONFERENCE ON, 9 May 2014 (2014-05-09), pages 1695 - 1699, XP032617660 *
LI, LANTIAN ET AL.: "Gender-dependent Feature Extraction for Speaker Recognition", SIGNAL AND INFORMATION PROCESSING (CHINASIP), 2015 IEEE CHINA SUMMIT AND INTERNATIONAL CONFERENCE ON, 15 July 2015 (2015-07-15), pages 509 - 513, XP055396130 *
SNYDER, D. ET AL.: "Time Delay Deep Neural Network-based Universal Background Models for Speaker Recognition", AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 17 December 2015 (2015-12-17), pages 92 - 97, XP032863530 *
XU, YAN ET AL.: "Improved i-Vector Representation for Speaker Diarization", CIRCUITS, SYSTEMS, AND SIGNAL PROCESSING, vol. 35, no. 9, 22 December 2015 (2015-12-22), pages 3393 - 3404, XP036015063 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545227A (zh) * 2018-04-28 2019-03-29 Central China Normal University Automatic speaker gender recognition method and system based on a deep autoencoder network
CN109545227B (zh) * 2018-04-28 2023-05-09 Central China Normal University Automatic speaker gender recognition method and system based on a deep autoencoder network
WO2019214047A1 (zh) * 2018-05-08 2019-11-14 Ping An Technology (Shenzhen) Co., Ltd. Method, apparatus, computer device, and storage medium for building a voiceprint model
CN110136726A (zh) * 2019-06-20 2019-08-16 Xiamen Meiya Pico Information Co., Ltd. A voice gender estimation method, apparatus, system, and storage medium
CN111241512A (zh) * 2020-01-09 2020-06-05 Gree Electric Appliances, Inc. of Zhuhai Message information broadcasting method, apparatus, electronic device, and storage medium
CN114141255A (zh) * 2021-11-24 2022-03-04 China Telecom Corporation Limited Training method and apparatus for a voiceprint recognition model, and voiceprint recognition method and apparatus

Also Published As

Publication number Publication date
EP3296991A1 (en) 2018-03-21
CN105513597B (zh) 2018-07-10
US10685658B2 (en) 2020-06-16
KR101870093B1 (ko) 2018-06-21
US20180293990A1 (en) 2018-10-11
JP6682523B2 (ja) 2020-04-15
CN105513597A (zh) 2016-04-20
JP2018508799A (ja) 2018-03-29
EP3296991B1 (en) 2019-11-13
EP3296991A4 (en) 2018-07-25

Similar Documents

Publication Publication Date Title
WO2017113680A1 (zh) Voiceprint authentication processing method and apparatus
EP3477519B1 (en) Identity authentication method, terminal device, and computer-readable storage medium
WO2021068321A1 (zh) Information push method and apparatus based on human-computer interaction, and computer device
CN107492379B (zh) A voiceprint creation and registration method and apparatus
WO2017113658A1 (zh) Artificial intelligence-based voiceprint authentication method and apparatus
US20180197548A1 (en) System and method for diarization of speech, automated generation of transcripts, and automatic information extraction
Dobrišek et al. Towards efficient multi-modal emotion recognition
JP6567040B2 (ja) Voiceprint login method and apparatus based on artificial intelligence
US11270698B2 (en) Proactive command framework
US6141644A (en) Speaker verification and speaker identification based on eigenvoices
US10509895B2 (en) Biometric authentication
US20210082438A1 (en) Convolutional neural network with phonetic attention for speaker verification
WO2021232594A1 (zh) Speech emotion recognition method and apparatus, electronic device, and storage medium
CN110516083B (zh) Album management method, storage medium, and electronic device
US20170294192A1 (en) Classifying Signals Using Mutual Information
WO2021047319A1 (zh) Voice-based personal credit assessment method and apparatus, terminal, and storage medium
US11756572B2 (en) Self-supervised speech representations for fake audio detection
CN112233648B (zh) Data processing method, apparatus, device, and storage medium combining RPA and AI
JP2015230455A (ja) Voice classification device, voice classification method, and program
CA3199456A1 (en) Embedded dictation detection
CN117235234B (zh) Object information acquisition method and apparatus, computer device, and storage medium
CN111081261A (zh) An LDA-based text-independent voiceprint recognition method
US11792365B1 (en) Message data analysis for response recommendations
Trabelsi et al. Training universal background models with restricted data for speech emotion recognition
Trabelsi et al. Dynamic sequence-based learning approaches on emotion recognition systems

Legal Events

Date Code Title Description
REEP Request for entry into the european phase (Ref document number: 2016829225; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2016829225; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 15501292; Country of ref document: US)
ENP Entry into the national phase (Ref document number: 2017519504; Country of ref document: JP; Kind code of ref document: A)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16829225; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)