
CN109523994A - A kind of multitask method of speech classification based on capsule neural network - Google Patents


Info

Publication number
CN109523994A
CN109523994A
Authority
CN
China
Prior art keywords
neural network
capsule
speech
voice
multitask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811346110.8A
Other languages
Chinese (zh)
Inventor
陈盈科
毛华
吴雨
何涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811346110.8A priority Critical patent/CN109523994A/en
Publication of CN109523994A publication Critical patent/CN109523994A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multitask speech classification method based on a capsule neural network, relating to the technical fields of speech signal analysis, artificial intelligence, and the like, and solving the multitask classification problem in speech recognition. The method mainly comprises: extracting feature representations of speech, including primary speech features drawn from multiple perspectives such as the frequency domain and the time domain; using a convolutional neural network and a capsule neural network to carry out deeper abstraction and learning of speech features on the basis of the preprocessed primary features; and, after obtaining the high-level features, designing multiple classifiers according to the multitask requirements and fusing their loss functions, so that a unified multitask speech classification model is trained and the classification accuracy is improved across multiple tasks simultaneously.

Description

Capsule neural network-based multitask speech classification method
Technical Field
A multitask speech classification method based on a capsule neural network, relating to the technical fields of speech signal analysis and processing, artificial intelligence, and the like, and solving the multitask classification problem in speech recognition.
Background
Sound is one of the most convenient means for people to communicate in daily life, and it carries rich information. As an important form of big data, speech is an indispensable component of it and has great research prospects in the current era of artificial intelligence. Human-computer interaction emphasizes comfort and a natural product experience for the user, and voice is the most natural way of interacting, so its importance is self-evident. Intelligent voice products such as music recommendation, simultaneous speech translation, and voice chat software greatly facilitate people's daily lives. Research on intelligent speech technology currently covers several areas: speech recognition, speech classification, semantic analysis, and so on, among which speech classification is the basis for studying speech data. Different kinds of speech classification, such as accent recognition, speaker recognition, and speech emotion recognition, have achieved considerable success. A computer's ability to classify and recognize speech is an important component of computer speech processing, a key prerequisite for realizing a natural human-computer interaction interface, and of great research and application value.
Speech classification tasks are often treated as independent, but in practice a piece of speech can convey many kinds of information, such as gender, text, and emotion, so it is of evident interest to study the interrelationships between the different tasks. For example, accent recognition and speaker recognition are typically treated as separate classification tasks; in fact, however, for the same piece of speech data, once the speaker is confirmed, the speaker's accent is also determined. Considering the real environment, this research aims to extract richer information from speech audio, so that several different speech tasks are classified under a unified model.
Current artificial intelligence technology has several main branches: the traditional deep neural network, the generative adversarial network, reinforcement learning, and the capsule network. By studying the capsule network, this research aims to solve the speech classification problem across multiple tasks, so that the recognition effect of the system is ultimately improved under multiple tasks.
Disclosure of Invention
The invention provides a capsule neural network-based multitask speech classification method, which analyzes the correlations among multiple speech tasks, solves the multitask speech classification problem, realizes abstract learning of speech features, and obtains more accurate speech classification results across multiple tasks.
In order to achieve the purpose, the invention adopts the technical scheme that:
the multitask speech classification method based on the capsule neural network is characterized in that a deep convolutional neural network and the capsule neural network are utilized to learn more abstract, high-level speech features, and the method comprises the following steps:
(1) preprocessing a voice original signal, and extracting a voice low-level feature expression by adopting a voice signal feature extraction algorithm;
(2) extracting a middle-layer feature expression of the voice signal by using a deep convolutional neural network;
(3) further extracting a high-level abstract feature representation of the speech using a capsule neural network;
(4) designing a plurality of different classifiers and loss functions to realize the end-to-end integral training of the multitask speech classification.
Further, the step (1) comprises the following steps:
(11) the original feature expression of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are adopted to extract time-domain and frequency-domain features from the original audio, and finally the multiple features are fused into one expression that is input into the deep neural network model;
(12) the time-domain speech feature extraction algorithm adopts Linear Predictive Cepstral Coefficients (LPCC); the frequency-domain algorithm is a homomorphic signal-processing method that adopts Mel-Frequency Cepstral Coefficients (MFCC) and uses the Fourier transform to analyze the speech signal; the primary speech features with these different characteristics are finally fused to form the input of the deep neural network model.
Further, the step (2) comprises the following steps:
(21) in the step (2), the convolution operation of the deep convolutional neural network extracts higher-level features of the input features, which can be expressed by the following formula:

h = f(W * x)

where x denotes the input of the convolutional layer, W represents the learned weights of the convolution kernel, * is the convolution operation, and f(·) is the nonlinear mapping (activation) function;
(22) the step (2) also extracts higher-level features of the input features using the pooling operation of the deep convolutional neural network, which can be represented by the following formula:

h = g(x)

where x denotes the input of the pooling layer; since pooling layers have no learned parameters, there is no weight W; the common pooling operation g(·) may take the maximum, minimum, or average value.
Further, the step (3) comprises the following steps:
(31) the capsule neural network differs from the traditional deep neural network in that its minimum unit of computation is a group of neurons (a vector), and the capsule network contains two kinds of weights with different functions, used respectively for prediction and for weighting the predictions;
(32) first, in the prediction layer of the capsule network, similarly to traditional feedforward computation, a prediction result is obtained by matrix multiplication between an input capsule and a prediction weight, with the specific formula:

u_hat(j|i) = W_ij · u_i

where u_i is the output of low-level capsule i and u_hat(j|i) is expressed as its prediction for high-level capsule j; note that both u_i and u_hat(j|i) are expressions of a group of neurons;
(33) unlike the traditional convolutional neural network, in which the lower layer predicts the features of the upper layer during learning, the capsule neural network additionally learns a coupling weight for each prediction from the lower layer, with the specific formula:

s_j = Σ_i c_ij · u_hat(j|i)

where u_hat(j|i) denotes the prediction of low-level capsule i for high-level capsule j and c_ij denotes the corresponding coupling weight; the final high-level capsule weights and sums all predictions to obtain the net input s_j; notably, unlike conventional neural network parameters, which are updated by gradient descent, c_ij is updated by the dynamic routing algorithm;
(34) finally, the summed prediction s_j is mapped through a nonlinear function; since the smallest computational unit of the capsule neural network is a group of neurons, the activation function is changed accordingly, and is expressed as:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

The activated prediction v_j carries two meanings: its direction represents the attributes of the category, and its magnitude represents the probability of the category.
Further, the step (4) comprises the following steps:
(41) determining voice multitask classification content and digitizing labels corresponding to the multitasks;
(42) defining a plurality of classifiers according to different types of classification contents;
(43) designing corresponding loss functions for the different classifiers; the specific function is designed as follows:

L_t = -(1/N) Σ_{n=1}^{N} y_n · log(p_n)

where y_n is the true sample label of sample n for a certain class of speech task, p_n is the probability value after the classifier softmax, and N represents the total number of samples; by accumulating the loss of each sample on the task, the average loss of all samples on that task is obtained;
the above designs a loss function for only a single classification result in the multitask setting; for the multitask speech classification problem, the final loss function is defined as follows:

L = Σ_{t=1}^{T} L_t

where L_t represents the above loss function of a single task over the whole sample set, T is the actual number of tasks, and the total loss function L of the final multitask speech classification problem is expressed as the sum of all the individual loss functions;
(44) through the steps of designing the network structure, constructing the data set, and designing the loss function, the whole end-to-end capsule neural network is trained with the back-propagation algorithm.
Compared with the prior art, the invention has the advantages that:
firstly, the preprocessing part skillfully fuses several original features of the speech, reducing the data dimensionality compared with the raw speech data and enriching the speech input information compared with a single primary feature expression;
secondly, on the basis of the deep convolutional neural network, the state-of-the-art capsule neural network is further employed to learn higher-level feature expressions of the speech;
thirdly, the correlation among the tasks is learned through the multitask loss function, so that the network is trained better.
Drawings
FIG. 1 is a diagram of a model of capsule neural network-based multitask speech classification according to the present invention;
FIG. 2 is a flow chart of the capsule neural network based multitasking speech classification in the present invention;
fig. 3 is a topological diagram of a capsule according to the present invention.
Detailed description of the preferred embodiments
The invention is further described below with reference to the figures and examples.
Referring to fig. 1, the core of the capsule neural network-based multitask speech classification method is a capsule neural network model. It receives combinations of different original speech features as input, performs feature learning on them with a basic convolutional structure, then goes deeper with a capsule network structure to further extract features from the primary features, and at the same time designs a new loss function for the multitask learning objective, thereby effectively improving the classification accuracy across multiple tasks.
Referring to fig. 2, an overall data flow of the capsule neural network-based multitask speech classification method includes the following specific steps:
(1) audio preprocessing: the extraction of speech features involves several classical algorithms; the Mel coefficient in MFCC is calculated as follows:

Mel(f) = 2595 · log10(1 + f/700)

where f represents the actual frequency of the speech signal in Hz; the formula describes the relationship between Mel frequency and actual frequency in the algorithm, and the auditory perception of the human ear grows consistently with the Mel frequency.
The LPCC mainly computes the linear prediction cepstral coefficients of the speech, in the following manner:

c_n = a_n + Σ_{k=1}^{n-1} (k/n) · c_k · a_{n-k}

where a_n denotes the n-th linear prediction coefficient and c_n the n-th cepstral coefficient. The final model input features are obtained by mixing the various preliminary speech features described above.
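The two feature conversions above can be sketched in plain Python. This is an illustrative sketch, not the patent's implementation: the function names are hypothetical, and the LPCC recursion shown is the commonly used LPC-to-cepstrum form consistent with the formula above.

```python
import math

def hz_to_mel(f_hz):
    # Standard Mel-scale mapping: Mel(f) = 2595 * log10(1 + f/700).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def lpc_to_lpcc(a, n_ceps):
    # Convert LPC coefficients a[0..p-1] (a_1..a_p) to cepstral coefficients
    # via the recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}.
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused; 1-based cepstral indexing
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

In the patent's pipeline, per-frame MFCC and LPCC vectors of this kind would be concatenated to form the fused primary feature input.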
(2) Convolution and pooling: the convolution operation of the deep convolutional neural network extracts higher-level features of the input features, which can be expressed by the following formula:

h = f(W * x)

where x denotes the input of the convolutional layer, W represents the learned weights of the convolution kernel, * is the convolution operation, and f(·) is the nonlinear mapping (activation) function;
the pooling operation of the deep convolutional neural network also extracts higher-level features of the input features, which can be expressed by the following formula:

h = g(x)

where x denotes the input of the pooling layer; the common pooling operation g(·) may take the maximum, minimum, or average value.
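A minimal one-dimensional sketch of the two formulas above, in pure Python; the function names are hypothetical and the toy sizes are for illustration only, not the patent's actual layer configuration.

```python
import math

def conv1d(x, w, activation=math.tanh):
    # h = f(w * x): valid-mode 1-D convolution (cross-correlation) of input x
    # with learned kernel w, followed by a nonlinear mapping f.
    k = len(w)
    return [activation(sum(w[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]

def pool1d(x, size, op=max):
    # h = g(x): non-overlapping pooling with no learned weights; g may be
    # max, min, or an averaging function.
    return [op(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]
```

A real model would stack several such convolution/pooling stages over 2-D feature maps before the capsule layers.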
(3) Capsule neural network: the basic computational unit of the capsule network is a group of neurons, each vector representing a group of neurons, and the computation between two layers of the capsule network proceeds in two steps: prediction, and weighted summation of the predictions. The intermediate prediction result is obtained by matrix multiplication between the input capsule and the prediction weight, with the specific formula:

u_hat(j|i) = W_ij · u_i

where u_i is the output of low-level capsule i and u_hat(j|i) is expressed as its prediction for high-level capsule j.
While the lower layer predicts the features of the upper layer during learning, the capsule neural network additionally learns a coupling weight for each prediction from the lower layer, with the specific formula:

s_j = Σ_i c_ij · u_hat(j|i)

where u_hat(j|i) denotes the prediction of low-level capsule i for high-level capsule j and c_ij the corresponding coupling weight; the final high-level capsule weights and sums all predictions to obtain the net input s_j.
Finally, the summed prediction s_j is mapped through a nonlinear function; since the smallest computational unit of the capsule neural network is a group of neurons, the activation function is changed accordingly:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

where v_j is the final capsule output expression.
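The capsule activation above (the "squash" nonlinearity) can be sketched directly; this is an illustrative pure-Python version, with the function name chosen for clarity rather than taken from the patent.

```python
import math

def squash(s):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): preserves the direction of the
    # vector (category attributes) while squashing its length into [0, 1)
    # so the length can be read as a class probability.
    norm_sq = sum(v * v for v in s)
    norm = math.sqrt(norm_sq)
    if norm == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * v for v in s]
```

Note that long input vectors map to outputs of length just under 1 and short ones to lengths near 0, which is what lets the magnitude act as a probability.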
(4) Total loss function: first, the content of each task is determined and multiple classifiers are designed to correspond to the multitask learning targets. For the learning objective of a single task, the corresponding loss function is designed as follows:

L_t = -(1/N) Σ_{n=1}^{N} y_n · log(p_n)

where y_n is the true sample label of sample n for the speech task, p_n the probability value after the classifier softmax, and N the total number of samples; by accumulating the per-sample losses on the task, the average loss of all samples on that task is obtained.
Because the model addresses the multitask speech classification problem, a rule is needed to fuse the independent loss functions, so the total loss function of the multitask speech classification model is specifically expressed as:

L = Σ_{t=1}^{T} L_t

where L_t represents the above loss function of a single task over the whole sample set, T is the actual number of tasks, and the total loss function L of the final multitask speech classification problem is expressed as the sum of all the individual loss functions.
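The two-level loss above can be sketched as follows; this is a minimal illustration assuming per-sample softmax outputs are already available, with hypothetical function names.

```python
import math

def task_loss(labels, probs):
    # Average cross-entropy for one task: -(1/N) * sum_n log p_n(y_n),
    # where labels[n] is the true class index of sample n and probs[n]
    # is that sample's softmax distribution.
    n = len(labels)
    return -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n

def multitask_loss(per_task_losses):
    # Total loss L = sum_t L_t: the fused objective over all T tasks.
    return sum(per_task_losses)
```

Summing the per-task averages gives the single scalar objective that back-propagation minimizes, which is how the tasks' correlations influence one shared network.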
Referring to fig. 3, which shows the computational topology between any two layers of a capsule neural network: u_i is the feature expression learned by a low-level capsule, and the prediction weights W_ij turn this low-level input into predictions of the high-level expression; the diagram contains the hidden prediction results u_hat(j|i) of the prediction layer and the coupling weights c_ij of the prediction layer, from which the expression of the next high-level capsule v_j is finally obtained, specifically:

v_j = squash( Σ_i c_ij · W_ij · u_i )
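The whole two-layer computation of fig. 3 can be sketched end to end. This is an illustrative pure-Python toy, not the patent's implementation: it performs a single forward pass with fixed routing logits, whereas full dynamic routing would refine those logits over several iterations using the agreement between u_hat(j|i) and v_j.

```python
import math

def squash(s):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): capsule nonlinearity.
    n2 = sum(v * v for v in s)
    n = math.sqrt(n2)
    return [0.0] * len(s) if n == 0 else [n2 / (1.0 + n2) / n * v for v in s]

def capsule_layer(us, W, b):
    # One low-to-high capsule pass:
    #   predictions  u_hat(j|i) = W[i][j] @ u_i
    #   couplings    c_i        = softmax(b_i)   (b = routing logits)
    #   net input    s_j        = sum_i c_ij * u_hat(j|i)
    #   outputs      v_j        = squash(s_j)
    def matvec(M, x):
        return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]
    def softmax(xs):
        m = max(xs)
        e = [math.exp(x - m) for x in xs]
        t = sum(e)
        return [v / t for v in e]
    n_in, n_out = len(us), len(W[0])
    u_hat = [[matvec(W[i][j], us[i]) for j in range(n_out)] for i in range(n_in)]
    c = [softmax(b[i]) for i in range(n_in)]
    outs = []
    for j in range(n_out):
        dim = len(u_hat[0][j])
        s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
        outs.append(squash(s))
    return outs
```

With one 2-D input capsule, an identity prediction matrix, and neutral routing logits, the output capsule is simply the squashed input vector, matching the formula under fig. 3.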

Claims (5)

1. A multitask speech classification method based on a capsule neural network, characterized in that the capsule neural network is used to extract high-level abstract features of speech while multiple classifiers are adopted to complete the multitask classification of the speech, the method comprising the following steps:
(1) preprocessing a voice original signal, and extracting a voice low-level feature expression by adopting a voice signal feature extraction algorithm;
(2) extracting a middle-layer feature expression of the voice signal by using a deep convolutional neural network;
(3) further extracting a high-level abstract feature representation of the speech using a capsule neural network;
(4) designing a plurality of different classifiers and loss functions to realize the end-to-end integral training of the multitask speech classification.
2. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (1) comprises the following steps:
(11) the original feature expression of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are adopted to extract time-domain and frequency-domain features from the original audio, and finally the multiple features are fused into one expression that is input into the deep neural network model;
(12) the time-domain speech feature extraction algorithm adopts Linear Predictive Cepstral Coefficients (LPCC); the frequency-domain algorithm is a homomorphic signal-processing method that adopts Mel-Frequency Cepstral Coefficients (MFCC) and uses the Fourier transform to analyze the speech signal; the primary speech features with these different characteristics are finally fused to form the input of the deep neural network model.
3. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (2) comprises the following steps:
(21) in the step (2), the convolution operation of the deep convolutional neural network extracts higher-level features of the input features, which can be expressed by the following formula:

h = f(W * x)

where x denotes the input of the convolutional layer, W represents the learned weights of the convolution kernel, * is the convolution operation, and f(·) is the nonlinear mapping (activation) function;
(22) the step (2) also extracts higher-level features of the input features using the pooling operation of the deep convolutional neural network, which can be represented by the following formula:

h = g(x)

where x denotes the input of the pooling layer; since pooling layers have no learned parameters, there is no weight W; the common pooling operation g(·) may take the maximum, minimum, or average value.
4. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (3) comprises the following steps:
(31) the capsule neural network differs from the traditional deep neural network in that its minimum unit of computation is a group of neurons (a vector), and the capsule network contains two kinds of weights with different functions, used respectively for prediction and for weighting the predictions;
(32) first, in the prediction layer of the capsule network, similarly to traditional feedforward computation, a prediction result is obtained by matrix multiplication between an input capsule and a prediction weight, with the specific formula:

u_hat(j|i) = W_ij · u_i

where u_i is the output of low-level capsule i and u_hat(j|i) is expressed as its prediction for high-level capsule j; note that both u_i and u_hat(j|i) are expressions of a group of neurons;
(33) unlike the traditional convolutional neural network, in which the lower layer predicts the features of the upper layer during learning, the capsule neural network additionally learns a coupling weight for each prediction from the lower layer, with the specific formula:

s_j = Σ_i c_ij · u_hat(j|i)

where u_hat(j|i) denotes the prediction of low-level capsule i for high-level capsule j and c_ij denotes the corresponding coupling weight; the final high-level capsule weights and sums all predictions to obtain the net input s_j; notably, unlike conventional neural network parameters, which are updated by gradient descent, c_ij is updated by the dynamic routing algorithm;
(34) finally, the summed prediction s_j is mapped through a nonlinear function; since the smallest computational unit of the capsule neural network is a group of neurons, the activation function is changed accordingly, and is expressed as:

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)

The activated prediction v_j carries two meanings: its direction represents the attributes of the category, and its magnitude represents the probability of the category.
5. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (4) comprises the following steps:
(41) determining the speech multitask classification content and digitizing the labels corresponding to the multiple tasks;
(42) defining a plurality of classifiers according to the different types of classification content;
(43) designing corresponding loss functions for the different classifiers; the specific function is designed as follows:

L_t = -(1/N) Σ_{n=1}^{N} y_n · log(p_n)

where y_n is the true sample label of sample n for a certain class of speech task, p_n is the probability value after the classifier softmax, and N represents the total number of samples; by accumulating the loss of each sample on the task, the average loss of all samples on that task is obtained;
the above designs a loss function for only a single classification result in the multitask setting; for the multitask speech classification problem, the final loss function is defined as follows:

L = Σ_{t=1}^{T} L_t

where L_t represents the above loss function of a single task over the whole sample set, T is the actual number of tasks, and the total loss function L of the final multitask speech classification problem is expressed as the sum of all the individual loss functions;
(44) through the steps of designing the network structure, constructing the data set, and designing the loss function, the whole end-to-end capsule neural network is trained with the back-propagation algorithm.
CN201811346110.8A 2018-11-13 2018-11-13 A kind of multitask method of speech classification based on capsule neural network Pending CN109523994A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811346110.8A CN109523994A (en) 2018-11-13 2018-11-13 A kind of multitask method of speech classification based on capsule neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811346110.8A CN109523994A (en) 2018-11-13 2018-11-13 A kind of multitask method of speech classification based on capsule neural network

Publications (1)

Publication Number Publication Date
CN109523994A true CN109523994A (en) 2019-03-26

Family

ID=65776175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811346110.8A Pending CN109523994A (en) 2018-11-13 2018-11-13 A kind of multitask method of speech classification based on capsule neural network

Country Status (1)

Country Link
CN (1) CN109523994A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120224A (en) * 2019-05-10 2019-08-13 平安科技(深圳)有限公司 Construction method, device, computer equipment and the storage medium of bird sound identification model
CN110428843A (en) * 2019-03-11 2019-11-08 杭州雄迈信息技术有限公司 A kind of voice gender identification deep learning method
CN110931046A (en) * 2019-11-29 2020-03-27 福州大学 Audio high-level semantic feature extraction method and system for overlapped sound event detection
CN110968729A (en) * 2019-11-21 2020-04-07 浙江树人学院(浙江树人大学) Family activity sound event classification method based on additive interval capsule network
CN111179961A (en) * 2020-01-02 2020-05-19 腾讯科技(深圳)有限公司 Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN111357051A (en) * 2019-12-24 2020-06-30 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111584010A (en) * 2020-04-01 2020-08-25 昆明理工大学 Key protein identification method based on capsule neural network and ensemble learning
CN111862949A (en) * 2020-07-30 2020-10-30 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN112562725A (en) * 2020-12-09 2021-03-26 山西财经大学 Mixed voice emotion classification method based on spectrogram and capsule network
CN112599134A (en) * 2020-12-02 2021-04-02 国网安徽省电力有限公司 Transformer sound event detection method based on voiceprint recognition
CN112992191A (en) * 2021-05-12 2021-06-18 北京世纪好未来教育科技有限公司 Voice endpoint detection method and device, electronic equipment and readable storage medium
CN113314119A (en) * 2021-07-27 2021-08-27 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN113343924A (en) * 2021-07-01 2021-09-03 齐鲁工业大学 Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network
CN113362857A (en) * 2021-06-15 2021-09-07 厦门大学 Real-time speech emotion recognition method based on CapcNN and application device
CN113378855A (en) * 2021-06-22 2021-09-10 北京百度网讯科技有限公司 Method for processing multitask, related device and computer program product
CN113378984A (en) * 2021-07-05 2021-09-10 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113782000A (en) * 2021-09-29 2021-12-10 北京中科智加科技有限公司 Language identification method based on multiple tasks
CN114267360A (en) * 2021-12-29 2022-04-01 达闼机器人有限公司 Speech recognition and speech-based joint processing model training method and device
CN115376518A (en) * 2022-10-26 2022-11-22 广州声博士声学技术有限公司 Voiceprint recognition method, system, device and medium for real-time noise big data
US11735168B2 (en) 2020-07-20 2023-08-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing voice
WO2023222088A1 (en) * 2022-05-20 2023-11-23 青岛海尔电冰箱有限公司 Voice recognition and classification method and apparatus
CN117275461A (en) * 2023-11-23 2023-12-22 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06295196A (en) * 1993-04-08 1994-10-21 Casio Comput Co Ltd Speech recognition device and signal recognition device
WO2005059811A1 (en) * 2003-12-16 2005-06-30 Canon Kabushiki Kaisha Pattern identification method, apparatus, and program
US20160284346A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Deep neural net based filter prediction for audio event classification and extraction
CN106601235A (en) * 2016-12-02 2017-04-26 厦门理工学院 Semi-supervision multitask characteristic selecting speech recognition method
US20170148431A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc End-to-end speech recognition
CN107578775A (en) * 2017-09-07 2018-01-12 四川大学 A kind of multitask method of speech classification based on deep neural network
CN107610692A (en) * 2017-09-22 2018-01-19 杭州电子科技大学 The sound identification method of self-encoding encoder multiple features fusion is stacked based on neutral net
US20180068675A1 (en) * 2016-09-07 2018-03-08 Google Inc. Enhanced multi-channel acoustic models
CN108010514A (en) * 2017-11-20 2018-05-08 四川大学 A kind of method of speech classification based on deep neural network
GB201807225D0 (en) * 2018-03-14 2018-06-13 Papercup Tech Limited A speech processing system and a method of processing a speech signal
CN108766461A (en) * 2018-07-17 2018-11-06 厦门美图之家科技有限公司 Audio feature extraction methods and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06295196A (en) * 1993-04-08 1994-10-21 Casio Comput Co Ltd Speech recognition device and signal recognition device
WO2005059811A1 (en) * 2003-12-16 2005-06-30 Canon Kabushiki Kaisha Pattern identification method, apparatus, and program
US20160284346A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Deep neural net based filter prediction for audio event classification and extraction
US20170148431A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc End-to-end speech recognition
US20180068675A1 (en) * 2016-09-07 2018-03-08 Google Inc. Enhanced multi-channel acoustic models
CN106601235A (en) * 2016-12-02 2017-04-26 厦门理工学院 Semi-supervision multitask characteristic selecting speech recognition method
CN107578775A (en) * 2017-09-07 2018-01-12 四川大学 A kind of multitask method of speech classification based on deep neural network
CN107610692A (en) * 2017-09-22 2018-01-19 杭州电子科技大学 Voice recognition method based on neural-network stacked autoencoders and multi-feature fusion
CN108010514A (en) * 2017-11-20 2018-05-08 四川大学 A kind of method of speech classification based on deep neural network
GB201807225D0 (en) * 2018-03-14 2018-06-13 Papercup Tech Limited A speech processing system and a method of processing a speech signal
CN108766461A (en) * 2018-07-17 2018-11-06 厦门美图之家科技有限公司 Audio feature extraction methods and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LE, D. et al.: "Discretized continuous speech emotion recognition with multi-task deep recurrent neural network", 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017) *
NAM KYUN KIM et al.: "Speech emotion recognition based on multi-task learning using a convolutional neural network", 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) *
YU, Chengbo et al.: "Research on finger vein recognition based on capsule network", Application of Electronic Technique *
ZHU, Yingzhao et al.: "Research on capsule network technology and its development trends", Guangdong Communication Technology *
HU, Wenping: "Spoken pronunciation detection and error analysis based on deep neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology *
GUO, Junwen: "Research on a CapsNet-based wearable ECG acquisition and arrhythmia detection system", China Masters' Theses Full-text Database, Medicine and Health Sciences *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428843B (en) * 2019-03-11 2021-09-07 杭州巨峰科技有限公司 Voice gender recognition deep learning method
CN110428843A (en) * 2019-03-11 2019-11-08 杭州雄迈信息技术有限公司 A kind of voice gender identification deep learning method
CN110120224B (en) * 2019-05-10 2023-01-20 平安科技(深圳)有限公司 Method and device for constructing bird sound recognition model, computer equipment and storage medium
CN110120224A (en) * 2019-05-10 2019-08-13 平安科技(深圳)有限公司 Construction method, device, computer equipment and the storage medium of bird sound identification model
CN110968729A (en) * 2019-11-21 2020-04-07 浙江树人学院(浙江树人大学) Family activity sound event classification method based on additive interval capsule network
CN110968729B (en) * 2019-11-21 2022-05-17 浙江树人学院(浙江树人大学) Family activity sound event classification method based on additive interval capsule network
CN110931046A (en) * 2019-11-29 2020-03-27 福州大学 Audio high-level semantic feature extraction method and system for overlapped sound event detection
CN111357051A (en) * 2019-12-24 2020-06-30 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111357051B (en) * 2019-12-24 2024-02-02 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
WO2021127982A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech emotion recognition method, smart device, and computer-readable storage medium
US12039995B2 (en) 2020-01-02 2024-07-16 Tencent Technology (Shenzhen) Company Limited Audio signal processing method and apparatus, electronic device, and storage medium
CN111179961A (en) * 2020-01-02 2020-05-19 腾讯科技(深圳)有限公司 Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN111584010A (en) * 2020-04-01 2020-08-25 昆明理工大学 Key protein identification method based on capsule neural network and ensemble learning
CN111584010B (en) * 2020-04-01 2022-05-27 昆明理工大学 Key protein identification method based on capsule neural network and ensemble learning
US11735168B2 (en) 2020-07-20 2023-08-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing voice
CN111862949B (en) * 2020-07-30 2024-04-02 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN111862949A (en) * 2020-07-30 2020-10-30 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN112599134A (en) * 2020-12-02 2021-04-02 国网安徽省电力有限公司 Transformer sound event detection method based on voiceprint recognition
CN112562725A (en) * 2020-12-09 2021-03-26 山西财经大学 Mixed voice emotion classification method based on spectrogram and capsule network
CN112992191A (en) * 2021-05-12 2021-06-18 北京世纪好未来教育科技有限公司 Voice endpoint detection method and device, electronic equipment and readable storage medium
CN112992191B (en) * 2021-05-12 2021-11-05 北京世纪好未来教育科技有限公司 Voice endpoint detection method and device, electronic equipment and readable storage medium
CN113362857A (en) * 2021-06-15 2021-09-07 厦门大学 Real-time speech emotion recognition method based on CapcNN and application device
CN113378855A (en) * 2021-06-22 2021-09-10 北京百度网讯科技有限公司 Method for processing multitask, related device and computer program product
CN113343924A (en) * 2021-07-01 2021-09-03 齐鲁工业大学 Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network
CN113378984B (en) * 2021-07-05 2023-05-02 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113378984A (en) * 2021-07-05 2021-09-10 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113314119B (en) * 2021-07-27 2021-12-03 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN113314119A (en) * 2021-07-27 2021-08-27 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN113782000A (en) * 2021-09-29 2021-12-10 北京中科智加科技有限公司 Language identification method based on multiple tasks
CN114267360A (en) * 2021-12-29 2022-04-01 达闼机器人有限公司 Speech recognition and speech-based joint processing model training method and device
WO2023222088A1 (en) * 2022-05-20 2023-11-23 青岛海尔电冰箱有限公司 Voice recognition and classification method and apparatus
CN115376518A (en) * 2022-10-26 2022-11-22 广州声博士声学技术有限公司 Voiceprint recognition method, system, device and medium for real-time noise big data
CN117275461A (en) * 2023-11-23 2023-12-22 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment
CN117275461B (en) * 2023-11-23 2024-03-15 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109523994A (en) A kind of multitask method of speech classification based on capsule neural network
Abdullah et al. SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning
CN111312245B (en) Voice response method, device and storage medium
CN110263324A (en) Text handling method, model training method and device
CN113127624B (en) Question-answer model training method and device
CN108920622A (en) A kind of training method of intention assessment, training device and identification device
CN112216307B (en) Speech emotion recognition method and device
Shahriar et al. Classifying maqams of Qur’anic recitations using deep learning
CN113837299B (en) Network training method and device based on artificial intelligence and electronic equipment
CN113065344A (en) Cross-corpus emotion recognition method based on transfer learning and attention mechanism
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
CN112183106A (en) Semantic understanding method and device based on phoneme association and deep learning
CN111666752A (en) Circuit teaching material entity relation extraction method based on keyword attention mechanism
Atkar et al. Speech emotion recognition using dialogue emotion decoder and CNN Classifier
Cao et al. Speaker-independent speech emotion recognition based on random forest feature selection algorithm
Chen et al. Construction of affective education in mobile learning: The study based on learner’s interest and emotion recognition
CN113887836A (en) Narrative event prediction method fusing event environment information
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN117808103A (en) Co-emotion reply generation method based on speech-level feature dynamic interaction
CN109785863A (en) A kind of speech-emotion recognition method and system of deepness belief network
CN116403608A (en) Speech emotion recognition method based on multi-label correction and space-time collaborative fusion
CN114239565A (en) Deep learning-based emotion reason identification method and system
Novais A framework for emotion and sentiment predicting supported in ensembles
CN115270805A (en) Semantic information extraction method of service resources
Lin et al. Speech emotion recognition based on dynamic convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20190326)