CN109523994A - Multitask speech classification method based on a capsule neural network - Google Patents
Multitask speech classification method based on a capsule neural network
- Publication number
- CN109523994A (application number CN201811346110.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- capsule
- speech
- voice
- multitask
- Prior art date: 2018-11-13
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a multitask speech classification method based on a capsule neural network, relating to technical fields such as speech signal analysis and artificial intelligence, and solving the multitask classification problem in speech recognition. The method first extracts primary feature representations of the speech from multiple perspectives, including the frequency domain and the time domain; then, on the basis of the preprocessed primary features, uses a convolutional neural network and a capsule neural network to abstract and learn deeper speech features; and finally, once high-level features are obtained, designs multiple classifiers according to the multitask requirements and fuses their loss functions, so that a unified multitask speech classification model is trained and classification accuracy is improved across multiple tasks simultaneously.
Description
Technical Field
A multitask speech classification method based on a capsule neural network, relating to technical fields such as speech signal analysis and processing and artificial intelligence, and addressing the multitask speech recognition problem.
Background
Sound is one of the most convenient means of daily human communication and carries rich information. As an important form of big data, speech is an indispensable component of it and has great research prospects in the current era of artificial intelligence. Human-computer interaction emphasizes user comfort and a natural product experience, and since speech is the most natural way of interacting, its importance is self-evident. Intelligent speech products such as intelligent music recommendation, simultaneous speech translation, and voice chat software greatly facilitate people's daily lives. Research on intelligent speech technology currently covers several areas: speech recognition, speech classification, semantic analysis, and so on, among which speech classification is the foundation for studying speech data. Different kinds of speech classification, such as accent recognition, speaker recognition, and speech emotion recognition, have achieved considerable success. A computer's ability to classify and recognize speech is an important component of computer speech processing, a key prerequisite for a natural human-computer interaction interface, and of great research and application value.
Speech classification tasks are often treated as independent, but in practice a piece of speech can convey several kinds of information, such as gender, text, and emotion, and it is of practical interest to study the interrelationships between the different tasks. For example, accent recognition and speaker recognition are typically treated as separate classification tasks; in fact, however, for the same piece of speech data, once the speaker is confirmed, the accent is determined as well. Based on this observation, and taking the real environment into account, richer information can be expected from the speech audio, so that several different speech tasks are classified under a unified model.
Current artificial intelligence technology covers several directions, namely traditional deep neural networks, generative adversarial networks, reinforcement learning, and capsule networks. This work aims to solve the multitask speech classification problem by studying the capsule network, thereby improving the recognition performance of the system across multiple tasks.
Disclosure of Invention
The invention provides a capsule neural network-based multitask speech classification method, which analyzes the correlation among multiple speech tasks, solves the multitask speech classification problem, realizes abstract learning of speech features, and obtains more accurate speech classification results across multiple tasks.
In order to achieve the purpose, the invention adopts the technical scheme that:
The multitask speech classification method based on the capsule neural network is characterized in that a deep convolutional neural network and a capsule neural network are utilized to learn more abstract, high-level speech features, and comprises the following steps:
(1) preprocessing the original speech signal, and extracting a low-level speech feature expression using a speech signal feature extraction algorithm;
(2) extracting a middle-layer feature expression of the voice signal by using a deep convolutional neural network;
(3) further extracting a high-level abstract feature representation of the speech using a capsule neural network;
(4) designing a plurality of different classifiers and loss functions to realize end-to-end training of the whole multitask speech classification model.
Further, the step (1) comprises the following steps:
(11) the original feature expression of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are adopted to extract time-domain and frequency-domain features from the original audio, and finally the multiple features are fused into a single expression that is input to the deep neural network model;
(12) the time-domain speech feature extraction algorithm adopts linear prediction cepstral coefficients (LPCC); the frequency-domain algorithm adopts Mel-frequency cepstral coefficients (MFCC), a homomorphic signal processing method that extracts the speech signal using the Fourier transform; the input of the deep neural network model is finally formed by fusing these primary speech features with different characteristics.
Further, the step (2) comprises the following steps:
(21) the convolution operation of the deep convolutional neural network in step (2) extracts higher-layer features from the input features, which can be expressed by the following formula:

$$y = f(W * x + b)$$

wherein $x$ defines the input of the convolutional layer, $W$ represents the learned weights of the convolution kernel, $*$ is the convolution operation, $b$ is the bias, and $f(\cdot)$ is a nonlinear mapping function;
(22) the pooling operation of the deep convolutional neural network in step (2) extracts higher-level features from the input features, which can be represented by the following formula:

$$y = \mathrm{pool}(x)$$

wherein $x$ defines the input of the pooling layer; since pooling layers have no learned parameters, there is no weight term $W$; the common pooling operation $\mathrm{pool}(\cdot)$ may take the maximum, minimum, or average value.
Further, the step (3) comprises the following steps:
(31) the capsule neural network differs from the traditional deep neural network in that its minimum unit of calculation is a group of neurons, and two kinds of weights with different functions exist in the capsule network, used respectively for making predictions and for weighting the predictions;
(32) firstly, in the prediction layer of the capsule network, similarly to traditional feedforward calculation, a prediction result is obtained by matrix multiplication between the input capsule and the prediction weight, with the specific formula:

$$\hat{u}_{j|i} = W_{ij} u_i$$

wherein $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ is the prediction result; note that $u_i$ and $\hat{u}_{j|i}$ are both expressions of a group of neurons;
(33) unlike the traditional convolutional neural network, in which the lower layer learns to predict the feature expression of the upper layer directly, the capsule neural network additionally learns a weight for each prediction made by the lower layer, with the specific formula:

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$

wherein $\hat{u}_{j|i}$ is the prediction of the low-level capsule $i$ for the high-level capsule $j$, and $c_{ij}$ is the prediction weight; the final high-level capsule weights and sums all predictions to obtain the net input $s_j$; it is worth noting that, unlike a conventional neural network, in which parameters are updated by gradient descent, the weights $c_{ij}$ are updated by a dynamic routing algorithm;
(34) finally, the summed prediction expression is mapped through a nonlinear function; since the smallest computational unit in the capsule neural network is a group of neurons, the activation function is changed accordingly, and is expressed as follows:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

the activated prediction expression $v_j$ has two meanings: its direction represents the attributes of the category, and its magnitude represents the probability of the category.
Further, the step (4) comprises the following steps:
(41) determining the multitask speech classification content and digitizing the labels corresponding to the multiple tasks;
(42) defining a plurality of classifiers according to the different types of classification content;
(43) designing corresponding loss functions for the different classifiers, with the specific design as follows:

$$L_k = -\frac{1}{N} \sum_{n=1}^{N} y_n \log \hat{y}_n$$

wherein $y_n$ is the real sample label of a certain class of speech, $\hat{y}_n$ is the probability value after the classifier softmax, and $N$ represents the total number of samples; by superimposing the loss of each sample on a certain task, the average loss of all samples on that task is obtained;
the above designs a loss function only for a single classification result in the multitask setting; for the multitask speech classification problem, the final loss function is defined as follows:

$$L = \sum_{k=1}^{K} L_k$$

wherein $L_k$ represents the above loss function for a single task over all samples, $K$ is the number of tasks in practice, and the total loss function $L$ of the final multitask speech recognition problem is expressed as the sum of all individual loss functions;
(44) after the steps of designing the network structure, constructing the data set, and designing the loss function, the whole end-to-end capsule neural network is trained with the back-propagation algorithm.
Compared with the prior art, the invention has the advantages that:
firstly, the preprocessing part skillfully fuses various original features of the speech, which reduces the data dimensionality compared with the raw speech data and enriches the input information compared with a single primary feature expression;
secondly, on the basis of the deep convolutional neural network, the state-of-the-art capsule neural network is further employed to learn higher-level feature expressions of the speech;
and thirdly, learning the correlation among the tasks through a multi-task loss function, thereby better training the network.
Drawings
FIG. 1 is a diagram of a model of capsule neural network-based multitask speech classification according to the present invention;
FIG. 2 is a flow chart of the capsule neural network based multitasking speech classification in the present invention;
FIG. 3 is a topological diagram of a capsule according to the present invention.
Detailed description of the preferred embodiments
The invention is further described below with reference to the figures and examples.
Referring to fig. 1, the core of the capsule neural network-based multitask speech recognition method is the capsule neural network model. It receives as input a combination of different original speech features, first performs feature learning on the input with a basic convolutional structure, then performs deeper feature extraction on these primary features with the capsule network structure, and at the same time designs a new loss function for the multitask learning objective, thereby effectively improving the accuracy of speech recognition across multiple tasks.
Referring to fig. 2, an overall data flow of the capsule neural network-based multitask speech classification method includes the following specific steps:
(1) Audio preprocessing: the extraction of speech features involves several classical algorithms; the Mel scale used in the MFCC is calculated as follows:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

wherein $f$ represents the actual frequency of the speech; the formula describes the relationship between Mel frequency and actual frequency, and the perceived pitch of the human ear rises consistently with the Mel frequency.
The LPCC algorithm computes the linear prediction cepstral coefficients of the speech from the linear prediction coefficients, in the following manner:

$$c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \quad 1 < n \le p, \qquad c_1 = a_1$$

wherein $a_n$ denotes the $n$-th order linear prediction coefficient. The final model input features are obtained by fusing the various primary speech features described above.
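To make this preprocessing concrete, here is a minimal sketch, assuming the librosa library, a hypothetical input file audio.wav, illustrative frame and coefficient-order choices, and simple concatenation as the fusion rule (the patent fixes none of these):

```python
import librosa
import numpy as np

def lpcc_from_frame(frame, order=12):
    """LPCC via the common recursion c_n = a_n + sum_{k<n} (k/n) c_k a_{n-k}."""
    a = -librosa.lpc(np.ascontiguousarray(frame), order=order)[1:]  # a_1..a_p
    c = np.zeros(order)
    c[0] = a[0]
    for n in range(2, order + 1):
        c[n - 1] = a[n - 1] + sum(k / n * c[k - 1] * a[n - k - 1] for k in range(1, n))
    return c

y, sr = librosa.load("audio.wav", sr=16000)          # hypothetical input utterance
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # frequency-domain features, (13, T)
frames = librosa.util.frame(y, frame_length=512, hop_length=160)
lpcc = np.stack([lpcc_from_frame(f) for f in frames.T], axis=1)  # time-domain, (12, T')

T = min(mfcc.shape[1], lpcc.shape[1])                # align frame counts before fusing
fused = np.concatenate([mfcc[:, :T], lpcc[:, :T]], axis=0)  # fused (25, T) model input
```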
(2) Convolution and pooling: the convolution operation of the deep convolutional neural network extracts higher-level features from the input features, which can be expressed by the following formula:

$$y = f(W * x + b)$$

wherein $x$ defines the input of the convolutional layer, $W$ represents the learned weights of the convolution kernel, $*$ is the convolution operation, $b$ is the bias, and $f(\cdot)$ is a nonlinear mapping function;
the pooling operation of the deep convolutional neural network extracts higher-level features from the input features, which can be expressed by the following formula:

$$y = \mathrm{pool}(x)$$

wherein $x$ defines the input of the pooling layer, and the common pooling operation $\mathrm{pool}(\cdot)$ may take the maximum, minimum, or average value.
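As an illustration of this convolution-and-pooling stage, a minimal sketch follows, assuming PyTorch and treating the fused feature matrix as a one-channel image; the channel counts and kernel sizes are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),  # y = f(W * x + b): learned kernel W, bias b
    nn.ReLU(),                                    # nonlinear mapping f
    nn.MaxPool2d(kernel_size=2),                  # pool(x): max pooling, no learned weights
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

x = torch.randn(8, 1, 25, 100)   # a batch of fused (25, T) speech features
mid_features = conv_block(x)     # middle-layer feature maps, here (8, 64, 6, 25)
```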
(3) Capsule neural network: the basic computational unit of the capsule network is a group of neurons, with each vector representing one such group, and the calculation between two layers of the capsule network proceeds in two steps: prediction, then weighted summation of the predictions. The intermediate prediction result is obtained by matrix multiplication between the input capsule and the prediction weight, with the specific formula:

$$\hat{u}_{j|i} = W_{ij} u_i$$

wherein $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ is the prediction result.
In learning, the lower-layer network predicts the feature expression of the upper layer, and the capsule neural network additionally learns a weight for each prediction made by the lower layer, with the specific formula:

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$

wherein $\hat{u}_{j|i}$ is the prediction of the low-level capsule $i$ for the high-level capsule $j$, and $c_{ij}$ is the prediction weight; the final high-level capsule weights and sums all predictions to obtain the net input $s_j$.
Finally, the summed prediction expression is mapped through a nonlinear function; since the smallest computational unit in the capsule neural network is a group of neurons, the activation function is changed accordingly, and is expressed as follows:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

wherein $v_j$ is the final capsule output expression.
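The prediction, routing, and squash formulas above can be sketched compactly; the following is a minimal illustration, assuming PyTorch, with illustrative capsule counts and dimensions:

```python
import torch

def squash(s, dim=-1):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|): capsule length encodes probability."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + 1e-9)

def capsule_layer(u, W, num_iters=3):
    """u: (batch, n_low, d_low) low-level capsules; W: (n_low, n_high, d_high, d_low)."""
    # Prediction step: u_hat[j|i] = W_ij @ u_i, one matrix product per capsule pair
    u_hat = torch.einsum('ijkl,bil->bijk', W, u)           # (batch, n_low, n_high, d_high)
    b = torch.zeros(u.size(0), W.size(0), W.size(1))       # routing logits, start uniform
    for _ in range(num_iters):                             # dynamic routing, not gradient descent
        c = torch.softmax(b, dim=2)                        # coupling weights c_ij
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)           # net input s_j = sum_i c_ij u_hat[j|i]
        v = squash(s)                                      # high-level capsules v_j
        b = b + torch.einsum('bijk,bjk->bij', u_hat, v)    # agreement update
    return v

u = torch.randn(8, 32, 8)                 # 32 low-level capsules of dimension 8
W = torch.randn(32, 10, 16, 8) * 0.05     # maps to 10 high-level capsules of dimension 16
v = capsule_layer(u, W)                   # (8, 10, 16)
```

Note that the coupling weights c are recomputed by the routing loop on every forward pass rather than learned by gradient descent, which is exactly the distinction drawn above.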
(4) Total loss function: firstly, the content of each task is determined, and multiple classifiers are designed to correspond to the multitask learning objectives. For the learning objective of a single task, the corresponding loss function is designed as follows:

$$L_k = -\frac{1}{N} \sum_{n=1}^{N} y_n \log \hat{y}_n$$

wherein $y_n$ is the real sample label of a certain class of speech, $\hat{y}_n$ is the probability value after the classifier softmax, and $N$ represents the total number of samples; by superimposing the loss of each sample on a certain task, the average loss of all samples on that task is obtained;
because the model addresses the multitask speech classification problem, a rule is needed to fuse the independent loss functions, so the total loss function of the multitask speech classification model is expressed as follows:

$$L = \sum_{k=1}^{K} L_k$$

wherein $L_k$ represents the above loss function for a single task over all samples, $K$ is the number of tasks in practice, and the total loss function $L$ of the final multitask speech recognition problem is expressed as the sum of all individual loss functions.
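A minimal sketch of this fused objective, assuming PyTorch and two hypothetical tasks (gender with two classes, accent with five); nn.CrossEntropyLoss combines the softmax and the per-task cross-entropy, and the total loss is the sum over tasks:

```python
import torch
import torch.nn as nn

capsule_out = torch.randn(8, 10, 16).flatten(1)      # shared capsule features, (batch, 160)
heads = nn.ModuleDict({
    "gender": nn.Linear(160, 2),                     # one classifier per task
    "accent": nn.Linear(160, 5),
})
labels = {"gender": torch.randint(0, 2, (8,)),       # digitized multitask labels
          "accent": torch.randint(0, 5, (8,))}

criterion = nn.CrossEntropyLoss()                    # per-task loss L_k (softmax + NLL)
total_loss = sum(criterion(head(capsule_out), labels[task])
                 for task, head in heads.items())    # L = L_gender + L_accent
total_loss.backward()                                # end-to-end back-propagation
```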
Referring to fig. 3, which shows the computational topology between any two layers of the capsule neural network: $u_i$ denotes the feature expression learned by a low-level capsule, which is transformed into a prediction of the high-level expression; the figure shows the hidden prediction results $\hat{u}_{j|i}$ of the prediction layer and the weights $c_{ij}$, from which the expression of the next high-level capsule $v_j$ is finally obtained. The specific expression is:

$$v_j = \mathrm{squash}\left(\sum_i c_{ij}\, W_{ij}\, u_i\right)$$
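In code, this composite expression of fig. 3 is what the capsule_layer sketch above computes in a single call; reading class probabilities from the capsule lengths might look like:

```python
v = capsule_layer(u, W)       # v_j = squash(sum_i c_ij * (W_ij @ u_i)) for all j
probs = v.norm(dim=-1)        # capsule length ~ probability of each category
pred = probs.argmax(dim=1)    # predicted class per sample for this task
```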
Claims (5)
1. A multitask speech classification method based on a capsule neural network, characterized in that the capsule neural network is used to extract high-level abstract features of speech while multiple classifiers are adopted to complete the multitask classification of the speech, the method comprising the following steps:
(1) preprocessing the original speech signal, and extracting a low-level speech feature expression using a speech signal feature extraction algorithm;
(2) extracting a middle-layer feature expression of the voice signal by using a deep convolutional neural network;
(3) further extracting a high-level abstract feature representation of the speech using a capsule neural network;
(4) designing a plurality of different classifiers and loss functions to realize end-to-end training of the whole multitask speech classification model.
2. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (1) comprises the following steps:
(11) the original feature expression of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are adopted to extract time-domain and frequency-domain features from the original audio, and finally the multiple features are fused into a single expression that is input to the deep neural network model;
(12) the time-domain speech feature extraction algorithm adopts linear prediction cepstral coefficients (LPCC); the frequency-domain algorithm adopts Mel-frequency cepstral coefficients (MFCC), a homomorphic signal processing method that extracts the speech signal using the Fourier transform; the input of the deep neural network model is finally formed by fusing these primary speech features with different characteristics.
3. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (2) comprises the following steps:
(21) the convolution operation of the deep convolutional neural network in step (2) extracts higher-layer features from the input features, which can be expressed by the following formula:

$$y = f(W * x + b)$$

wherein $x$ defines the input of the convolutional layer, $W$ represents the learned weights of the convolution kernel, $*$ is the convolution operation, $b$ is the bias, and $f(\cdot)$ is a nonlinear mapping function;
(22) the pooling operation of the deep convolutional neural network in step (2) extracts higher-level features from the input features, which can be represented by the following formula:

$$y = \mathrm{pool}(x)$$

wherein $x$ defines the input of the pooling layer; since pooling layers have no learned parameters, there is no weight term $W$; the common pooling operation $\mathrm{pool}(\cdot)$ may take the maximum, minimum, or average value.
4. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (3) comprises the following steps:
(31) the capsule neural network differs from the traditional deep neural network in that its minimum unit of calculation is a group of neurons, and two kinds of weights with different functions exist in the capsule network, used respectively for making predictions and for weighting the predictions;
(32) firstly, in the prediction layer of the capsule network, similarly to traditional feedforward calculation, a prediction result is obtained by matrix multiplication between the input capsule and the prediction weight, with the specific formula:

$$\hat{u}_{j|i} = W_{ij} u_i$$

wherein $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ is the prediction result; note that $u_i$ and $\hat{u}_{j|i}$ are both expressions of a group of neurons;
(33) unlike the traditional convolutional neural network, in which the lower layer learns to predict the feature expression of the upper layer directly, the capsule neural network additionally learns a weight for each prediction made by the lower layer, with the specific formula:

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$

wherein $\hat{u}_{j|i}$ is the prediction of the low-level capsule $i$ for the high-level capsule $j$, and $c_{ij}$ is the prediction weight; the final high-level capsule weights and sums all predictions to obtain the net input $s_j$; notably, unlike a conventional neural network, in which parameters are updated by gradient descent, the weights $c_{ij}$ are updated by a dynamic routing algorithm;
(34) finally, the summed prediction expression is mapped through a nonlinear function; since the smallest computational unit in the capsule neural network is a group of neurons, the activation function is changed accordingly, and is expressed as follows:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

the activated prediction expression $v_j$ has two meanings: its direction represents the attributes of the category, and its magnitude represents the probability of the category.
5. The capsule neural network-based multitask speech classification method according to claim 1, wherein said step (4) comprises the following steps:
(41) determining the multitask speech classification content and digitizing the labels corresponding to the multiple tasks;
(42) defining a plurality of classifiers according to the different types of classification content;
(43) designing corresponding loss functions for the different classifiers, with the specific design as follows:

$$L_k = -\frac{1}{N} \sum_{n=1}^{N} y_n \log \hat{y}_n$$

wherein $y_n$ is the real sample label of a certain class of speech, $\hat{y}_n$ is the probability value after the classifier softmax, and $N$ represents the total number of samples; by superimposing the loss of each sample on a certain task, the average loss of all samples on that task is obtained;
the above designs a loss function only for a single classification result in the multitask setting; for the multitask speech classification problem, the final loss function is defined as follows:

$$L = \sum_{k=1}^{K} L_k$$

wherein $L_k$ represents the above loss function for a single task over all samples, $K$ is the number of tasks in practice, and the total loss function $L$ of the final multitask speech recognition problem is expressed as the sum of all individual loss functions;
(44) after the steps of designing the network structure, constructing the data set, and designing the loss function, the whole end-to-end capsule neural network is trained with the back-propagation algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811346110.8A CN109523994A (en) | 2018-11-13 | 2018-11-13 | Multitask speech classification method based on a capsule neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109523994A true CN109523994A (en) | 2019-03-26 |
Family
ID=65776175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811346110.8A Pending CN109523994A (en) | 2018-11-13 | 2018-11-13 | Multitask speech classification method based on a capsule neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109523994A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06295196A (en) * | 1993-04-08 | 1994-10-21 | Casio Comput Co Ltd | Speech recognition device and signal recognition device |
WO2005059811A1 (en) * | 2003-12-16 | 2005-06-30 | Canon Kabushiki Kaisha | Pattern identification method, apparatus, and program |
US20160284346A1 (en) * | 2015-03-27 | 2016-09-29 | Qualcomm Incorporated | Deep neural net based filter prediction for audio event classification and extraction |
US20170148431A1 (en) * | 2015-11-25 | 2017-05-25 | Baidu Usa Llc | End-to-end speech recognition |
US20180068675A1 (en) * | 2016-09-07 | 2018-03-08 | Google Inc. | Enhanced multi-channel acoustic models |
CN106601235A (en) * | 2016-12-02 | 2017-04-26 | 厦门理工学院 | Semi-supervision multitask characteristic selecting speech recognition method |
CN107578775A (en) * | 2017-09-07 | 2018-01-12 | 四川大学 | A kind of multitask method of speech classification based on deep neural network |
CN107610692A (en) * | 2017-09-22 | 2018-01-19 | 杭州电子科技大学 | The sound identification method of self-encoding encoder multiple features fusion is stacked based on neutral net |
CN108010514A (en) * | 2017-11-20 | 2018-05-08 | 四川大学 | A kind of method of speech classification based on deep neural network |
GB201807225D0 (en) * | 2018-03-14 | 2018-06-13 | Papercup Tech Limited | A speech processing system and a method of processing a speech signal |
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
Non-Patent Citations (6)
Title |
---|
LE, D. et al.: "Discretized continuous speech emotion recognition with multi-task deep recurrent neural network", 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017) *
NAM KYUN KIM et al.: "Speech emotion recognition based on multi-task learning using a convolutional neural network", 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) *
YU CHENGBO et al.: "Research on finger vein recognition based on capsule network", Application of Electronic Technique *
ZHU YINGZHAO et al.: "Research on capsule network technology and its development trends", Guangdong Communication Technology *
HU WENPING: "Spoken pronunciation detection and error analysis based on deep neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology *
GUO JUNWEN: "Research on a CapsNet-based wearable ECG acquisition and arrhythmia detection system", China Masters' Theses Full-text Database, Medicine and Health Sciences *
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428843B (en) * | 2019-03-11 | 2021-09-07 | 杭州巨峰科技有限公司 | Voice gender recognition deep learning method |
CN110428843A (en) * | 2019-03-11 | 2019-11-08 | 杭州雄迈信息技术有限公司 | A kind of voice gender identification deep learning method |
CN110120224B (en) * | 2019-05-10 | 2023-01-20 | 平安科技(深圳)有限公司 | Method and device for constructing bird sound recognition model, computer equipment and storage medium |
CN110120224A (en) * | 2019-05-10 | 2019-08-13 | 平安科技(深圳)有限公司 | Construction method, device, computer equipment and the storage medium of bird sound identification model |
CN110968729A (en) * | 2019-11-21 | 2020-04-07 | 浙江树人学院(浙江树人大学) | Family activity sound event classification method based on additive interval capsule network |
CN110968729B (en) * | 2019-11-21 | 2022-05-17 | 浙江树人学院(浙江树人大学) | Family activity sound event classification method based on additive interval capsule network |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN111357051A (en) * | 2019-12-24 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111357051B (en) * | 2019-12-24 | 2024-02-02 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
WO2021127982A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, smart device, and computer-readable storage medium |
US12039995B2 (en) | 2020-01-02 | 2024-07-16 | Tencent Technology (Shenzhen) Company Limited | Audio signal processing method and apparatus, electronic device, and storage medium |
CN111179961A (en) * | 2020-01-02 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Audio signal processing method, audio signal processing device, electronic equipment and storage medium |
CN111584010A (en) * | 2020-04-01 | 2020-08-25 | 昆明理工大学 | Key protein identification method based on capsule neural network and ensemble learning |
CN111584010B (en) * | 2020-04-01 | 2022-05-27 | 昆明理工大学 | Key protein identification method based on capsule neural network and ensemble learning |
US11735168B2 (en) | 2020-07-20 | 2023-08-22 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for recognizing voice |
CN111862949B (en) * | 2020-07-30 | 2024-04-02 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN111862949A (en) * | 2020-07-30 | 2020-10-30 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN112599134A (en) * | 2020-12-02 | 2021-04-02 | 国网安徽省电力有限公司 | Transformer sound event detection method based on voiceprint recognition |
CN112562725A (en) * | 2020-12-09 | 2021-03-26 | 山西财经大学 | Mixed voice emotion classification method based on spectrogram and capsule network |
CN112992191A (en) * | 2021-05-12 | 2021-06-18 | 北京世纪好未来教育科技有限公司 | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN112992191B (en) * | 2021-05-12 | 2021-11-05 | 北京世纪好未来教育科技有限公司 | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN113362857A (en) * | 2021-06-15 | 2021-09-07 | 厦门大学 | Real-time speech emotion recognition method based on CapcNN and application device |
CN113378855A (en) * | 2021-06-22 | 2021-09-10 | 北京百度网讯科技有限公司 | Method for processing multitask, related device and computer program product |
CN113343924A (en) * | 2021-07-01 | 2021-09-03 | 齐鲁工业大学 | Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network |
CN113378984B (en) * | 2021-07-05 | 2023-05-02 | 国药(武汉)医学实验室有限公司 | Medical image classification method, system, terminal and storage medium |
CN113378984A (en) * | 2021-07-05 | 2021-09-10 | 国药(武汉)医学实验室有限公司 | Medical image classification method, system, terminal and storage medium |
CN113314119B (en) * | 2021-07-27 | 2021-12-03 | 深圳百昱达科技有限公司 | Voice recognition intelligent household control method and device |
CN113314119A (en) * | 2021-07-27 | 2021-08-27 | 深圳百昱达科技有限公司 | Voice recognition intelligent household control method and device |
CN113782000A (en) * | 2021-09-29 | 2021-12-10 | 北京中科智加科技有限公司 | Language identification method based on multiple tasks |
CN114267360A (en) * | 2021-12-29 | 2022-04-01 | 达闼机器人有限公司 | Speech recognition and speech-based joint processing model training method and device |
WO2023222088A1 (en) * | 2022-05-20 | 2023-11-23 | 青岛海尔电冰箱有限公司 | Voice recognition and classification method and apparatus |
CN115376518A (en) * | 2022-10-26 | 2022-11-22 | 广州声博士声学技术有限公司 | Voiceprint recognition method, system, device and medium for real-time noise big data |
CN117275461A (en) * | 2023-11-23 | 2023-12-22 | 上海蜜度科技股份有限公司 | Multitasking audio processing method, system, storage medium and electronic equipment |
CN117275461B (en) * | 2023-11-23 | 2024-03-15 | 上海蜜度科技股份有限公司 | Multitasking audio processing method, system, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | |

Application publication date: 20190326