
CN115331678B - Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient - Google Patents


Info

Publication number
CN115331678B
CN115331678B (application CN202210304605.4A)
Authority
CN
China
Prior art keywords
grnn
signal
mfcc
training
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210304605.4A
Other languages
Chinese (zh)
Other versions
CN115331678A (en)
Inventor
Wang Yong (汪勇)
Yao Qihai (姚琦海)
Yang Yixin (杨益新)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210304605.4A priority Critical patent/CN115331678B/en
Publication of CN115331678A publication Critical patent/CN115331678A/en
Application granted granted Critical
Publication of CN115331678B publication Critical patent/CN115331678B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to a generalized regression neural network (GRNN) acoustic signal identification method using Mel frequency cepstrum coefficients (MFCC). The method combines the MFCC with the GRNN, fully exploiting the rich acoustic features of the MFCC and the nonlinear fitting ability of the GRNN, and effectively identifies seal species. First, the MFCC features of an acoustic signal are extracted: an FFT and Mel filtering are performed, the L-order MFCC is computed, and the cepstrum difference parameters are calculated. The GRNN model is then tested, with the optimal expansion factor determined by k-fold cross validation: the training data is divided into k folds, each of which is taken in turn as the validation set. The resulting optimal expansion factor is used for GRNN training, and the test acoustic data is identified. The GRNN method is least affected by a decrease in signal-to-noise ratio: when the signal-to-noise ratio is above 5 dB it achieves accurate identification, and at 0 dB it still achieves approximate identification.

Description

Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient
Technical Field
The invention belongs to the fields of digital signal processing, machine learning and underwater acoustic measurement, and relates to a generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficients, which uses Mel frequency cepstrum coefficients and a generalized regression neural network model to identify acoustic signals under multiple signal-to-noise ratios.
Background
Against the background of the rapid development of machine learning technology, data-driven models such as neural networks can mine deep features of different target acoustic signals, greatly reduce the influence of noise, and effectively make classification decisions autonomously and intelligently; machine learning methods are therefore widely researched and applied in the field of acoustic signal processing. In 2012, Liu Jian et al. input the energy spectrum features of acoustic signals into a support vector machine (SVM) classification model, and the results indicated that the method can effectively identify ship radiated noise signals (Liu Jian, Liu Zhong, Xiong Ying. Underwater target identification based on wavelet packet energy spectrum and SVM. Journal of Wuhan University of Technology (Transportation Science and Engineering), 2012, 36(2): 5.). In 2017, Hao et al. achieved identification of unlabeled underwater target acoustic signals by establishing a deep belief network (DBN) model (Hao Y, Zhang L, Wang D, et al. The Classification of Underwater Acoustic Targets Based on Deep Learning Methods. 2017 2nd International Conference on Control, Automation and Artificial Intelligence, 2017.). In 2019, Lv Haitao et al. classified framed and normalized ship noise signals using convolutional neural networks (CNN), and the results showed that the classification performance was superior to traditional higher-order spectrum classification methods (Lv Haitao, Jianwen, Kong Xiaopeng. Convolutional neural network-based underwater target classification technique. Ship Electronic Engineering, 2019, 39(2): 158-162.).
In 2020, zhong et al collect underwater sound signals and input the collected underwater sound signals into a CNN model for detection (Zhong M,Castellote M,Dodhia R,et al.Beluga whale acoustic signal classification using deep learning neural network models.The Journal of the Acoustical Society of America,147(3):1834.). of white whales, in 2021, MISHACHANDAR et al use CNN for marine noise identification, and a sound signal identification method based on a machine learning model above artificial sound, natural sound and marine animal sound (Mishachandar B,Vairamuthu S.Diverse ocean noise classification using deep learning.Applied Acoustics,2021.). is often complex and requires training of a large number of network parameters, and training time is long.
In summary, under multiple signal-to-noise-ratio environments, a machine learning method that combines acoustic signal feature extraction with the nonlinear fitting capability of a neural network, while keeping a simple structure and few parameters, is needed.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides a generalized regression neural network acoustic signal identification method utilizing Mel frequency cepstrum coefficients.
Technical solution
A generalized regression neural network acoustic signal identification method utilizing Mel frequency cepstrum coefficients is characterized by comprising the following steps:
Step 1: extracting MFCC characteristics from the acquired underwater acoustic signals;
Step 2: performing FFT on each frame of signal to obtain a frequency spectrum;
step 3: filtering the frequency spectrum through a group of triangular band-pass filters to obtain Mel filtering;
step 4: calculating the logarithmic energy output by each filter and its discrete cosine transform to obtain the L-order MFCC;
Step 5: calculating cepstrum differential parameters by using the L MFCC cepstrum coefficients, and combining the three parameters of the MFCC, the first-order cepstrum differential parameters and the second-order cepstrum differential parameters to serve as feature vectors of signals;
step 6: 3/4 of all processed data is used as training and validation data, and the remaining data is used for testing the GRNN model;
Step 7: the training and verifying data for the model are randomly divided into k folds, the optimal expansion factors are determined by a k-fold cross verification method, and the measurement indexes of the identification results of the verification set under different expansion factors are expressed as follows:
Where N is the number of samples, The number of correctly identified samples out of the N samples;
the validation set is 1 fold and the training set is the remaining k-1 folds;
for each expansion factor, the training set is first used for training, then the validation set is tested and the corresponding accuracy is calculated; with each fold serving once as the validation set, this process is repeated, and the mean of the k accuracies, namely the average accuracy, is computed;
repeating the above for all expansion factors, the expansion factor corresponding to the maximum average accuracy is taken as the optimal expansion factor;
step 8: the obtained optimal expansion factor is used as the parameter of the GRNN model; the training data is used to train the GRNN model, test data under multiple signal-to-noise ratios is input into the model for acoustic signal recognition, the recognition results are counted, and the recognition performance of the GRNN model is analyzed; the trained and tested GRNN model can then recognize real-time data.
When the MFCC features are extracted, pre-emphasis, framing and windowing preprocessing are performed on the acquired acoustic signals.
The value range of the expansion factor is 0.01, 0.02, …, 0.10, with a step size of 0.01.
Advantageous effects
The generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficients combines the MFCC with the GRNN, fully exploiting the rich acoustic features of the MFCC and the nonlinear fitting ability of the GRNN, and effectively identifies seal species. First, the MFCC features of the acoustic signal are extracted: an FFT and Mel filtering are performed, the L-order MFCC is computed, and the cepstrum difference parameters are calculated. The GRNN model is then tested, with the optimal expansion factor determined by k-fold cross validation: the training data is divided into k folds, each of which is taken in turn as the validation set. The resulting optimal expansion factor is used for GRNN training, and the test acoustic data is identified.
The study analyzes the identification performance of the method under different signal-to-noise ratios. Gaussian white noise is added to the initial acoustic signal, with the noise bandwidth matched to the different types of seal acoustic signals, at signal-to-noise ratios of 10 dB, 5 dB and 0 dB. Taking the waveforms of an initial seal signal and the signal at the different signal-to-noise ratios as an example, as shown in fig. 6, the lower the signal-to-noise ratio, the more noise spikes appear in the signal. The training samples are used to train and validate the GRNN model with 10-fold cross validation; the expansion-factor optimization process of the GRNN on the MFCC features of each scene is shown in fig. 7. It can be seen that, overall, as the expansion factor grows the recognition performance decreases, and the optimal expansion factors are concentrated between 0 and 0.1.
In the study, SVM and CNN are used as comparison models: the SVM model uses a radial basis function kernel, and the CNN model structure is shown in fig. 8, with the SGDM (stochastic gradient descent with momentum) optimization algorithm. Tables 1, 2 and 3 show the recognition accuracy of the GRNN, CNN and SVM methods in each scene, where A, B and C denote leopard seals, Ross seals and Weddell seals, respectively. All three methods can effectively identify seal species at high signal-to-noise ratio, but the SVM method is affected most by the decrease in signal-to-noise ratio: at 0 dB its error is large and effective identification cannot be achieved. Compared with the SVM, the CNN method is less affected by the decreasing signal-to-noise ratio and differs little from the GRNN method at high signal-to-noise ratio, where the GRNN method is only somewhat better; but at low signal-to-noise ratio, especially 0 dB, the error of the CNN method becomes large. The GRNN method is least affected by the decrease in signal-to-noise ratio: when the signal-to-noise ratio is above 5 dB it achieves accurate identification, and at 0 dB it still achieves approximate identification. Overall, the GRNN can robustly identify seal species at various signal-to-noise ratios because the MFCC-GRNN model combines the feature advantages of the MFCC with the nonlinear fitting capability of the GRNN.
Drawings
Fig. 1: GRNN model structure diagram
Fig. 2: seal type identification method overall flow block diagram based on MFCC and GRNN
Fig. 3: woltzfeldt database collection chart
Fig. 4: sound waveform diagram of seal of different kinds
Fig. 5: different kinds of seal MFCC characteristics
(A) A leopard seal; (b) ross seal; (c) Widel seal
Fig. 6: initial signal and waveforms at different signal-to-noise ratios
Fig. 7: optimizing expansion factor process for GRNN of each scene
Fig. 8: CNN structure diagram
Detailed Description
The invention will now be further described with reference to the examples and figures:
The Mel frequency cepstral coefficient (MFCC), a feature widely used in speech recognition, is designed based on the characteristics of the human ear. Owing to the particular structure of the human ear, a listener automatically separates the low-frequency and high-frequency parts of speech, and the low-frequency part is the main carrier of the features used to recognize speech.
The generalized regression neural network (GRNN) is a forward neural network based on kernel regression analysis with good nonlinear mapping capability. The GRNN estimates a conditional probability density function from the inputs and outputs of the training data together with the input of the test data, and from it obtains the output for the test data. The GRNN consists of an input layer, a pattern layer, a summation layer and an output layer, and needs only one network parameter, whereas other neural network models generally require several parameters; the GRNN therefore has a clear advantage in network construction. Since the GRNN has only this one parameter, its performance can be improved by optimizing the expansion factor. The present study uses k-fold cross validation to determine the optimal expansion factor; fig. 1 shows the GRNN model structure.
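The four-layer GRNN prediction described above can be written compactly as a Gaussian-kernel weighted average of the training targets. The following is a minimal NumPy sketch, not the patent's implementation; the function name `grnn_predict` and the toy data are illustrative, and `sigma` plays the role of the expansion factor.

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_test, sigma):
    """GRNN prediction: pattern layer (one Gaussian unit per training
    sample), summation layer, and output layer (weighted average)."""
    # Squared Euclidean distances between each test and training sample
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Pattern layer: Gaussian kernel with spread (expansion factor) sigma
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # (n_test, n_train)
    # Summation + output layers: weighted average of training targets
    num = w @ Y_train
    den = w.sum(axis=1, keepdims=True)
    return num / np.maximum(den, 1e-12)

# Class identification: one-hot training targets, argmax of the output
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X_test = np.array([[0.5, 0.5], [4.5, 4.5]])
pred = grnn_predict(X_train, Y_train, X_test, sigma=0.5).argmax(axis=1)
print(pred)  # → [0 1]
```

Because the only free parameter is `sigma`, "training" reduces to storing the training samples and tuning this single value, which is why the document's grid search over expansion factors suffices.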
The present study proposes a method that accurately identifies seal species with a GRNN fed with MFCC features. It combines the MFCC with the GRNN, fully exploiting the rich acoustic features of the MFCC and the nonlinear fitting ability of the GRNN, and effectively identifies seal species.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
1) The study requires extraction of the MFCC features of the acoustic signals. First, the acquired acoustic signals are preprocessed by pre-emphasis, framing and windowing. Pre-emphasis flattens the spectrum of the signal by boosting its high-frequency part. Framing divides the signal into several short periods, within which the signal can be regarded as a stationary process; an overlapped segmentation method is generally adopted so that the transition between frames is smooth. Windowing reduces the truncation effect of the signal.
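The preprocessing chain above can be sketched as follows. This is a minimal NumPy illustration, not the patent's code; the frame length of 256 samples and hop of 128 match the segmentation described later in the embodiment, while the pre-emphasis coefficient 0.97 is a common conventional choice assumed here.

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasis, overlapped framing, and Hamming windowing."""
    # Pre-emphasis: boost high frequencies, s[n] - alpha * s[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapped framing: frames of frame_len samples, shifted by hop
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Windowing with a Hamming window: s'(n) = s(n) w(n)
    return frames * np.hamming(frame_len)

frames = preprocess(np.sin(2 * np.pi * 50 * np.arange(4096) / 8000.0))
print(frames.shape)  # → (31, 256)
```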
2) After the preprocessing is completed, FFT is carried out on each frame of signal to obtain a frequency spectrum.
3) And filtering the frequency spectrum by a group of triangular band-pass filters to obtain Mel filtering.
4) The logarithmic energy of each filter output is calculated and its discrete cosine transform is calculated to find the MFCC of the L-order.
5) The cepstrum difference parameters (Delta Cepstrum) are calculated from the L MFCC cepstrum coefficients, and the MFCC together with the first-order and second-order cepstrum difference parameters are combined as the feature vector of the signal.
6) The collected different types of sound data are randomly divided in proportion, and part of the data are used for training and verification, and the other data are used for testing the GRNN model.
7) The expansion factor range and the step length are selected in the research, so that a large number of expansion factors are obtained for optimizing the GRNN model.
8) The method comprises the steps of dividing training data into k folds, sequentially using the k folds as verification sets for testing, and selecting the expansion factor with the best recognition performance for the verification sets as the optimal expansion factor.
9) And using the obtained optimal expansion factor for training of GRNN, and identifying the test sound data.
Detailed description of the preferred embodiment
By extracting the MFCC features and optimizing the expansion factors, the GRNN model can fully utilize the training data and optimizing the model to realize the recognition of the sound signal, and referring to fig. 2, the overall process is specifically constructed and trained by the following steps:
1) The data preprocessing includes pre-emphasis, framing and windowing. Pre-emphasis: the spectrum of the high-frequency part of the signal is boosted so that the overall spectrum becomes flatter. Framing: the signal is divided into several short-period signals, within which the signal can be regarded as a stationary process. Windowing: let s(n) be the signal and w(n) the window function; the windowed signal s'(n) is:
s'(n)=s(n)w(n) (1)
where 0 ≤ n ≤ N-1, N is the number of sample points, and w(n) is usually a Hamming window.
2) After preprocessing is completed, an FFT is performed on each frame of the signal to obtain its spectrum. The discrete spectrum S'a(k) of the signal is:
S'a(k) = Σ_{n=0}^{N-1} s'(n) e^(-j2πnk/N), 0 ≤ k ≤ N-1 (2)
3) The spectrum is filtered by a bank of M triangular band-pass filters (Mel filtering) with center frequencies f(m), m = 1, 2, …, M. The frequency response of the m-th triangular filter is:
H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) < k ≤ f(m+1)
H_m(k) = 0, k > f(m+1) (3)
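A triangular filterbank with center frequencies equally spaced on the Mel scale can be built as below. This is a generic sketch under assumed parameters (24 filters, 256-point FFT, 8 kHz sampling rate — the filter count matches the embodiment, the sampling rate is an assumption); the function names are illustrative.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=256, fs=8000.0):
    """M triangular band-pass filters whose center frequencies f(m)
    are equally spaced on the Mel scale."""
    # M+2 boundary points f(0), f(1), ..., f(M+1) on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):           # rising slope of the triangle
            H[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):           # falling slope of the triangle
            H[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return H

H = mel_filterbank()
print(H.shape)  # → (24, 129)
```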
4) The logarithmic energy output by each filter is calculated:
E(m) = ln( Σ_{k=0}^{N-1} |S'a(k)|² H_m(k) ), 1 ≤ m ≤ M (4)
5) A discrete cosine transform is applied to the M logarithmic energies obtained above to obtain the L-order MFCC, where L is typically 12-16. The discrete cosine transform formula is:
C(l) = Σ_{m=1}^{M} E(m) cos(πl(m - 0.5)/M), l = 1, 2, …, L (5)
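Steps 2), 4) and 5) together map one windowed frame to its L-order MFCC. A minimal NumPy sketch follows; the function name is illustrative, and the random placeholder filterbank stands in for the triangular filters of step 3).

```python
import numpy as np

def mfcc_from_frame(frame, H, L=13):
    """One frame to L-order MFCC: FFT power spectrum, filterbank
    log energies, then DCT of the log energies."""
    power = np.abs(np.fft.rfft(frame)) ** 2          # |S'a(k)|^2
    E = np.log(np.maximum(H @ power, 1e-12))         # log energies E(m)
    M = len(E)
    m = np.arange(1, M + 1)
    # C(l) = sum_m E(m) cos(pi * l * (m - 0.5) / M), l = 1..L
    return np.array([np.sum(E * np.cos(np.pi * l * (m - 0.5) / M))
                     for l in range(1, L + 1)])

# Usage with a placeholder 24-band filterbank of shape (M, n_fft//2 + 1)
rng = np.random.default_rng(0)
H = np.abs(rng.normal(size=(24, 129)))
frame = np.hamming(256) * np.sin(2 * np.pi * 40 * np.arange(256) / 256.0)
c = mfcc_from_frame(frame, H)
print(c.shape)  # → (13,)
```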
6) The cepstrum difference parameters are calculated from the L MFCC cepstrum coefficients. The formula is:
d_n = ( Σ_{k=1}^{K} k (C_{n+k} - C_{n-k}) ) / sqrt( 2 Σ_{k=1}^{K} k² ) (6)
where d_n is the n-th first-order difference result, C_n is the n-th cepstrum coefficient obtained from formula (5), L is the order of the MFCC, and K is the time span of the first-order derivative, usually 1 or 2. Substituting the first-order results back into formula (6) yields the second-order difference.
7) The MFCC, the first-order cepstrum difference parameters and the second-order cepstrum difference parameters are combined as the signal feature vector input to the GRNN model.
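The difference computation of step 6) and the feature fusion of step 7) can be sketched as below. This is an illustrative NumPy version, not the patent's code; it applies the difference along the frame axis with edge padding (a common convention assumed here), and the 12-coefficient MFCC yields the 1×36 fused vector mentioned in the embodiment.

```python
import numpy as np

def delta(coeffs, K=2):
    """First-order cepstrum difference over frames:
    d_t = sum_k k * (C[t+k] - C[t-k]) / sqrt(2 * sum_k k^2)."""
    denom = np.sqrt(2.0 * sum(k * k for k in range(1, K + 1)))
    padded = np.pad(coeffs, ((K, K), (0, 0)), mode="edge")
    T = coeffs.shape[0]
    d = np.zeros_like(coeffs, dtype=float)
    for k in range(1, K + 1):
        d += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return d / denom

# Fused feature per frame: [MFCC, first-order delta, second-order delta]
mfcc = np.random.default_rng(0).normal(size=(31, 12))
d1 = delta(mfcc)           # first-order difference
d2 = delta(d1)             # second-order difference (delta of delta)
features = np.hstack([mfcc, d1, d2])
print(features.shape)  # → (31, 36)
```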
8) 3/4 of the data in each type of sound data is randomly selected for training and validation of the model, and the remaining 1/4 is used for testing.
9) The present study uses a k-fold cross validation method to determine the optimal spreading factor, which first requires determining the range of values for the spreading factor, e.g., 0.01,0.02, …,0.1, step size 0.01.
10) The data for training and validation of the model is randomly divided into k folds; the validation set is 1 fold and the training set is the remaining k-1 folds.
11) The accuracy is used as the measurement index of the validation-set identification results under different expansion factors:
Acc = N_c / N (7)
where N is the number of samples and N_c is the number of correctly identified samples among the N samples.
12) For each expansion factor, the training set is first used for training, then the validation set is tested and the corresponding accuracy is calculated; with each fold serving once as the validation set, this process is repeated, and the mean of the k accuracies, namely the average accuracy, is computed.
13) The above steps are repeated for all expansion factors, and the expansion factor corresponding to the maximum average accuracy is taken as the optimal expansion factor.
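Steps 9) to 13) amount to a grid search over the expansion factor with k-fold cross validation. The sketch below illustrates this with a minimal inline GRNN and synthetic two-class data; the function names and the data are illustrative, and the grid 0.01, …, 0.10 follows the range stated in the document.

```python
import numpy as np

def grnn_predict(Xtr, Ytr, Xte, sigma):
    # Minimal GRNN: Gaussian-weighted average of training targets
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ Ytr) / np.maximum(w.sum(1, keepdims=True), 1e-12)

def cv_select_sigma(X, Y, sigmas, k=10, seed=0):
    """k-fold cross validation over the expansion-factor grid: each
    fold serves once as validation set; the factor with the highest
    average accuracy is selected."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    best_sigma, best_acc = None, -1.0
    for sigma in sigmas:
        accs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            out = grnn_predict(X[trn], Y[trn], X[val], sigma)
            accs.append(np.mean(out.argmax(1) == Y[val].argmax(1)))
        if np.mean(accs) > best_acc:
            best_acc, best_sigma = float(np.mean(accs)), sigma
    return best_sigma, best_acc

# Two synthetic, well-separated classes; grid 0.01 ... 0.10, step 0.01
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
Y = np.repeat(np.eye(2), 40, axis=0)
sigma, acc = cv_select_sigma(X, Y, sigmas=np.arange(0.01, 0.11, 0.01), k=10)
print(round(sigma, 2), round(acc, 2))
```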
14) The obtained optimal expansion factor is used as the parameter of the GRNN model; the training data is used to train the GRNN model, test data under multiple signal-to-noise ratios is input into the model for acoustic signal recognition, the recognition results are counted, and the recognition performance of the GRNN model is analyzed. The GRNN model, after training and testing, can recognize real-time data.
Specific implementation examples:
In this study, the underwater acoustic signals of seals are taken as an example to identify leopard seals, Ross seals and Weddell seals; all three species live in the Antarctic and often appear in the same sea area. The input data used in this study are the target acoustic signals of the leopard seal, Ross seal and Weddell seal in the Watkins Marine Mammal Sound Database. Fig. 3 shows the area covered by the database; the seal data used in this study lie in the elliptical area. Fig. 4 shows the sound waveforms of the different kinds of seals.
The input of the study is the MFCC features of the audio data. First, each piece of data is segmented into frames of 256 samples, each offset by 128 samples, consistent with the framing described above. The MFCC of each segment is computed with a bank of 24 filters, the first-order and second-order difference coefficients of the MFCC are computed, and the MFCC and its first-order and second-order difference coefficients are fused, so that each segment yields a 1×36 feature vector, used as the feature input of a single sample. 3/4 of the data in each category is selected for training and the remaining 1/4 for testing, giving 14762 training samples and 4923 test samples. The MFCC features of the leopard seal, Ross seal and Weddell seal are shown in fig. 5(a), 5(b) and 5(c), respectively.
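The segmentation and per-class 3/4-1/4 split described above can be sketched as follows. This is an illustrative NumPy version with synthetic data; the function names and the toy sample counts are assumptions, while the 256-sample segments with 128-sample offset and the 75% split follow the document.

```python
import numpy as np

def segment(signal, seg_len=256, hop=128):
    """Split a recording into overlapped segments of 256 samples,
    each offset by 128 samples."""
    n = 1 + (len(signal) - seg_len) // hop
    return np.stack([signal[i * hop:i * hop + seg_len] for i in range(n)])

def split_per_class(features, labels, train_frac=0.75, seed=0):
    """Randomly select 3/4 of each class for training/validation and
    keep the remaining 1/4 for testing."""
    rng = np.random.default_rng(seed)
    tr_idx, te_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        cut = int(train_frac * len(idx))
        tr_idx.extend(idx[:cut])
        te_idx.extend(idx[cut:])
    return np.array(tr_idx), np.array(te_idx)

X = np.random.default_rng(1).normal(size=(100, 36))  # 1x36 feature vectors
y = np.array([0] * 40 + [1] * 30 + [2] * 30)         # three seal classes
tr, te = split_per_class(X, y)
print(len(tr), len(te))  # → 74 26
```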
In practical applications, the ocean often contains environmental noise of varying degrees. The study analyzes the identification performance of the method under different signal-to-noise ratios. Gaussian white noise is added to the initial acoustic signal, with the noise bandwidth matched to the different types of seal acoustic signals, at signal-to-noise ratios of 10 dB, 5 dB and 0 dB. Taking the waveforms of an initial seal signal and the signal at the different signal-to-noise ratios as an example, as shown in fig. 6, the lower the signal-to-noise ratio, the more noise spikes appear in the signal. The training samples are used to train and validate the GRNN model with 10-fold cross validation; the expansion-factor optimization process of the GRNN on the MFCC features of each scene is shown in fig. 7. It can be seen that, overall, as the expansion factor grows the recognition performance decreases, and the optimal expansion factors are concentrated between 0 and 0.1.
In the study, SVM and CNN are used as comparison models: the SVM model uses a radial basis function kernel, and the CNN model structure is shown in fig. 8, with the SGDM (stochastic gradient descent with momentum) optimization algorithm. Tables 1, 2 and 3 show the recognition accuracy of the GRNN, CNN and SVM methods in each scene, where A, B and C denote leopard seals, Ross seals and Weddell seals, respectively. All three methods can effectively identify seal species at high signal-to-noise ratio, but the SVM method is affected most by the decrease in signal-to-noise ratio: at 0 dB its error is large and effective identification cannot be achieved. Compared with the SVM, the CNN method is less affected by the decreasing signal-to-noise ratio and differs little from the GRNN method at high signal-to-noise ratio, where the GRNN method is only somewhat better; but at low signal-to-noise ratio, especially 0 dB, the error of the CNN method becomes large. The GRNN method is least affected by the decrease in signal-to-noise ratio: when the signal-to-noise ratio is above 5 dB it achieves accurate identification, and at 0 dB it still achieves approximate identification. Overall, the GRNN can robustly identify seal species at various signal-to-noise ratios because the MFCC-GRNN model combines the feature advantages of the MFCC with the nonlinear fitting capability of the GRNN.
Table 1 GRNN method seal class identification accuracy
SNR/dB     A        B        C        Overall
No noise   0.9956   0.9828   0.9424   0.9616
10         0.9867   0.9553   0.9124   0.9344
5          0.9757   0.9467   0.8937   0.9200
0          0.9115   0.9065   0.8647   0.8838
Table 2 CNN method seal class identification accuracy
SNR/dB     A        B        C        Overall
No noise   0.9823   0.9713   0.9171   0.9423
10         0.9646   0.9524   0.9061   0.9279
5          0.9292   0.9346   0.8918   0.9104
0          0.8717   0.8521   0.8544   0.8552
Table 3 SVM method seal class identification accuracy
SNR/dB     A        B        C        Overall
No noise   0.9862   0.9731   0.9252   0.9478
10         0.9403   0.9335   0.8966   0.9137
5          0.8805   0.8539   0.8302   0.8956
0          0.8164   0.8200   0.8163   0.8176

Claims (3)

1. A generalized regression neural network acoustic signal identification method utilizing Mel frequency cepstrum coefficients is characterized by comprising the following steps:
Step 1: extracting MFCC characteristics from the acquired underwater acoustic signals;
Step 2: performing FFT on each frame of signal to obtain a frequency spectrum;
step 3: filtering the frequency spectrum through a group of triangular band-pass filters to obtain Mel filtering;
step 4: calculating the logarithmic energy output by each filter and its discrete cosine transform to obtain the L-order MFCC;
Step 5: calculating cepstrum differential parameters by using the L MFCC cepstrum coefficients, and combining the three parameters of the MFCC, the first-order cepstrum differential parameters and the second-order cepstrum differential parameters to serve as feature vectors of signals;
step 6: 3/4 of all processed data is used as training and validation data, and the remaining data is used for testing the GRNN model;
Step 7: the training and verifying data for the model are randomly divided into k folds, the optimal expansion factors are determined by a k-fold cross verification method, and the measurement indexes of the identification results of the verification set under different expansion factors are expressed as follows:
Where N is the number of samples, The number of correctly identified samples out of the N samples;
the validation set is 1 fold and the training set is the remaining k-1 folds;
for each expansion factor, the training set is first used for training, then the validation set is tested and the corresponding accuracy is calculated; with each fold serving once as the validation set, this process is repeated, and the mean of the k accuracies, namely the average accuracy, is computed;
Repeating the steps for all the expansion factors, and taking the expansion factor corresponding to the minimum value of the average accuracy as the optimal expansion factor;
Step 8: the obtained optimal expansion factor is used as the parameter of the GRNN model; the training data are used to train the GRNN model, test data at several signal-to-noise ratios are input into the model for acoustic signal recognition, and the recognition results are collected to analyse the recognition performance of the GRNN model; the trained and tested GRNN model then performs recognition on real-time data.
2. The generalized regression neural network acoustic signal recognition method utilizing Mel frequency cepstrum coefficients according to claim 1, wherein: when the MFCC features are extracted, the acquired acoustic signal is preprocessed by pre-emphasis, framing and windowing.
3. The generalized regression neural network acoustic signal recognition method utilizing Mel frequency cepstrum coefficients according to claim 1, wherein: the value range of the expansion factor is 0.01, 0.02, …, 0.1, with a step size of 0.01.
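Steps 2–5 of claim 1 can be sketched as follows. This is a minimal illustration only: the 26-filter Mel bank and 12th-order MFCC are assumed defaults (the claims do not fix these values), and the input frames are taken to be already pre-emphasised, framed and windowed as in claim 2.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_features(frames, fs, n_filters=26, L=12):
    """L-order MFCCs per frame: FFT -> triangular Mel filter bank -> log -> DCT."""
    n_fft = frames.shape[1]
    # Step 2: power spectrum of each frame
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Step 3: triangular band-pass filters equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor(mel_to_hz(mel_pts) * n_fft / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # Step 4: log filter-bank energies, then DCT-II, keeping coefficients 1..L
    log_e = np.log(spectrum @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(L + 1), 2 * n + 1) / (2 * n_filters))
    return (log_e @ dct.T)[:, 1:]

def cepstral_delta(c):
    """Step 5: simple first-order cepstral difference over frames
    (apply twice for the second-order difference)."""
    d = np.zeros_like(c)
    d[1:-1] = (c[2:] - c[:-2]) / 2.0
    return d
```

The feature vector of step 5 is then the concatenation `np.hstack([feats, cepstral_delta(feats), cepstral_delta(cepstral_delta(feats))])`, giving 3L dimensions per frame.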
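Steps 6–8 (GRNN recognition with the expansion factor chosen by k-fold cross-validation, using the candidate range of claim 3) can be sketched as below. The Gaussian-kernel GRNN form and the one-hot class targets are standard choices assumed for this illustration, not text from the claims.

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of training targets.
    Y_train is one-hot (n_train, n_classes); returns predicted class indices."""
    # squared Euclidean distances between query and training samples
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    y = w @ Y_train / (w.sum(axis=1, keepdims=True) + 1e-12)
    return y.argmax(axis=1)

def select_sigma(X, labels, sigmas, k=5, seed=0):
    """Steps 7-8: k-fold cross-validation over the candidate expansion factors;
    return the factor with the highest average validation accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    Y = np.eye(labels.max() + 1)[labels]          # one-hot targets
    best_sigma, best_acc = None, -1.0
    for s in sigmas:
        accs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            pred = grnn_predict(X[trn], Y[trn], X[val], s)
            accs.append((pred == labels[val]).mean())   # Acc = N_c / N
        mean_acc = np.mean(accs)
        if mean_acc > best_acc:
            best_sigma, best_acc = s, mean_acc
    return best_sigma, best_acc
```

With the claim 3 range, the call would be `select_sigma(X, labels, np.arange(0.01, 0.11, 0.01))`; the returned factor is then used as the GRNN parameter for training on the full training data and testing at each signal-to-noise ratio.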
CN202210304605.4A 2022-03-21 2022-03-21 Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient Active CN115331678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210304605.4A CN115331678B (en) 2022-03-21 2022-03-21 Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient

Publications (2)

Publication Number Publication Date
CN115331678A CN115331678A (en) 2022-11-11
CN115331678B true CN115331678B (en) 2024-10-22

Family

ID=83915684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210304605.4A Active CN115331678B (en) 2022-03-21 2022-03-21 Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient

Country Status (1)

Country Link
CN (1) CN115331678B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840877B (en) * 2022-12-06 2023-07-07 中国科学院空间应用工程与技术中心 Distributed stream processing method, system, storage medium and computer for MFCC extraction

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105611477A (en) * 2015-12-27 2016-05-25 北京工业大学 Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid
CN108447470A (en) * 2017-12-28 2018-08-24 中南大学 A kind of emotional speech conversion method based on sound channel and prosodic features

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20180082679A1 (en) * 2016-09-18 2018-03-22 Newvoicemedia, Ltd. Optimal human-machine conversations using emotion-enhanced natural speech using hierarchical neural networks and reinforcement learning
KR20190019726A (en) * 2017-08-18 2019-02-27 인하대학교 산학협력단 System and method for hidden markov model based uav sound recognition using mfcc technique in practical noisy environments
CN108847244A (en) * 2018-08-22 2018-11-20 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Voiceprint recognition method and system based on MFCC and improved BP neural network

Also Published As

Publication number Publication date
CN115331678A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN108172238B (en) Speech enhancement algorithm based on multiple convolutional neural networks in speech recognition system
CN111816218A (en) Voice endpoint detection method, device, equipment and storage medium
WO2019232829A1 (en) Voiceprint recognition method and apparatus, computer device and storage medium
CN106847293A (en) Facility cultivation sheep stress behavior acoustical signal monitoring method
CN111899757B (en) Single-channel voice separation method and system for target speaker extraction
Wang et al. Deep learning assisted time-frequency processing for speech enhancement on drones
Dubey et al. Non-intrusive speech quality assessment using several combinations of auditory features
CN113191178B (en) Underwater sound target identification method based on auditory perception feature deep learning
WO2019232833A1 (en) Speech differentiating method and device, computer device and storage medium
CN112908344A (en) Intelligent recognition method, device, equipment and medium for bird song
CN115331678B (en) Generalized regression neural network acoustic signal identification method using Mel frequency cepstrum coefficient
Hasan et al. Preprocessing of continuous bengali speech for feature extraction
CN114283829B (en) Voice enhancement method based on dynamic gating convolution circulation network
CN117789699B (en) Speech recognition method, device, electronic equipment and computer readable storage medium
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN118351881A (en) Fusion feature classification and identification method based on noise reduction underwater sound signals
CN115472168B (en) Short-time voice voiceprint recognition method, system and equipment for coupling BGCC and PWPE features
CN117198324A (en) Bird sound identification method, device and system based on clustering model
CN116863956A (en) Robust snore detection method and system based on convolutional neural network
CN113488069B (en) Speech high-dimensional characteristic rapid extraction method and device based on generation type countermeasure network
CN116417011A (en) Underwater sound target identification method based on feature fusion and residual CNN
CN112201226B (en) Sound production mode judging method and system
TWI749547B (en) Speech enhancement system based on deep learning
CN113707172A (en) Single-channel voice separation method, system and computer equipment of sparse orthogonal network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant