
CN118016106A - Emotional health analysis and support system for the elderly - Google Patents


Info

Publication number
CN118016106A
Authority
CN
China
Prior art keywords
speech, signal, analysis, frequency, elderly
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410411579.4A
Other languages
Chinese (zh)
Inventor
朱彤
姜惠
李曼曼
王莹
王惠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Provincial Hospital
Original Assignee
Shandong Provincial Hospital
Application filed by Shandong Provincial Hospital
Priority to CN202410411579.4A
Publication of CN118016106A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - characterised by the type of extracted parameters
    • G10L25/06 - the extracted parameters being correlation coefficients
    • G10L25/18 - the extracted parameters being spectral information of each sub-band
    • G10L25/21 - the extracted parameters being power information
    • G10L25/24 - the extracted parameters being the cepstrum
    • G10L25/27 - characterised by the analysis technique
    • G10L25/48 - specially adapted for particular use
    • G10L25/51 - for comparison or discrimination
    • G10L25/63 - for estimating an emotional state
    • G10L25/66 - for extracting parameters related to health condition
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 - Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 - Evaluating the state of mind, e.g. depression, anxiety
    • A61B5/48 - Other medical applications
    • A61B5/4803 - Speech analysis specially adapted for diagnostic purposes
    • A61B5/72 - Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 - Details of waveform analysis
    • A61B5/7264 - Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 - involving training the classification device
    • A61B2503/00 - Evaluating a particular growth phase or type of persons or animals
    • A61B2503/08 - Elderly

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Psychiatry (AREA)
  • Surgery (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Veterinary Medicine (AREA)
  • Biomedical Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physiology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Epidemiology (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the field of speech analysis technology, and more particularly to an emotional health analysis and support system for the elderly. The system comprises a speech acquisition device, a speech processing device, and a speech emotion analysis device. The speech acquisition device obtains the speech signal of a target elderly person within a set time period. The speech processing device applies a pre-emphasis filter to the collected speech signal to balance its spectrum, yielding a preprocessed signal, and extracts MFCC, fundamental frequency, and energy features from the preprocessed signal as the elements of a feature vector. The speech emotion analysis device performs emotion analysis on the feature vector of each speech segment using a pre-trained speech emotion analysis model, judges the emotional characteristics of the segment, and issues a warning signal when emotional intervention is needed. The present invention can monitor the emotional state of the elderly in real time, accurately identify emotional characteristics, issue warning signals in a timely manner, and carry out emotional intervention.

Description

Emotional health analysis and support system for the elderly

Technical Field

The present invention belongs to the technical field of speech analysis, and in particular relates to an emotional health analysis and support system for the elderly.

Background Art

With the continuous development of society and the aging of the population, the emotional health of the elderly has attracted increasing attention. The emotional health of the elderly has an important impact on their quality of life and social participation, so it is of great significance to develop a system that can monitor and support the emotional health of the elderly in a timely manner. Traditional emotional health monitoring methods rely mainly on face-to-face assessments by medical institutions or professionals; such methods consume substantial resources, are costly, respond slowly, and cannot meet the needs of large-scale emotional health monitoring. It is therefore of great practical significance to develop a system that can automatically monitor the emotional health of the elderly and provide support and intervention when necessary.

In recent years, with the development of speech processing and sentiment analysis, more and more research has focused on using speech signals for emotion recognition and health monitoring. Traditional speech emotion analysis methods rely mainly on machine-learning models such as support vector machines (SVMs) and deep neural networks (DNNs). These methods extract features from speech signals and train classifiers on labeled emotion categories to achieve emotion recognition and classification. However, they often require large amounts of labeled data and a complex training process, and in practical applications they suffer from low accuracy and poor generalization.

In addition to machine-learning methods, some approaches based on signal processing and feature extraction have been proposed for speech emotion analysis, for example emotion recognition using the fundamental frequency, energy, and Mel-frequency cepstral coefficients (MFCCs) of the speech signal. These methods usually offer good real-time performance and applicability, but their accuracy and robustness still need to be improved.

In addition, some attempts have been made in academia and industry to build systems for monitoring the emotional health of the elderly. These systems usually comprise voice acquisition devices, emotion analysis algorithms, and support and intervention mechanisms, and aim to monitor and support emotional health by analyzing the voice signals of the elderly. However, existing systems often lack in-depth analysis of elderly voice characteristics and effective emotion recognition algorithms, resulting in poor performance in practice, high false-alarm rates, and low accuracy.

Therefore, in the field of emotional health analysis and support for the elderly, a system is needed that combines speech processing technology with emotion analysis algorithms to accurately monitor and support the emotional state of the elderly. Such a system should be able to efficiently extract and analyze the characteristics of elderly speech signals and, combined with advanced emotion analysis algorithms, accurately judge the emotional state of the elderly and intervene in a timely manner. The system should also consider the personalized needs and privacy of the elderly to ensure its acceptability and reliability in practical applications.

Summary of the Invention

The main purpose of the present invention is to provide an emotional health analysis and support system for the elderly, which realizes comprehensive monitoring and support of the emotional health of the elderly through the collection, processing, and emotion analysis of voice signals, and has important application prospects and social significance. The present invention can monitor the emotional state of the elderly in real time, accurately identify emotional characteristics, issue early-warning signals in a timely manner, and carry out emotional intervention and support, providing effective protection and support for the emotional health of the elderly.

In order to solve the above problems, the technical solution of the present invention is achieved as follows.

An emotional health analysis and support system for the elderly, the system comprising a speech acquisition device, a speech processing device, and a speech emotion analysis device. The speech acquisition device obtains the speech signal of a target elderly person within a set time period and performs a preliminary analysis of the signal to determine whether speech emotion analysis is required, specifically by statistically analyzing the average speech energy and the speech-frequency ratio of the signal. The average speech energy is defined as the ratio of the total energy of the speech signal within the set time period to the length of the period; the speech-frequency ratio is defined as the ratio of the total length of the speech signal within the set time period to the length of the period. If the average speech energy and the speech-frequency ratio both lie within their respective threshold ranges, it is determined that speech emotion analysis is not required; otherwise, it is determined that speech emotion analysis is required. When the speech acquisition device determines that speech emotion analysis is required, the speech processing device applies a pre-emphasis filter to the collected speech signal to balance the spectrum, obtaining a preprocessed signal; extracts MFCC features, fundamental frequency features, and energy features from the preprocessed signal as the elements of a feature vector; and, based on the feature vector, divides the preprocessed signal into speech segments and non-speech segments using a zero-crossing-rate method. The speech emotion analysis device performs emotion analysis on the feature vector of each speech segment using a pre-trained speech emotion analysis model and judges the emotional characteristics of the segment, the emotional characteristics comprising positive, neutral, and negative. If, within a set time period, the ratio of the total number of frames in speech segments with negative emotional characteristics to the length of the period exceeds a set threshold, it is determined that the target elderly person is in a negative emotional state, and a warning signal calling for emotional intervention is issued.
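As an illustration of this preliminary analysis, the following Python sketch computes the two statistics and makes the threshold decision. The frame sizes, voicing gate, and threshold ranges are assumptions chosen for the example, not values given by the patent.

```python
import numpy as np

def needs_emotion_analysis(x, fs, period_s, energy_range, ratio_range,
                           frame_len=400, hop=160, vad_gate=1e-4):
    """Return True when the signal statistics fall outside their normal ranges."""
    x = x.astype(float)
    # Average speech energy: total energy of the signal over the period length.
    avg_energy = np.sum(x ** 2) / period_s

    # Speech-frequency ratio: voiced duration over the period length, using a
    # simple per-frame energy gate as a stand-in voice-activity detector.
    starts = range(0, max(len(x) - frame_len, 0), hop)
    voiced = sum(np.mean(x[s:s + frame_len] ** 2) > vad_gate for s in starts)
    speech_ratio = (voiced * hop / fs) / period_s

    in_range = (energy_range[0] <= avg_energy <= energy_range[1]
                and ratio_range[0] <= speech_ratio <= ratio_range[1])
    return not in_range  # both inside their ranges -> skip emotion analysis
```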

Further, the speech acquisition device comprises an acquisition unit, an enhancement unit, a preliminary analysis unit, and a noise separation unit. The acquisition unit determines, within the set time period, whether an incoming voice signal was produced by the target elderly person by means of voice recognition and, if so, collects the signal. The enhancement unit enhances the signal to obtain an enhanced speech signal. The preliminary analysis unit performs the preliminary signal analysis to determine whether speech emotion analysis is required. The noise separation unit separates background noise from the speech signal when speech emotion analysis is determined to be required.

Further, the enhancement unit enhances the speech signal $x(n)$ to obtain an enhanced speech signal as follows. A short-time Fourier transform based on an autoregressive model is applied to the speech signal to obtain the time-frequency representation $X(m,k)$:

$$X(m,k) = \sum_{n=0}^{N-1} x(n + mN)\, w(n)\, e^{-j 2\pi k n / N} + \sum_{p=1}^{P} a_p\, X(m-p, k)$$

where $m$ is the time-segment index of the short-time Fourier transform; $k$ is the frequency index; $N$ is the window length of each time segment; $w(n)$ is the window function; $a_p$ are the coefficients of the autoregressive model; $P$ is the order of the autoregressive model; $j$ is the imaginary unit; and $n$ is the time-domain index. A Wiener filter with a nonlinear dynamic-range compression characteristic is then used to enhance the time-frequency representation:

$$\hat{X}(m,k) = \frac{|X(m,k)|^2}{|X(m,k)|^2 + P_N(m,k)/P_S(m,k)}\, X(m,k)$$

where $P_N(m,k)$ and $P_S(m,k)$ denote the power-spectrum estimates of the noise and of the speech signal, respectively, and $\hat{X}(m,k)$ is the frequency-domain representation of the enhanced speech signal. The enhanced frequency-domain signal $\hat{X}(m,k)$ is inverse short-time Fourier transformed to obtain the enhanced speech signal.
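A minimal numerical sketch of this enhancement step, under simplifying assumptions: SciPy's standard STFT/ISTFT stands in for the autoregressive variant (the AR prediction term is omitted), and the noise power spectrum is assumed to be estimated beforehand, e.g. from a speech-free stretch of the recording.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(x, fs, noise_psd, nperseg=512):
    """Apply a Wiener gain per time-frequency bin and resynthesize the signal."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)       # time-frequency representation
    speech_psd = np.maximum(np.abs(X) ** 2 - noise_psd[:, None], 1e-12)
    gain = speech_psd / (speech_psd + noise_psd[:, None])  # Wiener gain in [0, 1)
    _, x_enh = istft(gain * X, fs=fs, nperseg=nperseg)     # back to the time domain
    return x_enh
```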

Further, the noise separation unit separates background noise from the speech signal, when speech emotion analysis is determined to be required, as follows. The speech signal is represented as a time-domain waveform $x(t)$ and converted to the frequency domain by a short-time Fourier transform, giving the frequency-domain representation $X(\omega, t)$. Let the length of the time period be $T$; the signal is segmented with a window function of length $N$, with an overlap of $M$ between adjacent windows. A Hamming window is chosen, defined as

$$w(i) = 0.54 - 0.46 \cos\!\left(\frac{2\pi i}{N-1}\right), \qquad 0 \le i \le N-1$$

where $i$ is the sampling index within the window. The window function is applied to each time segment of the speech signal, and zero padding extends each segment to length $N'$, giving the time-domain window signals $x_l(n)$. A discrete Fourier transform is applied to each window signal $x_l(n)$ to obtain its frequency-domain representation $X_l(\omega)$. The background noise is assumed to be stationary and linearly superimposed on the speech signal, and an adaptive filter in the frequency domain is used to model and estimate it. Let $S(\omega)$ denote the clean spectrum of the speech signal and $N(\omega)$ the spectrum of the background noise; the frequency-domain response of the adaptive filter is defined as

$$H(\omega, t) = \frac{|S(\omega)|^2}{|S(\omega)|^2 + |N(\omega)|^2}$$

where $H(\omega, t)$ is the adaptive filter's frequency-domain response at time $t$. The speech signal is then reconstructed in the frequency domain using the adaptive filter $H(\omega, t)$:

$$\hat{X}(\omega, t) = H(\omega, t)\, X(\omega, t)$$

where $\hat{X}(\omega, t)$ is the reconstructed signal. Converting the reconstructed signal back to the time domain yields the speech signal $\hat{x}(t)$ with the background noise separated out.
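The following sketch mirrors this pipeline with a Hamming window and the stationary-noise assumption; estimating the noise spectrum from the first few windows (assumed to contain background noise only) is a choice of this example, since the text does not state how the noise spectrum is obtained.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_background_noise(x, fs, nperseg=512, noise_frames=10):
    """Frequency-domain noise separation with H = |S|^2 / (|S|^2 + |N|^2)."""
    f, t, X = stft(x, fs=fs, window='hamming',
                   nperseg=nperseg, noverlap=nperseg // 2)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    speech_psd = np.maximum(np.abs(X) ** 2 - noise_psd, 1e-12)  # clean-speech estimate
    H = speech_psd / (speech_psd + noise_psd)                   # adaptive filter H(w, t)
    _, x_clean = istft(H * X, fs=fs, window='hamming',
                       nperseg=nperseg, noverlap=nperseg // 2)
    return x_clean
```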

Further, when the speech acquisition device determines that speech emotion analysis is required, the speech processing device applies a pre-emphasis filter to the collected speech signal to balance the spectrum and then splits the signal into overlapping frames:

$$y(n) = x(n) - \alpha\, x(n-1)$$

$$x_i(n) = y(iH + n), \qquad 0 \le n < L$$

where $x(n)$ is the original signal, $y(n)$ is the pre-emphasized signal, $\alpha$ is the pre-emphasis coefficient, $i$ indexes the $i$-th frame, $L$ is the frame length, and $H$ is the frame shift, which determines the overlap between adjacent frames.
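A short sketch of these two formulas; the coefficient 0.97 and 25 ms frames with a 10 ms shift at 16 kHz are common defaults assumed here, not values from the patent.

```python
import numpy as np

def preemphasize_and_frame(x, alpha=0.97, frame_len=400, hop=160):
    """y(n) = x(n) - alpha * x(n-1), then overlapping frames of length L, shift H."""
    y = np.append(x[0], x[1:] - alpha * x[:-1])          # pre-emphasis filter
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]                                        # shape: (n_frames, frame_len)
```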

Further, the speech processing device applies a window function to each frame, applies a discrete Fourier transform, and then processes the result with a Mel filter bank to extract the MFCC features:

$$X_i(k) = \sum_{n=0}^{N-1} x_i(n)\, w(n)\, e^{-j 2\pi k n / N}, \qquad 0 \le k \le K$$

$$E_i(m) = \ln\!\left( \sum_{k=0}^{K} |X_i(k)|^2\, H_m(k) \right), \qquad 1 \le m \le M$$

$$c_i(q) = \sum_{m=1}^{M} E_i(m) \cos\!\left( \frac{\pi q (m - 0.5)}{M} \right), \qquad 1 \le q \le Q$$

where $x_i(n)$ is the $n$-th sample of the $i$-th speech frame; $X_i(k)$ is the $k$-th frequency-domain coefficient after the discrete Fourier transform; $N$ is the number of points of the discrete Fourier transform and also the number of samples in each speech frame; $K$ equals $N/2$ and is the number of independent frequency components in the discrete Fourier transform; $H_m(k)$ is the gain of the $m$-th Mel filter at the $k$-th frequency point; the Mel filters are a set of overlapping triangular band-pass filters that mimic the frequency perception of the human ear by mapping frequency onto the nonlinear Mel scale; $M$ is the number of Mel filters, i.e. the number of frequency bands on the Mel scale; $E_i(m)$ is the result of applying the $m$-th Mel filter to the discrete Fourier transform coefficients and taking the logarithm, representing the log energy of the $m$-th frequency band; $c_i(q)$ is the $q$-th Mel-frequency cepstral coefficient, obtained by applying a discrete cosine transform to $E_i(m)$ in order to convert the log energy spectrum of the Mel filters into time-domain cepstral coefficients, reduce correlation between features, and emphasize spectral shape; and $Q$ is the number of MFCC features finally extracted.
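The three formulas map directly onto the following NumPy/SciPy sketch; the mel-scale conversion and triangular filter construction follow the standard textbook recipe, and the filter and coefficient counts (26 and 13) are conventional assumptions rather than patent values.

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters spaced on the mel scale (standard recipe)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def mfcc(frames, fs, n_filters=26, n_ceps=13):
    """Windowed DFT -> mel filter bank -> log -> DCT, as in the formulas above."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2      # |X(k)|^2
    fb = mel_filterbank(n_filters, frames.shape[1], fs)
    log_mel = np.log(np.maximum(power @ fb.T, 1e-10))              # E(m)
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]  # c(q)
```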

Further, the fundamental frequency feature $F_0$ is computed as

$$F_0 = \frac{f_s}{\tau^{*}}, \qquad R(\tau) = \sum_{n} x(n)\, x(n+\tau)$$

where $R(\tau)$ is the autocorrelation function, $\tau^{*}$ is the position of the peak of $R(\tau)$, and $f_s$ is the sampling frequency. The energy feature $E_i$ is computed as

$$E_i = \sum_{n=0}^{L-1} x_i(n)^2$$

where $E_i$ is the energy of the $i$-th speech frame. The resulting feature vector is

$$\mathbf{v}_i = \left[\, c_i(1), \ldots, c_i(Q),\ F_0,\ E_i \,\right].$$
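A sketch of the fundamental-frequency and energy computations and of assembling the per-frame feature vector; restricting the autocorrelation peak search to 60-400 Hz is an assumption of this example to keep the estimate in a plausible voice range.

```python
import numpy as np

def fundamental_frequency(frame, fs, fmin=60.0, fmax=400.0):
    """F0 = fs / tau*, with tau* the autocorrelation peak in the search range."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # R(tau), tau >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    tau_star = lo + int(np.argmax(r[lo:hi]))
    return fs / tau_star

def frame_feature_vector(frame, fs, mfcc_coeffs):
    """Concatenate the MFCCs, F0, and frame energy into one feature vector."""
    energy = float(np.sum(frame.astype(float) ** 2))   # E_i = sum of squared samples
    f0 = fundamental_frequency(frame, fs)
    return np.concatenate([mfcc_coeffs, [f0, energy]])
```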

Further, the pre-trained speech emotion analysis model is a three-class support vector model whose class labels are

$$y_i \in \{-1,\ 0,\ +1\},$$

representing the three emotional characteristics, with each element of the label set corresponding to one emotion category. The speech emotion analysis model is expressed as

$$\min_{\mathbf{w},\, b,\, \mathbf{u},\, d,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + \frac{1}{2}\|\mathbf{u}\|^2 + C \sum_{i} \xi_i$$

$$\text{s.t.} \quad y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1 - \xi_i,$$

$$\delta_i\, g(\mathbf{x}_i) \ge 1 - \xi_i,$$

$$\xi_i \ge 0,$$

where $\mathbf{w}$ is the normal vector of the decision hyperplane, used to define the classification boundary; $b$ is the bias term (intercept), a parameter of the support vector machine model used to translate the classification boundary; $\mathbf{u}$ is the weight vector of the additional decision function, used to define the additional classification boundary; $d$ is the bias term of the additional decision function, used to translate the additional boundary; $\xi_i$ are slack variables expressing the degree to which samples may deviate from the hyperplane; $C$ is the regularization parameter controlling the importance of the slack variables, with larger values penalizing misclassification more severely; $\delta_i$ is an indicator variable; and $g(\cdot)$ is the additional decision function. The feature vectors of historical speech segments and their corresponding class labels are collected as model training and test data. Part of these data is used as training data to train the speech emotion analysis model; the training objective is to find an optimal decision boundary that classifies the training samples as correctly as possible while preserving the generalization ability of the model. The remaining data are used as test data to evaluate the trained support vector machine model, with accuracy as the evaluation metric. If the accuracy exceeds the set accuracy threshold, training stops; otherwise, the model parameters are adjusted and training continues until the accuracy exceeds the threshold.

Further, the additional decision function $g(\mathbf{x})$ is expressed as

$$g(\mathbf{x}) = \mathbf{u}^{\top} \mathbf{x} + d.$$
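The custom three-class formulation above is patent-specific; as a stand-in, the sketch below trains an ordinary multi-class SVM (scikit-learn's one-vs-one SVC) on labels {-1, 0, +1} and repeats training over a small parameter grid until the held-out accuracy clears a threshold, mirroring the train/evaluate/adjust loop described. The threshold and grid values are illustrative.

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train_emotion_model(features, labels, acc_threshold=0.85):
    """Train and evaluate until the test accuracy exceeds the threshold."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, random_state=0, stratify=labels)
    model, acc = None, 0.0
    for C in (0.1, 1.0, 10.0):                      # parameter adjustment loop
        model = SVC(C=C, kernel='rbf').fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        if acc >= acc_threshold:
            break                                   # accuracy target met
    return model, acc
```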

The emotional health analysis and support system for the elderly of the present invention has the following beneficial effects.

First, through the combined use of the speech acquisition device, the speech processing device, and the emotion analysis device, the present invention achieves comprehensive monitoring of the emotional state of the elderly. The speech acquisition device obtains the speech signal of the target elderly person within a set time period and performs a preliminary analysis to determine whether emotion analysis is needed. The collected speech signal is spectrum-balanced by the pre-emphasis filter of the processing device, divided into overlapping frames, and features such as MFCCs, fundamental frequency, and energy are extracted in preparation for the subsequent emotion analysis. The emotion analysis device then uses a pre-trained speech emotion analysis model to analyze the extracted features and judge the emotional state of the elderly person, thereby achieving real-time monitoring of the emotional health of the elderly.

Second, the present invention adopts a machine-learning-based emotion analysis model, combined with methods such as support vector machines, that can accurately judge the emotional state of the elderly. Using the collected speech features, the pre-trained speech emotion analysis model can effectively identify the emotional characteristics in the speech signal, including positive, neutral, and negative emotions, and thus accurately judge the emotional state of the elderly person. Through model training and testing, the model parameters can be continuously optimized to improve the accuracy and robustness of the emotion analysis, providing reliable support for monitoring the emotional health of the elderly.

Third, the present invention also provides a real-time intervention mechanism that can issue an early-warning signal in time when the emotional state of an elderly person is abnormal, enabling emotional intervention and support. By monitoring the emotion analysis results, when the system detects that the elderly person is in a negative emotional state, it can issue an early-warning signal in time, prompting relevant staff or family members to intervene and provide support. This timely intervention mechanism can effectively prevent emotional health problems in the elderly and improve their quality of life and well-being.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the structure of the emotional health analysis and support system for the elderly provided in an embodiment of the present invention.

DETAILED DESCRIPTION

In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

Embodiment 1: Referring to FIG. 1, an emotional health analysis and support system for the elderly comprises a speech acquisition device, a speech processing device, and a speech emotion analysis device. The speech acquisition device obtains the speech signal of a target elderly person within a set time period and performs a preliminary analysis of the signal to determine whether speech emotion analysis is required, specifically by statistically analyzing the average speech energy and the speech-frequency ratio of the signal: the average speech energy is defined as the ratio of the total energy of the speech signal within the set time period to the length of the period, and the speech-frequency ratio is defined as the ratio of the total length of the speech signal within the set time period to the length of the period. If the average speech energy and the speech-frequency ratio both lie within their respective threshold ranges, speech emotion analysis is not required; otherwise, it is required. When the speech acquisition device determines that speech emotion analysis is required, the speech processing device applies a pre-emphasis filter to the collected speech signal to balance the spectrum, obtaining a preprocessed signal; extracts MFCC features, fundamental frequency features, and energy features from the preprocessed signal as the elements of a feature vector; and, based on the feature vector, divides the preprocessed signal into speech segments and non-speech segments using a zero-crossing-rate method, as sketched below. The speech emotion analysis device performs emotion analysis on the feature vector of each speech segment using the pre-trained speech emotion analysis model and judges the emotional characteristics of the segment (positive, neutral, or negative). If, within a set time period, the ratio of the total number of frames in negative speech segments to the length of the period exceeds a set threshold, the target elderly person is judged to be in a negative emotional state and a warning signal calling for emotional intervention is issued.
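A sketch of the zero-crossing-rate segmentation mentioned above; the patent does not give the decision rule, so the combined ZCR/energy gate and its thresholds are assumptions of this example.

```python
import numpy as np

def split_speech_nonspeech(frames, zcr_max=0.25, energy_min=1e-4):
    """Mark each frame as speech (True) or non-speech (False)."""
    signs = np.sign(frames)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    zcr = np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)
    energy = np.mean(frames.astype(float) ** 2, axis=1)
    return (zcr < zcr_max) & (energy > energy_min)
```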

Specifically, the speech acquisition device uses a built-in microphone or an external device to capture the speech signal of the target elderly person within the set time period. The acquisition device starts working when the target elderly person interacts with the system, or when a speech signal occurs within a specific time period or within a monitoring period set by the system. The speech signal is transmitted into the system as an analog electrical signal for further processing and analysis. The speech processing device receives the speech signal from the acquisition device, preprocesses it, and extracts features. In the preprocessing stage, a pre-emphasis filter is usually applied to balance the spectrum and compensate for the attenuation of the high-frequency part of the speech signal. Acoustic features such as Mel-frequency cepstral coefficients (MFCCs), fundamental frequency features, and energy features are then extracted from the preprocessed signal. These feature extraction steps can be implemented with digital signal processing techniques.

MFCC features are a set of coefficients obtained by applying a Fourier transform to the speech signal and then a discrete cosine transform to the spectrum on the Mel frequency scale. These coefficients reflect the characteristics of the speech signal in the Mel frequency domain, including the harmonic structure and formants of the sound. MFCC features characterize the spectral structure and acoustic properties of the speech signal well and describe its phonetic content fairly fully. In emotion analysis, MFCC features capture information such as voice quality and pitch variation, providing important cues for judging emotion. The fundamental frequency is the dominant frequency component of a sound and determines its pitch; fundamental frequency features are usually used to describe pitch or intonation. They capture pitch variation in the speech signal, such as rises and falls and changes in intonation, which are closely related to emotional state, for example a high pitch when excited and a low pitch when depressed; fundamental frequency features are therefore significant for emotion analysis. The fundamental frequency is the lowest-frequency periodic oscillation in the speech signal and usually corresponds to the perceived pitch. It is chosen because pitch is one of the important indicators of emotional expression in speech: people tend to change their pitch in different emotional states, speaking higher when happy and lower when frustrated, so fundamental frequency features provide important clues to the emotional information in the speech signal. Energy features describe the energy distribution of the speech signal, usually its intensity or volume. They are chosen because emotional expression is usually accompanied by changes in intensity and volume, for example a louder voice when angry and a softer voice when sad; energy features therefore provide important information about the intensity of emotion in the speech signal.

Energy, fundamental frequency, and MFCCs represent different aspects of the speech signal. Energy features reflect its intensity and amplitude variation, fundamental frequency features reflect its pitch variation, and MFCC features capture its spectral characteristics. Combining the three captures emotion-related information in the speech signal more comprehensively, including the intensity of the emotion, changes in pitch, and the spectral properties of the voice, and improves the accuracy of judging the speaker's emotional state. For example, when the energy of the voice increases, the fundamental frequency rises, and the spectral characteristics change, the speaker may be in an angry or excited state; when the energy decreases, the fundamental frequency falls, and the spectral characteristics change, the speaker may be in a sad or depressed state. Because each feature expresses information in its own way, the features are complementary and largely independent; combining them enhances the robustness of the emotion analysis system, reduces sensitivity to environmental noise and individual speaker differences, and improves the generalization ability and applicability of the system. The three features can be regarded as abstract representations of different aspects of the speech signal, and together they provide a more comprehensive, higher-dimensional feature space that better describes the emotional information in speech and improves the performance of the emotion analysis system.

Embodiment 2: The speech acquisition device comprises an acquisition unit, an enhancement unit, a preliminary analysis unit, and a noise separation unit. The acquisition unit determines, within the set time period, whether an incoming voice signal was produced by the target elderly person by means of voice recognition and, if so, collects the signal. The enhancement unit enhances the signal to obtain an enhanced speech signal. The preliminary analysis unit performs the preliminary signal analysis to determine whether speech emotion analysis is required. The noise separation unit separates background noise from the speech signal when speech emotion analysis is determined to be required.

Specifically, voice recognition is used to determine whether an incoming voice signal was produced by the target elderly person, so as to avoid subsequent processing and analysis of invalid signals. In daily life, all kinds of irrelevant sounds occur, and analyzing every voice signal indiscriminately would waste system resources. The noise separation unit uses noise suppression algorithms, such as adaptive filtering and spectral subtraction, to separate environmental background noise from the target speech signal, improving the accuracy and reliability of the subsequent emotion analysis and allowing the system to focus on the speech emotion of the target elderly person.

The preliminary signal analysis that determines whether speech emotion analysis is required specifically comprises statistically analyzing the average speech energy and the speech-frequency ratio of the signal. The average speech energy is defined as the ratio of the total energy of the speech signal within the set time period to the length of the period; the speech-frequency ratio is defined as the ratio of the total length of the speech signal within the set time period to the length of the period. If both statistics lie within their respective threshold ranges, speech emotion analysis is not required; otherwise, it is required. The average speech energy reflects the loudness of the speech signal over a period of time; analyzing it gives a first indication of the volume of the signal and thus of whether the speaker is producing emotionally charged, intense speech. A high-energy speech signal suggests that the speaker is excited or angry, while a low-energy signal suggests that the speaker is calm or subdued. The speech-frequency ratio reflects the proportion of the total period occupied by speech; analyzing it gives a first indication of how active and how often the speaker talks within the period. A high ratio suggests larger emotional fluctuations and frequent vocalization, while a low ratio suggests a stable or inactive emotional state. This basic feature analysis gives a preliminary picture of the speaker's current emotional state, including its intensity and activity. If the energy and frequency of the speech signal are both within relatively calm ranges, the speaker's mood is calm or normal and no further emotion analysis is needed; conversely, if they exceed the preset threshold ranges, the speaker may be excited, angry, or depressed, and further speech emotion analysis is required to understand the speaker's emotional state in depth and provide appropriate support and intervention. The enhancement unit applies signal processing techniques such as filtering and noise reduction to the collected speech signal to remove noise, distortion, and other interference, yielding a clearer and more reliable enhanced speech signal and better input for the subsequent analysis.

Embodiment 3: The enhancement unit enhances the speech signal $x(n)$ to obtain an enhanced speech signal as follows. A short-time Fourier transform based on an autoregressive model is applied to the speech signal to obtain the time-frequency representation $X(m,k)$:

$$X(m,k) = \sum_{n=0}^{N-1} x(n + mN)\, w(n)\, e^{-j 2\pi k n / N} + \sum_{p=1}^{P} a_p\, X(m-p, k)$$

The original speech signal is divided into short time segments, and within each segment a window function truncates the signal. The window function limits the time-domain length of the signal within each segment and tapers it gradually to zero, reducing spectral leakage in the analysis. Within each segment, a Fourier transform converts the time-domain signal to the frequency domain, giving the spectral representation of the speech signal for that segment. This makes it possible to observe the energy distribution of the speech signal across frequencies and so better understand its acoustic characteristics. The second term of the formula is the autoregressive part, which captures the autocorrelation of the speech signal: the autoregressive model assumes a linear relationship between the current speech sample and several preceding ones, and solving for the autoregressive coefficients yields autocorrelation information that further strengthens the representation.

Here $m$ is the time-segment index of the short-time Fourier transform; $k$ is the frequency index; $N$ is the window length of each time segment; $w(n)$ is the window function; $a_p$ are the coefficients of the autoregressive model; $P$ is the order of the autoregressive model; $j$ is the imaginary unit; and $n$ is the time-domain index. A Wiener filter with a nonlinear dynamic-range compression characteristic is then used to enhance the time-frequency representation:

$$\hat{X}(m,k) = \frac{|X(m,k)|^2}{|X(m,k)|^2 + P_N(m,k)/P_S(m,k)}\, X(m,k)$$

The numerator $|X(m,k)|^2$ is the power spectrum of the original speech signal in the frequency domain, i.e. the distribution of the signal's energy across frequencies; computing it shows the strength of the speech signal at each frequency. The denominator combines the signal power spectrum with the ratio $P_N(m,k)/P_S(m,k)$ of the noise power spectrum to the speech power spectrum: the larger this ratio, the greater the share of noise in the signal, and vice versa. Comparing the signal and noise power spectra in this way suppresses the noise during enhancement. The Wiener filter is a classic signal processing filter with a nonlinear dynamic-range compression characteristic that effectively suppresses noise and improves signal quality; the formula applies its principle by comparing the power spectra of the original speech signal and the noise to enhance the signal. Here $P_N(m,k)$ and $P_S(m,k)$ denote the power-spectrum estimates of the noise and of the speech signal, respectively, and $\hat{X}(m,k)$ is the frequency-domain representation of the enhanced speech signal; applying an inverse short-time Fourier transform to $\hat{X}(m,k)$ yields the enhanced speech signal.

Embodiment 4: When speech emotion analysis is determined to be required, the noise separation unit separates background noise from the speech signal as follows. The speech signal is represented as a time-domain waveform $x(t)$ and converted to the frequency domain by a short-time Fourier transform, giving the frequency-domain representation $X(\omega, t)$. Let the length of the time period be $T$; the signal is segmented with a window function of length $N$, with an overlap of $M$ between adjacent windows. A Hamming window is chosen, defined as

$$w(i) = 0.54 - 0.46 \cos\!\left(\frac{2\pi i}{N-1}\right), \qquad 0 \le i \le N-1$$

where $i$ is the sampling index within the window. The window function is applied to each time segment of the speech signal, and zero padding extends each segment to length $N'$, giving the time-domain window signals $x_l(n)$; a discrete Fourier transform applied to each window signal $x_l(n)$ gives its frequency-domain representation $X_l(\omega)$. The background noise is assumed to be stationary and linearly superimposed on the speech signal, and an adaptive filter in the frequency domain is used to model and estimate it. With $S(\omega)$ the clean spectrum of the speech signal and $N(\omega)$ the spectrum of the background noise, the frequency-domain response of the adaptive filter is defined as

$$H(\omega, t) = \frac{|S(\omega)|^2}{|S(\omega)|^2 + |N(\omega)|^2}$$

where $H(\omega, t)$ is the adaptive filter's frequency-domain response at time $t$. The speech signal is reconstructed in the frequency domain using the adaptive filter:

$$\hat{X}(\omega, t) = H(\omega, t)\, X(\omega, t)$$

where $\hat{X}(\omega, t)$ is the reconstructed signal; converting it back to the time domain yields the speech signal $\hat{x}(t)$ with the background noise separated out.

Specifically: first, the original speech signal is represented as the time-domain waveform $x(t)$ and converted to the frequency domain by a short-time Fourier transform, giving $X(\omega, t)$; the purpose of this step is to move the signal from the time domain to the frequency domain for frequency-domain processing. The signal is then segmented, each segment of length $N$, and the Hamming window is applied to reduce spectral leakage; this divides the signal into window segments that can later be processed independently. Zero padding extends each window signal to length $N'$, and a discrete Fourier transform of the extended window signal gives its frequency-domain representation $X_l(\omega)$, so that each window can be processed in the frequency domain. Assuming the background noise is stationary and linearly superimposed on the speech, the adaptive filter models and estimates the noise; the purpose is to estimate the spectral characteristics of the background noise so it can be separated later. From the clean speech spectrum $S(\omega)$ and the noise spectrum $N(\omega)$, the frequency-domain response $H(\omega, t)$ of the adaptive filter is computed, so that the speech signal can subsequently be reconstructed. Using $H(\omega, t)$, the speech signal is reconstructed in the frequency domain as $\hat{X}(\omega, t)$; this denoises the signal by removing the background-noise component, leaving a cleaner speech signal. Finally, the reconstructed signal is converted back to the time domain, yielding the clean speech signal with the background noise separated, ready for subsequent speech emotion analysis and further processing.

Embodiment 5: When the speech acquisition device determines that speech emotion analysis is required, the speech processing device applies a pre-emphasis filter to the collected speech signal to balance the spectrum, and then splits the signal into overlapping frames, according to:

$y(n) = x(n) - \alpha\, x(n-1)$;

$x_i(n) = y(i \cdot S + n), \quad 0 \le n < N$;

where $x(n)$ is the original signal; $y(n)$ is the pre-emphasized signal; $\alpha$ is the pre-emphasis coefficient; $i$ indexes the $i$-th frame; $N$ is the frame length; and $S$ is the frame shift, which determines the overlap between adjacent frames.

Specifically, the first formula expresses the application of the pre-emphasis filter, which balances the spectrum of the speech signal and boosts the energy of its high-frequency part, improving the signal-to-noise ratio. Here $x(n)$ is the clean speech signal obtained after background-noise separation, $y(n)$ is the pre-emphasized signal, and $\alpha$ is the pre-emphasis coefficient; the filter compensates for the attenuation of the high-frequency part of the speech signal during transmission and thereby improves its clarity and recognizability. The second formula expresses the division of the pre-emphasized signal into overlapping frames: $i$ indexes the $i$-th frame, $N$ is the frame length, and $S$ is the frame shift (the step between frame starts), leaving $N - S$ samples of overlap between adjacent frames. Framing segments the signal in time so that the speech within each frame can be treated as short-time stationary, which simplifies the subsequent feature extraction and analysis.
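As a concrete illustration of these two formulas, the sketch below implements pre-emphasis and framing with numpy; the coefficient $\alpha = 0.97$ and the 25 ms / 10 ms frame geometry at 16 kHz are common choices assumed here, not values fixed by the embodiment.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y(n) = x(n) - alpha * x(n-1); alpha = 0.97 is a common choice."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_signal(y, frame_len=400, frame_shift=160):
    """Split the pre-emphasized signal into overlapping frames.

    At a 16 kHz sampling rate these defaults give 25 ms frames with a
    10 ms shift, i.e. 15 ms of overlap between adjacent frames.
    Assumes len(y) >= frame_len.
    """
    n_frames = 1 + (len(y) - frame_len) // frame_shift
    return np.stack([y[i * frame_shift: i * frame_shift + frame_len]
                     for i in range(n_frames)])
```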

Embodiment 6: For each frame signal, the speech processing device applies a window function, then a discrete Fourier transform, and then a mel filterbank, in order to extract the MFCC features:

$X_i(k) = \sum_{n=0}^{N-1} x_i(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \dots, N-1$;

$E_i(m) = \ln\Big(\sum_{k=0}^{K} |X_i(k)|^2\, H_m(k)\Big), \quad m = 1, \dots, M$;

$c_i(l) = \sum_{m=1}^{M} E_i(m)\, \cos\Big(\frac{\pi l (m - 0.5)}{M}\Big), \quad l = 1, \dots, L$;

where $x_i(n)$ is the $n$-th sample of the $i$-th speech frame; $X_i(k)$ is the $k$-th frequency-domain coefficient produced by the discrete Fourier transform; $N$ is the number of points of the discrete Fourier transform, which is also the number of samples per speech frame; $K$ equals $N/2$ and is the number of independent frequency components in the discrete Fourier transform; $H_m(k)$ is the gain of the $m$-th mel filter at the $k$-th frequency point, the mel filters being a set of overlapping triangular band-pass filters that mimic the frequency perception of the human ear by applying a nonlinear mel-scale conversion to the frequency axis; $M$ is the number of mel filters, i.e. the number of frequency bands on the mel scale; $E_i(m)$ is the result of applying the $m$-th mel filter to the DFT coefficients and taking the logarithm, representing the log energy of the $m$-th frequency band; $c_i(l)$ is the $l$-th mel-frequency cepstral coefficient, obtained by applying a discrete cosine transform to $E_i(m)$ so as to convert the log mel energy spectrum into time-domain cepstral coefficients, reducing correlation between features and emphasizing spectral shape; and $L$ is the number of MFCC features finally extracted.

Specifically, the discrete Fourier transform (DFT) part of the formula converts each time-domain speech frame into a frequency-domain representation, giving the magnitude and phase of its frequency components. This step matters for speech emotion analysis because different emotional states show up in the spectral structure of speech: anger, for instance, tends to raise pitch and spectral energy, while sadness tends to lower and flatten them. Next, the mel filterbank maps the DFT spectrum onto the mel frequency scale, which matches the perceptual characteristics of the human auditory system and better captures the perceptually important frequency components: the filterbank divides the spectrum into a series of triangular band-pass filters and weights the energy in each band, emphasizing important components while suppressing unimportant ones, which improves the discriminability and robustness of the features. Finally, the MFCC extraction step converts the mel-filtered log energy spectrum into a set of cepstral coefficients via the discrete cosine transform (DCT); these coefficients reflect the spectral character of the speech on the mel frequency axis, including its timbre and resonance properties, are largely decorrelated, and form a compact, stable representation well suited to speech emotion analysis and other speech-processing tasks.

The first formula performs a discrete Fourier transform (DFT) on each speech frame $x_i(n)$, converting it from a time-domain to a frequency-domain representation and yielding the coefficients $X_i(k)$, where $k$ is the frequency index and $N$ is the number of samples per frame. The second formula describes the mel filterbank stage: the mel filters mimic the frequency perception of the human ear by applying a nonlinear mel-scale warping to the frequency axis, with $H_m(k)$ the gain of the $m$-th filter at the $k$-th frequency point and $K$ the number of independent frequency components in the DFT; passing the spectrum through the filterbank converts the frequency axis to the mel scale, which better matches human hearing, and produces the log energy spectrum $E_i(m)$. The third formula is the MFCC extraction step: applying a discrete cosine transform to the log energy spectrum yields a set of cepstral coefficients $c_i(l)$, the MFCCs, which reduce correlation between features and emphasize the spectral shape of the speech signal, providing a more discriminative feature representation.
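A compact sketch of this DFT, mel filterbank, log, DCT pipeline follows; the sampling rate, the 26-filter bank, and the 13 retained coefficients are conventional values assumed for illustration, and the filterbank construction uses the standard mel-scale formula rather than anything specified in the embodiment.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):                       # rising edge of triangle m
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                       # falling edge of triangle m
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc(frames, fs=16000, n_filters=26, n_ceps=13):
    """Per frame: DFT -> mel filterbank -> log energy -> DCT."""
    n_fft = frames.shape[1]
    spec = np.abs(np.fft.rfft(frames * np.hamming(n_fft), n_fft)) ** 2
    fb = mel_filterbank(n_filters, n_fft, fs)
    log_e = np.log(np.maximum(spec @ fb.T, 1e-10))   # E(m): log energy per mel band
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]  # c(l): MFCCs
```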

Embodiment 7: The fundamental-frequency feature $F_0$ is computed as:

$F_0 = \dfrac{f_s}{\tau_{\max}}$;

where $R(\tau)$ is the autocorrelation function; $\tau_{\max}$ is the position of the peak of $R(\tau)$; and $f_s$ is the sampling frequency. The energy feature $\Delta E_i$ is computed as:

$\Delta E_i = E_i - E_{i-1}$;

where $E_i$ is the energy of the $i$-th speech frame. The resulting feature vector is:

$V = [F_0, \Delta E, c_1, c_2, \dots, c_L]$.

Specifically, the fundamental-frequency feature $F_0$ is obtained from the autocorrelation function $R(\tau)$, which describes the correlation between the signal and a delayed copy of itself: computing the autocorrelation of a speech frame at different lags $\tau$ locates the peak position $\tau_{\max}$, and $F_0$ is defined as the ratio of the sampling frequency $f_s$ to that peak position. In speech analysis $F_0$ represents the fundamental frequency of the voice, i.e. its pitch. The energy feature $\Delta E_i$ is obtained by differencing frame energies: the energy $E_i$ is the sum of the squared amplitudes of all samples in the $i$-th frame and represents the overall energy of that frame, while $\Delta E_i$, the difference between the energies of adjacent frames, captures energy changes such as variations in loudness or the boundary between pauses and active speech. Finally, the feature vector $V$ combines the fundamental-frequency feature, the energy feature, and the cepstral coefficients $c_1, \dots, c_L$ extracted as MFCCs; the MFCC coefficients reflect the spectral character of the speech on the mel frequency axis and are discriminative and robust, so the combined vector describes the spectral and energy properties of the signal more completely and supplies richer information for the subsequent speech emotion analysis.
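The sketch below computes the three kinds of features and assembles the vector $V$; the 60 to 400 Hz pitch-search range and the convention that the first frame's energy difference is zero are assumptions made for illustration.

```python
import numpy as np

def fundamental_frequency(frame, fs=16000, f_min=60.0, f_max=400.0):
    """F0 = fs / tau_max, with tau_max the autocorrelation peak position.

    The 60-400 Hz search range is an assumed bound on human pitch.
    """
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # R(tau), tau >= 0
    lo, hi = int(fs / f_max), int(fs / f_min)
    tau_max = lo + int(np.argmax(r[lo:hi]))
    return fs / tau_max

def frame_energy(frame):
    """E_i: sum of squared sample amplitudes of one frame."""
    return float(np.sum(frame ** 2))

def feature_vector(frames, mfccs, fs=16000):
    """Assemble V = [F0, delta-E, c_1..c_L] per frame (first frame: delta-E = 0)."""
    energies = np.array([frame_energy(f) for f in frames])
    delta_e = np.diff(energies, prepend=energies[0])
    f0 = np.array([fundamental_frequency(f, fs) for f in frames])
    return np.column_stack([f0, delta_e, mfccs])
```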

Embodiment 8: The pre-trained speech emotion analysis model is a three-class support vector model whose class labels are:

$y \in \{-1, 0, +1\}$,

representing the three emotion features, each element of the label set corresponding to one emotion category. The speech emotion analysis model is expressed as:

$\min_{w,\,b,\,w_2,\,b_2,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{1}{2}\|w_2\|^2 + C \sum_{i=1}^{n} \xi_i$;

$\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i \ \text{ for all } i \text{ with } y_i \ne 0$;

$\delta_i\, |w_2^\top x_i + b_2| \le \xi_i, \quad i = 1, \dots, n$;

$\xi_i \ge 0, \quad i = 1, \dots, n$;

$\delta_i = 1 \text{ if } y_i = 0, \text{ otherwise } \delta_i = 0$;

where $w$ is the normal vector of the decision hyperplane, defining the classification boundary; $b$ is the bias term (intercept), a parameter of the support vector machine that translates the classification boundary; $w_2$ is the weight vector of the additional decision function, defining the additional classification boundary; $b_2$ is the bias term of the additional decision function, translating the additional boundary; $\xi_i$ are slack variables expressing the allowed degree of deviation from the hyperplane; $C$ is the regularization parameter controlling the weight of the slack variables, larger values penalizing misclassification more severely; $\delta_i$ is an indicator variable; and $g(x)$ is the additional decision function. The feature vectors of historical speech segments and the corresponding class labels are collected as model training-and-test data. Part of this data is used as training data to train the speech emotion analysis model; the training objective is an optimal decision boundary that classifies the training samples as correctly as possible while preserving the model's ability to generalize. The remainder is used as test data to evaluate the trained support vector machine, the evaluation metric being accuracy: if the accuracy exceeds the set accuracy threshold, training stops; otherwise the model parameters are adjusted and training continues until the threshold is exceeded.

Specifically, Embodiment 8 describes a support vector machine (SVM) model for speech emotion analysis whose goal is to classify an input speech feature vector into one of three emotion categories: negative ($y = -1$), neutral ($y = 0$), and positive ($y = +1$). The model learns an optimal decision boundary separating the emotion classes by optimizing an objective function with two parts. The first part minimizes the squared norm of the weight vector, $\frac{1}{2}\|w\|^2$, which seeks a decision hyperplane that classifies the training samples as correctly as possible; to avoid overfitting the training data, the objective also includes the regularization term $C \sum_i \xi_i$, where $C$ controls model complexity and the slack variables $\xi_i$ permit a limited amount of misclassification. The second part consists of the constraints, which ensure that the training samples are classified correctly relative to the decision boundary: for each training sample, the product of its label $y_i$ with the decision value $w^\top x_i + b$ must be at least $1 - \xi_i$, so that most samples are classified correctly while the slack variables tolerate some errors. The additional constraints involve an auxiliary decision function $g(x)$ that operates alongside the main decision function to improve classification performance and is subject to analogous conditions, keeping it on the boundary of correct classification. The indicator variable $\delta_i$ determines, from the class label, whether the additional constraint is active: if the label is the neutral emotion, $\delta_i$ is set to 1 and the extra constraint applies; otherwise it is 0 and the constraint is inactive. For training, the feature vectors of historical speech segments and their class labels serve as the training-and-test data, split into a training set used to fit the model and a test set used to evaluate it; the evaluation metric is accuracy, the proportion of correctly classified samples. If the accuracy reaches the set threshold, training stops; otherwise the model parameters are adjusted and training continues until the threshold is reached.
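The following sketch shows a training-and-evaluation loop of the kind described, using scikit-learn's SVC in place of the patent's custom three-class formulation (SVC builds pairwise hyperplanes internally, standing in for the main and additional decision functions); the 0.85 accuracy threshold, the linear kernel, and the candidate $C$ values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_emotion_svm(features, labels, acc_threshold=0.85):
    """Train and evaluate over candidate regularization values C.

    labels: -1 (negative), 0 (neutral), +1 (positive).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=0)

    for C in (0.1, 1.0, 10.0, 100.0):        # adjust parameters until threshold met
        model = make_pipeline(StandardScaler(), SVC(C=C, kernel='linear'))
        model.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        if acc >= acc_threshold:
            return model, acc                # stop once accuracy exceeds threshold
    return model, acc                        # best effort if threshold never reached
```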

Embodiment 9: The additional decision function $g(x)$ is expressed as:

$g(x) = w_2^\top x + b_2$.

Specifically, the additional decision function $g(x)$ is introduced to add one more class to the classification. In a support vector machine (SVM) model, the original decision function can usually only separate two classes, but in practice more emotion categories must be handled. The role of the additional decision function is to introduce an extra classification boundary so that the SVM can solve problems with more than two classes, improving the model's applicability and performance. Concretely, the additional decision function defines an extra weight vector $w_2$ and bias term $b_2$, providing an additional classification boundary for the extra emotion category; together with the original boundary it forms a multi-class classifier that can distinguish and identify several different emotion categories at once. By adjusting the parameters of the additional boundary, the classification accuracy for each emotion category can be tuned, yielding more precise emotion classification.
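A minimal sketch of how the two decision functions could jointly produce a three-way decision is shown below; the rule that samples falling within a margin of the additional boundary are labeled neutral is an assumption about how $f(x)$ and $g(x)$ combine, since the decision rule is not spelled out above.

```python
import numpy as np

def classify(x, w, b, w2, b2, margin=1.0):
    """Three-way decision from the main hyperplane f(x) = w.x + b and the
    additional function g(x) = w2.x + b2 (margin rule is an assumption)."""
    g = np.dot(w2, x) + b2
    if abs(g) <= margin:          # near the additional boundary: neutral
        return 0
    f = np.dot(w, x) + b
    return 1 if f > 0 else -1     # otherwise the main hyperplane decides
```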

The embodiments above serve only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, and some of their technical features may be replaced by equivalents, without departing in essence from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An emotional health analysis and support system for the elderly, characterized in that the system comprises: a speech acquisition device, a speech processing device, and a speech emotion analysis device. The speech acquisition device acquires the speech signal of the target elderly person within a set time period and performs a preliminary signal analysis to judge whether speech emotion analysis is needed, specifically: statistically analysing the average speech energy and the speech-activity ratio of the signal, where the average speech energy is defined as the ratio of the total energy of the speech signal within the set time period to the length of the period, and the speech-activity ratio is defined as the ratio of the duration of the speech signal within the set time period to the length of the period; if both the average speech energy and the speech-activity ratio lie within their respective threshold ranges, speech emotion analysis is judged unnecessary, otherwise it is judged necessary. The speech processing device, when the acquisition device judges that speech emotion analysis is needed, applies a pre-emphasis filter to the collected speech signal to balance the spectrum, obtaining a preprocessed signal; extracts MFCC features, fundamental-frequency features, and energy features from the preprocessed signal as the elements of a feature vector; and, based on the feature vector, divides the preprocessed signal into speech segments and non-speech segments using a zero-crossing-rate method. The speech emotion analysis device performs emotion analysis on the feature vectors of the speech segments using a pre-trained speech emotion analysis model to judge the emotion features of each segment, the emotion features comprising positive, neutral, and negative emotion features; if, within a set time period, the ratio of the total number of frames in speech segments with negative emotion features to the length of the period exceeds a set threshold, the target elderly person is judged to be in a negative mood and an early-warning signal calling for emotional intervention is issued.

2. The emotional health analysis and support system for the elderly of claim 1, characterized in that the speech acquisition device comprises: an acquisition unit, an enhancement unit, a preliminary-analysis unit, and a noise-separation unit. Within the set time period, the acquisition unit uses voice recognition to judge whether an incoming speech signal was produced by the target elderly person and, if so, collects it; the enhancement unit enhances the collected speech signal to obtain an enhanced speech signal; the preliminary-analysis unit performs the preliminary signal analysis to judge whether speech emotion analysis is needed; and the noise-separation unit separates the background noise from the speech signal when speech emotion analysis is judged necessary.

3. The emotional health analysis and support system for the elderly of claim 2, characterized in that the enhancement unit enhances the speech signal as follows: the speech signal $x(t)$ is subjected to an autoregressive-model-based short-time Fourier transform, giving the time-frequency representation $X(t, f)$:

$X(t, f) = \sum_{n=0}^{L-1} \Big( \sum_{p=1}^{P} a_p\, x(tL + n - p) \Big)\, w(n)\, e^{-j 2\pi f n / L}$;

where $t$ is the time-segment index of the short-time Fourier transform; $f$ is the frequency index; $L$ is the window length of each time segment; $w(n)$ is the window function; $a_p$ are the coefficients and $P$ the order of the autoregressive model; $j$ is the imaginary unit; and $n$ is the time-domain index. A Wiener filter with a nonlinear dynamic-range-compression characteristic then enhances the time-frequency representation:

$\hat{X}(t, f) = \dfrac{\hat{S}_x(t, f)}{\hat{S}_x(t, f) + \hat{S}_n(t, f)}\, X(t, f)$;

where $\hat{S}_n(t, f)$ and $\hat{S}_x(t, f)$ are the power-spectrum estimates of the noise and of the speech signal, respectively, and $\hat{X}(t, f)$ is the frequency-domain representation of the enhanced speech signal; the enhanced frequency-domain signal $\hat{X}(t, f)$ is converted by an inverse short-time Fourier transform into the enhanced speech signal.

4. The emotional health analysis and support system for the elderly of claim 3, characterized in that the noise-separation unit, when speech emotion analysis is judged necessary, separates the background noise as follows: the speech signal is represented as the time-domain waveform $x(t)$ and converted by a short-time Fourier transform into the frequency-domain representation $X(t, f)$; with the time period of length $T$, the signal is segmented using a window function of length $W$, with overlap $O$ between windows; the window function is a Hamming window, defined as:

$w(n) = 0.54 - 0.46 \cos\Big(\dfrac{2\pi n}{W - 1}\Big), \quad 0 \le n \le W - 1$;

where $n$ is the sampling index within the window. Applying the window function to each time segment of the speech signal and zero-padding to length $W'$ gives the time-domain window signals $x_w(n)$; a discrete Fourier transform of each window signal gives the frequency-domain representation $X_w(k)$. The background noise is assumed steady-state and linearly superimposed on the speech, and an adaptive filter in the frequency domain models and estimates it: with $S(f)$ the clean spectrum of the speech signal and $N(f)$ the spectrum of the background noise, the frequency-domain response of the adaptive filter is defined as:

$H(t, f) = \dfrac{S(f)}{S(f) + N(f)}$;

where $H(t, f)$ is the frequency-domain response of the adaptive filter at time $t$. The adaptive filter $H(t, f)$ reconstructs the speech signal in the frequency domain:

$\hat{S}(t, f) = H(t, f)\, X(t, f)$;

where $\hat{S}(t, f)$ is the reconstructed signal, which is converted back to the time domain to obtain the speech signal with the background noise separated out.

5. The emotional health analysis and support system for the elderly of claim 4, characterized in that, when the speech acquisition device judges that speech emotion analysis is needed, the speech processing device applies a pre-emphasis filter to the collected speech signal to balance the spectrum and then splits the signal into overlapping frames:

$y(n) = x(n) - \alpha\, x(n-1)$;

$x_i(n) = y(i \cdot S + n), \quad 0 \le n < N$;

where $x(n)$ is the original signal; $y(n)$ is the pre-emphasized signal; $\alpha$ is the pre-emphasis coefficient; $i$ indexes the $i$-th frame; $N$ is the frame length; and $S$ is the frame shift, determining the overlap between adjacent frames.

6. The emotional health analysis and support system for the elderly of claim 5, characterized in that the speech processing device applies a window function to each frame signal, then a discrete Fourier transform, and then a mel filterbank, to extract the MFCC features:

$X_i(k) = \sum_{n=0}^{N-1} x_i(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \dots, N-1$;

$E_i(m) = \ln\Big(\sum_{k=0}^{K} |X_i(k)|^2\, H_m(k)\Big), \quad m = 1, \dots, M$;

$c_i(l) = \sum_{m=1}^{M} E_i(m)\, \cos\Big(\dfrac{\pi l (m - 0.5)}{M}\Big), \quad l = 1, \dots, L$;

where $x_i(n)$ is the $n$-th sample of the $i$-th speech frame; $X_i(k)$ is the $k$-th frequency-domain coefficient after the discrete Fourier transform; $N$ is the number of DFT points and also the number of samples per frame; $K$ equals $N/2$, the number of independent frequency components in the DFT; $H_m(k)$ is the gain of the $m$-th mel filter at the $k$-th frequency point, the mel filters being a set of overlapping triangular band-pass filters that mimic the frequency perception of the human ear by applying a nonlinear mel-scale conversion to the frequency axis; $M$ is the number of mel filters, i.e. the number of frequency bands on the mel scale; $E_i(m)$ is the result of applying the $m$-th mel filter to the DFT coefficients and taking the logarithm, representing the log energy of the $m$-th frequency band; $c_i(l)$ is the $l$-th mel-frequency cepstral coefficient, obtained by applying a discrete cosine transform to $E_i(m)$ so as to convert the log mel energy spectrum into time-domain cepstral coefficients, reducing correlation between features and emphasizing spectral shape; and $L$ is the number of MFCC features finally extracted.

7. The emotional health analysis and support system for the elderly of claim 6, characterized in that the fundamental-frequency feature $F_0$ is computed as:

$F_0 = \dfrac{f_s}{\tau_{\max}}$;

where $R(\tau)$ is the autocorrelation function, $\tau_{\max}$ is the position of its peak, and $f_s$ is the sampling frequency; the energy feature $\Delta E_i$ is computed as:

$\Delta E_i = E_i - E_{i-1}$;

where $E_i$ is the energy of the $i$-th speech frame; the resulting feature vector is:

$V = [F_0, \Delta E, c_1, c_2, \dots, c_L]$.

8. The emotional health analysis and support system for the elderly of claim 7, characterized in that the pre-trained speech emotion analysis model is a three-class support vector model whose class labels are:

$y \in \{-1, 0, +1\}$,

representing the three emotion features, each element of the label set corresponding to one emotion category; the speech emotion analysis model is expressed as:

$\min_{w,\,b,\,w_2,\,b_2,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{1}{2}\|w_2\|^2 + C \sum_{i=1}^{n} \xi_i$;

$\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i \ \text{ for all } i \text{ with } y_i \ne 0$;

$\delta_i\, |w_2^\top x_i + b_2| \le \xi_i, \quad i = 1, \dots, n$;

$\xi_i \ge 0, \quad i = 1, \dots, n$;

$\delta_i = 1 \text{ if } y_i = 0, \text{ otherwise } \delta_i = 0$;

where $w$ is the normal vector of the decision hyperplane, defining the classification boundary; $b$ is the bias term (intercept), a parameter of the support vector machine that translates the classification boundary; $w_2$ is the weight vector of the additional decision function, defining the additional classification boundary; $b_2$ is the bias term of the additional decision function, translating the additional boundary; $\xi_i$ are slack variables expressing the allowed degree of deviation from the hyperplane; $C$ is the regularization parameter controlling the weight of the slack variables, larger values penalizing misclassification more severely; $\delta_i$ is an indicator variable; and $g(x)$ is the additional decision function. The feature vectors of historical speech segments and the corresponding class labels are collected as model training-and-test data; part of this data serves as training data for the speech emotion analysis model, the training objective being an optimal decision boundary that classifies the training samples as correctly as possible while preserving generalization; the remainder serves as test data for evaluating the trained support vector machine, the evaluation metric being accuracy: if the accuracy exceeds the set accuracy threshold, training stops, otherwise the model parameters are adjusted and training continues until the threshold is exceeded.

9. The emotional health analysis and support system for the elderly of claim 8, characterized in that the additional decision function $g(x)$ is expressed as:

$g(x) = w_2^\top x + b_2$.
CN202410411579.4A 2024-04-08 2024-04-08 Emotional health analysis and support system for the elderly Pending CN118016106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410411579.4A CN118016106A (en) 2024-04-08 2024-04-08 Emotional health analysis and support system for the elderly


Publications (1)

Publication Number Publication Date
CN118016106A true CN118016106A (en) 2024-05-10

Family

ID=90950288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410411579.4A Pending CN118016106A (en) 2024-04-08 2024-04-08 Emotional health analysis and support system for the elderly

Country Status (1)

Country Link
CN (1) CN118016106A (en)



Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6173260B1 (en) * 1997-10-29 2001-01-09 Interval Research Corporation System and method for automatic classification of speech based upon affective content
KR20040038419A (en) * 2002-11-01 2004-05-08 에스엘투(주) A method and apparatus for recognizing emotion from a speech
RU2008141478A (en) * 2008-10-22 2010-04-27 Александр Вадимович Баклаев (RU) SYSTEM OF EMOTIONAL STABILIZATION OF SPEECH COMMUNICATIONS "EMOS"
CN102184408A (en) * 2011-04-11 2011-09-14 西安电子科技大学 Autoregressive-model-based high range resolution profile radar target recognition method
CN105374367A (en) * 2014-07-29 2016-03-02 华为技术有限公司 Abnormal frame detecting method and abnormal frame detecting device
CN109961803A (en) * 2017-12-18 2019-07-02 上海智臻智能网络科技股份有限公司 Voice mood identifying system
CN108682432A (en) * 2018-05-11 2018-10-19 南京邮电大学 Speech emotion recognition device
CN111684522A (en) * 2019-05-15 2020-09-18 深圳市大疆创新科技有限公司 Voice recognition method, interaction method, voice recognition system, computer-readable storage medium, and removable platform
WO2020233504A1 (en) * 2019-05-17 2020-11-26 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for emotion recognition
CN110491416A (en) * 2019-07-26 2019-11-22 广东工业大学 It is a kind of based on the call voice sentiment analysis of LSTM and SAE and recognition methods
CN110890096A (en) * 2019-10-12 2020-03-17 深圳供电局有限公司 An intelligent speech system and method based on speech analysis
CN114005432A (en) * 2021-10-21 2022-02-01 江苏信息职业技术学院 Chinese dialect identification method based on active learning
CN114792517A (en) * 2022-03-30 2022-07-26 重庆工程职业技术学院 A kind of intelligent water cup voice recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
费业泰: 《误差理论与数据处理》 (Error Theory and Data Processing), China Machine Press (机械工业出版社), 31 May 1995, pages 173-174 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118398010A (en) * 2024-06-25 2024-07-26 深圳市万德昌创新智能有限公司 Voice recognition-based pension service robot interaction method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20240510)