Computer Science, 2022, Vol. 49, Issue (6): 187-192. doi: 10.11896/jsjkx.210500114
尹文兵1, 高戈1, 曾邦1, 王霄1, 陈怡2
YIN Wen-bing1, GAO Ge1, ZENG Bang1, WANG Xiao1, CHEN Yi2
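Abstract: The conventional speech enhancement algorithm based on generative adversarial networks (Speech Enhancement Algorithm Based on Generative Adversarial Networks, SEGAN) enhances speech in the time domain and completely ignores how speech samples are distributed in the frequency domain. At low signal-to-noise ratios the speech signal is buried in noise, and the time-domain distribution of the noisy speech is hard to capture. As a result, the enhancement performance of SEGAN degrades sharply, and both the quality and the intelligibility of the enhanced speech are poor. To address this problem, a speech enhancement algorithm based on a time-frequency domain generative adversarial network (Time-Frequency Domain SEGAN, TFSEGAN) is proposed. TFSEGAN adopts a model structure with two discriminators, one for the time domain and one for the frequency domain, together with a time-frequency domain L1 loss function. The time-domain discriminator takes the time-domain features of the speech samples as input, and the frequency-domain discriminator takes their frequency-domain features. During training, the time-domain discriminator uses the time-domain distribution of the speech samples as its discrimination criterion, while the frequency-domain discriminator uses their frequency-domain distribution. Driven by both discriminators, the TFSEGAN generator learns the distribution of speech samples in the time domain and the frequency domain simultaneously. Experiments show that, at low signal-to-noise ratios, TFSEGAN improves speech quality and intelligibility over SEGAN by about 17.45% and 11.75%, respectively.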
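The abstract names the two ingredients of TFSEGAN (dual discriminators and a time-frequency domain L1 loss) but does not give their exact form. The following is a minimal sketch of how such a generator objective could be assembled, not the authors' implementation. It assumes that the frequency-domain feature is a magnitude spectrum, that the adversarial terms are least-squares terms as in SEGAN, and that `alpha`, `beta`, and `lam` are hypothetical weights; all function names are illustrative.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude of one frame (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def time_frequency_l1(enhanced, clean, alpha=1.0, beta=1.0):
    """Weighted sum of a time-domain and a frequency-domain L1 term.

    alpha and beta are hypothetical weights, not values from the paper.
    """
    time_term = l1(enhanced, clean)
    freq_term = l1(magnitude_spectrum(enhanced), magnitude_spectrum(clean))
    return alpha * time_term + beta * freq_term


def generator_loss(d_time_score, d_freq_score, enhanced, clean, lam=100.0):
    """Adversarial terms from both discriminators plus the L1 term.

    d_time_score and d_freq_score stand in for the outputs of the
    time-domain and frequency-domain discriminators on the enhanced
    sample, with 1.0 meaning "judged real". lam is an assumed weight.
    """
    adv = 0.5 * (d_time_score - 1.0) ** 2 + 0.5 * (d_freq_score - 1.0) ** 2
    return adv + lam * time_frequency_l1(enhanced, clean)
```

In this sketch the generator is penalised when either discriminator rejects the enhanced sample, and the L1 term pulls the output toward the clean reference in both domains, which mirrors the abstract's claim that the generator learns time-domain and frequency-domain distributions together.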