DOI: 10.1145/3321408.3326664

Cochleagram-based identification of electronic disguised voice with pitch scaling in the noisy environment

Published: 17 May 2019

Abstract

Audio editing software makes voice camouflage easy, which threatens the security and authenticity of audio recordings. Whether audio forensics can identify voices disguised by software has therefore become an important issue. At the same time, since audio recorded in daily life almost always contains noise, another key point is improving anti-noise performance. This paper proposes an algorithm for identifying electronically disguised voice produced by pitch scaling that offers high anti-noise performance. The algorithm is based on a Least Mean Square (LMS) filter and the cochleagram, an acoustic representation that reflects the auditory characteristics of the human ear. In the algorithm, the noisy voice is first passed through the LMS filter for noise reduction. A cochleagram is then extracted from the filter output and processed at different resolutions to construct the Least Mean Square-Multi-Resolution Cochleagram (LMS-MRCG) feature. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) is used as the classifier to identify the disguised voice. The pitch-scaling disguise covers 5 different pitch factors for each speaker's voice, and the algorithm must identify the pitch type applied to each speaker. The results show that the algorithm achieves a high detection rate, and voices of different genders and languages can both be identified. Under various environmental noises, such as Gaussian white noise, pink noise, factory noise, and vehicle noise, the algorithm maintains stable identification performance. In particular, it maintains high forensic classification accuracy in low-SNR environments. In a noise-free environment, the overall identification rate reaches 97.50%; at SNRs as low as -5 dB, the identification rate remains above 85.83%.
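
As a rough illustration of how the described pipeline fits together, the sketch below chains LMS denoising, a gammatone-based cochleagram computed at two resolutions (a simplified stand-in for the MRCG feature), and a GMM-UBM classifier. It is a minimal sketch under several assumptions: the 64-channel filterbank, the 20 ms/200 ms frame lengths, the two-resolution simplification of MRCG, and the use of scikit-learn's GaussianMixture seeded from the UBM instead of full MAP adaptation are illustrative choices, not the configuration reported in the paper.

```python
# Illustrative sketch only; parameters and simplifications are assumptions,
# not the authors' reported setup.
import numpy as np
from scipy.signal import lfilter
from sklearn.mixture import GaussianMixture


def lms_denoise(noisy, noise_ref, taps=32, mu=0.005):
    """Adaptive noise cancellation with a basic LMS filter.

    noise_ref is a reference signal correlated with the noise in `noisy`;
    the filter's error signal is returned as the cleaned speech.
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_ref = np.asarray(noise_ref, dtype=float)
    w = np.zeros(taps)
    out = np.zeros_like(noisy)
    for n in range(taps, len(noisy)):
        x = noise_ref[n - taps:n][::-1]   # most recent reference samples
        e = noisy[n] - w @ x              # error = estimate of clean speech
        w += mu * e * x                   # LMS weight update
        out[n] = e
    return out


def cochleagram_envelopes(signal, fs, n_channels=64, f_lo=50.0, f_hi=8000.0):
    """Band envelopes from a 4th-order gammatone filterbank (FIR approximation)."""
    erb = lambda f: 24.7 * (4.37e-3 * f + 1.0)            # ERB bandwidth in Hz
    lo = 21.4 * np.log10(4.37e-3 * f_lo + 1.0)            # ERB-rate endpoints
    hi = 21.4 * np.log10(4.37e-3 * f_hi + 1.0)
    cfs = (10.0 ** (np.linspace(lo, hi, n_channels) / 21.4) - 1.0) / 4.37e-3
    t = np.arange(int(0.025 * fs)) / fs                    # 25 ms impulse responses
    env = np.empty((n_channels, len(signal)))
    for c, cf in enumerate(cfs):
        g = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb(cf) * t) * np.cos(2 * np.pi * cf * t)
        env[c] = np.abs(lfilter(g / np.sum(np.abs(g)), [1.0], signal))
    return env


def mrcg_features(signal, fs, frame_ms=(20, 200), hop_ms=10):
    """Simplified multi-resolution cochleagram: the same envelopes framed with a
    short and a long window, log-compressed and stacked per frame."""
    env = cochleagram_envelopes(signal, fs)
    hop = int(hop_ms * fs / 1000)
    parts = []
    for fm in frame_ms:
        win = int(fm * fs / 1000)
        n_frames = max(1, (env.shape[1] - win) // hop)
        cg = np.stack([env[:, i * hop:i * hop + win].mean(axis=1)
                       for i in range(n_frames)], axis=1)
        parts.append(np.log(cg + 1e-8))
    n = min(p.shape[1] for p in parts)
    return np.vstack([p[:, :n] for p in parts]).T          # frames x features


def train_gmm_ubm(background_feats, class_feats, n_mix=64):
    """UBM on pooled background frames; one GMM per disguise class, seeded from
    the UBM parameters as a lightweight substitute for MAP adaptation."""
    ubm = GaussianMixture(n_components=n_mix, covariance_type="diag").fit(background_feats)
    models = {}
    for label, X in class_feats.items():
        gmm = GaussianMixture(n_components=n_mix, covariance_type="diag",
                              weights_init=ubm.weights_, means_init=ubm.means_)
        models[label] = gmm.fit(X)
    return ubm, models


def classify_pitch_type(feats, ubm, models):
    """Pick the disguise (pitch-scaling) class with the best likelihood ratio."""
    base = ubm.score(feats)                                # average log-likelihood
    return max(models, key=lambda k: models[k].score(feats) - base)
```

Seeding each per-class GMM from the UBM's weights and means approximates the GMM-UBM idea of adapting class models from a shared background model, while keeping the sketch self-contained with standard library calls.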


Cited By

• (2024) Cochleagram to Recognize Dysphonia: Auditory Perceptual Analysis for Health Informatics. IEEE Access, 12, 59198-59210. DOI: 10.1109/ACCESS.2024.3392808

    Published In

    ACM TURC '19: Proceedings of the ACM Turing Celebration Conference - China
    May 2019
    963 pages
ISBN: 9781450371582
DOI: 10.1145/3321408

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. GMM-UBM
    2. anti-noise
    3. audio forensic
    4. cochleagram
    5. pitch scaling
