Abstract
To reduce the impact of noise on the robustness and discrimination of speech perceptual hashing, improve retrieval efficiency and accuracy, and protect the privacy of speech data in the cloud, this paper proposes a retrieval method for encrypted speech based on improved power normalized cepstrum coefficients (PNCC) and perceptual hashing. First, the original speech is encrypted with a Henon chaotic map inter-frame scrambling algorithm before being uploaded to the encrypted speech library on the cloud server. Second, the discrete wavelet transform (DWT) and first-order difference coefficients are used to improve the PNCC feature extraction algorithm, and principal component analysis (PCA) reduces the high-dimensional audio features to one dimension to form frame features that represent the speech segment. Finally, hash functions convert the frame features into binary hashing sequences, which are uploaded to the system hashing index table in the cloud. At retrieval time, the hashing sequence of the query speech is extracted and matched against the encrypted speech features in the cloud system hashing index table using the normalized Hamming distance to obtain the retrieval result. Experimental results show that, compared with existing methods, the proposed method has good robustness and discrimination, improves retrieval efficiency and accuracy, and improves the security of cloud speech data. In addition, the proposed method retains good recognition ability in simulated real-noise environments.
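The final hashing and matching steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function, so median thresholding is assumed here, and the names `binary_hash`, `normalized_hamming`, `retrieve`, and the match threshold of 0.25 are hypothetical. Each input list stands in for the one-dimensional frame features produced after PCA.

```python
from statistics import median


def binary_hash(frame_features):
    """Binarize one scalar feature per frame against the median.

    Median thresholding is an assumed hash function, chosen only for
    illustration; the abstract does not say which function is used.
    """
    threshold = median(frame_features)
    return [1 if value > threshold else 0 for value in frame_features]


def normalized_hamming(hash_a, hash_b):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(hash_a) != len(hash_b) or not hash_a:
        raise ValueError("hash sequences must be non-empty and of equal length")
    differing = sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))
    return differing / len(hash_a)


def retrieve(query_hash, index_table, threshold=0.25):
    """Return (speech_id, distance) for the closest entry below the threshold.

    index_table maps a speech identifier to its stored hash sequence.
    The threshold value is an assumption, not a figure from the paper.
    Returns None when no stored hash is close enough.
    """
    best = None
    for speech_id, stored_hash in index_table.items():
        distance = normalized_hamming(query_hash, stored_hash)
        if distance < threshold and (best is None or distance < best[1]):
            best = (speech_id, distance)
    return best


# Hypothetical frame features for two stored clips and a noisy query.
index_table = {
    "clip_a": binary_hash([0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3, 0.6]),
    "clip_b": binary_hash([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]),
}
query_hash = binary_hash([0.25, 0.85, 0.45, 0.65, 0.15, 0.75, 0.35, 0.55])
match = retrieve(query_hash, index_table)
```

In this toy example the query is a slightly perturbed copy of `clip_a`, so its hash is unchanged and `retrieve` selects `clip_a`. Because matching operates only on hash sequences, the cloud side never needs the decrypted speech.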
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61862041, 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
About this article
Cite this article
Zhang, Qy., Bai, J. & Xu, Fj. A retrieval method for encrypted speech based on improved power normalized cepstrum coefficients and perceptual hashing. Multimed Tools Appl 81, 15127–15151 (2022). https://doi.org/10.1007/s11042-022-12560-5
DOI: https://doi.org/10.1007/s11042-022-12560-5