A retrieval method for encrypted speech based on improved power normalized cepstrum coefficients and perceptual hashing

Abstract

To reduce the impact of noise on the robustness and discrimination of speech perceptual hashing schemes, improve retrieval efficiency and accuracy, and protect the privacy of cloud speech data, this paper proposes a retrieval method for encrypted speech based on improved power normalized cepstrum coefficients (PNCC) and perceptual hashing. First, the original speech is encrypted with an inter-frame scrambling algorithm based on the Henon chaotic map before it is uploaded to the encrypted speech library on the cloud server. Second, the discrete wavelet transform (DWT) and first-order difference coefficients are used to improve the PNCC feature extraction algorithm, and principal component analysis (PCA) reduces the high-dimensional audio features to one dimension, forming frame features that represent the speech segment. Finally, hash functions convert the frame features into binary hashing sequences, which are uploaded to the system hashing index table in the cloud. When a user issues a query, the hashing sequence of the query speech is extracted and matched against the entries of the cloud system hashing index table using the normalized Hamming distance to obtain the retrieval result. Experimental results show that, compared with existing methods, the proposed method has good robustness and discrimination, improves retrieval efficiency and accuracy, and improves the security of cloud speech data. In addition, the proposed method shows good recognition ability in simulated real-noise environments.
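
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the Henon map parameters, the hash rule, or the matching threshold, so the values a = 1.4 and b = 0.3, the initial point, the median-threshold hash, and the threshold of 0.25 are all assumptions, and every function name is hypothetical. The improved PNCC, DWT, and PCA steps are omitted; the sketch starts from one scalar feature per frame.

```python
import numpy as np

def henon_permutation(n_frames, x0=0.1, y0=0.3, a=1.4, b=0.3, burn_in=100):
    """Derive a frame permutation from a Henon map orbit.

    Assumption: (x0, y0) act as the key and a, b are the classic
    chaotic parameters; the paper's actual values are not given here.
    """
    x, y = x0, y0
    xs = np.empty(n_frames)
    for i in range(burn_in + n_frames):
        x, y = 1.0 - a * x * x + y, b * x  # Henon map update
        if i >= burn_in:
            xs[i - burn_in] = x
    return np.argsort(xs)  # rank order of the chaotic sequence

def scramble(frames, perm):
    """Inter-frame scrambling: reorder whole frames by the permutation."""
    return frames[perm]

def unscramble(frames, perm):
    """Invert scramble() given the same permutation."""
    out = np.empty_like(frames)
    out[perm] = frames
    return out

def binary_hash(frame_features):
    """Assumed hash rule: bit is 1 where the feature exceeds the median."""
    f = np.asarray(frame_features, dtype=float)
    return (f > np.median(f)).astype(np.uint8)

def normalized_hamming(h1, h2):
    """Fraction of positions at which two equal-length bit sequences differ."""
    h1, h2 = np.asarray(h1), np.asarray(h2)
    if h1.shape != h2.shape:
        raise ValueError("hash sequences must have the same length")
    return np.count_nonzero(h1 != h2) / h1.size

def retrieve(query_hash, index_table, threshold=0.25):
    """Return (id, distance) of the closest entry, or (None, distance)
    if even the closest entry is farther than the assumed threshold."""
    best_id, best_d = None, float("inf")
    for entry_id, entry_hash in index_table.items():
        d = normalized_hamming(query_hash, entry_hash)
        if d < best_d:
            best_id, best_d = entry_id, d
    return (best_id, best_d) if best_d <= threshold else (None, best_d)
```

In this sketch the hash is computed from features of the original speech and stored alongside the scrambled audio, so matching in the index table never requires decrypting the speech itself.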

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61862041, 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Authors and Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Qiu-yu Zhang, Jian Bai & Fu-jiu Xu

Corresponding author

Correspondence to Qiu-yu Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, Qy., Bai, J. & Xu, Fj. A retrieval method for encrypted speech based on improved power normalized cepstrum coefficients and perceptual hashing. Multimed Tools Appl 81, 15127–15151 (2022). https://doi.org/10.1007/s11042-022-12560-5

Received: 05 August 2020
Revised: 14 March 2021
Accepted: 31 January 2022
Published: 27 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11042-022-12560-5
