Abstract
This paper is part of our research on improving the performance of network-based automatic speech recognition systems. We built a highly configurable platform using hidden Markov models (HMMs), Gaussian mixture models (GMMs), and Mel frequency cepstral coefficients (MFCCs), together with the VoIP G.711-u and GSM codecs. To determine the parameter values that maximize performance, we prepared different acoustic models by varying the number of HMM states (from 3 to 5) and the number of Gaussian mixtures (8, 16, and 32), with 13 feature extraction coefficients. The generated acoustic models were then tested with unencoded speech data and with speech data encoded by the G.711 and GSM codecs. The best performance was obtained with 3 HMM states, 8–16 GMMs, and the G.711 codec.
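The parameter sweep described in the abstract can be sketched as a small configuration grid. This is an illustrative sketch only, not the authors' tooling: the function and key names are hypothetical, "from 3 to 5" is assumed to mean the integer values 3, 4, and 5, and the HMM figure is assumed to be the number of states per model.

```python
from itertools import product

# Values taken from the abstract; the names below are hypothetical.
HMM_STATES = (3, 4, 5)          # assumption: "from 3 to 5" in integer steps
GMM_COMPONENTS = (8, 16, 32)    # Gaussian mixtures per state
NUM_MFCC = 13                   # feature extraction coefficients
TEST_CONDITIONS = ("unencoded", "G.711", "GSM")


def acoustic_model_configs():
    """Return one configuration per acoustic model to train."""
    return [
        {"hmm_states": states, "gmm_components": gmms, "num_mfcc": NUM_MFCC}
        for states, gmms in product(HMM_STATES, GMM_COMPONENTS)
    ]


def evaluation_runs():
    """Pair every acoustic model with every test condition."""
    return [
        dict(config, test_condition=condition)
        for config in acoustic_model_configs()
        for condition in TEST_CONDITIONS
    ]


if __name__ == "__main__":
    for run in evaluation_runs():
        print(run)
```

Under these assumptions the grid contains 9 acoustic models and 27 evaluation runs, which makes it easy to check that no combination of model size and codec condition is skipped.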
Data Availability
The speech database used in this study is the property of the authors' laboratory.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Hamidi, M., Zealouk, O. & Satori, H. Telephony speech system performance based on the codec effect. Ann. Telecommun. 78, 617–625 (2023). https://doi.org/10.1007/s12243-023-00968-5
DOI: https://doi.org/10.1007/s12243-023-00968-5