Telephony speech system performance based on the codec effect

Annals of Telecommunications

Abstract

This paper is part of our contribution to research on improving the performance of network automatic speech recognition systems. We built a highly configurable platform using hidden Markov models (HMMs), Gaussian mixture models (GMMs), and Mel frequency cepstral coefficients (MFCCs), together with the VoIP G.711 μ-law and GSM codecs. To determine the parameter values that maximize performance, we prepared different acoustic models by varying the number of HMM states (from 3 to 5) and the number of Gaussian mixtures (8, 16, and 32), with 13 feature extraction coefficients. The generated acoustic models were then tested with unencoded speech data and with speech data encoded by the G.711 and GSM codecs. The best performance is obtained with 3 HMM states, 8 to 16 Gaussian mixtures, and the G.711 codec.
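To illustrate what testing with G.711-encoded speech involves, the following is a minimal sketch of μ-law companding, the principle behind the G.711 μ-law codec. It is not the authors' platform and not a bit-exact G.711 implementation: the standard uses a segmented 8-bit encoding, whereas this sketch assumes the continuous μ-law formula with μ = 255 followed by uniform 8-bit quantization. All function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter associated with G.711 mu-law


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a code in 0..255.

    Assumption: uniform quantization stands in for the segmented
    encoding that the real G.711 standard specifies.
    """
    return int(round((y + 1.0) / 2.0 * 255))


def dequantize_8bit(code: int) -> float:
    """Map an 8-bit code back to the companded domain [-1, 1]."""
    return code / 255.0 * 2.0 - 1.0


def codec_round_trip(samples):
    """Encode and decode each sample, as a recognizer would see it."""
    return [
        mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(s))))
        for s in samples
    ]


if __name__ == "__main__":
    original = [-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0]
    decoded = codec_round_trip(original)
    for s, d in zip(original, decoded):
        print(f"{s:+.4f} -> {d:+.4f} (error {abs(s - d):.5f})")
```

Because the companding curve is logarithmic, quiet samples are reconstructed with a smaller absolute error than loud ones. A recognizer evaluated on encoded speech therefore sees slightly distorted features compared with the unencoded case.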

Data Availability

The speech database used in this study is the property of the laboratory.

Author information

Corresponding author

Correspondence to Mohamed Hamidi.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hamidi, M., Zealouk, O. & Satori, H. Telephony speech system performance based on the codec effect. Ann. Telecommun. 78, 617–625 (2023). https://doi.org/10.1007/s12243-023-00968-5

  • DOI: https://doi.org/10.1007/s12243-023-00968-5
