Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach
Figure 1. The process of extracting the Mel spectrogram from an acoustic signal, where the output of each Mel filter is summed and combined to create the Mel spectrogram, which is visualized in terms of the amplitude of the frequency components over time.
Figure 2. The anatomy of a deep neural network model, where a is the number of hidden layers.
Figure 3. Description of the convolutional neural networks. In (A), the filter size is 3 and the stride is 1; in (B), the filter size is 2 and the stride is 1; and in (C), the filter size is 2 and the stride is 2.
Figure 4. A schematic overview of the conducted methodology.
Figure 5. An illustration of implementing the first approach. (A) shows how the dataset is preprocessed and created, while (B) shows the utilized DNN model.
Figure 6. A representation of a consultation’s acoustic signal. The first plot is the signal in the time domain, the second is the ZCR, the third shows the MFCCs (first 8 coefficients), and the fourth is the spectrogram representation.
Figure 7. The process of converting the acoustic recordings into text and then extracting text-based features using AraVec.
Figure 8. The structural design of the second approach. MP is the max-pooling operation, and BN is the batch normalization layer.
Figure 9. The structure of combining the two approaches of the signal-based submodel and the transcript-based submodel.
Figure 10. A heatmap representation of the confusion matrix of the best models obtained using the MFCCs alone (a) and the combination of all spectral features (b).
Figure 11. A heatmap representation of the confusion matrices of the best models from the transcript-based approach with different embedding models.
Figure 12. The convergence curves in terms of the accuracy of the best models for the three approaches: spectral features only, transcript features only, and the hybrid of both.
Figure 13. The convergence curves in terms of the loss of the best models for the three approaches: spectral features only, transcript features only, and the hybrid of both.
Abstract
1. Introduction
- We develop a model that automates the quality prediction of medical consultations. The contribution lies at the feature engineering and model development levels: the model combines spectral features extracted from the signals with text-based features extracted from the transcripts, which are then used to train different structures of deep and convolutional neural networks (a minimal sketch of such a two-branch architecture is given after this list).
- We reinforce the advantages of artificial intelligence in telemedicine. An automatic quality assessment model reduces the effort and time the operations team spends evaluating consultations manually. Moreover, in pandemic situations such as the COVID-19 pandemic, such an approach can enhance the quality of the service and better serve callers (patients).
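To make this contribution concrete, the sketch below illustrates the kind of two-branch architecture described above, written with TensorFlow/Keras (TensorFlow is among the tools cited in the references). The input sizes, layer widths, and number of layers are illustrative assumptions, not the configuration reported in the experiments.

```python
from tensorflow.keras import layers, Model

# Hypothetical feature dimensions; the paper's actual sizes differ.
N_SPECTRAL = 39              # e.g., MFCCs with delta and delta-delta features
SEQ_LEN, EMB_DIM = 300, 100  # transcript length and AraVec embedding size

# Signal-based branch: dense layers over the aggregated spectral features.
sig_in = layers.Input(shape=(N_SPECTRAL,), name="spectral_features")
sig = layers.Dense(64, activation="relu")(sig_in)
sig = layers.Dense(32, activation="relu")(sig)

# Transcript-based branch: 1D convolutions over AraVec word embeddings,
# with max pooling (MP) and batch normalization (BN) as in Figure 8.
txt_in = layers.Input(shape=(SEQ_LEN, EMB_DIM), name="transcript_embeddings")
txt = layers.Conv1D(64, kernel_size=3, activation="relu")(txt_in)
txt = layers.MaxPooling1D(pool_size=2)(txt)
txt = layers.BatchNormalization()(txt)
txt = layers.GlobalMaxPooling1D()(txt)

# Fuse both branches and predict the consultation quality class.
merged = layers.Concatenate()([sig, txt])
output = layers.Dense(1, activation="sigmoid", name="quality")(merged)

model = Model(inputs=[sig_in, txt_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The sigmoid output assumes a binary quality label; the fusion point, branch depths, and whether the embeddings are trainable are design choices of the kind compared in the results tables below.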
2. Related Works
3. Problem Definition
4. Background
4.1. MFCCs and Mel Spectrogram
- Pre-emphasizing the input signal, which boosts its high-frequency components to compensate for their naturally lower energy and balance the spectrum.
- Framing and windowing the signal. The objective is to divide the signal into a sequence of short overlapping frames so that each frame can be treated as stationary, since a stationary segment reflects the true statistical and temporal characteristics of the signal. Windowing is typically performed with a tapered window such as the Hamming window, which reduces the spectral distortion that would otherwise arise at the frame boundaries by smoothing them.
- Applying the Fourier transform to each frame to convert it from the time domain to the frequency domain, representing it in terms of its spectral content.
- Applying the filter banks (“Mel filters”) to the spectrum of each frame to map it onto the Mel scale.
- Computing the logarithm of the power (magnitude) output of each Mel filter.
- Applying the discrete cosine transform (DCT) to the result of the previous step, which yields the cepstral coefficients as represented by Equation (1), where n ∈ {0, 1, …, C−1} indexes the cepstral coefficients and C is the number of MFCCs.

Conventionally, 8 to 13 MFCCs are retained; however, these coefficients capture only the static characteristics of each frame. Additional temporal information is obtained by computing the first and second derivatives of the cepstral coefficients, known as the delta and delta-delta features, which extend the 13 MFCCs to 39 coefficients. A minimal code sketch of this extraction pipeline follows.
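As an illustration of the steps above, the following minimal sketch uses librosa (one of the libraries cited in the references). The file name, sampling rate, and frame parameters are assumed values rather than the paper’s actual preprocessing settings.

```python
import numpy as np
import librosa

# Assumed recording path and analysis parameters.
signal, sr = librosa.load("consultation.wav", sr=16000)

# Pre-emphasis boosts the high-frequency components of the signal.
emphasized = librosa.effects.preemphasis(signal, coef=0.97)

# 13 MFCCs per frame; framing, Hamming windowing, FFT, Mel filter bank,
# logarithm, and DCT are handled internally by librosa.
mfcc = librosa.feature.mfcc(
    y=emphasized, sr=sr, n_mfcc=13,
    n_fft=400, hop_length=160, window="hamming",
)

# Delta and delta-delta features add temporal dynamics.
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)

# Stack into the extended 39-coefficient representation per frame.
features = np.concatenate([mfcc, delta, delta2], axis=0)
print(features.shape)  # (39, number_of_frames)
```

Replacing the librosa.feature.mfcc call with librosa.feature.melspectrogram (followed by a logarithm) yields the Mel spectrogram representation of Figure 1.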
4.2. Deep Neural Networks and Convolutional Neural Networks (ConvNets)
5. Methodology
5.1. Data Description
5.2. Signal-Based Approach
5.2.1. Feature Extraction
5.2.2. Model Structure
5.3. Transcript-Based Approach
5.3.1. Feature Extraction
5.3.2. Model Structure
5.4. Hybrid Approach Combining Spectral Features and Transcript Features
5.5. Experimental Settings
5.6. Evaluation Criteria
6. Results
6.1. Signal-Based Results
6.2. Transcript-Based Results
6.3. Hybrid-Based Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Mosadeghrad, A.M. Factors affecting medical service quality. Iran. J. Public Health 2014, 43, 210–220.
- McConnochie, K.M. Webside manner: A key to high-quality primary care telemedicine for all. Telemed. E-Health 2019, 25, 1007–1011.
- Roy, T.; Marwala, T.; Chakraverty, S. A survey of classification techniques in speech emotion recognition. In Mathematical Methods in Interdisciplinary Sciences; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2020; pp. 33–48.
- Sharma, G.; Umapathy, K.; Krishnan, S. Trends in audio signal feature extraction methods. Appl. Acoust. 2020, 158, 107020.
- Glowacz, A. Fault diagnostics of acoustic signals of loaded synchronous motor using SMOFS-25-EXPANDED and selected classifiers. Teh. Vjesn. 2016, 23, 1365–1372.
- Ranjan, J.; Patra, K.; Szalay, T.; Mia, M.; Gupta, M.K.; Song, Q.; Krolczyk, G.; Chudy, R.; Pashnyov, V.A.; Pimenov, D.Y. Artificial intelligence-based hole quality prediction in micro-drilling using multiple sensors. Sensors 2020, 20, 885.
- Omari, T.; Al-Zubaidy, H. Call center performance evaluation. In Proceedings of the Canadian Conference on Electrical and Computer Engineering, Saskatoon, SK, Canada, 1–4 May 2005; pp. 1805–1808.
- Popovic, I.; Culibrk, D.; Mirkovic, M.; Vukmirovic, S. Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations. In Proceedings of the 1st International Workshop on Multimodal Conversational AI, Seattle, WA, USA, 12–16 October 2020; pp. 31–38.
- de Pinto, M.G.; Polignano, M.; Lops, P.; Semeraro, G. Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients. In Proceedings of the 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Bari, Italy, 27–29 May 2020; pp. 1–5.
- Yang, K.; Xu, H.; Gao, K. CM-BERT: Cross-Modal BERT for Text-Audio Sentiment Analysis. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 521–528.
- Bae, S.M.; Ha, S.H.; Park, S.C. A web-based system for analyzing the voices of call center customers in the service industry. Expert Syst. Appl. 2005, 28, 29–41.
- Takeuchi, H.; Subramaniam, L.V.; Nasukawa, T.; Roy, S. Automatic identification of important segments and expressions for mining of business-oriented conversations at contact centers. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 458–467.
- Garnier-Rizet, M.; Adda, G.; Cailliau, F.; Gauvain, J.L.; Guillemin-Lanne, S.; Lamel, L.; Vanni, S.; Waast-Richard, C. CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content. In Proceedings of the LREC 2008, Marrakech, Morocco, 26 May–1 June 2008.
- Pandharipande, M.A.; Kopparapu, S.K. A novel approach to identify problematic call center conversations. In Proceedings of the 2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE), Bangkok, Thailand, 30 May–1 June 2012; pp. 1–5.
- Pallotta, V.; Delmonte, R.; Vrieling, L.; Walker, D. Interaction Mining: The new Frontier of Call Center Analytics. In Proceedings of the DART@AI*IA, Palermo, Italy, 17 September 2011.
- Kopparapu, S.K. Non-Linguistic Analysis of Call Center Conversations; Springer: Cham, Switzerland, 2015.
- Karakus, B.; Aydin, G. Call center performance evaluation using big data analytics. In Proceedings of the 2016 International Symposium on Networks, Computers and Communications (ISNCC), Yasmine Hammamet, Tunisia, 11–13 May 2016; pp. 1–6.
- Chen, L.; Tao, J.; Ghaffarzadegan, S.; Qian, Y. End-to-end neural network based automated speech scoring. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6234–6238.
- Perera, K.; Priyadarshana, Y.; Gunathunga, K.; Ranathunga, L.; Karunarathne, P.; Thanthriwatta, T. Automatic Evaluation Software for Contact Centre Agents’ Voice Handling Performance. Int. J. Sci. Res. Publ. 2019, 5, 1–8.
- Ahmed, A.; Shaalan, K.; Toral, S.; Hifny, Y. A Multimodal Approach to Improve Performance Evaluation of Call Center Agent. Sensors 2021, 21, 2720.
- Vergin, R.; O’Shaughnessy, D.; Farhat, A. Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 1999, 7, 525–532.
- Tsai, W.C.; Shih, Y.J.; Huang, N.T. Hardware-Accelerated, Short-Term Processing Voice and Nonvoice Sound Recognitions for Electric Equipment Control. Electronics 2019, 8, 924.
- Rao, K.S.; Manjunath, K. Speech Recognition Using Articulatory and Excitation Source Features; Springer: Cham, Switzerland, 2017.
- Bansal, V.; Pahwa, G.; Kannan, N. Cough Classification for COVID-19 based on audio MFCC features using Convolutional Neural Networks. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; pp. 604–608.
- Chabot, P.; Bouserhal, R.E.; Cardinal, P.; Voix, J. Detection and classification of human-produced nonverbal audio events. Appl. Acoust. 2021, 171, 107643.
- Sandi, C.; Riadi, A.O.P.; Khobir, F.; Laksono, A. Frequency Cepstral Coefficient and Learning Vector Quantization Method for Optimization of Human Voice Recognition System. Solid State Technol. 2020, 63, 3415–3423.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26.
- Jia, Y.; Wang, M.; Wang, Y. Network intrusion detection algorithm based on deep neural network. IET Inf. Secur. 2018, 13, 48–53.
- Wu, P.; Guo, H. LuNET: A deep neural network for network intrusion detection. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 617–624.
- Fredes, J.; Novoa, J.; King, S.; Stern, R.M.; Yoma, N.B. Locally normalized filter banks applied to deep neural-network-based robust speech recognition. IEEE Signal Process. Lett. 2017, 24, 377–381.
- Seki, H.; Yamamoto, K.; Nakagawa, S. A deep neural network integrated with filterbank learning for speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5480–5484.
- Jain, N.; Kumar, S.; Kumar, A.; Shamsolmoali, P.; Zareapoor, M. Hybrid deep neural networks for face emotion recognition. Pattern Recognit. Lett. 2018, 115, 101–106.
- Bechtel, M.G.; McEllhiney, E.; Kim, M.; Yun, H. DeepPicar: A low-cost deep neural network-based autonomous car. In Proceedings of the 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Hakodate, Japan, 28–31 August 2018; pp. 11–21.
- Tian, Y.; Pei, K.; Jana, S.; Ray, B. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018; pp. 303–314.
- Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; Beling, P. Deep learning detecting fraud in credit card transactions. In Proceedings of the 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 27 April 2018; pp. 129–134.
- Yuan, S.; Wu, X.; Li, J.; Lu, A. Spectrum-based deep neural networks for fraud detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 2419–2422.
- Kollias, D.; Tagaris, A.; Stafylopatis, A.; Kollias, S.; Tagaris, G. Deep neural architectures for prediction in healthcare. Complex Intell. Syst. 2018, 4, 119–131.
- Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Comput. Sci. 2017, 117, 256–265.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous systems. arXiv 2015, arXiv:1603.04467.
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; Volume 8, pp. 18–25.
- Řehůřek, R.; Sojka, P. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks; ELRA: Valletta, Malta, 2010; pp. 45–50.
| Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | L.R. | Model |
|---|---|---|---|---|---|---|---|---|---|
| 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 | Stacked DNN |
| 0.463 | 0.551 | 0.438 | 0.550 | 0.449 | 0.551 | 0.573 | 0.688 | 5 × 10 | |
| 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 | |
| 0.520 | 0.588 | 0.398 | 0.577 | 0.451 | 0.577 | 0.614 | 0.690 | 5 × 10 | |
| 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 | |
| 0.401 | 0.502 | 0.367 | 0.502 | 0.384 | 0.502 | 0.530 | 0.692 | 5 × 10 | |
| Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | L.R. | Model |
|---|---|---|---|---|---|---|---|---|---|
| 0.398 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 | Stacked DNN |
| 0.455 | 0.557 | 0.586 | 0.560 | 0.512 | 0.551 | 0.555 | 0.695 | 5 × 10 | |
| 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 | |
| 0.485 | 0.589 | 0.625 | 0.592 | 0.546 | 0.582 | 0.586 | 0.691 | 5 × 10 | |
| 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 | |
| 0.448 | 0.589 | 0.813 | 0.575 | 0.578 | 0.519 | 0.526 | 0.694 | 5 × 10 | |
| Embedding Model | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | E.D. | L.R. |
|---|---|---|---|---|---|---|---|---|---|---|
| AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 71.637 | 100 | 1 × 10 |
| AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.494 | | 5 × 10 |
| AraVec-Twitter-CBOW | 0.440 | 0.546 | 0.096 | 0.514 | 0.158 | 0.463 | 0.636 | 4.999 | | 5 × 10 |
| AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 82.690 | 300 | 1 × 10 |
| AraVec-Twitter-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 4.419 | | 5 × 10 |
| AraVec-Twitter-CBOW | 0.356 | 0.501 | 0.544 | 0.501 | 0.431 | 0.484 | 0.489 | 3.297 | | 5 × 10 |
| AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 78.438 | 100 | 1 × 10 |
| AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.406 | | 5 × 10 |
| AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 5.240 | | 5 × 10 |
| AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 84.065 | 300 | 1 × 10 |
| AraVec-Twitter-SG | 0.356 | 0.678 | 1.000 | 0.502 | 0.525 | 0.267 | 0.358 | 4.355 | | 5 × 10 |
| AraVec-Twitter-SG | 0.295 | 0.465 | 0.114 | 0.482 | 0.165 | 0.446 | 0.589 | 3.526 | | 5 × 10 |
| AraVec-WiKi-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 1.420 | 100 | 1 × 10 |
| AraVec-WiKi-SG | 0.000 | 0.321 | 0.000 | 0.495 | 0.000 | 0.390 | 0.639 | 5.856 | | 5 × 10 |
| AraVec-WiKi-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 4.662 | | 5 × 10 |
| AraVec-WiKi-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 79.096 | 100 | 1 × 10 |
| AraVec-WiKi-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.471 | | 5 × 10 |
| AraVec-WiKi-CBOW | 0.378 | 0.518 | 0.395 | 0.519 | 0.386 | 0.518 | 0.555 | 5.008 | | 5 × 10 |
| Vocab. Size | Embedding Model | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Acc. | Loss | E.D. |
|---|---|---|---|---|---|---|---|---|---|---|
| 9000 | AraVec-Wiki-CBOW | 0.355 | 0.511 | 0.991 | 0.500 | 0.523 | 0.271 | 0.358 | 5.834 | 100 |
| 18,000 | | 0.261 | 0.449 | 0.053 | 0.485 | 0.088 | 0.420 | 0.611 | 6.895 | |
| 27,000 | | 0.365 | 0.558 | 0.939 | 0.520 | 0.526 | 0.352 | 0.399 | 5.139 | |
| 9000 | AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.211 | 100 |
| 18,000 | | 0.361 | 0.510 | 0.728 | 0.509 | 0.483 | 0.443 | 0.445 | 6.244 | |
| 27,000 | | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.414 | |
| 9000 | AraVec-Twitter-SG | 0.362 | 0.626 | 0.991 | 0.515 | 0.531 | 0.302 | 0.377 | 4.556 | 300 |
| 18,000 | | 0.276 | 0.456 | 0.070 | 0.484 | 0.112 | 0.429 | 0.604 | 5.237 | |
| 27,000 | | 0.356 | 0.504 | 0.868 | 0.502 | 0.505 | 0.365 | 0.396 | 4.713 | |
| 9000 | AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.036 | 100 |
| 18,000 | | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.096 | |
| 27,000 | | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 5.870 | |
| Vocab Size | E.M. | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Acc. | Loss | Epochs | E.W. | E.D. | L.R. | B.S. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9000 | SG | 0.000 | 0.299 | 0.000 | 0.495 | 0.000 | 0.373 | 0.595 | 2.085 | 30 | Non | 300 | 5 × 10 | 128 |
| | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 11.683 | | | | 9 × 10 | |
| | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 2.299 | | | | 5 × 10 | |
| All | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 2.020 | 30 | Non | 300 | 5 × 10 | 128 |
| | SG | 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 11.685 | | | | 9 × 10 | |
| | SG | 0.394 | 0.322 | 0.977 | 0.491 | 0.562 | 0.286 | 0.393 | 2.162 | | | | 5 × 10 | |
| All | CBOW | 0.400 | 0.501 | 0.250 | 0.501 | 0.308 | 0.488 | 0.551 | 3.315 | 30 | Non | 100 | 5 × 10 | 128 |
| | CBOW | 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 11.616 | | | | 9 × 10 | |
| | CBOW | 0.383 | 0.387 | 0.891 | 0.469 | 0.535 | 0.309 | 0.383 | 2.452 | | | | 5 × 10 | |
| All | CBOW | 0.377 | 0.483 | 0.336 | 0.484 | 0.355 | 0.483 | 0.514 | 1.908 | 30 | Non | 100 | 5 × 10 | 64 |
| | CBOW | 0.408 | 0.510 | 0.586 | 0.511 | 0.481 | 0.495 | 0.495 | 11.623 | | | | 9 × 10 | |
| | CBOW | 0.411 | 0.512 | 0.523 | 0.513 | 0.460 | 0.507 | 0.511 | 1.524 | | | | 5 × 10 | |
| All | CBOW | 0.397 | 0.489 | 0.898 | 0.496 | 0.550 | 0.355 | 0.414 | 1.256 | 30 | Non | 100 | 5 × 10 | 32 |
| | CBOW | 0.404 | 0.513 | 0.820 | 0.509 | 0.541 | 0.42 | 0.445 | 11.636 | | | | 9 × 10 | |
| | CBOW | 0.399 | 0.499 | 0.844 | 0.500 | 0.541 | 0.394 | 0.430 | 1.205 | | | | 5 × 10 | |
| All | CBOW | 0.368 | 0.465 | 0.523 | 0.464 | 0.432 | 0.451 | 0.452 | 2.818 | 30 | Trainable | 100 | 5 × 10 | 128 |
| | CBOW | 0.368 | 0.477 | 0.305 | 0.479 | 0.333 | 0.475 | 0.514 | 0.932 | 50 | | | | |
| | CBOW | 0.333 | 0.449 | 0.297 | 0.452 | 0.314 | 0.450 | 0.483 | 1.447 | 100 | | | | |