- Research article, November 2024
Accurate synthesis of dysarthric speech for ASR data augmentation
Highlights:
- Modified a neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech at varying severity levels (a minimal conditioning sketch follows this entry).
- Providing data augmentation for machine learning tasks such ...
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility resulting from slow, uncoordinated control of the speech production muscles. Automatic Speech Recognition (ASR) systems can help dysarthric talkers communicate more ...
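As a rough, hypothetical illustration of the conditioning idea in the first highlight (not the authors' architecture; the class name, dimensions, and the 0 to 1 severity range are assumptions), a scalar severity coefficient can be broadcast over time and fed to a neural TTS decoder:

```python
# Hypothetical sketch of severity conditioning for a neural TTS decoder.
# Not the paper's model; names, sizes, and the 0-1 severity scale are assumed.
import torch
import torch.nn as nn

class SeverityConditionedDecoder(nn.Module):
    def __init__(self, text_dim=256, hidden_dim=512, n_mels=80):
        super().__init__()
        # Embed the scalar severity level and concatenate it with the text encoding.
        self.severity_proj = nn.Linear(1, 32)
        self.rnn = nn.GRU(text_dim + 32, hidden_dim, batch_first=True)
        self.mel_out = nn.Linear(hidden_dim, n_mels)

    def forward(self, text_encodings, severity):
        # text_encodings: (batch, time, text_dim); severity: (batch,) in [0, 1]
        sev = self.severity_proj(severity.unsqueeze(-1))             # (batch, 32)
        sev = sev.unsqueeze(1).expand(-1, text_encodings.size(1), -1)
        hidden, _ = self.rnn(torch.cat([text_encodings, sev], dim=-1))
        return self.mel_out(hidden)                                  # mel-spectrogram frames

# Decode the same (stand-in) encoder output at two severity settings.
decoder = SeverityConditionedDecoder()
enc = torch.randn(1, 120, 256)
mild, severe = decoder(enc, torch.tensor([0.2])), decoder(enc, torch.tensor([0.8]))
```

Decoding the same encoder output at different severity settings is what makes this style of synthesis useful for augmenting ASR training data across severity levels.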
- Research article, October 2024
The effects of informational and energetic/modulation masking on the efficiency and ease of speech communication across the lifespan
Highlights:
- In more naturalistic everyday settings, communication efficiency gradually improves from childhood to adulthood irrespective of the listening condition (easy vs. challenging).
- Even moderate levels of background speech affect ...
Children and older adults have greater difficulty understanding speech when there are other voices in the background (informational masking, IM) than when the interference is a steady-state noise with a similar spectral profile but is not speech (...
- Research article, May 2024
The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy
Highlights:
- Higher speech intensity and lower speech rates improve automatic speech recognition accuracy.
- Arab ESL teachers and students give more attention to pronunciation errors that do not affect intelligibility.
- Arabic-influenced ESL ...
The present study examines the impact of Arab speakers’ phonological and prosodic features on the accuracy of automatic speech recognition (ASR) of non-native English speech. The authors first investigated the perceptions of 30 Egyptian ESL ...
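ASR accuracy in studies of this kind is conventionally reported as word error rate (WER). The function below is a generic, self-contained WER computation (word-level Levenshtein distance), not code from the paper:

```python
# Generic word error rate (WER): word-level edit distance divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words = 0.33...
```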
- Research article, February 2024
The Role of Auditory and Visual Cues in the Perception of Mandarin Emotional Speech in Male Drug Addicts
Highlights:
- Fills the research gap in the field of speech perception among drug addicts.
- Reveals the presence of a disorder or deficit in multi-modal emotional speech processing in drug addicts.
- Suggests that visual cues, such as facial ...
Evidence from previous neurological studies has revealed that drugs can cause severe damage to the human brain structure, leading to significant cognitive disorders in emotion processing, such as psychotic-like symptoms (e.g., speech illusion: ...
- Research article, October 2023
Acoustic properties of non-native clear speech: Korean speakers of English
Highlights:
- Non-native clear speech is acoustically distinct from casual speech.
- The nature of modifications is the same in native and non-native clear speech.
- The magnitude of modifications is different in native and non-native clear speech. ...
The present study examined the acoustic properties of clear speech produced by non-native speakers of English (L1 Korean), in comparison to native clear speech. L1 Korean speakers of English (N=30) and native speakers of English (N=20) read an ...
- Review article, October 2023
Speech emotion recognition approaches: A systematic review
Abstract: The speech emotion recognition (SER) field has been active since it became a crucial feature in advanced Human–Computer Interaction (HCI), and it is used in a wide range of real-life applications. In recent years, numerous SER systems have been covered by ...
Highlights:
- The speech emotion recognition (SER) field became crucial in advanced Human–Computer Interaction (HCI).
- Numerous SER systems have been proposed by researchers using Machine Learning (ML) and Deep Learning (DL).
- This survey aims to ...
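For orientation, the systems such a review surveys are usually framed as feature extraction plus a classifier. The sketch below is a deliberately minimal, generic SER-style pipeline (mean MFCCs with an SVM) and is not any system from the review; the synthetic tones merely stand in for labelled emotional speech:

```python
# Minimal, generic SER-style pipeline: mean MFCC features + SVM classifier.
# Illustrative only; synthetic tones replace real labelled emotional speech.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(y, sr=16000, n_mfcc=13):
    # Frame-level MFCCs averaged over time -> one fixed-length vector per clip.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
clips = [np.sin(2 * np.pi * f * t) for f in (120, 130, 140, 220, 230, 240)]
labels = [0, 0, 0, 1, 1, 1]  # e.g. 0 = "neutral", 1 = "angry" (placeholder labels)

X = np.stack([mfcc_features(c.astype(np.float32), sr) for c in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))    # predicted label for the first clip
```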
- Research article, February 2023
Shared and task-specific phase coding characteristics of gamma- and theta-bands in speech perception and covert speech
Speech Communication (SPCO), Volume 147, Issue C, Pages 63–73. https://doi.org/10.1016/j.specom.2023.01.007
Abstract: Covert speech is the mental imagery of speaking. The task has gained increasing attention as a way to understand the nature of thought and to produce decoding methods for brain–computer interfaces. Building on previous work, we sought to ...
Highlights:
- Understanding speech-related temporal encoding is useful for brain–computer interface training.
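Phase coding analyses of this kind typically start from band-limited instantaneous phase. The sketch below band-pass filters a signal into theta and gamma ranges and takes the Hilbert-transform phase; it is a generic illustration (the sampling rate, band edges, and the random stand-in signal are assumptions), not the authors' analysis code:

```python
# Generic band-limited phase extraction via band-pass filtering + Hilbert transform.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_phase(signal, fs, low, high, order=4):
    # Zero-phase band-pass filter, then instantaneous phase of the analytic signal.
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, signal)))

fs = 500                                       # assumed EEG-like sampling rate
t = np.arange(0, 2.0, 1 / fs)
channel = np.random.randn(t.size)              # stand-in for one recorded channel
theta_phase = band_phase(channel, fs, 4, 8)    # theta band (4-8 Hz)
gamma_phase = band_phase(channel, fs, 30, 80)  # gamma band (30-80 Hz)
```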
- Research article, February 2023
Acoustic characterization and machine prediction of perceived masculinity and femininity in adults
Speech Communication (SPCO), Volume 147, Issue C, Pages 22–40. https://doi.org/10.1016/j.specom.2023.01.002
Abstract: Previous research has found that the human voice can provide reliable information for gender identification with a high level of accuracy. In social psychology, perceived masculinity and femininity (masculinity and femininity ...
Highlights:
- We modelled femininity/masculinity ratings for 129 female/96 male voices.
- ...
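Purely to illustrate the modelling setup the abstract and highlight point to (regressing listener ratings onto acoustic measures), here is a hypothetical sketch; the feature set, synthetic data, and ridge regression are assumptions, not the paper's method:

```python
# Hypothetical sketch: predict perceived femininity/masculinity ratings from
# acoustic features with cross-validated ridge regression. Data are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_voices = 225                     # e.g. 129 + 96 voices, as in the highlight
# Assumed feature columns: mean F0 (Hz), formant dispersion (Hz), HNR (dB).
X = rng.normal(size=(n_voices, 3))
ratings = X @ np.array([0.6, -0.3, 0.1]) + rng.normal(scale=0.5, size=n_voices)

scores = cross_val_score(Ridge(alpha=1.0), X, ratings, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```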
- Research article, January 2023
Vocal characteristics of accuracy in eyewitness testimony
Speech Communication (SPCO), Volume 146, Issue C, Pages 82–92. https://doi.org/10.1016/j.specom.2022.12.001
Highlights:
- We demonstrate that the accuracy of statements in an eyewitness testimony can be communicated with auditory cues.
In two studies, we examined if correct and incorrect testimony statements were produced with vocally distinct characteristics. Participants watched a staged crime film and were interviewed as eyewitnesses. Witness responses were ...
- Research article, January 2023
Effects of hearing loss and audio-visual cues on children's speech processing speed
Speech Communication (SPCO), Volume 146, Issue C, Pages 11–21. https://doi.org/10.1016/j.specom.2022.11.003
Highlights:
- Children with hearing loss process speech faster with visual cues.
- Audio-visual benefits are similar to those of children with normal hearing.
- Overall processing speed remains slower than that of children with normal hearing.
- ...
Children with hearing loss (HL) can generally achieve functional speech perception with the assistance of hearing aids and/or cochlear implants. However, their speech processing may be less efficient than that of their peers with normal hearing (...
- Research article, January 2023
The effect of fluency strategy training on interpreter trainees’ speech fluency: Does content familiarity matter?
Speech Communication (SPCO), Volume 146, Issue C, Pages 1–10. https://doi.org/10.1016/j.specom.2022.11.002
Highlights:
- Fluency training significantly enhances the interpreter trainees’ speech fluency.
The present study examines the effect of fluency strategy training on the speech fluency of interpreter trainees using a pretest-posttest-delayed posttest design. Moreover, it investigates whether content familiarity influences the ...
- Research article, October 2022
Prosodic development from 4 to 10 years: Data from the Italian adaptation of the PEPS-C
Speech Communication (SPCO), Volume 144, Issue C, Pages 10–19. https://doi.org/10.1016/j.specom.2022.08.007
Highlights:
- The development of prosodic functions covers a long period, extending until adolescence.
- ...
The development of prosody covers a long age period, with some functions not well mastered until adolescence. Moreover, languages show different prosodic developmental trajectories, which can be related to their distinctive ...
- Research article, October 2022
The role of visual cues indicating onset times of target speech syllables in release from informational or energetic masking
Speech Communication (SPCO), Volume 144, Issue C, Pages 20–25. https://doi.org/10.1016/j.specom.2022.08.003
Highlights:
- Listeners benefit from visually guided cues for the timing of syllables in noise.
This study examined the effect of visual cues that provide the timing information of syllables in nonsense target sentences on the recognition of target speech against either a speech-spectrum noise masker or a two-talker masker. When ...
- Research article, October 2022
Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone
Speech Communication (SPCO), Volume 144, Issue C, Pages 42–56. https://doi.org/10.1016/j.specom.2022.08.002
Highlights:
- Grounded in pronunciation principles, this paper presents a smartphone-based pronunciation training system for practicing monophthongs with respect to vowel articulation.
Learning foreign-language pronunciation is among the most challenging tasks for non-native speakers. Improving pronunciation based on feedback from pronunciation error scores is also not easy for learners. Our goal is to develop an ...
- Research article, October 2022
Arm motion symmetry in conversation
Speech Communication (SPCO), Volume 144, Issue C, Pages 75–88. https://doi.org/10.1016/j.specom.2022.08.001
Abstract: Data-driven synthesis of human motion during conversational speech is an active research area with applications that include character animation, computer gaming and conversational agents. Natural-looking motion is key to both ...
Highlights:
- Reviews the motion symmetry of multiple speakers during dyadic conversation.
- ...
- Research article, September 2022
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition
Speech Communication (SPCO), Volume 143, Issue C, Pages 21–32. https://doi.org/10.1016/j.specom.2022.07.004
Abstract: Speech emotion recognition (SER) is an essential part of human–computer interaction, and in recent years SER has made wide use of multimodal information. This paper focuses on exploiting the acoustic and textual ...
Highlights:
- A bimodal network based on an Audio–Text-Interactional-Attention (ATIA) structure and ArcFace loss is proposed.
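ArcFace, named in the highlight above, is an additive angular margin softmax loss originally proposed for face recognition. The following is a minimal, generic PyTorch rendering of that loss applied to emotion classes; the scale, margin, embedding size, and class count are illustrative assumptions rather than the paper's configuration:

```python
# Minimal ArcFace (additive angular margin) loss in PyTorch. Hyperparameters
# and the 4-class emotion setup are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    def __init__(self, embed_dim, n_classes, scale=30.0, margin=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, embed_dim))
        self.scale, self.margin = scale, margin

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to each sample's target-class angle.
        one_hot = F.one_hot(labels, num_classes=self.weight.size(0)).float()
        logits = self.scale * torch.cos(theta + self.margin * one_hot)
        return F.cross_entropy(logits, labels)

# Dummy usage: 8 bimodal embeddings (128-dim), 4 emotion classes.
loss_fn = ArcFaceLoss(embed_dim=128, n_classes=4)
print(loss_fn(torch.randn(8, 128), torch.randint(0, 4, (8,))))
```

Compared with plain softmax cross-entropy, the angular margin pushes same-class embeddings closer together on the hypersphere, which is why the loss is popular for discriminative embedding learning.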
- Research article, September 2022
A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations
Speech Communication (SPCO), Volume 143, Issue C, Pages 33–45. https://doi.org/10.1016/j.specom.2022.07.003
Highlights:
- A new method, designed specifically for this industry, for recognising the intent of a customer contacting a contact centre (CC) hotline.
Contact centre systems are increasingly using intelligent voicebots and chatbots. These solutions are constantly evolving and improving. One of the main tasks of a virtual assistant is to recognise customers’ ...
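Intent recognition of the kind described above is often prototyped as supervised text classification over transcribed utterances. The baseline below (TF-IDF features with logistic regression over invented example utterances) is only a generic sketch and does not reproduce the paper's method for implicit intent:

```python
# Generic intent-classification baseline: TF-IDF + logistic regression.
# Utterances and intent labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "I want to check my account balance",
    "how much money do I have",
    "I'd like to cancel my subscription",
    "please stop my service",
]
intents = ["balance", "balance", "cancel", "cancel"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(utterances, intents)
print(clf.predict(["can you tell me my balance"]))  # expected: ['balance']
```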
- Research article, June 2022
Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers
Speech Communication (SPCO), Volume 141, Issue C, Pages 28–39. https://doi.org/10.1016/j.specom.2022.05.001
Highlights:
- Articulatory space and vowel distinctiveness are independent vowel properties.
- ...
Task-independent (e.g., Ballard et al., 2003) and task-dependent models (e.g., Ziegler, 2003) differ in their predictions regarding the learning transfer from non-speech activities to speech. We argue that singing is ...
- Research article, June 2022
Perceptual effects of interpolated Austrian and German standard varieties
Speech Communication (SPCO), Volume 141, Issue C, Pages 107–120. https://doi.org/10.1016/j.specom.2022.04.003
Highlights:
- Pluricentric research on German by means of a listener judgment experiment.
- ...
This article focuses on the perception of standard varieties produced by Austrian and German TV newscasters from the perspective of listeners from both countries, Germany and Austria. Thus, the paper's sociolinguistic scope is located ...
- Research article, June 2022
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones
- Denis Burnham,
- Eric Vatikiotis-Bateson,
- Adriano Vilela Barbosa,
- João Vítor Menezes,
- Hani C. Yehia,
- Rua Haszard Morris,
- Guillaume Vignali,
- Jessica Reynolds
Speech Communication (SPCO), Volume 141, Issue C, Pages 40–55. https://doi.org/10.1016/j.specom.2022.03.011
Highlights:
- Visual information for lexical tone is more resilient in running speech than auditory information for tone.
Previous studies show that lexical tones can be discriminated visually, but the locus of this information is unknown. Here we investigate the role of visual face and head information in the production and perception of the six ...