Wu K, E S, Yang N, Zhang A, Yan X, Mu C and Song Y. (2025). A novel approach to enhancing biomedical signal recognition via hybrid high-order information bottleneck driven spiking neural networks. Neural Networks. 183:C. Online publication date: 1-Mar-2025.

https://doi.org/10.1016/j.neunet.2024.106976

Li Y, Wang Y, Hoi L, Yang D and Im S. (2025). A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models. EURASIP Journal on Audio, Speech, and Music Processing. 2025:1. Online publication date: 21-Jan-2025.

https://doi.org/10.1186/s13636-024-00388-w

Huh J and Zisserman A. Character-Aware Audio-Visual Subtitling in Context. Computer Vision – ACCV 2024. (365-383).

https://doi.org/10.1007/978-981-96-0908-6_21

Hemmatian M, Shahzadi A and Mozaffari S. (2024). Uncertainty-based knowledge distillation for Bayesian deep neural network compression. International Journal of Approximate Reasoning. 175:C. Online publication date: 1-Dec-2024.

https://doi.org/10.1016/j.ijar.2024.109301

Pinto D, Arnau J, Riera M, Cruz J and González A. (2024). Exploiting beam search confidence for energy-efficient speech recognition. The Journal of Supercomputing. 80:17. (24908-24937). Online publication date: 1-Nov-2024.

https://doi.org/10.1007/s11227-024-06351-y

Parlak C and Altun Y. (2024). A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing with VGG16. Circuits, Systems, and Signal Processing. 43:11. (7309-7338). Online publication date: 1-Nov-2024.

https://doi.org/10.1007/s00034-024-02794-z

Hinduja S, Darzi A, Ertugrul I, Provenza N, Gadot R, Storch E, Sheth S, Goodman W and Cohn J. (2024). Multimodal Prediction of Obsessive-Compulsive Disorder and Comorbid Depression Severity and Energy Delivered by Deep Brain Electrodes. IEEE Transactions on Affective Computing. 15:4. (2025-2041). Online publication date: 1-Oct-2024.

https://doi.org/10.1109/TAFFC.2024.3395117

Kimani M, Nderu L and Ndirangu D. Integrating Attention Mechanisms with Bidirectional Long Short-term Memory Recurrent Neural Networks for Improved Speech Recognition. Proceedings of the 2024 3rd International Conference on Algorithms, Data Mining, and Information Technology. (224-227).

https://doi.org/10.1145/3701100.3701147

Shanthamallappa M. (2024). Robust Speech Enhancement Using Dabauchies Wavelet Based Adaptive Wavelet Thresholding for the Development of Robust Automatic Speech Recognition: A Comprehensive Review. Wireless Personal Communications: An International Journal. 137:4. (2085-2119). Online publication date: 1-Aug-2024.

https://doi.org/10.1007/s11277-024-11448-x

Zhou J, Gao S, Yu Z, Dong L and Wang W. DialectMoE: An End-to-End Multi-dialect Speech Recognition Model with Mixture-of-Experts. Chinese Computational Linguistics. (243-258).

https://doi.org/10.1007/978-981-97-8367-0_15

Chao D, Chen Y, Koudas N and Yu X. (2024). Optimizing Video Queries with Declarative Clues. Proceedings of the VLDB Endowment. 17:11. (3256-3268). Online publication date: 1-Jul-2024.

https://doi.org/10.14778/3681954.3681998

Nguyen Doan T, Huynh S, Nguyen A, Le A, Phan Thi Thuy A, Huynh D and Nguyen B. Vietnamese Automatic Speech Recognition for Financial Conversation Data. Intelligent Information and Database Systems. (372-383).

https://doi.org/10.1007/978-981-97-4985-0_29

Devi T and Das P. (2024). Disambiguation of Isolated Manipuri Tonal Contrast Word Pairs Using Acoustic Features. ACM Transactions on Asian and Low-Resource Language Information Processing. 23:3. (1-18). Online publication date: 31-Mar-2024.

https://doi.org/10.1145/3643830

de Goma J, Alberto J, Antonio K and San Pedro P. Speech Recognition of Tagalog Talisay Batangueño Accent in the Philippines using Wav2Vec2.0. Proceedings of the 2024 15th International Conference on E-Education, E-Business, E-Management and E-Learning. (416-421).

https://doi.org/10.1145/3670013.3670031

Sayed S, Ahmed Abdel Azeem Abul Seoud R, Abdel Naby H and Sameer M. (2024). Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers. Journal of Electrical and Computer Engineering. 2024. Online publication date: 1-Jan-2024.

https://doi.org/10.1155/2024/4976944

Tan Z, Yang W and Wang Z. Reimagining 3D Visual Grounding: Instance Segmentation and Transformers for Fragmented Point Cloud Scenarios. Proceedings of the 5th ACM International Conference on Multimedia in Asia. (1-7).

https://doi.org/10.1145/3595916.3626405

Xiao S, Ji X, Yan C, Zheng Z and Xu W. MicPro: Microphone-based Voice Privacy Protection. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. (1302-1316).

https://doi.org/10.1145/3576915.3616616

Al Dujaili M and Ebrahimi-Moghadam A. (2023). Automatic speech emotion recognition based on hybrid features with ANN, LDA and K_NN classifiers. Multimedia Tools and Applications. 82:27. (42783-42801). Online publication date: 1-Nov-2023.

https://doi.org/10.1007/s11042-023-15413-x

Zheng Z, Li X, Yan C, Ji X and Xu W. The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems. Proceedings of the 31st ACM International Conference on Multimedia. (7849-7858).

https://doi.org/10.1145/3581783.3613843

Yu Z, Ling T, Gu F, Sheng H and Liu Y. A Pre-trained Model for Chinese Medical Record Punctuation Restoration. Pattern Recognition and Computer Vision. (101-112).

https://doi.org/10.1007/978-981-99-8540-1_9

Latif S, Rana R, Khalifa S, Jurdak R and Schuller B. (2023). Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition. IEEE Transactions on Affective Computing. 14:4. (3164-3176). Online publication date: 1-Oct-2023.

https://doi.org/10.1109/TAFFC.2022.3221749

Chi P, Feng Y, Zhou M, Xiong X, Wang Y and Qiang B. (2023). TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking. Applied Intelligence. 53:19. (22898-22916). Online publication date: 1-Oct-2023.

https://doi.org/10.1007/s10489-023-04669-3

Kath H, Lüers B, Gouvêa T and Sonntag D. Lost in Dialogue: A Review and Categorisation of Current Dialogue System Approaches and Technical Solutions. KI 2023: Advances in Artificial Intelligence. (98-113).

https://doi.org/10.1007/978-3-031-42608-7_9

Liu M, Zhao C, Peng X, Yu S, Wang H and Sha C. (2023). Task-Oriented ML/DL Library Recommendation Based on a Knowledge Graph. IEEE Transactions on Software Engineering. 49:8. (4081-4096). Online publication date: 1-Aug-2023.

https://doi.org/10.1109/TSE.2023.3285280

Yang Y, Gao F, Tao X, Liu G and Pan C. (2023). Environment Semantics Aided Wireless Communications: A Case Study of mmWave Beam Prediction and Blockage Prediction. IEEE Journal on Selected Areas in Communications. 41:7. (2025-2040). Online publication date: 1-Jul-2023.

https://doi.org/10.1109/JSAC.2023.3280966

Dua M, Akanksha and Dua S. (2023). Noise robust automatic speech recognition: review and analysis. International Journal of Speech Technology. 26:2. (475-519). Online publication date: 1-Jul-2023.

https://doi.org/10.1007/s10772-023-10033-0

Mishra A, Singh U and Singh K. (2023). A lightweight relation network for few-shots classification of hyperspectral images. Neural Computing and Applications. 35:15. (11417-11430). Online publication date: 1-May-2023.

https://doi.org/10.1007/s00521-023-08306-5

Wang Y, Li Z, Chelladurai P, Dannels W, Oh T and Peiris R. Haptic-Captioning: Using Audio-Haptic Interfaces to Enhance Speaker Indication in Real-Time Captions for Deaf and Hard-of-Hearing Viewers. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. (1-14).

https://doi.org/10.1145/3544548.3581076

Teshite K, Mamo G, Calpotura K and Shad S. (2023). Afan Oromo Speech-Based Computer Command and Control. Advances in Human-Computer Interaction. 2023. Online publication date: 1-Jan-2023.

https://doi.org/10.1155/2023/9959015

Zhang Y. Research on Phoneme Recognition using Attention-based Methods. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition. (405-411).

https://doi.org/10.1145/3581807.3581866

Ratnarajah A, Tang Z, Aralikatti R and Manocha D. MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes. Proceedings of the 30th ACM International Conference on Multimedia. (924-933).

https://doi.org/10.1145/3503161.3548253

Shirur P, Raghuvanshi S and Bali V. (2022). Developing Accessible Websites for Differently Abled People Using Open Source Tools. International Journal of Software Innovation. 10:1. (1-21). Online publication date: 30-Sep-2022.

https://doi.org/10.4018/IJSI.303576

Abbasi W. Privacy-Preserving Speaker Verification and Speech Recognition. Emerging Technologies for Authorization and Authentication. (102-119).

https://doi.org/10.1007/978-3-031-25467-3_7

Cardaioli M, Conti M and Ravindranath A. For Your Voice Only: Exploiting Side Channels in Voice Messaging for Environment Detection. Computer Security – ESORICS 2022. (595-613).

https://doi.org/10.1007/978-3-031-17143-7_29

Lin C, Wang Y, Huang P, Shi Y and Chang Y. (2022). Spatial-temporal attention-based convolutional network with text and numerical information for stock price prediction. Neural Computing and Applications. 34:17. (14387-14395). Online publication date: 1-Sep-2022.

https://doi.org/10.1007/s00521-022-07234-0

Tang Z, Aralikatti R, Ratnarajah A and Manocha D. GWA: A Large High-Quality Acoustic Dataset for Audio Processing. ACM SIGGRAPH 2022 Conference Proceedings. (1-9).

https://doi.org/10.1145/3528233.3530731

Dhanjal A and Singh W. (2021). An automatic machine translation system for multi-lingual speech to Indian sign language. Multimedia Tools and Applications. 81:3. (4283-4321). Online publication date: 1-Jan-2022.

https://doi.org/10.1007/s11042-021-11706-1

Lü Y, Lin H, Wu P and Chen Y. (2021). Feature compensation based on independent noise estimation for robust speech recognition. EURASIP Journal on Audio, Speech, and Music Processing. 2021:1. Online publication date: 20-Dec-2021.

https://doi.org/10.1186/s13636-021-00213-8