14th ISCSLP 2024: Beijing, China
- Yanmin Qian, Qin Jin, Zhijian Ou, Zhenhua Ling, Zhiyong Wu, Ya Li, Lei Xie, Jianhua Tao: 14th IEEE International Symposium on Chinese Spoken Language Processing, ISCSLP 2024, Beijing, China, November 7-10, 2024. IEEE 2024, ISBN 979-8-3315-1682-6
- Jingchen Li, Xin Liu, Xueliang Zhang: OPC-KWS: Optimizing Keyword Spotting with Path Retrieval Decoding and Contrastive Learning. 1-5
- Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li: Personalized Multimodal Emotion Recognition: Integrating Temporal Dynamics and Individual Traits for Enhanced Performance. 408-412
- Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu: The Codec Language Model-Based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024. 496-500
- Fang Hu: Gradiency in Obstruent Devoicing in the Varieties of Wu Chinese. 111-115
- Qibing Bai, Shuai Wang, Zhijun Liu, Mingyang Zhang, Wei Rao, Yannan Wang, Haizhou Li: Diffusion-Based Method with TTS Guidance for Foreign Accent Conversion. 284-288
- Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo: COMOSVC: Consistency Model-Based Singing Voice Conversion. 184-188
- Xiyao Lu, Yukai Wan, Ruishan Li, Jinsong Zhang: Statistical Analysis of F0 Characteristics of "Grade A Level 1" Mandarin Tones: On the Application of the T-Value Method. 1-5
- Xinrui Yan, Jiangyan Yi, Jianhua Tao, Yujie Chen, Hao Gu, Guanjun Li, Junzuo Zhou, Yong Ren, Tao Xu: Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio. 476-480
- Guolun Sun, Li Wang: Constant Q Transform for Audio-Visual Dysarthria Severity Assessment. 146-150
- Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li: A Noval Feature via Color Quantisation for Fake Audio Detection. 1-5
- Aijun Li, Zhiwei Wang, Sichen Zhang, Jun Gao, Xin Zhou: The Development of Speech Rhythm in Mandarin-Speaking Children. 274-278
- Juan Liu, Yudong Yang, Xiaokang Liu, Xiaoyi Zuo, Junfeng Li, Lan Wang, Nan Yan: The ISCSLP 2024 Multimodal Dysarthria Severity Assessment (MDSA) Challenge: Dataset, Tracts, Baseline and Results. 136-140
- Sabrina Chow, Lilian Guo, Jonathan Chow, Chelsea Chia, Sarah Li, Dong-Yan Huang: Semantic Search Using LLM-Aided Topic Generation on Knowledge Graphs for Paper Discovery. 353-357
- Zhongxuan Mao, Chenyu Li, Shanpeng Li: Speech Rate Influence on Rhythm Alterations in Mandarin. 521-525
- Chengyuan Qin, Wenmeng Xiong, Maoshen Jia, Haoyang Zhou, Jing Zhang, Xianhong Chen, Qi Wang: Robust Coherent sources Localization Based on Hankel Matrix Reconstruction. 706-710
- Yubo Jiang, Zhihua Huang: Fast Sampling Based on Policy Gradient for Diffusion-Based Speech Enhancement. 576-580
- Shuang Zhou, Yinghao Li: Categorical Perception of Tone 2 and Tone 3 of Standard Chinese by Bilingual Korean Ethnic Speakers in China. 551-555
- Zhuojun Wu, Dong Liu, Ming Li: Lightweight Language Model for Speech Synthesis: Attempts and Analysis. 501-505
- Zhihan Yang, Chunfeng Wang, Zhiyong Wu, Jia Jia: Inferring Agent Speaking Styles for Auditory-Visual User-Agent Conversation. 421-425
- Jinghua Liang, Bo Wang, Xihong Wu, Jing Chen: Encoding and Decoding of Chinese Phonemes Based on MEG Signals. 224-228
- Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li: FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. 299-303
- Haoxiang Hou, Xun Gong, Yanmin Qian: ConMamba: A Convolution-Augmented Mamba Encoder Model for Efficient End-to-End ASR Systems. 711-715
- Tong Lee Chung, Jianxin Pang, Jun Cheng: Empowering Robots with Multimodal Language Models for Task Planning with Interaction. 358-362
- Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li: ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024. 626-630
- Siyin Wang, Chao Zhang: Speaker Diarization for Unlimited Number of Speakers Using Dynamic Linear. 368-372
- Yiwei Liang, Ming Li: Vivid Background Audio Generation Based on Large Language Models and AudioLDM. 621-625
- Yuhang Yang, Yizhou Peng, Eng Siong Chng, Xionghu Zhong: Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs. 646-650
- Wenqian Bao, Yuchen Yan, Jinsong Zhang: Enhancing Mispronunciation Detection with WavLM and Mixture-of-Experts Network. 189-193
- Yifeng Sun, Yanlu Xie, Jinsong Zhang, Dengfeng Ke: Arti-Invar: A Pre-trained Model for Enhancing Acoustic-to-Articulatory Inversion Performance. 154-158
- Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling: Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection. 486-490
- Xin Zhou, Wangyou Zhang, Chenda Li, Yanmin Qian: Insights from Hyperparameter Scaling of Online Speech Separation. 561-565
- Qixin Li, Gaowu Wang: Focus and Gender Affect Sentence Type Perception: Observations on Mandarin Sentence-Final Particle Ba. 511-515
- Xiaoke Qi, Hao Gu, Jiangyan Yi, Jianhua Tao, Yong Ren, Jiayi He, Siding Zeng: MADD: A Multi-Lingual Multi-Speaker Audio Deepfake Detection Dataset. 466-470
- Xiangzhu Kong, Tianqi Ning, Hao Huang, Zhijian Ou: Cuside-Array: A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations. 721-725
- Bin Zhao, Gaoyan Zhang, Jianwu Dang, Aijun Li: Bi-Directional Oscillatory Interaction in the Neural Networks Engaged in Sentence Oral Reading. 56-60
- Hongwu Ding, Yiquan Zhou, Wenyu Wang, Jiacheng Xu, Jiaqi Mei: Hola-TTS: A Cross-Lingual Zero-Shot Text-to-Speech System for Chinese, English, Japanese, and Korean. 601-605
- Wei Dai, Menglong Li, Yingqi He, Yongqiang Zhu: Fine-Tuning Pre-Trained Audio Models for Dysarthria Severity Classification: A Second Place Solution in the Multimodal Dysarthria Severity Classification Challenge. 151-153
- Pincheng Lu, Liang Xu, Jing Wang: A Differential Quantization Based End-to-End Neural Speech Codec. 71-75
- Yicong Jiang, Youjun Chen, Tianzi Wang, Zengrui Jin, Xurong Xie, Hui Chen, Xunying Liu, Feng Tian: Investigation of Cross Modality Feature Fusion for Audio-Visual Dysarthric Speech Assessment. 141-145
- Yuancheng Wang, Haoran Zheng, Qi Sun, Yong Ma, Shihu Zhu, Le Zhang, Wei-Qiang Zhang: Cross-Lingual Alzheimer's Disease Detection Based on Scale Criteria. 491-495
- Jingran Xie, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng: ERVQ: Leverage Residual Vector Quantization for Speech Emotion Recognition. 456-460
- Tao Zhuang, Jiaxin Zhong, Jing Lu: The Feasibility of Sound Zone Control Using an Array of Parametric Array Loudspeakers. 66-70
- Juan Liu, Xiaokang Liu, Yudong Yang, Rukiye Ruzi, Xiaoyi Zuo, Changqing Xu, Chaojinzi Li, Xinyu Li, Rongfeng Su, An-Ming Hu, Yu-Mei Zhang, Shaofeng Zhao, Xiaoxia Du, Lan Wang, Nan Yan: The Open-Access Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database for Intelligent Assessment. 131-135
- Shuang Liang, Yu Gu: Multi-Modal Dysarthria Severity Assessment Using Dual-Branch Feature Decoupling Network and Mixed Expert Framework. 126-130
- Wenyi Yu, Chao Zhang: An Optimizer for Conformer Based on Conjugate Gradient Method. 1-5
- Peng Zhao, Ruicong Wang, Xueyi Zhang, Mingrui Lao, Siqi Cai: Binary-Temporal Convolutional Neural Network for Multi-Class Auditory Spatial Attention Detection. 1-5
- Huijun Lian, Keqi Chen, Zekai Sun, Yingming Gao, Ya Li: G2DiaR: Enhancing Commonsense Reasoning of LLMs with Graph-to-Dialogue & Reasoning. 214-218
- Yi Han, Hang Chen, Jun Du, Chang-Qing Kong, Shifu Xiong, Jia Pan: Layer-Adaptive Low-Rank Adaptation of Large ASR Model for Low-Resource Multilingual Scenarios. 696-700
- Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun: The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings. 506-510
- Junan Li, Yunxiang Li, Yuren Wang, Xixin Wu, Helen Meng: Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease. 471-475
- Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li: Exploring the Role of Audio in Multimodal Misinformation Detection. 204-208
- Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhenhua Cheng, Zheng Lian, Bin Liu: IERP 2024: Induced Emotion Recognition with Personality Characteristics Challenge 2024. 413-416
- Grace Wenling Cao, Vincent Hughes, Bruce Xiao Wang, Peggy Mok: Cross-Language Forensic Voice Comparison of Hong Kong Trilingual Speakers using Filled Pauses and an Automatic Speaker Recognition System. 279-283
- Tong Lee Chung, Jun Cheng, Jianxin Pang: Mitigating Hallucination in Visual Language Model Segmentation with Negative Sampling. 344-348
- Kang Zhu, Xuefei Liu, Heng Xie, Cong Cai, Ruibo Fu, Guanjun Li, Zhengqi Wen, Jianhua Tao, Cunhang Fan, Zhao Lv, Le Wang, Hao Lin: Transferring Personality Knowledge to Multimodal Sentiment Analysis. 431-435
- Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin: An Exploration on Singing MOS Prediction. 651-655
- Jizhou Cui, Xuefei Liu, Yongwei Li, Xiaoying Xu, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Yukun Liu, Guanjun Li, Le Wang, Hao Lin: Unlocking the Power of Emotions: Enhancing Personality Trait Recognition Through Utilization of Emotional Cues. 566-570
- Sinan Sun, Longxiang Zhang, Bo Wang, Xihong Wu, Jing Chen: Representation of Articulatory Features in EEG During Speech Production Tasks. 219-223
- Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou: An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought. 436-440
- Dongrui Han, Mingyu Cui, Jiawen Kang, Xixin Wu, Xunying Liu, Helen Meng: Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models. 631-635
- Peng Zhao, Ruicong Wang, Zijie Lin, Zexu Pan, Haizhou Li, Xueyi Zhang: Ensemble Deep Learning Models for EEG-Based Auditory Attention Decoding. 339-343
- Yuan Jia, Xintong Zuo: Acoustic Features of Standard Chinese Consonants by Uyghur Primary School Teachers. 1-5
- Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib: Unification of Balti and Trans-Border Sister Dialects in the Essence of LLMs and AI Technology. 244-248
- Xinwen Yue, Yupei Zhang, Jianqian Zhang, Zhiyu Li, Jing Wang, Shenghui Zhao: Non-Intrusive Audio Quality Assessment Based on Deep Neural Network for Subjective MOS Prediction. 76-80
- Jiawei Ru, Maoshen Jia, Yuhao Zhao, Liang Tao: A Dual-path Conformer-Based Network for Neural Speech Coding. 661-665
- Di Zhou, Daisuke Mizuguchi, Takeshi Yamamoto, Yasuhiro Omiya: A Study on Depression Detection Through Explainable Features of Speech. 36-40
- Dake Guo, Jixun Yao, Xinfa Zhu, Kangxiang Xia, Zhao Guo, Ziyu Zhang, Yao Wang, Jie Liu, Lei Xie: The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge. 616-620
- Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang: Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective. 426-430
- Yuting Zhang, Xiaoying Xu: The Effect of Focus Position on Downstep in Chinese Non-Interrogative Sentences. 1-5
- Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li: EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech. 294-298
- Yu Chen, Yu Bai, Ju Zhang: A Study on the Effectiveness of Mandarin Seven-Sound Test Across Multiple Speakers. 106-110
- Shuwen Chen, Jun Gao, Zixuan Jia: Production of Mandarin Chinese R-Suffix by Mandarin-Speaking Children: A Preliminary Study. 101-105
- Hengzhi Zhou, Mingyue Shi, Qinglin Meng: Evaluating Speech Intelligibility for Cochlear Implants Using Automatic Speech Recognition. 1-5
- Yumei Zhang, Maoshen Jia, Xuan Cao, Zichen Zhao: Speech Emotion Recognition Based on Shallow Structure of Wav2vec 2.0 and Attention Mechanism. 398-402
- Tingxiao Zhou, Leying Zhang, Yanmin Qian: Knowledge Distillation from Discriminative Model to Generative Model with Parallel Architecture for Speech Enhancement. 179-183
- Jin Li, Lirong Dai: Optimizing Deep Speaker Embeddings with a Dynamic Cross Triplet Framework. 378-382
- Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie: Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets. 26-30
- Qilong Yuan, Di Zhu, Enze Shi, Kui Zhao: The NWPU-BBIC System for the ISCSLP 2024 Chinese Auditory Attention Decoding Challenge. 329-333
- Rui Niu, Changhe Song, Zhiyong Wu: NLPP: A Natural Language Prosodic Prominence Dataset Assisted by ChatGPT. 441-445
- Yubang Zhang, Jie Zhang, Zhenhua Ling: The NERCSLIP-USTC System for Track2 of the First Chinese Auditory Attention Decoding Challenge. 319-323
- Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie: E-Chat: Emotion-Sensitive Spoken Dialogue System with Large Language Models. 586-590
- Zhiqiang Duan, Jian Zhou, Cunhang Fan, Liang Tao, Zhao Lv: CATAD: Conformer-Based Adversarial Training with Adaptive Diffusion for Bone-Conducted Speech Enhancement. 159-163
- Yi Zhang, Lishan Li, Xiaoying Xu: Individual Differences in Tone Perception and Production in Emerging Dialect: A Case Study of Elementary School Children in Changsha. 269-273
- Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou: CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer Based Streaming ASR. 11-15
- Cheng Chi, Xiaoyu Li, Rui Zhang, Xiaodong Li, Chengshi Zheng: The Impact of Dynamic Cue and Audio Stimulus Type on Subjective Localization in VR Headsets. 86-90
- Yubo Zhou, Weizhen Bian, Kaitai Zhang, Xiaohan Gu: Advancing Music Therapy: Integrating Eastern Five-Element Music Theory and Western Techniques with AI in the Novel Five-Element Harmony System. 234-238
- Rui Feng, Yin-Long Liu, Zhen-Hua Ling, Jia-Hong Yuan: Wav2f0: Exploring the Potential of Wav2vec 2.0 for Speech Fundamental Frequency Extraction. 169-173
- Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin: A Systematic Exploration of Joint-Training for Singing Voice Synthesis. 289-293
- Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling: SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features. 199-203
- Nan Li, Yadong Niu, Liushuai Yuan, Xihong Wu, Jing Chen: A Spectral Change Enhancement Method Based on Self-Supervised Learning Framework. 571-575
- Chenyu Li, Zhongxuan Mao, Shanpeng Li: Analysis of Normal and Slow Speech Rate on the F0 Contour of Tones in Mandarin Broadcasting Speech. 526-530
- Ruishan Li, Yanlu Xie: Acoustic Features at Intonational Phrase Boundaries: Comparative Study of Native Speakers and L2 Learners of Chinese Mandarin. 671-675
- Honghong Wang, Xupeng Jia, Jing Deng, Rong Zheng: Speech Emotion Recognition using Fine-Tuned DWFormer: A Study on Track 1 of the IERP Challenge 2024. 403-407
- Shuanghong Liu, Zhida Song, Zhihua Fang, Liang He: LE-CAM++: A Lighter and More Efficient CAM++ for Speaker Verification. 393-397
- Rui Feng, Yu-Ang Chen, Yin-Long Liu, Jia-Hong Yuan, Zhen-Hua Ling: Wav2Nas: An Exploratory Approach to Nasalance Estimation in Speech. 1-5
- Linfeng Feng, Xiao-Lei Zhang, Xuelong Li: Quantization-Error-Free Soft Label for 2D Sound Source Localization. 194-198
- Yuejiao Wang, Xianmin Gong, Xixin Wu, Patrick C. M. Wong, Hoi-lam Helene Fung, Man-Wai Mak, Helen Meng: Naturalistic Language-Related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder. 31-35
- Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai: LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation. 309-313
- Kaixin Yang, Gaofeng Cheng, Ta Li, Qingwei Zhao, Yonghong Yan: Query-by-Example Speech Search using Mamba and Random Offset Mixed Padding. 726-730
- Shixin Jiang, Ming Liu, Bing Qin: Fusion Pruning for Large Language Models. 349-352
- Tianyou Cheng, Maokui He, Gaobin Yang, Shutong Niu, Yanqiang Lei, Limei Peng, Jun Du: Online Neural Speaker Diarization with Spectral Clustering for Meeting Scenarios. 373-377
- Zhengyang Chen, Shuai Wang, Bing Han, Yanmin Qian: Combining Self-Supervised Learning and Adversarial Training Based Domain Adaptation for Speaker Verification. 701-705
- Yuanyuan Zhu, Jiaxu He, Ruihao Jing, Yaodong Song, Jie Lian, Xiao-Lei Zhang, Jie Li: LLM-Based Expressive Text-to-Speech Synthesizer with Style and Timbre Disentanglement. 596-600
- Jingyu Li, Aemon Yat Fei Chiu, Tan Lee: An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems. 388-392
- Zelin Qiu, Junfeng Li, Yingyi Luo: Spatial Attention in Interfering Speech Perception. 61-65
- Haoyu Wang, Tianrui Wang, Cheng Gong, Yu Jiang, Qiuyu Liu, Longbiao Wang, Jianwu Dang: Expressive Speech Synthesis with Theme-Oriented Few-Shot Learning in ICAGC 2024. 606-610
- Jiawen Kang, Junan Li, Jinchao Li, Xixin Wu, Helen Meng: Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection. 254-258
- Jingran Xie, Changhe Song, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng: CMAST: Efficient Speech-Text Joint Training Method to Enhance Linguistic Features Learning of Speech Representations. 656-660
- Xingguang Dong, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, Sheng Zhang, Jian Zhou, Zhao Lv: CSDA: Cross-Session Domain Adaptation in Auditory Attention Decoding of EEG for a Single Subject. 451-455
- Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu: RobustSVC: HuBERT-Based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion. 164-168
- Xiaowang Liu, Xiaolin Wu, Jinsong Zhang: A Study on the Information Mechanism of the 3rd Tone Sandhi Rule in Mandarin Across Word Boundaries. 546-550
- Dekun Chen, Zhizheng Wu: Zh-Paral: Benchmark Dataset for Comprehension of Chinese Paralinguistic Speech. 363-367
- Ruibo Liu, Shuai-Xin Wang, Zhuang-Zhuang Liu, Jiang-Jiang Zhao, Yuling Ren, Yu Liu: Convincing Audio Generation Based on LLM and Speech Tokenization. 591-595
- Yu Jiang, Tianrui Wang, Haoyu Wang, Cheng Gong, Qiuyu Liu, Zikang Huang, Longbiao Wang, Jianwu Dang: Expressive Text-to-Speech with Contextual Background for ICAGC 2024. 611-615
- Fuqian Wu, Xiyu Wu: Contributions of Acoustic Factors to Tone Identification in Whispered Mandarin. 516-520
- Dawei Xiang, Yong Ma, Yiming Yang: A Study of Brain Mechanisms by Which Sound Source Location and Amount of Masking Affect Target Perception. 239-243
- Jiahao Li, Cunhang Fan, Enrui Liu, Jian Zhou, Zhao Lv: Dual-Strategy Fusion Method in Noise-Robust Speech Recognition. 16-20
- Zhengshun Xia, Ziyang Ma, Zhisheng Zheng, Xie Chen: Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information. 636-640
- Zelin Qiu, Dingding Yao, Junfeng Li: StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture. 1-5
- Ruofan Yan, Shu Peng, Zhige Chen, Zhi-An Huang, Rui Liu, Kay Chen Tan, Jibin Wu: Enhancing Spatio-Temporal Auditory Attention Decoding with ST-AADNet. 334-338
- Wenjun Ding, Xinsheng Wang, Lijian Gao, Qirong Mao: TF-DiffuSE: Time-Frequency Prior-Conditioned Diffusion Model for Speech Enhancement. 581-585
- Siyi Zhao, Wei Wang, Yanmin Qian: Band-Wise Front-End Distortion Suppression for Robust Speech Recognition. 681-685
- Yaqin Wu, Yan Chang, Yanzhang Geng, Xiaofeng Cao, Jiawei Zhao: GM-LPC Based Multiband Analysis and Enhancement of Pathological Voice. 174-178
- Yuanming Zhang, Zeyan Song, Haoliang Du, Xia Gao, Jing Lu: Robustness and Generalization Capability Validation of Convolutional Neural Network on a Chinese EEG Auditory Attention Decoding Dataset. 51-55
- Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye: Does Current Deepfake Audio Detection Model Effectively Detect ALM-Based Deepfake Audio? 481-485
- Boda Xiao, Bo Wang, Xuning Chen, Xiran Xu, Xihong Wu, Jing Chen: Comparing Human-Labeled and LLM-Generated Semantic Features via Cortical Neural Representation. 666-670
- Hui-Peng Du, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling: APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm. 676-680
- Weizhen Bian, Yubo Zhou, Kaitai Zhang, Xiaohan Gu: EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations. 417-420
- Hannan Cheng, Kangyue Li, Long Ye, Jingling Wang: EnvFake: An Initial Environmental-Fake Audio Dataset for Scene-Consistency Detection. 81-85
- Chong Cao, Qian Li: The Role of F0 in the Recognition of Aspiration Contrasts in Mandarin. 536-540
- Pei-Jun Liao, Hung-Yi Lee, Hsin-Min Wang: Ensemble Knowledge Distillation from Speech SSL Models Considering Inter-Teacher Differences. 716-720
- Shitong Fan, Wenbo Wang, Feiyang Xiao, Shiheng Zhang, Qiaoxi Zhu, Jian Guan: Independent Feature Enhanced Crossmodal Fusion for Match-Mismatch Classification of Speech Stimulus and EEG Response. 209-213
- Hanzhe Xu, Xuefei Liu, Cong Cai, Kang Zhu, Jizhou Cui, Ruibo Fu, Heng Xie, Jianhua Tao, Zhengqi Wen, Ziping Zhao, Guanjun Li, Le Wang, Hao Lin: Temporal Shift for Personality Recognition with Pre-Trained Representations. 446-450
- Zonghui Wang, Zhihua Fang, Zhida Song, Liang He: Simplified Skip-Connected UNet for Robust Speaker Verification Under Noisy Environments. 691-695
- Fengping Wang, Bingsong Bai, Yayue Deng, Jinlong Xue, Yingming Gao, Ya Li: ExpressiveSinger: Synthesizing Expressive Singing Voice as an Instrument. 304-308
- Lishan Li, Xiaoying Xu: Multiple Patterns of Merging Guangzhou Cantonese Tones in Production and Perception: Study on Youth Groups. 1-5
- Yueqian Lin, Dong Liu, Yunfei Xu, Hongbin Suo, Ming Li: Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation. 229-233
- Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian: Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification. 383-387
- Jia-Jyu Su, Chen-Yu Chiang, Yue-Shan Chang, Chao-Yin Lin, Jiunn-Horng Kang, Min-Yuh Day: A Preliminary Study on Constructing Mandarin Personalized Speech Recognition Systems for the Speech Impaired. 21-25
- Xiaoming Liang, Zhihua Huang: The Contributions of Formants to the Intelligibility in Uyghur Sine-Wave Sentences. 1-5
- Shaochuan Zhang, Fengji Li, Li Wang, Jie Zhou, Haijun Niu: Tongue Model-Driven Method Based on Fully Connected Neural Network. 121-125
- Mewlude Nijat, Dong Wang, Askar Hamdulla: A Fresh Review on Chinese Pronunciation Acquisition: Insights and Recommendations for L2 Foreign Children. 91-95
- Xiaoke Yang, Cunhang Fan, Hongyu Zhang, Xingguang Dong, Jian Zhou, Xinhui Li, Zhao Lv: Cross-Subject Domain Adaptation for EEG-Based Auditory Attention Decoding via Prototypical Representation. 461-465
- Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou: Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-Based Multilingual Pretraining. 264-268
- Yuhao Zhao, Maoshen Jia, Jiawei Ru, Junqi Tai: A Hybrid DFSMN and Mamba Architecture for Low Bitrate Neural Speech Coding. 1-5
- Fengyu Xu, Yongxiong Xiao, Qiang Fu: ViT-Based EEG Analysis Method for Auditory Attention Detection. 324-328
- Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Xu Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu: The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge. 641-645