default search action
Wangyou Zhang
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c32]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. EMNLP 2024: 10205-10224 - [c31]Wangyou Zhang, Jee-weon Jung, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. ICASSP 2024: 10696-10700 - [c30]Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian:
Generation-Based Target Speech Extraction with Speech Discretization and Vocoder. ICASSP 2024: 12612-12616 - [i28]Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. CoRR abs/2401.14271 (2024) - [i27]Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe:
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. CoRR abs/2401.17230 (2024) - [i26]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024) - [i25]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. CoRR abs/2406.04269 (2024) - [i24]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024) - [i23]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024) - [i22]Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe:
Text-To-Speech Synthesis In The Wild. CoRR abs/2409.08711 (2024) - [i21]Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas W. D. Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe:
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild. CoRR abs/2409.17285 (2024) - 2023
- [j4]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023) - [j3]Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang, Kai Yu:
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL. IEEE Trans. Pattern Anal. Mach. Intell. 45(11): 13796-13813 (2023) - [c29]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8 - [c28]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8 - [c27]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. ASRU 2023: 1-6 - [c26]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. ASRU 2023: 1-6 - [c25]Wangyou Zhang, Lei Yang, Yanmin Qian:
Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing. ASRU 2023: 1-6 - [c24]Linfeng Yu, Wangyou Zhang, Chenda Li, Yanmin Qian:
Overlap Aware Continuous Speech Separation without Permutation Invariant Training. INTERSPEECH 2023: 3512-3516 - [c23]Wangyou Zhang, Yanmin Qian:
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. INTERSPEECH 2023: 3517-3521 - [c22]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. WASPAA 2023: 1-5 - [d1]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023 - [i20]Wangyou Zhang, Yanmin Qian:
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. CoRR abs/2305.16286 (2023) - [i19]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. CoRR abs/2307.12231 (2023) - [i18]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023) - [i17]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023) - [i16]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR abs/2309.17384 (2023) - [i15]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. CoRR abs/2310.08277 (2023) - 2022
- [j2]Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022) - [c21]Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu:
Text Adaptive Detection for Customizable Keyword Spotting. ICASSP 2022: 6652-6656 - [c20]Zhikai Zhou, Wei Wang, Wangyou Zhang, Yanmin Qian:
Exploring Effective Data Utilization for Low-Resource Speech Recognition. ICASSP 2022: 8192-8196 - [c19]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205 - [c18]Wei Wang, Xun Gong, Yifei Wu, Zhikai Zhou, Chenda Li, Wangyou Zhang, Bing Han, Yanmin Qian:
The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021. ICASSP 2022: 9261-9265 - [c17]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-wise Permutation Invariant Training. INTERSPEECH 2022: 5383-5387 - [c16]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462 - [c15]Wei Wang, Wangyou Zhang, Shaoxiong Lin, Yanmin Qian:
Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition. ISCSLP 2022: 334-338 - [c14]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. SLT 2022: 496-501 - [i14]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. CoRR abs/2202.12298 (2022) - [i13]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-speaker ASR with Independent Vector Analysis. CoRR abs/2204.00218 (2022) - [i12]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. CoRR abs/2207.09514 (2022) - 2021
- [c13]Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang:
Recent Developments on Espnet Toolkit Boosted By Conformer. ICASSP 2021: 5874-5878 - [c12]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. ICASSP 2021: 6898-6902 - [c11]Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach:
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation. ICASSP 2021: 8428-8432 - [c10]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. SLT 2021: 785-792 - [c9]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. WASPAA 2021: 146-150 - [i11]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. CoRR abs/2102.11525 (2021) - [i10]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. CoRR abs/2110.14139 (2021) - [i9]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. CoRR abs/2110.14142 (2021) - 2020
- [j1]Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1385-1394 (2020) - [c8]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
End-To-End Multi-Speaker Speech Recognition With Transformer. ICASSP 2020: 6134-6138 - [c7]Wangyou Zhang, Yanmin Qian:
Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition. INTERSPEECH 2020: 304-308 - [c6]Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. INTERSPEECH 2020: 324-328 - [i8]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
End-to-End Multi-speaker Speech Recognition with Transformer. CoRR abs/2002.03921 (2020) - [i7]Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. CoRR abs/2005.10479 (2020) - [i6]Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang:
Recent Developments on ESPnet Toolkit Boosted by Conformer. CoRR abs/2010.13956 (2020) - [i5]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration. CoRR abs/2011.03706 (2020) - [i4]Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Shinji Watanabe, Reinhold Haeb-Umbach:
Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation. CoRR abs/2011.15003 (2020) - [i3]Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang:
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. CoRR abs/2012.13006 (2020)
2010 – 2019
- 2019
- [c5]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition. ASRU 2019: 237-244 - [c4]Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto:
A Comparative Study on Transformer vs RNN in Speech Applications. ASRU 2019: 449-456 - [c3]Wangyou Zhang, Man Sun, Lan Wang, Yanmin Qian:
End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform. ASRU 2019: 660-666 - [c2]Wangyou Zhang, Xuankai Chang, Yanmin Qian:
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System. INTERSPEECH 2019: 2633-2637 - [c1]Wangyou Zhang, Ying Zhou, Yanmin Qian:
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking. INTERSPEECH 2019: 2703-2707 - [i2]Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang:
A Comparative Study on Transformer vs RNN in Speech Applications. CoRR abs/1909.06317 (2019) - [i1]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition. CoRR abs/1910.06522 (2019)
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-11-15 19:30 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint