ASRU 2015: Scottsdale, AZ, USA
2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015. IEEE 2015, ISBN 978-1-4799-7291-3
Automatic Speech Recognition I
- Irina Illina, Dominique Fohr: Different word representations and their combination for proper name retrieval from diachronic documents. 1-7
- Ciprian Chelba, Noam Shazeer: Sparse non-negative matrix language modeling for geo-annotated query session data. 8-14
- Naoyuki Kanda, Mitsuyoshi Tachimori, Xugang Lu, Hisashi Kawai: Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling. 15-21
- Khe Chai Sim: On constructing and analysing an interpretable brain model for the DNN based on hidden activity patterns. 22-29
- Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Andrew W. Senior: Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms. 30-36
- Suman V. Ravuri: Hybrid DNN-Latent structured SVM acoustic models for continuous speech recognition. 37-44
- Shuangyu Chang, Abhik Lahiri, Issac Alphonso, Barlas Oguz, Michael Levit, Benoît Dumoulin: Discriminative training of context-dependent language model scaling factors and interpolation weights. 45-51
- Ryu Takeda, Kazunori Komatani, Kazuhiro Nakadai: Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks. 52-58
- Zhichao Wang, Xingyu Na, Xin Li, Jielin Pan, Yonghong Yan: Two-stage ASGD framework for parallel training of DNN acoustic models using Ethernet. 59-64
- Taesup Moon, Heeyoul Choi, Hoshik Lee, Inchul Song: RNNDROP: A novel dropout for RNNs in ASR. 65-70
- Hadrien Glaude, Cyrille Enderli, Olivier Pietquin: Spectral learning with non negative probabilities for finite state automaton. 71-77
- Abdel-rahman Mohamed, Frank Seide, Dong Yu, Jasha Droppo, Andreas Stolcke, Geoffrey Zweig, Gerald Penn: Deep bi-directional recurrent networks over spectral windows. 78-83
- Bo-Hsiang Tseng, Hung-yi Lee, Lin-Shan Lee: Personalizing universal recurrent neural network language model with user characteristic features by social network crowdsourcing. 84-91
- David Snyder, Daniel Garcia-Romero, Daniel Povey: Time delay deep neural network-based universal background models for speaker recognition. 92-97
Text-to-Speech Systems
- Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu: Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. 98-102
- Nichola Lubold, Heather Pon-Barry, Erin Walker: Naturalness and rapport in a pitch adaptive learning companion. 103-110
- Sai Krishna Rallabandi, Sai Sirisha Rallabandi, Padmini Bandi, Suryakanth V. Gangashetty: Learning continuous representation of text for phone duration modeling in statistical parametric speech synthesis. 111-115
- Mahsa Sadat Elyasi Langarani, Jan P. H. van Santen: Speaker intonation adaptation for transforming text-to-speech synthesis speaker identity. 116-123
Automatic Speech Recognition II
- Gueorgui Pironkov, Stéphane Dupont, Thierry Dutoit: Investigating sparse deep neural networks for speech recognition. 124-129
- Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain: Latent Dirichlet Allocation based organisation of broadcast media archives for deep neural network adaptation. 130-136
- Yi-Hsiu Liao, Hung-yi Lee, Lin-Shan Lee: Towards structured deep neural network for automatic speech recognition. 137-144
- Lahiru Samarakoon, Khe Chai Sim: Learning factorized feature transforms for speaker normalization. 145-152
- Thiago Fraga-Silva, Antoine Laurent, Jean-Luc Gauvain, Lori Lamel, Viet Bac Le, Abdelkhalek Messaoudi: Improving data selection for low-resource STT and KWS. 153-159
- Rogier C. van Dalen, Jingzhou Yang, Haipeng Wang, Anton Ragni, Chao Zhang, Mark J. F. Gales: Structured discriminative models using deep neural-network features. 160-166
- Yajie Miao, Mohammad Gowayyed, Florian Metze: EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. 167-174
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, Mirna Adriani: Stochastic Gradient Variational Bayes for deep learning-based ASR. 175-180
- Xie Chen, Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Investigation of back-off based interpolation between recurrent neural network and n-gram language models. 181-186
- Jinyu Li, Abdelrahman Mohamed, Geoffrey Zweig, Yifan Gong: LSTM time and frequency recurrence for automatic speech recognition. 187-191
Spoken Document Retrieval, Speech Summarization, and Applications
- Scott Novotney, Kevin Jett, Owen Kimball: Incorporating user feedback to re-rank keyword search results. 192-199
- Nagisa Sakamoto, Kazumasa Yamamoto, Seiichi Nakagawa: Combination of syllable based N-gram search and word search for spoken term detection through spoken queries and IV/OOV classification. 200-206
- Kuan-Yu Chen, Kai-Wun Shih, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang: Incorporating paragraph embeddings and density peaks clustering for spoken document summarization. 207-214
- Meng Cai, Zhiqiang Lv, Cheng Lu, Jian Kang, Like Hui, Zhuo Zhang, Jia Liu: High-performance Swahili keyword search with very limited language pack: The THUEE system for the OpenKWS15 evaluation. 215-222
- Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo: Phonetic unit selection for cross-lingual query-by-example spoken term detection. 223-229
- Zhiqiang Lv, Meng Cai, Cheng Lu, Jian Kang, Like Hui, Wei-Qiang Zhang, Jia Liu: Improved system fusion for keyword search. 230-236
- David F. Harwath, James R. Glass: Deep multimodal semantic embeddings for speech and images. 237-244
- Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-yi Lee, Lin-Shan Lee: An iterative deep learning framework for unsupervised discovery of speech features and linguistic units with applications on spoken term detection. 245-251
- Sakriani Sakti, Faiz Ilham, Graham Neubig, Tomoki Toda, Ayu Purwarianti, Satoshi Nakamura: Incremental sentence compression using LSTM recurrent networks. 252-258
- Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran, Abhinav Sethy, Kartik Audhkhasi, Xiaodong Cui, Ellen Kislal, Lidia Mangu, Markus Nußbaum-Thom, Michael Picheny, Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney, Mark J. F. Gales, Kate M. Knill, Anton Ragni, Haipeng Wang, Philip C. Woodland: Multilingual representations for low resource speech recognition and keyword search. 259-266
Robustness in Automatic Speech Recognition, Speech-to-Speech Translation, and Spontaneous Speech Processing
- Laurent Besacier, Benjamin Lecouteux, Ngoc-Quang Luong, Ngoc-Tien Le: Spoken language translation graphs re-decoding using automatic quality assessment. 267-274
- Mirco Ravanelli, Luca Cristoforetti, Roberto Gretter, Marco Pellin, Alessandro Sosi, Maurizio Omologo: The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments. 275-282
- Sri Harish Reddy Mallidi, Tetsuji Ogawa, Hynek Hermansky: Uncertainty estimation of DNN classifiers. 283-288
- Ivan Himawan, Petr Motlícek, Marc Ferras Font, Srikanth R. Madikeri: Towards utterance-based neural network adaptation in acoustic modeling. 289-295
- Nicholas Ruiz, Marcello Federico: Phonetically-oriented word error alignment for speech recognition error analysis in speech translation. 296-302
- Lara J. Martin, Andrew Wilkinson, Sai Sumanth Miryala, Vivian Robison, Alan W. Black: Utterance classification in speech-to-speech translation for zero-resource languages in the hospital administration domain. 303-309
- Yanmin Qian, Maofan Yin, Yongbin You, Kai Yu: Multi-task joint-learning of deep neural networks for robust speech recognition. 310-316
- Vikramjit Mitra, Horacio Franco: Time-frequency convolutional networks for robust speech recognition. 317-323
- Wen Wang, Haibo Li, Heng Ji: Name-aware language model adaptation and sparse features for statistical machine translation. 324-330
- Shivesh Ranjan, Gang Liu, John H. L. Hansen: An i-Vector PLDA based gender identification approach for severely distorted and multilingual DARPA RATS data. 331-337
- Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Aliaksei Ivanou, Yao Qian: Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech. 338-345
Spoken Language Understanding
- Mohamed Morchid, Richard Dufour, Georges Linarès: Topic-space based setup of a neural network for theme identification of highly imperfect transcriptions. 346-352
- Yangyang Shi, Kaisheng Yao, Hu Chen, Yi-Cheng Pan, Mei-Yuh Hwang: Semi-supervised slot tagging in spoken language understanding using recurrent transductive support vector machines. 353-360
- Asli Celikyilmaz, Zhaleh Feizollahi, Dilek Hakkani-Tür, Ruhi Sarikaya: A universal model for flexible item selection in conversational dialogs. 361-367
- Suman V. Ravuri, Andreas Stolcke: A comparative study of neural network models for lexical intent classification. 368-374
- Yun-Nung Chen, Dilek Hakkani-Tür, Xiaodong He: Detecting actionable items in meetings by convolutional deep structured semantic models. 375-382
- Mickael Rouvier, Sebastien Delecraz, Benoît Favre, Meriem Bendris, Frédéric Béchet: Multimodal embedding fusion for robust speaker role recognition in video broadcast. 383-389
- Marc-Antoine Rondeau, Yi Su: Recent improvements to NeuroCRFs for named entity recognition. 390-396
- Xiaohu Liu, Asli Celikyilmaz, Ruhi Sarikaya: Natural language understanding for partial queries. 397-400
3rd CHiME Speech Separation and Recognition Challenge
- Alexey Prudnikov, Maxim Korenevsky, Sergei Aleinik: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition. 401-408
- Shahab Jalalvand, Daniele Falavigna, Marco Matassoni, Piergiorgio Svaizer, Maurizio Omologo: Boosted acoustic model learning and hypotheses rescoring on the CHiME-3 task. 409-415
- Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Rintaro Ikeshita, Yohei Kawaguchi, Takashi Sumiyoshi, Takashi Endo, Masahito Togami: Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection. 416-422
- Thanh T. Vu, Benjamin Bigot, Engsiong Chng: Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge. 423-429
- Jun Du, Qing Wang, Yanhui Tu, Xiao Bao, Li-Rong Dai, Chin-Hui Lee: An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework. 430-435
- Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Chengzhu Yu, Wojciech J. Fabian, Miquel Espi, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani: The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices. 436-443
- Jahn Heymann, Lukas Drude, Aleksej Chinaev, Reinhold Haeb-Umbach: BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge. 444-451
- Lukas Pfeifenberger, Tobias Schrank, Matthias Zöhrer, Martin Hagmüller, Franz Pernkopf: Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results. 452-459
- Shengkui Zhao, Xiong Xiao, Zhaofeng Zhang, Thi Ngoc Tho Nguyen, Xionghu Zhong, Bo Ren, Longbiao Wang, Douglas L. Jones, Engsiong Chng, Haizhou Li: Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction. 460-467
- Niko Moritz, Stephan Gerlach, Kamil Adiloglu, Jörn Anemüller, Birger Kollmeier, Stefan Goetze: A CHiME-3 challenge system: Long-term acoustic features for noise robust automatic speech recognition. 468-474
- Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe: The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition. 475-481
- Sunit Sivasankaran, Aditya Arie Nugraha, Emmanuel Vincent, Juan Andres Morales-Cordovilla, Siddharth Dalmia, Irina Illina, Antoine Liutkus: Robust ASR using neural network based speech enhancement and feature simulation. 482-489
- Ning Ma, Ricard Marxer, Jon Barker, Guy J. Brown: Exploiting synchrony spectra and deep neural networks for noise-robust automatic speech recognition. 490-495
- Deblin Bagchi, Michael I. Mandel, Zhongqiu Wang, Yanzhang He, Andrew R. Plummer, Eric Fosler-Lussier: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition. 496-503
- Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe: The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines. 504-511
Automatic Speech Recognition in Reverberant Environments (ASpIRE)
- Jennifer Melot, Nicolas Malyska, Jessica Ray, Wade Shen: Analysis of factors affecting system performance in the ASpIRE challenge. 512-517
- Jonathan William Dennis, Tran Huy Dat: Single and multi-channel approaches for distant speech recognition under noisy reverberant conditions: I2R's system description for the ASpIRE challenge. 518-524
- Vikramjit Mitra, Julien van Hout, Wen Wang, Martin Graciarena, Mitchell McLaren, Horacio Franco, Dimitra Vergyri: Improving robustness against reverberation for automatic speech recognition. 525-532
- Roger Hsiao, Jeff Z. Ma, William Hartmann, Martin Karafiát, Frantisek Grézl, Lukás Burget, Igor Szöke, Jan Cernocký, Shinji Watanabe, Zhuo Chen, Sri Harish Reddy Mallidi, Hynek Hermansky, Stavros Tsakalidis, Richard M. Schwartz: Robust speech recognition in unknown reverberant and noisy conditions. 533-538
- Vijayaditya Peddinti, Guoguo Chen, Vimal Manohar, Tom Ko, Daniel Povey, Sanjeev Khudanpur: JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs. 539-546
- Mary Harper: The Automatic Speech recognition In Reverberant Environments (ASpIRE) challenge. 547-554
Automatic Speech Recognition III
- Sina Hamidi Ghalehjegh, Richard C. Rose: Deep bottleneck features for i-vector based text-independent speaker verification. 555-560
- Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu: Discriminative segmental cascades for feature-rich phone recognition. 561-568
- Steven Sandoval, Phillip L. De Leon, Julie M. Liss: Hilbert spectral analysis of vowels using intrinsic mode functions. 569-575
- Ahmed M. Ali, Walid Magdy, Peter Bell, Steve Renals: Multi-reference WER for evaluating ASR for languages with no orthographic rules. 576-580
- Yuzong Liu, Katrin Kirchhoff: Acoustic modeling with neural graph embeddings. 581-588
- Olivier Siohan, David Rybach: Multitask learning and system combination for automatic speech recognition. 589-595
- Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney: Speaker adaptive joint training of Gaussian mixture models and bottleneck features. 596-603
- Andrew W. Senior, Hasim Sak, Felix de Chaumont Quitry, Tara N. Sainath, Kanishka Rao: Acoustic modelling with CD-CTC-SMBR LSTM RNNs. 604-609
- Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh: Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy. 610-616
- Shawn Tan, Khe Chai Sim, Mark J. F. Gales: Improving the interpretability of deep neural networks with stimulated learning. 617-623
The MGB Challenge - Recognition of Multi-Genre Broadcast Data
- Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yulan Liu, Thomas Hain: The 2015 Sheffield system for transcription of Multi-Genre Broadcast media. 624-631
- Rosanna Milner, Oscar Saz, Salil Deena, Mortaza Doulaty, Raymond W. M. Ng, Thomas Hain: The 2015 Sheffield system for longitudinal diarisation of broadcast media. 632-638
- Philip C. Woodland, Xunying Liu, Yanmin Qian, Chao Zhang, Mark J. F. Gales, Penny Karanasou, Pierre Lanchantin, Linlin Wang: Cambridge University transcription systems for the multi-genre broadcast challenge. 639-646
- Pierre Lanchantin, Mark J. F. Gales, Penny Karanasou, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang: The development of the Cambridge University alignment systems for the multi-genre broadcast challenge. 647-653
- Quoc Truong Do, Michael Heck, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura: The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function. 654-659
- Penny Karanasou, Mark J. F. Gales, Pierre Lanchantin, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang: Speaker diarisation and longitudinal linking in multi-genre broadcast data. 660-666
- Jesús Antonio Villalba López, Alfonso Ortega, Antonio Miguel, Eduardo Lleida: Variational Bayesian PLDA for speaker diarization in the MGB challenge. 667-674
- Peter Bell, Steve Renals: A system for automatic alignment of broadcast media captions using weighted finite-state transducers. 675-680
- Vishwa Gupta, Paul Deléglise, Gilles Boulianne, Yannick Estève, Sylvain Meignier, Anthony Rousseau: CRIM and LIUM approaches for multi-genre broadcast media transcription. 681-686
- Peter Bell, Mark J. F. Gales, Thomas Hain, Jonathan Kilgour, Pierre Lanchantin, Xunying Liu, Andrew McParland, Steve Renals, Oscar Saz, Mirjam Wester, Philip C. Woodland: The MGB challenge: Evaluating multi-genre broadcast media recognition. 687-693
Spoken Dialog Systems
- Lukás Zilka, Filip Jurcícek: Incremental LSTM-based dialog state tracker. 757-762
- David Vandyke, Pei-hao Su, Milica Gasic, Nikola Mrksic, Tsung-Hsien Wen, Steve J. Young: Multi-domain dialogue success classifiers for policy training. 763-770
- Jeesoo Bang, Sangdo Han, Kyusong Lee, Gary Geunbae Lee: Open-domain personalized dialog system using user-interested topics in system responses. 771-776
- Nurul Lubis, Sakriani Sakti, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura: A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows. 777-783
- Masahiro Mizukami, Hideaki Kizuki, Toshio Nomura, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura: Adaptive selection from multiple response candidates in example-based dialogue. 784-790
- Hang Ren, Weiqun Xu, Yonghong Yan: Optimizing human-interpretable dialog management policy using genetic algorithm. 791-797
- Sangjun Koo, Seonghan Ryu, Gary Geunbae Lee: Implementation of generic positive-negative tracker in extensible dialog system. 798-805
- Milica Gasic, Nikola Mrksic, Pei-hao Su, David Vandyke, Tsung-Hsien Wen, Steve J. Young: Policy committee for adaptation in multi-domain spoken dialogue systems. 806-812
- Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou: Applying deep learning to answer selection: A study and an open task. 813-820