Venkataramani et al., 2017 - Google Patents
Adaptive front-ends for end-to-end source separation
- Document ID
- 9409704723440602405
- Author
- Venkataramani S
- Casebeer J
- Smaragdis P
- Publication year
- 2017
- Publication venue
- Proc. NIPS
Snippet
Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms …
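The snippet describes replacing the fixed STFT front-end with a learned auto-encoder trained end-to-end with the separator. Below is a minimal sketch of that idea in PyTorch; the layer sizes, the mask-based separator, and all names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a learned (adaptive) front-end replacing the STFT,
# in the spirit of the snippet above. Assumes PyTorch; all layer sizes,
# names, and the masking separator are illustrative, not from the paper.
import torch
import torch.nn as nn

class AdaptiveFrontEndSeparator(nn.Module):
    def __init__(self, n_bases=512, win=1024, hop=256):
        super().__init__()
        # Encoder: strided 1-D convolution plays the role of the analysis transform.
        self.encoder = nn.Conv1d(1, n_bases, kernel_size=win, stride=hop, bias=False)
        # Separator: estimates a non-negative mask over the learned representation.
        self.separator = nn.Sequential(
            nn.Conv1d(n_bases, n_bases, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_bases, n_bases, kernel_size=1),
            nn.Sigmoid(),
        )
        # Decoder: transposed convolution plays the role of the synthesis transform.
        self.decoder = nn.ConvTranspose1d(n_bases, 1, kernel_size=win, stride=hop, bias=False)

    def forward(self, x):
        # x: (batch, 1, samples) raw waveform
        h = torch.relu(self.encoder(x))   # learned, non-negative spectrogram-like features
        m = self.separator(h)             # soft mask in the learned domain
        return self.decoder(h * m)        # masked features synthesized back to a waveform

# Usage: train end-to-end on (mixture, target-source) waveform pairs,
# e.g. with a waveform-domain reconstruction or SDR-style loss.
model = AdaptiveFrontEndSeparator()
mix = torch.randn(4, 1, 16000)            # four 1-second mixtures at 16 kHz
est = model(mix)                          # estimated source waveform
```

Because the encoder and decoder are ordinary convolutions, their bases are optimized jointly with the separator instead of being fixed Fourier bases, which is the contrast with the traditional STFT front-end that the snippet draws.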
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
          - G10L19/0212—Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, using orthogonal transformation
        - G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction using predictive techniques
        - G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
      - G10L15/00—Speech recognition
Similar Documents
Publication | Title
---|---
Venkataramani et al. | Adaptive front-ends for end-to-end source separation
Venkataramani et al. | End-to-end source separation with adaptive front-ends
Li et al. | Glance and gaze: A collaborative learning framework for single-channel speech enhancement
Qian et al. | Speech Enhancement Using Bayesian Wavenet.
Grais et al. | Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders
WO2019008580A1 (en) | Method and system for enhancing a speech signal of a human speaker in a video using visual information
Wichern et al. | Phase reconstruction with learned time-frequency representations for single-channel speech separation
Yuan | A time–frequency smoothing neural network for speech enhancement
CN108198566B (en) | Information processing method and device, electronic device and storage medium
Geng et al. | End-to-end speech enhancement based on discrete cosine transform
Lee et al. | Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities
Xu et al. | CASE-Net: Integrating local and non-local attention operations for speech enhancement
Tu et al. | A complex-valued multichannel speech enhancement learning algorithm for optimal tradeoff between noise reduction and speech distortion
Fan et al. | CompNet: Complementary network for single-channel speech enhancement
Wu et al. | Self-supervised speech denoising using only noisy audio signals
Şimşekli et al. | Non-negative tensor factorization models for Bayesian audio processing
Zheng et al. | Low-latency monaural speech enhancement with deep filter-bank equalizer
CN101322183B (en) | Signal distortion elimination apparatus and method
Raj et al. | Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients
Nie et al. | Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
Ullah et al. | Single channel speech dereverberation and separation using RPCA and SNMF
Chen et al. | A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
Li et al. | Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation.
Badiezadegan et al. | A wavelet-based thresholding approach to reconstructing unreliable spectrogram components
Mamun et al. | CFTNet: Complex-valued frequency transformation network for speech enhancement