
Venkataramani et al., 2017 - Google Patents

Adaptive front-ends for end-to-end source separation


Document ID: 9409704723440602405
Authors: Venkataramani S, Casebeer J, Smaragdis P
Publication year: 2017
Publication venue: Proc. NIPS

Snippet

Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms …
Continue reading at paris.cs.illinois.edu (PDF)
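The snippet's core idea is replacing the fixed DFT bases of an STFT with a learnable, invertible filter bank. Below is a minimal numpy sketch of that analogy, not the paper's actual model: a strided framing step projects each frame onto an analysis filter bank (which would be trained end-to-end in the paper's setting), and a matching synthesis bank resynthesizes the signal by overlap-add. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def analysis(x, filters, hop):
    """Frame the signal and project each frame onto the filter bank.

    Equivalent role to the STFT analysis step, but `filters` need not
    be Fourier bases; in the adaptive-front-end setting they are learned.
    """
    n_filters, win = filters.shape
    frames = np.stack([x[i:i + win]
                       for i in range(0, len(x) - win + 1, hop)])
    return frames @ filters.T  # shape: (n_frames, n_filters)

def synthesis(coeffs, synth_filters, hop):
    """Overlap-add resynthesis with a synthesis filter bank."""
    n_filters, win = synth_filters.shape
    y = np.zeros((len(coeffs) - 1) * hop + win)
    for t, c in enumerate(coeffs):
        y[t * hop : t * hop + win] += c @ synth_filters
    return y

rng = np.random.default_rng(0)
win = hop = 32                          # non-overlapping frames, for an exact inverse
F = rng.standard_normal((win, win))     # analysis bank; stands in for learned filters
G = np.linalg.inv(F).T                  # synthesis bank: here simply the inverse

x = rng.standard_normal(256)
coeffs = analysis(x, F, hop)            # "adaptive spectrogram", (8, 32)
x_hat = synthesis(coeffs, G, hop)       # reconstructs x up to floating-point error
```

With overlap (hop < win) and trained filters, exact inversion is no longer guaranteed by construction; the paper's end-to-end training instead optimizes the whole analysis-separation-synthesis chain jointly.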

Classifications

    • G: PHYSICS
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00: Speech recognition
                • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
                    • G10L19/008: Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
                    • G10L19/02: using spectral analysis, e.g. transform vocoders or subband vocoders
                        • G10L19/0212: using orthogonal transformation
                    • G10L19/04: using predictive techniques
                • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/003: Changing voice quality, e.g. pitch or formants
                        • G10L21/007: characterised by the process used
                            • G10L21/013: Adapting to target pitch
                    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/0208: Noise filtering

Similar Documents

Venkataramani et al. Adaptive front-ends for end-to-end source separation
Venkataramani et al. End-to-end source separation with adaptive front-ends
Li et al. Glance and gaze: A collaborative learning framework for single-channel speech enhancement
Qian et al. Speech Enhancement Using Bayesian Wavenet.
Grais et al. Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders
WO2019008580A1 (en) Method and system for enhancing a speech signal of a human speaker in a video using visual information
Wichern et al. Phase reconstruction with learned time-frequency representations for single-channel speech separation
Yuan A time–frequency smoothing neural network for speech enhancement
CN108198566B (en) Information processing method and device, electronic device and storage medium
Geng et al. End-to-end speech enhancement based on discrete cosine transform
Lee et al. Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities
Xu et al. CASE-Net: Integrating local and non-local attention operations for speech enhancement
Tu et al. A complex-valued multichannel speech enhancement learning algorithm for optimal tradeoff between noise reduction and speech distortion
Fan et al. CompNet: Complementary network for single-channel speech enhancement
Wu et al. Self-supervised speech denoising using only noisy audio signals
Şimşekli et al. Non-negative tensor factorization models for Bayesian audio processing
Zheng et al. Low-latency monaural speech enhancement with deep filter-bank equalizer
CN101322183B (en) Signal distortion elimination apparatus and method
Raj et al. Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients
Nie et al. Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
Ullah et al. Single channel speech dereverberation and separation using RPCA and SNMF
Chen et al. A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
Li et al. Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation.
Badiezadegan et al. A wavelet-based thresholding approach to reconstructing unreliable spectrogram components
Mamun et al. CFTNet: Complex-valued frequency transformation network for speech enhancement