Lim et al., 2017 - Google Patents

Harmonic and percussive source separation using a convolutional auto encoder

Lim et al., 2017

Document ID: 1493233667207657261
Author: Lim W; Lee T
Publication year: 2017
Publication venue: 2017 25th European Signal Processing Conference (EUSIPCO)

External Links

Cited by

Snippet

Real world audio signals are generally a mixture of harmonic and percussive sounds. In this paper, we present a novel method for separating the harmonic and percussive audio signals from an audio mixture. Proposed method involves the use of a convolutional auto-encoder …

Continue reading at eurasip.org (PDF) (other versions)

238000000926 separation method 0 title abstract description 17

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/90—Pitch determination of speech signals
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Similar Documents

Publication	Publication Date	Title
Wang et al.	2021	TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain
Choi et al.	2018	Phase-aware speech enhancement with deep complex u-net
Kadıoğlu et al.	2020	An empirical study of Conv-TasNet
Lim et al.	2017	Harmonic and percussive source separation using a convolutional auto encoder
Strauss et al.	2021	A flow-based neural network for time domain speech enhancement
CN110942766A (en)	2020-03-31	Audio event detection method, system, mobile terminal and storage medium
Yan et al.	2020	An iterative graph spectral subtraction method for speech enhancement
Geng et al.	2020	End-to-end speech enhancement based on discrete cosine transform
Hou et al.	2020	Multi-task learning for end-to-end noise-robust bandwidth extension
Takeuchi et al.	2020	Invertible DNN-based nonlinear time-frequency transform for speech enhancement
Wu et al.	2023	Self-supervised speech denoising using only noisy audio signals
Do et al.	2020	Speech Separation in the Frequency Domain with Autoencoder.
Huang et al.	2022	Dccrgan: Deep complex convolution recurrent generator adversarial network for speech enhancement
Jannu et al.	2023	Multi-stage progressive learning-based speech enhancement using time–frequency attentive squeezed temporal convolutional networks
CN108806725A (en)	2018-11-13	Speech differentiation method, apparatus, computer equipment and storage medium
Joy et al.	2020	Deep scattering power spectrum features for robust speech recognition
Hepsiba et al.	2022	Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN
Hossain et al.	2021	Dual-transform source separation using sparse nonnegative matrix factorization
CN113611321A (en)	2021-11-05	Voice enhancement method and system
CN111009259B (en)	2022-09-16	Audio processing method and device
Zhang et al.	2024	CicadaNet: Deep learning based automatic cicada chorus filtering for improved long-term bird monitoring
Sunny et al.	2012	Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
CN110619886A (en)	2019-12-27	End-to-end voice enhancement method for low-resource Tujia language
Abuhajar et al.	2021	Network compression and frame stitching for efficient and robust speech enhancement
Mashiana et al.	2019	Speech enhancement using residual convolutional neural network