Epoch Extraction Based on Integrated Linear Prediction Residual Using Plosion Index
An epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high-performance epoch ...
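As background for the quantities named in the title, the following is a minimal, illustrative Python sketch (not the authors' implementation) of how a linear prediction residual can be obtained by inverse filtering and how a plosion-index-style peak-to-local-mean ratio can be computed from it. The LPC order, the window parameters `m1` and `m2`, and the synthetic test signal are arbitrary choices for illustration; the paper's integrated LP residual and its exact plosion index definition differ in detail.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coeffs(x, order):
    """LPC via the autocorrelation (Yule-Walker) method; returns
    [1, -a1, ..., -ap] so that filtering with it yields the residual."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lp_residual(x, order=12):
    """Inverse-filter the whole signal with a single LPC fit
    (a toy setting; frame-wise analysis would be used in practice)."""
    return lfilter(lpc_coeffs(x, order), [1.0], x)

def plosion_index(x, m1=10, m2=160):
    """Ratio of |x[n]| to the mean of |x| over m2 samples ending m1
    samples before n -- a peak-to-local-average measure in the spirit
    of the plosion index."""
    ax, pi = np.abs(x), np.zeros_like(x)
    for n in range(m1 + m2, len(x)):
        pi[n] = ax[n] / (ax[n - m1 - m2:n - m1].mean() + 1e-12)
    return pi

# Toy usage: an impulse train through an AR filter stands in for voiced speech.
fs = 8000
exc = np.zeros(fs // 2)
exc[::80] = 1.0                                   # 100 Hz impulse train
speech = lfilter([1.0], [1.0, -1.8, 0.95], exc)   # crude vocal-tract-like resonance
pi = plosion_index(lp_residual(speech))
print("strongest residual peaks near samples:", np.sort(np.argsort(pi)[-5:]))
```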
Body Conducted Speech Enhancement by Equalization and Signal Fusion
This paper studies body-conducted speech for noise-robust speech processing purposes. As body-conducted speech is typically limited in bandwidth, signal processing is required to obtain a signal that is both high in quality and low in noise. We propose ...
Soundfield Imaging in the Ray Space
In this work we propose a general approach to acoustic scene analysis based on a novel data structure (ray-space image) that encodes the directional plenacoustic function over a line segment (Observation Window, OW). We define and describe a system for ...
Cross-Lingual Automatic Speech Recognition Using Tandem Features
Automatic speech recognition depends on large amounts of transcribed speech recordings in order to estimate the parameters of the acoustic model. Recording such large speech corpora is time-consuming and expensive; as a result, sufficient quantities of ...
Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement
This paper proposes a versatile technique for integrating two conventional speech enhancement approaches, a spatial clustering approach (SCA) and a factorial model approach (FMA), which are based on two different features of signals, namely spatial and ...
Linearly-Constrained Minimum-Variance Method for Spherical Microphone Arrays Based on Plane-Wave Decomposition of the Sound Field
Speech signals recorded in real environments may be corrupted by ambient noise and reverberation. Therefore, noise reduction and dereverberation algorithms for speech enhancement are typically employed in speech communication systems. Although ...
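For context, a generic narrowband LCMV beamformer has the well-known closed-form solution $w = R^{-1}C(C^{H}R^{-1}C)^{-1}g$, which minimizes output power subject to linear constraints. The Python sketch below illustrates that standard formula with placeholder steering vectors; it does not reproduce the paper's plane-wave-decomposition formulation for spherical microphone arrays.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """Standard LCMV solution: minimize w^H R w subject to C^H w = g,
    giving w = R^{-1} C (C^H R^{-1} C)^{-1} g."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

# Toy example: M sensors, a distortionless constraint toward the desired
# source and a null toward an interferer. The steering vectors are arbitrary
# placeholders, not a spherical-harmonic or plane-wave model.
M = 8
rng = np.random.default_rng(0)
d_src = np.exp(1j * np.pi * 0.3 * np.arange(M))   # hypothetical source direction
d_int = np.exp(1j * np.pi * 0.7 * np.arange(M))   # hypothetical interferer direction
N = (rng.standard_normal((M, 1000)) + 1j * rng.standard_normal((M, 1000))) / np.sqrt(2)
R = N @ N.conj().T / 1000 + 0.1 * np.eye(M)       # loaded sample noise covariance

C = np.stack([d_src, d_int], axis=1)              # M x 2 constraint matrix
g = np.array([1.0, 0.0])                          # pass the source, null the interferer
w = lcmv_weights(R, C, g)
print("gain toward source:    ", abs(w.conj() @ d_src))   # ~1
print("gain toward interferer:", abs(w.conj() @ d_int))   # ~0
```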
Source/Filter Factorial Hidden Markov Model, With Application to Pitch and Formant Tracking
Tracking vocal tract formant frequencies $(f_{p})$ and estimating the fundamental frequency $(f_{0})$ are two problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters ...
A Bag of Systems Representation for Music Auto-Tagging
We present a content-based automatic tagging system for music that relies on a high-level, concise “Bag of Systems” (BoS) representation of the characteristics of a musical piece. The BoS representation leverages a rich dictionary of musical codewords, ...
HMM Based Intermediate Matching Kernel for Classification of Sequential Patterns of Speech Using Support Vector Machines
In this paper, we address the issues in the design of an intermediate matching kernel (IMK) for classification of sequential patterns using a support vector machine (SVM) based classifier for tasks such as speech recognition. Specifically, we address the ...
Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays
Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that at the reproduction side a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound ...
A Class of Optimal Rectangular Filtering Matrices for Single-Channel Signal Enhancement in the Time Domain
In this paper, we introduce a new class of optimal rectangular filtering matrices for single-channel speech enhancement. The new class of filters exploits the fact that the dimension of the signal subspace is lower than that of the full space. By doing ...
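To give a flavour of time-domain signal-subspace filtering, the sketch below builds a rank-reduced, Wiener-style filtering matrix from the eigendecomposition of a noisy-frame covariance estimate under a white-noise assumption. It is a generic illustration only: the filtering matrix here is square, whereas the rectangular matrices proposed in the paper exploit the reduced subspace dimension differently.

```python
import numpy as np

def subspace_filter_matrix(frames, noise_var, rank):
    """Generic signal-subspace filter: eigendecompose the noisy-frame
    covariance, keep `rank` principal directions, and apply Wiener-like
    gains (lam - sigma^2) / lam in that subspace."""
    Ry = frames @ frames.T / frames.shape[1]       # L x L covariance estimate
    lam, U = np.linalg.eigh(Ry)                    # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]                 # reorder to descending
    gains = np.clip((lam[:rank] - noise_var) / lam[:rank], 0.0, 1.0)
    Ur = U[:, :rank]
    return Ur @ np.diag(gains) @ Ur.T              # L x L filtering matrix

# Toy usage: enhance non-overlapping frames of a noisy sinusoid.
L, fs = 40, 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noise_var = 0.1
noisy = clean + np.sqrt(noise_var) * np.random.default_rng(1).standard_normal(t.size)
frames = np.stack([noisy[i:i + L] for i in range(0, t.size - L + 1, L)], axis=1)
clean_frames = np.stack([clean[i:i + L] for i in range(0, t.size - L + 1, L)], axis=1)
H = subspace_filter_matrix(frames, noise_var, rank=4)
enhanced = H @ frames
print("MSE before:", np.mean((frames - clean_frames) ** 2),
      "after:", np.mean((enhanced - clean_frames) ** 2))
```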
Understanding Effects of Subjectivity in Measuring Chord Estimation Accuracy
To assess the performance of an automatic chord estimation system, reference annotations are indispensable. However, owing to the complexity and sometimes ambiguous harmonic structure of polyphonic music, chord annotations are inherently ...
Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs
Today's speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture models whose parameters are estimated using a discriminative training criterion such as Maximum Mutual Information (MMI) or Minimum Phone Error (MPE). ...
Declipping of Audio Signals Using Perceptual Compressed Sensing
The restoration of clipped audio signals, commonly known as declipping, is important to achieve an improved level of audio quality in many audio applications. In this paper, a novel declipping algorithm is presented, jointly based on the theory of ...
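As a simplified illustration of the compressed-sensing view of declipping (without the perceptual weighting that is central to the paper), the sketch below restores hard-clipped samples by finding a sparse DCT representation that agrees with the reliable (unclipped) samples, using a small orthogonal matching pursuit loop. The test signal, clipping level, and sparsity level are arbitrary.

```python
import numpy as np
from scipy.fft import idct

def omp(A, y, k):
    """Plain orthogonal matching pursuit: greedily select k columns of A
    and least-squares fit y on the selected support."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        corr = np.abs(A.T @ residual)
        corr[support] = 0.0                        # do not re-pick chosen atoms
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# Toy signal: two cosines aligned with DCT-II atoms (so it is exactly
# 2-sparse in the DCT domain), hard-clipped at +/- 0.6.
N = 256
n = np.arange(N)
clean = (0.7 * np.cos(np.pi * (2 * n + 1) * 32 / (2 * N))
         + 0.4 * np.cos(np.pi * (2 * n + 1) * 80 / (2 * N)))
clipped = np.clip(clean, -0.6, 0.6)
reliable = np.abs(clipped) < 0.6                   # samples assumed unclipped

D = idct(np.eye(N), axis=0, norm="ortho")          # DCT synthesis dictionary (columns)
coeffs = omp(D[reliable, :], clipped[reliable], k=8)
restored = D @ coeffs
restored[reliable] = clipped[reliable]             # keep reliable samples as observed
print("max error on clipped samples:",
      np.max(np.abs(restored[~reliable] - clean[~reliable])))
```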
List of Reviewers
Lists the reviewers who contributed to IEEE Transactions on Audio, Speech, and Language Processing in 2013.