
US20070203694A1 - Single-sided speech quality measurement - Google Patents

Info

Publication number
US20070203694A1
Authority
US
United States
Prior art keywords
speech
speech quality
models
consistency
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/364,252
Inventor
Wai-Yip Chan
Tiago Falk
Mohamed El-Hennawey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arlington Technologies LLC
Avaya Management LP
Original Assignee
Nortel Networks Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US11/364,252
Assigned to NORTEL NETWORKS LIMITED: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHAN, WAI-YIP, FALK, TIAGO H., EL-HENNAWEY, MOHAMED
Application filed by Nortel Networks Ltd
Publication of US20070203694A1
Assigned to CITIBANK, N.A., AS ADMINISTRATIVE AGENT: SECURITY AGREEMENT. Assignors: AVAYA INC.
Assigned to CITICORP USA, INC., AS ADMINISTRATIVE AGENT: SECURITY AGREEMENT. Assignors: AVAYA INC.
Assigned to AVAYA INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: NORTEL NETWORKS LIMITED
Assigned to BANK OF NEW YORK MELLON TRUST, NA, AS NOTES COLLATERAL AGENT, THE: SECURITY AGREEMENT. Assignors: AVAYA INC., A DELAWARE CORPORATION
Priority to US13/195,338 (US9786300B2)
Assigned to AVAYA INC.: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 025863/0535. Assignors: THE BANK OF NEW YORK MELLON TRUST, NA
Assigned to AVAYA INC.: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 023892/0500. Assignors: CITIBANK, N.A.
Assigned to SIERRA HOLDINGS CORP., AVAYA, INC.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: CITICORP USA, INC.
Assigned to AVAYA LLC, AVAYA MANAGEMENT L.P.: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT. Assignors: CITIBANK, N.A.
Assigned to AVAYA LLC, AVAYA MANAGEMENT L.P.: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT. Assignors: WILMINGTON SAVINGS FUND SOCIETY, FSB
Assigned to ARLINGTON TECHNOLOGIES, LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AVAYA LLC
Current legal status: Abandoned

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/69: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.

Description

    FIELD OF THE INVENTION
  • This invention relates generally to the field of telecommunications, and more particularly to single-ended measurement of speech quality.
    BACKGROUND OF THE INVENTION
  • The capability of measuring speech quality in a telecommunications network is important to telecommunications service providers. Measurements of speech quality can be employed to assist with network maintenance and troubleshooting, and can also be used to evaluate new technologies, protocols and equipment. However, anticipating how people will perceive speech quality can be difficult. The traditional technique for measuring speech quality is a subjective listening test. In a subjective listening test, a group of people listen to speech samples and score their quality according to, e.g., an Absolute Category Rating (“ACR”) scale: Bad (1), Poor (2), Fair (3), Good (4), Excellent (5). The average of the scores, known as a Mean Opinion Score (“MOS”), is then calculated and used to characterize the performance of speech codecs, transmission equipment, and networks. Other kinds of subjective tests and scoring schemes may also be used, e.g., degradation mean opinion scores (“DMOS”). Regardless of the scoring scheme, subjective listening tests are time consuming and costly.
  • Machine-automated, “objective” measurement is known as an alternative to subjective listening tests. Objective measurement provides a rapid and economical means to estimate user opinion, and makes it possible to perform real-time speech quality measurement on a network-wide scale. Objective measurement can be performed either intrusively or non-intrusively. Intrusive measurement, also called double-ended or input-output-based measurement, is based on measuring the distortion between the received and transmitted speech signals, often with an underlying requirement that the transmitted signal be a “clean” signal of high quality. Non-intrusive measurement, also called single-ended or output-based measurement, does not require the clean signal to estimate quality. In a working commercial network it may be difficult to provide both the clean signal and the received speech signal to the test equipment because of the distances between endpoints. Consequently, non-intrusive techniques should be more practical for implementation outside of a test facility because they do not require a clean signal.
  • Several non-intrusive measurement schemes are known. In C. Jin and R. Kubichek, “Vector quantization techniques for output-based objective speech quality,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, May 1996, pp. 491-494, comparisons between features of the received speech signal and vector quantizer (“VQ”) codebook representations of the features of clean speech are used to estimate quality. In W. Li and R. Kubichek, “Output-based objective speech quality measurement using continuous hidden Markov models,” in Proc. 7th Int. Symp. Signal Processing Applications, vol. 1, July 2003, pp. 389-392, the VQ codebook reference is replaced with a hidden Markov model. In P. Gray, M. P. Hollier, and R. E. Massara, “Non-intrusive speech-quality assessment using vocal-tract models,” Proc. Inst. Elect. Eng., Vision, Image, Signal Process., vol. 147, no. 6, pp. 493-501, December 2000, and D. S. Kim, “ANIQUE: An auditory model for single-ended speech quality estimation,” IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 821-831, September 2005, vocal tract modeling and modulation-spectral features derived from the temporal envelope of speech, respectively, provide quality cues for non-intrusive quality measurement. More recently, a non-intrusive method using neurofuzzy inference was proposed in G. Chen and V. Parsa, “Nonintrusive speech quality evaluation using an adaptive neurofuzzy inference system,” IEEE Signal Process. Lett., vol. 12, no. 5, pp. 403-406, May 2005. The International Telecommunication Union ITU-T P.563 standard represents the “state-of-the-art” algorithm: ITU-T P.563, Single Ended Method for Objective Speech Quality Assessment in Narrow-Band Telephony Applications, International Telecommunication Union, Geneva, Switzerland, May 2004. However, each of these known non-intrusive measurement schemes is computationally intensive relative to the capabilities of equipment which could currently be widely deployed at low cost. Consequently, a less computationally intensive non-intrusive solution would be desirable in order to facilitate deployment outside of test facilities.
    SUMMARY OF THE INVENTION
  • In accordance with one embodiment of the invention, a single-ended speech quality measurement method comprises the steps of: extracting perceptual features from a received speech signal; assessing the perceptual features with at least one statistical model of the features to form indicators of speech quality; and employing the indicators of speech quality to produce a speech quality score.
  • In accordance with another embodiment of the invention, apparatus operable to provide a single-ended speech quality measurement, comprises: a feature extraction module operable to extract perceptual features from a received speech signal; a statistical reference model and consistency calculation module operable in response to output from the feature extraction module to assess the perceptual features to form indicators of speech quality; and a scoring module operable to employ the indicators of speech quality to produce a speech quality score.
  • One advantage of the inventive technique is reduction of processing requirements for speech quality measurement without significant degradation in performance. Simulations with Perceptual Linear Prediction (“PLP”) coefficients have shown that the inventive technique can outperform P.563 by up to 44.74% in correlation R for SMV coded speech under noisy conditions. The inventive technique is comparable to P.563 under various other conditions. An average 40% reduction in processing time was obtained compared to P.563, even though P.563 was implemented in a faster procedural computer language than the interpreted language used to run the inventive technique. Thus, the speedup that can be obtained from the inventive technique programmed in a procedural language such as C is expected to be much greater.
    BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is a block diagram of a non-intrusive measurement technique including a statistical reference model.
    DETAILED DESCRIPTION
  • FIG. 1 illustrates a relatively easily calculable non-intrusive measurement technique. The input is a speech (“test”) signal for which a subjective quality score is to be estimated (100), e.g., a speech signal that has been processed by network equipment, transmitted on a communications link, or both. A feature extraction module (102) is employed to extract perceptual features, frame by frame, from the test signal. A time segmentation module (104) labels the feature vector of each frame as belonging to one of three possible segment classes: voiced, unvoiced, or inactive. In a separate process, statistical or probability models such as Gaussian Mixture Models are formed. The terms “statistical model” and “statistical reference model” as used herein encompass probability models, statistical probability models and the like, as those terms are understood in the art. Different models may be formed for different classes of speech signals. For instance, one class could be high-quality, undistorted speech signal. Other classes could be speech impaired by different types of distortions. A distinct model may be used for each of the segment classes in each speech signal class, or one single model may be used for a speech class with no distinction between segments. The different statistical models together comprise a reference model (106) of the behavior of speech features. Features extracted from the test signal (100) are assessed using the reference model by calculating a “consistency” measure with respect to each statistical model via a consistency calculation module (108). The consistency values serve as indicators of speech quality and are mapped to an estimated subjective score, such as Mean Opinion Score (“MOS”), degradation mean opinion score (“DMOS”), or some other type of subjective score, using a mapping module (110), thereby producing an estimated score (112).
  • Referring now to the feature extraction module (102), perceptual linear prediction (“PLP”) cepstral coefficients serve as primary features and are extracted from the speech signal every 10 ms. The coefficients are obtained from an “auditory spectrum” constructed to exploit three psychoacoustic precepts: critical band spectral resolution, equal-loudness curve, and intensity loudness power law. The auditory spectrum is approximated by an all-pole auto-regressive model, the coefficients of which are transformed to PLP cepstral coefficients. The order of the auto-regressive model determines the amount of detail in the auditory spectrum preserved by the model. Higher order models tend to preserve more speaker-dependent information. Since the illustrated embodiment is directed to measuring quality variation due to the transmission system rather than the speaker, speaker independence is a desirable property. In the illustrated embodiment fifth-order PLP coefficients as described in H. Hermansky, “Perceptual linear prediction (PLP) analysis of speech,” J. Acoust. Soc. Amer., vol. 87, pp. 1738-1752, 1990, (“Hermansky”), which is incorporated by reference, are employed as speaker-independent speech spectral parameters. Other types of features, such as RASTA-PLP, may also be employed in lieu of PLP.
  • Referring now to the time segmentation module (104), time segmentation is employed to separate the speech frames into different classes. Each class appears to exert a different influence on the overall speech quality. Time segmentation is performed using a voice activity detector (“VAD”) and a voicing detector. The VAD identifies each 10-ms speech frame as being active or inactive. The voicing detector further labels active frames as voiced or unvoiced. In the illustrated embodiment the VAD from ITU-T Rec. G.729-Annex B, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70, International Telecommunication Union, Geneva, Switzerland, November 1996, which is incorporated by reference, is employed.
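  • The three-way labelling described above can be sketched in a few lines. This is only an illustration: it does not implement the G.729 Annex B VAD, and it substitutes a simple energy threshold for activity detection and a zero-crossing-rate threshold for voicing. The function name `label_frames`, the 8 kHz sample rate, and both thresholds are assumptions chosen for the example, not values from the patent.

```python
import numpy as np

FRAME_MS = 10  # frame length stated in the text


def label_frames(signal, sample_rate=8000, energy_floor_db=-40.0, zcr_threshold=0.25):
    """Label each 10 ms frame as 'voiced', 'unvoiced', or 'inactive'.

    Hypothetical stand-in for the VAD and voicing detector:
    - a frame is active if its energy is within `energy_floor_db`
      of the loudest frame in the signal;
    - an active frame is voiced if its zero-crossing rate is low.
    """
    frame_len = sample_rate * FRAME_MS // 1000
    n_frames = len(signal) // frame_len
    samples = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = samples.reshape(n_frames, frame_len)

    # Per-frame energy in dB (small offset avoids log of zero).
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > energy_db.max() + energy_floor_db

    # Fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    voicing = np.where(zcr < zcr_threshold, "voiced", "unvoiced")
    return np.where(active, voicing, "inactive")
```

    A production system would replace both heuristics with the detectors named in the text; the sketch only shows where the three class labels come from and that every frame receives exactly one of them.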
  • Referring to the GMM reference model (106), where u is a K-dimensional feature vector, a Gaussian mixture density is a weighted sum of M component densities:
    $$p(u \mid \lambda) = \sum_{i=1}^{M} \alpha_i \, b_i(u) \qquad \text{(Eq. 1)}$$
    where $\alpha_i \ge 0$, $i = 1, \ldots, M$, are the mixture weights, with $\sum_{i=1}^{M} \alpha_i = 1$, and $b_i(u)$, $i = 1, \ldots, M$, are K-variate Gaussian densities with mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The parameter list $\lambda = \{\lambda_1, \ldots, \lambda_M\}$ defines a particular Gaussian mixture density, where $\lambda_i = \{\mu_i, \Sigma_i, \alpha_i\}$. GMM parameters are initialized using the k-means algorithm described in A. Gersho and R. Gray, Vector Quantization and Signal Compression. Norwell, Mass.: Kluwer, 1992, which is incorporated by reference, and estimated using the expectation-maximization (“EM”) algorithm described in A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statistical Society, vol. 39, pp. 1-38, 1977, which is incorporated by reference. The EM algorithm iterations produce a sequence of models with monotonically non-decreasing log-likelihood (“LL”) values. The algorithm is deemed to have converged when the difference of LL values between two consecutive iterations drops below $10^{-3}$.
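  • As an illustration of Eq. 1, the following sketch evaluates the log of a Gaussian mixture density at a single feature vector. It assumes the mixture parameters have already been trained (the k-means initialization and EM steps are not shown), and the function name `gmm_log_density` and its argument layout are hypothetical. Working in the log domain is a design choice of this sketch to avoid numerical underflow; the patent does not prescribe it.

```python
import numpy as np


def gmm_log_density(u, weights, means, covariances):
    """Return log p(u | lambda) for the mixture in Eq. 1.

    weights     -- M mixture weights alpha_i (non-negative, summing to 1)
    means       -- M mean vectors mu_i, each of length K
    covariances -- M covariance matrices Sigma_i, each K x K
    """
    u = np.asarray(u, dtype=float)
    k = u.shape[0]
    log_terms = []
    for alpha, mu, sigma in zip(weights, means, covariances):
        sigma = np.asarray(sigma, dtype=float)
        diff = u - np.asarray(mu, dtype=float)
        # log of the K-variate Gaussian density b_i(u)
        _, logdet = np.linalg.slogdet(sigma)
        mahalanobis = diff @ np.linalg.solve(sigma, diff)
        log_b = -0.5 * (k * np.log(2.0 * np.pi) + logdet + mahalanobis)
        log_terms.append(np.log(alpha) + log_b)
    # log of the weighted sum, computed stably from the per-component logs
    return float(np.logaddexp.reduce(log_terms))
```

    For a one-component mixture with zero mean and unit variance in one dimension, the value at u = 0 reduces to the familiar −½·log(2π).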
  • Referring specifically to the reference model (106), a GMM is used to model the PLP cepstral coefficients of each class of speech frames. For instance, consider the class of clean speech signals. Three different Gaussian mixture densities $p_{\text{class}}(u \mid \lambda)$ are trained. The subscript “class” represents either voiced, unvoiced, or inactive frames. In principle, by evaluating a statistical model at the PLP cepstral coefficients x of the test signal, i.e., $p_{\text{class}}(x \mid \lambda)$, a measure of consistency between the coefficient vector and the statistical model is obtained. Voiced coefficient vectors are applied to $p_{\text{voiced}}(u \mid \lambda)$, unvoiced vectors to $p_{\text{unvoiced}}(u \mid \lambda)$, and inactive vectors to $p_{\text{inactive}}(u \mid \lambda)$.
  • Referring now to the consistency calculation module (108), it should be noted that a simplifying assumption is made that vectors between frames are independent. Improved performance might be obtained from more sophisticated approaches that model the statistical dependency between frames, such as Markov modeling. Nevertheless, a model with low computational complexity has benefits as already discussed above. For a given speech signal whose feature vectors have been classified as described above, the consistency between the feature vectors of a class and the statistical model of that class is calculated as
    $$c_{\text{class}}(x_1, \ldots, x_{N_{\text{class}}}) = \frac{1}{N_{\text{class}}} \sum_{j=1}^{N_{\text{class}}} \log\big(p_{\text{class}}(x_j \mid \lambda)\big) \qquad \text{(Eq. 2)}$$
    where $x_1, \ldots, x_{N_{\text{class}}}$ are the feature vectors in the class, and $N_{\text{class}}$ is the number of such vectors in the statistical model class. Larger $c_{\text{class}}$ indicates greater consistency. $c_{\text{class}}$ is set to zero whenever $N_{\text{class}}$ is zero. For each class, the product of the consistency measure $c_{\text{class}}$ and the fraction of frames of that class in the speech signal is calculated. The products for all the model classes serve as quality indicators to be mapped to an objective estimate of the subjective score value.
  • Referring now to the mapping module (110), mapping functions which may be utilized include multivariate polynomial regression and multivariate adaptive regression splines (“MARS”), as described in J. H. Friedman, “Multivariate adaptive regression splines,” The Annals of Statistics, vol. 19, no. 1, pp. 1-141, March 1991. With MARS, the mapping is constructed as a weighted sum of basis functions, each taking the form of a truncated spline
  • While the invention is described through the above exemplary embodiments, it will be understood by those of ordinary skill in the art that modification to and variation of the illustrated embodiments may be made without departing from the inventive concepts herein disclosed. Moreover, while the preferred embodiments are described in connection with various illustrative structures, one skilled in the art will recognize that the system may be embodied using a variety of specific structures. Accordingly, the invention should not be viewed as limited except by the scope and spirit of the appended claims.

Claims (18)

1. A single-ended speech quality measurement method comprising the steps of:
extracting perceptual features from a speech signal;
assessing the perceptual features with statistical models to form indicators of speech quality; and
employing the indicators of speech quality to produce a speech quality score.
2. The method of claim 1 including the further step of extracting the perceptual features from the received speech signal frame-by-frame.
3. The method of claim 2 further including the step of classifying the frames by signal contents.
4. The method of claim 3 including the further step of separately modeling the probability distribution of the features for each frame class and different classes of speech signals with statistical models.
5. The method of claim 4 including the further step of calculating a consistency measure indicative of speech quality for each class separately with a plurality of statistical models.
6. The method of claim 5 including the further step of employing the consistency measures to obtain an estimate of subjective scores.
7. The method of claim 6 including the further step of mapping the consistency measures to a speech quality score using a mapping, such as Multivariate Adaptive Regression Splines.
8. The method of claim 1 wherein the perceptual features are assessed with Gaussian Mixture Models to form indicators of speech quality.
9. The method of claim 4 wherein the classes include voiced, unvoiced, and inactive.
10. Apparatus operable to provide a single-ended speech quality measurement, comprising:
a feature extraction module operable to extract perceptual features from a received speech signal;
a statistical reference model and consistency calculation module operable in response to output from the feature extraction module to assess the perceptual features to form indicators of speech quality; and
a scoring module operable to employ the indicators of speech quality to produce a speech quality score.
11. The apparatus of claim 10 wherein the feature extraction module is further operable to extract the perceptual features from the received speech signal frame-by-frame.
12. The apparatus of claim 11 further including a time segmentation module operable to classify the frames by signal contents.
13. The apparatus of claim 12 wherein the consistency calculation module is further operable to separately model the probability distribution of the features for each class and different classes of speech signals with the statistical models.
14. The apparatus of claim 13 wherein the consistency calculation module is further operable to calculate a consistency measure indicative of speech quality for each class separately with a plurality of Gaussian Mixture Models.
15. The apparatus of claim 14 further including a mapping module operable to employ the consistency measures to obtain an estimate of subjective scores.
16. The apparatus of claim 15 wherein the mapping module employs a mapping, such as one optimized using Multivariate Adaptive Regression Splines.
17. The apparatus of claim 10 wherein the statistical reference model includes Gaussian Mixture Models.
18. The apparatus of claim 13 wherein the classes include voiced, unvoiced, and inactive.
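The processing chain recited in claims 10 to 18 (frame-by-frame feature extraction, classification of frames, per-class statistical reference models, consistency measures, and mapping to a score) can be illustrated with a minimal sketch. It is not the claimed implementation, and every specific in it is an assumption made for illustration: the 160-sample frame length, the toy features (log energy and zero-crossing rate) standing in for perceptual features, the classification thresholds, the single diagonal Gaussian per class standing in for Gaussian Mixture Models, the average log-likelihood used as the consistency measure, and the clipped linear mapping standing in for a mapping optimized with Multivariate Adaptive Regression Splines. All function names and model values are hypothetical.

```python
import math

FRAME_LEN = 160  # assumed: 20 ms frames at an 8 kHz sampling rate


def frames(signal, frame_len=FRAME_LEN):
    """Split the signal into non-overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def extract_features(frame):
    """Toy stand-in for perceptual features: (log energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))


def classify(features, energy_floor=-9.0, zcr_split=0.25):
    """Label a frame as inactive, voiced, or unvoiced (assumed thresholds)."""
    log_energy, zcr = features
    if log_energy < energy_floor:
        return "inactive"
    return "voiced" if zcr < zcr_split else "unvoiced"


def log_likelihood(features, model):
    """Log density under a diagonal Gaussian given as (mean, variance) pairs.

    A GMM-based system would combine several such components per class.
    """
    total = 0.0
    for x, (mean, var) in zip(features, model):
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total


def consistency_measures(signal, reference_models):
    """Average log-likelihood per frame class under that class's reference model."""
    sums, counts = {}, {}
    for frame in frames(signal):
        feats = extract_features(frame)
        label = classify(feats)
        sums[label] = sums.get(label, 0.0) + log_likelihood(feats, reference_models[label])
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def map_to_score(measures, weights, bias):
    """Placeholder linear mapping, clipped to the 1 to 5 opinion-score range."""
    raw = bias + sum(weights[label] * value for label, value in measures.items())
    return max(1.0, min(5.0, raw))


# Hypothetical reference models: one (mean, variance) pair per feature dimension.
REFERENCE = {
    "voiced": [(-2.0, 1.0), (0.05, 0.01)],
    "unvoiced": [(-4.0, 1.0), (0.4, 0.02)],
    "inactive": [(-14.0, 4.0), (0.1, 0.05)],
}

# A 200 Hz tone serves as a stand-in for a received speech signal.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
measures = consistency_measures(tone, REFERENCE)
score = map_to_score(measures, {"voiced": 0.5, "unvoiced": 0.5, "inactive": 0.1}, 4.5)
```

Keeping one reference model per frame class mirrors the separation recited in claims 13 and 14: each class contributes its own consistency measure, and only the final mapping stage combines them into a single score.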
US11/364,252 2006-02-28 2006-02-28 Single-sided speech quality measurement Abandoned US20070203694A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/364,252 US20070203694A1 (en) 2006-02-28 2006-02-28 Single-sided speech quality measurement
US13/195,338 US9786300B2 (en) 2006-02-28 2011-08-01 Single-sided speech quality measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/364,252 US20070203694A1 (en) 2006-02-28 2006-02-28 Single-sided speech quality measurement

Related Child Applications (1)

Application Number Priority Date Filing Date Title
US13/195,338 Continuation US9786300B2 (en) 2006-02-28 2011-08-01 Single-sided speech quality measurement

Publications (1)

Publication Number Publication Date
US20070203694A1 (en) 2007-08-30

Family

ID=38445095

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/364,252 Abandoned US20070203694A1 (en) 2006-02-28 2006-02-28 Single-sided speech quality measurement
US13/195,338 Active 2027-05-19 US9786300B2 (en) 2006-02-28 2011-08-01 Single-sided speech quality measurement

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/195,338 Active 2027-05-19 US9786300B2 (en) 2006-02-28 2011-08-01 Single-sided speech quality measurement

Country Status (1)

Country Link
US (2) US20070203694A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233469A1 (en) * 2006-03-30 2007-10-04 Industrial Technology Research Institute Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof
US20130080172A1 (en) * 2011-09-22 2013-03-28 General Motors Llc Objective evaluation of synthesized speech attributes
US20140188470A1 (en) * 2012-12-31 2014-07-03 Jenny Chang Flexible architecture for acoustic signal processing engine
CN116504274A (en) * 2023-05-30 2023-07-28 南开大学 Non-invasive voice quality evaluation method enhanced by retrieval
US12051440B1 (en) * 2023-04-12 2024-07-30 Civil Aviation Flight University Of China Self-attention-based speech quality measuring method and system for real-time air traffic control
US12093314B2 (en) * 2019-11-22 2024-09-17 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Accompaniment classification method and apparatus

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9830905B2 (en) 2013-06-26 2017-11-28 Qualcomm Incorporated Systems and methods for feature extraction
US9685173B2 (en) * 2013-09-06 2017-06-20 Nuance Communications, Inc. Method for non-intrusive acoustic parameter estimation
US9870784B2 (en) 2013-09-06 2018-01-16 Nuance Communications, Inc. Method for voicemail quality detection
US9917952B2 (en) 2016-03-31 2018-03-13 Dolby Laboratories Licensing Corporation Evaluation of perceptual delay impact on conversation in teleconferencing system
US11495244B2 (en) 2018-04-04 2022-11-08 Pindrop Security, Inc. Voice modification detection using physical models of speech production
CN110047466B (en) * 2019-04-16 2021-04-13 深圳市数字星河科技有限公司 Method for openly creating voice reading standard reference model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715372A (en) * 1995-01-10 1998-02-03 Lucent Technologies Inc. Method and apparatus for characterizing an input signal
US5794188A (en) * 1993-11-25 1998-08-11 British Telecommunications Public Limited Company Speech signal distortion measurement which varies as a function of the distribution of measured distortion over time and frequency
US5848384A (en) * 1994-08-18 1998-12-08 British Telecommunications Public Limited Company Analysis of audio quality using speech recognition and synthesis
US5999900A (en) * 1993-06-21 1999-12-07 British Telecommunications Public Limited Company Reduced redundancy test signal similar to natural speech for supporting data manipulation functions in testing telecommunications equipment
US6266638B1 (en) * 1999-03-30 2001-07-24 At&T Corp Voice quality compensation system for speech synthesis based on unit-selection speech database
US6477492B1 (en) * 1999-06-15 2002-11-05 Cisco Technology, Inc. System for automated testing of perceptual distortion of prompts from voice response systems
US6564181B2 (en) * 1999-05-18 2003-05-13 Worldcom, Inc. Method and system for measurement of speech distortion from samples of telephonic voice signals
US6609092B1 (en) * 1999-12-16 2003-08-19 Lucent Technologies Inc. Method and apparatus for estimating subjective audio signal quality from objective distortion measures
US6625785B2 (en) * 2000-04-19 2003-09-23 Georgia Tech Research Corporation Method for diagnosing process parameter variations from measurements in analog circuits
US20040186715A1 (en) * 2003-01-18 2004-09-23 Psytechnics Limited Quality assessment tool
US7313517B2 (en) * 2003-03-31 2007-12-25 Koninklijke Kpn N.V. Method and system for speech quality prediction of an audio transmission system
US7526394B2 (en) * 2003-01-21 2009-04-28 Psytechnics Limited Quality assessment tool

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE60006995T2 (en) * 1999-11-08 2004-10-28 British Telecommunications P.L.C. NON-INFLUENCING ASSESSMENT OF LANGUAGE QUALITY
GB2407952B (en) * 2003-11-07 2006-11-29 Psytechnics Ltd Quality assessment tool
US20060200346A1 (en) * 2005-03-03 2006-09-07 Nortel Networks Ltd. Speech quality measurement based on classification estimation
US7856355B2 (en) * 2005-07-05 2010-12-21 Alcatel-Lucent Usa Inc. Speech quality assessment method and system
CN101411171B (en) * 2006-01-31 2013-05-08 艾利森电话股份有限公司 Non-intrusive signal quality assessment

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999900A (en) * 1993-06-21 1999-12-07 British Telecommunications Public Limited Company Reduced redundancy test signal similar to natural speech for supporting data manipulation functions in testing telecommunications equipment
US5794188A (en) * 1993-11-25 1998-08-11 British Telecommunications Public Limited Company Speech signal distortion measurement which varies as a function of the distribution of measured distortion over time and frequency
US5848384A (en) * 1994-08-18 1998-12-08 British Telecommunications Public Limited Company Analysis of audio quality using speech recognition and synthesis
US5715372A (en) * 1995-01-10 1998-02-03 Lucent Technologies Inc. Method and apparatus for characterizing an input signal
US6266638B1 (en) * 1999-03-30 2001-07-24 At&T Corp Voice quality compensation system for speech synthesis based on unit-selection speech database
US6564181B2 (en) * 1999-05-18 2003-05-13 Worldcom, Inc. Method and system for measurement of speech distortion from samples of telephonic voice signals
US6477492B1 (en) * 1999-06-15 2002-11-05 Cisco Technology, Inc. System for automated testing of perceptual distortion of prompts from voice response systems
US6609092B1 (en) * 1999-12-16 2003-08-19 Lucent Technologies Inc. Method and apparatus for estimating subjective audio signal quality from objective distortion measures
US6625785B2 (en) * 2000-04-19 2003-09-23 Georgia Tech Research Corporation Method for diagnosing process parameter variations from measurements in analog circuits
US20040186715A1 (en) * 2003-01-18 2004-09-23 Psytechnics Limited Quality assessment tool
US7526394B2 (en) * 2003-01-21 2009-04-28 Psytechnics Limited Quality assessment tool
US7313517B2 (en) * 2003-03-31 2007-12-25 Koninklijke Kpn N.V. Method and system for speech quality prediction of an audio transmission system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233469A1 (en) * 2006-03-30 2007-10-04 Industrial Technology Research Institute Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof
US7801725B2 (en) * 2006-03-30 2010-09-21 Industrial Technology Research Institute Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof
US20130080172A1 (en) * 2011-09-22 2013-03-28 General Motors Llc Objective evaluation of synthesized speech attributes
US20140188470A1 (en) * 2012-12-31 2014-07-03 Jenny Chang Flexible architecture for acoustic signal processing engine
US9653070B2 (en) * 2012-12-31 2017-05-16 Intel Corporation Flexible architecture for acoustic signal processing engine
US12093314B2 (en) * 2019-11-22 2024-09-17 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Accompaniment classification method and apparatus
US12051440B1 (en) * 2023-04-12 2024-07-30 Civil Aviation Flight University Of China Self-attention-based speech quality measuring method and system for real-time air traffic control
CN116504274A (en) * 2023-05-30 2023-07-28 南开大学 Non-invasive voice quality evaluation method enhanced by retrieval

Also Published As

Publication number Publication date
US20110288865A1 (en) 2011-11-24
US9786300B2 (en) 2017-10-10

Similar Documents

Publication Publication Date Title
US9786300B2 (en) Single-sided speech quality measurement
Falk et al. Single-ended speech quality measurement using machine learning methods
US8195449B2 (en) Low-complexity, non-intrusive speech quality assessment
Malfait et al. P. 563—The ITU-T standard for single-ended speech quality assessment
Rix et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
EP1083541B1 (en) A method and apparatus for speech detection
US6609092B1 (en) Method and apparatus for estimating subjective audio signal quality from objective distortion measures
Rix Perceptual speech quality assessment-a review
US20060200346A1 (en) Speech quality measurement based on classification estimation
Falk et al. Non-intrusive GMM-based speech quality measurement
Dubey et al. Non-intrusive speech quality assessment using several combinations of auditory features
Falk et al. Nonintrusive speech quality estimation using Gaussian mixture models
Chen et al. Bayesian model based non-intrusive speech quality evaluation
Picovici et al. Output-based objective speech quality measure using self-organizing map
Huber et al. Single-ended speech quality prediction based on automatic speech recognition
Picovici et al. New output-based perceptual measure for predicting subjective quality of speech
Zafar et al. Speech quality assessment using mel frequency spectrograms of speech signals
Kim A cue for objective speech quality estimation in temporal envelope representations
Falk et al. Enhanced non-intrusive speech quality measurement using degradation models
Hinterleitner et al. Comparison of approaches for instrumentally predicting the quality of text-to-speech systems: Data from Blizzard Challenges 2008 and 2009
Mahdi Perceptual non‐intrusive speech quality assessment using a self‐organizing map
Mahdi et al. New single-ended objective measure for non-intrusive speech quality evaluation
Zha et al. A data mining approach to objective speech quality measurement
Wang et al. Non-intrusive objective speech quality measurement based on GMM and SVR for narrowband and wideband speech
Aburas et al. Symbian Based Perceptual Evaluation of Speech Quality for Telecommunication Networks

Legal Events

Date Code Title Description
AS Assignment

Owner name: NORTEL NETWORKS LIMITED, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHAN, WAI-YIP;FALK, TIAGO H.;EL-HENNAWEY, MOHAMED;REEL/FRAME:017630/0230;SIGNING DATES FROM 20060227 TO 20060228

AS Assignment

Owner name: CITIBANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023892/0500

Effective date: 20100129

AS Assignment

Owner name: CITICORP USA, INC., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023905/0001

Effective date: 20100129

AS Assignment

Owner name: AVAYA INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NORTEL NETWORKS LIMITED;REEL/FRAME:023998/0878

Effective date: 20091218

AS Assignment

Owner name: BANK OF NEW YORK MELLON TRUST, NA, AS NOTES COLLATERAL AGENT, THE, PENNSYLVANIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC., A DELAWARE CORPORATION;REEL/FRAME:025863/0535

Effective date: 20110211

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 023892/0500;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044891/0564

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 025863/0535;ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST, NA;REEL/FRAME:044892/0001

Effective date: 20171128

AS Assignment

Owner name: SIERRA HOLDINGS CORP., NEW JERSEY

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564

Effective date: 20171215

Owner name: AVAYA, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564

Effective date: 20171215

AS Assignment

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227

Effective date: 20240325

Owner name: AVAYA LLC, DELAWARE

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227

Effective date: 20240325

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117

Effective date: 20240325

Owner name: AVAYA LLC, DELAWARE

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117

Effective date: 20240325

AS Assignment

Owner name: ARLINGTON TECHNOLOGIES, LLC, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AVAYA LLC;REEL/FRAME:067022/0780

Effective date: 20240329