US20070203694A1 - Single-sided speech quality measurement - Google Patents
Single-sided speech quality measurement
- Publication number
- US20070203694A1 (application US11/364,252)
- Authority
- US
- United States
- Prior art keywords
- speech
- speech quality
- models
- consistency
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/69—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
Abstract
Description
- This invention relates generally to the field of telecommunications, and more particularly to single-ended (non-intrusive) measurement of speech quality.
- The capability of measuring speech quality in a telecommunications network is important to telecommunications service providers. Measurements of speech quality can be employed to assist with network maintenance and troubleshooting, and can also be used to evaluate new technologies, protocols and equipment. However, anticipating how people will perceive speech quality can be difficult. The traditional technique for measuring speech quality is a subjective listening test, in which a group of people score the quality of speech by listening to it, according to, e.g., an Absolute Categorical Rating (“ACR”) scale: Bad (1), Poor (2), Fair (3), Good (4), Excellent (5). The average of the scores, known as a Mean Opinion Score (“MOS”), is then calculated and used to characterize the performance of speech codecs, transmission equipment, and networks. Other kinds of subjective tests and scoring schemes may also be used, e.g., degradation mean opinion scores (“DMOS”). Regardless of the scoring scheme, subjective listening tests are time consuming and costly.
- Machine-automated, “objective” measurement is known as an alternative to subjective listening tests. Objective measurement provides a rapid and economical means to estimate user opinion, and makes it possible to perform real-time speech quality measurement on a network-wide scale. Objective measurement can be performed either intrusively or non-intrusively. Intrusive measurement, also called double-ended or input-output-based measurement, is based on measuring the distortion between the received and transmitted speech signals, often with an underlying requirement that the transmitted signal be a “clean” signal of high quality. Non-intrusive measurement, also called single-ended or output-based measurement, does not require the clean signal to estimate quality. In a working commercial network it may be difficult to provide both the clean signal and the received speech signal to the test equipment because of the distances between endpoints. Consequently, non-intrusive techniques should be more practical for implementation outside of a test facility because they do not require a clean signal.
- Several non-intrusive measurement schemes are known. In C. Jin and R. Kubichek, “Vector quantization techniques for output-based objective speech quality,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, May 1996, pp. 491-494, comparisons between features of the received speech signal and vector quantizer (“VQ”) codebook representations of the features of clean speech are used to estimate quality. In W. Li and R. Kubichek, “Output-based objective speech quality measurement using continuous hidden Markov models,” in Proc. 7th Int. Symp. Signal Processing and its Applications, vol. 1, July 2003, pp. 389-392, the VQ codebook reference is replaced with a hidden Markov model. In P. Gray, M. P. Hollier, and R. E. Massara, “Non-intrusive speech-quality assessment using vocal-tract models,” Proc. Inst. Elect. Eng., Vision, Image, Signal Process., vol. 147, no. 6, pp. 493-501, December 2000, and D. S. Kim, “ANIQUE: An auditory model for single-ended speech quality estimation,” IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 821-831, September 2005, vocal tract modeling and modulation-spectral features derived from the temporal envelope of speech, respectively, provide quality cues for non-intrusive quality measurement. More recently, a non-intrusive method using neurofuzzy inference was proposed in G. Chen and V. Parsa, “Nonintrusive speech quality evaluation using an adaptive neurofuzzy inference system,” IEEE Signal Process. Lett., vol. 12, no. 5, pp. 403-406, May 2005. The International Telecommunication Union ITU-T P.563 standard represents the “state-of-the-art” algorithm: ITU-T P.563, Single Ended Method for Objective Speech Quality Assessment in Narrow-Band Telephony Applications, International Telecommunication Union, Geneva, Switzerland, May 2004. However, each of these known non-intrusive measurement schemes is computationally intensive relative to the capabilities of equipment which could currently be widely deployed at low cost. Consequently, a less computationally intensive non-intrusive solution would be desirable in order to facilitate deployment outside of test facilities.
- In accordance with one embodiment of the invention, a single-ended speech quality measurement method comprises the steps of: extracting perceptual features from a received speech signal; assessing the perceptual features with at least one statistical model of the features to form indicators of speech quality; and employing the indicators of speech quality to produce a speech quality score.
- In accordance with another embodiment of the invention, apparatus operable to provide a single-ended speech quality measurement comprises: a feature extraction module operable to extract perceptual features from a received speech signal; a statistical reference model and consistency calculation module operable, in response to output from the feature extraction module, to assess the perceptual features to form indicators of speech quality; and a scoring module operable to employ the indicators of speech quality to produce a speech quality score.
- One advantage of the inventive technique is reduction of processing requirements for speech quality measurement without significant degradation in performance. Simulations with Perceptual Linear Prediction (“PLP”) coefficients have shown that the inventive technique can outperform P.563 by up to 44.74% in correlation R for SMV coded speech under noisy conditions. The inventive technique is comparable to P.563 under various other conditions. An average 40% reduction in processing time was obtained compared to P.563, with P.563 implemented using a quicker procedural computer language than the interpretive language used to run the inventive technique. Thus, the speedup that can be obtained from the inventive technique programmed with a procedural language such as C is expected to be much greater.
- FIG. 1 is a block diagram of a non-intrusive measurement technique including a statistical reference model.
- FIG. 1 illustrates a relatively easily calculable non-intrusive measurement technique. The input is a speech (“test”) signal for which a subjective quality score is to be estimated (100), e.g., a speech signal that has been processed by network equipment, transmitted on a communications link, or both. A feature extraction module (102) is employed to extract perceptual features, frame by frame, from the test signal. A time segmentation module (104) labels the feature vector of each frame as belonging to one of three possible segment classes: voiced, unvoiced, or inactive. In a separate process, statistical or probability models such as Gaussian Mixture Models are formed. The terms “statistical model” and “statistical reference model” as used herein encompass probability models, statistical probability models and the like, as those terms are understood in the art. Different models may be formed for different classes of speech signals. For instance, one class could be high-quality, undistorted speech signals. Other classes could be speech impaired by different types of distortions. A distinct model may be used for each of the segment classes in each speech signal class, or a single model may be used for a speech class with no distinction between segments. The different statistical models together comprise a reference model (106) of the behavior of speech features. Features extracted from the test signal (100) are assessed using the reference model by calculating a “consistency” measure with respect to each statistical model via a consistency calculation module (108). The consistency values serve as indicators of speech quality and are mapped to an estimated subjective score, such as Mean Opinion Score (“MOS”), degradation mean opinion score (“DMOS”), or some other type of subjective score, using a mapping module (110), thereby producing an estimated score (112).
- Referring now to the feature extraction module (102), perceptual linear prediction (“PLP”) cepstral coefficients serve as primary features and are extracted from the speech signal every 10 ms. The coefficients are obtained from an “auditory spectrum” constructed to exploit three psychoacoustic precepts: critical-band spectral resolution, the equal-loudness curve, and the intensity-loudness power law. The auditory spectrum is approximated by an all-pole auto-regressive model, the coefficients of which are transformed to PLP cepstral coefficients. The order of the auto-regressive model determines the amount of detail in the auditory spectrum preserved by the model. Higher order models tend to preserve more speaker-dependent information. Since the illustrated embodiment is directed to measuring quality variation due to the transmission system rather than the speaker, speaker independence is a desirable property. In the illustrated embodiment, fifth-order PLP coefficients as described in H. Hermansky, “Perceptual linear prediction (PLP) analysis of speech,” J. Acoust. Soc. Amer., vol. 87, pp. 1738-1752, 1990 (“Hermansky”), which is incorporated by reference, are employed as speaker-independent speech spectral parameters. Other types of features, such as RASTA-PLP, may also be employed in lieu of PLP.
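As a rough illustration of the frame-by-frame interface of the feature extraction module (102), the sketch below frames a signal into 10-ms frames and computes a low-order real cepstrum per frame. This is only a simplified stand-in: true PLP analysis per Hermansky additionally applies critical-band integration, equal-loudness pre-emphasis, the intensity-loudness power law, and all-pole modeling. The function name and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def frame_features(signal, sample_rate, frame_ms=10, order=5):
    """Simplified stand-in for feature extraction module (102): one low-order
    cepstral vector per 10-ms frame (not a full PLP implementation)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    window = np.hamming(frame_len)
    feats = np.empty((n_frames, order))
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12   # frame power spectrum
        cepstrum = np.fft.irfft(np.log(power))            # real cepstrum of the frame
        feats[i] = cepstrum[1:order + 1]                   # keep low-order coefficients
    return feats

# Example: one second of 8 kHz noise yields 100 five-dimensional feature vectors.
rng = np.random.default_rng(0)
print(frame_features(rng.standard_normal(8000), 8000).shape)   # (100, 5)
```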
- Referring now to the time segmentation module (104), time segmentation is employed to separate the speech frames into different classes. Each class appears to exert a different influence on the overall speech quality. Time segmentation is performed using a voice activity detector (“VAD”) and a voicing detector. The VAD identifies each 10-ms speech frame as being active or inactive. The voicing detector further labels active frames as voiced or unvoiced. In the illustrated embodiment, the VAD from ITU-T Rec. G.729 Annex B, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70, International Telecommunication Union, Geneva, Switzerland, November 1996, which is incorporated by reference, is employed.
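For orientation only, the following sketch shows one way such a three-way labelling could be approximated with frame energy (active/inactive) and zero-crossing rate (voiced/unvoiced). It is not the G.729 Annex B VAD used in the illustrated embodiment; the thresholds and function name are assumptions chosen for illustration.

```python
import numpy as np

def segment_frames(signal, sample_rate, frame_ms=10,
                   energy_floor_db=-40.0, zcr_voiced_max=0.15):
    """Rough stand-in for the time segmentation module (104): label each 10-ms
    frame 'inactive', 'voiced', or 'unvoiced'. Not the standardized G.729B VAD;
    energy_floor_db and zcr_voiced_max are illustrative thresholds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    peak = np.max(np.abs(signal)) + 1e-12
    labels = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy_db = 10 * np.log10(np.mean(frame ** 2) / peak ** 2 + 1e-12)
        if energy_db < energy_floor_db:
            labels.append('inactive')                      # low energy: silence/background
        else:
            zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
            labels.append('voiced' if zcr < zcr_voiced_max else 'unvoiced')
    return np.array(labels)
```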
- Referring to the GMM reference model (106), where u is a K-dimensional feature vector, a Gaussian mixture density is a weighted sum of M component densities,

p(u|λ) = Σ_{i=1}^{M} αi bi(u),

where αi ≧ 0, i = 1, . . . , M, are the mixture weights, with

Σ_{i=1}^{M} αi = 1,

and bi(u), i = 1, . . . , M, are K-variate Gaussian densities with mean vector μi and covariance matrix Σi. The parameter list λ={λ1, . . . , λM} defines a particular Gaussian mixture density, where λi={μi, Σi, αi}. GMM parameters are initialized using the k-means algorithm described in A. Gersho and R. Gray, Vector Quantization and Signal Compression, Norwell, Mass.: Kluwer, 1992, which is incorporated by reference, and estimated using the expectation-maximization (“EM”) algorithm described in A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statistical Society, Ser. B, vol. 39, pp. 1-38, 1977, which is incorporated by reference. The EM algorithm iterations produce a sequence of models with monotonically non-decreasing log-likelihood (“LL”) values. The algorithm is deemed to have converged when the difference of LL values between two consecutive iterations drops below 10⁻³.
- Referring specifically to the reference model (106), a GMM is used to model the PLP cepstral coefficients of each class of speech frames. For instance, consider the class of clean speech signals. Three different Gaussian mixture densities pclass(u|λ) are trained. The subscript “class” represents either voiced, unvoiced, or inactive frames. In principle, by evaluating a statistical model at the PLP cepstral coefficients x of the test signal, i.e., pclass(x|λ), a measure of consistency between the coefficient vector and the statistical model is obtained. Voiced coefficient vectors are applied to pvoiced(u|λ), unvoiced vectors to punvoiced(u|λ), and inactive vectors to pinactive(u|λ).
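A sketch of how the per-class reference models might be trained follows, using scikit-learn's GaussianMixture as a convenient stand-in: it defaults to k-means initialization and iterates EM until the change in average log-likelihood falls below tol, which plays the role of the 10⁻³ convergence threshold above. The mixture order and the diagonal-covariance choice are illustrative assumptions; the patent does not tie the method to any particular toolkit.

```python
from sklearn.mixture import GaussianMixture

def train_reference_models(features_by_class, n_components=12, seed=0):
    """Sketch of reference-model training (module 106): one GMM per segment
    class ('voiced', 'unvoiced', 'inactive') for one speech-signal class.

    features_by_class maps each segment class to an (N, K) array of PLP-like
    feature vectors. n_components and the diagonal covariance structure are
    illustrative assumptions, not values taken from the patent."""
    models = {}
    for cls, feats in features_by_class.items():
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type='diag',    # assumption: diagonal covariance matrices
            init_params='kmeans',      # k-means initialization, as in the description
            tol=1e-3,                  # EM stopping threshold on log-likelihood change
            random_state=seed,
        )
        models[cls] = gmm.fit(feats)
    return models
```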
- Referring now to the consistency calculation module (108), it should be noted that a simplifying assumption is made that vectors between frames are independent. Improved performance might be obtained from more sophisticated approaches that model the statistical dependency between frames, such as Markov modeling. Nevertheless, a model with low computational complexity has benefits as already discussed above. For a given speech signal whose feature vectors have been classified as described above, the consistency between the feature vectors of a class and the statistical model of that class is calculated as

Cclass = (1/Nclass) Σ_{n=1}^{Nclass} log pclass(xn|λ),

where x1, . . . , xNclass are the feature vectors in the class, and Nclass is the number of such vectors in the statistical model class. Larger Cclass indicates greater consistency. Cclass is set to zero whenever Nclass is zero. For each class, the product of the consistency measure Cclass and the fraction of frames of that class in the speech signal is calculated. The products for all the model classes serve as quality indicators to be mapped to an objective estimate of the subjective score value.
- Referring now to the mapping module (110), mapping functions which may be utilized include multivariate polynomial regression and multivariate adaptive regression splines (“MARS”), as described in J. H. Friedman, “Multivariate adaptive regression splines,” The Annals of Statistics, vol. 19, no. 1, pp. 1-141, March 1991. With MARS, the mapping is constructed as a weighted sum of basis functions, each taking the form of a truncated spline such as (x−t)+ = max(0, x−t), where t is a knot location.
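Continuing the sketch, the consistency calculation and the mapping step could be combined as below. The per-class consistency is the mean per-frame log-likelihood under that class's GMM (matching the Cclass expression above), weighted by the class's frame fraction, and the resulting indicators are mapped to a MOS-like score with multivariate polynomial regression; a MARS fit could be substituted. The indicator/MOS training pairs, the polynomial degree, and the function names are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def quality_indicators(features, labels, models):
    """Sketch of the consistency calculation module (108): for each segment
    class, Cclass is the mean log-likelihood of that class's feature vectors
    under its reference GMM (zero if the class is empty), and the indicator
    is Cclass times the fraction of frames belonging to the class."""
    n_total = max(len(labels), 1)
    indicators = []
    for cls in ('voiced', 'unvoiced', 'inactive'):
        cls_feats = features[labels == cls]
        if len(cls_feats) == 0:
            indicators.append(0.0)
            continue
        c_class = models[cls].score_samples(cls_feats).mean()  # average log-likelihood
        indicators.append(c_class * len(cls_feats) / n_total)
    return np.asarray(indicators)

def fit_mos_mapping(indicator_rows, mos_targets, degree=2):
    """Sketch of the mapping module (110) using multivariate polynomial
    regression; the degree and training data are illustrative assumptions."""
    mapping = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return mapping.fit(indicator_rows, mos_targets)   # later: mapping.predict(...)
```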
- While the invention is described through the above exemplary embodiments, it will be understood by those of ordinary skill in the art that modification to and variation of the illustrated embodiments may be made without departing from the inventive concepts herein disclosed. Moreover, while the preferred embodiments are described in connection with various illustrative structures, one skilled in the art will recognize that the system may be embodied using a variety of specific structures. Accordingly, the invention should not be viewed as limited except by the scope and spirit of the appended claims.
Claims (18)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/364,252 US20070203694A1 (en) | 2006-02-28 | 2006-02-28 | Single-sided speech quality measurement |
US13/195,338 US9786300B2 (en) | 2006-02-28 | 2011-08-01 | Single-sided speech quality measurement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/364,252 US20070203694A1 (en) | 2006-02-28 | 2006-02-28 | Single-sided speech quality measurement |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/195,338 Continuation US9786300B2 (en) | 2006-02-28 | 2011-08-01 | Single-sided speech quality measurement |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070203694A1 (en) | 2007-08-30 |
Family
ID=38445095
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/364,252 Abandoned US20070203694A1 (en) | 2006-02-28 | 2006-02-28 | Single-sided speech quality measurement |
US13/195,338 Active 2027-05-19 US9786300B2 (en) | 2006-02-28 | 2011-08-01 | Single-sided speech quality measurement |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/195,338 Active 2027-05-19 US9786300B2 (en) | 2006-02-28 | 2011-08-01 | Single-sided speech quality measurement |
Country Status (1)
Country | Link |
---|---|
US (2) | US20070203694A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070233469A1 (en) * | 2006-03-30 | 2007-10-04 | Industrial Technology Research Institute | Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof |
US20130080172A1 (en) * | 2011-09-22 | 2013-03-28 | General Motors Llc | Objective evaluation of synthesized speech attributes |
US20140188470A1 (en) * | 2012-12-31 | 2014-07-03 | Jenny Chang | Flexible architecture for acoustic signal processing engine |
CN116504274A (en) * | 2023-05-30 | 2023-07-28 | 南开大学 | Non-invasive voice quality evaluation method enhanced by retrieval |
US12051440B1 (en) * | 2023-04-12 | 2024-07-30 | Civil Aviation Flight University Of China | Self-attention-based speech quality measuring method and system for real-time air traffic control |
US12093314B2 (en) * | 2019-11-22 | 2024-09-17 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Accompaniment classification method and apparatus |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9830905B2 (en) | 2013-06-26 | 2017-11-28 | Qualcomm Incorporated | Systems and methods for feature extraction |
US9685173B2 (en) * | 2013-09-06 | 2017-06-20 | Nuance Communications, Inc. | Method for non-intrusive acoustic parameter estimation |
US9870784B2 (en) | 2013-09-06 | 2018-01-16 | Nuance Communications, Inc. | Method for voicemail quality detection |
US9917952B2 (en) | 2016-03-31 | 2018-03-13 | Dolby Laboratories Licensing Corporation | Evaluation of perceptual delay impact on conversation in teleconferencing system |
US11495244B2 (en) | 2018-04-04 | 2022-11-08 | Pindrop Security, Inc. | Voice modification detection using physical models of speech production |
CN110047466B (en) * | 2019-04-16 | 2021-04-13 | 深圳市数字星河科技有限公司 | Method for openly creating voice reading standard reference model |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5715372A (en) * | 1995-01-10 | 1998-02-03 | Lucent Technologies Inc. | Method and apparatus for characterizing an input signal |
US5794188A (en) * | 1993-11-25 | 1998-08-11 | British Telecommunications Public Limited Company | Speech signal distortion measurement which varies as a function of the distribution of measured distortion over time and frequency |
US5848384A (en) * | 1994-08-18 | 1998-12-08 | British Telecommunications Public Limited Company | Analysis of audio quality using speech recognition and synthesis |
US5999900A (en) * | 1993-06-21 | 1999-12-07 | British Telecommunications Public Limited Company | Reduced redundancy test signal similar to natural speech for supporting data manipulation functions in testing telecommunications equipment |
US6266638B1 (en) * | 1999-03-30 | 2001-07-24 | At&T Corp | Voice quality compensation system for speech synthesis based on unit-selection speech database |
US6477492B1 (en) * | 1999-06-15 | 2002-11-05 | Cisco Technology, Inc. | System for automated testing of perceptual distortion of prompts from voice response systems |
US6564181B2 (en) * | 1999-05-18 | 2003-05-13 | Worldcom, Inc. | Method and system for measurement of speech distortion from samples of telephonic voice signals |
US6609092B1 (en) * | 1999-12-16 | 2003-08-19 | Lucent Technologies Inc. | Method and apparatus for estimating subjective audio signal quality from objective distortion measures |
US6625785B2 (en) * | 2000-04-19 | 2003-09-23 | Georgia Tech Research Corporation | Method for diagnosing process parameter variations from measurements in analog circuits |
US20040186715A1 (en) * | 2003-01-18 | 2004-09-23 | Psytechnics Limited | Quality assessment tool |
US7313517B2 (en) * | 2003-03-31 | 2007-12-25 | Koninklijke Kpn N.V. | Method and system for speech quality prediction of an audio transmission system |
US7526394B2 (en) * | 2003-01-21 | 2009-04-28 | Psytechnics Limited | Quality assessment tool |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE60006995T2 (en) * | 1999-11-08 | 2004-10-28 | British Telecommunications P.L.C. | Non-intrusive assessment of speech quality |
GB2407952B (en) * | 2003-11-07 | 2006-11-29 | Psytechnics Ltd | Quality assessment tool |
US20060200346A1 (en) * | 2005-03-03 | 2006-09-07 | Nortel Networks Ltd. | Speech quality measurement based on classification estimation |
US7856355B2 (en) * | 2005-07-05 | 2010-12-21 | Alcatel-Lucent Usa Inc. | Speech quality assessment method and system |
CN101411171B (en) * | 2006-01-31 | 2013-05-08 | 艾利森电话股份有限公司 | Non-intrusive signal quality assessment |
-
2006
- 2006-02-28 US US11/364,252 patent/US20070203694A1/en not_active Abandoned
-
2011
- 2011-08-01 US US13/195,338 patent/US9786300B2/en active Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5999900A (en) * | 1993-06-21 | 1999-12-07 | British Telecommunications Public Limited Company | Reduced redundancy test signal similar to natural speech for supporting data manipulation functions in testing telecommunications equipment |
US5794188A (en) * | 1993-11-25 | 1998-08-11 | British Telecommunications Public Limited Company | Speech signal distortion measurement which varies as a function of the distribution of measured distortion over time and frequency |
US5848384A (en) * | 1994-08-18 | 1998-12-08 | British Telecommunications Public Limited Company | Analysis of audio quality using speech recognition and synthesis |
US5715372A (en) * | 1995-01-10 | 1998-02-03 | Lucent Technologies Inc. | Method and apparatus for characterizing an input signal |
US6266638B1 (en) * | 1999-03-30 | 2001-07-24 | At&T Corp | Voice quality compensation system for speech synthesis based on unit-selection speech database |
US6564181B2 (en) * | 1999-05-18 | 2003-05-13 | Worldcom, Inc. | Method and system for measurement of speech distortion from samples of telephonic voice signals |
US6477492B1 (en) * | 1999-06-15 | 2002-11-05 | Cisco Technology, Inc. | System for automated testing of perceptual distortion of prompts from voice response systems |
US6609092B1 (en) * | 1999-12-16 | 2003-08-19 | Lucent Technologies Inc. | Method and apparatus for estimating subjective audio signal quality from objective distortion measures |
US6625785B2 (en) * | 2000-04-19 | 2003-09-23 | Georgia Tech Research Corporation | Method for diagnosing process parameter variations from measurements in analog circuits |
US20040186715A1 (en) * | 2003-01-18 | 2004-09-23 | Psytechnics Limited | Quality assessment tool |
US7526394B2 (en) * | 2003-01-21 | 2009-04-28 | Psytechnics Limited | Quality assessment tool |
US7313517B2 (en) * | 2003-03-31 | 2007-12-25 | Koninklijke Kpn N.V. | Method and system for speech quality prediction of an audio transmission system |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070233469A1 (en) * | 2006-03-30 | 2007-10-04 | Industrial Technology Research Institute | Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof |
US7801725B2 (en) * | 2006-03-30 | 2010-09-21 | Industrial Technology Research Institute | Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof |
US20130080172A1 (en) * | 2011-09-22 | 2013-03-28 | General Motors Llc | Objective evaluation of synthesized speech attributes |
US20140188470A1 (en) * | 2012-12-31 | 2014-07-03 | Jenny Chang | Flexible architecture for acoustic signal processing engine |
US9653070B2 (en) * | 2012-12-31 | 2017-05-16 | Intel Corporation | Flexible architecture for acoustic signal processing engine |
US12093314B2 (en) * | 2019-11-22 | 2024-09-17 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Accompaniment classification method and apparatus |
US12051440B1 (en) * | 2023-04-12 | 2024-07-30 | Civil Aviation Flight University Of China | Self-attention-based speech quality measuring method and system for real-time air traffic control |
CN116504274A (en) * | 2023-05-30 | 2023-07-28 | 南开大学 | Non-invasive voice quality evaluation method enhanced by retrieval |
Also Published As
Publication number | Publication date |
---|---|
US20110288865A1 (en) | 2011-11-24 |
US9786300B2 (en) | 2017-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9786300B2 (en) | Single-sided speech quality measurement | |
Falk et al. | Single-ended speech quality measurement using machine learning methods | |
US8195449B2 (en) | Low-complexity, non-intrusive speech quality assessment | |
Malfait et al. | P. 563—The ITU-T standard for single-ended speech quality assessment | |
Rix et al. | Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs | |
EP1083541B1 (en) | A method and apparatus for speech detection | |
US6609092B1 (en) | Method and apparatus for estimating subjective audio signal quality from objective distortion measures | |
Rix | Perceptual speech quality assessment-a review | |
US20060200346A1 (en) | Speech quality measurement based on classification estimation | |
Falk et al. | Non-intrusive GMM-based speech quality measurement | |
Dubey et al. | Non-intrusive speech quality assessment using several combinations of auditory features | |
Falk et al. | Nonintrusive speech quality estimation using Gaussian mixture models | |
Chen et al. | Bayesian model based non-intrusive speech quality evaluation | |
Picovici et al. | Output-based objective speech quality measure using self-organizing map | |
Huber et al. | Single-ended speech quality prediction based on automatic speech recognition | |
Picovici et al. | New output-based perceptual measure for predicting subjective quality of speech | |
Zafar et al. | Speech quality assessment using mel frequency spectrograms of speech signals | |
Kim | A cue for objective speech quality estimation in temporal envelope representations | |
Falk et al. | Enhanced non-intrusive speech quality measurement using degradation models | |
Hinterleitner et al. | Comparison of approaches for instrumentally predicting the quality of text-to-speech systems: Data from Blizzard Challenges 2008 and 2009 | |
Mahdi | Perceptual non‐intrusive speech quality assessment using a self‐organizing map | |
Mahdi et al. | New single-ended objective measure for non-intrusive speech quality evaluation | |
Zha et al. | A data mining approach to objective speech quality measurement | |
Wang et al. | Non-intrusive objective speech quality measurement based on GMM and SVR for narrowband and wideband speech | |
Aburas et al. | Symbian Based Perceptual Evaluation of Speech Quality for Telecommunication Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: NORTEL NETWORKS LIMITED, CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHAN, WAI-YIP;FALK, TIAGO H.;EL-HENNAWEY, MOHAMED;REEL/FRAME:017630/0230;SIGNING DATES FROM 20060227 TO 20060228 |
|
AS | Assignment |
Owner name: CITIBANK, N.A., AS ADMINISTRATIVE AGENT,NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023892/0500 Effective date: 20100129 Owner name: CITIBANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023892/0500 Effective date: 20100129 |
|
AS | Assignment |
Owner name: CITICORP USA, INC., AS ADMINISTRATIVE AGENT, NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023905/0001 Effective date: 20100129 Owner name: CITICORP USA, INC., AS ADMINISTRATIVE AGENT,NEW YO Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023905/0001 Effective date: 20100129 Owner name: CITICORP USA, INC., AS ADMINISTRATIVE AGENT, NEW Y Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023905/0001 Effective date: 20100129 |
|
AS | Assignment |
Owner name: AVAYA INC.,NEW JERSEY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NORTEL NETWORKS LIMITED;REEL/FRAME:023998/0878 Effective date: 20091218 Owner name: AVAYA INC., NEW JERSEY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NORTEL NETWORKS LIMITED;REEL/FRAME:023998/0878 Effective date: 20091218 |
|
AS | Assignment |
Owner name: BANK OF NEW YORK MELLON TRUST, NA, AS NOTES COLLAT Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC., A DELAWARE CORPORATION;REEL/FRAME:025863/0535 Effective date: 20110211 Owner name: BANK OF NEW YORK MELLON TRUST, NA, AS NOTES COLLATERAL AGENT, THE, PENNSYLVANIA Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC., A DELAWARE CORPORATION;REEL/FRAME:025863/0535 Effective date: 20110211 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: AVAYA INC., CALIFORNIA Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 023892/0500;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044891/0564 Effective date: 20171128 Owner name: AVAYA INC., CALIFORNIA Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 025863/0535;ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST, NA;REEL/FRAME:044892/0001 Effective date: 20171128 |
|
AS | Assignment |
Owner name: SIERRA HOLDINGS CORP., NEW JERSEY Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564 Effective date: 20171215 Owner name: AVAYA, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564 Effective date: 20171215 |
|
AS | Assignment |
Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227 Effective date: 20240325 Owner name: AVAYA LLC, DELAWARE Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227 Effective date: 20240325 Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117 Effective date: 20240325 Owner name: AVAYA LLC, DELAWARE Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117 Effective date: 20240325 |
|
AS | Assignment |
Owner name: ARLINGTON TECHNOLOGIES, LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AVAYA LLC;REEL/FRAME:067022/0780 Effective date: 20240329 |