Abstract
A log-index weighted cepstral distance measure is proposed and tested in speaker-independent and speaker-dependent isolated word recognition systems using statistical techniques. In this measure, the weight applied to each cepstral coefficient equals the logarithm of that coefficient's index. Experimental results on three speech databases show that this measure performs better than the other weighted Euclidean cepstral distance measures. The error rate obtained with this measure averages about 1.8 percent across the three databases, a 25 percent reduction from that obtained with the other measures and a 40 percent reduction from that obtained with the Log Likelihood Ratio (LLR) measure. The results also show that this distance measure works well in both speaker-dependent and speaker-independent speech recognition systems.
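The weighting described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives the weights as the logarithm of the coefficient index, but it does not say which logarithm base is used, whether indexing starts at 1, or exactly how the weights enter the distance. The sketch assumes the natural logarithm, 1-based indices (so the first coefficient receives weight ln 1 = 0), and a squared weighted Euclidean form in which each coefficient difference is scaled by its weight before squaring. The function names are hypothetical.

```python
import math


def log_index_weights(order):
    """Return weights w_k = ln(k) for k = 1..order.

    Assumption: natural logarithm and 1-based indexing, so w_1 = 0.
    """
    return [math.log(k) for k in range(1, order + 1)]


def log_index_weighted_distance(c_test, c_ref):
    """Squared weighted Euclidean distance between two cepstral vectors.

    Assumption: each coefficient difference is multiplied by its weight
    before squaring, i.e. d = sum_k (w_k * (c_test[k] - c_ref[k]))**2.
    """
    if len(c_test) != len(c_ref):
        raise ValueError("cepstral vectors must have the same length")
    weights = log_index_weights(len(c_test))
    return sum((w * (x - y)) ** 2 for w, x, y in zip(weights, c_test, c_ref))


if __name__ == "__main__":
    # Two made-up cepstral vectors, for illustration only.
    test_frame = [0.50, -0.20, 0.10, 0.05]
    ref_frame = [0.40, -0.10, 0.00, 0.10]
    print(log_index_weighted_distance(test_frame, ref_frame))
```

Under these assumptions a difference in the first coefficient does not contribute to the distance at all, and higher-index coefficients are emphasized progressively but slowly, since the logarithm grows more gently than a linear index weighting would.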
Author information
Zheng Fang was born in Jiangsu Province, P.R. China, in 1967. He received the B.S. and M.S. degrees from Tsinghua University, P.R. China, both in computer science and technology, in 1990 and 1992, respectively. He is now a lecturer and, at the same time, a Ph.D. candidate at Tsinghua University. He is also the Executive Director of the Analog Devices Inc.-Tsinghua DSP Technology Research Center. Since 1988, he has been working on speech recognition at the Speech Lab., Dept. of Computer Science and Technology, Tsinghua University.
Wu Wenhu was born in Beijing, P.R. China, in 1936. He studied in the Department of Electrical Engineering, Tsinghua University, from 1955 to 1958, and then in the Department of Automation, Tsinghua University, from 1958 to 1961. Since then he has been at Tsinghua University, where he is now a Professor in the Department of Computer Science and Technology and the Director of the Speech Lab. He is devoted to research on Chinese speech recognition and understanding, especially speaker-independent Chinese speech recognition, for which he has received several awards. He is also engaged in the popularization of computer education and is the Chairman of the Computer Spread Education Commission of CCF (Chinese Computer Federation). He led the China Team that took part in IOI'89 through IOI'95 (International Olympiad in Informatics) and won many gold medals.
Fang Ditang was born in Shanghai, P.R. China, in 1930. He received the B.S. degree from Jiaotong University and the M.S. degree from Tsinghua University, both in electrical engineering, in 1953 and 1956, respectively. Since then, he has been teaching at Tsinghua University, where he is now a Professor in the Department of Computer Science and Technology. In 1979, he founded the Laboratory for Human-Machine Speech Communications and served as its Director from 1979 to 1990. The laboratory won the National Scientific Research and Technology Progress Award twice, in 1987 and 1989, the National Scientific Invention Award in 1990, and three other awards. He is the Deputy Chief of the Artificial Intelligence and Pattern Recognition Committee of the Chinese Computer Science Society.
Cite this article
Zheng, F., Wu, W. & Fang, D. A log-index weighted cepstral distance measure for speech recognition. J. of Comput. Sci. & Technol. 12, 177–184 (1997). https://doi.org/10.1007/BF02951337
DOI: https://doi.org/10.1007/BF02951337