Research Article
Open access
Published: 01 December 2006

Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

Panikos Heracleous¹,²,
Tomomi Kaino¹,
Hiroshi Saruwatari¹ &
Kiyohiro Shikano¹

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 094068 (2006)


Abstract

We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors that are attached behind the talker's ear and can capture not only normal (audible) speech but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones are robust against noise, and they might be used in special systems (speech recognition, speech transformation, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved, for a 20 k dictation task, a word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. Finally, we propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.
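The abstract reports recognition performance as word accuracy but does not define the metric. As a minimal sketch, assuming the conventional definition Acc = (N - S - D - I) / N used by common ASR scoring tools (N reference words; S, D, I the substitutions, deletions and insertions from a minimum edit-distance alignment), the score can be computed as follows. The function names `align_counts` and `word_accuracy` are hypothetical and are not taken from the paper or its toolkit.

```python
from typing import List, Sequence, Tuple

# Each DP cell holds (edit cost, substitutions, deletions, insertions).
Cell = Tuple[int, int, int, int]


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (S, D, I) from a minimum edit-distance alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    dp: List[List[Cell]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference words with no hypothesis: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # hypothesis words with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, k)  # match, no cost
            else:
                diag = (c + 1, s + 1, d, k)  # substitution
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)  # deletion
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)  # insertion
            # Tuples compare cost first, so min() keeps a minimum-cost path.
            dp[i][j] = min(diag, up, left)
    _, s, d, k = dp[n][m]
    return s, d, k


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word accuracy (N - S - D - I) / N; can be negative with many insertions."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, d, k = align_counts(ref, hyp)
    n = len(ref)
    return (n - s - d - k) / n


if __name__ == "__main__":
    ref = "a b c d".split()
    hyp = "a x c d e".split()
    print(align_counts(ref, hyp), word_accuracy(ref, hyp))  # (1, 0, 1) 0.5
```

Unlike word correctness, (N - S - D) / N, word accuracy also penalizes insertions, which is why it can fall below zero when a recognizer emits many spurious words.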

Author information

Authors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0192, Japan
Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari & Kiyohiro Shikano
Department of Computer Science, University of Cyprus, 75 Kallipoleos Street, P.O. Box 537, Nicosia, 1678, Cyprus
Panikos Heracleous

Corresponding author

Correspondence to Panikos Heracleous.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Heracleous, P., Kaino, T., Saruwatari, H. et al. Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor. EURASIP J. Adv. Signal Process. 2007, 094068 (2006). https://doi.org/10.1155/2007/94068

Received: 22 September 2005
Revised: 06 January 2006
Accepted: 30 January 2006
Published: 01 December 2006
DOI: https://doi.org/10.1155/2007/94068