Article

SpeakerSense: energy efficient unobtrusive speaker identification on mobile phones

Authors:

A. J. Bernheim Brush,

Bodhi Priyantha,

Amy K. Karlson,

Jie Liu

Pervasive'11: Proceedings of the 9th international conference on Pervasive computing

Pages 188 - 205

Published: 12 June 2011

Abstract

Automatically identifying the person you are talking with using continuous audio sensing has the potential to enable many pervasive computing applications from memory assistance to annotating life logging data. However, a number of challenges, including energy efficiency and training data acquisition, must be addressed before unobtrusive audio sensing is practical on mobile devices. We built SpeakerSense, a speaker identification prototype that uses a heterogeneous multi-processor hardware architecture that splits computation between a low power processor and the phone's application processor to enable continuous background sensing with minimal power requirements. Using SpeakerSense, we benchmarked several system parameters (sampling rate, GMM complexity, smoothing window size, and amount of training data needed) to identify thresholds that balance computation cost with performance. We also investigated channel compensation methods that make it feasible to acquire training data from phone calls and an automatic segmentation method for training speaker models based on one-to-one conversations.
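The abstract names GMM complexity and smoothing window size as tunable parameters. As a minimal sketch of how those two pieces might fit together, the Python below scores each audio frame against per-speaker Gaussian mixture models and then smooths the frame-level decisions by majority vote. Everything specific here is an assumption made for illustration: the diagonal covariances, the majority-vote rule, the function names, and the toy one-dimensional models are not taken from the paper.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_frame(x, speaker_models):
    """Return the speaker whose GMM gives the highest log-likelihood for x."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(x, speaker_models[name]),
    )


def smooth_labels(labels, window):
    """Replace each label with the majority vote over the last `window` labels."""
    smoothed = []
    for i in range(len(labels)):
        recent = labels[max(0, i - window + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Toy 1-D speaker models (hypothetical values, for illustration only).
speaker_models = {
    "alice": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "bob": [(1.0, [5.0], [1.0])],
}
frames = [[0.2], [0.8], [4.0], [0.5], [0.1]]

raw = [classify_frame(f, speaker_models) for f in frames]
smoothed = smooth_labels(raw, window=3)
```

In this toy run the third frame is labelled "bob" on its own, and the three-frame vote overrides that isolated decision. A larger window suppresses more such outliers but reacts more slowly to a real change of speaker, which is the kind of trade-off the benchmarked thresholds are meant to balance.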

SpeakerSense: energy efficient unobtrusive speaker identification on mobile phones
1. Hardware
  1. Communication hardware, interfaces and storage

Information & Contributors

Information

Published In

Pervasive'11: Proceedings of the 9th international conference on Pervasive computing

June 2011

369 pages

ISBN:9783642217258

Editors:
Kent Lyons (Intel Labs, Intel Corporation, Santa Clara, CA),
Jeffrey Hightower (Google, Seattle, WA),
Elaine M. Huang (University of Zurich, Department of Informatics, Zurich, Switzerland)

Publisher

Springer-Verlag

Berlin, Heidelberg

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 55
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Dec 2024

Citations

Cited By

Gu F, Chung M, Chignell M, Valaee S, Zhou B, Liu X (2021) A Survey on Deep Learning for Human Activity Recognition. ACM Computing Surveys 54:8 (1-34). DOI: 10.1145/3472290. Online publication date: 4-Oct-2021
https://dl.acm.org/doi/10.1145/3472290
Alrumayh A, Lehman S, Tan C, Chen S, Onishi R, Ananthanarayanan G, Li Q (2019) ABACUS. Proceedings of the 4th ACM/IEEE Symposium on Edge Computing (395-400). DOI: 10.1145/3318216.3363376. Online publication date: 7-Nov-2019
https://dl.acm.org/doi/10.1145/3318216.3363376
Islam M, Nirjon S, Eskicioglu R, Mottola L, Priyantha B (2019) SoundSemantics. Proceedings of the 18th International Conference on Information Processing in Sensor Networks (217-228). DOI: 10.1145/3302506.3310402. Online publication date: 16-Apr-2019
https://dl.acm.org/doi/10.1145/3302506.3310402
Nguyen S, Lai V, Dam-Ba Q, Nguyen-Xuan A, Pham C (2018) Vietnamese Speaker Authentication Using Deep Models. Proceedings of the 9th International Symposium on Information and Communication Technology (177-184). DOI: 10.1145/3287921.3287954. Online publication date: 6-Dec-2018
https://dl.acm.org/doi/10.1145/3287921.3287954
Liu R, Cornelius C, Rawassizadeh R, Peterson R, Kotz D (2018) Vocal Resonance. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:1 (1-23). DOI: 10.1145/3191751. Online publication date: 26-Mar-2018
https://dl.acm.org/doi/10.1145/3191751
Bari R, Adams R, Rahman M, Parsons M, Buder E, Kumar S (2018) rConverse. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:1 (1-27). DOI: 10.1145/3191734. Online publication date: 26-Mar-2018
https://dl.acm.org/doi/10.1145/3191734
Georgiev P, Bhattacharya S, Lane N, Mascolo C (2017) Low-resource Multi-task Audio Sensing for Mobile and Embedded Devices via Shared Deep Neural Network Representations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1:3 (1-19). DOI: 10.1145/3131895. Online publication date: 11-Sep-2017
https://dl.acm.org/doi/10.1145/3131895
Liu R, Rawassizadeh R, Kotz D, Zhang M, Ashok A (2017) Toward Accurate and Efficient Feature Selection for Speaker Recognition on Wearables. Proceedings of the 2017 Workshop on Wearable Systems and Applications (41-46). DOI: 10.1145/3089351.3089352. Online publication date: 19-Jun-2017
https://dl.acm.org/doi/10.1145/3089351.3089352
Georgiev P, Lane N, Mascolo C, Chu D, Choudhury T, Ko S, Campbell A, Ganesan D (2017) Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (306-318). DOI: 10.1145/3081333.3081358. Online publication date: 16-Jun-2017
https://dl.acm.org/doi/10.1145/3081333.3081358
Naderiparizi S, Zhang P, Philipose M, Priyantha B, Liu J, Ganesan D, Choudhury T, Ko S, Campbell A, Ganesan D (2017) Glimpse. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (292-305). DOI: 10.1145/3081333.3081347. Online publication date: 16-Jun-2017
https://dl.acm.org/doi/10.1145/3081333.3081347