DOI: 10.1145/2388676.2388793
Research article

Speak-as-you-swipe (SAYS): a multimodal interface combining speech and gesture keyboard synchronously for continuous mobile text entry

Published: 22 October 2012

Abstract

Modern mobile devices, such as smartphones and tablets, are becoming increasingly popular among users of all ages. Text entry is one of the most important modes of interaction between people and their mobile devices. Although typing on a touchscreen with a soft keyboard remains the most common text input method for many users, the process can be frustratingly slow, especially on smartphones with much smaller screens. Voice input offers an attractive alternative that eliminates the need for typing altogether. However, voice input relies on automatic speech recognition technology, whose performance degrades significantly in noisy environments or for non-native speakers. This paper presents Speak-As-You-Swipe (SAYS), a novel multimodal interface that enables efficient continuous text entry on mobile devices. SAYS integrates a gesture keyboard with speech recognition to improve the efficiency and accuracy of text entry. The swipe gestures and voice input provide complementary information that can be very effective in disambiguating confusable word predictions. The word prediction hypotheses from the gesture keyboard are incorporated directly into the speech recognition process, so that the SAYS interface can handle continuous input. Experimental results show that, for a 20k vocabulary, the proposed SAYS interface achieves a prediction accuracy of 96.4% in clean conditions and about 94.0% in noisy environments, compared to 92.2% using a gesture keyboard alone.
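
The fusion idea in the abstract can be illustrated with a simple sketch: the gesture keyboard proposes a short list of candidate words for each swipe, and the speech evidence re-weights that list. The Python snippet below is only a rough, hypothetical illustration of such a combination (log-linear interpolation of per-word probabilities with an assumed weight); the function name, weight, and example numbers are invented for the example, and the paper itself incorporates the gesture-keyboard hypotheses directly into the speech recognition search rather than fusing posteriors after decoding.

import math

def fuse_word_hypotheses(swipe_probs, asr_probs, swipe_weight=0.5):
    # Combine per-word probabilities from the gesture keyboard (swipe_probs)
    # and the speech recognizer (asr_probs) via log-linear interpolation.
    # Both arguments map candidate word -> probability; candidates unseen by
    # one modality are floored so the log is defined.
    fused = {}
    for word in set(swipe_probs) | set(asr_probs):
        p_swipe = swipe_probs.get(word, 1e-8)
        p_asr = asr_probs.get(word, 1e-8)
        fused[word] = (swipe_weight * math.log(p_swipe)
                       + (1.0 - swipe_weight) * math.log(p_asr))
    # Renormalise back to a probability distribution for readability.
    total = sum(math.exp(s) for s in fused.values())
    return {w: math.exp(s) / total for w, s in fused.items()}

# Toy example: the swipe trace is ambiguous between "quick" and "quack",
# while the (noisy) audio slightly prefers "quick"; fusion resolves it.
swipe = {"quick": 0.45, "quack": 0.45, "quirk": 0.10}
audio = {"quick": 0.60, "quack": 0.25, "click": 0.15}
print(fuse_word_hypotheses(swipe, audio))

In the actual SAYS system, as the abstract notes, the equivalent constraint is applied inside the decoding process itself, which is what allows the interface to handle continuous input rather than isolated-word rescoring.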





Published In

ICMI '12: Proceedings of the 14th ACM International Conference on Multimodal Interaction
October 2012
636 pages
ISBN:9781450314671
DOI:10.1145/2388676
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 October 2012


Author Tags

  1. gesture keyboard
  2. mobile text input
  3. multimodal interface
  4. voice input

Qualifiers

  • Research-article

Conference

ICMI '12
Sponsor:
ICMI '12: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
October 22 - 26, 2012
Santa Monica, California, USA

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Jan 2025


Cited By

  • (2023) Gist and Verbatim: Understanding Speech to Inform New Interfaces for Verbal Text Composition. Proceedings of the 5th International Conference on Conversational User Interfaces, 1-11. https://doi.org/10.1145/3571884.3597134. Online publication date: 19-Jul-2023.
  • (2022) EyeSayCorrect: Eye Gaze and Voice Based Hands-free Text Correction for Mobile Devices. Proceedings of the 27th International Conference on Intelligent User Interfaces, 470-482. https://doi.org/10.1145/3490099.3511103. Online publication date: 22-Mar-2022.
  • (2021) Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones. The 34th Annual ACM Symposium on User Interface Software and Technology, 162-178. https://doi.org/10.1145/3472749.3474742. Online publication date: 10-Oct-2021.
  • (2021) AI-Driven Intelligent Text Correction Techniques for Mobile Text Entry. Artificial Intelligence for Human Computer Interaction: A Modern Approach, 131-168. https://doi.org/10.1007/978-3-030-82681-9_5. Online publication date: 5-Nov-2021.
  • (2021) Emerging Applications. Touch-Based Human-Machine Interaction, 179-229. https://doi.org/10.1007/978-3-030-68948-3_7. Online publication date: 26-Mar-2021.
  • (2020) Commanding and Re-Dictation. ACM Transactions on Computer-Human Interaction, 27(4), 1-31. https://doi.org/10.1145/3390889. Online publication date: 3-Aug-2020.
  • (2020) JustCorrect. Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, 487-499. https://doi.org/10.1145/3379337.3415857. Online publication date: 20-Oct-2020.
  • (2018) VErGE: A system for collecting voice, eye gaze, gesture, and EEG data for experimental studies. 2018 International Conference on Development and Application Systems (DAS), 150-155. https://doi.org/10.1109/DAAS.2018.8396088. Online publication date: May-2018.
  • (2017) Multi-touch gestures in multimodal systems interaction among preschool children. 2017 6th International Conference on Electrical Engineering and Informatics (ICEEI), 1-6. https://doi.org/10.1109/ICEEI.2017.8312436. Online publication date: Nov-2017.
  • (2014) A multimodal stroke-based predictive input for efficient Chinese text entry on mobile devices. 2014 IEEE Spoken Language Technology Workshop (SLT), 448-453. https://doi.org/10.1109/SLT.2014.7078616. Online publication date: Dec-2014.
