DOI: 10.1145/2388676.2388793
research-article

Speak-as-you-swipe (SAYS): a multimodal interface combining speech and gesture keyboard synchronously for continuous mobile text entry

Published: 22 October 2012

Abstract

Modern mobile devices, such as smartphones and tablets, are becoming increasingly popular amongst users of all ages. Text entry is one of the most important modes of interaction between humans and their mobile devices. Although typing on a touchscreen display using a soft keyboard remains the most common text input method for many users, the process can be frustratingly slow, especially on smartphones with much smaller screens. Voice input offers an attractive alternative that completely eliminates the need for typing. However, voice input relies on automatic speech recognition technology, whose performance degrades significantly in noisy environments or for non-native users. This paper presents Speak-As-You-Swipe (SAYS), a novel multimodal interface that enables efficient continuous text entry on mobile devices. SAYS integrates a gesture keyboard with speech recognition to improve the efficiency and accuracy of text entry. The swipe gesture and voice inputs provide complementary information that can be very effective in disambiguating confusable word predictions. The word prediction hypotheses from the gesture keyboard are incorporated directly into the speech recognition process so that the SAYS interface can handle continuous input. Experimental results show that, for a 20k vocabulary, the proposed SAYS interface achieves a prediction accuracy of 96.4% in clean conditions and about 94.0% in noisy environments, compared to 92.2% using a gesture keyboard alone.
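
To make the idea of complementary modalities concrete, the following is a minimal sketch of how word hypotheses from a gesture keyboard and from a speech recognizer could be combined. It is not the SAYS method: the abstract states that gesture hypotheses are incorporated directly into the speech recognition process, whereas this sketch uses a simpler post-hoc log-linear rescoring of isolated words. The function name `fuse_hypotheses`, the `weight` and `floor` parameters, and all probabilities are hypothetical and chosen only for illustration.

```python
import math


def fuse_hypotheses(gesture_scores, speech_scores, weight=0.5, floor=1e-6):
    """Rank candidate words by a weighted sum of log-probabilities.

    gesture_scores, speech_scores: dict mapping word -> probability in (0, 1].
    weight: share of the score given to the gesture modality (assumed value).
    floor: probability assumed for a word that one modality did not propose.
    """
    candidates = set(gesture_scores) | set(speech_scores)
    fused = {}
    for word in candidates:
        g = gesture_scores.get(word, floor)
        s = speech_scores.get(word, floor)
        fused[word] = weight * math.log(g) + (1.0 - weight) * math.log(s)
    # Best-scoring word first.
    return sorted(fused, key=fused.get, reverse=True)


# Made-up scores: the swipe trace confuses words with similar key paths,
# while the audio confuses words that sound alike.
gesture = {"pit": 0.40, "pot": 0.38, "put": 0.22}
speech = {"hot": 0.45, "pot": 0.40, "pit": 0.15}

ranking = fuse_hypotheses(gesture, speech)
print(ranking)
```

In this toy example each modality alone ranks a different word first ("pit" for the gesture, "hot" for the speech), while the fused ranking prefers "pot", the only word that both modalities consider plausible.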




Published In

ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction
October 2012
636 pages
ISBN: 9781450314671
DOI: 10.1145/2388676

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gesture keyboard
  2. mobile text input
  3. multimodal interface
  4. voice input

Qualifiers

  • Research-article

Conference

ICMI '12
ICMI '12: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
October 22 - 26, 2012
Santa Monica, California, USA

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
