Patterns of entry and correction in large vocabulary continuous speech recognition systems

Authors:

Clare-Marie Karat,

Christine Halverson,

Daniel Horn,

John Karat

CHI '99: Proceedings of the SIGCHI conference on Human Factors in Computing Systems

Pages 568 - 575

https://doi.org/10.1145/302979.303160

Published: 01 May 1999

Abstract

A study was conducted to evaluate user performance and satisfaction in completion of a set of text creation tasks using three commercially available continuous speech recognition systems. The study also compared user performance on similar tasks using keyboard input. One part of the study (Initial Use) involved 24 users who enrolled, received training and carried out practice tasks, and then completed a set of transcription and composition tasks in a single session. In a parallel effort (Extended Use), four researchers used speech recognition to carry out real work tasks over 10 sessions with each of the three speech recognition software products. This paper presents results from the Initial Use phase of the study along with some preliminary results from the Extended Use phase. We present details of the kinds of usability and system design problems likely in current systems and several common patterns of error correction that we found.

Index Terms

Patterns of entry and correction in large vocabulary continuous speech recognition systems
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Touch screens
2. Information systems
  1. Information systems applications
    1. Multimedia information systems

Information & Contributors

Information

Published In

CHI '99: Proceedings of the SIGCHI conference on Human Factors in Computing Systems

May 1999

632 pages

ISBN:0201485591

DOI:10.1145/302979

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CHI99

Sponsor:

SIGCHI

CHI99: Conference on Human Factors in Computing Systems

May 15 - 20, 1999

Pittsburgh, Pennsylvania, USA

Acceptance Rates

CHI '99 Paper Acceptance Rate 78 of 312 submissions, 25%;

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Bibliometrics

Total Citations: 164

Total Downloads: 2,153

Downloads (Last 12 months): 198

Downloads (Last 6 weeks): 32

Reflects downloads up to 06 Jan 2025
