DOI: 10.1145/3678884.3681910
research-article

Sorting Pauses in Speech Based Dialog Systems: Effects Of Knowing Whether Pauses in Speech are Mid-Turn or at The End Of the Turn

Published: 13 November 2024

Abstract

Intelligent tutoring systems (ITS) can provide accessible education at scale. Speech-based tutoring systems generally interpret a user's long pause as a signal to take the floor. However, this behavior creates barriers for minority and low-literacy individuals, who pause mid-sentence to finish articulating their thoughts. These systems interrupt such users inappropriately, which results in frustration. In this study, we present a bigram pause detection model that determines whether a pause occurs in the middle or at the end of a sentence, and we implement this model in an ITS (MODIFIED). We also compare the effect of this modification with a baseline ITS that interprets long pauses as the end of a turn (TRADITIONAL). Our results show that participants interacted longer with MODIFIED and perceived it as more human-like than the TRADITIONAL system. We argue that conversational systems need to be designed with the unique needs of their intended interlocutors in mind.
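
The abstract does not describe the model's features, training data, or decision rule, so the following is only a minimal sketch of what a bigram-based pause classifier could look like, not the authors' implementation. It assumes the two words spoken just before a pause are the only feature, that labeled examples ("mid" or "end") are available, and that add-one smoothing is used. The class name `BigramPauseClassifier`, the training phrases, and the choice to default to "mid" for unseen bigrams are all hypothetical.

```python
from collections import Counter


class BigramPauseClassifier:
    """Toy classifier (hypothetical): labels a pause 'mid' or 'end'
    from the last two words spoken before the pause."""

    LABELS = ("mid", "end")

    def __init__(self):
        self.counts = {label: Counter() for label in self.LABELS}
        self.totals = {label: 0 for label in self.LABELS}

    @staticmethod
    def _final_bigram(text):
        # Pad so that empty or one-word inputs still yield a bigram.
        words = ["<s>", "<s>"] + text.lower().split()
        return tuple(words[-2:])

    def train(self, examples):
        """examples: iterable of (text_before_pause, label) pairs."""
        for text, label in examples:
            self.counts[label][self._final_bigram(text)] += 1
            self.totals[label] += 1

    def predict(self, text):
        bigram = self._final_bigram(text)
        seen = any(self.counts[label][bigram] for label in self.LABELS)
        if not seen:
            # Assumption: when in doubt, keep listening rather than interrupt.
            return "mid"
        vocab = len(set(self.counts["mid"]) | set(self.counts["end"]))
        scores = {
            # Add-one smoothed estimate of P(bigram | label).
            label: (self.counts[label][bigram] + 1) / (self.totals[label] + vocab)
            for label in self.LABELS
        }
        return "end" if scores["end"] > scores["mid"] else "mid"


# Hypothetical training data: text heard so far, and the kind of pause that followed.
clf = BigramPauseClassifier()
clf.train([
    ("i think the", "mid"),
    ("because of the", "mid"),
    ("and then i", "mid"),
    ("that is my answer", "end"),
    ("i am done", "end"),
    ("this is my answer", "end"),
])

print(clf.predict("so that is my answer"))   # end
print(clf.predict("it is because of the"))   # mid
```

In a dialog loop, a system along these lines would take the floor only when a long pause is classified as "end", and would keep listening otherwise. The TRADITIONAL baseline described above corresponds to treating every long pause as "end".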

Published In

CSCW Companion '24: Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing
November 2024
755 pages
ISBN: 9798400711145
DOI: 10.1145/3678884

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. human-computer interaction
2. linguistic patterns
3. speech based tutoring systems

Acceptance Rates

Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%