DOI: 10.1145/3577190.3614105
research-article
Open access

Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews

Published: 09 October 2023

Abstract

The quality and effectiveness of psychotherapy sessions are strongly influenced by the therapist's ability to connect meaningfully with clients. Automated assessment of therapist empathy offers a cost-effective and systematic means of evaluating the quality of therapy sessions. In this work, we propose to assess therapist empathy using multimodal behavioral data, i.e., spoken language (text) and audio, from real-world motivational interviewing (MI) sessions for alcohol abuse intervention. We first study each modality (text vs. audio) individually and then evaluate a multimodal approach using different fusion strategies for automated recognition of empathy levels (high vs. low). Using recent pre-trained models for text (DistilRoBERTa) and speech (HuBERT) as strong unimodal baselines, we obtain consistent 2–3 point improvements in F1 score with early and late fusion, with the largest absolute improvements ranging from 6 to 12 points over the unimodal baselines. Our models obtain F1 scores of 68% when looking only at an early segment of the sessions and up to 72% in a therapist-dependent setting. In addition, our results show that a relatively small portion of each session, specifically the second quartile, is the most important for empathy prediction, outperforming predictions on later segments and on full sessions. Our analysis of late-fusion results shows that fusion models rely more on the audio modality in limited-data settings, such as individual quartiles or therapist turns only. Finally, we observe the highest misclassification rates for session segments containing MI-inconsistent utterances (20% misclassified by all models), likely due to the complex relationship between these types of intents and perceived empathy.
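The abstract contrasts early and late fusion of the text and audio modalities, analyzes sessions by quartile, and reports F1. The following plain-Python sketch illustrates those ideas under stated assumptions: each modality is assumed to be already reduced either to a fixed-length embedding (for example, pooled DistilRoBERTa or HuBERT outputs) or to a per-modality probability of high empathy. The logistic classifier, the hypothetical `audio_weight` mixing parameter, the contiguous quartile split, and the choice of macro-averaged F1 are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion_prob(text_emb: Sequence[float], audio_emb: Sequence[float],
                      weights: Sequence[float], bias: float) -> float:
    """Early fusion: concatenate modality features, then apply ONE classifier.

    weights/bias stand in for a hypothetical trained logistic layer.
    """
    fused = list(text_emb) + list(audio_emb)
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    return sigmoid(sum(w * x for w, x in zip(weights, fused)) + bias)


def late_fusion_prob(p_text: float, p_audio: float, audio_weight: float = 0.5) -> float:
    """Late fusion: each modality predicts separately; combine the probabilities.

    audio_weight is a hypothetical mixing weight; larger values make the
    fused decision lean more on the audio model.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return (1.0 - audio_weight) * p_text + audio_weight * p_audio


def session_quartiles(turns: Sequence[T]) -> List[List[T]]:
    """Split a session's turns into four contiguous, near-equal segments."""
    n = len(turns)
    bounds = [i * n // 4 for i in range(5)]
    return [list(turns[bounds[i]:bounds[i + 1]]) for i in range(4)]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed labels: 0 = low, 1 = high empathy)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy numbers only; these are not results from the paper.
    print(round(late_fusion_prob(p_text=0.40, p_audio=0.70, audio_weight=0.6), 2))  # 0.58
    print(session_quartiles(list(range(10))))  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
    print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.733
```

In the early-fusion sketch a single classifier sees both modalities at once and can weight their features jointly; in the late-fusion sketch each modality is scored independently, so a mixing weight makes it easy to see which modality the fused decision leans on. The abstract reports that late-fusion models rely more on audio in limited-data settings but does not say how that reliance was measured, so `audio_weight` is only an illustration.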


Cited By

  • (2024) Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts. In 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), pp. 392–401. DOI: 10.1109/ICHI61247.2024.00057. Online publication date: 3 June 2024.

Information

Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023
858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 October 2023

Author Tags

  1. Motivational interview
  2. empathy
  3. language
  4. multimodal learning
  5. speech

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Article Metrics

  • Downloads (last 12 months): 433
  • Downloads (last 6 weeks): 39
Reflects downloads up to 5 January 2025.
