DOI: 10.1145/1873951.1874094
short-paper

Automatic role recognition based on conversational and prosodic behaviour

Published: 25 October 2010

Abstract

This paper proposes an approach for the automatic recognition of roles in settings like news and talk-shows, where roles correspond to specific functions like Anchorman, Guest or Interview Participant. The approach is based on purely nonverbal vocal behavioral cues, including who talks when and how much (turn-taking behavior), and statistical properties of pitch, formants, energy and speaking rate (prosodic behavior). The experiments have been performed over a corpus of around 50 hours of broadcast material, and the accuracy, measured as the percentage of time correctly labeled in terms of role, is up to 89%. Both turn-taking and prosodic behavior lead to satisfactory results. Furthermore, on one database, their combination leads to a statistically significant improvement.
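
The accuracy measure named in the abstract, the percentage of time correctly labeled in terms of role, can be illustrated with a minimal Python sketch. It assumes the recording is already split into segments with a start time, an end time, a ground-truth role and a predicted role. The `Segment` type, the function name and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One speech segment (hypothetical structure, not from the paper)."""
    start: float          # segment start time, in seconds
    end: float            # segment end time, in seconds
    true_role: str        # ground-truth role label
    predicted_role: str   # role label assigned by the recognizer


def time_weighted_accuracy(segments: List[Segment]) -> float:
    """Fraction of total segment time whose predicted role matches the true role."""
    total = sum(s.end - s.start for s in segments)
    if total <= 0:
        raise ValueError("segments must cover a positive amount of time")
    correct = sum(
        s.end - s.start for s in segments if s.true_role == s.predicted_role
    )
    return correct / total


# Illustrative data only: 100 seconds of speech, 70 of them labeled correctly.
example = [
    Segment(0.0, 60.0, "Anchorman", "Anchorman"),
    Segment(60.0, 90.0, "Guest", "Anchorman"),
    Segment(90.0, 100.0, "Guest", "Guest"),
]
accuracy = time_weighted_accuracy(example)
```

Weighting by duration rather than counting segments means that a long, correctly labeled turn contributes more than a short mislabeled one, which matches a measure defined in terms of time.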



    Published In

    MM '10: Proceedings of the 18th ACM international conference on Multimedia
    October 2010
    1836 pages
    ISBN: 9781605589336
    DOI: 10.1145/1873951
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. broadcast data
    2. conditional random field
    3. multiparty recordings
    4. role recognition

    Qualifiers

    • Short-paper

    Conference

    MM '10: ACM Multimedia Conference
    October 25 - 29, 2010
    Firenze, Italy

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2019) A Unified Framework for Head Pose, Age and Gender Classification through End-to-End Face Segmentation. Entropy 21:7 (647). DOI: 10.3390/e21070647. Online publication date: 30-Jun-2019
    • (2018) Measuring Interaction Proxemics with Wearable Light Tags. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:1 (1-30). DOI: 10.1145/3191757. Online publication date: 26-Mar-2018
    • (2018) Prosodic Plot of Dialogues: A Conceptual Framework to Trace Speakers’ Role. Speech and Computer (636-645). DOI: 10.1007/978-3-319-99579-3_65. Online publication date: 25-Aug-2018
    • (2017) Analysis of Small Groups. Social Signal Processing (349-367). DOI: 10.1017/9781316676202.025. Online publication date: 13-Jul-2017
    • (2013) Modeling Functional Roles Dynamics in Small Group Interactions. IEEE Transactions on Multimedia 15:1 (83-95). DOI: 10.1109/TMM.2012.2225039. Online publication date: Jan-2013
    • (2011) Automatic recognition of coordination level in an imitation task. Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding (25-26). DOI: 10.1145/2072572.2072582. Online publication date: 1-Dec-2011
