
Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

Published: 25 May 2023

Abstract

Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work, we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions, and voices. Our study with 48 annotators revealed evidence of incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multimodal models for different combinations of input modality, training label modality, and testing label modality. In addition to the input modalities rated by the annotators (audio and video), we trained models with body acceleration inputs, which are robust to cross-contamination, occlusion, and perspective differences. Our results show that the performance of models with body movement inputs does not suffer when they are trained with video-acquired labels, despite the lower inter-rater agreement of those labels.
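The experiments described above amount to a grid of training-label and testing-label modality combinations, evaluated alongside a measure of inter-annotator congruence. The following minimal Python sketch is not the authors' code; the toy data, simulated miss rate, and metric choices (Cohen's kappa, F1) are illustrative assumptions showing how such a frame-level comparison could be set up.

```python
# Hypothetical sketch (not from the paper): illustrates the kind of
# cross-modality comparison the abstract describes, assuming per-frame
# binary laughter labels obtained separately from audio and from video.
from itertools import product

import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

rng = np.random.default_rng(0)

# Toy stand-in for frame-level annotations (1 = laughter, 0 = no laughter).
audio_labels = rng.integers(0, 2, size=1000)  # audio-acquired labels
# Simulate the lower recall reported for the video condition: a fraction of
# audio-labeled laughter frames are missed by the video annotators.
video_labels = np.where(rng.random(1000) < 0.8, audio_labels, 0)

# Congruence between labeling modalities (chance-corrected agreement).
print("kappa(audio, video) =", round(cohen_kappa_score(audio_labels, video_labels), 3))

# Grid of training-label / testing-label modality combinations, mirroring
# the structure of the experiments summarized in the abstract.
labels = {"audio": audio_labels, "video": video_labels}
for train_mod, test_mod in product(labels, repeat=2):
    # A real pipeline would train a laughter detector on labels[train_mod]
    # and score its predictions here; comparing the label streams directly
    # is only a placeholder for that step.
    f1 = f1_score(labels[test_mod], labels[train_mod])
    print(f"train labels: {train_mod:5s}  test labels: {test_mod:5s}  F1 = {f1:.2f}")
```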


Cited By

  • (2024) The Discontent with Intent Estimation In-the-Wild: The Case for Unrealized Intentions. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pp. 1-9. https://doi.org/10.1145/3613905.3644055. Online publication date: 11 May 2024.



Published In

IEEE Transactions on Affective Computing, Volume 15, Issue 2
April-June 2024
375 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States


Qualifiers

  • Research-article

