
Predicting meeting extracts in group discussions using multimodal convolutional neural networks

Published: 03 November 2017

Abstract

This study proposes the use of multimodal fusion models employing Convolutional Neural Networks (CNNs) to extract meeting minutes from a group discussion corpus. First, unimodal models are created using raw behavioral data such as speech, head motion, and face tracking. These models are then integrated into a fusion model that works as a classifier. The main advantage of this work is that the proposed models were trained without any hand-crafted features, yet they outperformed a baseline model that was trained using hand-crafted features. It was also found that multimodal fusion is useful for applying the CNN approach to modeling multimodal multiparty interaction.
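The abstract describes unimodal CNNs over raw behavioral signals whose outputs are combined by a fusion model that acts as a classifier. The sketch below illustrates one way such a late-fusion architecture can be wired up in PyTorch; the paper does not publish code, so the modality channel counts, kernel sizes, and embedding width here are illustrative assumptions, not the authors' actual configuration.

```python
# A minimal late-fusion CNN sketch, assuming three raw behavioral inputs
# (speech waveform, head motion, face tracking). All layer sizes are
# illustrative guesses; this is not the paper's published architecture.
import torch
import torch.nn as nn

class UnimodalCNN(nn.Module):
    """1-D CNN mapping one raw behavioral signal to a fixed-size embedding."""
    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # global average over time
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):                         # x: (batch, channels, time)
        return self.proj(self.features(x).squeeze(-1))

class FusionClassifier(nn.Module):
    """Concatenates unimodal embeddings and predicts extract / non-extract."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.speech = UnimodalCNN(1, embed_dim)   # raw audio waveform
        self.head = UnimodalCNN(3, embed_dim)     # head rotation, 3 axes
        self.face = UnimodalCNN(6, embed_dim)     # face-tracking parameters
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim * 3, 64), nn.ReLU(),
            nn.Linear(64, 2),                     # meeting extract vs. not
        )

    def forward(self, speech, head, face):
        fused = torch.cat(
            [self.speech(speech), self.head(head), self.face(face)], dim=1)
        return self.classifier(fused)

# Forward pass on dummy tensors: a batch of 4 one-second utterances.
model = FusionClassifier()
logits = model(
    torch.randn(4, 1, 16000),   # 16 kHz audio
    torch.randn(4, 3, 100),     # 100 head-motion frames
    torch.randn(4, 6, 100),     # 100 face-tracking frames
)
print(logits.shape)             # torch.Size([4, 2])
```

Late fusion (concatenating per-modality embeddings before a shared classifier) is one common realization of the fusion step; the abstract does not specify the exact fusion point, so the concatenation layer above is an assumption.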

Published In

ICMI '17: Proceedings of the 19th ACM International Conference on Multimodal Interaction
November 2017, 676 pages
ISBN: 9781450355438
DOI: 10.1145/3136755
Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Deep neural network
      2. Important utterances for meeting summarization
      3. Multimodal fusion

      Qualifiers

      • Short-paper

      Acceptance Rates

ICMI '17 paper acceptance rate: 65 of 149 submissions (44%)
Overall acceptance rate: 453 of 1,080 submissions (42%)


Cited By

• (2020) Multimodal Data Fusion in Learning Analytics: A Systematic Review. Sensors 20(23): 6856. DOI: 10.3390/s20236856. Online publication date: 30 Nov 2020.
• (2019) Task-independent Multimodal Prediction of Group Performance Based on Product Dimensions. 2019 International Conference on Multimodal Interaction, 264–273. DOI: 10.1145/3340555.3353729. Online publication date: 14 Oct 2019.
• (2019) REsCUE. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–13. DOI: 10.1145/3290605.3300802. Online publication date: 2 May 2019.
• (2019) Towards Collaboration Translucence. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–16. DOI: 10.1145/3290605.3300269. Online publication date: 2 May 2019.
• (2018) Fusing Verbal and Nonverbal Information for Extractive Meeting Summarization. Proceedings of the Group Interaction Frontiers in Technology, 1–9. DOI: 10.1145/3279981.3279987. Online publication date: 16 Oct 2018.
• (2018) Using Parallel Episodes of Speech to Represent and Identify Interaction Dynamics for Group Meetings. Proceedings of the Group Interaction Frontiers in Technology, 1–7. DOI: 10.1145/3279981.3279983. Online publication date: 16 Oct 2018.
• (2018) Estimating Visual Focus of Attention in Multiparty Meetings using Deep Convolutional Neural Networks. Proceedings of the 20th ACM International Conference on Multimodal Interaction, 191–199. DOI: 10.1145/3242969.3242973. Online publication date: 2 Oct 2018.
