research-article

HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do

Authors: Keith Curtis, George Awad, Shahzad Rajput, Ian Soboroff

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

Pages 355 - 361

https://doi.org/10.1145/3372278.3390742

Published: 08 June 2020

Abstract

In this paper we propose a new evaluation challenge and direction in the area of High-Level Video Understanding (HLVU). The proposed challenge is designed to test automatic video analysis and understanding: how accurately systems can comprehend a movie in terms of its actors, entities, and events, and their relationships to one another. A pilot HLVU dataset of open-source movies was collected so that human assessors could build a knowledge graph representing each movie. A set of queries will be derived from these knowledge graphs to test systems on retrieving relationships among actors, as well as on reasoning about and retrieving non-visual concepts. The objective is to benchmark whether a computer system can "understand" non-explicit but obvious relationships the same way humans do when they watch the same movies. This is a long-standing problem that is being addressed in the text domain, and this project moves similar research into the video domain. Work of this nature is foundational to future video analytics and video understanding technologies, and may interest streaming services and broadcasters hoping to offer their customers more intuitive ways to interact with and consume video content.
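The evaluation idea in the abstract (assessors build a knowledge graph per movie, queries are derived from it, and systems are scored on the relationships they retrieve) can be illustrated with a minimal sketch. This page does not describe the actual HLVU data format or scoring procedure, so everything below is an assumption chosen for illustration: the `Triple` and `MovieKnowledgeGraph` classes, the relation labels, the question template, and the simple accuracy-style `score` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a hypothetical movie knowledge graph: subject --relation--> obj."""
    subject: str
    relation: str
    obj: str


class MovieKnowledgeGraph:
    """Toy knowledge graph for a single movie (hypothetical, not the HLVU format)."""

    def __init__(self, triples):
        self.triples = set(triples)
        # Index relation labels by the ordered (subject, object) pair.
        # Relations are directional: ("A", "B") and ("B", "A") are distinct keys.
        self._by_pair = defaultdict(set)
        for t in self.triples:
            self._by_pair[(t.subject, t.obj)].add(t.relation)

    def relations_between(self, a, b):
        """Return the set of relation labels from entity a to entity b (empty if none)."""
        return set(self._by_pair.get((a, b), set()))

    def derive_queries(self):
        """Turn every linked entity pair into a question plus its ground-truth relations."""
        return [
            (f"How is {a} related to {b}?", frozenset(rels))
            for (a, b), rels in sorted(self._by_pair.items(), key=lambda kv: kv[0])
        ]


def score(system_answers, queries):
    """Fraction of queries whose system answer is one of the ground-truth relations.

    system_answers maps question text to a single predicted relation label;
    unanswered questions count as wrong.
    """
    if not queries:
        return 0.0
    correct = sum(1 for question, truth in queries if system_answers.get(question) in truth)
    return correct / len(queries)


if __name__ == "__main__":
    # Invented characters and relation labels, for illustration only.
    kg = MovieKnowledgeGraph([
        Triple("Anna", "sister_of", "Ben"),
        Triple("Ben", "works_for", "Carla"),
        Triple("Anna", "friend_of", "Carla"),
    ])
    queries = kg.derive_queries()
    for question, truth in queries:
        print(question, sorted(truth))
    answers = {
        "How is Anna related to Ben?": "sister_of",
        "How is Ben related to Carla?": "friend_of",
    }
    print("accuracy:", score(answers, queries))
```

With these invented inputs, one of the three derived questions is answered correctly (one is answered wrongly and one is left unanswered). Indexing relations by ordered pair keeps a directional relation such as `works_for` distinct from its inverse; a real benchmark would also need queries about non-visual concepts and multi-step reasoning, which this sketch does not attempt to model.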


Cited By


Sanders K, Van Durme B. 2024. A Survey of Video Datasets for Grounded Event Understanding. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 7314-7327. https://doi.org/10.1109/CVPRW63382.2024.00727 (online publication date: 17-Jun-2024)
Liu R, Fang Y, Yu F, Tian R, Ren T, Wu G. 2023. Deep Video Understanding with Video-Language Model. In Proceedings of the 31st ACM International Conference on Multimedia (eds. El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M). 9551-9555. https://dl.acm.org/doi/10.1145/3581783.3612863 (online publication date: 26-Oct-2023)
Li R, Guo J, Li M, Wu Z, Liang C. 2023. A Hierarchical Deep Video Understanding Method with Shot-Based Instance Search and Large Language Model. In Proceedings of the 31st ACM International Conference on Multimedia (eds. El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M). 9425-9429. https://dl.acm.org/doi/10.1145/3581783.3612838 (online publication date: 26-Oct-2023)

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Recommendations

Deep Video Understanding of Character Relationships in Movies
ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction

Humans can easily understand storylines and character relationships in movies. However, the automatic relationship analysis from videos is challenging. In this paper, we introduce a deep video understanding system to infer relationships between movie ...
The ACM Multimedia 2022 Deep Video Understanding Grand Challenge
MM '22: Proceedings of the 30th ACM International Conference on Multimedia

This is the overview paper for the Deep Video Understanding (DVU) Grand Challenge. In recent years, a growing trend towards working on understanding videos (in particular movies) to a deeper level started to motivate researchers working in multimedia ...
International Workshop on Deep Video Understanding
ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction

This is the introduction paper to the International Workshop on Deep Video Understanding, organized at the 22nd ACM International Conference on Multimodal Interaction. In recent years, a growing trend towards working on understanding videos (in ...


Information & Contributors

Information

Published In

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

June 2020

605 pages

ISBN:9781450370875

DOI:10.1145/3372278

General Chairs: Cathal Gurrin (Dublin City University, Ireland), Björn Þór Jónsson (IT University of Copenhagen, Denmark), Noriko Kando (National Institute of Informatics, Tokyo)

Program Chairs: Klaus Schoeffmann (Klagenfurt University, Austria), Phoebe Chen (La Trobe University, Australia), Noel E. O'Connor (Dublin City University, Ireland)

© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2020


Qualifiers

Research-article

Conference

ICMR '20

Sponsor:

SIGMM

ICMR '20: International Conference on Multimedia Retrieval

June 8 - 11, 2020

Dublin, Ireland

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%


Bibliometrics & Citations

Article Metrics

Total Citations: 25
Total Downloads: 248
Downloads (last 12 months): 46
Downloads (last 6 weeks): 7

Reflects downloads up to 24 Jan 2025

Citations

Cited By

View all

Sanders KVan Durme B(2024)A Survey of Video Datasets for Grounded Event Understanding2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)10.1109/CVPRW63382.2024.00727(7314-7327)Online publication date: 17-Jun-2024
https://doi.org/10.1109/CVPRW63382.2024.00727
Liu RFang YYu FTian RRen TWu GEl Saddik AMei TCucchiara RBertini MTobon Vallejo DAtrey PHossain M(2023)Deep Video Understanding with Video-Language ModelProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612863(9551-9555)Online publication date: 26-Oct-2023
https://dl.acm.org/doi/10.1145/3581783.3612863
Li RGuo JLi MWu ZLiang CEl Saddik AMei TCucchiara RBertini MTobon Vallejo DAtrey PHossain M(2023)A Hierarchical Deep Video Understanding Method with Shot-Based Instance Search and Large Language ModelProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612838(9425-9429)Online publication date: 26-Oct-2023
https://dl.acm.org/doi/10.1145/3581783.3612838
Curtis K, Awad G, Godil A, Soboroff I. 2023. The ACM Multimedia 2023 Deep Video Understanding Grand Challenge. In Proceedings of the 31st ACM International Conference on Multimedia (eds. El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M). 9606-9609. https://dl.acm.org/doi/10.1145/3581783.3612829 (online publication date: 26-Oct-2023)
Wang H, Hu Y, Zhu Y, Qi J, Wu B. 2023. Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos. In Proceedings of the 31st ACM International Conference on Multimedia (eds. El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M). 67-76. https://dl.acm.org/doi/10.1145/3581783.3612175 (online publication date: 26-Oct-2023)
Wu Z, Li R, Guo J, Wang Z, Liang C. 2023. A Deep Understanding Video Q&A System for Film Education in Acting Department. In 2023 International Conference on Intelligent Education and Intelligent Research (IEIR). 1-7. https://doi.org/10.1109/IEIR59294.2023.10391232 (online publication date: 5-Nov-2023)
Giunchiglia E, Stoian M, Khan S, Cuzzolin F, Lukasiewicz T. 2023. ROAD-R: the autonomous driving dataset with logical requirements. Machine Learning 112, 9, 3261-3291. https://doi.org/10.1007/s10994-023-06322-z (online publication date: 1-May-2023)
Ramesh R, Anand V, Chen Z, Dong Y, Chen Y, Lin C. 2022. Leveraging Text Representation and Face-head Tracking for Long-form Multimodal Semantic Relation Understanding. In Proceedings of the 30th ACM International Conference on Multimedia (eds. Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L). 7215-7219. https://dl.acm.org/doi/10.1145/3503161.3551610 (online publication date: 10-Oct-2022)
Qin P, Yu J, Gao Y, Xu D, Chen Y, Wu S, Xu T, Chen E, Hao Y. 2022. Unified QA-aware Knowledge Graph Generation Based on Multi-modal Modeling. In Proceedings of the 30th ACM International Conference on Multimedia (eds. Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L). 7185-7189. https://dl.acm.org/doi/10.1145/3503161.3551604 (online publication date: 10-Oct-2022)
Zhang B, Fang Y, Ren T, Wu G. 2022. Multimodal Analysis for Deep Video Understanding with Video Language Transformer. In Proceedings of the 30th ACM International Conference on Multimedia (eds. Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L). 7165-7169. https://dl.acm.org/doi/10.1145/3503161.3551600 (online publication date: 10-Oct-2022)
