DOI: 10.1145/3394171.3413632
research-article

Performance over Random: A Robust Evaluation Protocol for Video Summarization Methods

Published: 12 October 2020

Abstract

This paper proposes a new evaluation approach for video summarization algorithms. We start by studying the currently established evaluation protocol: defined over the ground-truth annotations of the SumMe and TVSum datasets, it quantifies the agreement between the user-defined and the automatically created summaries using the F-Score, and reports the average performance over a few different training/testing splits of the dataset used. We evaluate five publicly available summarization algorithms in a large-scale experimental setting with 50 randomly created data splits. We show that the results reported in the respective papers are not always consistent with the algorithms' performance in the large-scale experiment, and that the F-Score cannot be used to compare algorithms evaluated on different splits. We also show that these shortcomings of the established protocol stem from the significantly varying levels of difficulty among the splits used, which affect the outcomes of the evaluations. Further analysis of these findings indicates a noticeable correlation between the performance of all algorithms and that of a random summarizer. To mitigate these shortcomings, we propose an evaluation protocol that estimates the difficulty of each data split and uses this information during the evaluation process. Experiments under different evaluation settings demonstrate that the proposed approach yields more representative performance results, and more reliable comparisons when the examined methods have been evaluated on different data splits.
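The protocol outlined in the abstract relies on three ingredients: a frame-level F-Score between a user summary and an automatic one, a random summarizer whose score on a split serves as an estimate of that split's difficulty, and a correlation analysis of scores across splits. The Python sketch below illustrates these ingredients under loudly stated assumptions; it is not the authors' implementation. It assumes that summaries are binary per-frame selection vectors, that the random summarizer picks frames uniformly at random under a fixed 15% length budget (a hypothetical choice), and that `performance_over_random` takes a simple ratio form (method F-Score divided by the random summarizer's F-Score on the same split, in percent), which may differ from the paper's exact definition. All function names are illustrative.

```python
import math
import random
from statistics import mean


def f_score(pred, gt):
    """Frame-level F-Score between two binary (0/1) frame-selection vectors."""
    if len(pred) != len(gt):
        raise ValueError("summaries must cover the same number of frames")
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)  # share of selected frames the user also chose
    recall = overlap / sum(gt)       # share of user-chosen frames that were selected
    return 2 * precision * recall / (precision + recall)


def random_summary(n_frames, budget=0.15, rng=random):
    """Hypothetical random summarizer: selects round(budget * n_frames) frames uniformly."""
    k = max(1, round(budget * n_frames))
    chosen = set(rng.sample(range(n_frames), k))
    return [1 if i in chosen else 0 for i in range(n_frames)]


def random_baseline(test_gts, runs=100, seed=0):
    """Estimate a split's difficulty as the mean F-Score of random summaries
    on its test videos, averaged over several random runs (assumed procedure)."""
    rng = random.Random(seed)
    per_run = [
        mean(f_score(random_summary(len(gt), rng=rng), gt) for gt in test_gts)
        for _ in range(runs)
    ]
    return mean(per_run)


def performance_over_random(method_f, random_f):
    """Assumed ratio form: method F-Score relative to the random baseline, in percent."""
    return 100.0 * method_f / random_f


def pearson(xs, ys):
    """Pearson correlation coefficient, e.g. between per-split scores of two summarizers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy ground truth for a two-video test split (illustrative values only).
    split_gts = [[1, 1, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]]
    rand_f = random_baseline(split_gts)
    print(f"random baseline F-Score: {rand_f:.3f}")
    print(f"PoR for a method scoring 0.40: {performance_over_random(0.40, rand_f):.1f}")
```

Under this assumed ratio form, dividing by the random baseline of the same split is what puts scores from different splits on a common footing: a split on which even random selection scores well is, by this estimate, an easier split, so the same raw F-Score counts for less there. Applying `pearson` to per-split F-Scores of a method and of the random summarizer is one way to check the kind of correlation the abstract reports.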

Supplementary Material

MP4 File (3394171.3413632.mp4)
This is the presentation video of the paper titled "Performance over Random: A Robust Evaluation Protocol for Video Summarization Methods". The presentation starts by briefly explaining the goal of video summarization technologies and the main types of automatically generated summaries found in the literature. It then describes the most commonly used datasets and measures for evaluating video summarization, which are adopted by the majority of state-of-the-art video summarization works. It discusses the characteristics of the currently established evaluation protocol and analyses its observed weaknesses. Based on this analysis, it presents the proposed evaluation approach, called "Performance over Random", which aims to mitigate these weaknesses and increase the reliability of performance evaluations and comparisons. Next, it reports on the conducted experiments and their findings, which document the merits of the proposed approach, and finally it presents the main conclusions of this study.

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. covariance
  2. evaluation protocol
  3. human performance
  4. pearson correlation coefficient
  5. performance over random
  6. random performance
  7. video summarization

Qualifiers

  • Research-article

Conference

MM '20
Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 2
Reflects downloads up to 06 Jan 2025

Cited By

  • (2024) A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences Workshops, 71-79. DOI: 10.1145/3672406.3672417. Online publication date: 12-Jun-2024.
  • (2024) Cluster-Based Video Summarization with Temporal Context Awareness. Image and Video Technology, 15-28. DOI: 10.1007/978-981-97-0376-0_2. Online publication date: 12-Feb-2024.
  • (2023) Self-Supervised Adversarial Video Summarizer With Context Latent Sequence Learning. IEEE Transactions on Circuits and Systems for Video Technology 33:8, 4122-4136. DOI: 10.1109/TCSVT.2023.3240464. Online publication date: Aug-2023.
  • (2022) Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames. Proceedings of the 2022 International Conference on Multimedia Retrieval, 407-415. DOI: 10.1145/3512527.3531404. Online publication date: 27-Jun-2022.
  • (2022) EEG-Video Emotion-Based Summarization: Learning With EEG Auxiliary Signals. IEEE Transactions on Affective Computing 13:4, 1827-1839. DOI: 10.1109/TAFFC.2022.3208259. Online publication date: 1-Oct-2022.
  • (2022) Summarization of Cricket Videos Using Deep Learning Technique. 2022 International Conference on Frontiers of Information Technology (FIT), 30-35. DOI: 10.1109/FIT57066.2022.00016. Online publication date: Dec-2022.
  • (2021) Combining Adversarial and Reinforcement Learning for Video Thumbnail Selection. Proceedings of the 2021 International Conference on Multimedia Retrieval, 1-9. DOI: 10.1145/3460426.3463630. Online publication date: 24-Aug-2021.
  • (2021) Video Summarization Using Deep Neural Networks: A Survey. Proceedings of the IEEE 109:11, 1838-1863. DOI: 10.1109/JPROC.2021.3117472. Online publication date: Nov-2021.
  • (2021) Combining Global and Local Attention with Positional Encoding for Video Summarization. 2021 IEEE International Symposium on Multimedia (ISM), 226-234. DOI: 10.1109/ISM52913.2021.00045. Online publication date: Nov-2021.
