DOI: 10.1145/3338533.3366593
research-article

Video Summarization based on Sparse Subspace Clustering with Automatically Estimated Number of Clusters

Published: 10 January 2020

Abstract

Advances in technology have led to sharp growth in the number of digital cameras at people's disposal across the world. Consequently, the videos from these devices consume huge amounts of storage space in video repositories, which makes video processing and analysis time-consuming and slows down video browsing and retrieval. Video summarization plays a crucial role in addressing these issues. Across the many video summarization approaches proposed to date, the goal remains the same: to take a long video and generate a summary in the form of a short video skim without losing the meaning or message conveyed by the original. This is done by selecting the important frames, called key-frames. The approach proposed in this work performs automatic summarization of digital videos based on the deep features of detected objects. To this end, we apply sparse subspace clustering, with an automatically estimated number of clusters, to the objects' deep features. The summary generated by our scheme stores the meta-data for each short video inferred from the clustering results. In this paper, we also propose a new video dataset for video summarization. We evaluate the performance of our work on the TVSum dataset and on our own video summarization dataset.
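The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the number of clusters is estimated, so the eigengap heuristic used here is an assumption, as are the ISTA solver for the sparse self-expression problem, the NumPy-only k-means, and the rule of taking the frame nearest each cluster centroid as the key-frame. All function names and the toy features are hypothetical stand-ins for real object deep features.

```python
import numpy as np

def sparse_self_expression(X, lam=0.1, n_iter=300):
    """ISTA for min_C 0.5*||X - XC||_F^2 + lam*||C||_1 with diag(C) = 0.
    X has shape (d, n): one unit-norm feature vector per column."""
    G = X.T @ X
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)    # 1 / Lipschitz constant
    C = np.zeros_like(G)
    for _ in range(n_iter):
        C = C - step * (G @ C - G)                  # gradient step
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft-threshold
        np.fill_diagonal(C, 0.0)                    # a point may not represent itself
    return C

def estimate_num_clusters(W, max_k=10):
    """Assumed heuristic: largest gap among the smallest eigenvalues of the
    normalized graph Laplacian built from the affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    k = int(np.argmax(np.diff(evals[:max_k + 1]))) + 1
    return k, evecs[:, :k]

def kmeans(points, k, n_iter=50):
    """Plain k-means; farthest-point initialisation keeps it deterministic."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def ssc_auto_k(features, lam=0.1, max_k=10):
    """features: (n_samples, n_dims). Returns (estimated k, cluster labels)."""
    X = features.T.astype(float)
    X = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)
    C = sparse_self_expression(X, lam)
    W = np.abs(C) + np.abs(C).T                     # symmetric affinity matrix
    k, emb = estimate_num_clusters(W, max_k)
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    return k, kmeans(emb, k)

def pick_keyframes(features, labels):
    """Assumed rule: one key-frame per cluster, the sample nearest the centroid."""
    keyframes = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        dist = np.linalg.norm(features[idx] - features[idx].mean(axis=0), axis=1)
        keyframes.append(int(idx[np.argmin(dist)]))
    return sorted(keyframes)

# Toy stand-in for deep features: three groups of collinear vectors in R^6.
basis = np.eye(6)[:3]
scales = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
features = np.vstack([s * basis[b] for b in range(3) for s in scales])

k, labels = ssc_auto_k(features)
keyframes = pick_keyframes(features, labels)
print(k, labels, keyframes)
```

The toy input is deliberately easy, since each group lies on its own one-dimensional subspace. Real deep features would be noisy, and both the sparsity weight `lam` and the eigengap estimate would need tuning or replacement by the estimator the paper actually uses.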

Published In

MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
December 2019
403 pages
ISBN: 9781450368414
DOI: 10.1145/3338533

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep learning
  2. Object detection
  3. Subspace clustering
  4. Video summarization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Natural Science Foundation of China
  • Zhejiang Provincial Natural Science Foundation of China

Conference

MMAsia '19: ACM Multimedia Asia
December 15 - 18, 2019
Beijing, China

Acceptance Rates

MMAsia '19 Paper Acceptance Rate 59 of 204 submissions, 29%;
Overall Acceptance Rate 59 of 204 submissions, 29%
