DOI: 10.1145/1963405.1963448

Video summarization via transferrable structured learning

Published: 28 March 2011

Abstract

It is well known that textual information such as video transcripts and video reviews can significantly enhance the performance of video summarization algorithms. Unfortunately, many videos on the Web, such as those from the popular video sharing site YouTube, do not have useful textual information. This paper proposes a transfer learning framework for video summarization: during training, both video features and textual features are exploited to train a summarization algorithm, while summarizing a new video uses only its video features. The basic idea is to explore the transferability between videos and their corresponding textual information. Based on the assumption that video features and textual features are highly correlated, we can transfer textual information into knowledge on summarization using video information only. In particular, we formulate the video summarization problem as that of learning a mapping from the set of shots of a video to a subset of those shots, using the general framework of SVM-based structured learning. Textual information is transferred by encoding it into a set of constraints used in the structured learning process; these constraints tend to provide a more detailed and accurate characterization of the different subsets of shots. Experimental results show significant performance improvement of our approach and demonstrate the utility of textual information for enhancing video summarization.
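
The mapping from a set of shots to a subset, and the kind of margin constraint a structural SVM enforces, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the additive joint feature map `joint_feature`, the loss `subset_loss`, the toy feature values, and the weight vector `w` are all hypothetical. The abstract does not say how textual information is encoded into constraints, so that step is not shown.

```python
import numpy as np

def joint_feature(shots, subset):
    # Psi(x, y): hypothetical joint feature map, the sum of the
    # feature vectors of the selected shots.
    idx = sorted(subset)
    return shots[idx].sum(axis=0) if idx else np.zeros(shots.shape[1])

def predict_summary(shots, w, k):
    # argmax over size-k subsets of w . Psi(x, y). Because this Psi is
    # additive, taking the k highest-scoring shots is exact.
    scores = shots @ w
    top = np.argsort(-scores, kind="stable")[:k]
    return set(int(i) for i in top)

def subset_loss(true_subset, other_subset):
    # Delta(y*, y): assumed loss, the fraction of reference shots missing.
    return 1.0 - len(true_subset & other_subset) / len(true_subset)

def margin_slack(shots, w, true_subset, other_subset):
    # Slack xi needed to satisfy the structural SVM constraint
    #   w . (Psi(x, y*) - Psi(x, y)) >= Delta(y*, y) - xi
    margin = w @ (joint_feature(shots, true_subset)
                  - joint_feature(shots, other_subset))
    return max(0.0, subset_loss(true_subset, other_subset) - margin)

# Toy data: four shots with two-dimensional video features (made up).
shots = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1]])
w = np.array([1.0, -1.0])

summary = predict_summary(shots, w, k=2)
slack_trained = margin_slack(shots, w, {0, 3}, {1, 2})
slack_untrained = margin_slack(shots, np.zeros(2), {0, 3}, {1, 2})
```

In this sketch, training would search for a `w` that drives the slack toward zero for every competing subset; at summarization time only `predict_summary` is needed, which uses video features alone.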

Published In

WWW '11: Proceedings of the 20th international conference on World wide web
March 2011
840 pages
ISBN: 9781450306324
DOI: 10.1145/1963405

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. structural svm
  2. transfer learning
  3. video summarization

Qualifiers

  • Research-article

Conference

WWW '11
WWW '11: 20th International World Wide Web Conference
March 28 - April 1, 2011
Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1
Reflects downloads up to 12 Dec 2024.

Cited By

  • (2023) Multimodal text summarization with evaluation approaches. Sādhanā 48:4. DOI: 10.1007/s12046-023-02284-z. Online publication date: 24-Oct-2023
  • (2023) Bayesian fuzzy clustering and deep CNN-based automatic video summarization. Multimedia Tools and Applications 83:1 (963-1000). DOI: 10.1007/s11042-023-15431-9. Online publication date: 30-May-2023
  • (2021) Selective Transfer Classification Learning With Classification-Error-Based Consensus Regularization. IEEE Transactions on Emerging Topics in Computational Intelligence 5:2 (178-190). DOI: 10.1109/TETCI.2019.2892762. Online publication date: Apr-2021
  • (2021) Unsupervised Domain Adaptation Based on Correlation Maximization. IEEE Access 9 (127054-127067). DOI: 10.1109/ACCESS.2021.3111586. Online publication date: 2021
  • (2021) Video Summarization Using a Dense Captioning (DenseCap) Model. Intelligent Multi-modal Data Processing (97-129). DOI: 10.1002/9781119571452.ch5. Online publication date: 30-Apr-2021
  • (2020) Client-Driven Personalized Trailer Framework Using Thumbnail Containers. IEEE Access 8 (60417-60427). DOI: 10.1109/ACCESS.2020.2982992. Online publication date: 2020
  • (2018) A novel compact yet rich key frame creation method for compressed video summarization. Multimedia Tools and Applications 77:10 (11957-11977). DOI: 10.1007/s11042-017-4843-2. Online publication date: 1-May-2018
  • (2017) Personalized Key Frame Recommendation. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (315-324). DOI: 10.1145/3077136.3080776. Online publication date: 7-Aug-2017
  • (2017) V-JAUNE. ACM Transactions on Multimedia Computing, Communications, and Applications 13:2 (1-19). DOI: 10.1145/3063532. Online publication date: 26-Apr-2017
  • (2017) Enhancing Video Summarization via Vision-Language Embedding. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1052-1060). DOI: 10.1109/CVPR.2017.118. Online publication date: Jul-2017