DOI: 10.1145/1873951.1873967
research-article

A generic framework for event detection in various video domains

Published: 25 October 2010

Abstract

Event detection is essential to video analysis and understanding, an extensively studied area. Although various approaches have been proposed, there is no generic event detection framework that can be applied across video domains (e.g., sports, news, movies, surveillance). In this paper, we present a generic event detection approach based on semi-supervised learning and Internet vision. Concretely, we propose a Graph-based Semi-Supervised Multiple Instance Learning (GSSMIL) algorithm that jointly exploits small-scale expert-labeled videos and large-scale unlabeled videos to train event models that detect video event boundaries. The expert-labeled videos are obtained by analyzing and aligning well-structured video-related text (e.g., movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying a video search engine (e.g., YouTube) for related events, which provides more information about the data distribution for event modeling. A critical issue in constructing the GSSMIL graph is weight assignment, where the weight of an edge specifies the similarity between two data points. To tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure obtained by learning instance-sensitive classifiers. We conduct thorough experiments in three popular video domains: movies, sports, and news. Compared with state-of-the-art methods, the results are promising and demonstrate that the proposed approach is effective.
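
To make the role of edge weights concrete, the following is a minimal sketch of generic graph-based semi-supervised label propagation. It is not GSSMIL and does not implement MILIS: a plain Gaussian kernel stands in for the similarity measure, there is no multiple-instance (bag) structure, and the function names, the toy 2-D points, and the `sigma` value are all assumptions made for illustration.

```python
import math


def gaussian_weight(x, y, sigma=1.0):
    """Edge weight = similarity of two feature vectors (Gaussian kernel, an assumed stand-in)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def build_graph(points, sigma=1.0):
    """Return a dense weight matrix over all pairs of points, with no self-loops."""
    n = len(points)
    return [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
             for j in range(n)] for i in range(n)]


def propagate_labels(weights, labels, iterations=200):
    """Spread scores from labeled nodes to unlabeled ones.

    labels maps a node index to a score (1.0 = event, 0.0 = non-event).
    Labeled nodes stay clamped; each unlabeled node repeatedly takes the
    weighted average of its neighbors' scores.
    """
    n = len(weights)
    scores = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(labels[i])
            else:
                total = sum(weights[i])
                updated.append(sum(w * s for w, s in zip(weights[i], scores)) / total)
        scores = updated
    return scores


# Hypothetical toy data: two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = {0: 1.0, 3: 0.0}
scores = propagate_labels(build_graph(points, sigma=0.5), labels)
predicted = [s > 0.5 for s in scores]
print(predicted)
```

In this sketch the unlabeled points inherit the label of the cluster they are most similar to, which is why the choice of similarity measure matters: a different weight function changes which labeled points dominate each unlabeled node's average.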

Published In

MM '10: Proceedings of the 18th ACM international conference on Multimedia
October 2010
1836 pages
ISBN: 9781605589336
DOI: 10.1145/1873951
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. broadcast video
  2. event detection
  3. internet
  4. multiple instance learning
  5. semi-supervised learning
  6. web-casting text

Qualifiers

  • Research-article

Conference

MM '10: ACM Multimedia Conference
October 25 - 29, 2010
Firenze, Italy

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0

Reflects downloads up to 20 Jan 2025.

Cited By

  • (2018) Watch, Think and Attend. Proceedings of the 26th ACM international conference on Multimedia, pp. 690-699. DOI: 10.1145/3240508.3240566. Online publication date: 15-Oct-2018
  • (2018) Deep-Structured Event Modeling for User-Generated Photos. IEEE Transactions on Multimedia, 20:8, pp. 2100-2113. DOI: 10.1109/TMM.2017.2788210. Online publication date: Aug-2018
  • (2018) A framework for video event classification by modeling temporal context of multimodal features using HMM. Journal of Visual Communication and Image Representation, 25:2, pp. 285-295. DOI: 10.1016/j.jvcir.2013.12.001. Online publication date: 27-Dec-2018
  • (2017) A Unified Personalized Video Recommendation via Dynamic Recurrent Neural Networks. Proceedings of the 25th ACM international conference on Multimedia, pp. 127-135. DOI: 10.1145/3123266.3123433. Online publication date: 23-Oct-2017
  • (2016) Audio Event Detection using Weakly Labeled Data. Proceedings of the 24th ACM international conference on Multimedia, pp. 1038-1047. DOI: 10.1145/2964284.2964310. Online publication date: 1-Oct-2016
  • (2015) High-Level codewords based on Granger Causality for video event detection. Advances in Multimedia, 2015, pp. 6-6. DOI: 10.1155/2015/698316. Online publication date: 1-Jan-2015
  • (2015) Max-margin adaptive model for complex video pattern recognition. Multimedia Tools and Applications, 74:2, pp. 505-521. DOI: 10.1007/s11042-014-2010-6. Online publication date: 1-Jan-2015
  • (2014) Video Event Detection Using Motion Relativity and Feature Selection. IEEE Transactions on Multimedia, 16:5, pp. 1303-1315. DOI: 10.1109/TMM.2014.2315780. Online publication date: Aug-2014
  • (2014) Pedestrianly event detection using grid-based features. 2014 International Computer Science and Engineering Conference (ICSEC), pp. 440-445. DOI: 10.1109/ICSEC.2014.6978237. Online publication date: Jul-2014
  • (2014) Perimeter-intrusion event classification for on-line detection using multiple instance learning solving temporal ambiguities. 2014 IEEE International Conference on Image Processing (ICIP), pp. 2408-2412. DOI: 10.1109/ICIP.2014.7025487. Online publication date: Oct-2014
