research-article

A generic framework for event detection in various video domains

Authors:

Hanqing Lu

MM '10: Proceedings of the 18th ACM international conference on Multimedia

Pages 103 - 112

https://doi.org/10.1145/1873951.1873967

Published: 25 October 2010

Abstract

Event detection is essential to video analysis and understanding, an extensively studied area. Although various approaches have been proposed for event detection, a generic framework that applies across video domains (e.g. sports, news, movies, surveillance) is still lacking. In this paper, we present a generic event detection approach based on semi-supervised learning and Internet vision. Concretely, we propose a Graph-based Semi-Supervised Multiple Instance Learning (GSSMIL) algorithm that jointly exploits small-scale expert-labeled videos and large-scale unlabeled videos to train event models that detect video event boundaries. The expert-labeled videos are obtained by analyzing and aligning well-structured video-related text (e.g. movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying a video search engine (e.g. YouTube) for related events, in order to provide more distributional information for event modeling. A critical issue in constructing the graph for GSSMIL is weight assignment, where the weight of an edge specifies the similarity between two data points. To tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure obtained by learning instance-sensitive classifiers. We perform thorough experiments in three popular video domains: movies, sports and news. Compared with state-of-the-art methods, the results are promising and demonstrate that the proposed approach is effective.
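
The abstract does not give the GSSMIL formulation or the MILIS measure, so neither can be reproduced here. As a rough illustration of the general idea it describes (a graph whose edge weights encode similarity between data points, used to spread a few expert labels to many unlabeled points), the following minimal sketch runs plain iterative label propagation. Everything in it is an assumption made for illustration: the Gaussian similarity is a generic stand-in and not MILIS, the function names `gaussian_weight` and `propagate_labels` are hypothetical, and the toy 2-D points are invented rather than taken from the paper.

```python
import math


def gaussian_weight(x, y, sigma=1.0):
    """Edge weight as a Gaussian similarity (a generic stand-in, NOT the paper's MILIS)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def propagate_labels(points, labels, sigma=1.0, iters=200):
    """Spread known scores over a fully connected similarity graph.

    points: list of feature tuples, one per graph node.
    labels: dict mapping node index -> known score (1.0 = event, 0.0 = non-event).
    Returns one score in [0, 1] per node; labeled nodes stay clamped.
    """
    n = len(points)
    # Build the weight matrix; no self-loops.
    w = [
        [0.0 if i == j else gaussian_weight(points[i], points[j], sigma) for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled nodes start at an uninformative 0.5.
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        new_f = list(f)
        for i in range(n):
            if i in labels:
                continue  # keep expert labels fixed
            total = sum(w[i])
            if total > 0:
                # Each unlabeled node takes the weighted average of its neighbours.
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f


# Toy data (invented): two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = {0: 1.0, 3: 0.0}
scores = propagate_labels(points, labels)
```

In this toy setting the unlabeled points end up with scores close to the labeled point of their own cluster, which shows why the choice of edge weights matters: the similarity measure decides which labels reach which unlabeled points.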

Index Terms

A generic framework for event detection in various video domains
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Information & Contributors

Information

Published In

MM '10: Proceedings of the 18th ACM international conference on Multimedia

October 2010

1836 pages

ISBN:9781605589336

DOI:10.1145/1873951

General Chairs: Alberto del Bimbo (University of Florence, Italy), Shih-Fu Chang (Columbia University, USA)

Program Chair: Arnold Smeulders (University of Amsterdam, NL)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '10

Sponsor:

SIGMM

MM '10: ACM Multimedia Conference

October 25 - 29, 2010

Firenze, Italy

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 17
Total Downloads: 614
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025

Citations

Cited By

Gao J, Zhang T, Xu C, Boll S, Mu Lee K, Luo J, Zhu W, Byun H, Wen Chen C, Lienhart R, Mei T (2018). Watch, Think and Attend. Proceedings of the 26th ACM international conference on Multimedia, pp. 690-699. DOI: 10.1145/3240508.3240566. Online publication date: 15-Oct-2018
https://dl.acm.org/doi/10.1145/3240508.3240566
Yang X, Zhang T, Xu C (2018). Deep-Structured Event Modeling for User-Generated Photos. IEEE Transactions on Multimedia, 20:8, pp. 2100-2113. DOI: 10.1109/TMM.2017.2788210. Online publication date: Aug-2018
https://doi.org/10.1109/TMM.2017.2788210
Chen H, Tsai W (2018). A framework for video event classification by modeling temporal context of multimodal features using HMM. Journal of Visual Communication and Image Representation, 25:2, pp. 285-295. DOI: 10.1016/j.jvcir.2013.12.001. Online publication date: 27-Dec-2018
https://dl.acm.org/doi/10.1016/j.jvcir.2013.12.001
Gao J, Zhang T, Xu C, Liu Q, Lienhart R, Wang H, Chen S, Boll S, Chen P, Friedland G, Li J, Yan S (2017). A Unified Personalized Video Recommendation via Dynamic Recurrent Neural Networks. Proceedings of the 25th ACM international conference on Multimedia, pp. 127-135. DOI: 10.1145/3123266.3123433. Online publication date: 23-Oct-2017
https://dl.acm.org/doi/10.1145/3123266.3123433
Kumar A, Raj B, Hanjalic A, Snoek C, Worring M, Bulterman D, Huet B, Kelliher A, Kompatsiaris Y, Li J (2016). Audio Event Detection using Weakly Labeled Data. Proceedings of the 24th ACM international conference on Multimedia, pp. 1038-1047. DOI: 10.1145/2964284.2964310. Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1145/2964284.2964310
Huang S, Huang D, Khuhro M (2015). High-Level codewords based on Granger Causality for video event detection. Advances in Multimedia, 2015, pp. 6-6. DOI: 10.1155/2015/698316. Online publication date: 1-Jan-2015
https://dl.acm.org/doi/10.1155/2015/698316
Yu L, Shao J, Xu X, Shen H (2015). Max-margin adaptive model for complex video pattern recognition. Multimedia Tools and Applications, 74:2, pp. 505-521. DOI: 10.1007/s11042-014-2010-6. Online publication date: 1-Jan-2015
https://dl.acm.org/doi/10.1007/s11042-014-2010-6
Wang F, Sun Z, Jiang Y, Ngo C (2014). Video Event Detection Using Motion Relativity and Feature Selection. IEEE Transactions on Multimedia, 16:5, pp. 1303-1315. DOI: 10.1109/TMM.2014.2315780. Online publication date: Aug-2014
https://doi.org/10.1109/TMM.2014.2315780
Preechasuk J, Piamsa-nga P (2014). Pedestrianly event detection using grid-based features. 2014 International Computer Science and Engineering Conference (ICSEC), pp. 440-445. DOI: 10.1109/ICSEC.2014.6978237. Online publication date: Jul-2014
https://doi.org/10.1109/ICSEC.2014.6978237
Vijverberg J, Janssen R, de Zwart R, de With P (2014). Perimeter-intrusion event classification for on-line detection using multiple instance learning solving temporal ambiguities. 2014 IEEE International Conference on Image Processing (ICIP), pp. 2408-2412. DOI: 10.1109/ICIP.2014.7025487. Online publication date: Oct-2014
https://doi.org/10.1109/ICIP.2014.7025487
