Article

Resource Constrained Multimedia Event Detection

Published: 06 January 2014

Abstract

We present a study comparing the cost and efficiency tradeoffs of multiple features for multimedia event detection. Low-level as well as semantic features are a critical part of contemporary multimedia and computer vision research. Arguably, combinations of multiple feature sets have been a major reason for recent progress in the field, not just as low-dimensional representations of multimedia data, but also as a means to semantically summarize images and videos. However, their efficacy for complex event recognition in unconstrained videos on standardized datasets has not been systematically studied. In this paper, we evaluate the accuracy and contribution of more than 10 multi-modality features, including semantic and low-level video representations, using two newly released NIST TRECVID Multimedia Event Detection (MED) open source datasets, i.e., MEDTEST and KINDREDTEST, which contain more than 1000 hours of videos. Contrasting multiple performance metrics, such as average precision, probability of missed detection, and minimum normalized detection cost, we propose a framework to balance the trade-off between accuracy and computational cost. This study provides an empirical foundation for selecting feature sets that can handle large-scale data under limited computational resources and are likely to produce superior multimedia event detection accuracy. This framework also applies to other resource-limited multimedia analyses, such as selecting and fusing multiple classifiers and different representations of each feature set.
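To make the accuracy-versus-cost trade-off concrete, the sketch below computes the two kinds of quantities the abstract contrasts (average precision, and a normalized detection cost built from the probabilities of missed detection and false alarm) and then greedily selects feature sets under a compute budget. This is an illustrative sketch under stated assumptions, not the authors' actual framework: the cost constants, feature names, and per-feature numbers are placeholders.

```python
# Illustrative sketch (not the authors' pipeline): score a feature set by
# average precision and a normalized detection cost, then greedily pick
# feature sets under a computational budget. Names and constants are
# assumptions for demonstration only.

from typing import Dict, List


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Average precision over a ranked list (labels: 1 = event, 0 = background)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(hits, 1)


def normalized_detection_cost(p_miss: float, p_fa: float,
                              c_miss: float = 80.0, c_fa: float = 1.0,
                              p_target: float = 0.001) -> float:
    """TRECVID MED-style normalized detection cost; the cost and prior values
    here are commonly cited defaults, used only as placeholders."""
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


def select_features(accuracy: Dict[str, float], cost: Dict[str, float],
                    budget: float) -> List[str]:
    """Greedy accuracy-per-cost selection under a total compute budget
    (e.g., CPU-hours per hour of video). A stand-in for the trade-off
    framework described in the abstract, not its actual algorithm."""
    chosen, spent = [], 0.0
    for name in sorted(accuracy, key=lambda f: accuracy[f] / cost[f], reverse=True):
        if spent + cost[name] <= budget:
            chosen.append(name)
            spent += cost[name]
    return chosen


if __name__ == "__main__":
    # Hypothetical per-feature mean AP and extraction cost (CPU-hours per video-hour).
    mean_ap = {"SIFT": 0.30, "MoSIFT": 0.34, "dense_trajectories": 0.38, "MFCC": 0.18}
    cpu_cost = {"SIFT": 1.0, "MoSIFT": 4.0, "dense_trajectories": 9.0, "MFCC": 0.2}
    print(select_features(mean_ap, cpu_cost, budget=6.0))
```

In the paper's setting, the per-feature accuracy and cost figures would come from the MEDTEST/KINDREDTEST evaluations rather than the toy numbers used here.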

References

[1]
http://www.psc.edu/index.php/computing-resources/blacklight
[2]
Bao, L., Yu, S.-I., Lan, Z.-Z., Overwijk, A., Jin, Q., Langner, B., Garbus, M., Burger, S., Metze, F., Hauptmann, A.: Informedia @ TRECVID 2011. In: TRECVID 2011 (2011)
[3]
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 404-417. Springer, Heidelberg (2006)
[4]
Chen, M.-Y., Hauptmann, A.: MoSIFT: Recognizing human actions in surveillance videos. Technical Report CMU-CS-09-161 (2009)
[5]
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, vol. 1, pp. 886-893. IEEE (2005)
[6]
Ebadollahi, S., Chang, S.-F., Xie, L., Smith, J.R.: Visual event detection using multi-dimensional concept semantics. In: ICME, pp. 881-884 (2006)
[7]
Jiang, Y.-G.: SUPER: Towards real-time event recognition in internet videos. In: ICMR, p. 7. ACM (2012)
[8]
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106-1114 (2012)
[9]
Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Double fusion for multimedia event detection. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, C.-W., Andreopoulos, Y., Breiteneder, C. (eds.) MMM 2012. LNCS, vol. 7131, pp. 173-185. Springer, Heidelberg (2012)
[10]
Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimedia Tools and Applications, 1-15 (2013)
[11]
Laptev, I.: On space-time interest points. IJCV 64(2-3), 107-123 (2005)
[12]
Li, L.-J., Su, H., Fei-Fei, L., Xing, E.P.: Object bank: A high-level image representation for scene classification & semantic feature sparsification. In: NIPS, pp. 1378-1386 (2010)
[13]
Liu, J., Yu, Q., Javed, O., Ali, S., Tamrakar, A., Divakaran, A., Cheng, H., Sawhney, H.S.: Video event recognition using concept attributes. In: WACV, pp. 339-346 (2013)
[14]
Merler, M., Huang, B., Xie, L., Hua, G.: Semantic model vectors for complex video event recognition. IEEE Trans. on Multimedia 14(1), 88-101 (2012)
[15]
Moosmann, F., Nowak, E., Jurie, F.: Randomized clustering forests for image classification. PAMI 30(9), 1632-1646 (2008)
[16]
Over, P., Awad, G., Michel, M., Fiscus, J., Sanders, G., Shaw, B., Kraaij, W., Smeaton, A.F., Quénot, G.: TRECVID 2012 - an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: TRECVID. NIST, USA (2012)
[17]
Tamrakar, A., Ali, S., Yu, Q., Liu, J., Javed, O., Divakaran, A., Cheng, H., Sawhney, H.: Evaluation of low-level features and their combinations for complex event detection in open source videos. In: CVPR, pp. 3681-3688 (2012)
[18]
Van de Sande, K.E.A., Gevers, T., Snoek, C.G.M.: Evaluating color descriptors for object and scene recognition. PAMI 32(9), 1582-1596 (2010)
[19]
Wang, H., Klaser, A., Schmid, C., Liu, C.-L.: Action recognition by dense trajectories. In: CVPR, pp. 3169-3176. IEEE (2011)
[20]
Yang, J., Jiang, Y.-G., Hauptmann, A.G., Ngo, C.-W.: Evaluating bag-of-visual-words representations in scene classification. In: ACM International Workshop on Multimedia Information Retrieval (MIR), pp. 197-206. ACM (2007)

Cited By

  • (2016) Event-based media processing and analysis. Image and Vision Computing 53:C, 3-19. DOI: 10.1016/j.imavis.2016.05.005. Online publication date: 1-Sep-2016
  • (2015) Content-Based Video Search over 1 Million Videos with 1 Core in 1 Second. In: Proceedings of the 5th ACM International Conference on Multimedia Retrieval (ICMR), pp. 419-426. DOI: 10.1145/2671188.2749398. Online publication date: 22-Jun-2015


Published In

MMM 2014: Proceedings of the 20th Anniversary International Conference on MultiMedia Modeling - Volume 8325
January 2014
604 pages
ISBN: 9783319041131
Editors: Cathal Gurrin, Frank Hopfgartner, Wolfgang Hurst, Håvard Johansen, Hyowon Lee, Noel O'Connor

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 06 January 2014

Author Tags

  1. Feature Selection
  2. Limited Resource
  3. Multimedia Event Detection

Qualifiers

  • Article
