DOI: 10.1007/978-3-642-33718-5_11
Article

Script data for attribute-based recognition of composite activities

Published: 07 October 2012

Abstract

State-of-the-art human activity recognition methods build on discriminative learning, which requires a representative training set for good performance. This leads to scalability issues for the recognition of large sets of highly diverse activities. In this paper we leverage the fact that many human activities are compositional and that the essential components of the activities can be obtained from textual descriptions or scripts. To share and transfer knowledge between composite activities we model them by a common set of attributes corresponding to basic actions and object participants. This attribute representation allows us to incorporate script data, which supplies new variations of a composite activity or even describes unseen composite activities. In our experiments on 41 composite cooking tasks, we find that script data successfully captures the high variability of composite activities. We show improvements in the supervised case, where training data for all composite cooking tasks is available, and we are also able to recognize unseen composites using script data alone, without any manual video annotation.
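To make the attribute-based transfer described above concrete, the following is a minimal sketch, not the authors' implementation: it assumes per-attribute binary classifiers trained on video features and a script-derived attribute weight vector per composite task. All names (N_ATTRIBUTES, script_attribute_weights, the linear-SVM choice, and the dot-product scoring) are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Illustrative size of the shared attribute vocabulary
# (basic actions and object participants).
N_ATTRIBUTES = 50


def train_attribute_classifiers(video_features, attribute_labels):
    """Train one binary classifier per attribute.

    video_features:   (n_videos, n_dims) array of video descriptors.
    attribute_labels: (n_videos, N_ATTRIBUTES) binary matrix saying which
                      attributes occur in each training video.
    """
    classifiers = []
    for a in range(N_ATTRIBUTES):
        clf = LinearSVC(C=1.0)
        clf.fit(video_features, attribute_labels[:, a])
        classifiers.append(clf)
    return classifiers


def attribute_scores(classifiers, video_feature):
    """Return a confidence score for each attribute in one test video."""
    x = np.asarray(video_feature).reshape(1, -1)
    return np.array([clf.decision_function(x)[0] for clf in classifiers])


def recognize_composite(scores, script_attribute_weights):
    """Score each composite activity by how well the predicted attributes
    match its script-derived attribute profile (a simple weighted sum here),
    and return the best-matching composite."""
    best_task, best_score = None, -np.inf
    for task, weights in script_attribute_weights.items():
        s = float(np.dot(np.asarray(weights), scores))
        if s > best_score:
            best_task, best_score = task, s
    return best_task
```

Under this reading, recognizing a previously unseen cooking task only requires adding its script-derived attribute weight vector to script_attribute_weights; no additional annotated video data is needed.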



Published In

ECCV'12: Proceedings of the 12th European Conference on Computer Vision - Volume Part I
October 2012
881 pages
ISBN:9783642337178
  • Editors:
  • Andrew Fitzgibbon,
  • Svetlana Lazebnik,
  • Pietro Perona,
  • Yoichi Sato,
  • Cordelia Schmid

Sponsors

  • Toshiba Corporation
  • University of Cambridge
  • Adobe
  • TOYOTA
  • Google Inc.
  • IBM Research
  • NVIDIA
  • DATALOGIC
  • Microsoft Research
  • Point Grey
  • Technicolor
  • Mobileye

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 07 October 2012

Qualifiers

  • Article


Cited By

  • (2024) Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9214-9223. DOI: 10.1145/3664647.3680774. Online publication date: 28-Oct-2024.
  • (2024) Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval. IEEE Transactions on Circuits and Systems for Video Technology 34(7), pp. 5977-5994. DOI: 10.1109/TCSVT.2023.3344097. Online publication date: 1-Jul-2024.
  • (2024) Boundary-Aware Noise-Resistant Video Moment Retrieval. Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 193-206. DOI: 10.1007/978-3-031-72338-4_14. Online publication date: 17-Sep-2024.
  • (2023) Semantic Collaborative Learning for Cross-Modal Moment Localization. ACM Transactions on Information Systems 42(2), pp. 1-26. DOI: 10.1145/3620669. Online publication date: 7-Sep-2023.
  • (2023) Probability Distribution Based Frame-supervised Language-driven Action Localization. Proceedings of the 31st ACM International Conference on Multimedia, pp. 5164-5173. DOI: 10.1145/3581783.3612512. Online publication date: 26-Oct-2023.
  • (2023) Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language. Proceedings of the 31st ACM International Conference on Multimedia, pp. 4584-4594. DOI: 10.1145/3581783.3612357. Online publication date: 26-Oct-2023.
  • (2023) A Survey on Video Moment Localization. ACM Computing Surveys 55(9), pp. 1-37. DOI: 10.1145/3556537. Online publication date: 16-Jan-2023.
  • (2023) Progressive Localization Networks for Language-Based Moment Localization. ACM Transactions on Multimedia Computing, Communications, and Applications 19(2), pp. 1-21. DOI: 10.1145/3543857. Online publication date: 6-Feb-2023.
  • (2023) A Survey on Temporal Sentence Grounding in Videos. ACM Transactions on Multimedia Computing, Communications, and Applications 19(2), pp. 1-33. DOI: 10.1145/3532626. Online publication date: 6-Feb-2023.
  • (2022) Dual-Channel Localization Networks for Moment Retrieval with Natural Language. Proceedings of the 2022 International Conference on Multimedia Retrieval, pp. 351-359. DOI: 10.1145/3512527.3531394. Online publication date: 27-Jun-2022.
