DOI: 10.1145/2647868.2654992
poster

What Can We Learn about Motion Videos from Still Images?

Published: 03 November 2014

Abstract

Human action recognition from motion videos plays an important role in multimedia analysis. Beyond the temporal cues carried by action sequences in motion videos, motion tendency can also be revealed by still images or key frames. Thus, if the action knowledge in related still images can be well adapted to the target motion videos, we have a good chance of improving the performance of video action recognition. In this paper, we propose a framework of Still-to-Motion Adaptation (SMA) for human action recognition. Common visual features are extracted from both the related images and the key frames of the target videos, which bridges the gap between still images and videos. Meanwhile, to utilize the unlabeled training videos in the target domain, we incorporate a semi-supervised process into our framework. By minimizing the difference between the action predictions from still features and from motion features, we formulate still-to-motion adaptation as a joint optimization process. Experiments demonstrate the effectiveness of the proposed framework and show better action recognition performance than state-of-the-art methods. We also analyze how knowledge adaptation from still images affects the recognition results on target videos.
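
The abstract does not give the SMA objective, so the following is only a toy sketch of the general idea it describes: a still-feature scorer and a motion-feature scorer are trained jointly, with a penalty on their disagreement over all videos, including unlabeled ones. Everything here is an assumption made for illustration: the linear scorers `w_s` and `w_m`, the squared losses, the weights `lam` and `mu`, plain gradient descent, and the tiny hand-made data.

```python
# Toy sketch of consistency-regularized joint training.
# Assumption: this is NOT the SMA formulation from the paper, only an
# illustration of "minimize the difference between still and motion predictions".

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def objective(w_s, w_m, images, videos, lam=1.0, mu=0.01):
    """Supervised losses plus a still/motion agreement penalty."""
    loss = mu * (dot(w_s, w_s) + dot(w_m, w_m))        # ridge term
    for x, y in images:                                # labeled still images
        loss += (dot(w_s, x) - y) ** 2
    for key, mot, y in videos:
        if y is not None:                              # labeled videos only
            loss += (dot(w_m, mot) - y) ** 2
        # agreement term: applies to labeled AND unlabeled videos
        loss += lam * (dot(w_s, key) - dot(w_m, mot)) ** 2
    return loss


def gradients(w_s, w_m, images, videos, lam=1.0, mu=0.01):
    """Gradient of `objective` with respect to w_s and w_m."""
    g_s = [2 * mu * w for w in w_s]
    g_m = [2 * mu * w for w in w_m]
    for x, y in images:
        r = 2 * (dot(w_s, x) - y)
        for i, xi in enumerate(x):
            g_s[i] += r * xi
    for key, mot, y in videos:
        if y is not None:
            r = 2 * (dot(w_m, mot) - y)
            for i, mi in enumerate(mot):
                g_m[i] += r * mi
        d = 2 * lam * (dot(w_s, key) - dot(w_m, mot))
        for i, ki in enumerate(key):
            g_s[i] += d * ki
        for i, mi in enumerate(mot):
            g_m[i] -= d * mi
    return g_s, g_m


def fit(images, videos, lam=1.0, mu=0.01, lr=0.05, steps=2000):
    """Plain gradient descent on both scorers at once."""
    w_s = [0.0] * len(images[0][0])
    w_m = [0.0] * len(videos[0][1])
    for _ in range(steps):
        g_s, g_m = gradients(w_s, w_m, images, videos, lam, mu)
        w_s = [w - lr * g for w, g in zip(w_s, g_s)]
        w_m = [w - lr * g for w, g in zip(w_m, g_m)]
    return w_s, w_m


# Hypothetical data: (still features, label) for images;
# (key-frame still features, motion features, label or None) for videos.
images = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
videos = [
    ([1.0, 0.1], [0.9, 0.0, 0.2], 1.0),
    ([0.1, 1.0], [0.0, 1.0, 0.1], -1.0),
    ([0.9, 0.0], [1.0, 0.1, 0.0], None),   # unlabeled training video
]

w_s, w_m = fit(images, videos)
print("objective after training:", objective(w_s, w_m, images, videos))
print("motion score of the unlabeled video:", dot(w_m, videos[2][1]))
```

The unlabeled video contributes no supervised loss, yet it still shapes both scorers through the agreement term, which is the role the abstract assigns to the semi-supervised process.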

Published In

MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN: 9781450330633
DOI: 10.1145/2647868
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. action recognition
  2. domain adaptation
  3. images
  4. videos

Qualifiers

  • Poster

Conference

MM '14: 2014 ACM Multimedia Conference
November 3 - 7, 2014
Orlando, Florida, USA

Acceptance Rates

MM '14 Paper Acceptance Rate 55 of 286 submissions, 19%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2019) A feature selection framework for video semantic recognition via integrated cross-media analysis and embedded learning. EURASIP Journal on Image and Video Processing 2019:1. DOI: 10.1186/s13640-019-0428-5. Online publication date: 13-Feb-2019
  • (2016) Adaptive incremental learning of image semantics with application to social robot. Neurocomputing 173:P1 (93-101). DOI: 10.1016/j.neucom.2015.07.104. Online publication date: 15-Jan-2016
  • (2016) Active domain adaptation with noisy labels for multimedia analysis. World Wide Web 19:2 (199-215). DOI: 10.1007/s11280-015-0343-3. Online publication date: 1-Mar-2016
  • (2016) A cross-media distance metric learning framework based on multi-view correlation mining and matching. World Wide Web 19:2 (181-197). DOI: 10.1007/s11280-015-0342-4. Online publication date: 1-Mar-2016
  • (2015) Full-Space Local Topology Extraction for Cross-Modal Retrieval. IEEE Transactions on Image Processing 24:7 (2212-2224). DOI: 10.1109/TIP.2015.2419074. Online publication date: Jul-2015
