poster

What Can We Learn about Motion Videos from Still Images?

Authors:

Jianmin Jiang

MM '14: Proceedings of the 22nd ACM international conference on Multimedia

Pages 973 - 976

https://doi.org/10.1145/2647868.2654992

Published: 03 November 2014

Abstract

Human action recognition from motion videos plays an important role in multimedia analysis. Besides the temporal cues carried by action sequences in motion videos, motion tendency can also be revealed by still images or key frames. Thus, if the action knowledge in related still images can be well adapted to the target motion videos, there is a good chance of improving the performance of video action recognition. In this paper, we propose a framework of Still-to-Motion Adaptation (SMA) for human action recognition. Common visual features are extracted from both the related images and the key frames of the target videos, which bridges the gap between still images and videos. Meanwhile, to utilize the unlabeled training videos in the target domain, we incorporate a semi-supervised process into our framework. By minimizing the difference between the action predictions from still features and from motion features, we formulate still-to-motion adaptation as a joint optimization process. Experiments demonstrate the effectiveness of the proposed framework and show better action recognition performance than state-of-the-art methods. We also analyze how knowledge adaptation from still images affects the recognition results on target videos.
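The abstract does not state the objective itself. As a rough illustration only, the sketch below shows one way a "minimize the prediction difference between still and motion features" idea could be written as a regularized least-squares problem. Everything in it is an assumption rather than the paper's formulation: the function names `fit_joint` and `joint_objective`, the squared loss, the trade-off weights `mu` and `lam`, the convention that labeled videos come first, and the closed-form solve.

```python
import numpy as np


def _stack(X_img, Y_img, Xs_vid, Xm_vid, Y_lab, mu):
    """Build the stacked system A @ [W_s; W_m] ~ B (hypothetical formulation).

    X_img  : (n_img, d_s) still features of labeled images
    Y_img  : (n_img, c)   one-hot labels of those images
    Xs_vid : (n_vid, d_s) still (key-frame) features of all training videos
    Xm_vid : (n_vid, d_m) motion features of the same videos
    Y_lab  : (n_lab, c)   one-hot labels of the first n_lab videos;
                          the remaining videos are treated as unlabeled
    """
    n_img, d_s = X_img.shape
    n_vid, d_m = Xm_vid.shape
    n_lab, c = Y_lab.shape
    A = np.vstack([
        # supervised loss on labeled still images
        np.hstack([X_img, np.zeros((n_img, d_m))]),
        # supervised loss on labeled videos, still features
        np.hstack([Xs_vid[:n_lab], np.zeros((n_lab, d_m))]),
        # supervised loss on labeled videos, motion features
        np.hstack([np.zeros((n_lab, d_s)), Xm_vid[:n_lab]]),
        # agreement term on all videos, labeled or not:
        # sqrt(mu) * (Xs_vid @ W_s - Xm_vid @ W_m) should be close to 0
        np.sqrt(mu) * np.hstack([Xs_vid, -Xm_vid]),
    ])
    B = np.vstack([Y_img, Y_lab, Y_lab, np.zeros((n_vid, c))])
    return A, B


def joint_objective(W_s, W_m, X_img, Y_img, Xs_vid, Xm_vid, Y_lab,
                    mu=1.0, lam=1e-2):
    """Squared losses + mu * prediction disagreement + lam * ridge penalty."""
    A, B = _stack(X_img, Y_img, Xs_vid, Xm_vid, Y_lab, mu)
    W = np.vstack([W_s, W_m])
    return np.sum((A @ W - B) ** 2) + lam * np.sum(W ** 2)


def fit_joint(X_img, Y_img, Xs_vid, Xm_vid, Y_lab, mu=1.0, lam=1e-2):
    """Solve for both classifiers at once via the ridge normal equations."""
    A, B = _stack(X_img, Y_img, Xs_vid, Xm_vid, Y_lab, mu)
    d_s = X_img.shape[1]
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)
    return W[:d_s], W[d_s:]  # still-feature weights, motion-feature weights
```

In this sketch the unlabeled videos influence the solution only through the agreement term, which is one simple reading of the semi-supervised step. Setting `mu=0` decouples the two classifiers, so the still images no longer inform the motion-feature weights.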

Published In

MM '14: Proceedings of the 22nd ACM international conference on Multimedia

November 2014, 1310 pages

ISBN: 9781450330633

DOI: 10.1145/2647868

General Chairs: Kien A. Hua (University of Central Florida, USA), Yong Rui (Microsoft Research, China), Ralf Steinmetz (Technische Universität Darmstadt, Germany)

Program Chairs: Alan Hanjalic (Delft University of Technology, Netherlands), Apostol (Paul) Natsev (Google, USA), Wenwu Zhu (Tsinghua University, China)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '14: 2014 ACM Multimedia Conference

Sponsor: SIGMM

November 3 - 7, 2014

Orlando, Florida, USA

Acceptance Rates

MM '14 Paper Acceptance Rate 55 of 286 submissions, 19%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
