research-article
DOI: 10.1145/2696454.2696462

Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?

Published: 02 March 2015

Abstract

In this paper, we present a core technology to enable robot recognition of human activities during human-robot interaction. In particular, we propose a methodology for early recognition of activities from robot-centric videos (i.e., first-person videos) obtained from a robot's viewpoint during its interaction with humans. Early recognition, also known as activity prediction, is the ability to infer an ongoing activity at its early stage. We present an algorithm that recognizes human activities targeting the camera from streaming videos, enabling the robot to predict the intended activities of the interacting person as early as possible and to react quickly (e.g., avoiding harmful events targeting itself before they actually occur). We introduce the novel concept of 'onset', which efficiently summarizes pre-activity observations, and design a recognition approach that considers event history in addition to visual features from first-person videos. We propose to represent an onset using a cascade histogram of time-series gradients, and we describe a novel algorithmic setup that takes advantage of such onset signatures for early recognition of activities. The experimental results clearly illustrate that the proposed concept of onset enables better and earlier recognition of human activities from first-person videos collected with a robot.
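
To make the proposed onset representation concrete, the following is a minimal Python sketch of one plausible reading of the 'cascade histogram of time-series gradients': the per-frame signal of the onset window (assumed here to be a scalar such as global motion magnitude) is differentiated in time, recursively split into finer segments, and each segment contributes one gradient histogram to a concatenated descriptor. The function name and all parameters are illustrative assumptions, not the authors' released implementation.

    import numpy as np

    def cascade_gradient_histogram(signal, levels=3, bins=8, grad_range=(-1.0, 1.0)):
        """Onset descriptor: cascade histogram of time-series gradients (sketch).

        `signal` is a 1-D array of per-frame values (assumed to be a scalar
        such as global motion magnitude) covering the pre-activity onset
        window. At cascade level k the gradient sequence is split into 2**k
        segments; each segment contributes one gradient histogram, and all
        histograms are concatenated.
        """
        grads = np.diff(np.asarray(signal, dtype=float))  # temporal gradients
        descriptor = []
        for level in range(levels):
            for segment in np.array_split(grads, 2 ** level):
                hist, _ = np.histogram(segment, bins=bins, range=grad_range)
                total = hist.sum()
                # L1-normalize so segments of different lengths are comparable
                descriptor.append(hist / total if total > 0 else hist.astype(float))
        # length = (1 + 2 + ... + 2**(levels-1)) * bins
        return np.concatenate(descriptor)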
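
Early recognition then reduces to scoring the stream as frames arrive. The sketch below fuses an onset score with a score for the partially observed activity and emits a prediction as soon as the fused posterior for some class crosses a threshold; the sklearn-style predict_proba classifiers, the plain window histogram standing in for richer video features, and the product fusion are simplifying assumptions rather than the paper's exact formulation.

    from collections import deque

    import numpy as np

    def streaming_early_recognition(frame_signal, onset_clf, activity_clf,
                                    window=30, threshold=0.8):
        """Sliding-window early recognition over a streaming video (sketch).

        `onset_clf` and `activity_clf` are assumed to be pre-trained
        sklearn-style classifiers exposing predict_proba(); the product
        fusion below rests on a naive independence assumption.
        """
        buffer = deque(maxlen=window)
        for t, value in enumerate(frame_signal):
            buffer.append(value)
            if len(buffer) < window:
                continue  # not enough history yet
            clip = np.array(buffer)
            # evidence from the pre-activity onset signature of the window
            p_onset = onset_clf.predict_proba([cascade_gradient_histogram(clip)])[0]
            # evidence from the partially observed activity itself (a plain
            # window histogram stands in for richer video features)
            hist, _ = np.histogram(clip, bins=8)
            p_act = activity_clf.predict_proba([hist / max(hist.sum(), 1)])[0]
            fused = p_onset * p_act
            fused = fused / (fused.sum() + 1e-12)
            best = int(np.argmax(fused))
            if fused[best] >= threshold:
                # predict before the activity completes
                yield t, best, float(fused[best])

A caller would iterate the generator (e.g., for t, cls, p in streaming_early_recognition(motion_signal, onset_clf, activity_clf): ...) and trigger an avoidance behavior whenever the predicted class is one labeled harmful.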



Published In

HRI '15: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction
March 2015
368 pages
ISBN: 9781450328838
DOI: 10.1145/2696454
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. activity recognition
  2. first-person videos
  3. human-robot interaction

Qualifiers

  • Research-article

Conference

HRI '15

Acceptance Rates

HRI '15 Paper Acceptance Rate: 43 of 169 submissions, 25%
Overall Acceptance Rate: 268 of 1,124 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 38
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Egocentric intention object prediction based on a human-like manner. Egyptian Informatics Journal, 26:100482. DOI: 10.1016/j.eij.2024.100482. Online publication date: Jun-2024.
  • (2024) PALM: Predicting Actions through Language Models. Computer Vision – ECCV 2024, 140-158. DOI: 10.1007/978-3-031-73007-8_9. Online publication date: 1-Oct-2024.
  • (2024) Action Transition Recognition Using Principal Component Analysis for Agricultural Robot Following. Intelligent Autonomous Systems 18, 175-187. DOI: 10.1007/978-3-031-44851-5_14. Online publication date: 25-Apr-2024.
  • (2023) Active vision reinforcement learning under limited visual observability. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10316-10338. DOI: 10.5555/3666122.3666575. Online publication date: 10-Dec-2023.
  • (2023) Self-Regulated Learning for Egocentric Video Activity Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):6715-6730. DOI: 10.1109/TPAMI.2021.3059923. Online publication date: 1-Jun-2023.
  • (2023) GA-Net: A Guidance Aware Network for Skeleton-Based Early Activity Recognition. IEEE Transactions on Multimedia, 25:1061-1073. DOI: 10.1109/TMM.2021.3137745. Online publication date: 2023.
  • (2023) Magi-Net: Meta Negative Network for Early Activity Prediction. IEEE Transactions on Image Processing, 32:3254-3265. DOI: 10.1109/TIP.2023.3279991. Online publication date: 2023.
  • (2023) Analyzing Interactions in Paired Egocentric Videos. 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 1-7. DOI: 10.1109/FG57933.2023.10042654. Online publication date: 5-Jan-2023.
  • (2023) MECCANO: A multimodal egocentric dataset for humans behavior understanding in the industrial-like domain. Computer Vision and Image Understanding, 103764. DOI: 10.1016/j.cviu.2023.103764. Online publication date: Jun-2023.
  • (2023) Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks. International Journal of Computer Vision, 131(12):3272-3288. DOI: 10.1007/s11263-023-01850-6. Online publication date: 13-Aug-2023.
