research-article
DOI: 10.1145/2696454.2696462

Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?

Published: 02 March 2015

Abstract

In this paper, we present a core technology to enable robot recognition of human activities during human-robot interaction. In particular, we propose a methodology for early recognition of activities from robot-centric videos (i.e., first-person videos) obtained from a robot's viewpoint during its interaction with humans. Early recognition, also known as activity prediction, is the ability to infer an ongoing activity at its early stage. We present an algorithm that recognizes human activities targeting the camera from streaming videos, enabling the robot to predict the intended activities of the interacting person as early as possible and to react quickly (e.g., avoiding harmful events targeting itself before they actually occur). We introduce the novel concept of 'onset', which efficiently summarizes pre-activity observations, and design a recognition approach that considers event history in addition to visual features from first-person videos. We propose to represent an onset using a cascade histogram of time-series gradients, and we describe a novel algorithmic setup that takes advantage of such onset signatures for early recognition of activities. The experimental results clearly illustrate that the proposed concept of onset enables better and earlier recognition of human activities from first-person videos collected with a robot.
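
To make the proposed onset representation concrete, the following is a minimal Python sketch of one plausible reading of the 'cascade histogram of time-series gradients': the per-frame signal of the onset window (assumed here to be a scalar such as global motion magnitude) is differentiated in time, recursively split into finer segments, and each segment contributes one gradient histogram to a concatenated descriptor. The function name and all parameters are illustrative assumptions, not the authors' released implementation.

    import numpy as np

    def cascade_gradient_histogram(signal, levels=3, bins=8, grad_range=(-1.0, 1.0)):
        """Onset descriptor: cascade histogram of time-series gradients (sketch).

        `signal` is a 1-D array of per-frame values (assumed to be a scalar
        such as global motion magnitude) covering the pre-activity onset
        window. At cascade level k the gradient sequence is split into 2**k
        segments; each segment contributes one gradient histogram, and all
        histograms are concatenated.
        """
        grads = np.diff(np.asarray(signal, dtype=float))  # temporal gradients
        descriptor = []
        for level in range(levels):
            for segment in np.array_split(grads, 2 ** level):
                hist, _ = np.histogram(segment, bins=bins, range=grad_range)
                total = hist.sum()
                # L1-normalize so segments of different lengths are comparable
                descriptor.append(hist / total if total > 0 else hist.astype(float))
        # length = (1 + 2 + ... + 2**(levels-1)) * bins
        return np.concatenate(descriptor)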
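
Early recognition then reduces to scoring the stream as frames arrive. The sketch below fuses an onset score with a score for the partially observed activity and emits a prediction as soon as the fused posterior for some class crosses a threshold; the sklearn-style predict_proba classifiers, the plain window histogram standing in for richer video features, and the product fusion are simplifying assumptions rather than the paper's exact formulation.

    from collections import deque

    import numpy as np

    def streaming_early_recognition(frame_signal, onset_clf, activity_clf,
                                    window=30, threshold=0.8):
        """Sliding-window early recognition over a streaming video (sketch).

        `onset_clf` and `activity_clf` are assumed to be pre-trained
        sklearn-style classifiers exposing predict_proba(); the product
        fusion below rests on a naive independence assumption.
        """
        buffer = deque(maxlen=window)
        for t, value in enumerate(frame_signal):
            buffer.append(value)
            if len(buffer) < window:
                continue  # not enough history yet
            clip = np.array(buffer)
            # evidence from the pre-activity onset signature of the window
            p_onset = onset_clf.predict_proba([cascade_gradient_histogram(clip)])[0]
            # evidence from the partially observed activity itself (a plain
            # window histogram stands in for richer video features)
            hist, _ = np.histogram(clip, bins=8)
            p_act = activity_clf.predict_proba([hist / max(hist.sum(), 1)])[0]
            fused = p_onset * p_act
            fused = fused / (fused.sum() + 1e-12)
            best = int(np.argmax(fused))
            if fused[best] >= threshold:
                # predict before the activity completes
                yield t, best, float(fused[best])

A caller would iterate the generator (e.g., for t, cls, p in streaming_early_recognition(motion_signal, onset_clf, activity_clf): ...) and trigger an avoidance behavior whenever the predicted class is one labeled harmful.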



Published In

HRI '15: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction
March 2015
368 pages
ISBN: 9781450328838
DOI: 10.1145/2696454
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. activity recognition
  2. first-person videos
  3. human-robot interaction

Qualifiers

  • Research-article

Conference

HRI '15

Acceptance Rates

HRI '15 Paper Acceptance Rate: 43 of 169 submissions, 25%
Overall Acceptance Rate: 268 of 1,124 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 38
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Egocentric intention object prediction based on a human-like manner. Egyptian Informatics Journal, 26:100482. DOI: 10.1016/j.eij.2024.100482. Online publication date: Jun-2024.
  • (2024) PALM: Predicting Actions through Language Models. Computer Vision – ECCV 2024, 140-158. DOI: 10.1007/978-3-031-73007-8_9. Online publication date: 1-Oct-2024.
  • (2024) Action Transition Recognition Using Principal Component Analysis for Agricultural Robot Following. Intelligent Autonomous Systems 18, 175-187. DOI: 10.1007/978-3-031-44851-5_14. Online publication date: 25-Apr-2024.
  • (2023) Active vision reinforcement learning under limited visual observability. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10316-10338. DOI: 10.5555/3666122.3666575. Online publication date: 10-Dec-2023.
  • (2023) Self-Regulated Learning for Egocentric Video Activity Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):6715-6730. DOI: 10.1109/TPAMI.2021.3059923. Online publication date: 1-Jun-2023.
  • (2023) GA-Net: A Guidance Aware Network for Skeleton-Based Early Activity Recognition. IEEE Transactions on Multimedia, 25:1061-1073. DOI: 10.1109/TMM.2021.3137745. Online publication date: 2023.
  • (2023) Magi-Net: Meta Negative Network for Early Activity Prediction. IEEE Transactions on Image Processing, 32:3254-3265. DOI: 10.1109/TIP.2023.3279991. Online publication date: 2023.
  • (2023) Analyzing Interactions in Paired Egocentric Videos. 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 1-7. DOI: 10.1109/FG57933.2023.10042654. Online publication date: 5-Jan-2023.
  • (2023) MECCANO: A multimodal egocentric dataset for humans behavior understanding in the industrial-like domain. Computer Vision and Image Understanding, 103764. DOI: 10.1016/j.cviu.2023.103764. Online publication date: Jun-2023.
  • (2023) Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks. International Journal of Computer Vision, 131(12):3272-3288. DOI: 10.1007/s11263-023-01850-6. Online publication date: 13-Aug-2023.
