Article

Joint segmentation and classification of human actions in video

Authors: Minh Hoai, Zhen-Zhong Lan, F. De la Torre

CVPR '11: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition

Pages 3265 - 3272

https://doi.org/10.1109/CVPR.2011.5995470

Published: 20 June 2011

Abstract

Automatic video segmentation and action recognition has been a long-standing problem in computer vision. Much work in the literature treats video segmentation and action recognition as two independent problems; while segmentation is often done without a temporal model of the activity, action recognition is usually performed on pre-segmented clips. In this paper we propose a novel method that avoids the limitations of the above approaches by jointly performing video segmentation and action recognition. Unlike standard approaches based on extensions of dynamic Bayesian networks, our method is based on a discriminative temporal extension of the spatial bag-of-words model that has been very popular in object recognition. The classification is performed robustly within a multi-class SVM framework whereas the inference over the segments is done efficiently with dynamic programming. Experimental results on honeybee, Weizmann, and Hollywood datasets illustrate the benefits of our approach compared to state-of-the-art methods.
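The joint inference described in the abstract can be illustrated with a small dynamic program. The sketch below is a simplified illustration, not the authors' formulation: it assumes each frame is quantized to a single codeword, that a segment is scored by a linear per-class classifier applied to its unnormalized bag-of-words histogram (so the segment score is a sum of per-frame weights plus a bias), and that the best segmentation maximizes the sum of segment scores under segment-length bounds. The function name `segment_and_classify` and all of its parameters are hypothetical, and the weights would in practice come from a trained multi-class SVM.

```python
from typing import List, Sequence, Tuple


def segment_and_classify(
    frame_words: Sequence[int],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    min_len: int,
    max_len: int,
) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Jointly pick segment boundaries and one class label per segment.

    frame_words[t] is the codeword index of frame t (assumed quantization).
    weights[c][k] is the weight of class c for codeword k (assumed linear model).
    Returns (total score, [(start, end_exclusive, label), ...]).
    """
    n = len(frame_words)
    num_classes = len(weights)

    # prefix[c][t] = sum of class-c weights over frames 0..t-1, so that any
    # segment's histogram score is available in constant time.
    prefix = [[0.0] * (n + 1) for _ in range(num_classes)]
    for c in range(num_classes):
        for t, word in enumerate(frame_words):
            prefix[c][t + 1] = prefix[c][t] + weights[c][word]

    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)   # best[t]: best score for frames 0..t-1
    best[0] = 0.0
    back = [None] * (n + 1)      # back[t]: (start, label) of the last segment

    for end in range(1, n + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == neg_inf:
                continue  # no valid segmentation ends at `start`
            for c in range(num_classes):
                score = best[start] + prefix[c][end] - prefix[c][start] + biases[c]
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, c)

    if best[n] == neg_inf:
        raise ValueError("no segmentation satisfies the length bounds")

    # Walk the back-pointers from the last frame to recover the segments.
    segments = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return best[n], segments


# Toy example: two codewords, two classes, each class favoring one codeword.
total, segments = segment_and_classify(
    frame_words=[0, 0, 0, 1, 1, 1],
    weights=[[1, -1], [-1, 1]],
    biases=[0, 0],
    min_len=3,
    max_len=6,
)
print(total, segments)  # 6.0 [(0, 3, 0), (3, 6, 1)]
```

With n frames, C classes, and a maximum segment length of L, this sketch runs in O(n · L · C) time. It does not prevent adjacent segments from sharing a label, and it uses raw classifier scores rather than any margin-based criterion; both are simplifications made for brevity.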

Information & Contributors

Information

Published In

CVPR '11: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition

June 2011

3558 pages

ISBN:9781457703942

Publisher

IEEE Computer Society

United States

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 26
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Jing L, Xue Y, Yan X, Zheng C, Wang D, Zhang R, Wang Z, Fang H, Zhao B, Li Z, Wooldridge M, Dy J, Natarajan S (2024). X4D-SceneFormer. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 2670-2678. doi:10.1609/aaai.v38i3.28045. Online publication date: 20-Feb-2024.
https://dl.acm.org/doi/10.1609/aaai.v38i3.28045
Intharah T, Turmukhambetov D, Brostow G (2019). HILC. ACM Transactions on Interactive Intelligent Systems 9:2-3, pp. 1-27. doi:10.1145/3234508. Online publication date: 18-Mar-2019.
https://dl.acm.org/doi/10.1145/3234508
Ling M, Geng X (2019). Soft video parsing by label distribution learning. Frontiers of Computer Science: Selected Publications from Chinese Universities 13:2, pp. 302-317. doi:10.1007/s11704-018-8015-y. Online publication date: 1-Apr-2019.
https://dl.acm.org/doi/10.1007/s11704-018-8015-y
Wei Z, Wang B, Hoai M, Zhang J, Lin Z, Shen X, Měch R, Samaras D (2018). Sequence-to-segments networks for segment detection. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 3511-3520. doi:10.5555/3327144.3327269. Online publication date: 3-Dec-2018.
https://dl.acm.org/doi/10.5555/3327144.3327269
Jain H, Harit G (2018). Leveraging information from imperfect examples. Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing, pp. 1-8. doi:10.1145/3293353.3293416. Online publication date: 18-Dec-2018.
https://dl.acm.org/doi/10.1145/3293353.3293416
Liu C, Hou J, Wu X, Jia Y (2018). A discriminative structural model for joint segmentation and recognition of human actions. Multimedia Tools and Applications 77:24, pp. 31627-31645. doi:10.1007/s11042-018-6189-9. Online publication date: 1-Dec-2018.
https://dl.acm.org/doi/10.1007/s11042-018-6189-9
Xiao Q, Song R (2018). Action recognition based on hierarchical dynamic Bayesian network. Multimedia Tools and Applications 77:6, pp. 6955-6968. doi:10.1007/s11042-017-4614-0. Online publication date: 1-Mar-2018.
https://dl.acm.org/doi/10.1007/s11042-017-4614-0
Geng X, Ling M, Singh S, Markovitch S (2017). Soft video parsing by label distribution learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1331-1337. doi:10.5555/3298239.3298434. Online publication date: 4-Feb-2017.
https://dl.acm.org/doi/10.5555/3298239.3298434
Zhou L, Nagahashi H (2017). Real-time Action Recognition Based on Key Frame Detection. Proceedings of the 9th International Conference on Machine Learning and Computing, pp. 272-277. doi:10.1145/3055635.3056569. Online publication date: 24-Feb-2017.
https://dl.acm.org/doi/10.1145/3055635.3056569
Intharah T, Turmukhambetov D, Brostow G, Papadopoulos G, Kuflik T, Chen F, Duarte C, Fu W (2017). Help, It Looks Confusing. Proceedings of the 22nd International Conference on Intelligent User Interfaces, pp. 233-243. doi:10.1145/3025171.3025176. Online publication date: 7-Mar-2017.
https://dl.acm.org/doi/10.1145/3025171.3025176
