Learning Layered Motion Segmentations of Video

M. Pawan Kumar¹, P. H. S. Torr¹ & A. Zisserman²

Abstract

We present an unsupervised approach for learning a layered representation of a scene from a video for motion segmentation. Our method is applicable to any video containing piecewise parametric motion. The learnt model is a composition of layers, each of which consists of one or more segments. The shape of each segment is represented using a binary matte, and its appearance is given by the RGB value for each point belonging to the matte. Included in the model are the effects of image projection, lighting, and motion blur. Furthermore, spatial continuity is explicitly modeled, resulting in contiguous segments. Unlike previous approaches, our method does not use reference frame(s) for initialization. The two main contributions of our method are: (i) a novel algorithm for obtaining the initial estimate of the model by dividing the scene into rigidly moving components using efficient loopy belief propagation; and (ii) refinement of the initial estimate using the αβ-swap and α-expansion algorithms, which guarantee a strong local minimum. Results are presented on several classes of objects with different types of camera motion, e.g. videos of a human walking shot with static or translating cameras. We compare our method with the state of the art and demonstrate significant improvements.
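To make the layered representation concrete, the following minimal sketch composites a frame from layers whose segments each carry a binary matte and per-point RGB appearance. It is an illustration under stated assumptions, not the authors' implementation: the function name `composite`, the back-to-front layer ordering, and the list-based data layout are all hypothetical, and the sketch omits the image projection, lighting, and motion blur effects that the model includes.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Matte = List[List[int]]                  # 1 where the segment is present, else 0
Appearance = List[List[Optional[RGB]]]   # RGB for points inside the matte
Segment = Tuple[Matte, Appearance]
Layer = List[Segment]                    # a layer holds one or more segments


def composite(layers: List[Layer], height: int, width: int,
              background: RGB = (0, 0, 0)) -> List[List[RGB]]:
    """Paint layers in order; later layers occlude earlier ones.

    Assumption (not from the abstract): layers are listed back to front.
    """
    frame = [[background for _ in range(width)] for _ in range(height)]
    for layer in layers:
        for matte, appearance in layer:
            for y in range(height):
                for x in range(width):
                    if matte[y][x]:  # the point belongs to this segment
                        frame[y][x] = appearance[y][x]
    return frame


# Toy 1x3 frame: a red segment behind a blue segment overlapping at x = 1.
red, blue = (255, 0, 0), (0, 0, 255)
back = [([[1, 1, 0]], [[red, red, None]])]
front = [([[0, 1, 1]], [[None, blue, blue]])]
frame = composite([back, front], height=1, width=3)
```

In the toy example the front layer occludes the back layer where their mattes overlap, which is the occlusion behaviour a layered representation is meant to capture.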

Author information

Authors and Affiliations

Department of Computing, Oxford Brookes University, Oxford, UK
M. Pawan Kumar & P. H. S. Torr
Department of Eng. Science, University of Oxford, Oxford, UK
A. Zisserman

Corresponding author

Correspondence to M. Pawan Kumar.

About this article

Cite this article

Pawan Kumar, M., Torr, P.H.S. & Zisserman, A. Learning Layered Motion Segmentations of Video. Int J Comput Vis 76, 301–319 (2008). https://doi.org/10.1007/s11263-007-0064-x

Received: 09 September 2006
Accepted: 18 May 2007
Published: 27 July 2007
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11263-007-0064-x
