Abstract
A popular framework for the interpretation of image sequences is the layers or sprite model; see e.g. [15], [6]. Jojic and Frey [8] provide a generative probabilistic model framework for this task, but their algorithm is slow because it must search over discretized transformations (e.g. translations or affine transformations) for all layers simultaneously. Since exact computation with this model scales exponentially with the number of objects, Jojic and Frey used an approximate variational algorithm to speed up inference. Williams and Titsias [16] proposed an alternative sequential algorithm that extracts objects one at a time using a robust statistical method, thus avoiding the combinatorial explosion.
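To make the scaling argument concrete, here is a back-of-the-envelope count under simplifying assumptions that are not taken from the chapter: each layer's transformation is restricted to integer translations, giving J candidate positions per layer, and the cost of scoring one configuration is ignored. A joint search over L layers examines J^L configurations, whereas extracting the layers one at a time examines roughly L·J. The function names are hypothetical.

```python
from itertools import product


def joint_search_size(num_positions: int, num_layers: int) -> int:
    """Configurations examined when all layers' translations are searched jointly."""
    return num_positions ** num_layers


def sequential_search_size(num_positions: int, num_layers: int) -> int:
    """Approximate configurations examined when layers are extracted one at a time."""
    return num_positions * num_layers


def enumerate_joint(positions, num_layers):
    """Explicitly list every joint configuration (only feasible for tiny cases)."""
    return list(product(positions, repeat=num_layers))


if __name__ == "__main__":
    # Assumed example: a 64x64 grid of translations (4096 positions) and 3 layers.
    print(joint_search_size(4096, 3))       # 68719476736
    print(sequential_search_size(4096, 3))  # 12288
```

Even with only three layers, the joint count exceeds the sequential one by more than six orders of magnitude, which is the combinatorial explosion the sequential approach sidesteps.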
In this chapter we extend our sequential algorithm in two ways. First, we describe a method that speeds up the computation of the transformations by approximately tracking the multiple objects in the scene. Second, for sequences in which an object's motion is large enough that different views (or aspects) of it are visible at different times, we learn appearance models of these different aspects. We demonstrate our method on four video sequences, including one in which we learn the articulated parts of a human body.
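The abstract does not say how the approximate tracking is carried out, so the sketch below only illustrates the general idea of narrowing a discretized translation search to a window around the previous frame's estimate. It assumes a plain sum-of-squared-differences score rather than the robust likelihood used by the sequential algorithm, and every function name is hypothetical.

```python
def ssd(frame, template, dy, dx):
    """Sum of squared differences between the template and the frame patch at (dy, dx)."""
    total = 0.0
    for i, row in enumerate(template):
        for j, value in enumerate(row):
            diff = frame[dy + i][dx + j] - value
            total += diff * diff
    return total


def all_offsets(frame, template):
    """Every offset at which the template lies entirely inside the frame."""
    max_dy = len(frame) - len(template)
    max_dx = len(frame[0]) - len(template[0])
    return [(dy, dx) for dy in range(max_dy + 1) for dx in range(max_dx + 1)]


def window_offsets(frame, template, prev, radius):
    """Offsets within `radius` pixels (per axis) of the previous frame's estimate."""
    py, px = prev
    return [
        (dy, dx)
        for dy, dx in all_offsets(frame, template)
        if abs(dy - py) <= radius and abs(dx - px) <= radius
    ]


def best_offset(frame, template, offsets):
    """Return the candidate offset with the lowest SSD score."""
    return min(offsets, key=lambda o: ssd(frame, template, *o))


# Usage: a 10x10 frame containing a 2x2 bright object whose top-left corner is at (6, 7).
frame = [[0.0] * 10 for _ in range(10)]
for i in range(2):
    for j in range(2):
        frame[6 + i][7 + j] = 1.0
template = [[1.0, 1.0], [1.0, 1.0]]

full = all_offsets(frame, template)                            # 81 candidates
tracked = window_offsets(frame, template, prev=(5, 5), radius=2)  # 25 candidates
```

Both searches return the offset (6, 7), but the windowed search scores 25 candidates instead of 81. The saving holds only while the object moves less than `radius` between frames, which is why such tracking is an approximation.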
References
[1] Allan, M., Titsias, M.K., Williams, C.K.I.: Fast Learning of Sprites using Invariant Features. In: Proceedings of the British Machine Vision Conference 2005, pp. 40–49 (2005)
[2] Black, M.J., Jepson, A.D.: EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. In: Proc. ECCV, pp. 329–342 (1996)
[3] Darrell, T., Pentland, A.P.: Cooperative Robust Estimation Using Layers of Support. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5), 474–487 (1995)
[4] Fitzgibbon, A., Zisserman, A.: On Affine Invariant Clustering and Automatic Cast Listing in Movies. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 304–320. Springer, Heidelberg (2002)
[5] Frey, B.J., Jojic, N.: Transformation Invariant Clustering Using the EM Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1), 1–17 (2003)
[6] Irani, M., Rousso, B., Peleg, S.: Computing Occluding and Transparent Motions. International Journal of Computer Vision 12(1), 5–16 (1994)
[7] Jepson, A.D., Fleet, D.J., Black, M.J.: A Layered Motion Representation with Occlusion and Compact Spatial Support. In: ECCV 2002. LNCS, vol. 2353, pp. 692–706. Springer, Heidelberg (2002)
[8] Jojic, N., Frey, B.J.: Learning Flexible Sprites in Video Layers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2001. IEEE Computer Society Press, Kauai (2001)
[9] Kumar, M.P., Torr, P.H.S., Zisserman, A.: Learning Layered Pictorial Structures from Video. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, pp. 158–163 (2004)
[10] Sawhney, H.S., Ayer, S.: Compact Representations of Videos Through Dominant and Multiple Motion Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 814–830 (1996)
[11] Tao, H., Sawhney, H.S., Kumar, R.: Dynamic Layer Representation with Applications to Tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. II, pp. 134–141 (2000)
[12] Titsias, M.K., Williams, C.K.I.: Fast Unsupervised Greedy Learning of Multiple Objects and Parts from Video. In: Proc. Generative-Model Based Vision Workshop (2004)
[13] Titsias, M.K.: Unsupervised Learning of Multiple Objects in Images. Ph.D. thesis, School of Informatics, University of Edinburgh (2005)
[14] Torr, P.H.S.: Geometric Motion Segmentation and Model Selection. Phil. Trans. Roy. Soc. Lond. A 356, 1321–1340 (1998)
[15] Wang, J.Y.A., Adelson, E.H.: Representing Moving Images with Layers. IEEE Transactions on Image Processing 3(5), 625–638 (1994)
[16] Williams, C.K.I., Titsias, M.K.: Greedy Learning of Multiple Objects in Images using Robust Statistics and Factorial Learning. Neural Computation 16(5), 1039–1062 (2004)
[17] Wills, J., Agarwal, S., Belongie, S.: What Went Where. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2003, vol. I, pp. 37–44 (2003)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Titsias, M.K., Williams, C.K.I. (2006). Sequential Learning of Layered Models from Video. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_29
DOI: https://doi.org/10.1007/11957959_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science, Computer Science (R0)