Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding

Published: 01 October 2010

Abstract

We propose a novel approach to understanding activities from their partial observations monitored through multiple non-overlapping cameras separated by unknown time gaps. In our approach, each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time-delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning the time-delayed activity correlations offers important contextual information for (i) spatial and temporal topology inference of a camera network; (ii) robust person re-identification; and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos characterised by severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of video captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.
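To make the idea of a time-delayed correlation between regional activities concrete, the following is a minimal sketch and not the xCCA formulation of the paper, whose details are not given in this abstract. It assumes each region's activity is summarised as a multivariate time series (one row per frame), computes the first canonical correlation with standard CCA at each candidate lag, and reports the lag that maximises it. The function names, the lag search range, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y.

    X and Y are (T, p) and (T, q) arrays with one row per time step.
    Uses the QR + SVD route: the singular values of Qx^T Qy are the
    canonical correlations of the centred data.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny numerical overshoot above 1


def estimate_time_delay(X, Y, max_lag):
    """Search lags 0..max_lag for the delay of Y relative to X.

    Returns (best_lag, best_rho), where best_rho is the first canonical
    correlation between X[t] and Y[t + best_lag].
    """
    T = len(X)
    best_lag, best_rho = 0, -1.0
    for lag in range(max_lag + 1):
        rho = first_canonical_correlation(X[: T - lag], Y[lag:])
        if rho > best_rho:
            best_lag, best_rho = lag, rho
    return best_lag, best_rho


# Synthetic example (hypothetical data): region B sees a linearly mixed,
# noisy copy of region A's activity 7 frames later.
rng = np.random.default_rng(0)
T, true_lag = 500, 7
region_a = rng.normal(size=(T, 2))
mixing = np.array([[1.0, 0.5], [-0.3, 0.8]])
region_b = rng.normal(size=(T, 2))
region_b[true_lag:] = region_a[:-true_lag] @ mixing + 0.1 * region_b[true_lag:]

lag, rho = estimate_time_delay(region_a, region_b, max_lag=20)
print(f"estimated lag: {lag} frames, canonical correlation: {rho:.3f}")
```

On this synthetic data the search should recover the planted delay. In a camera network, a strong correlation at a non-zero lag between regions in different views is the kind of cue the abstract describes as useful for topology inference, though the paper's own method should be consulted for how such correlations are actually quantified.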



Information

Published In

International Journal of Computer Vision, Volume 90, Issue 1
October 2010
129 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. Camera topology inference
2. Correlation modelling
3. Multi-camera activity modelling
4. Person re-identification
5. Time delay estimation
6. Visual surveillance

Qualifiers

• Article
