Multi-sensory and Multi-modal Fusion for Sentient Computing

Christopher Town¹

Abstract

This paper presents an approach to multi-sensory and multi-modal fusion in which computer vision information obtained from calibrated cameras is integrated with a large-scale sentient computing system known as “SPIRIT”. The SPIRIT system employs an ultrasonic location infrastructure to track people and devices in an office building and model their state. Vision techniques include background and object appearance modelling, face detection, segmentation, and tracking modules. Integration is achieved at the system level through the metaphor of shared perceptions, in the sense that the different modalities are guided by and provide updates to a shared world model. This model incorporates aspects of both the static (e.g. positions of office walls and doors) and the dynamic (e.g. location and appearance of devices and people) environment.

Fusion and inference are performed by Bayesian networks that model the probabilistic dependencies and reliabilities of different sources of information over time. It is shown that the fusion process significantly enhances the capabilities and robustness of both sensory modalities, thus enabling the system to maintain a richer and more accurate world model.
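
To make the idea of reliability-weighted fusion over time concrete, the following is a minimal sketch and not the networks used in the paper. It assumes a single binary state ("person present in a region"), two conditionally independent sensors, and a two-state transition model. The class `Sensor`, the functions `predict`, `fuse` and `track`, and all numeric rates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    """Hypothetical sensor model for a binary 'present' state."""
    name: str
    p_detect_given_present: float  # assumed true-positive rate
    p_detect_given_absent: float   # assumed false-positive rate

    def likelihood(self, detected: bool, present: bool) -> float:
        """P(reading | state) for this sensor."""
        p = self.p_detect_given_present if present else self.p_detect_given_absent
        return p if detected else 1.0 - p


def predict(p_present: float, p_stay: float, p_enter: float) -> float:
    """Time update: push the belief through a two-state transition model."""
    return p_present * p_stay + (1.0 - p_present) * p_enter


def fuse(p_present: float, readings: dict, sensors: dict) -> float:
    """Measurement update: Bayes' rule with conditionally independent sensors."""
    like_present, like_absent = 1.0, 1.0
    for name, detected in readings.items():
        like_present *= sensors[name].likelihood(detected, True)
        like_absent *= sensors[name].likelihood(detected, False)
    numerator = like_present * p_present
    return numerator / (numerator + like_absent * (1.0 - p_present))


def track(prior: float, observations: list, sensors: dict,
          p_stay: float = 0.9, p_enter: float = 0.1) -> list:
    """Run predict/fuse over a sequence of per-time-step sensor readings."""
    beliefs = []
    belief = prior
    for readings in observations:
        belief = fuse(predict(belief, p_stay, p_enter), readings, sensors)
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    # Illustrative rates only: a reliable ultrasonic tag and a noisier camera.
    sensors = {
        "ultrasonic": Sensor("ultrasonic", 0.95, 0.02),
        "vision": Sensor("vision", 0.80, 0.10),
    }
    observations = [
        {"ultrasonic": True, "vision": True},
        {"ultrasonic": True, "vision": False},  # e.g. person occluded from camera
        {"vision": True},                       # e.g. tag not worn, camera only
    ]
    for step, belief in enumerate(track(0.5, observations, sensors)):
        print(f"t={step}: P(present) = {belief:.3f}")
```

In this sketch a sensor with a lower false-positive rate shifts the belief more strongly, and a time step with a missing reading simply contributes no likelihood term, which is one simple way for one modality to cover gaps in the other.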

Author information

Authors and Affiliations

University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Christopher Town

Corresponding author

Correspondence to Christopher Town.

About this article

Cite this article

Town, C. Multi-sensory and Multi-modal Fusion for Sentient Computing. Int J Comput Vision 71, 235–253 (2007). https://doi.org/10.1007/s11263-006-7834-8

Received: 20 March 2005
Revised: 27 December 2005
Accepted: 21 February 2006
Published: 01 July 2006
Issue Date: February 2007
DOI: https://doi.org/10.1007/s11263-006-7834-8
