SEVA: Sensor-enhanced video annotation

Published: 14 August 2009

Abstract

In this article, we study how digital recording devices such as cameras and camcorders can exploit a sensor-rich world to improve a user's ability to search a large repository of image and video files. We design and implement a digital recording system that records the identities and locations of objects (as advertised by their sensors) along with visual images (as recorded by a camera). The process, which we refer to as Sensor-Enhanced Video Annotation (SEVA), combines a series of correlation, interpolation, and extrapolation techniques. It produces a tagged stream that can later be used to search efficiently for videos or frames containing particular objects or people. We present detailed experiments with a prototype of our system using both stationary and mobile objects, as well as GPS and ultrasound. Our experiments show that: (i) SEVA has a zero error rate for static objects, except very close to the boundary of the viewable area; (ii) for moving objects or a moving camera, SEVA misses objects entering or leaving the viewable area by only 1 to 2 frames; (iii) SEVA can scale to 10 fast-moving objects using current sensor technology; and (iv) SEVA runs online using relatively inexpensive hardware.
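
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions: a simplified 2D world, a camera modeled as a position, heading, angle of view, and maximum range, and object positions estimated by linear interpolation between two timestamped sensor fixes. The names `Camera`, `interpolate`, `in_view`, and `tag_frame` are hypothetical and are not taken from the article.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # All fields are illustrative assumptions, not the article's model.
    x: float
    y: float
    heading_deg: float  # direction the lens points, counterclockwise from +x
    fov_deg: float      # full horizontal angle of view
    max_range: float    # objects farther away than this are ignored


def interpolate(t, t0, p0, t1, p1):
    """Linearly interpolate an (x, y) position at time t between two fixes."""
    if t1 == t0:
        return p0
    a = (t - t0) / (t1 - t0)
    return (p0[0] + a * (p1[0] - p0[0]), p0[1] + a * (p1[1] - p0[1]))


def in_view(cam, pos):
    """Return True if pos lies inside the camera's 2D viewing wedge."""
    dx, dy = pos[0] - cam.x, pos[1] - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > cam.max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angle between bearing and heading, wrapped into [-180, 180).
    off = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(off) <= cam.fov_deg / 2.0


def tag_frame(cam, frame_time, tracks):
    """Return sorted IDs of objects estimated to be visible at frame_time.

    tracks maps object_id -> ((t0, (x0, y0)), (t1, (x1, y1))).
    """
    tags = []
    for obj_id, ((t0, p0), (t1, p1)) in tracks.items():
        if in_view(cam, interpolate(frame_time, t0, p0, t1, p1)):
            tags.append(obj_id)
    return sorted(tags)


cam = Camera(x=0.0, y=0.0, heading_deg=0.0, fov_deg=60.0, max_range=10.0)
tracks = {
    "cup": ((0.0, (5.0, -5.0)), (1.0, (5.0, 5.0))),    # moves across the view
    "lamp": ((0.0, (-3.0, 0.0)), (1.0, (-3.0, 0.0))),  # static, behind camera
}
print(tag_frame(cam, 0.5, tracks))
```

In this sketch, each video frame would be tagged with the IDs returned by `tag_frame`, and a later search for an object reduces to a lookup over those per-frame tags. A real system would also have to handle 3D geometry, sensor noise, and clock offsets between sensors and camera, none of which are modeled here.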

    Information & Contributors

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 5, Issue 3
    August 2009
    204 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/1556134

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 August 2009
    Accepted: 01 May 2008
    Revised: 01 December 2007
    Received: 01 September 2006

    Author Tags

    1. Video annotation
    2. context-based retrieval
    3. location-based services
    4. sensor-enhanced

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2025) A framework for automatically generating composite keywords for geo-tagged street images. Kuwait Journal of Science 52:1 (100333). DOI: 10.1016/j.kjs.2024.100333. Online publication date: Jan-2025
    • (2020) Semantic Analysis of Videos for Tags Prediction and Segmentation. Industrial Internet of Things and Cyber-Physical Systems (296-307). DOI: 10.4018/978-1-7998-2803-7.ch014. Online publication date: 2020
    • (2019) Towards Accurate Georeferenced Video Search With Camera Field of View Modeling. IEEE Transactions on Circuits and Systems for Video Technology 29:6 (1844-1855). DOI: 10.1109/TCSVT.2018.2848200. Online publication date: Jun-2019
    • (2018) Spatio-Temporal Metadata Querying for CCTV Video Retrieval. Proceedings of the 9th ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness (7-14). DOI: 10.1145/3282461.3282465. Online publication date: 6-Nov-2018
    • (2017) Error distribution modeling of embedded sensors on smartphones by using laser ranger. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (387-392). DOI: 10.1109/ICMEW.2017.8026218. Online publication date: Jul-2017
    • (2015) An Advanced Visibility Restoration Algorithm for Single Hazy Images. ACM Transactions on Multimedia Computing, Communications, and Applications 11:4 (1-21). DOI: 10.1145/2726947. Online publication date: 2-Jun-2015
    • (2015) On Demand Retrieval of Crowdsourced Mobile Video. IEEE Sensors Journal 15:5 (2632-2642). DOI: 10.1109/JSEN.2014.2336292. Online publication date: May-2015
    • (2015) Video Spatio-Temporal Filtering Based on Cameras and Target Objects Trajectories -- Videosurveillance Forensic Framework. Proceedings of the 2015 10th International Conference on Availability, Reliability and Security (611-617). DOI: 10.1109/ARES.2015.102. Online publication date: 24-Aug-2015
    • (2014) Mobile Video Streaming. Advanced Content Delivery, Streaming, and Cloud Services (141-158). DOI: 10.1002/9781118909690.ch7. Online publication date: 3-Oct-2014
    • (2011) Detecting and identifying people in mobile videos. Proceedings of the 19th ACM international conference on Multimedia (1017-1020). DOI: 10.1145/2072298.2071927. Online publication date: 28-Nov-2011
