
research-article

NoScope: optimizing neural network queries over video at scale

Authors:

Matei Zaharia

Proceedings of the VLDB Endowment, Volume 10, Issue 11

Pages 1586 - 1597

https://doi.org/10.14778/3137628.3137664

Published: 01 August 2017

Abstract

Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, an object to detect, and a reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NoScope cascades two types of models: specialized models that forgo the full generality of the reference model but faithfully mimic its behavior for the target video and object, and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NoScope uses an efficient cost-based optimizer to search across models and cascades. With this approach, NoScope achieves two to three orders of magnitude speed-ups (265-15,500x real-time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
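To make the cascade idea concrete, here is a minimal Python sketch under stated assumptions; it is not NoScope's implementation. It assumes frames are flat sequences of pixel intensities, uses mean squared error against the last classified frame as the difference detector, treats the specialized model as a callable returning the probability that the object is present, and forwards only frames whose probability falls strictly between two hypothetical thresholds, `c_low` and `c_high`, to the reference network. The `choose_thresholds` helper is a hypothetical, much-simplified stand-in for the cost-based optimizer: it only grid-searches the two thresholds (the abstract's optimizer also searches over models and cascade architectures), minimizing reference-network calls subject to an error budget on a validation set labeled by the reference network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical frame representation: a flat sequence of pixel intensities.
Frame = Sequence[float]


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of equal size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class Cascade:
    specialized: Callable[[Frame], float]  # cheap model: P(object present)
    reference: Callable[[Frame], bool]     # expensive reference network
    diff_threshold: float                  # hypothetical difference-detector cutoff
    c_low: float                           # at or below: specialized model answers "absent"
    c_high: float                          # at or above: specialized model answers "present"

    def run(self, frames: List[Frame]) -> Tuple[List[bool], int]:
        """Label each frame; return the labels and the number of reference-network calls."""
        labels: List[bool] = []
        reference_calls = 0
        last_frame: Optional[Frame] = None  # last frame that was actually classified
        last_label = False
        for frame in frames:
            # Stage 1: difference detector. If the frame barely differs from the
            # last classified frame, reuse that frame's label without running a model.
            if last_frame is not None and mse(frame, last_frame) < self.diff_threshold:
                labels.append(last_label)
                continue
            # Stage 2: specialized model. Accept its answer only when it is confident.
            p = self.specialized(frame)
            if p <= self.c_low:
                label = False
            elif p >= self.c_high:
                label = True
            else:
                # Stage 3: uncertain frames fall through to the reference network.
                reference_calls += 1
                label = self.reference(frame)
            labels.append(label)
            last_frame, last_label = frame, label
        return labels, reference_calls


def choose_thresholds(
    probs: Sequence[float],
    truth: Sequence[bool],
    grid: Sequence[float],
    max_error: float,
) -> Optional[Tuple[float, float, int]]:
    """Grid-search (c_low, c_high) to minimize reference calls on a validation set.

    `truth` holds the reference network's labels, so frames deferred to the
    reference network count as correct. Returns (c_low, c_high, calls), or None
    if no threshold pair meets the error budget.
    """
    best: Optional[Tuple[float, float, int]] = None
    for c_low in grid:
        for c_high in grid:
            if c_low > c_high:
                continue
            errors = calls = 0
            for p, y in zip(probs, truth):
                if p <= c_low:
                    errors += int(y)       # predicted "absent" but object present
                elif p >= c_high:
                    errors += int(not y)   # predicted "present" but object absent
                else:
                    calls += 1             # deferred to the reference network
            if errors / len(truth) <= max_error and (best is None or calls < best[2]):
                best = (c_low, c_high, calls)
    return best
```

For example, with specialized-model probabilities [0.05, 0.3, 0.6, 0.95], reference labels [False, False, True, True], grid [0.1, 0.5, 0.9], and max_error 0.0, `choose_thresholds` returns (0.5, 0.5, 0): a single cutoff labels every validation frame correctly, so no frame needs the reference network. Comparing each frame with the last classified frame, rather than with the immediately preceding frame, is a deliberate choice in this sketch: a slow drift whose per-frame changes all fall below `diff_threshold` still triggers a model once the accumulated difference exceeds it.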




Information

Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 11

August 2017

432 pages

ISSN: 2150-8097

Editors: Peter Boncz (CWI) and Ken Salem (University of Waterloo)

Publisher

VLDB Endowment


Bibliometrics

Article Metrics

Total citations: 125
Total downloads: 606
Downloads (last 12 months): 79
Downloads (last 6 weeks): 6

Reflects downloads up to 14 Dec 2024.

Cited By

Madden S, Cafarella M, Franklin M, Kraska T (2024). Databases Unbound: Querying All of the World's Bytes with AI. Proceedings of the VLDB Endowment 17(12):4546-4554. Online publication date: 1-Aug-2024. https://dl.acm.org/doi/10.14778/3685800.3685916

Chao D, Chen Y, Koudas N, Yu X (2024). Optimizing Video Queries with Declarative Clues. Proceedings of the VLDB Endowment 17(11):3256-3268. Online publication date: 1-Jul-2024. https://dl.acm.org/doi/10.14778/3681954.3681998

Chang C, Lo E, Ye C (2024). Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines. Proceedings of the VLDB Endowment 17(10):2631-2640. Online publication date: 1-Jun-2024. https://dl.acm.org/doi/10.14778/3675034.3675052

Kittivorawong C, Ge Y, Helal Y, Cheung A (2024). Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations. Proceedings of the VLDB Endowment 17(9):2136-2148. Online publication date: 1-May-2024. https://dl.acm.org/doi/10.14778/3665844.3665846

Jin T, Mittal A, Mo C, Fang J, Zhang C, Dai T, Kang D (2024). AIDB: a Sparsely Materialized Database for Queries using Machine Learning. Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, pp. 23-28. Online publication date: 9-Jun-2024. https://dl.acm.org/doi/10.1145/3650203.3663329

Kong Z, Xu Q, Hu Y, Okoshi T, Ko J, LiKamWa R (2024). ARISE: High-Capacity AR Offloading Inference Serving via Proactive Scheduling. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, pp. 451-464. Online publication date: 3-Jun-2024. https://dl.acm.org/doi/10.1145/3643832.3661894

Rastikerdar M, Huang J, Fang S, Guan H, Ganesan D, Okoshi T, Ko J, LiKamWa R (2024). CACTUS: Dynamically Switchable Context-aware micro-Classifiers for Efficient IoT Inference. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, pp. 505-518. Online publication date: 3-Jun-2024. https://dl.acm.org/doi/10.1145/3643832.3661888

Xu Y, Zhang D, Zhang S, Wu S, Feng Z, Chen G (2024). Predictive and Near-Optimal Sampling for View Materialization in Video Databases. Proceedings of the ACM on Management of Data 2(1):1-27. Online publication date: 26-Mar-2024. https://dl.acm.org/doi/10.1145/3639274

Mendoza D, Romero F, Trippel C (2024). Model Selection for Latency-Critical Inference Serving. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1016-1038. Online publication date: 22-Apr-2024. https://dl.acm.org/doi/10.1145/3627703.3629565

Rahmanian A, Amin S, Gustafsson H, Ali-Eldin A (2024). CVF. Proceedings of the 15th ACM Multimedia Systems Conference, pp. 231-242. Online publication date: 15-Apr-2024. https://dl.acm.org/doi/10.1145/3625468.3647627
