research-article

NoScope: optimizing neural network queries over video at scale

Published: 01 August 2017

Abstract

Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, object to detect, and reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NoScope cascades two types of models: specialized models that forgo the full generality of the reference model but faithfully mimic its behavior for the target video and object; and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NoScope uses an efficient cost-based optimizer to search across models and cascades. With this approach, NoScope achieves speed-ups of two to three orders of magnitude (265-15,500x real-time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
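
The cascade idea in the abstract can be illustrated with a short Python sketch. This is not NoScope's implementation: the frame representation, the function names (`mean_squared_difference`, `run_cascade`), the mean-squared-difference test, and the thresholds `diff_threshold`, `low`, and `high` are all assumptions made for illustration. The sketch only shows the general shape of a cascade in which a difference detector skips near-duplicate frames, a cheap specialized model answers when it is confident, and the expensive reference model is consulted only for the remaining frames.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: a flattened grayscale frame


def mean_squared_difference(a: Frame, b: Frame) -> float:
    """Average squared per-pixel difference between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def run_cascade(
    frames: List[Frame],
    specialized: Callable[[Frame], float],  # cheap model: confidence in [0, 1]
    reference: Callable[[Frame], bool],     # expensive model: authoritative label
    diff_threshold: float,                  # assumed tuning knob
    low: float,                             # at or below: confidently negative
    high: float,                            # at or above: confidently positive
) -> Tuple[List[bool], int]:
    """Label each frame and count how often the reference model was invoked."""
    labels: List[bool] = []
    reference_calls = 0
    last_frame: Optional[Frame] = None  # last frame that was actually classified
    last_label = False

    for frame in frames:
        # Stage 1: difference detector. If the frame barely differs from the
        # last classified frame, reuse that frame's label.
        if (last_frame is not None
                and mean_squared_difference(frame, last_frame) < diff_threshold):
            labels.append(last_label)
            continue

        # Stage 2: specialized model. Accept its answer only when confident.
        confidence = specialized(frame)
        if confidence <= low:
            label = False
        elif confidence >= high:
            label = True
        else:
            # Stage 3: fall back to the reference model for uncertain frames.
            label = reference(frame)
            reference_calls += 1

        last_frame, last_label = frame, label
        labels.append(label)

    return labels, reference_calls


if __name__ == "__main__":
    # Toy stand-ins: "brightness" plays the role of both models.
    frames = [[0.0, 0.0], [0.0, 0.01], [1.0, 1.0], [0.5, 0.5]]
    brightness = lambda f: sum(f) / len(f)
    print(run_cascade(frames, brightness, lambda f: brightness(f) >= 0.5,
                      diff_threshold=0.01, low=0.2, high=0.8))
```

In this toy run only one of the four frames reaches the stand-in reference model. In a real system the thresholds trade accuracy against cost, which is the kind of choice the abstract attributes to NoScope's cost-based optimizer.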



Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 11
August 2017
432 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

