Abstract
Object context has proven significant for appearance modeling in many tracking-by-detection approaches. However, representing the target’s contextual relationships only within the spatial domain is restrictive and has severely limited their usefulness to high-level classification strategies. Motivated by the ability of recurrent networks to learn long-term dependencies from sequential data, we propose a novel appearance model that transforms the target’s contextual dependencies into a semantic sequential representation. This representation can be exploited effectively by a recurrent neural network with bidirectional long short-term memory cells (BLSTM-RNN) for online tracking-by-learning. Based on the trained BLSTM-RNN model, a search mechanism driven by the labeling score is proposed to improve tracking robustness. Using the appearance variation implied by the labeling, together with a heuristic strategy for model updating, the proposed tracking method is shown to outperform most state-of-the-art trackers on challenging benchmark videos.
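The idea of labeling a contextual sequence with a bidirectional LSTM and then searching by labeling score can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the feature vectors, the hidden size, the untrained random weights, the per-step target/background probability, and the use of the mean probability as the "labeling score" are all assumptions made for illustration. The names `LSTMCell`, `BLSTMLabeler`, `labeling_score`, and `search` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """A plain LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        # Biases are omitted (treated as zero) to keep the sketch short.
        self.W = {g: rand_matrix(n_hid, n_in + n_hid, rng) for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["g"], z)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def run(cell, seq):
    """Run a cell over a sequence and return the hidden state at every step."""
    h, c = [0.0] * cell.n_hid, [0.0] * cell.n_hid
    states = []
    for x in seq:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


class BLSTMLabeler:
    """Bidirectional LSTM that emits one target probability per sequence element."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(n_in, n_hid, rng)
        self.bwd = LSTMCell(n_in, n_hid, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_hid)]

    def label(self, seq):
        hf = run(self.fwd, seq)
        hb = run(self.bwd, seq[::-1])[::-1]  # backward pass, re-aligned in time
        return [
            sigmoid(sum(w * v for w, v in zip(self.w_out, f + b)))
            for f, b in zip(hf, hb)
        ]


def labeling_score(model, seq):
    """Assumed score: mean per-element target probability of a candidate."""
    probs = model.label(seq)
    return sum(probs) / len(probs)


def search(model, candidates):
    """Return the index of the candidate sequence with the highest score."""
    scores = [labeling_score(model, s) for s in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores
```

In a tracker built along these lines, each candidate location in a new frame would be converted into a feature sequence, scored by a trained model, and the highest-scoring candidate taken as the new target position. The weights here are random, so the scores carry no meaning until the model has been trained.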
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61571363) and the National High Technology Research and Development Program of China (Grant No. 2015AA016402).
Cite this article
Zhou, X., Xie, L., Zhang, P. et al. Online object tracking based on BLSTM-RNN with contextual-sequential labeling. J Ambient Intell Human Comput 8, 861–870 (2017). https://doi.org/10.1007/s12652-017-0514-4
DOI: https://doi.org/10.1007/s12652-017-0514-4