A Robust Tracking-by-Detection Algorithm Using Adaptive Accumulated Frame Differencing and Corner Features
Figure 1. Block diagram of the combination of Adaptive Accumulated Frame Differencing (AAFD) and a corner detector for people detection.
Figure 2. Examples of region of movement extraction with/without application of our motion coefficient (frame 4921 of the ES2002a video from the Augmented Multiparty Interaction (AMI) video data sets). The first image shows region of movement extraction without using motion, the second shows region of movement extraction based on the coefficient of motion, and the third shows accumulated frame differencing with window size = 100. Using this coefficient of motion reduces errors resulting from compression artefacts.
Figure 3. Sample images to show corner detection; left: without a corner scoring function; right: with a corner scoring function.
Figure 4. Example from the ES2002a video. Frame 1700 shows the meeting room when participants are sitting around the table. Frame 7009 is an example of the challenge when people are passing each other. Frames 7034 and 8679 show touching cases. Examples of occlusion (overlapping) are shown in frames 8714 and 8716.
Figure 5. Detection results for slow movement (i.e., people are sitting). Left to right: frame number, original frame, region of movement, detection results.
Figure 6. Detection results for different motions of people (i.e., low motion when people are sitting and significant motion when they are moving in a meeting room).
Figure 7. Sample image to show the detection result when meeting participants are near each other.
Figure 8. Example results produced by a generative tracker. The first row illustrates detecting the head of a person based on his head template. The detection results were very robust when the person started moving, even though his head shape changed with respect to the viewing angle of the camera. The disadvantage of template matching is that strong colour features are needed to detect the head when its shape changes. The second row again shows good results where a meeting participant sits with only small changes to his head shape. However, the technique fails when the person starts moving, due to larger changes in appearance and the absence of strong colour features.
Figure 9. Qualitative evaluation of our approach FD_Corner in comparison with three baseline and state-of-the-art trackers. Tracking results for two examples from meeting video sequence 01 show that all trackers exhibit relatively good performance in a sequence with no major appearance change.
Figure 10. Qualitative evaluation of our approach FD_Corner in comparison with three baseline and state-of-the-art trackers on sequence 02 in terms of appearance change. Our FD_Corner tracker outperforms the discriminative correlation filter with channel and spatial reliability (CSRDCF), Kernelized Correlation Filter (KCF) and multiple instance learning (MIL) trackers in scenarios where people start moving in the meeting room and their appearance changes significantly due to partial occlusion (person's head out of camera view).
Figure 11. The precision plot for sequences with the attributes background clutter, deformation, illumination variation, scale variation and occlusion. Our FD_Corner tracker performs favourably on the background clutter, deformation and illumination variation attributes.
Abstract
1. Introduction
1.1. The Need for Meeting Support Technologies
1.2. Meetings
2. Related Work
2.1. People Detection and Tracking in Meeting Applications
2.1.1. Tracking Approaches Using Single Video Cameras
2.1.2. 3D Vision-Based Tracking Algorithms
2.1.3. Tracking Approaches Using Omnidirectional Cameras
2.2. Object Detection and Tracking
2.2.1. Object Detection Based on Motion-Based Methods
Motion Detection Methods Based on Background Modelling
Other Motion Detection Methods
- Type A: a stationary object inserted into the scene by a human, such as a backpack or suitcase.
- Type B: a moving object that has become static without any human interaction (e.g., a vehicle that has been parked).
- Type C: a moving person who becomes totally or partially static.
2.2.2. Object Tracking
Generative Tracking Methods
Discriminative Tracking Methods
- Firstly, at every time step t, the tracker knows the object location. It crops out a set of image patches inside the region of interest (within a radius, s, of the current tracker location) and computes feature vectors.
- Secondly, the MIL classifier is used to calculate the probability of each image patch being foreground.
- The tracker location is updated based on the image patch with the highest probability.
- A set of positive and negative versions of image patches is cropped, and the MIL appearance model is updated with one positive bag and a number of negative bags (each bag containing a single negative image patch). A minimal usage sketch follows this list.
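OpenCV ships an implementation of this tracker (TrackerMIL), so the loop above can be exercised directly. The following is a minimal sketch; the video path and initial bounding box are placeholders, not values from the paper:

```python
import cv2

# Placeholder inputs: video path and initial (x, y, w, h) bounding box.
cap = cv2.VideoCapture("meeting.avi")
ok, frame = cap.read()
bbox = (100, 50, 60, 80)

# OpenCV's MIL tracker implements the bag-based appearance model described above.
tracker = cv2.TrackerMIL_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Internally: crop patches around the last location, score them with the
    # MIL classifier, move to the best patch, then update the model with bags.
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("MIL tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
```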
- In the first frame, the model is trained with an image patch obtained based on the initial position of the object.
- For a new frame, a test image is extracted based on the current location of the bounding box. After that, the target is detected by finding the maximum score location and updating the target position (bounding box location).
- Finally, a new model is trained based on the new location [77]. A toy sketch of this train-detect-update cycle follows this list.
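As an illustration of that cycle, here is a toy MOSSE-style correlation filter in NumPy (single channel, no kernel trick, fixed learning rate). It is a simplified sketch of the idea behind correlation-filter trackers such as KCF, not the actual algorithm, and the regulariser and learning-rate values are assumptions:

```python
import numpy as np

class ToyCorrelationTracker:
    """Toy MOSSE-style correlation filter sketching the train-detect-update
    cycle above; real KCF adds kernels, HOG features and circulant algebra."""

    def __init__(self, lam=1e-3, lr=0.125):
        self.lam, self.lr = lam, lr  # regulariser and learning rate (assumed values)

    def train(self, patch):
        h, w = patch.shape
        # Desired response: a small Gaussian peak centred on the target.
        ys, xs = np.mgrid[0:h, 0:w]
        g = np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * 2.0 ** 2))
        self.G = np.fft.fft2(g)
        F = np.fft.fft2(patch)
        self.A = self.G * np.conj(F)        # filter numerator
        self.B = F * np.conj(F) + self.lam  # filter denominator

    def detect(self, patch):
        # Score every location by correlating the filter with the test patch;
        # the maximum-response position gives the target displacement.
        Z = np.fft.fft2(patch)
        response = np.real(np.fft.ifft2((self.A / self.B) * Z))
        dy, dx = np.unravel_index(np.argmax(response), response.shape)
        h, w = patch.shape
        return dx - w // 2, dy - h // 2  # shift from the patch centre

    def update(self, patch):
        # Retrain on the new location with a running average (final step above).
        F = np.fft.fft2(patch)
        self.A = (1 - self.lr) * self.A + self.lr * self.G * np.conj(F)
        self.B = (1 - self.lr) * self.B + self.lr * (F * np.conj(F) + self.lam)
```

Per frame, one crops a test patch at the previous box, calls detect to move the box, then calls update on the re-cropped patch; OpenCV's TrackerKCF_create offers a production implementation of the same cycle.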
3. Combining Accumulated Frame Differencing and Corner Detection for Motion Detection
3.1. Outline of Combination of AAFD and Corner Detection Technique
3.1.1. Detection of Moving Region
- People detection is applied to an accumulated frame differencing image using a large temporal window size (e.g., temporal window size = 100). Starting with a large window size allows the robust segmentation of all foreground pixels, even those belonging to objects that move very little.
- For each detected blob, motion analysis using the shape features of the blob is applied. Two shape features, namely "fill ratio" and "blob area", are used to accept or reject blobs. Fill ratio refers to the area of the blob divided by the area of its bounding box. Acceptable blobs are assumed to be roughly square, so their fill ratio will be closer to 1 than to 0; fast-moving objects lead to elongated blobs with a low fill ratio. If the fill ratio is smaller than a defined threshold, we conclude that the blob relates to either the merging of multiple nearby objects or to a fast-moving single object. With a large temporal window size, blob area is therefore a suitable feature for rejecting small blobs (due to noise) and large blobs (due to merged or fast-moving objects).
- Finally, the detection is executed again with a different temporal window size chosen according to the blob's shape features (e.g., a larger temporal window size of 150, or a smaller one of 25). A sketch of this adaptive procedure follows this list.
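The sketch below captures the adaptive-window logic with OpenCV. The thresholds (DIFF_THRESH, MIN_AREA, MAX_AREA, MIN_FILL_RATIO) and the fallback window of 25 are illustrative assumptions, not the paper's tuned values:

```python
import cv2
import numpy as np

# Illustrative thresholds; the paper's tuned values may differ.
DIFF_THRESH = 15      # binarisation threshold on the accumulated image
MIN_AREA = 500        # below this: noise blob
MAX_AREA = 20000      # above this: merged or fast-moving blob
MIN_FILL_RATIO = 0.4  # below this: elongated (fast or merged) blob

def accumulated_frame_diff(frames, window):
    """Sum absolute differences of consecutive greyscale frames over a window."""
    acc = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, cur in zip(frames[-window:], frames[-window + 1:]):
        acc += cv2.absdiff(cur, prev).astype(np.float32)
    acc = cv2.normalize(acc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(acc, DIFF_THRESH, 255, cv2.THRESH_BINARY)
    return mask

def detect_people(frames, window=100):
    """Detect blobs in the AAFD image and adapt the window to their shape."""
    mask = accumulated_frame_diff(frames, window)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        area = cv2.contourArea(c)
        fill_ratio = area / float(w * h)
        if area < MIN_AREA:
            continue  # reject noise
        if (fill_ratio < MIN_FILL_RATIO or area > MAX_AREA) and window > 25:
            # Elongated or merged blob: redo detection with a smaller window
            # (fast movers); slow movers would instead get a larger one.
            return detect_people(frames, window=25)
        boxes.append((x, y, w, h))
    return boxes
```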
Lossy Compression Issues
- Firstly, the ROI(x, y) is converted to a binary image B(x, y) by thresholding.
- Secondly, the coefficient of motion is calculated as the proportion of moving pixels in the region: coefficient of motion = (number of non-zero pixels in B) / (total number of pixels in the ROI). A short sketch follows.
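A short sketch of this computation; the binarisation threshold and the rejection cut-off are assumptions, not the paper's values:

```python
import cv2

def coefficient_of_motion(roi_diff, thresh=15):
    """Fraction of moving pixels inside an ROI of the accumulated
    frame-difference image; low values suggest compression artefacts
    rather than genuine motion. The threshold is illustrative."""
    _, binary = cv2.threshold(roi_diff, thresh, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(binary) / float(binary.size)

# Example: discard a blob whose ROI shows too little coherent motion.
# coeff = coefficient_of_motion(acc_diff[y:y + h, x:x + w])
# keep_blob = coeff >= 0.2   # 0.2 is an assumed cut-off
```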
3.1.2. Detection of Object Features (Shi-Tomasi Corner Detection)
Combining Corners with Motion
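Both of these steps map directly onto OpenCV's goodFeaturesToTrack (the Shi-Tomasi detector), whose mask argument restricts corner detection to the moving regions. A minimal sketch with placeholder file names and illustrative parameter values:

```python
import cv2

frame = cv2.imread("meeting_frame.png")          # placeholder input frame
motion_mask = cv2.imread("motion_mask.png", 0)   # binary AAFD mask from Section 3.1.1

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Shi-Tomasi corners, computed only inside moving regions via the mask;
# maxCorners / qualityLevel / minDistance values are illustrative.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=7, mask=motion_mask)
if corners is not None:
    for x, y in corners.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 0, 255), -1)
cv2.imwrite("corners_on_motion.png", frame)
```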
4. Results and Discussion
4.1. Performance Evaluation in a Meeting Context
4.1.1. Data Set
4.1.2. Evaluation Methodology
Qualitative Evaluation
Quantitative Evaluation
4.1.3. Tracking Evaluation Results Using CLEAR-MOT Metrics
Test Objective and Parameters
Experimental Results and Discussion
4.1.4. Tracking Evaluation Results Using Track Quality Measures
Test Objective and Parameters
Experimental Results and Discussion
4.1.5. Comparison with Published Results of Multiple People Tracking in the CLEAR 2006 Workshop
4.1.6. Comparison with Baseline and Top Performing Tracking Methods
Test Objective and Parameters
Experimental Results and Discussion
- Quantitative Analysis on the Entire Video Sequence
- Quantitative Analysis on Each Video Sequence
- Qualitative Evaluation
- Robustness to Initialisation
4.2. Attribute-Based Evaluation on Generic Visual Object Tracking Dataset
4.2.1. Test Objective and Parameters
4.2.2. Experimental Results and Discussion
5. Conclusions
6. Limitations and Future Work
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ROI | Region of interest (also used for region of movement) |
FD_Corner | Frame Differencing Corner |
AAFD | Adaptive Accumulated Frame Differencing |
MIL | Online multiple instance learning |
KCF | Kernelized Correlation Filter |
CSRDCF | Discriminative correlation filter with channel and spatial reliability |
TLD | Tracking-Learning-Detection |
References
- Yu, Z.; Nakamura, Y. Smart meeting systems: A survey of state-of-the-art and open issues. ACM Comput. Surv. (CSUR) 2010, 42, 8. [Google Scholar] [CrossRef]
- Renals, S.; Bourlard, H.; Carletta, J.; Popescu-Belis, A. Multimodal Signal Processing: Human Interactions in Meetings; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
- McCowan, I.; Bengio, S.; Gatica-Perez, D.; Lathoud, G.; Monay, F.; Moore, D.; Wellner, P.; Bourlard, H. Modeling human interaction in meetings. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), Hong Kong, China, 6–10 April 2003; Volume 4, p. IV-748. [Google Scholar]
- McCowan, L.; Gatica-Perez, D.; Bengio, S.; Lathoud, G.; Barnard, M.; Zhang, D. Automatic analysis of multimodal group actions in meetings. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 305–317. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Janin, A.; Baron, D.; Edwards, J.; Ellis, D.; Gelbart, D.; Morgan, N.; Peskin, B.; Pfau, T.; Shriberg, E.; Stolcke, A.; et al. The ICSI meeting corpus. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), Hong Kong, China, 6–10 April 2003; Volume 1. [Google Scholar]
- McCowan, I.; Carletta, J.; Kraaij, W.; Ashby, S.; Bourban, S.; Flynn, M.; Guillemot, M.; Hain, T.; Kadlec, J.; Karaiskos, V.; et al. The AMI meeting corpus. In Proceedings of the 5th International Conference on Methods and Techniques in Behavioral Research, Wageningen, The Netherlands, 30 August–2 September 2005; Volume 88, p. 100. [Google Scholar]
- Mostefa, D.; Moreau, N.; Choukri, K.; Potamianos, G.; Chu, S.M.; Tyagi, A.; Casas, J.R.; Turmo, J.; Cristoforetti, L.; Tobia, F.; et al. The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms. Lang. Resour. Eval. 2007, 41, 389–407. [Google Scholar] [CrossRef]
- Tur, G.; Stolcke, A.; Voss, L.; Dowding, J.; Favre, B.; Fernández, R.; Frampton, M.; Frandsen, M.; Frederickson, C.; Graciarena, M.; et al. The CALO meeting speech recognition and understanding system. In Proceedings of the 2008 IEEE Spoken Language Technology Workshop, Goa, India, 15–19 December 2008; pp. 69–72. [Google Scholar]
- Waibel, A.; Bett, M.; Finke, M.; Stiefelhagen, R. Meeting browser: Tracking and summarizing meetings. In Proceedings of the DARPA Broadcast News Workshop, Lansdowne, VA, USA, 8–11 February 1998; pp. 281–286. [Google Scholar]
- Waibel, A.; Bett, M.; Metze, F.; Ries, K.; Schaaf, T.; Schultz, T.; Soltau, H.; Yu, H.; Zechner, K. Advances in automatic meeting record creation and access. In Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA, 7–11 May 2001; Volume 1, pp. 597–600. [Google Scholar]
- Gross, R.; Bett, M.; Yu, H.; Zhu, X.; Pan, Y.; Yang, J.; Waibel, A. Towards a multimodal meeting record. In Proceedings of the 2000 IEEE International Conference on Multimedia and Expo (ICME2000), New York, NY, USA, 30 July–2 August 2000; Volume 3, pp. 1593–1596. [Google Scholar]
- Cutler, R.; Rui, Y.; Gupta, A.; Cadiz, J.J.; Tashev, I.; He, L.w.; Colburn, A.; Zhang, Z.; Liu, Z.; Silverberg, S. Distributed meetings: A meeting capture and broadcasting system. In Proceedings of the Tenth ACM International Conference on Multimedia, Juan-les-Pins, France, 1–6 December 2002; pp. 503–512. [Google Scholar]
- Lee, D.S.; Erol, B.; Graham, J.; Hull, J.J.; Murata, N. Portable meeting recorder. In Proceedings of the Tenth ACM International Conference on Multimedia, Juan-les-Pins, France, 1–6 December 2002; pp. 493–502. [Google Scholar]
- Trivedi, M.M.; Huang, K.S.; Mikic, I. Dynamic context capture and distributed video arrays for intelligent spaces. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2005, 35, 145–163. [Google Scholar] [CrossRef]
- Mikic, I.; Huang, K.; Trivedi, M. Activity monitoring and summarization for an intelligent meeting room. In Proceedings of the Workshop on Human Motion, Austin, TX, USA, 7–8 December 2000; pp. 107–112. [Google Scholar]
- Patil, R.; Rybski, P.E.; Kanade, T.; Veloso, M.M. People detection and tracking in high resolution panoramic video mosaic. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Sendai, Japan, 28 September–2 October 2004; Volume 2, pp. 1323–1328. [Google Scholar]
- Yu, Z.; Ozeki, M.; Fujii, Y.; Nakamura, Y. Towards smart meeting: Enabling technologies and a real-world application. In Proceedings of the 9th International Conference on Multimodal Interfaces, Aichi, Japan, 12–15 November 2007; pp. 86–93. [Google Scholar]
- Ronzhin, A.; Karpov, A. A software system for the audiovisual monitoring of an intelligent meeting room in support of scientific and education activities. Pattern Recognit. Image Anal. 2015, 25, 237–254. [Google Scholar] [CrossRef]
- Stiefelhagen, R.; Bowers, R.; Fiscus, J. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8–11, 2007, Revised Selected Papers; Springer: Berlin, Germany, 2008; Volume 4625. [Google Scholar]
- Bernardin, K.; Gehrig, T.; Stiefelhagen, R. Multi-and single view multiperson tracking for smart room environments. In International Evaluation Workshop on Classification of Events, Activities and Relationships; Springer: Berlin, Germany, 2006; pp. 81–92. [Google Scholar]
- Odobez, J.M.; Lanz, O. Sampling techniques for audio-visual tracking and head pose estimation. In Multimodal Signal Processing: Human Interactions in Meetings; Cambridge University Press: Cambridge, UK, 2012; number BOOK_CHAP. [Google Scholar]
- Popescu-Belis, A.; Carletta, J. Multimodal Signal Processing for Meetings: An Introduction. In Multimodal Signal Processing: Human Interactions in Meetings; Cambridge University Press: Cambridge, UK, 2012; number BOOK_CHAP. [Google Scholar]
- Ahmed, I.; Adnan, A. A robust algorithm for detecting people in overhead views. Cluster Comput. 2017, 21, 633–654. [Google Scholar] [CrossRef]
- Hradiš, M.; Juránek, R. Real-time tracking of participants in meeting video. In Proceedings of the CESCG, Columbia, MD, USA, 18–21 April 2006. [Google Scholar]
- Nait-Charif, H.; McKenna, S.J. Head tracking and action recognition in a smart meeting room. In Proceedings of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Beijing, China, 12–13 October 2003. [Google Scholar]
- Wu, B.; Nevatia, R. Tracking of multiple humans in meetings. In Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06), New York, NY, USA, 17–22 June 2006; p. 143. [Google Scholar]
- Li, Y.; Nevatia, R. Key object driven multi-category object recognition, localization and tracking using spatio-temporal context. In European Conference on Computer Vision; Springer: Berlin, Germany, 2008; pp. 409–422. [Google Scholar]
- Abad, A.; Canton-Ferrer, C.; Segura, C.; Landabaso, J.L.; Macho, D.; Casas, J.; Hernando, J.; Pardàs, M.; Nadeu, C. UPC audio, video and multimodal person tracking systems in the CLEAR evaluation campaign. In International Evaluation Workshop on Classification of Events, Activities and Relationships; Springer: Berlin, Germany, 2006; pp. 93–104. [Google Scholar]
- Katsarakis, N.; Souretis, G.; Talantzis, F.; Pnevmatikakis, A.; Polymenakos, L. 3D audiovisual person tracking using Kalman filtering and information theory. In International Evaluation Workshop on Classification of Events, Activities and Relationships; Springer: Berlin, Germany, 2006; pp. 45–54. [Google Scholar]
- Wallhoff, F.; Zobl, M.; Rigoll, G.; Potucek, I. Face tracking in meeting room scenarios using omnidirectional views. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; Volume 4, pp. 933–936. [Google Scholar]
- Stiefelhagen, R.; Garofolo, J. Multimodal Technologies for Perception of Humans: First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, 6–7 April 2006, Revised Selected Papers; Springer: Berlin, Germany, 2007; Volume 4122. [Google Scholar]
- Balaji, S.; Karthikeyan, S. A survey on moving object tracking using image processing. In Proceedings of the 2017 11th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India, 5–6 January 2017; pp. 469–474. [Google Scholar]
- Lin, Y.; Tong, Y.; Cao, Y.; Zhou, Y.; Wang, S. Visual-attention-based background modeling for detecting infrequently moving objects. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 1208–1221. [Google Scholar] [CrossRef]
- Vasuhi, S.; Vijayakumar, M.; Vaidehi, V. Real time multiple human tracking using kalman filter. In Proceedings of the 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), Chennai, India, 26–28 March 2015; pp. 1–6. [Google Scholar]
- Supreeth, H.; Patil, C.M. Efficient multiple moving object detection and tracking using combined background subtraction and clustering. Signal Image Video Process. 2018, 12, 1097–1105. [Google Scholar] [CrossRef]
- Ye, Q.; Gu, R.; Ji, Y. Human detection based on motion object extraction and head–shoulder feature. Optik 2013, 124, 3880–3885. [Google Scholar] [CrossRef]
- Mahalingam, T.; Subramoniam, M. A robust single and multiple moving object detection, tracking and classification. Appl. Comput. Inform. 2018. Available online: https://www.sciencedirect.com/science/article/pii/S221083271730217X (accessed on 5 January 2018). [CrossRef]
- Martínez-Martín, E.; Del Pobil, A.P. Robust Motion Detection in Real-Life Scenarios; Springer: Berlin, Germany, 2012. [Google Scholar]
- Abdulrahim, K.; Salam, R.A. Cumulative frame differencing for urban vehicle detection. In Proceedings of the First International Workshop on Pattern Recognition, Tokyo, Japan, 11–13 May 2016; Volume 10011. [Google Scholar] [CrossRef]
- Wren, C.R.; Azarbayejani, A.; Darrell, T.; Pentland, A.P. Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 780–785. [Google Scholar] [CrossRef] [Green Version]
- Xu, Y.; Dong, J.; Zhang, B.; Xu, D. Background modeling methods in video analysis: A review and comparative evaluation. CAAI Trans. Intell. Technol. 2016, 1, 43–60. [Google Scholar] [CrossRef] [Green Version]
- Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23–25 June 1999; Volume 2, pp. 246–252. [Google Scholar]
- Zivkovic, Z. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; Volume 2, pp. 28–31. [Google Scholar]
- Zhang, Y.; Liang, Z.; Hou, Z.; Wang, H.; Tan, M. An adaptive mixture gaussian background model with online background reconstruction and adjustable foreground mergence time for motion segmentation. In Proceedings of the 2005 IEEE International Conference on Industrial Technology, Hong Kong, China, 14–17 December 2005; pp. 23–27. [Google Scholar]
- Bouwmans, T. Traditional and recent approaches in background modeling for foreground detection: An overview. Comput. Sci. Rev. 2014, 11, 31–66. [Google Scholar] [CrossRef]
- Elgammal, A.; Harwood, D.; Davis, L. Non-parametric model for background subtraction. In European Conference on Computer Vision; Springer: Berlin, Germany, 2000; pp. 751–767. [Google Scholar]
- Barnich, O.; Van Droogenbroeck, M. ViBe: A powerful random technique to estimate the background in video sequences. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 945–948. [Google Scholar]
- Toyama, K.; Krumm, J.; Brumitt, B.; Meyers, B. Wallflower: Principles and practice of background maintenance. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 1, pp. 255–261. [Google Scholar]
- Heikkila, J.; Silvén, O. A real-time system for monitoring of cyclists and pedestrians. In Proceedings of the Second IEEE Workshop on Visual Surveillance (VS’99) (Cat. No. 98-89223), Fort Collins, CO, USA, 26 June 1999; pp. 74–81. [Google Scholar]
- Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Scandinavian Conference on Image Analysis; Springer: Berlin, Germany, 2003; pp. 363–370. [Google Scholar]
- Han, X.; Gao, Y.; Lu, Z.; Zhang, Z.; Niu, D. Research on moving object detection algorithm based on improved three frame difference method and optical flow. In Proceedings of the 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), Qinhuangdao, China, 18–20 September 2015; pp. 580–584. [Google Scholar]
- Leng, B.; Dai, Q. Video Object Segmentation Based on Accumulative Frame Difference. 2007. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.377.4067 (accessed on 21 April 2020).
- Zhang, D.; Lu, G. Segmentation of moving objects in image sequence: A review. Circuits Syst. Signal Process. 2001, 20, 143–183. [Google Scholar] [CrossRef]
- Zhang, T.; Zhou, G.; Zhang, C.; Li, G.; Chen, J.; Liu, K. A novel temporal-spatial variable scale algorithm for detecting multiple moving objects. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 627–641. [Google Scholar] [CrossRef]
- Chinchkhede, D.; Uke, N. Image segmentation in video sequences using modified background subtraction. Int. J. Comput. Sci. Inf. Technol. 2012, 4, 93. [Google Scholar]
- Guo, J.; Wang, J.; Bai, R.; Zhang, Y.; Li, Y. A New Moving Object Detection Method Based on Frame-difference and Background Subtraction. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2017; Volume 242, p. 012115. [Google Scholar]
- Bobick, A.F.; Davis, J.W. The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 257–267. [Google Scholar] [CrossRef] [Green Version]
- Morde, A.; Ma, X.; Guler, S. Learning a background model for change detection. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 15–20. [Google Scholar]
- Ahad, M.A.R.; Tan, J.K.; Kim, H.; Ishikawa, S. Motion history image: Its variants and applications. Mach. Vis. Appl. 2012, 23, 255–281. [Google Scholar] [CrossRef]
- Cuevas, C.; Martínez, R.; García, N. Detection of stationary foreground objects: A survey. Comput. Vis. Image Underst. 2016, 152, 41–57. [Google Scholar] [CrossRef]
- Pan, J.; Fan, Q.; Pankanti, S. Robust abandoned object detection using region-level analysis. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 3597–3600. [Google Scholar]
- Martínez, R.; Cuevas, C.; Berjón, D.; García, N. Detection of static moving objects using multiple nonparametric background models. In Proceedings of the 2015 International Symposium on Consumer Electronics (ISCE), Madrid, Spain, 24–26 June 2015; pp. 1–2. [Google Scholar]
- Danelljan, M.; Shahbaz Khan, F.; Felsberg, M.; Van de Weijer, J. Adaptive color attributes for real-time visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1090–1097. [Google Scholar]
- Babenko, B.; Yang, M.H.; Belongie, S. Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1619–1632. [Google Scholar] [CrossRef] [Green Version]
- Jia, X.; Lu, H.; Yang, M.H. Visual tracking via adaptive structural local sparse appearance model. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1822–1829. [Google Scholar]
- Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1409–1422. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Comaniciu, D.; Ramesh, V.; Meer, P. Real-time tracking of non-rigid objects using mean shift. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000) (Cat. No. PR00662), Hilton Head Island, SC, USA, 15 June 2000; Volume 2, pp. 142–149. [Google Scholar]
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. 1981. Available online: https://ri.cmu.edu/pub_files/pub3/lucas_bruce_d_1981_2/lucas_bruce_d_1981_2.pdf (accessed on 21 April 2020).
- Smeulders, A.W.; Chu, D.M.; Cucchiara, R.; Calderara, S.; Dehghan, A.; Shah, M. Visual tracking: An experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1442–1468. [Google Scholar]
- Harris, C.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Reading, UK, 31 August–2 September 1988; Volume 15. [Google Scholar]
- Shi, J.; Tomasi, C. Good features to track. In Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’94), Seattle, WA, USA, 21–23 June 1994; pp. 593–600. [Google Scholar]
- Jansari, D.; Parmar, S. Novel object detection method based on optical flow. In Proceedings of the 3rd International Conference on Emerging Trends in Computer and Image Processing (ICETCIP’2013), Kuala Lumpur, Malaysia, 8–9 January 2013; pp. 197–200. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.H. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kalal, Z.; Matas, J.; Mikolajczyk, K. P-N learning: Bootstrapping binary classifiers by structural constraints. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 49–56. [Google Scholar] [CrossRef] [Green Version]
- Chen, Z.; Hong, Z.; Tao, D. An experimental survey on correlation filter-based tracking. arXiv 2015, arXiv:1509.05520. [Google Scholar]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the circulant structure of tracking-by-detection with kernels. In European Conference on Computer Vision; Springer: Berlin, Germany, 2012; pp. 702–715. [Google Scholar]
- Henriques, J.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef] [Green Version]
- Lukežič, A.; Vojíř, T.; Čehovin Zajc, L.; Matas, J.; Kristan, M. Discriminative Correlation Filter Tracker with Channel and Spatial Reliability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6309–6318. [Google Scholar]
- OpenCV-Documentation. GoodFeaturesToTrack Open CV Method at: Opencv3.0.0, Documentation, org.opencv.imgproc Package, Imgproc Class. Available online: http://docs.opencv.org/java/3.0.0/ (accessed on 8 April 2018).
- Algethami, N.; Redfern, S. Combining accumulated frame differencing and corner detection for motion detection. In Proceedings of the Conference on Computer Graphics & Visual Computing, Delft, The Netherlands, 16–20 April 2018; pp. 7–14. [Google Scholar]
- Smith, K.; Gatica-Perez, D.; Odobez, J.M.; Ba, S. Evaluating multi-object tracking. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; p. 36. [Google Scholar]
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. J. Image Video Process. 2008, 2008, 1. [Google Scholar] [CrossRef] [Green Version]
- Wu, B.; Nevatia, R. Tracking of multiple, partially occluded humans based on static body part detection. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 951–958. [Google Scholar]
- Bernardin, K.; Elbs, A.; Stiefelhagen, R. Multiple object tracking performance metrics and evaluation in a smart room environment. In Proceedings of the Sixth IEEE International Workshop on Visual Surveillance, Graz, Austria, 13 May 2006; Volume 90, p. 91. [Google Scholar]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. CVPR 2001, 1, 511–518. [Google Scholar]
- Held, D.; Thrun, S.; Savarese, S. Learning to track at 100 fps with deep regression networks. In European Conference on Computer Vision; Springer: Berlin, Germany, 2016; pp. 749–765. [Google Scholar]
- Uddin, M.A.; Lee, Y.K. Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition. Sensors 2019, 19, 1599. [Google Scholar] [CrossRef] [Green Version]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 568–576. [Google Scholar]
 | Seq01 | Seq02 | Seq03 | Seq04 | Seq05 | Seq06 |
---|---|---|---|---|---|---|
Total no. of frames | 2484 | 4533 | 104 | 2775 | 17875 | 1775 |
Event: sit down | Y | Y | Y | Y | Y | Y |
Event: occlusion (passing each other) | N | Y | N | N | N | Y |
Event: occlusion (walking past each other) | N | N | Y | N | N | Y |
Event: leaving | N | N | N | N | N | Y |
Event: touching/people close to each other | Y | Y | Y | N | Y | Y |
Event: stand up (walking) | N | Y | Y | Y | N | Y |
Description | All people are sitting | People are sitting and start moving to the whiteboard; occlusion (passing each other) | Occlusion between two people (fully overlapping) | People are sitting, while one is moving with no overlapping | All people are sitting | Some participants leave the meeting room (walking near each other as they leave) |
Sequence # | MOTP | FN | FP | Mismatches | MOTA |
---|---|---|---|---|---|
Sequence 01 | 17 | 0 | 0 | 0 | 100% |
Sequence 02 | 20 | 12 | 12 | 0 | 96.70% |
Sequence 03 | 15 | 2 | 2 | 0 | 80% |
Sequence 04 | 15 | 0 | 0 | 0 | 100% |
Sequence 05 | 18 | 33 | 33 | 0 | 97.70% |
Sequence 06 | 22 | 16 | 42 | 1 | 77.40% |
Overall | 18 | 241 | 267 | 1 | 89.20% |
Sequence # | MT | PT | ML |
---|---|---|---|
Sequence 01 | 4 | 0 | 0 |
Sequence 02 | 4 | 0 | 0 |
Sequence 03 | 3 | 1 | 0 |
Sequence 04 | 4 | 0 | 0 |
Sequence 05 | 4 | 0 | 0 |
Sequence 06 | 4 | 0 | 0 |
Overall | 4 | 0 | 0 |
Tracking Method | CLEAR-MOT Metrics | | | | |
---|---|---|---|---|---|
 | MOTP | FN | FP | Mismatches | MOTA |
FD_Corner | 18.67 | 241 | 267 | 1 | 89.2% |
MIL | 35.69 | 902 | 925 | 7 | 61.00% |
KCF | 18.87 | 329 | 357 | 5 | 85.30% |
CSRDCF | 22.47 | 317 | 340 | 3 | 86.00% |
Sequences / CLEAR-MOT Metrics | Tracker | | | | | | | |
---|---|---|---|---|---|---|---|---|
Our FD_Corner | KCF | MIL | CSRDCF | |||||
MOTP | MOTA | MOTP | MOTA | MOTP | MOTA | MOTP | MOTA | |
Seq01 | 17 | 100.00% | 6 | 100.00% | 11 | 100.00% | 11 | 100.00% |
Seq02 | 20 | 96.70% | 16 | 55.20% | 25 | 65.50% | 18 | 72.10% |
Seq03 | 15 | 80.00% | 20 | 70.00% | 12 | 80.00% | 12 | 50.00% |
Seq04 | 15 | 100.00% | 8 | 49.30% | 24 | 73.90% | 17 | 100.00% |
Seq05 | 18 | 97.70% | 20 | 100.00% | 27 | 98.70% | 19 | 97.80% |
Seq06 | 22 | 77.40% | 25 | 55.60% | 23 | 44.10% | 18 | 62.10% |
Sequences / MOTA | FD_Corner | KCF | MIL | CSRDCF |
---|---|---|---|---|
Seq01 | 100.00% | 100.00% | 100.00% | 100.00% |
Seq02 | 96.70% | 55.20% | 65.50% | 72.10% |
Seq04 | 100.00% | 49.30% | 73.90% | 100.00% |
Seq05 | 97.70% | 100.00% | 98.70% | 97.80% |
Seq06 | 77.40% | 55.60% | 44.10% | 62.10% |
Average | 94.36% | 72.02% | 76.44% | 86.40% |
Segments # | FD_Corner | KCF | MIL | CSRDCF |
---|---|---|---|---|
Segm01 F4150-F8650 | 96.70% | 55.20% | 65.50% | 72.10% |
Segm02 F5000-F8650 | 99.30% | 58.50% | 57.30% | 70.10% |
Segm03 F6300-F8650 | 98.90% | 52.60% | 65.80% | 63.20% |
Average | 98.30% | 55.43% | 62.87% | 68.47% |
Sequence | Total no. of frames | Attributes |
---|---|---|
1. Crossing | 95 | SV, DEF, FM, OPR, BC |
2. Crowds | 322 | IV, DEF, BC |
3. Human5 | 688 | SV, OCC, DEF |
4. RedTeam | 1893 | SV, OCC, IPR, OPR, LR |
5. Walking | 387 | SV, OCC, DEF |
6. Walking2 | 475 | SV, OCC, LR |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).