Abstract
In this paper, we investigate how many training samples deep learning techniques require to achieve higher inspection accuracy than humans on a simple visual inspection task. We also examine whether deep learning techniques and human subjects differ in the kinds of anomalies they find. To this end, we designed a simple task that non-experts can perform: participants distinguish between normal and anomalous symbols in images. For this task, we automatically generated a large number of training samples containing normal and anomalous symbols. The results show that the deep learning techniques required several thousand training samples to detect the locations of the anomalous symbols and tens of thousands to segment these symbols. We also confirmed that, compared with humans, deep learning techniques have both advantages and disadvantages in identifying anomalies.
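As a minimal sketch of how such training samples could be generated automatically: the symbol set, grid layout, anomaly type, and function names below are illustrative assumptions, not the authors' actual protocol. The sketch renders a grid of "normal" symbols, substitutes an out-of-set symbol at a small rate, and records a pixel mask as segmentation ground truth.

```python
# Hypothetical sketch of automatic sample generation (assumed setup, not the
# paper's actual procedure). Requires Pillow.
import random
from PIL import Image, ImageDraw

SYMBOLS = ["O", "X", "+"]          # assumed "normal" symbol set
GRID, CELL = 4, 32                 # assumed 4x4 grid of 32x32-pixel cells

def make_sample(anomaly_rate=0.1):
    """Render a grid of symbols; out-of-set symbols serve as anomalies."""
    img = Image.new("L", (GRID * CELL, GRID * CELL), color=255)
    draw = ImageDraw.Draw(img)
    mask = Image.new("L", img.size, color=0)   # segmentation ground truth
    mask_draw = ImageDraw.Draw(mask)
    for row in range(GRID):
        for col in range(GRID):
            x, y = col * CELL + CELL // 4, row * CELL + CELL // 4
            if random.random() < anomaly_rate:
                # anomalous variant: a symbol outside the normal set
                draw.text((x, y), "#", fill=0)
                # mark the whole cell in the ground-truth mask
                mask_draw.rectangle(
                    [col * CELL, row * CELL,
                     (col + 1) * CELL - 1, (row + 1) * CELL - 1],
                    fill=255)
            else:
                draw.text((x, y), random.choice(SYMBOLS), fill=0)
    return img, mask

image, label_mask = make_sample()
```

A detector would be trained against the anomalous cell locations, and a segmentation network against the pixel mask, which mirrors the two levels of supervision compared in the abstract.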
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kato, N., Inoue, M., Nishiyama, M., Iwai, Y. (2020). Comparing the Recognition Accuracy of Humans and Deep Learning on a Simple Visual Inspection Task. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds.) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol. 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_15
DOI: https://doi.org/10.1007/978-3-030-41299-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41298-2
Online ISBN: 978-3-030-41299-9
eBook Packages: Computer Science (R0)